CN113076127A - Method, system, electronic device and medium for extracting question and answer content in programming environment - Google Patents

Publication number
CN113076127A
Authority
CN
China
Prior art keywords
words
question
text
answer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110449778.0A
Other languages
Chinese (zh)
Other versions
CN113076127B (en)
Inventor
陈林
赵恒辉
李言辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110449778.0A (CN113076127B)
Priority to PCT/CN2021/089820 (WO2022226714A1)
Publication of CN113076127A
Application granted
Publication of CN113076127B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method, a system, an electronic device and a medium for extracting question-and-answer content in a programming environment. The system comprises: a data processing module, which preprocesses the input online question-and-answer text, removes useless information and performs word segmentation; an entity recognition module, which performs software-engineering-domain entity recognition on the text output by the data processing module; a document reading module, which feeds the entity-recognized text into a neural network for document reading; and a summary extraction module, which uses another neural network to extract the key content of the question-and-answer text. The invention can extract the key content of technical questions and answers, reduce the time developers spend browsing and improve development efficiency during programming.

Description

Method, system, electronic device and medium for extracting question and answer content in programming environment
Technical Field
The invention relates to a method, a system, an electronic device and a medium for extracting question-and-answer content in a programming environment, and belongs to the technical field of the Internet.
Background
Software development is a flexible and challenging task that demands strong learning and problem-solving skills. While programming, developers who run into a problem often look for help online in addition to consulting reference books: they ask other developers who have met similar problems and borrow their solutions, which avoids duplicated effort and improves development efficiency. As a result, software question-and-answer communities have become increasingly active, giving developers a platform for helping one another and recording problems.
More and more developers are active on technical question-and-answer platforms. They ask and answer questions and offer ideas to other developers facing similar problems. However, not every question can be resolved on the platform, and the large amount of redundant and irrelevant information there hinders the help developers can get. A question on such a platform often has more than one answer; some answers are irrelevant to the question, some answers repeat one another, and some answers are partly relevant, partly irrelevant and partly repetitive. Platforms have made considerable efforts to address this. Stack Overflow, for example, lets users score each answer to a question so that highly scored answers are seen by more people. This reduces the interference of irrelevant information to some extent but still has considerable limitations. If all answers under the same question are treated as one document, and a summary is extracted from them with the key content marked, an effect similar to highlighting can be achieved, helping users shorten browsing time and improving development efficiency during programming.
Text summarization techniques convert a text or a collection of texts into a short summary containing the key information. By output type, summaries are divided into extractive summaries and abstractive (generated) summaries; an extractive summary is formed by extracting several sentences directly from the original text and then ordering and recombining them. Applying extractive summarization to technical question-and-answer communities makes it possible to extract the key content of answers and helps developers quickly locate the answer content they want.
In recent years, scholars have proposed a number of methods for extractive summarization. Julian Kupiec et al. proposed treating summary extraction as a classic classification problem: given a set of training documents and manually extracted summaries, a classifier is trained to estimate the probability that a given sentence should be included in the summary; Conroy and O'Leary proposed using a hidden Markov model for summary extraction and obtained the best results among the models of that time; Erkan and Radev proposed the graph-based algorithm LexPageRank, in which an edge is added to the connectivity matrix whenever the cosine similarity of two sentences exceeds a threshold, and sentence importance is then computed from that matrix; Woodsend et al. proposed a model of joint content selection and compression for document summarization, which uses integer linear programming to select and combine terms into a summary subject to length, coverage and grammatical constraints; Kågebäck et al. computed the similarity between sentences from continuous vector space representations and extracted document summaries using a recursive auto-encoder; Yin et al. projected sentences into a continuous vector space with a convolutional neural network (CNN), minimized a cost based on "prestige" and "diversity" to select suitable sentences, and achieved good results on a multi-document extractive summarization task; Cao et al. also used a CNN to address query-oriented multi-document summarization, representing documents by weighted-sum pooling over sentence representations, with the weights learned by a query-based attention mechanism over those representations; Cheng et al. proposed an automatic summarization framework based on a hierarchical document encoder and an attention mechanism, which achieves relatively robust summary extraction without relying on linguistic annotation. However, existing work on summary extraction targets the general domain, and no scholars have yet proposed techniques or methods for summary extraction in the software engineering domain.
Disclosure of Invention
The first object of the invention is to provide a system that automatically extracts the key content of technical questions and answers during programming, so as to reduce the time developers spend browsing and improve development efficiency during programming. The second object of the invention is to provide a method for automatically extracting the key content of technical questions and answers during programming.
The invention specifically adopts the following technical solution: a system for extracting question-and-answer content in a programming environment, comprising:
a data processing module for performing: preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity recognition module for performing: software-engineering-domain entity recognition on the text processed by the data processing module;
a document reading module for performing: inputting the text recognized by the entity recognition module into a neural network for document reading;
a summary extraction module for performing: extracting the key content of the question-and-answer text using another neural network.
As a preferred embodiment, the data processing module specifically executes the following steps: starting from an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" information; performing word segmentation with the nltk tool; and ending the data processing.
As a preferred embodiment, the entity recognition module specifically executes the following steps: starting from an initial state; computing the spelling features of each word, including whether its first letter is capitalized, whether it contains an underscore and whether it contains "-"; computing the context features of each word, specifically using a window of [-2, 2] and adding the words within the window, i.e. up to two positions before and after the current word, as features; computing the bit-string features of each word, specifically using large-scale unlabeled text from the software engineering domain to cluster words with similar distributions into classes, each class being represented by a bit string of varying length that serves as the feature; computing the external dictionary features of each word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word appears in it; performing entity recognition with a CRF model trained using the CRF++ tool; and ending the entity recognition.
As a preferred embodiment, the document reading module specifically executes the following steps: starting from an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and ending the document reading. The summary extraction module specifically executes the following steps: starting from an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in turn as to whether it can be included in the summary; and ending the summary extraction.
The invention also provides a method for extracting question-and-answer content in a programming environment, comprising the following steps:
a data processing step, specifically comprising: preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity recognition step, specifically comprising: performing software-engineering-domain entity recognition on the text processed in the data processing step;
a document reading step, specifically comprising: inputting the text recognized in the entity recognition step into a neural network for document reading;
a summary extraction step, specifically comprising: extracting the key content of the question-and-answer text using another neural network.
As a preferred embodiment, the data processing step specifically comprises: starting from an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" information; performing word segmentation with the nltk tool; and ending the data processing.
As a preferred embodiment, the entity recognition step specifically comprises: starting from an initial state; computing the spelling features of each word, including whether its first letter is capitalized, whether it contains an underscore and whether it contains "-"; computing the context features of each word, specifically using a window of [-2, 2] and adding the words within the window, i.e. up to two positions before and after the current word, as features; computing the bit-string features of each word, specifically using large-scale unlabeled text from the software engineering domain to cluster words with similar distributions into classes, each class being represented by a bit string of varying length that serves as the feature; computing the external dictionary features of each word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word appears in it; performing entity recognition with a CRF model trained using the CRF++ tool; and ending the entity recognition.
As a preferred embodiment, the document reading step specifically comprises: starting from an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and ending the document reading. The summary extraction step specifically comprises: starting from an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in turn as to whether it can be included in the summary; and ending the summary extraction.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method are implemented when the processor executes the program.
The invention also provides a medium on which a computer program is stored, the program implementing the steps of the method when executed by a processor.
The invention achieves the following beneficial effects: (1) The system for automatically extracting the key content of technical questions and answers during programming can extract that key content, reduce the time developers spend browsing and improve development efficiency during programming. (2) The system extracts the key content of technical questions and answers automatically, without manual labeling, which greatly reduces the cost of extracting key content. (3) The method for automatically extracting the key content of technical questions and answers during programming is a new attempt aimed at the software engineering domain and fills the gap in key-content extraction in that domain.
Drawings
FIG. 1 is a flow chart of a method for extracting question and answer content in a programming environment of the present invention.
Fig. 2 is a schematic diagram of an example of the structure of CNN of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1: The invention provides a system for extracting question-and-answer content in a programming environment, comprising:
a data processing module for performing: preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity recognition module for performing: software-engineering-domain entity recognition on the text processed by the data processing module;
a document reading module for performing: inputting the text recognized by the entity recognition module into a neural network for document reading;
a summary extraction module for performing: extracting the key content of the question-and-answer text using another neural network.
Preferably, the data processing module specifically executes the following steps: starting from an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" information; performing word segmentation with the nltk tool; and ending the data processing.
Preferably, the entity recognition module specifically executes the following steps: starting from an initial state; computing the spelling features of each word, including whether its first letter is capitalized, whether it contains an underscore and whether it contains "-"; computing the context features of each word, specifically using a window of [-2, 2] and adding the words within the window, i.e. up to two positions before and after the current word, as features; computing the bit-string features of each word, specifically using large-scale unlabeled text from the software engineering domain to cluster words with similar distributions into classes, each class being represented by a bit string of varying length that serves as the feature; computing the external dictionary features of each word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word appears in it; performing entity recognition with a CRF model trained using the CRF++ tool; and ending the entity recognition.
Preferably, the document reading module specifically executes the following steps: starting from an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and ending the document reading. The summary extraction module specifically executes the following steps: starting from an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in turn as to whether it can be included in the summary; and ending the summary extraction.
Example 2: The invention also provides a method for extracting question-and-answer content in a programming environment. The overall framework of the invention is shown in Fig. 1, and the method comprises the following 4 steps:
Step 1: For question-and-answer text from the web, first remove the content of all <pre> tags; code segments in questions and answers appear inside <pre> tags, so removing that content also removes the code segments. Then delete all HTML tags, e.g. <pre>, <p>, <div>. Next, replace each URL in the text with "@u@", each emoticon with "@e@", and each "@" mention of another user with "@a@". Finally, segment the text with the nltk word segmentation tool, where the segmentation must keep each API name as a single token, e.g., os.
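The cleaning order in Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess`, the emoticon pattern and the regular-expression tokenizer are assumptions, and the tokenizer is a standard-library stand-in for the nltk tool named in the text, chosen only so that dotted API names stay in one token.

```python
import re

# Placeholder tokens taken from the description of Step 1.
URL_TOKEN, EMOTICON_TOKEN, MENTION_TOKEN = "@u@", "@e@", "@a@"

PRE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.DOTALL | re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+")
# Assumption: a very small emoticon pattern; a real system needs a fuller list.
EMOTICON = re.compile(r"(?<!\w)[:;=][-']?[)(DPp](?!\w)")
# The trailing (?!@) keeps the "@u@" and "@e@" placeholders from being rewritten.
MENTION = re.compile(r"(?<![\w@])@\w+(?!@)")
# Placeholders, then dotted identifiers (API names), then numbers, then punctuation.
TOKEN = re.compile(r"@[uea]@|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*|\d+(?:\.\d+)*|[^\w\s]")


def preprocess(html):
    """Clean one question-and-answer post and return its tokens."""
    text = PRE_BLOCK.sub(" ", html)            # drop code segments with their <pre> tags
    text = HTML_TAG.sub(" ", text)             # drop every remaining HTML tag
    text = URL.sub(URL_TOKEN, text)            # URLs -> "@u@"
    text = EMOTICON.sub(EMOTICON_TOKEN, text)  # emoticons -> "@e@"
    text = MENTION.sub(MENTION_TOKEN, text)    # "@user" mentions -> "@a@"
    return TOKEN.findall(text)


post = ("<p>Use os.path.join, see https://docs.python.org/3/ :) thanks @alice</p>"
        "<pre><code>import os</code></pre>")
print(preprocess(post))
```

The `<pre>` blocks are removed before the generic tag pass because the generic pass would otherwise leave the code text behind.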
Step 2: Perform entity recognition on the text after data processing. The entity recognition method is based mainly on a conditional random field (CRF) model, implemented with the CRF++ tool. The features of the CRF model include:
    • Spelling features of the word, such as whether its first letter is capitalized, whether it contains an underscore and whether it contains "-";
    • Context features. Using a window of [-2, 2], the words within the window, i.e. up to two positions before and after the current word, are added as features;
    • Bit-string features of the word. Using large-scale unlabeled text from the software domain, the Brown clustering algorithm groups words that appear in similar contexts into the same class; the number of classes is set to 1000, and words in the same class are represented by the same bit string, which serves as the feature;
    • External dictionary features. A large number of known entities are collected in advance to form an external dictionary, and each word is checked against that dictionary.
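The four feature groups listed above can be sketched as a per-token feature function. This is an illustrative sketch only: the function name `word_features`, the feature keys, the padding token and the prefix lengths 4 and 8 are assumptions, and the Brown cluster table and entity dictionary are supposed to have been built beforehand. CRF++ itself reads features as columns of a text file, so a dictionary like this would still have to be written out in that form.

```python
def word_features(tokens, i, brown_clusters, entity_dict):
    """Build the feature dictionary for tokens[i].

    brown_clusters maps a lower-cased word to its Brown-cluster bit string;
    entity_dict is a set of lower-cased known entity names (both hypothetical inputs).
    """
    word = tokens[i]
    bits = brown_clusters.get(word.lower(), "")
    feats = {
        # Spelling features.
        "init_cap": word[:1].isupper(),
        "has_underscore": "_" in word,
        "has_hyphen": "-" in word,
        # External dictionary feature.
        "in_dict": word.lower() in entity_dict,
        # Bit-string features: the full string plus prefixes of different lengths
        # (the lengths 4 and 8 are an assumption, not taken from the text).
        "brown": bits or "NONE",
        "brown_4": bits[:4] or "NONE",
        "brown_8": bits[:8] or "NONE",
    }
    # Context features: window [-2, 2] around the current word.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        feats[f"w[{offset}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


tokens = ["Use", "os.path.join", "in", "Python"]
print(word_features(tokens, 3, {"python": "110100"}, {"python"}))
```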
Step 3: Read and encode the entity-recognized text. First, a single-layer convolutional neural network (CNN) produces sentence-level representation vectors; a recurrent neural network (RNN) then builds the vector representation of the document. The CNN operates at the word level to obtain sentence-level representations, which serve as input to the RNN; the document-level representation is thus obtained hierarchically. The embedding dimensions of words, sentences and documents are set to 150, 300 and 750, respectively.
In the single-layer convolutional neural network, each convolution kernel uses multiple feature maps to compute a series of features; the number of features is 300, matching the sentence dimension. Convolution kernels of different widths, from 1 to 7, produce different feature vectors for a sentence, and these vectors are summed to give the final sentence vector. The lower part of Fig. 2 shows an example CNN structure: the word dimension is 5 and the example sentence has 6 words; the two colors represent two convolution kernels, the blue kernel of width 2 and the red kernel of width 3, each with 6 feature maps. After pooling, each feature map corresponds to one dimension of the resulting vector, so the two kernels yield two 6-dimensional vectors, which are summed to give the final sentence vector.
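The Fig. 2 example (5-dimensional words, a 6-word sentence, kernel widths 2 and 3 with 6 feature maps each) can be sketched in plain Python as follows. The weights are random placeholders rather than trained values, the tanh activation is an assumption (the text does not name one), and all function names are hypothetical.

```python
import math
import random


def conv_max_pool(words, kernel, bias):
    """One feature map: slide a (width x dim) kernel over the words, keep the max."""
    width = len(kernel)
    best = -math.inf  # stays -inf if the sentence is shorter than the kernel
    for start in range(len(words) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], words[start + k]))
        best = max(best, math.tanh(total))  # max-over-time pooling
    return best


def sentence_vector(words, kernels_by_width):
    """Sum the pooled vectors produced by each kernel width."""
    n_maps = len(next(iter(kernels_by_width.values())))
    vec = [0.0] * n_maps
    for maps in kernels_by_width.values():
        for m, (kernel, bias) in enumerate(maps):
            vec[m] += conv_max_pool(words, kernel, bias)
    return vec


def random_kernels(width, dim, n_maps, rng):
    """n_maps untrained (kernel, bias) pairs for one kernel width."""
    return [([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)], 0.0)
            for _ in range(n_maps)]


rng = random.Random(0)
dim, n_words, n_maps = 5, 6, 6  # sizes from the Fig. 2 example
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
kernels = {w: random_kernels(w, dim, n_maps, rng) for w in (2, 3)}
vec = sentence_vector(sentence, kernels)
print(len(vec))  # one dimension per feature map
```

With the sizes stated in Step 3, the same structure would use 150-dimensional words, widths 1 to 7 and 300 feature maps per width.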
The recurrent neural network (RNN) is a single-layer long short-term memory (LSTM) network, which mitigates the vanishing-gradient problem when training on long sentences.
Step 4: Drawing on the idea of the attention mechanism, a recurrent neural network labels each sentence in turn as to whether it can serve as key content; the labeling takes into account whether sentences are independent of one another and whether their meaning is repeated. As shown in the upper right part of Fig. 2, the label of the next sentence depends not only on the current input but also on the label of the previous sentence.
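The sequential dependency described in Step 4 can be sketched as a loop in which each labeling decision sees the current sentence vector, the recurrent state and the previous sentence's label probability. This is a deliberately simplified illustration: it uses a plain tanh recurrent cell instead of an LSTM, omits the attention component, and runs on random untrained weights; the function name `label_sentences` and all parameter names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_sentences(sent_vecs, W_in, W_rec, w_prev, w_out, threshold=0.5):
    """Label sentences one by one; returns a list of (probability, keep) pairs."""
    hidden = [0.0] * len(W_rec)
    prev_p = 0.0  # label probability of the previous sentence
    results = []
    for s in sent_vecs:
        # The new state depends on the current input, the old state and the previous label.
        hidden = [math.tanh(dot(W_in[j], s) + dot(W_rec[j], hidden) + w_prev[j] * prev_p)
                  for j in range(len(hidden))]
        prev_p = sigmoid(dot(w_out, hidden))
        results.append((prev_p, prev_p >= threshold))
    return results


rng = random.Random(0)
n_hidden, dim = 4, 3  # toy sizes chosen for the example
W_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_hidden)]
W_rec = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
w_prev = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
sentences = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
results = label_sentences(sentences, W_in, W_rec, w_prev, w_out)
for p, keep in results:
    print(f"{p:.3f} {keep}")
```

Feeding the previous probability back into the state is one way to let the labeler discount a sentence that repeats what it has just selected.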
Example 3: The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method are implemented when the processor executes the program.
Example 4: The invention also provides a medium on which a computer program is stored, the program implementing the steps of the method when executed by a processor.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and such modifications and variations shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A system for extracting question-and-answer content in a programming environment, characterized by comprising:
a data processing module for performing: preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity recognition module for performing: software-engineering-domain entity recognition on the text processed by the data processing module;
a document reading module for performing: inputting the text recognized by the entity recognition module into a neural network for document reading;
a summary extraction module for performing: extracting the key content of the question-and-answer text using another neural network.
2. The system for extracting question-and-answer content in a programming environment according to claim 1, wherein the data processing module specifically executes the following steps: starting from an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" information; performing word segmentation with the nltk tool; and ending the data processing.
3. The system for extracting question-and-answer content in a programming environment according to claim 1, wherein the entity recognition module specifically executes the following steps: starting from an initial state; computing the spelling features of each word, including whether its first letter is capitalized, whether it contains an underscore and whether it contains "-"; computing the context features of each word, specifically using a window of [-2, 2] and adding the words within the window, i.e. up to two positions before and after the current word, as features; computing the bit-string features of each word, specifically using large-scale unlabeled text from the software engineering domain to cluster words with similar distributions into classes, each class being represented by a bit string of varying length that serves as the feature; computing the external dictionary features of each word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word appears in it; performing entity recognition with a CRF model trained using the CRF++ tool; and ending the entity recognition.
4. The system for extracting question-and-answer content in a programming environment according to claim 1, wherein the document reading module specifically executes the following steps: starting from an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and ending the document reading; and the summary extraction module specifically executes the following steps: starting from an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in turn as to whether it can be included in the summary; and ending the summary extraction.
5. A method for extracting question and answer content in a programming environment, characterized by comprising the following steps:
a data processing step, specifically comprising: preprocessing the input online question and answer text data, removing useless information, and performing word segmentation;
an entity identification step, specifically comprising: performing entity recognition for the software engineering field on the text processed by the data processing module;
a document reading step, specifically comprising: inputting the text identified by the entity identification module into a neural network for document reading;
an abstract extraction step, specifically comprising: extracting key content from the question and answer text by using another neural network.
6. The method for extracting question and answer content in a programming environment according to claim 5, wherein the data processing step specifically comprises: an initial state; processing code segments in the question and answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" information; performing word segmentation using the NLTK tool; and finishing the data processing.
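A possible shape for the data processing step is sketched below using only the Python standard library. Several choices are assumptions rather than statements of the claimed method: the claim does not say whether each item is dropped or replaced, so code blocks and URLs become the placeholder tokens CODE and URL while tags, "@" mentions and emoticons are removed; the patterns assume Stack Overflow-style `<pre><code>` markup; and a regular-expression tokenizer stands in for the NLTK tokenizer named in the claim so that the example runs without extra downloads.

```python
import re
from typing import List

# All patterns below are illustrative assumptions, not taken from the patent.
CODE_BLOCK = re.compile(r"<pre><code>.*?</code></pre>", re.DOTALL)
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
# Simple western emoticons such as :) ;-) =D, only when standing alone.
EMOTICON = re.compile(r"(?<!\S)[:;=][-']?[)(DPp](?!\S)")
# Words (keeping inner "-" and "." as in java.util.List) or single symbols.
TOKEN = re.compile(r"\w+(?:[-.]\w+)*|[^\w\s]")


def preprocess(text: str) -> List[str]:
    """Clean one question-and-answer post and split it into tokens."""
    text = CODE_BLOCK.sub(" CODE ", text)  # code segments -> placeholder
    text = HTML_TAG.sub(" ", text)         # remaining HTML tags
    text = URL.sub(" URL ", text)          # links -> placeholder
    text = MENTION.sub(" ", text)          # "@user" information
    text = EMOTICON.sub(" ", text)         # emoticons
    return TOKEN.findall(text)             # stand-in for NLTK tokenization
```

The order matters in this sketch: code blocks are handled before the generic tag pattern so their contents are not tokenized as prose, and URLs are replaced before emoticon removal so that characters inside a link are never mistaken for an emoticon.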
7. The method for extracting question and answer content in a programming environment according to claim 5, wherein the entity identification step specifically comprises: an initial state; calculating the spelling features of each word, including whether the first letter of the word is capitalized, whether the word contains an underscore, and whether the word contains "-"; calculating the context features of the word, specifically, using a window of [-2, 2] and adding the words before and after the word that fall within the window as features; calculating the bit-stream features of the word, specifically, using large-scale unlabeled text from the software engineering field to cluster words with similar distributions into a class, each class being represented by bit streams of different lengths that serve as features; calculating the external dictionary features of the word, specifically, collecting a large number of known entities to form an external dictionary and checking whether the word exists in the external dictionary; performing entity recognition using a CRF model trained with the CRF++ tool; and finishing the entity recognition.
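The bit-stream and external dictionary features in this claim might look roughly as follows. The reading taken here is an assumption: the description resembles hierarchical (Brown-style) word clustering, in which each word receives a binary path and prefixes of several lengths are used as features. The prefix lengths, the feature names, the case-insensitive dictionary lookup, and the example cluster table are all hypothetical; the clustering itself is assumed to have been computed beforehand on unlabeled text.

```python
from typing import Dict, Sequence, Set


def cluster_prefix_features(word: str,
                            clusters: Dict[str, str],
                            lengths: Sequence[int] = (2, 4, 6)) -> Dict[str, str]:
    """Prefixes of the word's cluster bit string, one feature per length."""
    bits = clusters.get(word)
    if bits is None:
        return {}  # word was not seen during clustering
    # Shorter prefixes describe coarser clusters, longer ones finer clusters.
    return {f"bits[:{n}]": bits[:n] for n in lengths if n <= len(bits)}


def dictionary_feature(word: str, gazetteer: Set[str]) -> Dict[str, bool]:
    """Whether the word appears in the external dictionary of known entities."""
    return {"in_dict": word.lower() in gazetteer}


# Hypothetical data for illustration only.
example_clusters = {"HashMap": "110100", "TreeMap": "110101", "git": "101"}
example_gazetteer = {"hashmap", "treemap", "git"}
```

With this table, "HashMap" and "TreeMap" share every prefix up to length four and differ only at full length, which is how words with similar distributions end up with similar features.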
8. The method for extracting question and answer content in a programming environment according to claim 5, wherein the document reading step specifically comprises: an initial state; obtaining a sentence-level vector representation through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading; and the abstract extraction step specifically comprises: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to mark, sentence by sentence in order, whether each sentence can be taken as part of the abstract; and finishing the abstract extraction.
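The abstract extraction step can be loosely illustrated as a recurrent pass that visits the sentences in order, scores each one against the current state, and emits a keep-or-skip label. The sketch below is a stand-in under stated assumptions, not the network described in the patent: the state starts from the document vector, the score is a sigmoid over a dot product between state and sentence vector (a simple attention-style comparison), the state update is a parameter-free tanh, and the 0.5 threshold is arbitrary. A trained model would learn weights for each of these pieces.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def label_sentences(sentences: List[Vector],
                    doc_vec: Vector,
                    threshold: float = 0.5) -> List[int]:
    """Label each sentence 1 (include in the abstract) or 0, in order."""
    hidden = list(doc_vec)  # state initialised from the document vector
    labels: List[int] = []
    for s in sentences:
        # Attention-style score: how well the sentence matches the state.
        score = sigmoid(sum(h * x for h, x in zip(hidden, s)))
        labels.append(1 if score > threshold else 0)
        # Recurrent update so that later decisions depend on earlier sentences.
        hidden = [math.tanh(h + x) for h, x in zip(hidden, s)]
    return labels
```

Because the state is updated after every sentence, the same sentence vector can receive different labels depending on what came before it, which is the point of labelling sequentially rather than scoring sentences independently.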
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 5 to 8 when executing the program.
10. A medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 5 to 8.
CN202110449778.0A 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment Active CN113076127B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110449778.0A CN113076127B (en) 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment
PCT/CN2021/089820 WO2022226714A1 (en) 2021-04-25 2021-04-26 Method and system for extracting question and answer content in programming environment, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110449778.0A CN113076127B (en) 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment

Publications (2)

Publication Number Publication Date
CN113076127A (en) 2021-07-06
CN113076127B (en) 2023-08-29

Family

ID=76618820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110449778.0A Active CN113076127B (en) 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment

Country Status (2)

Country Link
CN (1) CN113076127B (en)
WO (1) WO2022226714A1 (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of keyword extracting method and extraction system
CN109902175A (en) * 2019-02-20 2019-06-18 上海方立数码科技有限公司 A kind of file classification method and categorizing system based on neural network structure model
CN110298037A (en) * 2019-06-13 2019-10-01 同济大学 The matched text recognition method of convolutional neural networks based on enhancing attention mechanism
CN110390049A (en) * 2019-07-10 2019-10-29 北京航空航天大学 A kind of answer automatic generation method of software-oriented development problem
CN111428012A (en) * 2020-03-02 2020-07-17 平安科技(深圳)有限公司 Intelligent question-answering method, device, equipment and storage medium based on attention mechanism
CN111522965A (en) * 2020-04-22 2020-08-11 重庆邮电大学 Question-answering method and system for entity relationship extraction based on transfer learning
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine
CN111666752A (en) * 2020-04-20 2020-09-15 中山大学 Circuit teaching material entity relation extraction method based on keyword attention mechanism
CN112149421A (en) * 2020-09-23 2020-12-29 云南师范大学 Software programming field entity identification method based on BERT embedding
WO2020261234A1 (en) * 2019-06-28 2020-12-30 Tata Consultancy Services Limited System and method for sequence labeling using hierarchical capsule based neural network
US20210027770A1 (en) * 2019-07-22 2021-01-28 Capital One Services, Llc Multi-turn dialogue response generation with persona modeling
CN112329465A (en) * 2019-07-18 2021-02-05 株式会社理光 Named entity identification method and device and computer readable storage medium
CN112417854A (en) * 2020-12-15 2021-02-26 北京信息科技大学 Chinese document abstraction type abstract method
CN115952263A (en) * 2022-12-16 2023-04-11 桂林电子科技大学 Question-answering method fusing machine reading understanding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6842167B2 (en) * 2017-05-08 2021-03-17 国立研究開発法人情報通信研究機構 Summary generator, summary generation method and computer program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
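MOHINI WAKCHAURE et al.: "A Scheme of Answer Selection In Community Question Answering Using Machine Learning Techniques", 《2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS)》, pages 879 - 883 *
史学良: "Design and Implementation of a Knowledge Question Answering System for the Software Development Field", 《China Master's Theses Full-text Database》, no. 2020, pages 138 - 92 *
李慧颖 et al.: "A Multi-Attention RNN Relation Linking Method for Knowledge Base Question Answering", 《Journal of Southeast University》, vol. 36, no. 4, pages 385 - 392 *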

Also Published As

Publication number Publication date
WO2022226714A1 (en) 2022-11-03
CN113076127B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN107463607B (en) Method for acquiring and organizing upper and lower relations of domain entities by combining word vectors and bootstrap learning
Choi et al. Identifying sources of opinions with conditional random fields and extraction patterns
CN110175246B (en) Method for extracting concept words from video subtitles
CN101520802A (en) Question-answer pair quality evaluation method and system
WO2009035863A2 (en) Mining bilingual dictionaries from monolingual web pages
Layton et al. Recentred local profiles for authorship attribution
CN108345583B (en) Event identification and classification method and device based on multilingual attention mechanism
CN110851599A (en) Automatic scoring method and teaching and assisting system for Chinese composition
CN111143507B (en) Reading and understanding method based on compound problem
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
US11170169B2 (en) System and method for language-independent contextual embedding
Kim et al. Figure text extraction in biomedical literature
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN113268576B (en) Deep learning-based department semantic information extraction method and device
CN114282527A (en) Multi-language text detection and correction method, system, electronic device and storage medium
CN115034218A (en) Chinese grammar error diagnosis method based on multi-stage training and editing level voting
CN113065349A (en) Named entity recognition method based on conditional random field
CN111444720A (en) Named entity recognition method for English text
Wong et al. isentenizer-: Multilingual sentence boundary detection model
CN114091448A (en) Text countermeasure sample generation method, system, computer device and storage medium
CN111815426B (en) Data processing method and terminal related to financial investment and research
CN111274354B (en) Referee document structuring method and referee document structuring device
Al Ghamdi A novel approach to printed Arabic optical character recognition
CN113076127B (en) Method, system, electronic device and medium for extracting question and answer content in programming environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant