CN113076127B - Method, system, electronic device and medium for extracting question and answer content in programming environment - Google Patents
- Publication number
- CN113076127B
- Authority
- CN
- China
- Prior art keywords
- question
- steps
- words
- text
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method, a system, an electronic device and a medium for extracting question and answer content in a programming environment. The system comprises: a data processing module for preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation; an entity identification module for performing software engineering entity identification on the text processed by the data processing module; a document reading module for inputting the text processed by the entity identification module into a neural network for document reading; and an abstract extraction module for extracting the key content of the question-and-answer text using another neural network. The invention can extract the key content of technical questions and answers, reduce the browsing time of developers and improve development efficiency in the programming environment.
Description
Technical Field
The invention relates to a method, a system, electronic equipment and a medium for extracting question and answer contents in a programming environment, and belongs to the technical field of Internet.
Background
Software development is a flexible and challenging task that requires developers to have strong learning and problem-solving abilities. In a programming environment, a developer who encounters a problem can consult reference books, but more often searches the web for help, asks other developers who have encountered similar problems, and draws on their solutions, thereby avoiding duplicated work and improving development efficiency. As a result, software question-and-answer communities have become increasingly active, aiming to provide a platform where developers help one another and record problems.
More and more developers are active on technical question-and-answer platforms. They ask and answer questions, and in doing so provide solution ideas for other developers who encounter similar problems. However, not every question can be solved on the platform, and the large amount of redundant and irrelevant information on it hinders developers seeking help. A question on a technical question-and-answer platform often has more than one answer: some answers are unrelated to the question, some answers largely repeat one another, and some answers are partly irrelevant or partly repetitive. Platforms have made considerable efforts to address these situations. Stack Overflow, for example, lets users score each answer to a question and makes the highest-scoring answers visible to more people. This reduces the interference of irrelevant information to some extent, but considerable limitations remain. If all answers under the same question are regarded as one document, summary extraction is performed over them, and the key content is marked, the result can play a role similar to 'highlighting', helping users reduce browsing time and improving development efficiency in the programming environment.
Text summarization techniques convert a text or a collection of texts into a short summary containing the key information. According to the output type, text summaries can be divided into extractive summaries and abstractive summaries. An extractive summary is formed by directly extracting several sentences from the original text and then ordering and recombining them. Applying extractive summarization to a technical question-and-answer community makes it possible to extract the key content of answers and helps developers quickly locate the answer content they want.
In recent years, researchers have proposed a number of methods for summary extraction. Julian Kupiec et al. proposed treating summary extraction as a classical classification problem: given a set of training documents and manually extracted summaries, a classifier is trained to estimate the probability that a given sentence should be included in the summary. Conroy and O'Leary proposed summarization with a hidden Markov model, which achieved the best results among the models available at that time. Erkan et al. proposed the graph-based algorithm LexPageRank: when the cosine similarity of two sentences exceeds a certain threshold, a corresponding edge is added to the connectivity matrix, and the importance of each sentence is then calculated from that matrix. Woodsend et al. proposed a model of joint content selection and compression for document summarization that uses integer linear programming to select and combine words into a summary subject to length, coverage and grammatical constraints. Kageback et al. computed the similarity between sentences using continuous vector space representations and performed summary extraction with a recursive autoencoder. Yin et al. projected sentences into a continuous vector space with a convolutional neural network (CNN) and selected suitable sentences by minimizing a cost based on 'prestige' and 'diversity', obtaining good results on multi-document extractive summarization tasks. Cao et al. also used a CNN to address query-oriented multi-document summarization; they represented documents by weighted sum-pooling over sentence representations, with the weights learned by an attention mechanism over the query and sentence representations. Cheng et al. proposed an automatic summarization framework based on a hierarchical document encoder and an attention mechanism that achieves fairly good summary extraction without linguistic annotation.
However, current summary extraction work targets the general domain, and no researcher has yet proposed a technique or method for summary extraction in the field of software engineering.
Disclosure of Invention
The first object of the invention is to provide a system for automatically extracting the key content of technical questions and answers in a programming environment, which can extract the key content of technical questions and answers, reduce the browsing time of developers and improve development efficiency in the programming environment. The second object of the invention is to provide a method for automatically extracting the key content of technical questions and answers in a programming environment.
The invention adopts the following technical scheme: the system for extracting the question and answer content in the programming environment comprises:
a data processing module for executing: preprocessing the input network question-answering text data, removing useless information and performing word segmentation;
entity identification module for executing: entity identification in the field of software engineering is carried out on the text processed by the data processing module;
a document reading module for performing: inputting the text recognized by the entity recognition module into a neural network for document reading;
the abstract extraction module is used for executing the following steps: and extracting key contents in the question-answer text by using another neural network.
As a preferred embodiment, the data processing module specifically performs the following steps: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation with the nltk tool; and finishing the data processing.
As a preferred embodiment, the entity identification module specifically performs the following steps: an initial state; computing the spelling features of each word, including whether its initial letter is capitalized, whether it contains an underscore, and whether it contains a specified character; computing the context features of each word, specifically by using a window of [-2, 2] and adding the words in the window, namely the preceding and following words, as features; computing the bit-string features of each word, specifically by using large-scale unlabeled text from the software engineering field, clustering similarly distributed words into one class with a clustering method, and representing each class by a bit string, the bit strings being of different lengths, as a feature; computing the external dictionary features of each word, specifically by collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification with a CRF model trained using the CRF++ tool; and finishing entity identification.
As a preferred embodiment, the document reading module specifically performs the following steps: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading. The abstract extraction module specifically performs the following steps: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in sequence as to whether it can be included in the summary; and finishing abstract extraction.
The invention also provides a method for extracting the question and answer content in the programming environment, which comprises the following steps:
the data processing step specifically comprises the following steps: preprocessing the input network question-answering text data, removing useless information and performing word segmentation;
the entity identification step specifically comprises the following steps: entity identification in the field of software engineering is carried out on the text processed by the data processing module;
the document reading step specifically comprises the following steps: inputting the text recognized by the entity recognition module into a neural network for document reading;
the abstract extraction step specifically comprises the following steps: and extracting key contents in the question-answer text by using another neural network.
As a preferred embodiment, the data processing step specifically includes: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation with the nltk tool; and finishing the data processing.
As a preferred embodiment, the entity identification step specifically includes: an initial state; computing the spelling features of each word, including whether its initial letter is capitalized, whether it contains an underscore, and whether it contains a specified character; computing the context features of each word, specifically by using a window of [-2, 2] and adding the words in the window, namely the preceding and following words, as features; computing the bit-string features of each word, specifically by using large-scale unlabeled text from the software engineering field, clustering similarly distributed words into one class with a clustering method, and representing each class by a bit string, the bit strings being of different lengths, as a feature; computing the external dictionary features of each word, specifically by collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification with a CRF model trained using the CRF++ tool; and finishing entity identification.
As a preferred embodiment, the document reading step specifically includes: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading. The abstract extraction step specifically includes: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in sequence as to whether it can be included in the summary; and finishing abstract extraction.
The invention also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method when executing the program.
The invention also proposes a medium on which a computer program is stored which, when being executed by a processor, implements the steps of the method.
The invention has the following beneficial effects: (1) The system provided by the invention for automatically extracting the key content of technical questions and answers in a programming environment can extract that key content, reduce the browsing time of developers and improve development efficiency in the programming environment. (2) The system can extract the key content of technical questions and answers automatically, without manual labeling, which greatly reduces the cost of key content extraction. (3) The method provided by the invention for automatically extracting the key content of technical questions and answers in a programming environment is a new attempt aimed at the field of software engineering and fills the gap in key content extraction in that field.
Drawings
Fig. 1 is a flowchart of a method of extracting question-answer contents in a programming environment of the present invention.
Fig. 2 is a schematic diagram of an example of the CNN structure of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1: the invention provides a system for extracting question and answer content in a programming environment, which comprises:
a data processing module for executing: preprocessing the input network question-answering text data, removing useless information and performing word segmentation;
entity identification module for executing: entity identification in the field of software engineering is carried out on the text processed by the data processing module;
a document reading module for performing: inputting the text recognized by the entity recognition module into a neural network for document reading;
the abstract extraction module is used for executing the following steps: and extracting key contents in the question-answer text by using another neural network.
Preferably, the data processing module specifically performs the following steps: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation with the nltk tool; and finishing the data processing.
Preferably, the entity identification module specifically performs the following steps: an initial state; computing the spelling features of each word, including whether its initial letter is capitalized, whether it contains an underscore, and whether it contains a specified character; computing the context features of each word, specifically by using a window of [-2, 2] and adding the words in the window, namely the preceding and following words, as features; computing the bit-string features of each word, specifically by using large-scale unlabeled text from the software engineering field, clustering similarly distributed words into one class with a clustering method, and representing each class by a bit string, the bit strings being of different lengths, as a feature; computing the external dictionary features of each word, specifically by collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification with a CRF model trained using the CRF++ tool; and finishing entity identification.
Preferably, the document reading module specifically performs the following steps: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading. The abstract extraction module specifically performs the following steps: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in sequence as to whether it can be included in the summary; and finishing abstract extraction.
Example 2: the invention also provides a method for extracting question and answer content in a programming environment. The general framework of the invention is shown in fig. 1, and the method comprises the following 4 steps:
Step 1: for question-and-answer text from the web, first clear the content of all <pre> tags. Code segments in questions and answers appear inside <pre> tags, so clearing the content of those tags clears the code segments. All HTML tags, e.g. <pre>, <p> and <div>, are then deleted. Next, each URL appearing in the text is replaced by "@u@", each emoticon such as ":" is replaced by "@e@", and each "@" mention of another user is replaced by "@a@". Finally, the text is segmented into words with the nltk word segmentation tool, with the requirement that an API name is kept whole; for example, os.path.join(path) must not be split and is treated as a single word.
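The cleaning order in step 1 can be sketched as follows. This is a minimal illustration and not the patented implementation: the regular expressions, the emoticon pattern and the exact spelling of the placeholder tokens are assumptions, and a small regex tokenizer stands in for the nltk word segmentation tool so that the example has no third-party dependency.

```python
import re
from typing import List

# Placeholder tokens as named in step 1; their exact spelling is an assumption.
URL_TOKEN, EMOTICON_TOKEN, MENTION_TOKEN = "@u@", "@e@", "@a@"

PRE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.DOTALL | re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+")
# Hypothetical emoticon pattern: the source does not list the emoticons handled.
EMOTICON = re.compile(r"(?<!\S)[:;]-?[()DPp](?!\S)")
# A standalone "@name"; the lookahead keeps the placeholder tokens untouched.
MENTION = re.compile(r"(?<!\S)@\w+\b(?!@)")
# Stand-in tokenizer: placeholders and dotted API names (with an optional
# call suffix) stay whole, everything else splits into words and punctuation.
TOKEN = re.compile(
    r"@[uea]@"
    r"|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+(?:\([^()\s]*\))?"
    r"|\w+"
    r"|[^\w\s]"
)


def preprocess(html: str) -> List[str]:
    """Clean one question-and-answer post and return its tokens."""
    text = PRE_BLOCK.sub(" ", html)            # drop code segments
    text = HTML_TAG.sub(" ", text)             # drop remaining HTML tags
    text = URL.sub(URL_TOKEN, text)            # URLs -> placeholder
    text = EMOTICON.sub(EMOTICON_TOKEN, text)  # emoticons -> placeholder
    text = MENTION.sub(MENTION_TOKEN, text)    # "@user" -> placeholder
    return TOKEN.findall(text)


post = ("<p>@bob use os.path.join(path) :) see https://docs.python.org/3/</p>"
        "<pre><code>x = 1</code></pre>")
tokens = preprocess(post)
```

Removing the <pre> blocks before the generic tag pass matters: once the tags are gone, the code text can no longer be told apart from prose.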
Step 2: perform entity recognition on the text produced by the data processing. The core of the entity identification method is a conditional random field (CRF) model implemented with the CRF++ tool. The features of the CRF model include:
- Word spelling features, such as whether the word's initial letter is uppercase, whether it contains an underscore, and whether it contains a specified character;
- Context features. A window of [-2, 2] is used, and the words in the window, namely the preceding and following words, are added as features;
- Bit-string features of words. Using large-scale unlabeled text from the software field, the Brown clustering algorithm groups words that appear in similar contexts into one class; the total number of word classes is set to 1000, and words in the same class are represented by the same bit string as a feature;
- External dictionary features. A large number of known entities are collected in advance to form an external dictionary, and each word is checked for membership in that dictionary.
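The four feature groups above can be sketched as a per-token feature function. This is an illustrative sketch only: the function name, the feature keys, the "<PAD>" and "NONE" markers, and the `brown_paths` and `gazetteer` inputs are hypothetical, and the third spelling test is assumed here to be "contains a dot" because the character is not legible in the source. Training the CRF itself (done with CRF++ in the description) is not shown.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int,
                   brown_paths: Dict[str, str],
                   gazetteer: Set[str]) -> Dict[str, object]:
    """Build the feature dictionary for tokens[i]."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Spelling features.
        "init_upper": word[:1].isupper(),
        "has_underscore": "_" in word,
        "has_dot": "." in word,  # ASSUMPTION: the tested character is unknown
        # Brown-cluster bit string of the word (precomputed elsewhere).
        "brown": brown_paths.get(word, "NONE"),
        # External dictionary membership.
        "in_dict": word in gazetteer,
    }
    # Context window [-2, 2]: the word itself and two neighbours on each side.
    for offset in range(-2, 3):
        j = i + offset
        feats[f"w[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


tokens = ["call", "os.path.join", "in", "Python"]
brown_paths = {"Python": "0110", "os.path.join": "10"}  # toy bit strings
gazetteer = {"Python", "os.path.join"}                  # toy dictionary
feats = token_features(tokens, 1, brown_paths, gazetteer)
```

Brown clustering assigns each word a path in a binary tree, which is why the bit strings can differ in length from one class to another.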
Step 3: read and encode the text produced by entity identification. First, a single-layer convolutional neural network (CNN) is used to obtain sentence-level representation vectors; a recurrent neural network (RNN) is then used to construct a vector representation of the document. The CNN operates at the word level to obtain a sentence-level representation, which is then used as input to the RNN, so that the document-level representation is obtained in a hierarchical manner. The embedding dimensions of words, sentences and documents are set to 150, 300 and 750, respectively.
In the single-layer convolutional neural network, each convolution kernel computes a series of features using multiple feature maps; the number of feature maps is 300, matching the sentence dimension. Convolution kernels of different widths, from 1 to 7, are used to obtain different feature representation vectors of the sentence, and these vectors are summed to obtain the final sentence vector representation. The lower half of fig. 2 shows an example of the CNN structure. The word dimension is 5 and the illustrated sentence contains 6 words. The two colors represent two convolution kernels: the blue kernel has width 2 and the red kernel has width 3, and each of the two kernel widths has 6 feature maps. After pooling, each feature map corresponds to one dimension of the resulting vector, so the two convolution kernels yield two 6-dimensional vectors, which are summed to obtain the final sentence vector.
The recurrent neural network (RNN) is a single-layer long short-term memory (LSTM) network, which is used to address the vanishing gradient problem when training on long sequences.
Step 4: drawing on the idea of the attention mechanism, a recurrent neural network labels each sentence in sequence as to whether it can be regarded as key content. The labeling process can take into account whether sentences are independent of each other and whether their meanings are repeated. As shown in the upper right part of fig. 2, the labeling result of the next sentence depends not only on the current input but also on the labeling result of the previous sentence.
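The fig. 2 example (5-dimensional words, a 6-word sentence, kernel widths 2 and 3, 6 feature maps per width) can be reproduced with a few lines of plain Python. This is a sketch under simplifying assumptions: weights are random, there is no bias or non-linearity, and the sentence is assumed to be at least as long as the widest kernel; the helper names are hypothetical.

```python
import random
from typing import Dict, List

Vector = List[float]
Kernel = List[Vector]  # one weight row per word position in the window


def conv_max_pool(words: List[Vector], kernel: Kernel) -> float:
    """One feature map: slide the kernel over word windows, keep the maximum."""
    width = len(kernel)
    responses = []
    for start in range(len(words) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], words[start + k]))
        responses.append(total)
    return max(responses)


def encode_sentence(words: List[Vector],
                    kernels_by_width: Dict[int, List[Kernel]]) -> Vector:
    """Sum the pooled vectors of all kernel widths into one sentence vector."""
    n_maps = len(next(iter(kernels_by_width.values())))
    sentence = [0.0] * n_maps
    for kernels in kernels_by_width.values():
        pooled = [conv_max_pool(words, kernel) for kernel in kernels]
        sentence = [s + p for s, p in zip(sentence, pooled)]
    return sentence


rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> List[Vector]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


words = rand_matrix(6, 5)  # 6 words, word dimension 5
kernels = {
    2: [rand_matrix(2, 5) for _ in range(6)],  # width 2, 6 feature maps
    3: [rand_matrix(3, 5) for _ in range(6)],  # width 3, 6 feature maps
}
sentence_vec = encode_sentence(words, kernels)  # 6-dimensional
```

Because every width uses the same number of feature maps, the pooled vectors have equal length and can be summed. With the settings stated in the description, the same code would run with 150-dimensional words, widths 1 to 7 and 300 feature maps per width.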
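For reference, one step of a standard LSTM cell and its use as a document encoder can be sketched as below. The equations are the textbook LSTM formulation; the parameter layout (a dictionary with keys such as "Wi" and "bi") and the function names are assumptions made for this illustration, and no trained weights from the invention are implied.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM step; each weight matrix acts on the concatenation [x, h_prev]."""
    z = x + h_prev
    i = [sigmoid(v) for v in affine(p["Wi"], z, p["bi"])]    # input gate
    f = [sigmoid(v) for v in affine(p["Wf"], z, p["bf"])]    # forget gate
    o = [sigmoid(v) for v in affine(p["Wo"], z, p["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(p["Wg"], z, p["bg"])]  # candidate
    # Additive cell update: this path is what eases gradient flow over time.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def encode_document(sentences: List[Vector], p: Dict[str, list],
                    hidden: int) -> List[Vector]:
    """Run the LSTM over sentence vectors; the last state represents the document."""
    h, c = [0.0] * hidden, [0.0] * hidden
    states = []
    for s in sentences:
        h, c = lstm_step(s, h, c, p)
        states.append(h)
    return states


def zero_params(in_dim: int, hidden: int) -> Dict[str, list]:
    """All-zero parameters, used only to make the example deterministic."""
    p: Dict[str, list] = {}
    for gate in "ifog":
        p["W" + gate] = [[0.0] * (in_dim + hidden) for _ in range(hidden)]
        p["b" + gate] = [0.0] * hidden
    return p


params = zero_params(in_dim=3, hidden=2)
states = encode_document([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]], params, hidden=2)
h1, c1 = lstm_step([0.0], [0.0], [1.0], zero_params(1, 1))
```

With the dimensions given in the description, the inputs would be 300-dimensional sentence vectors and the hidden state would have 750 dimensions.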
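The dependence of each label on the previous label can be illustrated with a small recurrent labeling loop. This is a simplified stand-in, not the network of the invention: it uses a plain tanh recurrent layer rather than an LSTM, a hard 0/1 decision at a fixed threshold, and hand-picked weights; all parameter names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def label_sentences(sent_vecs: List[Vector], W_in: Matrix, W_rec: Matrix,
                    w_prev: Vector, w_out: Vector, bias: float,
                    threshold: float = 0.5) -> List[int]:
    """Label sentences in order; each step also sees the previous label."""
    hidden = [0.0] * len(W_rec)
    prev_label = 0.0
    labels = []
    for x in sent_vecs:
        new_hidden = []
        for j in range(len(hidden)):
            pre = (sum(w * v for w, v in zip(W_in[j], x))          # current input
                   + sum(w * h for w, h in zip(W_rec[j], hidden))  # recurrent state
                   + w_prev[j] * prev_label)                       # previous label
            new_hidden.append(math.tanh(pre))
        hidden = new_hidden
        score = sum(w * h for w, h in zip(w_out, hidden)) + bias
        prob = 1.0 / (1.0 + math.exp(-score))
        label = 1 if prob > threshold else 0
        labels.append(label)
        prev_label = float(label)
    return labels


# Three identical sentence vectors: a negative weight on the previous label
# suppresses a sentence that merely repeats one that was just selected.
same = [[1.0], [1.0], [1.0]]
with_feedback = label_sentences(same, [[2.0]], [[0.0]], [-4.0], [3.0], 0.0)
no_feedback = label_sentences(same, [[2.0]], [[0.0]], [0.0], [3.0], 0.0)
```

In the toy run, the feedback weight causes the second of three identical sentences to be skipped, which is the kind of redundancy handling the description attributes to the sequential labeling.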
Example 3: the invention also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method when executing the program.
Example 4: the invention also proposes a medium on which a computer program is stored which, when being executed by a processor, implements the steps of the method.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.
Claims (8)
1. A system for extracting question and answer content in a programming environment, characterized by comprising:
a data processing module for executing: preprocessing the input network question-answering text data, removing useless information and performing word segmentation;
an entity identification module for executing: performing entity identification in the field of software engineering on the text processed by the data processing module; the entity identification module specifically performs the following steps: an initial state; computing the spelling features of each word, including whether its initial letter is capitalized, whether it contains an underscore, and whether it contains a specified character; computing the context features of each word, specifically by using a window of [-2, 2] and adding the words in the window, namely the preceding and following words, as features; computing the bit-string features of each word, specifically by using large-scale unlabeled text from the software engineering field, clustering similarly distributed words into one class with a clustering method, and representing each class by a bit string, the bit strings being of different lengths, as a feature; computing the external dictionary features of each word, specifically by collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification with a CRF model trained using the CRF++ tool; and finishing entity identification;
a document reading module for performing: inputting the text recognized by the entity recognition module into a neural network for document reading;
the abstract extraction module is used for executing the following steps: and extracting key contents in the question-answer text by using another neural network.
2. The system for extracting question and answer content in a programming environment of claim 1, wherein the data processing module specifically performs the following steps: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation with the nltk tool; and finishing the data processing.
3. The system for extracting question and answer content in a programming environment of claim 1, wherein the document reading module specifically performs the following steps: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading; and the abstract extraction module specifically performs the following steps: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in sequence as to whether it can be included in the summary; and finishing abstract extraction.
4. The method for extracting the question and answer content in the programming environment is characterized by comprising the following steps:
the data processing step specifically comprises the following steps: preprocessing the input network question-answering text data, removing useless information and performing word segmentation;
the entity identification step specifically comprises the following steps: performing entity identification in the field of software engineering on the text processed by the data processing step; the entity identification step specifically comprises: an initial state; computing the spelling features of each word, including whether its initial letter is capitalized, whether it contains an underscore, and whether it contains a specified character; computing the context features of each word, specifically by using a window of [-2, 2] and adding the words in the window, namely the preceding and following words, as features; computing the bit-string features of each word, specifically by using large-scale unlabeled text from the software engineering field, clustering similarly distributed words into one class with a clustering method, and representing each class by a bit string, the bit strings being of different lengths, as a feature; computing the external dictionary features of each word, specifically by collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification with a CRF model trained using the CRF++ tool; and finishing entity identification;
the document reading step specifically comprises the following steps: inputting the text identified by the entity identification step into a neural network for document reading;
the abstract extraction step specifically comprises the following steps: and extracting key contents in the question-answer text by using another neural network.
5. The method for extracting question-and-answer content in a programming environment according to claim 4, wherein the data processing step specifically comprises: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation using the ntk tool; and finishing the data processing.
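The data processing sub-steps of claim 5 can be sketched as a chain of regular-expression substitutions. This is an illustration under stated assumptions, not the patented implementation: the claim only says each element is "processed", so removing it outright is an assumption; the patterns, the function name `preprocess`, and the emoticon set are hypothetical; and a simple regex tokenizer stands in for the word segmentation tool named in the claim.

```python
import html
import re

# All patterns below are illustrative assumptions.
CODE_RE = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-^]?[)(DPp](?!\w)")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def preprocess(text):
    """Strip non-prose elements from a Q&A post and split it into tokens."""
    text = CODE_RE.sub(" ", text)      # code segments
    text = TAG_RE.sub(" ", text)       # remaining HTML tags
    text = html.unescape(text)         # entities such as &amp;
    text = URL_RE.sub(" ", text)       # URLs
    text = EMOTICON_RE.sub(" ", text)  # simple ASCII emoticons
    text = MENTION_RE.sub(" ", text)   # "@" mentions
    return TOKEN_RE.findall(text)      # stand-in word segmentation
```

Code segments are handled before the generic tag pattern so that the contents of `<code>` and `<pre>` elements are dropped together with their tags rather than left behind as prose.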
6. The method for extracting question-and-answer content in a programming environment according to claim 4, wherein the document reading step specifically comprises: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; finishing the document reading; and the abstract extraction step specifically comprises: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label each sentence in sequence as to whether it can be regarded as part of the summary; finishing the abstract extraction.
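The data flow of claim 6 (convolution with max pooling per sentence, a recurrent network over sentences, then a second recurrent network that labels each sentence) can be sketched as an untrained forward pass in plain Python. The claim does not specify layer sizes, activation functions, the recurrent cell type, or how the attention idea enters the extractor, so every such choice below is an assumption: a tanh Elman cell, the last reader state as the document vector, and a sigmoid score over the extractor state concatenated with the document vector. All function and parameter names are hypothetical, and the weights are random, so the labels carry no meaning.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conv_max_pool(sentence, filters, width=2):
    """Single-layer convolution over word vectors with max-over-time pooling."""
    dim = len(sentence[0])
    # Zero-pad short sentences so that at least one window exists.
    rows = sentence + [[0.0] * dim] * max(0, width - len(sentence))
    windows = [sum(rows[i:i + width], []) for i in range(len(rows) - width + 1)]
    return [max(math.tanh(dot(f, w)) for w in windows) for f in filters]

def rnn(inputs, w_in, w_rec, h0=None):
    """Simple Elman recurrent network; returns the hidden state at each step."""
    h = list(h0) if h0 is not None else [0.0] * len(w_rec)
    states = []
    for x in inputs:
        h = [math.tanh(dot(wi, x) + dot(wr, h)) for wi, wr in zip(w_in, w_rec)]
        states.append(h)
    return states

def init_params(dim, n_filters, hidden, width=2, seed=0):
    """Random (untrained) weights, for shape illustration only."""
    rng = random.Random(seed)
    def mat(r, c):
        return [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
    return {
        "filters": mat(n_filters, width * dim),
        "read_in": mat(hidden, n_filters), "read_rec": mat(hidden, hidden),
        "ext_in": mat(hidden, n_filters), "ext_rec": mat(hidden, hidden),
        "out": mat(1, 2 * hidden)[0],
    }

def summarize(doc, params, threshold=0.5):
    """doc: list of sentences, each a list of word vectors."""
    # Document reading: sentence vectors, then a document-level vector.
    sent_vecs = [conv_max_pool(s, params["filters"]) for s in doc]
    doc_vec = rnn(sent_vecs, params["read_in"], params["read_rec"])[-1]
    # Extraction: a second RNN visits the sentences in order and scores each.
    ext = rnn(sent_vecs, params["ext_in"], params["ext_rec"], h0=doc_vec)
    probs = [1.0 / (1.0 + math.exp(-dot(params["out"], h + doc_vec))) for h in ext]
    return probs, [int(p > threshold) for p in probs]
```

With `params = init_params(dim=4, n_filters=3, hidden=5)` and a document of three sentences of 4-dimensional word vectors, `summarize` returns three probabilities and three 0/1 labels, one per sentence. A trained system would learn these weights from labeled summaries.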
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 4 to 6 when executing the program.
8. A medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any one of claims 4 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110449778.0A CN113076127B (en) | 2021-04-25 | 2021-04-25 | Method, system, electronic device and medium for extracting question and answer content in programming environment |
PCT/CN2021/089820 WO2022226714A1 (en) | 2021-04-25 | 2021-04-26 | Method and system for extracting question and answer content in programming environment, electronic device, and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110449778.0A CN113076127B (en) | 2021-04-25 | 2021-04-25 | Method, system, electronic device and medium for extracting question and answer content in programming environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113076127A (en) | 2021-07-06 |
CN113076127B (en) | 2023-08-29 |
Family
ID=76618820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110449778.0A Active CN113076127B (en) | 2021-04-25 | 2021-04-25 | Method, system, electronic device and medium for extracting question and answer content in programming environment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113076127B (en) |
WO (1) | WO2022226714A1 (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106776562A (en) * | 2016-12-20 | 2017-05-31 | 上海智臻智能网络科技股份有限公司 | A kind of keyword extracting method and extraction system |
CN109902175A (en) * | 2019-02-20 | 2019-06-18 | 上海方立数码科技有限公司 | A kind of file classification method and categorizing system based on neural network structure model |
CN110298037A (en) * | 2019-06-13 | 2019-10-01 | 同济大学 | The matched text recognition method of convolutional neural networks based on enhancing attention mechanism |
CN110390049A (en) * | 2019-07-10 | 2019-10-29 | 北京航空航天大学 | A kind of answer automatic generation method of software-oriented development problem |
CN111428012A (en) * | 2020-03-02 | 2020-07-17 | 平安科技(深圳)有限公司 | Intelligent question-answering method, device, equipment and storage medium based on attention mechanism |
CN111522965A (en) * | 2020-04-22 | 2020-08-11 | 重庆邮电大学 | Question-answering method and system for entity relationship extraction based on transfer learning |
CN111611361A (en) * | 2020-04-01 | 2020-09-01 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Intelligent reading, understanding, question answering system of extraction type machine |
CN111666752A (en) * | 2020-04-20 | 2020-09-15 | 中山大学 | Circuit teaching material entity relation extraction method based on keyword attention mechanism |
CN112149421A (en) * | 2020-09-23 | 2020-12-29 | 云南师范大学 | Software programming field entity identification method based on BERT embedding |
WO2020261234A1 (en) * | 2019-06-28 | 2020-12-30 | Tata Consultancy Services Limited | System and method for sequence labeling using hierarchical capsule based neural network |
CN112329465A (en) * | 2019-07-18 | 2021-02-05 | 株式会社理光 | Named entity identification method and device and computer readable storage medium |
CN112417854A (en) * | 2020-12-15 | 2021-02-26 | 北京信息科技大学 | Chinese document abstraction type abstract method |
CN115952263A (en) * | 2022-12-16 | 2023-04-11 | 桂林电子科技大学 | Question-answering method fusing machine reading understanding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6842167B2 (en) * | 2017-05-08 | 2021-03-17 | 国立研究開発法人情報通信研究機構 | Summary generator, summary generation method and computer program |
US11487954B2 (en) * | 2019-07-22 | 2022-11-01 | Capital One Services, Llc | Multi-turn dialogue response generation via mutual information maximization |
2021
- 2021-04-25 CN CN202110449778.0A patent/CN113076127B/en active Active
- 2021-04-26 WO PCT/CN2021/089820 patent/WO2022226714A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
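Multi-attention RNN relation linking method for knowledge base question answering; Li Huiying et al.; Journal of Southeast University; Vol. 36 (No. 4); 385-392 * |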
Also Published As
Publication number | Publication date |
---|---|
WO2022226714A1 (en) | 2022-11-03 |
CN113076127A (en) | 2021-07-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
CN107463607B (en) | Method for acquiring and organizing upper and lower relations of domain entities by combining word vectors and bootstrap learning | |
CN110134954B (en) | Named entity recognition method based on Attention mechanism | |
CN110851599B (en) | Automatic scoring method for Chinese composition and teaching assistance system | |
CN110750959A (en) | Text information processing method, model training method and related device | |
CN108345583B (en) | Event identification and classification method and device based on multilingual attention mechanism | |
CN106372061A (en) | Short text similarity calculation method based on semantics | |
CN111858944A (en) | Entity aspect level emotion analysis method based on attention mechanism | |
CN112966525B (en) | Law field event extraction method based on pre-training model and convolutional neural network algorithm | |
CN112800184B (en) | Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction | |
CN112101031B (en) | Entity identification method, terminal equipment and storage medium | |
CN111125367A (en) | Multi-character relation extraction method based on multi-level attention mechanism | |
US11170169B2 (en) | System and method for language-independent contextual embedding | |
CN117076653B (en) | Knowledge base question-answering method based on thinking chain and visual lifting context learning | |
CN113268576B (en) | Deep learning-based department semantic information extraction method and device | |
CN112364132A (en) | Similarity calculation model and system based on dependency syntax and method for building system | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN115034218A (en) | Chinese grammar error diagnosis method based on multi-stage training and editing level voting | |
Al Ghamdi | A novel approach to printed Arabic optical character recognition | |
CN113076127B (en) | Method, system, electronic device and medium for extracting question and answer content in programming environment | |
CN116595023A (en) | Address information updating method and device, electronic equipment and storage medium | |
CN113220864B (en) | Intelligent question-answering data processing system | |
CN116266268A (en) | Semantic analysis method and device based on contrast learning and semantic perception | |
CN112214511A (en) | API recommendation method based on WTP-WCD algorithm | |
CN110909547A (en) | Judicial entity identification method based on improved deep learning |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||