CN114398905A - Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device - Google Patents

Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device

Info

Publication number
CN114398905A
Authority
CN
China
Prior art keywords
solution
context
sentence
layer
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210002150.0A
Other languages
Chinese (zh)
Inventor
石琳
江子攸
王青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202210002150.0A
Publication of CN114398905A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G06F40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for automatically extracting crowd-sourcing-oriented problems and solutions, together with a corresponding storage medium and electronic device. The method is based on customized, enhanced natural language processing and deep learning techniques, and involves two basic tasks: 1) decoupling the dialogues in real-time chat logs, i.e., automatically decomposing chronologically ordered linear text into independent dialogues using data preprocessing and a feedforward neural network; 2) extracting problems and solutions with a new problem-solution prediction network comprising a sentence encoding layer, a context-dependent sentence encoding layer and an output layer, so as to construct a problem-solution knowledge base from the corpus. The invention does not require a complex rule set for extraction and enables fully automatic recommendation of problem-solution pairs. Experiments show that the crowd-sourcing model can promote knowledge sharing and improve problem-solving efficiency, thereby promoting chat-community-based software development.

Description

Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device
Technical Field
The invention belongs to the technical field of computers, and particularly relates to an automatic extraction method for crowd-sourcing-oriented problems and solutions, a corresponding storage medium and an electronic device.
Background
With the continuous development of online chat platforms, synchronous communication through real-time chat lets developers seek information and technical support, share opinions and ideas, and discuss problems in the development process more efficiently than asynchronous channels such as e-mail or forums. Real-time chat has therefore become an integral part of most software development processes: it brings globally distributed developers together into open source communities, and within software companies it facilitates internal team communication and coordination, particularly in adapting to the remote work brought about by the COVID-19 pandemic. Real-time chat platforms are used to resolve many kinds of software development problems, such as installation and setup, bug fixing, and building and compiling. Developers ask questions about specific problems and rely on others' answers for potential solutions.
Automated "problem-solution" extraction techniques have been studied extensively, for example the SVM-based Casper method, the rule-set-based DECA, the CNN-based CNC, and the context-classifier-based UIT. However, none of these methods addresses the following three challenges in mining real-time chat: (1) entangled dialogues: real-time chat data is voluminous, and multiple concurrent discussions of different problems are often interleaved; (2) expensive labor costs: chat logs typically contain a large number of informal conversations covering a wide range of technologies and complex topics; (3) noisy data: chat logs contain duplicate and unreadable messages that provide no valuable information. These challenges reduce the accuracy and efficiency of extraction, which makes the existing methods unsuitable for wide adoption in industry.
Disclosure of Invention
To address the above problems, the invention provides an automatic extraction technique for crowd-sourcing-oriented problems and solutions. It aims to automatically extract a large number of problem-solution pairs from complex community real-time chat text through natural language processing and information extraction, thereby expanding the knowledge base of difficult problems encountered during development and enabling solutions to be recommended automatically on online question-and-answer platforms based on historical experience.
The automatic extraction method for crowd-sourcing-oriented problems and solutions of the invention comprises the following steps:
decoupling conversations of the real-time chat logs, and decomposing linear texts arranged in time sequence into independent conversations;
and extracting the problems and the solutions from the decomposed conversation by using a new problem-solution prediction network, and constructing a problem and solution knowledge base in the corpus by using the extracted problems and solutions.
Further, the decoupling of the dialogs of the real-time chat log comprises the steps of data preprocessing through text analysis and splitting of the dialogs using a dialogue decoupling model.
Further, the data preprocessing comprises the following steps:
1) capturing linear text data from online platform text using a crawler, and collecting chat records of a certain duration through a chat platform that is divided by project and organized chronologically, such as Gitter;
2) tokenizing the conversation and replacing low-frequency words with specific symbols, so as to reduce interference;
3) replacing emoticons in the text with standard plain character strings;
4) calculating the coherence of adjacent sentences with the perplexity metric using Baidu AI Cloud, and merging adjacent sentences whose perplexity is lower than a set threshold (such as 40) into a new sentence.
Furthermore, the dialogue decoupling model is a linear feedforward neural network with 2 layers and 512-dimensional hidden layer vectors. This network gave the best test results on an online chat dialogue decoupling data set with a sample size of 77,563, reaching a precision of 74.9% and a recall of 79.7%.
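The shape of such a network can be illustrated with a minimal sketch in plain Python. Only the two hidden layers of width 512 come from the text; the input size `INPUT_DIM`, the idea that the input is a feature vector describing a pair of utterances, the ReLU activations (the text itself calls the network "linear"), the sigmoid output and the random, untrained weights are all assumptions made for illustration.

```python
import math
import random

HIDDEN_DIM = 512   # hidden width stated in the text
NUM_HIDDEN = 2     # number of hidden layers stated in the text
INPUT_DIM = 16     # hypothetical size of a pairwise utterance feature vector


def init_layer(rng, n_in, n_out):
    """Return (weights, biases) with small random weights (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Affine map W x + b."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def build_model(seed=0):
    rng = random.Random(seed)
    dims = [INPUT_DIM] + [HIDDEN_DIM] * NUM_HIDDEN
    hidden = [init_layer(rng, dims[i], dims[i + 1]) for i in range(NUM_HIDDEN)]
    output = init_layer(rng, HIDDEN_DIM, 1)
    return hidden, output


def link_probability(model, features):
    """Assumed output: probability that two utterances belong to the same dialogue."""
    hidden, output = model
    x = features
    for layer in hidden:
        x = relu(dense(layer, x))
    logit = dense(output, x)[0]
    return 1.0 / (1.0 + math.exp(-logit))


model = build_model()
p = link_probability(model, [0.5] * INPUT_DIM)
```

With trained weights, scores of this kind could be thresholded to decide which utterances are grouped into the same dialogue; the text does not specify that step.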
Further, the "problem-solution" prediction network comprises a sentence encoding layer, a context-dependent sentence encoding layer, and an output layer.
Further, the components of the sentence encoding layer include:
1) a BERT model used to encode sentences, pre-trained on a 2500M text corpus and fine-tuned on the decoupled dialogue data;
2) a triple used for context encoding, which gathers the sentence and its k adjacent context sentences on each side into a single window vector for subsequent dialogue encoding.
Further, the context-dependent sentence encoding layer uses three feature extractors to extract encodings that contain both the context information of the dialogue and the feature information of the sentence itself. The three feature extractors are:
1) a text feature extractor based on a convolutional network, which uses three convolution layers and max-pooling layers to reduce the dimensionality of the original sentence encoding while preserving sentence semantics;
2) an attribute-based heuristic feature extractor, which comprises heuristic feature encodings of keywords, structure, topic, sentiment and role, and is used to extract high-level semantic information from sentences;
3) a triple-based context feature extractor, which obtains weighted encodings using a local attention mechanism so as to capture the semantic information of the context.
Further, the output layer takes the concatenated text feature vector, heuristic feature vector and context feature vector, and uses two fully connected layers ($FC_1$, $FC_2$) to predict whether a sentence is a problem and whether it is a solution, respectively.
A storage medium having a computer program stored therein, wherein the computer program performs the above method.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the above method.
Compared with the prior art, the invention has the advantages that:
The invention automates the extraction of problems and solutions from open source community chat systems and makes it intelligent.
The method does not need complex rules for extraction, has cross-domain self-adaptation capability, and reduces the overhead of the problem-solution extraction algorithm.
Tests on the text data sets of eight major representative projects show that the designed "problem-solution" extraction algorithm achieves higher precision, recall and harmonic mean (F1).
The knowledge base constructed by the invention can cover most problems that may remain unsolved, which facilitates knowledge reuse and automatic solution recommendation.
By understanding online chat documents written in natural language, the invention separates independent dialogues from complex linear text, and uses shared text feature encoding, heuristic feature encoding and context feature encoding layers for both problem prediction and solution prediction. Building on semantic analysis and text mining, this simplifies the prediction task and locates "problem-solution" pairs more accurately. The automatic extraction algorithm is more robust to noisy data, reduces the cost of manual extraction, and achieves a higher F1 score. Because the model has been analyzed and used for recommendation on multiple projects, it also has higher industrial value.
Drawings
FIG. 1 shows a flow chart of dialogue decoupling in the model of the present invention.
FIG. 2 shows a hierarchical flow diagram of model prediction in accordance with the present invention.
FIG. 3 shows a flow chart of the application of the model of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
The invention provides a method for automatically extracting problem-solution pairs from open source communities and constructing a domain knowledge base. After semantic analysis and natural language processing, preprocessing techniques are used to construct dialogue samples from plain text. The problem and solution labels are then located by the shared multi-layer encoding and prediction model. Finally, all predicted "problem-solution" pairs are integrated to construct a complete question-answer knowledge base. The present invention is further illustrated by the following specific embodiments.
Fig. 1 is a flow chart of dialogue decoupling in the model of the present invention. It comprises four main steps: spell checking, low-frequency word replacement, abbreviation and emoticon replacement, and dialogue decoupling.
Step 1.1: spell checking is performed first, i.e., all text is checked and potentially misspelled words and incorrect tenses are replaced with standard vocabulary.
Step 1.2: low-frequency words of specific types that share uniform characteristics are replaced with specific placeholders, usually identifiers enclosed in "[ ]". Five common kinds of low-frequency words are: uniform resource locators ([URL]), e-mail addresses ([EMAIL]), web links ([HTML]), source code ([CODE]), and identity information ([ID]).
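As a rough illustration of Step 1.2, the following sketch replaces such tokens with the five placeholders using regular expressions. The patterns themselves are assumptions, not taken from the text: in particular, source code is taken to mean backtick-quoted spans, [HTML] is taken to mean markup tags, and identity information is taken to mean @-mentions.

```python
import re

# Hypothetical patterns. Order matters: code spans are replaced first so their
# contents are not rewritten, and e-mail addresses are replaced before @-mentions.
PLACEHOLDER_PATTERNS = [
    ("[CODE]", re.compile(r"`[^`]+`")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[URL]", re.compile(r"https?://\S+")),
    ("[HTML]", re.compile(r"</?[A-Za-z][^>]*>")),
    ("[ID]", re.compile(r"@\w+")),
]


def replace_low_frequency_tokens(text):
    """Replace each matched span with its bracketed placeholder."""
    for placeholder, pattern in PLACEHOLDER_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


example = replace_low_frequency_tokens(
    "@alice see https://example.com/doc and run `npm install`"
)
```

Here `example` becomes `[ID] see [URL] and run [CODE]`. A production rule set would need broader patterns (multi-line code blocks, bare domains, and so on).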
Step 1.3: commonly used abbreviations are expanded using a standard abbreviation list (e.g., IDK → I don't know), and special Unicode-encoded emoticons are replaced with standard ASCII characters (e.g.:
Figure BDA0003455197770000041
). Based on these replacements, a pure ASCII-encoded document can be constructed for training.
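Step 1.3 can be sketched as two dictionary lookups. Only the IDK expansion appears in the text; the other abbreviation entries and the emoticon mapping below are hypothetical placeholders, since the original emoticon example is an image.

```python
import re

# Only "idk" comes from the text; the remaining entries are hypothetical.
ABBREVIATIONS = {
    "idk": "I don't know",
    "imo": "in my opinion",
    "thx": "thanks",
}

# Hypothetical Unicode-to-ASCII emoticon mapping.
EMOTICONS = {
    "\U0001F642": ":)",   # slightly smiling face
    "\U0001F641": ":(",   # slightly frowning face
}

_WORD = re.compile(r"[A-Za-z]+")


def normalize(text):
    """Expand known abbreviations, then map emoticons to ASCII strings."""
    text = _WORD.sub(
        lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0)), text
    )
    for emoji, ascii_form in EMOTICONS.items():
        text = text.replace(emoji, ascii_form)
    return text


normalized = normalize("IDK how to fix it \U0001F641")
```

Here `normalized` is `I don't know how to fix it :(`.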
Step 1.4: first, the coherence of adjacent sentences is calculated with the perplexity metric using Baidu AI Cloud, and adjacent sentences whose perplexity is lower than 40 are merged into a new sentence. Second, a dialogue decoupling model $f$ based on a multilayer feedforward network decouples the original interleaved dialogue into a set of independent dialogues:
$$f: [u_1, u_2, \ldots, u_n] \rightarrow [D_1, D_2, \ldots, D_n],$$
$$D_i = \{u_{d_1}, u_{d_2}, \ldots, u_{d_i}\}$$
where $[u_1, u_2, \ldots, u_n]$ is the time-ordered sentence sequence of the original linear text and $D = [D_1, D_2, \ldots, D_n]$ is the list of decoupled dialogues. Each dialogue $D_i$ is composed of sentences from the original linear text, namely $u_{d_1}, u_{d_2}, \ldots, u_{d_i}$. The samples $D$ extracted in this way can be used as input to the model.
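The merging rule in Step 1.4 can be sketched as a single greedy pass. The scoring function is passed in as a parameter because the Baidu AI Cloud call is not described in the text; `toy_perplexity` below is a stand-in lookup table, not a language model, and applying the score to the joined text of the two neighbours is an assumption.

```python
def merge_adjacent(utterances, perplexity, threshold=40.0):
    """Merge an utterance into its predecessor when the joined text scores
    below the threshold (lower perplexity is read as more coherent)."""
    merged = []
    for utterance in utterances:
        if merged:
            candidate = merged[-1] + " " + utterance
            if perplexity(candidate) < threshold:
                merged[-1] = candidate
                continue
        merged.append(utterance)
    return merged


# Stand-in scorer for illustration only: one joined sentence is declared
# coherent, everything else receives a high score.
_TOY_SCORES = {"how do I install the plugin on windows?": 12.0}


def toy_perplexity(text):
    return _TOY_SCORES.get(text, 100.0)


result = merge_adjacent(
    ["how do I install", "the plugin on windows?", "try npm install"],
    toy_perplexity,
)
```

With this toy scorer the first two fragments are merged into one sentence and the third stays separate.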
FIG. 2 is a hierarchical flow chart of model prediction according to the present invention. It comprises two main parts: a problem prediction model and a solution extraction model. The two models share the same structure but have different parameters, and are used respectively to predict whether the current dialogue contains a problem and to extract the sentences corresponding to the solution.
Step 2.1: the dialogue $D_i$ is first divided into two parts: the head $H_i$, corresponding to the part containing the problem, and the body $B_i$, containing the solution to be extracted. The pair $D_i = \langle H_i, B_i \rangle$ represents the entire candidate dialogue. The head can therefore be input into the problem model to train problem prediction, and the body used to train solution prediction, which simplifies the training steps and reduces overhead.
Step 2.2.1: independent sentence encoding based on BERT. The "[CLS]" output is selected, which encodes the word sequence into an 800-dimensional sentence encoding. Based on this encoding, the model constructs a context window of size $2k+1$: a triple formed by the set of preceding-context sentence encodings $(u_{i-k}, \ldots, u_{i-1})$, the current sentence encoding, and the set of following-context sentence encodings $(u_{i+1}, \ldots, u_{i+k})$, used for the subsequent context-based encoding:
$$win_i = [u_{i-k}, \ldots, u_i, \ldots, u_{i+k}]$$
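The window construction can be sketched as follows. The text does not say what happens when the window runs past the start or end of a dialogue; padding with zero vectors is an assumption made here so that every window has exactly $2k+1$ entries.

```python
def context_window(encodings, i, k):
    """Return the triple (previous k encodings, current encoding, next k encodings)
    for sentence i, padding with zero vectors at the dialogue boundaries."""
    dim = len(encodings[0])
    pad = [0.0] * dim  # assumed boundary handling

    def at(j):
        return encodings[j] if 0 <= j < len(encodings) else pad

    before = [at(j) for j in range(i - k, i)]
    after = [at(j) for j in range(i + 1, i + k + 1)]
    return before, encodings[i], after


# Toy 1-dimensional "encodings" for a three-sentence dialogue.
toy_encodings = [[1.0], [2.0], [3.0]]
before, current, after = context_window(toy_encodings, 0, 2)
window = before + [current] + after  # flattened win_i with 2k + 1 entries
```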
Step 2.2.2: context-dependent sentence encoding, which is composed of three components: a text feature extractor, a heuristic feature extractor and a context feature extractor.
The text feature extractor uses a three-layer convolutional deep network model, which reduces the dimensionality of sentence features while preserving semantics. Given the convolution kernel
Figure BDA0003455197770000053
and the sentence embedding $x = u_i$, the constructed feature vector is:
$$\gamma_t = \mathrm{ReLU}(W \cdot x_{t:t+h-1} + b)$$
$$\gamma = [\gamma_1, \gamma_2, \ldots, \gamma_{n-h+1}]$$
where ReLU is the activation function, $W$ and $b$ are the convolution kernel parameters, $\gamma_t$ is the output feature map, $t$ is a position in the sentence encoding, $h$ is the size of the convolution kernel and encoding window, $n$ is the encoding length of a single sentence, $x_{t:t+h-1}$ is the sub-vector of length $h$ starting at position $t$ in the sentence embedding $x$, and $\gamma$ is the set of all feature maps output after sliding the window. The feature vector of each layer is output through a max-pooling layer:
Figure BDA0003455197770000051
The output dimensions of the three layers are 1024, 512 and 256 respectively, and the text feature extractor finally outputs a 256-dimensional text feature vector $\gamma$ as the dimension-reduced sentence encoding.
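The convolution formula can be checked on a toy input. This is a simplified sketch: it uses a single kernel over a scalar sequence and one global max-pooling step, whereas the network described above stacks three layers with many kernels to reach 1024, 512 and 256 dimensions. The input values, kernel and bias are made up.

```python
def conv1d_relu(x, kernel, bias):
    """gamma_t = ReLU(W . x_{t:t+h-1} + b) for every window of length h."""
    h = len(kernel)
    return [
        max(0.0, sum(w * v for w, v in zip(kernel, x[t:t + h])) + bias)
        for t in range(len(x) - h + 1)
    ]


def max_pool(feature_map):
    """Global max pooling over one feature map."""
    return max(feature_map)


x = [1.0, -2.0, 3.0, 0.5, -1.0]   # toy sentence encoding, n = 5
kernel = [0.5, 1.0, -0.5]         # toy kernel, h = 3
gamma = conv1d_relu(x, kernel, bias=0.1)
pooled = max_pool(gamma)
```

The feature map has $n - h + 1 = 3$ entries, approximately `[0.0, 1.85, 2.6]`: the first window sums to -2.9 and is clipped to zero by ReLU, and max pooling keeps 2.6.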
The heuristic feature extractor is attribute-based and comprises heuristic feature encodings of keywords, structure, topic, sentiment and role. It finally outputs a 29-dimensional heuristic feature encoding $\xi_i$, which is used to extract high-level semantic information from the sentence. The specific heuristic feature categories, variables, descriptions and examples are shown in Table 1.
TABLE 1
Figure BDA0003455197770000052
The context feature extractor combines a window mechanism with a local attention mechanism: weight vectors predict the weight relating a given sentence to each specific sentence in its window context, and the context-dependent sentence encoding is obtained as a weighted sum. The model constructs a triple in key-value form: $(h_Q, h_K, h_V) = W_{QKV} \cdot (u_i, u_s, u_s)$, where $h_Q$ is the attention query vector, $h_K$ is the key vector matched against the query, $h_V$ is the corresponding value vector, $W_{QKV}$ is the fully connected layer matrix that encodes Q, K and V, and $u_i$ and $u_s$ are the encoding of the current candidate sentence and the encoding of the sentence at a specific position in the window, satisfying $u_s \in win_i$, where $win_i$ is the window vector of size $2k+1$ defined above. The model constructs the attention weight of a specific position using dot-product similarity:
Figure BDA0003455197770000061
Figure BDA0003455197770000062
where $\mathrm{score}(h_Q, h_K)$ is the score vector of the key-based query attention weights, $s$ is the position of the current sentence, $i$ is the position of the context sentence whose local attention weight with $u_s$ is to be calculated, $k$ is the window size, half of which is used as $\sigma$ in the normal distribution, and $a_s$ is the local attention weight between $u_s$ and the sentence $u_i$ at the specific context position.
The weights of the specific positions are accumulated to obtain the final encoding vector:
Figure BDA0003455197770000063
where $d$ is the dimension of a single sentence vector within the window; a 128-dimensional context-dependent sentence vector is finally output. The vectors output by the three components are concatenated to obtain the complete context-dependent sentence encoding:
Figure BDA0003455197770000064
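Because the attention formulas survive only as images, the following is a sketch of one common form of local attention and should not be read as the patent's exact equations. It assumes scaled dot-product scores, a softmax over the window, and a Gaussian position penalty with $\sigma = k/2$. The learned projection $W_{QKV}$ is omitted, so queries, keys and values are the raw window vectors.

```python
import math


def local_attention(window, center, k):
    """Weights and context vector for the sentence at index `center` of a window.

    Assumed form: softmax(q . key / sqrt(d)) scaled by exp(-(pos - center)^2 / (2 sigma^2)),
    with sigma = k / 2. Requires k > 0.
    """
    query = window[center]
    d = len(query)
    sigma = k / 2.0
    scores = [sum(q * v for q, v in zip(query, key)) / math.sqrt(d) for key in window]
    peak = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - peak) for s in scores]
    total = sum(exp_scores)
    weights = [
        (e / total) * math.exp(-((pos - center) ** 2) / (2.0 * sigma ** 2))
        for pos, e in enumerate(exp_scores)
    ]
    # Weighted sum of the value vectors (here: the window vectors themselves).
    context = [sum(w * vec[j] for w, vec in zip(weights, window)) for j in range(d)]
    return weights, context


toy_window = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # 2k + 1 = 3 toy encodings
weights, context = local_attention(toy_window, center=1, k=1)
```

In this toy window the centre sentence receives the largest weight and its two identical neighbours receive equal, smaller weights.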
Step 2.2.3: fully connected prediction. The sentence encodings are input into the two models and passed through two fully connected layers, which judge respectively whether a sentence is a problem and whether it is a solution. The sentence encoding of a given head is
Figure BDA0003455197770000065
and the sentence encoding of the body is
Figure BDA0003455197770000066
The fully connected layers perform binary classification to predict the problem and extract the solution:
Figure BDA0003455197770000067
where $\rightarrow$ denotes a function mapping, FC denotes a fully connected layer, $I$ is the problem indicator of the head sentence $u_H$, $u_H$ is the head sentence of the split dialogue, $P(I \mid u_H)$ is the probability that the head sentence is predicted to be a problem, $S$ is the solution indicator of a body sentence, $u_B$ is the set of body sentences of the split dialogue, and $P(S \mid u_B)$ is the probability of being predicted as a solution for all body sentences.
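The concatenation and the two prediction heads can be sketched as follows. The three input sizes (256, 29 and 128) come from the text, giving a 413-dimensional concatenated encoding. Modelling each fully connected layer as a single linear unit with a sigmoid, the random untrained weights, and feeding the same toy vector to both heads are assumptions; in the described method the head sentence goes to the problem model and the body sentences go to the solution model.

```python
import math
import random

TEXT_DIM, HEURISTIC_DIM, CONTEXT_DIM = 256, 29, 128  # sizes stated in the text


def make_head(rng, n_in):
    """One fully connected unit: (weights, bias), randomly initialised."""
    scale = 1.0 / math.sqrt(n_in)
    return [rng.uniform(-scale, scale) for _ in range(n_in)], 0.0


def predict(head, features):
    """Sigmoid probability from one fully connected unit (assumed output form)."""
    weights, bias = head
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
text_vec = [rng.random() for _ in range(TEXT_DIM)]
heuristic_vec = [rng.random() for _ in range(HEURISTIC_DIM)]
context_vec = [rng.random() for _ in range(CONTEXT_DIM)]

# Concatenate the three feature vectors into one sentence encoding.
features = text_vec + heuristic_vec + context_vec

fc1 = make_head(rng, len(features))  # problem head
fc2 = make_head(rng, len(features))  # solution head
p_problem = predict(fc1, features)
p_solution = predict(fc2, features)
```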
To optimize the model, this step uses cross-entropy to measure the loss between the predicted probability and the true value, and trains the model accordingly:
$$\mathrm{Loss}_I = -y_H \cdot \log P(I \mid u_H),$$
Figure BDA0003455197770000068
where $\mathrm{Loss}_I$ is the loss function of problem prediction, $y_H$ is the true label for problem prediction, $\mathrm{Loss}_S$ is the loss function of solution prediction, $y_i$ is the true solution label of the $i$-th body sentence, $u_{B_i}$ is the $i$-th body sentence, and $B$ is the body indicator.
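A small numeric sketch of the two losses follows. `problem_loss` implements the problem loss exactly as written above. The solution loss formula is available only as an image, so `solution_loss` assumes the analogous form summed over the body sentences; that form is an assumption, not a reconstruction.

```python
import math


def problem_loss(y_head, p_problem):
    """Loss_I = -y_H * log P(I | u_H), as written in the text."""
    return -y_head * math.log(p_problem)


def solution_loss(labels, probabilities):
    """Assumed form: -sum_i y_i * log P(S | u_Bi) over the body sentences."""
    return -sum(y * math.log(p) for y, p in zip(labels, probabilities))


loss_i = problem_loss(1, 0.5)                              # log 2
loss_s = solution_loss([1, 0, 1], [0.5, 0.2, 0.25])        # log 2 + log 4 = log 8
```

Only sentences with a positive label contribute in this form: the second body sentence (label 0) adds nothing to `loss_s`.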
FIG. 3 is a flow chart of the application of the model of the present invention, which combines the problem model and the solution model after training has converged.
Step 3.1: dialogue decoupling. A real-time chat log is input into the crowd-sourcing model, and structured dialogue samples are obtained through dialogue decoupling.
Step 3.2: model prediction. The records of the existing samples are input one at a time; after the head and the body are separated, the problem model detects whether the head is a problem arising in the development process. If not, the current record is discarded and the next record is selected; otherwise, the body of the current record is extracted and input into the solution model, which detects the sentences that describe a solution.
Step 3.3: integration and archiving. The set of dialogues predicted to contain a problem is extracted, and the predicted sentences are combined and stored in the candidate question-answer knowledge base. Specific examples of the resulting "problem-solution" knowledge base and the recommendation strategy are shown in Table 2.
TABLE 2
Figure BDA0003455197770000071
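The application flow of steps 3.2 and 3.3 can be sketched as a short loop. The two predictors are passed in as functions; the toy predicates below are stand-ins for the trained problem and solution models, and treating the first utterance of each dialogue as the head is an assumption, since the text does not state how the head is delimited.

```python
def build_knowledge_base(dialogs, is_problem, is_solution):
    """Return a list of {'problem', 'solution'} records from decoupled dialogues."""
    knowledge_base = []
    for dialog in dialogs:
        if not dialog:
            continue
        head, body = dialog[0], dialog[1:]  # assumed head/body split
        if not is_problem(head):
            continue  # discard this record and move on to the next one
        solution_sentences = [s for s in body if is_solution(s)]
        knowledge_base.append(
            {"problem": head, "solution": " ".join(solution_sentences)}
        )
    return knowledge_base


# Toy stand-ins for the trained models (illustration only).
def toy_is_problem(sentence):
    return sentence.endswith("?")


def toy_is_solution(sentence):
    return sentence.lower().startswith("try")


dialogs = [
    ["Why does the build fail on Windows?", "Which version?", "Try clearing the cache."],
    ["Good morning all", "Morning!"],
]
kb = build_knowledge_base(dialogs, toy_is_problem, toy_is_solution)
```

The second dialogue is discarded because its head is not detected as a problem, so `kb` holds a single record.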
The present invention evaluated F1 values for the extraction of 171 "problem-solution" pairs against multiple baselines and across multiple projects, and found it to be more than 30% above the baselines in problem detection and more than 20% above them in solution extraction, with relatively high accuracy and stability. In addition, 30K "problem-solution" pairs were released from 11 other community projects, demonstrating that the crowd-sourcing model can promote knowledge sharing and improve problem-solving efficiency, thereby promoting chat-community-based software development.
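For reference, the F1 value is the harmonic mean of precision and recall. The sketch below applies the formula to the two figures reported earlier for the dialogue decoupling model, reading the 74.9% figure as precision (an assumption) alongside the 79.7% recall. The resulting number is derived here for illustration and is not stated in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.749, 0.797)  # roughly 0.772
```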
Another embodiment of the present invention provides a storage medium having a computer program stored therein, the computer program performing the method of the present invention.
Another embodiment of the present invention provides an electronic device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method of the present invention.
Other embodiments of the invention:
1) to address the deviation in problem position in the dialogue data handled by the context feature extractor, a Graph Attention Network (GAT) can be used to extract problems and solutions more accurately;
2) to address the iterative updating of "problem-solution" pairs that may be caused by changes in project version information, temporal features such as the open source project version can be added to the heuristic features in Table 1;
3) the existing problem prediction and solution extraction models may encounter multi-stage problems (for example, the solution $S_1$ adopted for problem $I_1$ may cause a new problem $I_2$, which requires $S_2$ before $I_1$ is fully solved, so that two "problem-solution" knowledge pairs may be output: $\langle I_1, [\text{step1}: S_1; \text{step2}: S_2] \rangle$ and $\langle I_2, S_2 \rangle$); in this case an extraction method combining a neural network with rules can be adopted to construct a more complete knowledge base;
4) to address extracted solution sentences that do not read smoothly, extractive summarization combined with connective word prediction can be adopted to construct higher-quality solutions;
5) to address the time-consuming manual recommendation of the "problem-solution" pairs in Table 2, an intelligent recommendation algorithm can be established; and since a single problem may have several possible solutions, knowledge base de-duplication and a solution confidence ranking algorithm can be added to automatically recommend several possible solutions to unsolved StackOverflow questions, ranked by confidence.
The particular embodiments of the present invention disclosed above are illustrative only and are not intended to be limiting, since various alternatives, modifications, and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The invention should not be limited to the disclosure of the embodiments in the present specification, but the scope of the invention is defined by the appended claims.

Claims (10)

1. A crowd-sourcing-oriented problem and solution automatic extraction method is characterized by comprising the following steps:
decoupling conversations of the real-time chat logs, and decomposing linear texts arranged in time sequence into independent conversations;
and adopting a problem-solution prediction network, extracting problems and solutions from the decomposed conversation, and constructing a problem and solution knowledge base by using the extracted problems and solutions.
2. The method of claim 1, wherein decoupling conversations of the real-time chat log comprises preprocessing data by text analysis and splitting conversations using a conversation decoupling model.
3. The method of claim 1, wherein the data preprocessing comprises:
1) capturing linear text data from online platform text using a crawler, and collecting chat records of a certain duration through a chat platform;
2) tokenizing the conversation and replacing low-frequency words with specific symbols, so as to reduce interference;
3) replacing emoticons in the text with standard plain character strings;
4) calculating the coherence of adjacent sentences with the perplexity metric using Baidu AI Cloud, and merging adjacent sentences whose perplexity is lower than a set threshold into a new sentence.
4. The method of claim 1, wherein the dialogue decoupling model employs a linear feedforward neural network with 2 layers and 512-dimensional hidden layer vectors.
5. The method of claim 1, wherein the problem-solution prediction network comprises a sentence encoding layer, a context-dependent sentence encoding layer, and an output layer.
6. The method of claim 5, wherein the sentence encoding layer comprises:
1) a BERT model for encoding sentences, the model being pre-trained on text and fine-tuned on the decoupled dialogue data;
2) a triple used for context encoding, which gathers the sentence and its k adjacent context sentences on each side into a single window vector for subsequent dialogue encoding.
7. The method of claim 5, wherein the context-dependent sentence encoding layer uses three feature extractors to extract encodings containing the context information of the dialogue and the feature information of the sentence itself, the three feature extractors comprising:
1) a text feature extractor based on a convolutional network, which uses three convolution layers and max-pooling layers to reduce the dimensionality of the original sentence encoding while preserving sentence semantics;
2) an attribute-based heuristic feature extractor, which comprises heuristic feature encodings of keywords, structure, topic, sentiment and role, and is used to extract high-level semantic information from sentences;
3) a triple-based context feature extractor, which obtains weighted encodings using a local attention mechanism so as to capture the semantic information of the context.
8. The method of claim 5, wherein the output layer takes the concatenated text feature vector, heuristic feature vector and context feature vector, and uses two fully connected layers to predict whether a sentence is a problem and whether it is a solution, respectively.
9. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program performs the method of any of claims 1-8.
10. An electronic device, comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the method of any of claims 1-8.
CN202210002150.0A 2022-01-04 2022-01-04 Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device Pending CN114398905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210002150.0A CN114398905A (en) 2022-01-04 2022-01-04 Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210002150.0A CN114398905A (en) 2022-01-04 2022-01-04 Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN114398905A true CN114398905A (en) 2022-04-26

Family

ID=81229274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210002150.0A Pending CN114398905A (en) 2022-01-04 2022-01-04 Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN114398905A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759113A (en) * 2022-11-08 2023-03-07 贝壳找房(北京)科技有限公司 Method and device for recognizing sentence semantics in dialog information
CN115759113B (en) * 2022-11-08 2023-11-03 贝壳找房(北京)科技有限公司 Method and device for identifying sentence semantics in dialogue information

Similar Documents

Publication Publication Date Title
Lin et al. Traceability transformed: Generating more accurate links with pre-trained bert models
Arora et al. Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis
Onan SRL-ACO: A text augmentation framework based on semantic role labeling and ant colony optimization
CN116775847A (en) Question answering method and system based on knowledge graph and large language model
Abdel-Nabi et al. Deep learning-based question answering: a survey
Liu et al. Open intent discovery through unsupervised semantic clustering and dependency parsing
CN117149974A (en) Knowledge graph question-answering method for sub-graph retrieval optimization
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN118093834A (en) AIGC large model-based language processing question-answering system and method
CN116010553A (en) Viewpoint retrieval system based on two-way coding and accurate matching signals
CN116861269A (en) Multi-source heterogeneous data fusion and analysis method in engineering field
Han et al. A-BPS: automatic business process discovery service using ordered neurons LSTM
CN112989803B (en) Entity link prediction method based on topic vector learning
CN114372454B (en) Text information extraction method, model training method, device and storage medium
CN114398905A (en) Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device
CN117350271A (en) AI content generation method and service cloud platform based on large language model
Shahade et al. Deep learning approach-based hybrid fine-tuned Smith algorithm with Adam optimiser for multilingual opinion mining
Olivero Figurative Language Understanding based on Large Language Models
CN113157892A (en) User intention processing method and device, computer equipment and storage medium
CN117971990B (en) Entity relation extraction method based on relation perception
Lv et al. A Code Completion Approach Based on Abstract Syntax Tree Splitting and Tree-LSTM
Li et al. Hierarchical Information Fusion Graph Neural Networks for Chinese Implicit Rhetorical Questions Recognition
Guo Graformer: A user alignment method based on joint embedding of user attributes and network structure
CN116257629A (en) Dialogue decoupling method based on user intention and mutual learning technology, corresponding storage medium and electronic device
Wang et al. Aspect-level Sentiment Analysis based on Prompt Templates and External Knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination