CN110825930A - Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence - Google Patents

Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence

Info

Publication number
CN110825930A
CN110825930A CN201911058818.8A CN201911058818A CN110825930A CN 110825930 A CN110825930 A CN 110825930A CN 201911058818 A CN201911058818 A CN 201911058818A CN 110825930 A CN110825930 A CN 110825930A
Authority
CN
China
Prior art keywords
answer
question
answers
similarity
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911058818.8A
Other languages
Chinese (zh)
Inventor
孙海峰
王晶
戚琦
王敬宇
郭令奇
马兵
杜纯宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201911058818.8A
Publication of CN110825930A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/9032: Query formulation
    • G06F16/90332: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence comprises the following operation steps: (1) establishing a data set; (2) extracting information features of the text pairs using a deep learning method; (3) extracting other features of the question and the answer using rules, and concatenating them with the features obtained in step (2) into a feature vector whose format is [BERT prediction probability, similarity between the current answer and the excellent answer, similarity between the answer and the question, day difference]; (4) training a machine learning classification model and predicting new posts. The method can quickly and accurately identify the answers under a post that are likely to be correct, saving time and labor.

Description

Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence
Technical Field
The invention relates to a method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence. It belongs to the technical field of natural language processing, and in particular to artificial-intelligence-based question answering for forums.
Background
With the emergence of numerous community forums, the tasks associated with them have become increasingly important. Many new questions are posted to these forums every day, and most of the replies to them contain some errors, which can mislead other readers. Identifying these erroneous replies manually requires fairly authoritative experts in the relevant fields and is also time-consuming and laborious. A fast and effective way to judge whether an answer to a new question actually helps to resolve it is therefore needed to cope with this growing problem.
Artificial intelligence and natural language processing technologies have advanced greatly in recent years; how to use them to judge answer quality is a technical problem that urgently needs to be solved.
Disclosure of Invention
In view of the above, the object of the present invention is to provide a method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence, so as to evaluate the answers in a question-and-answer post and select the excellent answers for others to refer to.
In order to achieve the above object, the present invention provides a method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence, which comprises the following operation steps:
(1) establishing a data set, specifically: crawling a large number of question-and-answer posts using crawler software; after crawling, storing the contents of the posts as text pairs, each consisting of a question and a single answer; then cleaning the stored data and labeling it manually to establish the data set;
(2) extracting information features of the text pairs using a deep learning method, specifically: training a deep learning model using the data set obtained in step (1) as the training set, and then using the model to extract features of the text pairs such as tone, keywords, and grammatical structure;
(3) extracting other features of the questions and answers using rules, specifically: calculating the difference in days between the posting of the question and the posting of the answer, calculating the similarity between a single answer and the current question using TF-IDF, and calculating the similarity between the single answer and the other answers to the current question using TF-IDF, among other features; and concatenating these features with the features obtained in step (2) into a feature vector;
(4) training a machine learning classification model and predicting new posts, specifically: training a machine learning classification model using the feature vectors obtained in step (3); after training, predicting a new post by crawling and storing all of its contents with a crawler, extracting features and composing feature vectors according to steps (2) and (3), predicting with the machine learning classification model, and selecting the n answers with the highest probability, where n is a natural number not greater than the total number of answers.
Step (1) specifically comprises the following operation steps:
(11) crawling the information of a website using a crawler and storing information such as the question post, the answers, the asking user, the answering users, and the posting times; data may also be obtained from other similar data sets and merged;
(12) traversing the data and filling NULL attributes, unifying the maximum text length, and cleaning out interfering data;
(13) storing the data obtained in the previous step as text pairs, each consisting of a question and a single answer, and labeling them manually.
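The text-pair storage of step (13) might be sketched as follows. This is a minimal illustration only: the `TextPair` record, the `thread_to_pairs` helper, and the field names of the crawled thread are assumptions introduced here and are not specified in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TextPair:
    """One stored record: a question paired with a single answer."""
    question: str
    answer: str
    asked_at: datetime
    answered_at: datetime
    label: Optional[int] = None  # filled in later by manual annotation


def thread_to_pairs(thread: dict) -> List[TextPair]:
    """Flatten one crawled thread into one TextPair per answer."""
    return [
        TextPair(
            question=thread["question"],
            answer=reply["text"],
            asked_at=thread["asked_at"],
            answered_at=reply["posted_at"],
        )
        for reply in thread["answers"]
    ]


# Hypothetical crawled thread; the field names are illustrative only.
thread = {
    "question": "How many months is a visa valid for?",
    "asked_at": datetime(2019, 11, 1),
    "answers": [
        {"text": "3 months", "posted_at": datetime(2019, 11, 1)},
        {"text": "About two months, or three months", "posted_at": datetime(2019, 11, 3)},
    ],
}
pairs = thread_to_pairs(thread)
```

Keeping the posting times on each pair is a design choice made for this sketch, so that a later day-difference feature can be computed without going back to the raw crawl.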
Step (2) specifically comprises the following operation steps:
(21) fine-tuning a BERT model on the data obtained in step (1); the BERT model applies token embedding, segment embedding, and position embedding to the input text; after fine-tuning is completed, the fine-tuned model is saved;
(22) summing the vectors of the three embedding layers obtained in step (21) and then classifying, to obtain a classification result for each pair of a single question and a single answer; the classification result reflects text features, such as tone and keywords, that the BERT model has learned from the text.
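The summation of the three embeddings mentioned in step (22) can be illustrated with a toy example. The sketch below only shows the element-wise sum of token, segment, and position vectors for a question/answer pair; the tiny two-dimensional tables and the `embed_inputs` helper are assumptions made for illustration, and the Transformer layers and the classifier of a real BERT model are omitted.

```python
def embed_inputs(token_ids, segment_ids, tok_table, seg_table, pos_table):
    """Return one vector per position: token + segment + position embedding."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        # Element-wise sum of the three embedding vectors for this position.
        vec = [t + s + p for t, s, p in zip(tok_table[tok], seg_table[seg], pos_table[pos])]
        vectors.append(vec)
    return vectors


# Toy embedding tables (illustrative values, not learned weights).
tok_table = {0: [0.1, 0.0], 1: [0.0, 0.2], 2: [0.3, 0.3]}  # 0=[CLS], 1=[SEP], 2=a word
seg_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}                  # 0=question, 1=answer
pos_table = [[0.01 * i, 0.02 * i] for i in range(8)]

# Layout: [CLS] question-word [SEP] answer-word [SEP]
token_ids = [0, 2, 1, 2, 1]
segment_ids = [0, 0, 0, 1, 1]
vectors = embed_inputs(token_ids, segment_ids, tok_table, seg_table, pos_table)
```

The segment ids are what let the model tell the question apart from the answer inside a single input sequence.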
Step (3) specifically comprises the following operation steps:
(31) reading the posting times of the current question and of its answer from the data set and calculating the day difference, i.e., day difference = question time - answer time; and calculating the similarity between the single answer and the question using the TF-IDF (term frequency-inverse document frequency) algorithm;
(32) according to the classification results of all answers obtained in step (2), calculating the similarity between each answer and the answer to the current question that has the highest probability, the similarity being calculated using the TF-IDF algorithm, and the answer with the highest probability being regarded as the excellent answer;
(33) concatenating the day-difference feature, the similarity features, and the feature value obtained in step (2) into a feature vector whose format is [BERT prediction probability, similarity between the current answer and the excellent answer, similarity between the answer and the question, day difference].
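A TF-IDF similarity of the kind used in steps (31) and (32) could be computed as in the following sketch. Several choices here are assumptions not stated in the source: whitespace tokenization (Chinese text would need a word segmenter), a smoothed IDF, cosine similarity as the comparison, and fitting the IDF on only the question and its answers.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumption for this sketch)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


question = "how many months is a visa valid"
answers = ["a visa is valid for 3 months", "happy holidays everyone"]
vecs = tfidf_vectors([question] + answers)
sims = [cosine(vecs[0], v) for v in vecs[1:]]  # answer-to-question similarity
```

The same `cosine` call applied to an answer vector and the excellent answer's vector would give the second similarity feature.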
Step (4) specifically comprises the following operation steps:
(41) selecting an SVM model as the machine learning classification model and training it on the feature vectors obtained in step (3);
(42) obtaining the relevant information of the target post, including but not limited to the question content, the answer contents, and the posting times, and storing each question and single answer as a text pair in the storage format of step (1);
(43) running the BERT model fine-tuned in step (2) on the text data obtained in the previous step, calculating features such as the day difference and the similarities according to the method of step (3), and combining them into feature vectors that have the same format as those formed in step (3), the number of feature vectors being equal to the number of answers;
(44) predicting on the feature vectors using the machine learning classification model trained in step (41), and outputting the n answers with the highest probability for the user to refer to, where n is a natural number not greater than the total number of answers.
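The final selection in step (44) amounts to ranking the answers by predicted probability and keeping the first n. A minimal sketch follows; the `top_n_answers` helper and the probability values are hypothetical, and the probabilities are assumed to come from the trained classifier.

```python
def top_n_answers(answers, probabilities, n):
    """Return the n (answer, probability) pairs with the highest probability."""
    if len(answers) != len(probabilities):
        raise ValueError("one probability is required per answer")
    n = max(0, min(n, len(answers)))  # n may not exceed the number of answers
    ranked = sorted(zip(answers, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


answers = [
    "About two months, or three months",
    "3 months",
    "Multiple entries and exits are allowed within 3 months.",
]
probabilities = [0.20, 0.91, 0.85]  # illustrative values only
best_two = top_n_answers(answers, probabilities, 2)
```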
The beneficial effects of the invention are as follows: the method is not limited to the text of the question post and its replies, but also considers information beyond the text, such as the user name, the time difference between the question and the reply, and the similarity to other answers, and trains the model on these multi-dimensional features, so that the accuracy of the model is higher. The method can quickly and accurately identify the answers under a post that are likely to be correct, which saves time and labor and reduces the extent to which wrong answers mislead other people.
Drawings
FIG. 1 is a flow chart of the method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
Referring to FIG. 1, the method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence is described. The method comprises the following operation steps:
(1) establishing a data set, specifically: crawling a large number of question-and-answer posts using crawler software; after crawling, storing the contents of the posts as text pairs, each consisting of a question and a single answer, the storage format of the text pairs being shown in Table 1; data may also be obtained from other data sets; then cleaning the stored data and labeling it manually to establish the data set;
TABLE 1
Question | Answer
How many months is a visa valid for? | 3 months
How many months is a visa valid for? | Multiple entries and exits are allowed within 3 months.
How many months is a visa valid for? | About two months, or three months
(2) extracting information features of the text pairs using a deep learning method, specifically: training a deep learning model using the data set obtained in step (1) as the training set, and then using the model to extract features of the text pairs such as tone, keywords, and grammatical structure;
(3) extracting other features of the questions and answers using rules, specifically: calculating the difference in days between the posting of the question and the posting of the answer, calculating the similarity between a single answer and the current question using TF-IDF, and calculating the similarity between the single answer and the other answers to the current question using TF-IDF, among other features; and concatenating these features with the features obtained in step (2) into a feature vector;
(4) training a machine learning classification model and predicting new posts, specifically: training a machine learning classification model using the feature vectors obtained in step (3); after training, predicting a new post by crawling and storing all of its contents with a crawler, extracting features and composing feature vectors according to steps (2) and (3), predicting with the machine learning classification model, and selecting the n answers with the highest probability, where n is a natural number not greater than the total number of answers.
Step (1) specifically comprises the following operation steps:
(11) crawling the information of a website using a crawler and storing information such as the question post, the answers, the asking user, the answering users, and the posting times; data may also be obtained from other similar data sets whose main content is forum help-seeking posts, for example the data set of Task 8 of SemEval-2019, and merged with the crawled data;
(12) traversing the data and filling NULL attributes, unifying the maximum text length, and cleaning out interfering data using rules; for example, irrelevant posts such as discussion posts and announcement posts are found mainly by checking whether a post contains keywords such as 'happy holiday', 'water post', and 'discussion';
TABLE 2
[Table image BDA0002257306330000041 is not reproduced in this text.]
(13) storing the data obtained in the previous step as text pairs, each consisting of a question and a single answer, and labeling them manually according to the following formula:
a = \begin{cases} 1, & \text{if the answer is correct} \\ 0, & \text{if the answer is wrong} \\ 2, & \text{if the answer is itself a question} \end{cases}
In the above formula, a denotes the label of a text pair: it is "1" if the answer is correct, "0" if the answer is wrong, and "2" if the answer is a question.
Referring to table 1, data is stored in a file for reading in the form of text pairs of questions and individual answers. Referring to the data example shown in table 2, each row in table 2 represents a single text pair, with the first column being a question and the second column being an answer to the question. There may be zero, one, or more correct answers to a post. In this example, the first and second answers are correct and the third answer is wrong.
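The rule-based cleaning of step (12) might look like the sketch below. The keyword list uses English stand-ins for the keywords named above, and the `clean_posts` helper, the record fields, and the character-based length limit are assumptions made for illustration rather than details given in the source.

```python
# English stand-ins for the noise keywords; a real system would use the forum's own language.
NOISE_KEYWORDS = ("happy holiday", "water post", "discussion")


def clean_posts(posts, max_len=200):
    """Drop irrelevant posts, fill NULL attributes, and cap the text length."""
    cleaned = []
    for post in posts:
        title = post.get("question") or ""
        # Rule: skip posts whose question contains a noise keyword.
        if any(keyword in title.lower() for keyword in NOISE_KEYWORDS):
            continue
        # Fill NULL (None) attributes with empty strings.
        record = {key: ("" if value is None else value) for key, value in post.items()}
        # Unify the maximum text length (here measured in characters).
        record["question"] = title[:max_len]
        record["answer"] = record.get("answer", "")[:max_len]
        cleaned.append(record)
    return cleaned


posts = [
    {"question": "How many months is a visa valid for?", "answer": "3 months", "user": None},
    {"question": "Happy holiday to everyone!", "answer": "Same to you", "user": "a"},
    {"question": "General discussion thread", "answer": None, "user": "b"},
]
cleaned = clean_posts(posts)
```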
Step (2) specifically comprises the following operation steps:
(21) fine-tuning a BERT model on the data obtained in step (1); the BERT model applies token embedding, segment embedding, and position embedding to the input text; after fine-tuning is completed, the fine-tuned model is saved. BERT stands for Bidirectional Encoder Representations from Transformers; see the paper: Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805;
(22) summing the vectors of the three embedding layers obtained in step (21) and then classifying, to obtain a classification result for each pair of a single question and a single answer; the classification result reflects text features, such as tone and keywords, that the BERT model has learned from the text.
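The classification result of step (22) later appears in the feature vector as a "BERT prediction probability". One common way to obtain such a probability, assumed here and not stated in the source, is to apply a softmax to the classifier's raw output scores (logits) and read off the entry for the label "1" (correct answer). The logits below are made-up values.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def correct_probability(logits, correct_label=1):
    """Probability assigned to the 'correct answer' label."""
    return softmax(logits)[correct_label]


# Illustrative logits for the labels 0 (wrong), 1 (correct), 2 (answer is a question).
logits = [0.0, 2.0, -1.0]
prob_correct = correct_probability(logits)
```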
Step (3) specifically comprises the following operation steps:
(31) reading the posting times of the current question and of its answer from the data set and calculating the day difference, i.e., day difference = question time - answer time; and calculating the similarity between the single answer and the question using the TF-IDF (term frequency-inverse document frequency) algorithm;
(32) according to the classification results of all answers obtained in step (2), calculating the similarity between each answer and the answer to the current question that has the highest probability, the similarity being calculated using the TF-IDF algorithm, and the answer with the highest probability being regarded as the excellent answer;
(33) concatenating the day-difference feature, the similarity features, and the feature value obtained in step (2) into a feature vector whose format is [BERT prediction probability, similarity between the current answer and the excellent answer, similarity between the answer and the question, day difference].
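Steps (31) to (33) can be combined into a small routine that produces one feature vector per answer. In the sketch below, the similarity function is passed in as a parameter; the `token_overlap` function is only a simple stand-in (Jaccard overlap) used to keep the example short and is not the TF-IDF similarity the method specifies. All names, dates, and probabilities are hypothetical.

```python
from datetime import date


def day_difference(question_date, answer_date):
    """Day difference as defined in step (31): question time minus answer time."""
    return (question_date - answer_date).days


def build_feature_vectors(question, answers, bert_probs, question_date, answer_dates, similarity):
    """Return [BERT prob, sim. to excellent answer, sim. to question, day difference] per answer."""
    if not answers:
        return []
    # The answer with the highest BERT probability is treated as the excellent answer.
    best = max(range(len(answers)), key=lambda i: bert_probs[i])
    excellent = answers[best]
    rows = []
    for ans, prob, ans_date in zip(answers, bert_probs, answer_dates):
        rows.append([
            prob,
            similarity(ans, excellent),
            similarity(ans, question),
            day_difference(question_date, ans_date),
        ])
    return rows


def token_overlap(a, b):
    """Stand-in similarity (Jaccard overlap of tokens), NOT the TF-IDF of the method."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


question = "how long is a visa valid"
answers = ["a visa is valid for 3 months", "about two or three months", "no idea sorry"]
bert_probs = [0.91, 0.62, 0.08]  # illustrative values only
answer_dates = [date(2019, 11, 1), date(2019, 11, 2), date(2019, 11, 5)]
rows = build_feature_vectors(
    question, answers, bert_probs, date(2019, 11, 1), answer_dates, token_overlap
)
```

With the definition "question time minus answer time", an answer posted after the question yields a negative day difference.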
Step (4) specifically comprises the following operation steps:
(41) selecting an SVM model as the machine learning classification model and training it on the feature vectors obtained in step (3);
(42) obtaining the relevant information of the target post, including but not limited to the question content, the answer contents, and the posting times, and storing each question and single answer as a text pair in the storage format of step (1);
(43) running the BERT model fine-tuned in step (2) on the text data obtained in the previous step, calculating features such as the day difference and the similarities according to the method of step (3), and combining them into feature vectors that have the same format as those formed in step (3), the number of feature vectors being equal to the number of answers;
(44) predicting on the feature vectors using the machine learning classification model trained in step (41), and outputting the n answers with the highest probability for the user to refer to, where n is a natural number not greater than the total number of answers.
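To make step (41) concrete, the sketch below trains a small linear SVM by stochastic subgradient descent on the regularized hinge loss and then ranks two new feature vectors by their decision value. The source only states that an SVM is used; the linear kernel, the hand-written training loop, the hyperparameters, and the toy data are all assumptions for illustration. In practice a library implementation would normally be used, and turning decision values into probabilities would require an additional calibration step that is not shown here.

```python
def decision(w, b, x):
    """Decision value of the linear model; positive means 'predicted correct'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            margin = target * decision(w, b, x)
            w = [wi * (1.0 - lr * lam) for wi in w]  # step on the L2 penalty
            if margin < 1.0:                         # hinge loss is active
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
    return w, b


# Toy feature vectors: [BERT prob, sim. to excellent answer, sim. to question, day difference]
X_train = [
    [0.9, 1.0, 0.6, 0.0],
    [0.8, 0.8, 0.5, -1.0],
    [0.2, 0.1, 0.1, 0.0],
    [0.1, 0.2, 0.0, -1.0],
]
y_train = [1, 1, -1, -1]  # +1 = correct answer, -1 = wrong answer
w, b = train_linear_svm(X_train, y_train)

# Rank the answers of a hypothetical new post by decision value.
new_answers = {
    "answer A": [0.85, 0.9, 0.55, 0.0],
    "answer B": [0.15, 0.1, 0.05, 0.0],
}
ranked = sorted(new_answers, key=lambda name: decision(w, b, new_answers[name]), reverse=True)
```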
The inventors conducted a large number of experiments on the method, and the experimental results show that the method is feasible and effective.

Claims (5)

1. A method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence, characterized in that the method comprises the following operation steps:
(1) establishing a data set, specifically: crawling a large number of question-and-answer posts using crawler software; after crawling, storing the contents of the posts as text pairs, each consisting of a question and a single answer; then cleaning the stored data and labeling it manually to establish the data set;
(2) extracting information features of the text pairs using a deep learning method, specifically: training a deep learning model using the data set obtained in step (1) as the training set, and then using the model to extract features of the text pairs such as tone, keywords, and grammatical structure;
(3) extracting other features of the questions and answers using rules, specifically: calculating the difference in days between the posting of the question and the posting of the answer, calculating the similarity between a single answer and the current question using TF-IDF, and calculating the similarity between the single answer and the other answers to the current question using TF-IDF, among other features; and concatenating these features with the features obtained in step (2) into a feature vector;
(4) training a machine learning classification model and predicting new posts, specifically: training a machine learning classification model using the feature vectors obtained in step (3); after training, predicting a new post by crawling and storing all of its contents with a crawler, extracting features and composing feature vectors according to steps (2) and (3), predicting with the machine learning classification model, and selecting the n answers with the highest probability, where n is a natural number not greater than the total number of answers.
2. The method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence as claimed in claim 1, wherein step (1) specifically comprises the following operation steps:
(11) crawling the information of a website using a crawler and storing information such as the question post, the answers, the asking user, the answering users, and the posting times; data may also be obtained from other similar data sets and merged;
(12) traversing the data and filling NULL attributes, unifying the maximum text length, and cleaning out interfering data;
(13) storing the data obtained in the previous step as text pairs, each consisting of a question and a single answer, and labeling them manually.
3. The method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence as claimed in claim 1, wherein step (2) specifically comprises the following operation steps:
(21) fine-tuning a BERT model on the data obtained in step (1); the BERT model applies token embedding, segment embedding, and position embedding to the input text; after fine-tuning is completed, the fine-tuned model is saved;
(22) summing the vectors of the three embedding layers obtained in step (21) and then classifying, to obtain a classification result for each pair of a single question and a single answer; the classification result reflects text features, such as tone and keywords, that the BERT model has learned from the text.
4. The method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence as claimed in claim 1, wherein step (3) specifically comprises the following operation steps:
(31) reading the posting times of the current question and of its answer from the data set and calculating the day difference, i.e., day difference = question time - answer time; and calculating the similarity between the single answer and the question using the TF-IDF (term frequency-inverse document frequency) algorithm;
(32) according to the classification results of all answers obtained in step (2), calculating the similarity between each answer and the answer to the current question that has the highest probability, the similarity being calculated using the TF-IDF algorithm, and the answer with the highest probability being regarded as the excellent answer;
(33) concatenating the day-difference feature, the similarity features, and the feature value obtained in step (2) into a feature vector whose format is [BERT prediction probability, similarity between the current answer and the excellent answer, similarity between the answer and the question, day difference].
5. The method for automatically identifying correct answers in a community question-answering forum based on artificial intelligence as claimed in claim 1, wherein step (4) specifically comprises the following operation steps:
(41) selecting an SVM model as the machine learning classification model and training it on the feature vectors obtained in step (3);
(42) obtaining the relevant information of the target post, including but not limited to the question content, the answer contents, and the posting times, and storing each question and single answer as a text pair in the storage format of step (1);
(43) running the BERT model fine-tuned in step (2) on the text data obtained in the previous step, calculating features such as the day difference and the similarities according to the method of step (3), and combining them into feature vectors that have the same format as those formed in step (3), the number of feature vectors being equal to the number of answers;
(44) predicting on the feature vectors using the machine learning classification model trained in step (41), and outputting the n answers with the highest probability for the user to refer to, where n is a natural number not greater than the total number of answers.

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201911058818.8A | CN110825930A (en) | 2019-11-01 | 2019-11-01 | Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201911058818.8A | CN110825930A (en) | 2019-11-01 | 2019-11-01 | Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence

Publications (1)

Publication Number | Publication Date
CN110825930A (en) | 2020-02-21

Family

ID=69551882

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201911058818.8A | Pending | CN110825930A (en) | 2019-11-01 | 2019-11-01 | Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN110825930A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159414A (en) * 2020-04-02 2020-05-15 成都数联铭品科技有限公司 Text classification method and system, electronic equipment and computer readable storage medium
CN113609851A (en) * 2021-07-09 2021-11-05 浙江连信科技有限公司 Psychological idea cognitive deviation identification method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100191686A1 (en) * 2009-01-23 2010-07-29 Microsoft Corporation Answer Ranking In Community Question-Answering Sites
CN109271505A (en) * 2018-11-12 2019-01-25 深圳智能思创科技有限公司 A kind of question answering system implementation method based on problem answers pair
CN109472305A (en) * 2018-10-31 2019-03-15 国信优易数据有限公司 Answer quality determines model training method, answer quality determination method and device
CN109558477A (en) * 2018-10-23 2019-04-02 深圳先进技术研究院 A kind of community's question answering system, method and electronic equipment based on multi-task learning
CN109871439A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of Ask-Answer Community problem method for routing based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JACOB DEVLIN et al.: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", https://arxiv.org/abs/1810.04805 *
LICHUN YANG et al.: "Analyzing and Predicting Not-Answered Questions in Community-based Question Answering Services", Twenty-Fifth AAAI Conference on Artificial Intelligence *
车车轮轮滚滚滚: "How to concatenate features of different dimensions and pass them to an SVM for training?", https://bbs.csdn.net/topics/392506779 *

Similar Documents

Publication Publication Date Title
CN102262634B (en) Automatic questioning and answering method and system
CN110795543A (en) Unstructured data extraction method and device based on deep learning and storage medium
CN111767716B (en) Method and device for determining enterprise multi-level industry information and computer equipment
CN111368042A (en) Intelligent question and answer method and device, computer equipment and computer storage medium
CN104156433B (en) Image retrieval method based on semantic mapping space construction
CN105095187A (en) Search intention identification method and device
CN111581376B (en) Automatic knowledge graph construction system and method
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN112818093A (en) Evidence document retrieval method, system and storage medium based on semantic matching
CN111078837A (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN104881428B (en) A kind of hum pattern extraction, search method and the device of hum pattern webpage
CN117076693A (en) Method for constructing digital human teacher multi-mode large language model pre-training discipline corpus
CN107844531B (en) Answer output method and device and computer equipment
CN115761753A (en) Retrieval type knowledge prefix guide visual question-answering method fused with knowledge graph
CN112966117A (en) Entity linking method
CN110825930A (en) Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN114048308A (en) Method and device for generating category retrieval report
CN111259223B (en) News recommendation and text classification method based on emotion analysis model
CN110969005A (en) Method and device for determining similarity between entity corpora
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN114416914B (en) Processing method based on picture question and answer
CN116089578A (en) Automatic labeling method, system and storage medium for intelligent question-answering data
CN116306506A (en) Intelligent mail template method based on content identification
CN114090777A (en) Text data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221