CN113901824A - Question-answering system construction method based on named entity recognition - Google Patents
- Publication number: CN113901824A (application CN202111276164.3A)
- Authority: CN (China)
- Legal status: Withdrawn (the listed status is an assumption, not a legal conclusion)
Classifications
- G06F40/295: Named entity recognition
- G06F16/3344: Query execution using natural language analysis
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06F40/35: Discourse or dialogue representation
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/084: Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a question-answering system construction method based on named entity recognition, comprising the following steps: step 1, constructing a question-answer database; step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database; step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database; and step 4, calculating similarity: performing entity recognition and Chinese word segmentation on the user input question to obtain named entities and non-named entities, finding questions with the corresponding entities in the question-answer database as candidate questions, and returning the answer of the candidate question with the highest similarity. By performing named entity recognition and Chinese word segmentation on the questions in the question-answering database, the method obtains word vectors of named entities and non-named entities and, from them, the corresponding candidate questions; it then computes the similarity between the user input and each candidate question with an improved similarity calculation method, so that user questions are matched accurately and the accuracy of answers in the question-answering system is improved.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a question-answering system construction method based on named entity recognition.
Background
The rapid development of the mobile internet brings abundant and diverse information to internet users. In the face of the vast amount of information on the internet, people are increasingly relying on querying information through search engines. However, conventional search engines return a large number of related web pages, and it is difficult for a user to quickly and accurately locate the correct answer that matches the question from among the large number of web pages.
In the process of implementing the invention, the inventor found at least the following problems in the prior art. Unlike a conventional search engine, a question-answering system, as a novel information retrieval technology, can return accurate answers directly to users, saving them the time of searching for the required information among a large number of related web pages. Short-text similarity calculation plays an important role in a question-answering system because both questions and answers are short texts: a question is generally no longer than 100 characters and carries little information. In addition, users' expression habits differ, and short-text questions contain irregular expressions such as wrongly written characters, abbreviations, and colloquialisms, which lowers the quality of the answers given. Unlike long texts, short texts have short content and sparse features, which makes short-text similarity difficult to calculate and measure well. Existing short-text similarity methods cannot effectively suppress interference from noise words in short texts and therefore cannot improve the accuracy of short-text similarity calculation. A new semantic similarity method is therefore needed to improve the matching precision of automatically returned answers. How to mine valuable information from short texts, accurately locate the most similar question, and return the most accurate answer to the user is a problem that urgently needs to be solved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a question-answering system construction method based on named entity recognition. The technical solution is as follows:
A question-answering system construction method based on named entity recognition comprises the following steps:
Step 1, constructing a question-answer database:
A question-answer data source is obtained: a web crawler is used to crawl question-answer platforms as the data source of the question-answer database.
After the web pages are crawled, a data cleaning operation is needed to eliminate useless data and obtain the question elements: the question, answer time, number of likes, and number of comments fields. The effective score S of each answer record is calculated from the question elements, and according to S, only the answer record with the highest effective score is kept for each question and stored in the question-answer database.
Step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database. Named entity recognition refers to identifying entities with specific meanings in the text, including person names, place names, and organization names. A BERT-BiLSTM-CRF model is used to perform named entity recognition on the questions in the question-answer database: BERT generates the word-vector semantic representation of the input content, which is then fed into the BiLSTM-CRF model.
The method for performing entity recognition with the BERT-BiLSTM-CRF model comprises the following steps:
(1) Process the question with the bidirectional Transformer encoder of the BERT pre-trained language model, constructing an Embedding layer to obtain a vector representation of each word as the input of the downstream BiLSTM-CRF task.
(2) The word vectors obtained from BERT are used as the input of the BiLSTM model, which processes the input sequence in both the forward and reverse directions. At each time step t, the output of the forward hidden vector $\overrightarrow{h_t}$ and the output of the reverse hidden vector $\overleftarrow{h_t}$ are concatenated to obtain the sentence representation at time t, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. In this way, the associations between text contexts are learned in both the forward and reverse directions.
(3) The output of the BiLSTM layer is used as the CRF input sequence $X = (x_1, x_2, \ldots, x_n)$, where each $x_i$ represents a word vector and n is the number of input word vectors. The CRF learns the constraints between labels, which improves the accuracy of label prediction, and produces the final predicted label sequence, i.e., the tagging information for each position of the input question.
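Step (3) leaves the CRF decoding implicit. A common way for a linear-chain CRF to produce the final label sequence is Viterbi decoding over per-position emission scores (assumed here to be the BiLSTM outputs projected into tag space) and learned tag-transition scores. The pure-Python sketch below covers decoding only, not training; the function name, the score layout, and the omission of start/end transition scores are simplifying assumptions.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. projected BiLSTM output).
    transitions[i][j]: learned score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])          # best path score ending in each tag so far
    backptrs: List[List[int]] = []      # backptrs[t-1][j]: best previous tag for tag j at t
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [tags[j] for j in path]
```

A large negative transition score (for example from `O` to `I-ORG`) is how the learned label constraints mentioned in step (3) rule out invalid tag sequences.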
Performing Chinese word segmentation on the questions in the question-answer database, and identifying non-named entities: and performing word segmentation and part-of-speech tagging on the questions in the question-answer database by using a Baidu LAC word segmentation tool, skipping pronouns, adjectives and adverbs which have no value in calculating similarity, and screening out non-named entity nouns and non-named entity verbs.
Step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database by adding the following columns to each question in the database: organization entities, person-name entities, location entities, non-named-entity nouns, and non-named-entity verbs. The named and non-named entities obtained in step 2 are stored in the corresponding columns; each element stores the entity name and the entity word vector, and if a class contains several entities, they are separated by commas.
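The five extra columns of step 3 can be sketched with Python's built-in `sqlite3` module as below. The table and column names are hypothetical, and each column holds a JSON array of `[entity name, word vector]` pairs as a stand-in for the comma-separated format described above; the patent does not specify a database engine or serialization.

```python
import json
import sqlite3
from typing import Dict, List, Tuple

Entity = Tuple[str, List[float]]  # (entity name, entity word vector)

# Hypothetical column names for the five entity classes.
COLUMNS = ("org_entities", "person_entities", "location_entities",
           "non_entity_nouns", "non_entity_verbs")


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa (id INTEGER PRIMARY KEY, question TEXT, answer TEXT, "
        + ", ".join(f"{c} TEXT" for c in COLUMNS) + ")")


def insert_question(conn: sqlite3.Connection, question: str, answer: str,
                    fields: Dict[str, List[Entity]]) -> int:
    # Missing classes are stored as empty lists.
    values = [json.dumps(fields.get(c, []), ensure_ascii=False) for c in COLUMNS]
    cur = conn.execute(
        f"INSERT INTO qa (question, answer, {', '.join(COLUMNS)}) "
        f"VALUES (?, ?, {', '.join('?' for _ in COLUMNS)})",
        [question, answer, *values])
    return cur.lastrowid


def load_fields(conn: sqlite3.Connection, row_id: int) -> Dict[str, List[Entity]]:
    row = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM qa WHERE id = ?", (row_id,)).fetchone()
    return {c: [(name, vec) for name, vec in json.loads(v)] for c, v in zip(COLUMNS, row)}
```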
Step 4, calculating similarity: perform entity recognition and Chinese word segmentation on the user input question to obtain named entities and non-named entities, find questions with the corresponding entities in the question-answer database as candidate questions, calculate the similarity between the user input question and the candidate questions with an improved similarity calculation method, and return the answer of the candidate question with the highest similarity. Specifically:
After named entity recognition is performed on the user input question, if named entities exist, the questions containing the corresponding named entities are found in the question-answer database as candidate questions.
The similarity $\mathrm{sim}_1(x, y)$ is calculated from the word vectors of the user input question and the word vectors of its candidate questions; the similarity values of the candidate questions are sorted, and the answer corresponding to the candidate question with the highest similarity score is returned as the answer to the user input question.
If no named entity exists, the questions containing the corresponding non-named entities are found in the question-answer database as candidate questions, and the similarity $\mathrm{sim}_2(x, y)$ is calculated from the word vectors of the user input question and the word vectors of the candidate questions.
The similarity values of the candidate questions are sorted, and the answer corresponding to the candidate question with the highest similarity score is returned as the answer to the user input question.
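The two branches of step 4 (named entities first, otherwise non-named nouns and verbs, then rank candidates by similarity) can be sketched as below. Because the formulas for $\mathrm{sim}_1$ and $\mathrm{sim}_2$ are not reproduced in this text, `placeholder_similarity` is a HYPOTHETICAL stand-in (average best cosine match) rather than the patent's improved method. Keys are `(word, type)` pairs so that a candidate only matches when both the word and its entity type agree; all class and function names are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]
Key = Tuple[str, str]  # (word, type): entity type such as "ORG", or "n"/"v" for non-named terms


def cosine(u: Vec, v: Vec) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def placeholder_similarity(query_vecs: List[Vec], cand_vecs: List[Vec]) -> float:
    # HYPOTHETICAL stand-in for sim1/sim2: average, over the query's word vectors,
    # of the best cosine match among the candidate's word vectors.
    if not query_vecs or not cand_vecs:
        return 0.0
    return sum(max(cosine(u, v) for v in cand_vecs) for u in query_vecs) / len(query_vecs)


@dataclass
class ParsedQuestion:
    text: str
    answer: str = ""
    entities: Dict[Key, Vec] = field(default_factory=dict)  # named entities from NER
    terms: Dict[Key, Vec] = field(default_factory=dict)     # non-named nouns/verbs

    def vectors(self) -> List[Vec]:
        return list(self.entities.values()) + list(self.terms.values())


def answer_question(query: ParsedQuestion, db: List[ParsedQuestion]) -> Optional[str]:
    # Named entities take priority (sim1 branch); otherwise use non-named terms (sim2 branch).
    use_entities = bool(query.entities)
    q_keys = set(query.entities if use_entities else query.terms)
    candidates = [c for c in db if q_keys & set(c.entities if use_entities else c.terms)]
    if not candidates:
        return None
    # Rank candidates by similarity and return the answer of the best one.
    best = max(candidates, key=lambda c: placeholder_similarity(query.vectors(), c.vectors()))
    return best.answer
```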
Preferably, the question-answer platforms in step 1 are one or more of the following: Baidu Tieba, Baidu Zhidao, Soso Wenwen, 360 Wenda, Sohu Wenda, and Zhihu.
Preferably, the effective score S of each answer record in step 1 is:
where d is the number of days from this answer record to the most recent answer, plus 1; $n_1$ is the number of likes; and $n_2$ is the number of comments.
Preferably, step 1 further comprises periodically crawling the question-answer platforms to update the question-answer database: for a question that already exists in the database, the effective score of the newly added answer is calculated; if it is higher than the effective score of the question's stored answer, the stored answer is replaced directly; if it is lower, the stored answer remains unchanged.
Preferably, the similarity $\mathrm{sim}_1(x, y)$ is calculated as follows:
where $W_1, W_2, \ldots, W_a$ are the word vectors of named entities, $N_1, N_2, \ldots, N_b$ are the word vectors of non-named-entity nouns, $V_1, V_2, \ldots, V_c$ are the word vectors of non-named-entity verbs, a is the number of named entities, b is the number of non-named-entity nouns, and c is the number of non-named-entity verbs.
Preferably, the similarity $\mathrm{sim}_2(x, y)$ is calculated as follows:
Further, in step 4, finding the questions with the corresponding named entities in the question-answer database as candidate questions means finding questions that contain the same word, where that word also belongs to the same entity type.
Compared with the prior art, the above technical solution has the following beneficial effects: named entity recognition and Chinese word segmentation are performed on the questions in the question-answer database to obtain word vectors of named entities and non-named entities; candidate questions are obtained according to the named entities or non-named entities of the user input question; and the similarity between the user input and the candidate questions is obtained with an improved similarity calculation method. As a result, user input questions are matched accurately and the accuracy of answers in the question-answering system is improved.
Drawings
Fig. 1 is a flowchart of a method for constructing a question-answering system based on named entity recognition according to an embodiment of the present disclosure.
Fig. 2 is a flow chart of constructing a question-answer database according to an embodiment of the present disclosure.
Fig. 3 is a block diagram of entity recognition using the BERT-BiLSTM-CRF model according to an embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
An embodiment of the present disclosure provides a question-answering system construction method based on named entity recognition. Fig. 1 is a flowchart of the method; with reference to it, the method mainly comprises the following steps:
Step 1, constructing the question-answer database. Fig. 2 is a flowchart of the question-answer database construction; with reference to it, the specific method is as follows:
A question-answer data source is obtained: a web crawler is used to crawl question-answer platforms as the data source of the question-answer database. Preferably, the platforms are one or more of the following: Baidu Tieba, Baidu Zhidao, Soso Wenwen, 360 Wenda, Sohu Wenda, and Zhihu.
After the web pages are crawled, a data cleaning operation is needed to eliminate useless data (pictures, links, advertisements, and the like) and obtain the question elements: the question, answer time, number of likes, and number of comments fields. The effective score S of each answer record is calculated from these elements. Because a question is often answered by multiple people, the same question appears in multiple records, so an effective score is calculated for each answer record.
Preferably, the effective score S of each answer record is:
where d is the number of days from this answer record to the most recent answer, plus 1; $n_1$ is the number of likes; and $n_2$ is the number of comments. When the answer record is the only answer to a question or the most recent answer to it, d = 1; adding 1 ensures d ≠ 0 and thus avoids a zero denominator.
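The definition of d translates directly into a few lines of Python. This sketch assumes answer times are available at day granularity as `datetime.date` values for the answers to a single question; `days_factor` is an illustrative name.

```python
from datetime import date
from typing import List


def days_factor(answer_dates: List[date]) -> List[int]:
    """Return d for each answer to one question.

    d = (days between this answer and the most recent answer) + 1, so the
    most recent (or only) answer gets d = 1 and d is never 0.
    """
    latest = max(answer_dates)
    return [(latest - day).days + 1 for day in answer_dates]
```

For example, answers dated 1 and 11 October of the same year give d = 11 and d = 1 respectively.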
And according to the effective score S, only one answer record with the highest effective score is reserved for each question and is stored in a question-answer database.
Preferably, step 1 further comprises periodically crawling the question-answer platforms to update the question-answer database: for a question that already exists in the database, the effective score of the newly added answer is calculated; if it is higher than the effective score of the question's stored answer, the stored answer is replaced directly; if it is lower, the stored answer remains unchanged.
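The periodic update rule can be sketched as a small compare-and-replace function. The in-memory dict stands in for the question-answer database, and inserting a question that is not yet stored is an assumption, since the text only describes questions that already exist; `apply_update` is an illustrative name.

```python
from typing import Dict, Tuple

# question -> (stored answer, its effective score S)
Store = Dict[str, Tuple[str, float]]


def apply_update(store: Store, question: str, answer: str, score: float) -> bool:
    """Apply one newly crawled answer; return True if the store changed."""
    current = store.get(question)
    # Assumption: unseen questions are simply inserted.
    if current is None or score > current[1]:
        store[question] = (answer, score)
        return True
    # A lower-scoring new answer leaves the stored answer unchanged.
    return False
```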
Step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database. Named entity recognition refers to identifying entities with specific meanings in the text, including person names, place names, and organization names. The BERT-BiLSTM-CRF model is used to perform named entity recognition on the questions in the question-answer database: BERT generates the word-vector semantic representation of the input content, which is then fed into the BiLSTM-CRF model. Fig. 3 is a block diagram of entity recognition using the BERT-BiLSTM-CRF model; with reference to it, the method is as follows:
(1) Process the question with the bidirectional Transformer encoder of the BERT pre-trained language model, constructing an Embedding layer to obtain a vector representation of each word as the input of the downstream BiLSTM-CRF task.
(2) The word vectors obtained from BERT are used as the input of the BiLSTM model, which processes the input sequence in both the forward and reverse directions. At each time step t, the output of the forward hidden vector $\overrightarrow{h_t}$ and the output of the reverse hidden vector $\overleftarrow{h_t}$ are concatenated to obtain the sentence representation at time t, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. Learning the associations between text contexts in both directions effectively overcomes the limitation of a unidirectional recurrent neural network, which can derive context from one direction only.
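The forward/backward concatenation of step (2) can be illustrated with a toy recurrent cell. To stay dependency-free, the sketch below uses a HYPOTHETICAL elementwise tanh cell instead of a real LSTM cell; only the bidirectional wiring (run forward, run over the reversed sequence, re-align, concatenate) mirrors the text.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]


def run_rnn(inputs: Sequence[Vec], step: Callable[[Vec, Vec], Vec], hidden_size: int) -> List[Vec]:
    """Run a recurrent cell over the sequence and return the hidden state at every step."""
    h = [0.0] * hidden_size
    outputs = []
    for x in inputs:
        h = step(x, h)
        outputs.append(h)
    return outputs


def bidirectional(inputs: Sequence[Vec], fwd_step, bwd_step, hidden_size: int) -> List[Vec]:
    forward = run_rnn(inputs, fwd_step, hidden_size)
    # Process the reversed sequence, then flip the outputs back so index t lines up.
    backward = run_rnn(list(reversed(inputs)), bwd_step, hidden_size)[::-1]
    # h_t = [forward h_t ; backward h_t] (list concatenation)
    return [f + b for f, b in zip(forward, backward)]


def toy_step(x: Vec, h: Vec) -> Vec:
    # HYPOTHETICAL toy cell (not an LSTM): h' = tanh(x + 0.5 * h), elementwise.
    return [math.tanh(xi + 0.5 * hi) for xi, hi in zip(x, h)]
```

With inputs `[[1.0], [0.0], [0.0]]`, the forward half of the last state still reflects the first token, while the backward half of the first state already reflects the rest of the sequence, which is the two-sided context the text describes.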
(3) The output of the BiLSTM layer is used as the CRF input sequence $X = (x_1, x_2, \ldots, x_n)$, where each $x_i$ represents a word vector and n is the number of input word vectors. The CRF learns the constraints between labels, which improves the accuracy of label prediction, and produces the final predicted label sequence, i.e., the tagging information for each position of the input question.
For example, if the input question in step (1) is "苹果发布的最新产品是什么" ("What are the latest products released by Apple?"), the output of step (3) is: 苹(B-ORG) 果(I-ORG) 发(O) 布(O) 的(O) 最(O) 新(O) 产(O) 品(O) 是(O) 什(O) 么(O). That is, the two characters of 苹果 (Apple) are tagged as an organization entity, and every other character is tagged O (outside any entity).
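Turning a character-level BIO tag sequence like the one in this example into entity spans is a simple scan. The sketch below is illustrative (`bio_to_entities` is not from the patent); an `I-` tag that does not continue an entity of the same type is treated as outside, which is one common convention among several.

```python
from typing import List, Optional, Tuple


def bio_to_entities(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) spans from per-character BIO tags."""
    entities: List[Tuple[str, str]] = []
    current, etype = "", None  # type: str, Optional[str]
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((current, etype))
            current, etype = ch, tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current += ch  # continue the open entity
        else:
            # "O" (or an I- tag that does not continue the open entity) closes it.
            if current:
                entities.append((current, etype))
            current, etype = "", None
    if current:
        entities.append((current, etype))
    return entities
```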
Perform Chinese word segmentation on the questions in the question-answer database and identify non-named entities: the Baidu LAC word segmentation tool is used to perform word segmentation and part-of-speech tagging on the questions, pronouns, adjectives, and adverbs (which have no value in the similarity calculation) are skipped, and the non-named-entity nouns and verbs are selected. For example, segmenting "What are the latest products released by Apple?" yields the non-named entities "release" and "product".
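The screening of non-named nouns and verbs can be sketched as a filter over `(word, tag)` pairs. The sketch assumes LAC-style tags, keeping only plain nouns (`n`) and verbs (`v`); whether other noun or verb subtags should also be kept is not stated in the text. According to the LAC README, `LAC(mode='lac').run(text)` on a single string returns a `[words, tags]` pair that could be passed straight in; the example tags used in testing are illustrative, not actual LAC output.

```python
from typing import List, Tuple

# Assumed LAC-style tags: keep plain nouns and verbs only.
NOUN_TAGS = {"n"}
VERB_TAGS = {"v"}


def non_named_terms(words: List[str], tags: List[str]) -> Tuple[List[str], List[str]]:
    """Return (non-named nouns, non-named verbs) from segmented, POS-tagged words."""
    nouns, verbs = [], []
    for word, tag in zip(words, tags):
        if tag in NOUN_TAGS:
            nouns.append(word)
        elif tag in VERB_TAGS:
            verbs.append(word)
        # Everything else (pronouns, adjectives, adverbs, named-entity tags such as
        # ORG/PER/LOC, particles, ...) is skipped.
    return nouns, verbs
```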
Step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database by adding the following columns to each question in the database: organization entities, person-name entities, location entities, non-named-entity nouns, and non-named-entity verbs. The named and non-named entities obtained in step 2 are stored in the corresponding columns; each element stores the entity name and the entity word vector, and if a class contains several entities, they are separated by commas, e.g. ((entity 1, word vector 1), (entity 2, word vector 2), …).
Step 4, calculating the similarity: perform entity recognition and Chinese word segmentation on the user input question to obtain named entities and non-named entities, find questions with the corresponding entities in the question-answer database as candidate questions, calculate the similarity between the user input question and the candidate questions with an improved similarity calculation method, and return the answer of the candidate question with the highest similarity. Specifically:
After named entity recognition is performed on the user input question, if named entities (organization, person name, or place name) exist, the questions containing the corresponding named entities are found in the question-answer database as candidate questions, and the similarity $\mathrm{sim}_1(x, y)$ is calculated from the word vectors of the user input question and those of the candidate questions:
where $W_1, W_2, \ldots, W_a$ are the word vectors of named entities, $N_1, N_2, \ldots, N_b$ are the word vectors of non-named-entity nouns, $V_1, V_2, \ldots, V_c$ are the word vectors of non-named-entity verbs, a is the number of named entities, b is the number of non-named-entity nouns, and c is the number of non-named-entity verbs;
the similarity values of the candidate questions are sorted, and the answer corresponding to the candidate question with the highest similarity score is returned as the answer to the user input question.
If no named entity exists, the questions containing the corresponding non-named entities (nouns and verbs) are found in the question-answer database as candidate questions, and the similarity $\mathrm{sim}_2(x, y)$ is calculated from the word vectors of the user input question and those of the candidate questions:
The similarity values of the candidate questions are sorted, and the answer corresponding to the candidate question with the highest similarity score is returned as the answer to the user input question.
Preferably, in step 4, finding the questions with the corresponding named entities in the question-answer database as candidate questions means finding questions that contain the same word, where that word also belongs to the same entity type. For example, in a question such as "Are apples tasty?" and the question "What did Apple release?", "apple" in the former is a non-named-entity noun, whereas "Apple" in the latter is a named entity of the organization type. The two entity types clearly differ, so the former is not a question with the corresponding named entity.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above: insubstantial modifications made in accordance with the principles and technical solutions of the invention, as well as direct application of its concept and technical solution to other occasions without improvement, all fall within the protection scope of the invention.
Claims (7)
1. A question-answering system construction method based on named entity recognition is characterized by comprising the following steps:
step 1, constructing a question-answer database:
a question-answer data source is obtained: a web crawler is used to crawl question-answer platforms as the data source of the question-answer database;
after the web pages are crawled, a data cleaning operation is performed to eliminate useless data and obtain the question elements: the question, answer time, number of likes, and number of comments fields; the effective score S of each answer record is calculated from the question elements; according to the effective score S, only the answer record with the highest effective score is kept for each question and stored in the question-answer database;
step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database, wherein named entity recognition refers to identifying entities with specific meanings in the text, including person names, place names, and organization names; a BERT-BiLSTM-CRF model is used to perform named entity recognition on the questions in the question-answer database, wherein BERT generates the word-vector semantic representation of the input content, which is fed into the BiLSTM-CRF model;
the method for performing entity recognition using the BERT-BiLSTM-CRF model comprises the following steps:
(1) processing the question with the bidirectional Transformer encoder of the BERT pre-trained language model, constructing an Embedding layer, and obtaining the vector representation of each word as the input of the downstream BiLSTM-CRF task;
(2) the word vectors obtained from BERT are used as the input of the BiLSTM model, which processes the input sequence in both the forward and reverse directions; at each time step t, the forward hidden vector output $\overrightarrow{h_t}$ and the reverse hidden vector output $\overleftarrow{h_t}$ are concatenated to obtain the sentence representation at time t, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$, thereby learning the associations between text contexts in both the forward and reverse directions;
(3) the output of the BiLSTM layer is used as the CRF input sequence $X = (x_1, x_2, \ldots, x_n)$, where each $x_i$ represents a word vector and n is the number of input word vectors; the CRF learns the constraints between labels to improve the accuracy of label prediction and obtains the final predicted label sequence, which gives the tagging information for each position of the input question;
performing Chinese word segmentation on the questions in the question-answer database, and identifying non-named entities: using a Baidu LAC word segmentation tool to perform word segmentation and part-of-speech tagging on the questions in the question-answer database, skipping pronouns, adjectives and adverbs which have no value in calculating similarity, and screening out non-named entity nouns and non-named entity verbs;
and 3, storing the identification result of the step 2 into a corresponding field in a question-answer database, and adding the following field columns to each question in the database: organization entity, name entity, location entity, noun of non-naming entity and verb of non-naming entity, storing the naming entity and non-naming entity obtained in step 2 into corresponding columns respectively, wherein each element includes storing entity name and entity word vector, if there are multiple names in a certain class, storing them separately with comma;
step 4, calculating similarity, performing entity recognition on the user input questions, dividing Chinese words into words to obtain named entities and non-named entities, finding corresponding entity questions from the question-answer database as candidate questions, calculating the similarity between the user input questions and the candidate questions through an improved similarity calculation method, and returning answers of the candidate questions with the highest similarity; the method specifically comprises the following steps:
after named entity recognition is performed on the user's input question, if named entities are present, the questions containing the corresponding named entities are retrieved from the question-answer database as candidate questions, and the similarity $\mathrm{sim1}(x,y)$ is calculated from the word vectors of the user's input question and of each candidate question; the candidate questions are sorted by similarity, and the answer corresponding to the candidate question with the highest similarity score is returned as the answer to the user's input question;
if no named entity is present, the questions containing the corresponding non-named entities are retrieved from the question-answer database as candidate questions, and the similarity $\mathrm{sim2}(x,y)$ is calculated from the word vectors of the user's input question and of each candidate question; the candidate questions are sorted by similarity, and the answer corresponding to the candidate question with the highest similarity score is returned as the answer to the user's input question.
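The two-branch retrieval and ranking in step 4 can be sketched as follows. The formulas for $\mathrm{sim1}$ and $\mathrm{sim2}$ are not reproduced in this text, so the sketch makes several assumptions. First, both are passed in as functions, with plain cosine similarity of a single question-level vector as a stand-in. Second, a candidate must share at least one (word, entity type) pair, or at least one non-named noun or verb. The `Parsed`/`Entry` types and the stand-in similarity are hypothetical and are not the patent's method.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Set, Tuple


@dataclass
class Parsed:
    named: Set[Tuple[str, str]]  # named entities as (word, entity type)
    terms: Set[str]              # non-named-entity nouns and verbs
    vector: List[float]          # question vector (stand-in for per-word vectors)


@dataclass
class Entry:
    parsed: Parsed
    answer: str


Similarity = Callable[[Parsed, Parsed], float]


def cosine_sim(x: Parsed, y: Parsed) -> float:
    """Stand-in similarity: cosine of the question vectors (NOT the patent's sim1/sim2)."""
    dot = sum(a * b for a, b in zip(x.vector, y.vector))
    nx = math.sqrt(sum(a * a for a in x.vector))
    ny = math.sqrt(sum(b * b for b in y.vector))
    return dot / (nx * ny) if nx and ny else 0.0


def answer_question(query: Parsed, db: Sequence[Entry],
                    sim1: Similarity = cosine_sim,
                    sim2: Similarity = cosine_sim) -> Optional[str]:
    if query.named:
        # named entities present: candidates share at least one (word, type) pair
        candidates = [e for e in db if e.parsed.named & query.named]
        sim = sim1
    else:
        # no named entities: candidates share at least one non-named noun or verb
        candidates = [e for e in db if e.parsed.terms & query.terms]
        sim = sim2
    if not candidates:
        return None
    # sort candidates by similarity and return the answer of the top one
    ranked = sorted(candidates, key=lambda e: sim(query, e.parsed), reverse=True)
    return ranked[0].answer


db = [
    Entry(Parsed({("北京", "LOC")}, {"天气"}, [1.0, 0.0]), "A1"),
    Entry(Parsed({("北京", "LOC")}, {"苹果"}, [0.0, 1.0]), "A2"),
    Entry(Parsed(set(), {"苹果"}, [1.0, 1.0]), "A3"),
]
print(answer_question(Parsed({("北京", "LOC")}, {"苹果"}, [0.1, 1.0]), db))  # A2
print(answer_question(Parsed(set(), {"苹果"}, [1.0, 0.0]), db))  # A3
```

Passing `sim1` and `sim2` as parameters keeps the retrieval logic separate from the scoring formulas, so the patent's own measures could be substituted without other changes.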
2. The method for constructing the question-answering system based on named entity recognition according to claim 1, wherein the question-answer platforms in step 1 are one or more of the following: Baidu Tieba, Baidu Zhidao, Sogou Wenwen, 360 Wenda, Sohu Wenda, and Zhihu.
3. The method for constructing the question-answering system based on named entity recognition according to claim 1, wherein the effective score of each answer record in step 1 is calculated as follows:
where $d$ is the number of days from the answer record to the latest answer, plus 1; $n_1$ is the number of likes; and $n_2$ is the number of comments.
4. The method for constructing the question-answering system based on named entity recognition according to any one of claims 1 to 3, wherein step 1 further comprises periodically crawling the question-answer platforms to update the question-answer database. For a question that already exists in the database, the effective score of each newly crawled answer is calculated. If that score is higher than the effective score of the answer stored in the database, the stored answer is replaced directly; if it is lower, the stored answer remains unchanged.
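The update rule in claim 4 can be sketched as a keep-the-higher-score merge. The effective-score formula itself is not reproduced in this text, so scores are passed in precomputed. The claim leaves two cases open: a tie, which this sketch resolves by keeping the stored answer, and a question that is not yet in the database, which this sketch simply inserts. Both behaviors are assumptions. The dictionary layout and the name `merge_answer` are hypothetical.

```python
from typing import Dict, Tuple

Answer = Tuple[str, float]  # (answer text, effective score)


def merge_answer(db: Dict[str, Answer], question: str,
                 new_answer: str, new_score: float) -> bool:
    """Apply the update rule; return True if the database changed."""
    current = db.get(question)
    if current is None:
        # assumption: questions not yet in the database are simply added
        db[question] = (new_answer, new_score)
        return True
    if new_score > current[1]:
        # higher effective score: replace the stored answer directly
        db[question] = (new_answer, new_score)
        return True
    # lower (or, by assumption, equal) score: keep the stored answer
    return False


db = {"q1": ("old answer", 2.0)}
merge_answer(db, "q1", "better answer", 3.5)
merge_answer(db, "q1", "worse answer", 1.0)
print(db["q1"])  # ('better answer', 3.5)
```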
5. The method for constructing the question-answering system based on named entity recognition according to claim 4, wherein the similarity $\mathrm{sim1}(x,y)$ is calculated as follows:
where $W_1, W_2, \ldots, W_a$ denote the word vectors of the named entities, $N_1, N_2, \ldots, N_b$ denote the word vectors of the non-named-entity nouns, and $V_1, V_2, \ldots, V_c$ denote the word vectors of the non-named-entity verbs; $a$ is the number of named entities, $b$ is the number of non-named-entity nouns, and $c$ is the number of non-named-entity verbs.
7. The method for constructing the question-answering system based on named entity recognition according to claim 5 or 6, wherein, in step 4, finding the questions containing the corresponding named entities in the question-answer database as candidate questions means finding questions that contain the same word tagged with the same entity type.
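Matching on the (word, entity type) pair rather than on the word alone prevents a match between the same string tagged once as `LOC` and once as `ORG`. For example, 长城 as the Great Wall and 长城 as a company name do not match. An inverted index keyed by that pair is one simple way to perform this lookup. The index structure, the integer question IDs, and the choice to union the matches over all query entities are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (entity word, entity type)


def build_index(questions: Dict[int, Iterable[Key]]) -> Dict[Key, Set[int]]:
    """Map each (word, type) pair to the IDs of questions that contain it."""
    index: DefaultDict[Key, Set[int]] = defaultdict(set)
    for qid, entities in questions.items():
        for key in entities:
            index[key].add(qid)
    return dict(index)


def candidate_ids(index: Dict[Key, Set[int]], query_entities: Iterable[Key]) -> Set[int]:
    """Questions sharing at least one (word, type) pair with the query."""
    result: Set[int] = set()
    for key in query_entities:
        result |= index.get(key, set())
    return result


questions = {
    1: [("长城", "LOC"), ("北京", "LOC")],
    2: [("长城", "ORG")],
    3: [("北京", "LOC")],
}
idx = build_index(questions)
print(candidate_ids(idx, [("长城", "LOC")]))  # {1}
```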
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111276164.3A CN113901824A (en) | 2021-10-29 | 2021-10-29 | Question-answering system construction method based on named entity recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901824A (en) | 2022-01-07 |
Family ID: 79027156
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111276164.3A Withdrawn CN113901824A (en) | 2021-10-29 | 2021-10-29 | Question-answering system construction method based on named entity recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901824A (en) |
- 2021-10-29: CN application CN202111276164.3A (CN113901824A), status: not active (withdrawn)
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US5671333A (en) | Training apparatus and method | |
US6728695B1 (en) | Method and apparatus for making predictions about entities represented in documents | |
CN113094578B (en) | Deep learning-based content recommendation method, device, equipment and storage medium | |
CN113505586A (en) | Seat-assisted question-answering method and system integrating semantic classification and knowledge graph | |
CN111353306A (en) | Entity relationship and dependency Tree-LSTM-based combined event extraction method | |
CN111274829A (en) | Sequence labeling method using cross-language information | |
CN112632258A (en) | Text data processing method and device, computer equipment and storage medium | |
CN116077942A (en) | Method for realizing interactive content recommendation | |
CN112711666B (en) | Futures label extraction method and device | |
JP6924975B2 (en) | Sound analyzer and its processing method, program | |
CN112579666A (en) | Intelligent question-answering system and method and related equipment | |
CN115017271B (en) | Method and system for intelligently generating RPA flow component block | |
Van Enschot et al. | Taming our wild data: On intercoder reliability in discourse research | |
CN113901824A (en) | Question-answering system construction method based on named entity recognition | |
CN111798217B (en) | Data analysis system and method | |
CN114090777A (en) | Text data processing method and device | |
CN114328902A (en) | Text labeling model construction method and device | |
CN116775813B (en) | Service searching method, device, electronic equipment and readable storage medium | |
CN111209404B (en) | Method for generating similar question sentences based on deep learning assistance | |
CN114398492B (en) | Knowledge graph construction method, terminal and medium in digital field | |
CN116340481B (en) | Method and device for automatically replying to question, computer readable storage medium and terminal | |
CN110008307B (en) | Method and device for identifying deformed entity based on rules and statistical learning | |
Brajković et al. | Evaluating Text Summarization Using FAHP and TOPSIS Methods in Intelligent Tutoring Systems | |
CN118070784A (en) | Method, device, equipment and storage medium for constructing entity dictionary in vertical industrial field | |
CN111931481A (en) | Text emotion recognition method and device, storage medium and computer equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220107 |