CN113901824A - Question-answering system construction method based on named entity recognition - Google Patents


Info

Publication number
CN113901824A
Authority
CN
China
Prior art keywords
question
answer
questions
entity
database
Prior art date
Legal status
Withdrawn
Application number
CN202111276164.3A
Other languages
Chinese (zh)
Inventor
周洁琴
Current Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co Ltd
Priority to CN202111276164.3A
Publication of CN113901824A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing a question-answering system based on named entity recognition, comprising the following steps: step 1, constructing a question-answer database; step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database; step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database; and step 4, calculating similarity: entity recognition and Chinese word segmentation are performed on the question input by the user to obtain named and non-named entities, questions containing the corresponding entities are retrieved from the question-answer database as candidate questions, and the answer to the candidate question with the highest similarity is returned. By performing named entity recognition and Chinese word segmentation on the questions in the database, the method obtains word vectors for named and non-named entities, derives the corresponding candidate questions, and computes the similarity between the user input and each candidate with an improved similarity calculation method, so that the user's question is matched accurately and the accuracy of the answers returned by the question-answering system is improved.

Description

Question-answering system construction method based on named entity recognition
Technical Field
The invention relates to the field of natural language processing research, in particular to a question-answering system construction method based on named entity recognition.
Background
The rapid development of the mobile internet brings abundant and diverse information to internet users. In the face of the vast amount of information on the internet, people are increasingly relying on querying information through search engines. However, conventional search engines return a large number of related web pages, and it is difficult for a user to quickly and accurately locate the correct answer that matches the question from among the large number of web pages.
In the process of implementing the invention, the inventor found at least the following problems in the prior art. Unlike a conventional search engine, a question-answering system, as a newer information retrieval technology, can return accurate answers directly to users, saving them the time of searching through a large number of related web pages for the information they need. Short-text similarity calculation plays an important role in a question-answering system because both questions and answers are short texts: a question is generally no longer than 100 characters and carries little information. Moreover, users express themselves in different ways, and short questions often contain irregular expressions such as miswritten characters, abbreviations, and colloquialisms, which lowers the quality of the returned answers. Unlike long texts, short texts have brief content and sparse features, which makes short-text similarity hard to measure well. Existing short-text similarity methods cannot effectively suppress interference from noise words in short texts and therefore cannot improve the accuracy of short-text similarity calculation. A new semantic similarity method is thus needed to improve the precision with which answers are matched and returned to users automatically. How to mine valuable information from short texts, accurately locate the most similar question, and return the most accurate answer to the user is a problem that urgently needs to be solved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a question-answering system construction method based on named entity recognition. The technical solution is as follows:
A question-answering system construction method based on named entity recognition comprises the following steps:
Step 1, constructing a question-answer database:
a question-answer data source is obtained by using a web crawler to crawl question-answer platforms, which serve as the data source of the question-answer database;
after the web pages are crawled, a data cleaning operation removes useless data and extracts the question elements, namely the question, answer time, number of likes, and number of comments fields; an effective score S is calculated for each answer record from these elements; and, according to S, only the answer record with the highest effective score is kept for each question and stored in the question-answer database.
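The de-duplication rule in step 1 can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the record fields (`question`, `answer`, `likes`, `comments`) are hypothetical names, and because the formula for S appears only as an image, the score is passed in as a function rather than reproduced.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical record layout: one dict per crawled answer, e.g.
# {"question": ..., "answer": ..., "answer_time": ..., "likes": ..., "comments": ...}
Record = Dict[str, Any]


def keep_best_answers(
    records: Iterable[Record],
    score: Callable[[Record], float],
) -> Dict[str, Record]:
    """Keep only the highest-scoring answer record for each question.

    `score` stands in for the effective score S; on ties the record seen
    first is kept (an assumption, the source does not say).
    """
    best: Dict[str, Record] = {}
    best_score: Dict[str, float] = {}
    for rec in records:
        question = rec["question"]
        s = score(rec)
        if question not in best or s > best_score[question]:
            best[question] = rec
            best_score[question] = s
    return best
```

For example, `keep_best_answers(records, score=lambda r: r["likes"])` would rank purely by likes; that placeholder is for illustration only and is not the effective score S.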
Step 2, performing named entity recognition on the questions in the question-answer database. Named entity recognition means identifying entities with specific meanings in text, including person names, place names, and organization names. A BERT-BiLSTM-CRF model is used for named entity recognition on the questions in the question-answer database: BERT generates word-vector semantic representations of the input, which are then passed to the BiLSTM-CRF model.
The method of entity recognition with the BERT-BiLSTM-CRF model is as follows:
(1) The question is processed by the bidirectional Transformer encoder of the BERT pre-trained language model, which serves as the embedding layer and yields a vector representation of each word as the input to the downstream BiLSTM-CRF task.
(2) The word vectors produced by BERT are fed into the BiLSTM model, which processes the input sequence in the forward and reverse directions simultaneously. The forward hidden vector $\overrightarrow{h_t}$ and the reverse hidden vector $\overleftarrow{h_t}$ output at time $t$ are concatenated to give the sentence representation at time $t$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. In this way the associations between text contexts are learned in both the forward and reverse directions.
(3) The output of the BiLSTM layer is used as the input sequence of the CRF, $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ denotes a word vector and $n$ the number of input word vectors. By learning the constraints between labels, the CRF improves the accuracy of label prediction and yields the final predicted label sequence, which assigns a tag to every position of the input question.
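Step (3) relies on constraints between labels, such as an I-ORG tag never following O. A common way to apply such constraints at prediction time is Viterbi decoding over emission and transition scores. The sketch below assumes both are already available as plain dictionaries (the tag names, the scores, and the `START` convention are hypothetical) and omits CRF training entirely.

```python
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under emission + transition scores.

    emissions[t][tag]: score of `tag` at position t (e.g. from the BiLSTM layer).
    transitions[(prev, cur)]: label-to-label score; ("START", tag) scores the
    first tag. Absent pairs score 0.0; use NEG_INF to forbid a pair.
    """
    def trans(prev: str, cur: str) -> float:
        return transitions.get((prev, cur), 0.0)

    # best[tag]: score of the best path that ends in `tag` at the current position
    best = {tag: trans("START", tag) + emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p, c=cur: best[p] + trans(p, c))
            new_best[cur] = best[prev] + trans(prev, cur) + em[cur]
            pointer[cur] = prev
        best = new_best
        backpointers.append(pointer)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]
```

With emission scores that favour I-ORG at the first character, greedy per-position decoding would output I-ORG there, whereas a transition score of `NEG_INF` for `("START", "I-ORG")` makes the decoder choose B-ORG instead.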
Chinese word segmentation is then performed on the questions in the question-answer database to identify non-named entities: the Baidu LAC tool segments each question and tags parts of speech; pronouns, adjectives, and adverbs, which contribute nothing to the similarity calculation, are skipped, and the non-named-entity nouns and non-named-entity verbs are kept.
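The filtering rule can be sketched as a function over (word, part-of-speech) pairs. With the `LAC` Python package, such pairs would typically come from `zip(*LAC(mode="lac").run(question))`; the sketch below does not call LAC, and the tag subsets and the stop list are assumptions rather than the patent's configuration.

```python
from typing import List, Sequence, Tuple

# Assumed subsets of Baidu LAC's part-of-speech tags.
NOUN_TAGS = {"n", "nz"}   # common nouns, other proper nouns
VERB_TAGS = {"v", "vn"}   # verbs, verbal nouns
# Hypothetical stop list for very common verbs that carry little meaning.
STOP_WORDS = {"是", "有"}


def non_named_entities(pairs: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (word, tag) pairs into non-named-entity nouns and verbs.

    Pronouns (r), adjectives (a), adverbs (d), named-entity tags such as
    PER/LOC/ORG, and every other tag are skipped.
    """
    nouns: List[str] = []
    verbs: List[str] = []
    for word, tag in pairs:
        if word in STOP_WORDS:
            continue
        if tag in NOUN_TAGS:
            nouns.append(word)
        elif tag in VERB_TAGS:
            verbs.append(word)
    return nouns, verbs
```

For the example question about Apple's latest product, illustrative pairs such as `("发布", "v")` and `("产品", "n")` yield the verb 发布 (release) and the noun 产品 (product), while 苹果 tagged ORG is left to the named-entity step.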
Step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database. The following columns are added for each question: organization entities, person-name entities, place entities, non-named-entity nouns, and non-named-entity verbs. The named and non-named entities obtained in step 2 are stored in the corresponding columns; each element stores the entity name together with its word vector, and when a class contains several names, they are separated by commas.
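One possible layout for step 3, sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, and storing each column as a JSON list of `[name, vector]` pairs is a design choice that keeps the patent's comma-separated element list while remaining machine-readable.

```python
import json
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical English column names for the five fields added in step 3.
COLUMNS = ("org_entities", "person_entities", "place_entities", "nouns", "verbs")
Element = Tuple[str, List[float]]  # (entity name, entity word vector)


def create_table(conn: sqlite3.Connection) -> None:
    cols = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE qa (id INTEGER PRIMARY KEY, question TEXT, answer TEXT, {cols})")


def insert_question(
    conn: sqlite3.Connection, question: str, answer: str, fields: Dict[str, List[Element]]
) -> int:
    # Each column stores a JSON list of [name, vector] pairs, i.e. comma-separated elements.
    values = [json.dumps(fields.get(c, []), ensure_ascii=False) for c in COLUMNS]
    placeholders = ", ".join(["?"] * (2 + len(COLUMNS)))
    cur = conn.execute(
        f"INSERT INTO qa (question, answer, {', '.join(COLUMNS)}) VALUES ({placeholders})",
        [question, answer, *values],
    )
    return int(cur.lastrowid)


def load_column(conn: sqlite3.Connection, row_id: int, column: str) -> List[Element]:
    if column not in COLUMNS:  # column names are interpolated into SQL, so whitelist them
        raise ValueError(f"unknown column: {column}")
    row = conn.execute(f"SELECT {column} FROM qa WHERE id = ?", (row_id,)).fetchone()
    return [(name, vector) for name, vector in json.loads(row[0])]
```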
Step 4, calculating similarity: entity recognition and Chinese word segmentation are performed on the user's input question to obtain named entities and non-named entities; questions containing the corresponding entities are retrieved from the question-answer database as candidate questions; the similarity between the user's question and each candidate question is calculated with an improved similarity calculation method; and the answer to the candidate question with the highest similarity is returned. Specifically:
after named entity recognition is performed on the user's question, if named entities are present, the questions containing the corresponding named entities are retrieved from the question-answer database as candidate questions.
The similarity sim1(x, y) is calculated from the word vectors of the user's question and those of each candidate question; the candidate questions are sorted by similarity, and the answer to the candidate question with the highest similarity score is returned as the answer to the user's question.
If no named entity is present, the questions containing the corresponding non-named entities are retrieved from the question-answer database as candidate questions, and the similarity sim2(x, y) is calculated from the word vectors of the user's question and those of each candidate question.
The candidate questions are sorted by similarity, and the answer to the candidate question with the highest similarity score is returned as the answer to the user's question.
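Because the formulas for sim1 and sim2 are given only as images, the ranking step is sketched below with a plainly swapped-in measure: cosine similarity between the averaged word vectors of the two questions. This is not the improved similarity method of the invention; it only shows the sort-and-return-best control flow. The candidate format `(answer, word vectors)` is an assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_answer(
    query_vectors: Sequence[Vector],
    candidates: Sequence[Tuple[str, Sequence[Vector]]],
) -> Optional[str]:
    """candidates: (answer, word vectors of the candidate question) pairs."""
    if not query_vectors:
        return None
    q = mean_vector(query_vectors)
    scored = [(cosine(q, mean_vector(vecs)), answer) for answer, vecs in candidates if vecs]
    if not scored:
        return None
    # Highest similarity wins; on ties the earlier candidate is kept.
    return max(scored, key=lambda pair: pair[0])[1]
```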
Preferably, the question-answer platforms in step 1 are one or more of the following: Baidu Tieba, Baidu Zhidao, Sogou Wenwen, 360 Wenda, Sohu Wenda, and Zhihu.
Preferably, the effective score S of each answer record in step 1 is:
[Equation image not reproduced: effective score S in terms of d, n1 and n2]
where d is the number of days from the answer record to the most recent answer plus 1, n1 is the number of likes, and n2 is the number of comments.
Preferably, step 1 further comprises periodically crawling the question-answer platforms to update the question-answer database. For a newly crawled answer to a question that already exists in the database, its effective score is calculated: if it is higher than the effective score of the stored answer, the stored answer is replaced directly; if it is lower, the stored answer remains unchanged.
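The update rule translates directly into a comparison. A minimal sketch, assuming the database is a mapping from question to (answer, effective score); adding unseen questions and keeping the stored answer on a tie are assumptions, since the text covers only the higher and lower cases.

```python
from typing import Dict, Tuple

# question -> (stored answer, its effective score S)
QADatabase = Dict[str, Tuple[str, float]]


def apply_crawled_answer(db: QADatabase, question: str, answer: str, score: float) -> bool:
    """Apply one newly crawled answer; return True if the database changed."""
    current = db.get(question)
    if current is None:
        # Assumption: questions not yet in the database are simply added.
        db[question] = (answer, score)
        return True
    if score > current[1]:
        db[question] = (answer, score)  # higher score: replace the stored answer
        return True
    return False  # lower (or, by assumption, equal) score: keep the stored answer
```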
Preferably, the similarity sim1(x, y) is calculated as follows:
[Equation image not reproduced: similarity sim1(x, y)]
where $W_1, W_2, \ldots, W_a$ are the word vectors of the named entities, $N_1, N_2, \ldots, N_b$ the word vectors of the non-named-entity nouns, and $V_1, V_2, \ldots, V_c$ the word vectors of the non-named-entity verbs; a is the number of named entities, b the number of non-named-entity nouns, and c the number of non-named-entity verbs.
Preferably, the similarity sim2(x, y) is calculated as follows:
[Equation image not reproduced: similarity sim2(x, y)]
Further, in step 4, retrieving the questions containing the corresponding named entities from the question-answer database as candidate questions means finding questions in which the same word occurs with the same entity type.
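The candidate filter can be sketched as a set intersection over (word, entity type) pairs, so a shared surface word only matches when its type also matches. The type labels and example questions are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Entity = Tuple[str, str]  # (word, entity type)


def find_candidates(
    query_entities: Set[Entity],
    questions: Iterable[Tuple[str, Set[Entity]]],
) -> List[str]:
    """Return stored questions sharing at least one (word, type) pair with the query."""
    return [text for text, entities in questions if query_entities & entities]
```

For instance, a query containing `("苹果", "ORG")` matches a stored question tagged `("苹果", "ORG")` but not one in which 苹果 (apple, the fruit) is tagged as a common noun.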
Compared with the prior art, the technical solution has the following beneficial effects: named entity recognition and Chinese word segmentation are performed on the questions in the question-answer database to obtain word vectors for named and non-named entities; candidate questions are obtained according to the named or non-named entities of the user's question; and the similarity between the user input and each candidate is computed with an improved similarity calculation method, so that the user's question is matched accurately and the accuracy of answers in the question-answering system is improved.
Drawings
Fig. 1 is a flowchart of a method for constructing a question-answering system based on named entity recognition according to an embodiment of the present disclosure.
Fig. 2 is a flowchart of constructing a question-answer database according to an embodiment of the present disclosure.
Fig. 3 is a block diagram of entity recognition using a BERT-BiLSTM-CRF model according to an embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
An embodiment of the disclosure provides a method for constructing a question-answering system based on named entity recognition. Fig. 1 is the flowchart of this method; with reference to it, the method mainly comprises the following steps:
Step 1, constructing the question-answer database. Fig. 2 is the flowchart of question-answer database construction; with reference to it, the specific method is as follows:
a question-answer data source is obtained by crawling question-answer platforms with a web crawler as the data source of the question-answer database; preferably, one or more of the following platforms are used: Baidu Tieba, Baidu Zhidao, Sogou Wenwen, 360 Wenda, Sohu Wenda, and Zhihu.
After the web pages are crawled, a data cleaning operation removes useless data (pictures, links, advertisements, and the like) and extracts the question elements: the question, answer time, number of likes, and number of comments fields. An effective score S is calculated for each answer record from these elements. Because a question is often answered by several people, the same question appears in several records, so the effective score is calculated for each answer record.
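A cleaning pass of the kind described can be sketched with the standard library. The regular expressions and the advertisement markers (广告 "advertisement", 推广 "promotion") are assumptions for illustration; production crawlers usually parse each platform's HTML structure rather than relying on regexes.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")        # HTML tags, including <img> pictures and <a> links
_URL = re.compile(r"https?://\S+")   # bare links left in the text
# Hypothetical advertisement markers; a real crawler would tune these per platform.
_AD_MARKERS = ("广告", "推广")


def clean_text(raw: str) -> str:
    """Strip pictures, links and advertisement lines from crawled text."""
    text = _TAG.sub("", raw)
    text = _URL.sub("", text)
    text = html.unescape(text)
    lines = (line.strip() for line in text.splitlines())
    kept = [ln for ln in lines if ln and not any(m in ln for m in _AD_MARKERS)]
    return " ".join(kept)
```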
Preferably, the effective score S of each answer record is:
[Equation image not reproduced: effective score S in terms of d, n1 and n2]
where d = (number of days from this answer record to the most recent answer) + 1, n1 is the number of likes, and n2 is the number of comments. When the answer record is the only answer to a question or its most recent answer, d = 1; adding 1 guarantees d ≠ 0, so the denominator can never be 0.
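The definition of d reduces to date arithmetic. A minimal sketch, assuming "the most recent answer" means the newest answer to the same question and that answer times are available as calendar dates (both are interpretations); it computes only d, since the formula for S itself is not reproduced here.

```python
from datetime import date


def days_term(answer_date: date, latest_answer_date: date) -> int:
    """d = (days from this answer to the most recent answer) + 1, so d >= 1."""
    d = (latest_answer_date - answer_date).days + 1
    if d < 1:
        raise ValueError("answer_date is later than latest_answer_date")
    return d
```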
According to the effective score S, only the answer record with the highest effective score is kept for each question and stored in the question-answer database.
Preferably, step 1 further comprises periodically crawling the question-answer platforms to update the question-answer database. For a newly crawled answer to a question that already exists in the database, its effective score is calculated: if it is higher than the effective score of the stored answer, the stored answer is replaced directly; if it is lower, the stored answer remains unchanged.
Step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database. Named entity recognition means identifying entities with specific meanings in text, including person names, place names, and organization names. A BERT-BiLSTM-CRF model is used for named entity recognition on the questions in the question-answer database: BERT generates word-vector semantic representations of the input, which are passed to the BiLSTM-CRF model. Fig. 3 is a block diagram of entity recognition with the BERT-BiLSTM-CRF model; with reference to it, the method is as follows:
(1) The question is processed by the bidirectional Transformer encoder of the BERT pre-trained language model, which serves as the embedding layer and yields a vector representation of each word as the input to the downstream BiLSTM-CRF task.
(2) The word vectors produced by BERT are fed into the BiLSTM model, which processes the input sequence in the forward and reverse directions simultaneously. The forward hidden vector $\overrightarrow{h_t}$ and the reverse hidden vector $\overleftarrow{h_t}$ output at time $t$ are concatenated to give the sentence representation at time $t$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. By learning the relations between text contexts in both directions, this effectively overcomes the limitation that a unidirectional recurrent neural network can derive context from one direction only.
(3) The output of the BiLSTM layer is used as the input sequence of the CRF, $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ denotes a word vector and $n$ the number of input word vectors. By learning the constraints between labels, the CRF improves the accuracy of label prediction and yields the final predicted label sequence, which assigns a tag to every position of the input question.
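The bidirectional pass in step (2) can be illustrated without a deep-learning framework. The sketch below is a simplification: a toy tanh recurrent cell (`tanh_cell`, hypothetical) stands in for the LSTM cell, and it only shows how $h_t$ is assembled from a left-to-right pass and a right-to-left pass. In practice a framework layer such as PyTorch's `nn.LSTM(..., bidirectional=True)` performs both passes and returns the two directions already concatenated.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Cell = Callable[[Vector, Vector], Vector]  # (x_t, h_prev) -> h_t


def tanh_cell(w_in: float, w_rec: float) -> Cell:
    """Toy element-wise cell h_t = tanh(w_in * x_t + w_rec * h_{t-1}).

    A stand-in for an LSTM cell, used only to show the data flow.
    """
    def step(x: Vector, h: Vector) -> Vector:
        return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
    return step


def bidirectional(xs: Sequence[Vector], fwd: Cell, bwd: Cell) -> List[Vector]:
    """Return h_t = [forward h_t ; backward h_t] for every position t."""
    dim = len(xs[0])
    forward: List[Vector] = []
    h = [0.0] * dim
    for x in xs:  # left to right
        h = fwd(x, h)
        forward.append(h)
    backward: List[Vector] = [[] for _ in xs]
    h = [0.0] * dim
    for t in range(len(xs) - 1, -1, -1):  # right to left
        h = bwd(xs[t], h)
        backward[t] = h
    # Concatenate the two directions at each time step.
    return [f + b for f, b in zip(forward, backward)]
```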
For example, if the question input in step (1) is 苹果发布的最新产品是什么 ("What is the latest product released by Apple?"), the output of step (3) labels 苹 as B-ORG, 果 as I-ORG, and each of the remaining characters as O.
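Turning the per-character tags of the example into entities is a simple grouping step. The sketch below uses one common BIO convention (an I- tag that does not continue an entity of the same type is ignored); the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def bio_to_entities(chars: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group per-character BIO tags into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    buf: List[str] = []
    etype = ""

    def flush() -> None:
        if buf:
            entities.append(("".join(buf), etype))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            buf, etype = [ch], tag[2:]
        elif tag.startswith("I-") and buf and tag[2:] == etype:
            buf.append(ch)
        else:  # O, or an I- tag that does not continue the current entity
            flush()
            buf, etype = [], ""
    flush()
    return entities
```

`bio_to_entities(list("苹果发布的最新产品是什么"), ["B-ORG", "I-ORG"] + ["O"] * 10)` returns `[("苹果", "ORG")]`.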
Chinese word segmentation is then performed on the questions in the question-answer database to identify non-named entities: the Baidu LAC tool segments each question and tags parts of speech; pronouns, adjectives, and adverbs, which contribute nothing to the similarity calculation, are skipped, and the non-named-entity nouns and verbs are kept. For example, segmenting "What is the latest product released by Apple?" yields the non-named-entity verb "release" and noun "product".
Step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database. The following columns are added for each question: organization entities, person-name entities, place entities, non-named-entity nouns, and non-named-entity verbs. The named and non-named entities obtained in step 2 are stored in the corresponding columns; each element stores the entity name and its word vector, and when a class contains several names they are separated by commas, e.g. ((entity 1, word vector 1), (entity 2, word vector 2), …).
Step 4, calculating similarity: entity recognition and Chinese word segmentation are performed on the user's input question to obtain named and non-named entities; questions containing the corresponding entities are retrieved from the question-answer database as candidate questions; the similarity between the user's question and each candidate question is calculated with an improved similarity calculation method; and the answer to the candidate question with the highest similarity is returned. Specifically:
after named entity recognition is performed on the user's question, if named entities (organization, person name, place name) are present, the questions containing the corresponding named entities are retrieved from the question-answer database as candidate questions, and the similarity sim1(x, y) is calculated from the word vectors of the user's question and those of each candidate question:
[Equation image not reproduced: similarity sim1(x, y)]
where $W_1, W_2, \ldots, W_a$ are the word vectors of the named entities, $N_1, N_2, \ldots, N_b$ the word vectors of the non-named-entity nouns, and $V_1, V_2, \ldots, V_c$ the word vectors of the non-named-entity verbs; a is the number of named entities, b the number of non-named-entity nouns, and c the number of non-named-entity verbs.
The candidate questions are sorted by similarity, and the answer to the candidate question with the highest similarity score is returned as the answer to the user's question.
If no named entity is present, the questions containing the corresponding non-named entities (nouns and verbs) are retrieved from the question-answer database as candidate questions, and the similarity sim2(x, y) is calculated from the word vectors of the user's question and those of each candidate question:
[Equation image not reproduced: similarity sim2(x, y)]
The candidate questions are sorted by similarity, and the answer to the candidate question with the highest similarity score is returned as the answer to the user's question.
Preferably, in step 4, retrieving the questions containing the corresponding named entity from the question-answer database as candidate questions means finding questions in which the same word occurs with the same entity type. For example, "apple" occurs both in a question about apples as a fruit and in "What did Apple release?", but in the former it is a non-named-entity noun, while in the latter it is a named entity of the organization type. Since the two entity types clearly differ, the former is not a question containing the corresponding named entity.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above: any insubstantial modification made according to the principles and technical solutions of the invention, or any direct application of its concept and technical solutions to other occasions without improvement or with equivalent replacement, falls within the protection scope of the invention.

Claims (7)

1. A question-answering system construction method based on named entity recognition, characterized by comprising the following steps:
step 1, constructing a question-answer database:
obtaining a question-answer data source by using a web crawler to crawl question-answer platforms as the data source of the question-answer database;
after the web pages are crawled, performing a data cleaning operation to remove useless data and obtain the question elements: the question, answer time, number of likes, and number of comments fields; calculating the effective score S of each answer record from the question elements; and, according to the effective score S, keeping only the answer record with the highest effective score for each question and storing it in the question-answer database;
step 2, performing named entity recognition and non-named entity recognition on the questions in the question-answer database, wherein named entity recognition means identifying entities with specific meanings in text, including person names, place names, and organization names; a BERT-BiLSTM-CRF model is used to perform named entity recognition on the questions in the question-answer database, BERT generating word-vector semantic representations of the input that are passed to the BiLSTM-CRF model;
the method of entity recognition with the BERT-BiLSTM-CRF model comprises:
(1) processing the question with the bidirectional Transformer encoder of the BERT pre-trained language model, which serves as the embedding layer, to obtain a vector representation of each word as the input to the downstream BiLSTM-CRF task;
(2) using the word vectors obtained by BERT as the input of the BiLSTM model, which processes the input sequence in the forward and reverse directions simultaneously, and concatenating the forward hidden vector $\overrightarrow{h_t}$ and the reverse hidden vector $\overleftarrow{h_t}$ output at time $t$ to obtain the sentence representation at time $t$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$, thereby learning the associations between text contexts in both the forward and reverse directions;
(3) using the output of the BiLSTM layer as the input sequence of the CRF, $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ denotes a word vector and $n$ the number of input word vectors, and learning the constraints between labels to improve the accuracy of label prediction and obtain the final predicted label sequence, which tags every position of the input question;
performing Chinese word segmentation on the questions in the question-answer database to identify non-named entities: using the Baidu LAC tool to segment the questions and tag parts of speech, skipping pronouns, adjectives, and adverbs, which contribute nothing to the similarity calculation, and keeping the non-named-entity nouns and non-named-entity verbs;
step 3, storing the recognition results of step 2 in the corresponding fields of the question-answer database by adding the following columns for each question: organization entities, person-name entities, place entities, non-named-entity nouns, and non-named-entity verbs; storing the named and non-named entities obtained in step 2 in the corresponding columns, wherein each element stores the entity name and its word vector, and several names in the same class are separated by commas;
step 4, calculating similarity, performing entity recognition on the user input questions, dividing Chinese words into words to obtain named entities and non-named entities, finding corresponding entity questions from the question-answer database as candidate questions, calculating the similarity between the user input questions and the candidate questions through an improved similarity calculation method, and returning answers of the candidate questions with the highest similarity; the method specifically comprises the following steps:
after named entity recognition is performed on the user's input question, if named entities exist, retrieving the questions containing those named entities from the question-answer database as candidate questions,
calculating the similarity sim1(x, y) from the word vectors of the user's input question and of each candidate question; sorting the similarity values of the candidate questions, and returning the answer of the candidate question with the highest similarity score as the answer to the user's input question;
if no named entity exists, retrieving the questions containing the corresponding non-named entities from the question-answer database as candidate questions, and calculating the similarity sim2(x, y) from the word vectors of the user's input question and of each candidate question;
sorting the similarity values of the candidate questions, and returning the answer of the candidate question with the highest similarity score as the answer to the user's input question.
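The control flow of step 4 can be sketched as follows, using the candidate-matching rule of claim 7 (same word, same entity type). The sim1 and sim2 formulas appear only as images, so this sketch does not implement them. It ranks candidates with a plainly substituted stand-in: the cosine similarity of mean word vectors. Entry, answer_question and stand_in_similarity are hypothetical names. The rule that the non-named branch compares only noun and verb vectors is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Terms = Dict[Tuple[str, str], Vector]  # (word, type) -> word vector


@dataclass
class Entry:
    answer: str
    named: Terms     # e.g. ("南京", "LOC") -> vector
    nonnamed: Terms  # e.g. ("社保", "noun") -> vector


def mean_vector(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def stand_in_similarity(q: Terms, c: Terms) -> float:
    # Placeholder only: NOT the patent's sim1/sim2 formulas.
    if not q or not c:
        return 0.0
    return cosine(mean_vector(list(q.values())), mean_vector(list(c.values())))


def answer_question(q_named: Terms, q_nonnamed: Terms,
                    db: List[Entry]) -> Optional[str]:
    use_named = bool(q_named)
    # Candidates share at least one (word, type) pair with the query:
    # named entities if the query has any, otherwise non-named nouns/verbs.
    key_terms = q_named if use_named else q_nonnamed
    candidates = [e for e in db
                  if set(key_terms) & set(e.named if use_named else e.nonnamed)]
    if not candidates:
        return None

    def terms(named: Terms, nonnamed: Terms) -> Terms:
        # sim1 involves W, N and V vectors; the sim2 branch is assumed to use N and V only.
        return {**named, **nonnamed} if use_named else dict(nonnamed)

    q_terms = terms(q_named, q_nonnamed)
    ranked = sorted(candidates,
                    key=lambda e: stand_in_similarity(q_terms, terms(e.named, e.nonnamed)),
                    reverse=True)
    return ranked[0].answer
```

Sorting and taking the first element mirrors the claim's wording; max() would return the same answer.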
2. The method for constructing the question-answering system based on named entity recognition according to claim 1, wherein the question-answer platform in step 1 is one or more of the following: Baidu Tieba, Baidu Zhidao, Soso Wenwen, 360 Wenda, Sohu Wenda and Zhihu.
3. The method for constructing a question-answering system based on named entity recognition according to claim 1, wherein the effective score of each answer record in step 1 is as follows:
[formula image not reproduced in the text]
where d is the number of days from the answer record to the latest answer, plus 1; n1 is the number of likes; and n2 is the number of comments.
4. The method for constructing the question-answering system based on named entity recognition according to any one of claims 1 to 3, wherein step 1 further comprises periodically crawling the question-answer platforms to update the question-answer database: for a question already in the database, the effective score of each newly added answer is calculated; if it is higher than the effective score of the stored answer, the stored answer is replaced directly; if it is lower, the stored answer is left unchanged.
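The update rule of claim 4 reduces to a compare-and-replace per question. In this sketch the effective scores are taken as inputs because the claim-3 formula is not reproduced in the text. merge_crawled is a hypothetical name. The claim does not address two cases, so both are handled by assumption: an equal score keeps the stored answer, and a previously unseen question is inserted.

```python
from typing import Dict, Iterable, Tuple

# question -> (stored answer, effective score of that answer)
Database = Dict[str, Tuple[str, float]]


def merge_crawled(db: Database,
                  crawled: Iterable[Tuple[str, str, float]]) -> Database:
    """Apply the claim-4 replacement rule to crawled (question, answer, score) records.

    Scores are assumed to be precomputed with the claim-3 effective-score formula.
    Returns a new dict; the input database is not modified.
    """
    merged = dict(db)
    for question, answer, score in crawled:
        current = merged.get(question)
        if current is None:
            # Assumption: unseen questions are simply added.
            merged[question] = (answer, score)
        elif score > current[1]:
            # Higher effective score: replace the stored answer directly.
            merged[question] = (answer, score)
        # Lower (or, by assumption, equal) score: keep the stored answer.
    return merged
```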
5. The method for constructing the question-answering system based on named entity recognition according to claim 4, wherein the similarity sim1(x, y) is calculated as follows:
[formula image not reproduced in the text]
wherein W1, W2, …, Wa denote the word vectors of the named entities, N1, N2, …, Nb denote the word vectors of the non-named-entity nouns, V1, V2, …, Vc denote the word vectors of the non-named-entity verbs, a denotes the number of named entities, b denotes the number of non-named-entity nouns, and c denotes the number of non-named-entity verbs.
6. The method for constructing the question-answering system based on named entity recognition according to claim 4, wherein the similarity sim2(x, y) is calculated as follows:
[formula image not reproduced in the text]
7. The method for constructing the question-answering system based on named entity recognition according to claim 5 or 6, wherein, in step 4, retrieving the questions containing the corresponding named entity from the question-answer database as candidate questions means finding questions that contain the same word with the same entity type.
CN202111276164.3A 2021-10-29 2021-10-29 Question-answering system construction method based on named entity recognition Withdrawn CN113901824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111276164.3A CN113901824A (en) 2021-10-29 2021-10-29 Question-answering system construction method based on named entity recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111276164.3A CN113901824A (en) 2021-10-29 2021-10-29 Question-answering system construction method based on named entity recognition

Publications (1)

Publication Number Publication Date
CN113901824A true CN113901824A (en) 2022-01-07

Family

ID=79027156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111276164.3A Withdrawn CN113901824A (en) 2021-10-29 2021-10-29 Question-answering system construction method based on named entity recognition

Country Status (1)

Country Link
CN (1) CN113901824A (en)

Similar Documents

Publication Publication Date Title
US5671333A (en) Training apparatus and method
US6728695B1 (en) Method and apparatus for making predictions about entities represented in documents
CN113094578B (en) Deep learning-based content recommendation method, device, equipment and storage medium
CN113505586A (en) Seat-assisted question-answering method and system integrating semantic classification and knowledge graph
CN111353306A (en) Entity relationship and dependency Tree-LSTM-based combined event extraction method
CN111274829A (en) Sequence labeling method using cross-language information
CN112632258A (en) Text data processing method and device, computer equipment and storage medium
CN116077942A (en) Method for realizing interactive content recommendation
CN112711666B (en) Futures label extraction method and device
JP6924975B2 (en) Sound analyzer and its processing method, program
CN112579666A (en) Intelligent question-answering system and method and related equipment
CN115017271B (en) Method and system for intelligently generating RPA flow component block
Van Enschot et al. Taming our wild data: On intercoder reliability in discourse research
CN113901824A (en) Question-answering system construction method based on named entity recognition
CN111798217B (en) Data analysis system and method
CN114090777A (en) Text data processing method and device
CN114328902A (en) Text labeling model construction method and device
CN116775813B (en) Service searching method, device, electronic equipment and readable storage medium
CN111209404B (en) Method for generating similar question sentences based on deep learning assistance
CN114398492B (en) Knowledge graph construction method, terminal and medium in digital field
CN116340481B (en) Method and device for automatically replying to question, computer readable storage medium and terminal
CN110008307B (en) Method and device for identifying deformed entity based on rules and statistical learning
Brajković et al. Evaluating Text Summarization Using FAHP and TOPSIS Methods in Intelligent Tutoring Systems
CN118070784A (en) Method, device, equipment and storage medium for constructing entity dictionary in vertical industrial field
CN111931481A (en) Text emotion recognition method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220107