CN113282719A - Construction method of labeled data set, intelligent terminal and storage medium - Google Patents

Construction method of labeled data set, intelligent terminal and storage medium

Info

Publication number
CN113282719A
CN113282719A
Authority
CN
China
Prior art keywords
data set
article
answer
question
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010100949.4A
Other languages
Chinese (zh)
Inventor
张高升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202010100949.4A priority Critical patent/CN113282719A/en
Publication of CN113282719A publication Critical patent/CN113282719A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/38 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/382 - Retrieval using citations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/951 - Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a construction method of a labeled data set, an intelligent terminal and a storage medium, wherein the method comprises the following steps: obtaining an article data set, wherein the article data set comprises a plurality of articles; generating answers for the articles of the article data set by adopting a sequence labeling model; and generating a question corresponding to each answer by adopting a deep learning generative model, constructing combinations of corresponding articles, questions and answers, and generating the labeled data set. By constructing combinations of corresponding articles, questions and answers, the method automatically constructs the labeled data set required in deep learning, saving both time cost and economic cost.

Description

Construction method of labeled data set, intelligent terminal and storage medium
Technical Field
The invention relates to the technical field of computer data processing, in particular to a construction method of a labeled data set, an intelligent terminal and a storage medium.
Background
Machine reading comprehension means that, given a context passage and a corresponding query, a machine reads the context and returns an answer to the query. An assumption is made here that the answer to the query must be a span (a contiguous sequence of words) that can be found in the context text, i.e., the goal of the final model's prediction is to output two indices, corresponding to the start and end positions of the answer within the context text. The loss function of the final model is the multi-class softmax cross-entropy, since the problem is essentially equivalent to multi-class classification: the number of classes equals the number of words in the context, i.e., each word may be the start (or end) of the answer.
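As a concrete illustration of this span-prediction loss (a minimal sketch, not code from the patent; all tensor shapes and names are assumptions), the start and end indices are each treated as a classification over the context tokens:

```python
import torch
import torch.nn.functional as F

# Hypothetical model outputs: one logit per context token for the start
# index, and one per token for the end index (batch of 2, context of 8 tokens).
start_logits = torch.randn(2, 8)
end_logits = torch.randn(2, 8)

# Gold answer spans: token positions where each answer starts and ends.
gold_start = torch.tensor([1, 4])
gold_end = torch.tensor([3, 5])

# Each index prediction is a multi-class problem over the context tokens,
# so the loss is the mean of two softmax cross-entropies.
loss = (F.cross_entropy(start_logits, gold_start)
        + F.cross_entropy(end_logits, gold_end)) / 2
print(loss.item())
```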
Machine Reading Comprehension (MRC) is a task that requires a machine to answer questions about a given context, in order to test the extent to which the machine understands natural language. Constructing an MRC model based on a deep neural network requires a large amount of annotation data. Formally, a piece of annotation data is a combination comprising an article, a question and an answer, and an annotation data set is a collection containing many such pieces. Existing annotation data are constructed by manual labeling, so the time cost and the economic cost are very high.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The invention mainly aims to provide a construction method of a labeled data set, an intelligent terminal and a storage medium, so as to solve the problem that in the prior art labeled data are constructed by manual labeling, which incurs high time and economic costs.
In order to achieve the above object, the present invention provides a method for constructing an annotated data set, which comprises the following steps:
obtaining an article data set, wherein the article data set comprises a plurality of articles;
generating answers for the articles of the article data set by adopting a sequence labeling model; and
generating a question corresponding to each answer by adopting a deep learning generative model, constructing combinations of corresponding articles, questions and answers, and generating the labeled data set.
Optionally, in the method for constructing the labeled data set, the manner of obtaining the article data set comprises at least one of the following: obtaining the article data set from a network by means of a web crawler; obtaining the article data set by querying a service database according to a specific rule; obtaining the article data set from a data set published and licensed on the network; and obtaining the article data set through authorization acquired from a third party.
Optionally, in the method for constructing the labeled data set, the step of generating answers for the articles of the article data set by adopting the sequence labeling model includes:
defining a label of the sequence label, wherein the label is used for representing name information in the article;
selecting information of a person name, a place name and an organization name in the article as the answer according to the label;
and inputting each article into the sequence labeling model, the sequence labeling model outputting the candidate answer set corresponding to each article.
Optionally, in the method for constructing the labeled data set, the label includes: at least one of a beginning part of a person name, a middle part of a person name, a beginning part of a place name, a middle part of a place name, a beginning part of an organization, a middle part of an organization, and non-entity information.
Optionally, in the method for constructing the labeled data set, the step of generating the question corresponding to the answer by adopting the deep learning generative model includes:
in the training stage of the deep learning generative model, feeding the word embeddings of the input article and the answer in sequence to the encoder part, and taking the word embeddings of the question as the output part of the decoder;
the loss function is defined as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end})$;
representing the probability that the question, as a piece of text, appears in a language model given the article and the answer;
wherein $\{q_1, \ldots, q_n\}$ is the text sequence of the question, $a_{start}$ and $a_{end}$ denote the start and end positions of the answer, and $p = p_1 \ldots p_n$ is the text sequence of the article;
modeling as a language model, wherein the language model is used for calculating the probability of a sentence, and the probability of a section of text is represented by the product of the probabilities of each word in the text;
the formula is described as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end}) = \prod_{i=1}^{n} P(q_i \mid q_1, \ldots, q_{i-1}, p_1 \ldots p_n, a_{start}, a_{end})$
in the prediction stage of the deep learning generative model, taking the output part of the decoder as the generated question.
Optionally, in the method for constructing the labeled data set, the step of constructing combinations of corresponding articles, questions and answers and generating the labeled data set includes:
constructing the set of articles, questions and answers as S = {(p1, q1, a1), (p2, q2, a2), (p3, q3, a3), ...};
wherein S represents the labeled data set, each element is a tuple (p, q, a), p represents an article, q represents a question, a represents an answer, and the article, the question and the answer in one tuple correspond to each other.
Optionally, in the method for constructing a labeled data set, a correspondence between the answer and the article is determined according to input and output of the sequence labeling model;
and the corresponding relation between the question and the answer is determined according to the input and the output of the deep learning generation model.
Optionally, in the method for constructing the labeled data set, the deep learning generative model includes an encoder and a decoder.
In addition, to achieve the above object, the present invention further provides an intelligent terminal, wherein the intelligent terminal includes: a memory, a processor and a construction program of an annotation data set stored on the memory and executable on the processor, the construction program of the annotation data set implementing the steps of the construction method of the annotation data set as described above when executed by the processor.
In order to achieve the above object, the present invention further provides a storage medium, wherein the storage medium stores a construction program of an annotation data set, and the construction program of the annotation data set realizes the steps of the construction method of the annotation data set as described above when executed by a processor.
The method thus obtains an article data set comprising a plurality of articles; generates answers for the articles of the article data set by adopting a sequence labeling model; and generates a question corresponding to each answer by adopting a deep learning generative model, constructs combinations of corresponding articles, questions and answers, and generates the labeled data set. By constructing combinations of corresponding articles, questions and answers, the method automatically constructs the labeled data set required in deep learning, saving both time cost and economic cost.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the construction method of the annotation data set of the present invention;
FIG. 2 is a schematic diagram of a sequence annotation model in a preferred embodiment of the construction method of the annotation data set of the present invention;
FIG. 3 is a schematic diagram of a sequence annotation model for prediction according to a preferred embodiment of the method for constructing an annotated data set of the present invention;
FIG. 4 is a schematic diagram of a deep learning generative model in a preferred embodiment of the construction method of an annotation data set of the present invention;
FIG. 5 is a schematic diagram of the operating environment of an intelligent terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
As shown in fig. 1, the method for constructing an annotated data set according to the preferred embodiment of the present invention includes the following steps:
and step S10, acquiring an article data set, wherein the article data set comprises a plurality of articles.
Specifically, an article data set is obtained, where an article generally refers to a passage of text, such as a reading-comprehension passage. The article data can be acquired with different techniques according to specific service requirements. For example, a web crawler (also known as a web spider or web robot: a program or script that automatically captures web information according to certain rules) can be used to obtain information from the web; the data can be obtained by querying a business database according to a certain rule; a data set that is published and licensed on the network can be used; or the data can be obtained through authorization acquired from a third party. A minimal crawler sketch is given below.
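As an illustration only (the patent does not prescribe a particular crawler; the URL and the CSS selector below are hypothetical), a crawler for collecting articles could look like this:

```python
import requests
from bs4 import BeautifulSoup

def crawl_articles(index_url):
    """Fetch an index page and return the body text of each linked article."""
    articles = []
    resp = requests.get(index_url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Hypothetical selector: assumes each article link carries the class
    # "article-link" and an absolute URL in its href attribute.
    for link in soup.select("a.article-link"):
        page = requests.get(link["href"], timeout=10)
        page_soup = BeautifulSoup(page.text, "html.parser")
        # Keep only paragraph text as the article body.
        body = " ".join(p.get_text(strip=True) for p in page_soup.find_all("p"))
        if body:
            articles.append(body)
    return articles

# P = {p1, p2, p3, ...} in the notation used below.
P = crawl_articles("https://example.com/articles")
```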
Further, the obtained article data set is P = {p1, p2, p3, ...}, where P denotes the article data set, and p1, p2, p3, etc. each represent an article.
Step S20, generating answers for the articles of the article data set by adopting a sequence labeling model.
Specifically, all articles are sequentially taken out of the article data set P, and each article is processed in the same way: a sequence labeling model (sequence labeling refers to a type of artificial-intelligence task, such as judging the part of speech of each word in a sentence) is used to produce the candidate answer set of an article; the input is an article p, and the output is a candidate answer set A = {a1, a2, a3, ...}, where a1, a2, a3, etc. each represent a candidate answer.
Information about person names, place names and organization names in the article is selected as the answers; the sequence labeling model adopted is shown in fig. 2.
First, the labels of the sequence labeling task are defined, where a label is used to represent name information (for example, person-name, place-name and organization-name information, so that such information in the article can be selected as answers according to the labels). According to the business requirements, the following labels are defined:
B-Person (beginning part of Person name);
I-Person (middle part of the name of a Person);
B-Place (beginning of Place name);
I-Place (middle part of Place name);
B-Organization (beginning part of the Organization);
I-Organization (middle part of the Organization);
O (non-entity information);
namely, the tag includes: at least one of a beginning part of a person name, a middle part of a person name, a beginning part of a place name, a middle part of a place name, a beginning part of an organization, a middle part of an organization, and non-entity information.
As shown in fig. 2, the input of the BiLSTM-CRF model is a sequence of word embedding vectors (a word embedding maps a word to a vector in a high-dimensional space), such as w1, w2 and w3 in fig. 2, and the output is the predicted label for each word. (BiLSTM, Bi-directional Long Short-Term Memory network, combines a forward LSTM and a backward LSTM; it composes word representations into a sentence representation and captures longer-distance dependencies better than a unidirectional model. CRF denotes a conditional random field. Word embedding vectors correspond to words; the input in fig. 2 is preprocessed to convert each word into its embedding vector.) Word embeddings are usually trained in advance; otherwise they are initialized randomly, and all embeddings are adjusted as training iterates. A model sketch follows this paragraph.
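A minimal BiLSTM-CRF tagger sketch in PyTorch, assuming the third-party pytorch-crf package for the CRF layer (the seven-label scheme follows the description above; the dimensions and all other names are illustrative assumptions, not the patent's reference implementation):

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

LABELS = ["B-Person", "I-Person", "B-Place", "I-Place",
          "B-Organization", "I-Organization", "O"]

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # tuned during training
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2,
                              bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, len(LABELS))    # per-word label scores
        self.crf = CRF(len(LABELS), batch_first=True)     # learns label transitions

    def loss(self, tokens, tags):
        emissions = self.emit(self.bilstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags)                 # negative log-likelihood

    def predict(self, tokens):
        emissions = self.emit(self.bilstm(self.embed(tokens))[0])
        return self.crf.decode(emissions)                 # best-scoring label sequence

model = BiLSTMCRF(vocab_size=5000)
tokens = torch.randint(0, 5000, (1, 6))  # one toy sentence of six word ids
tags = torch.randint(0, len(LABELS), (1, 6))
print(model.loss(tokens, tags).item())
print(model.predict(tokens))
```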
The prediction process is shown in fig. 3: the BiLSTM layer outputs, for each word, a score for each label category. For example, for the word w0 the BiLSTM node might output 1.5 (B-Person), 0.9 (I-Person), 0.1 (B-Organization), 0.08 (I-Organization) and 0.05 (O). All the scores output by the BiLSTM layer are used as the input of the CRF layer, and the label sequence with the highest score is the final predicted result.
The CRF layer can add constraints to ensure that the final prediction result is valid. These constraints are learned automatically by the CRF layer from the training data. Possible constraints are: the label at the beginning of a sentence should be "B-" or "O", not "I-".
"B-label 1I-label 2I-label 3 …", in this mode, classes 1, 2, 3 should be the same entity class. For example, "B-Person I-Person" is correct, while "B-Person I-Organization" is incorrect. "O I-label" is erroneous, and the beginning of the named entity should be "B-" rather than "I-". With these useful constraints, the erroneous prediction sequences will be greatly reduced.
The CRF loss function consists of two parts: the score of the true path and the total score of all paths, where the score of the true path should be the highest among all paths. At prediction time, the output is the path with the highest score among all paths, i.e., the label sequence with the highest score is the final predicted result.
Finally, person-name, place-name and organization-name information is extracted from the article according to the prediction result of the sequence labeling model; this information forms the candidate answer set, as the sketch below shows.
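A sketch of this extraction step, converting a predicted label sequence into candidate answer spans (the function and example are illustrative assumptions):

```python
def extract_candidate_answers(words, labels):
    """Collect the text spans labeled as Person/Place/Organization entities."""
    answers, span = [], []
    for word, label in zip(words, labels):
        if label.startswith("B-"):      # a new entity starts; flush the old one
            if span:
                answers.append(" ".join(span))
            span = [word]
        elif label.startswith("I-") and span:
            span.append(word)           # continue the current entity
        else:
            if span:
                answers.append(" ".join(span))
            span = []
    if span:
        answers.append(" ".join(span))
    return answers

words = ["Marie", "Curie", "worked", "in", "Paris"]
labels = ["B-Person", "I-Person", "O", "O", "B-Place"]
print(extract_candidate_answers(words, labels))  # ['Marie Curie', 'Paris']
```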
Step S30, generating a question corresponding to each answer by adopting a deep learning generative model, constructing combinations of corresponding articles, questions and answers, and generating the labeled data set.
Specifically, a deep learning generative model is adopted to generate the question corresponding to each answer. The generative model is modeled as an encoder-decoder process, as shown in fig. 4, and an attention model is introduced.
In the training stage of the deep learning generative model, the word embeddings of the input article and the answer are fed in sequence to the encoder part, and the word embeddings of the question are taken as the output part of the decoder. The loss function is defined as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end})$;
representing the probability that the question, as a piece of text, appears in a language model given the article and the answer. Wherein:
$\{q_1, \ldots, q_n\}$ is the text sequence of the question;
$a_{start}$ and $a_{end}$ are the start position and end position of the answer;
$p = p_1 \ldots p_n$ is the text sequence of the article.
In order to make the loss function computable, it is modeled as a language model (i.e., a model for calculating the probability of a sentence, in other words for judging how likely a sentence is to be natural human language), and the probability of a piece of text is expressed as the product of the probabilities of each word in the text.
The formula is described as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end}) = \prod_{i=1}^{n} P(q_i \mid q_1, \ldots, q_{i-1}, p_1 \ldots p_n, a_{start}, a_{end})$
In the prediction stage of the deep learning generative model, the output part of the decoder is taken as the generated question. A sketch of this training setup follows.
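A compact encoder-decoder sketch of this objective in PyTorch (a GRU stands in for the unspecified encoder and decoder cells, and the attention module is omitted for brevity; all dimensions and names are illustrative assumptions). Teacher-forced cross-entropy over the question tokens is exactly the negative logarithm of the product formula above:

```python
import torch
import torch.nn as nn

V = 5000  # toy vocabulary shared by article, answer and question tokens

class QuestionGenerator(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(V, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, V)

    def forward(self, article_and_answer, question_in):
        # Encode the article tokens together with the answer-span tokens.
        _, state = self.encoder(self.embed(article_and_answer))
        # Teacher forcing: the decoder sees the gold question shifted right.
        dec_out, _ = self.decoder(self.embed(question_in), state)
        return self.out(dec_out)  # logits for each next question token

model = QuestionGenerator()
loss_fn = nn.CrossEntropyLoss()

article_and_answer = torch.randint(0, V, (1, 20))
question = torch.randint(0, V, (1, 8))
logits = model(article_and_answer, question[:, :-1])
# Summing -log P(q_i | q_1..q_{i-1}, p, a_start, a_end) over i gives the
# negative log of the product formula above.
loss = loss_fn(logits.reshape(-1, V), question[:, 1:].reshape(-1))
loss.backward()
print(loss.item())
```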
The set of articles, questions and answers is constructed as S = {(p1, q1, a1), (p2, q2, a2), (p3, q3, a3), ...};
where S represents the set (i.e., the labeled data set), and each element is a tuple (p, q, a): p represents an article, q represents a question, and a represents an answer. The article, question and answer within one tuple correspond to one another.
The correspondence between answers and articles is determined according to the input and output of the sequence labeling model; the correspondence between questions and answers is determined according to the input and output of the deep learning generative model, i.e., each question corresponds to an (article, answer) pair through that model's input and output. A sketch assembling the final data set follows.
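Putting the pieces together, an end-to-end assembly sketch (the two model stand-ins and the JSON layout are hypothetical; the patent does not mandate a serialization format):

```python
import json

def build_labeled_dataset(articles, answer_model, question_model):
    """Assemble S = {(p, q, a), ...} from the two trained models."""
    S = []
    for p in articles:
        for a in answer_model(p):      # candidate answers from sequence labeling
            q = question_model(p, a)   # generated question from the encoder-decoder
            S.append({"article": p, "question": q, "answer": a})
    return S

# Toy stand-ins for the two trained models.
answer_model = lambda p: ["Marie Curie"] if "Marie Curie" in p else []
question_model = lambda p, a: f"Who worked in Paris? (expected answer: {a})"

S = build_labeled_dataset(["Marie Curie worked in Paris."],
                          answer_model, question_model)
with open("labeled_dataset.json", "w", encoding="utf-8") as f:
    json.dump(S, f, ensure_ascii=False, indent=2)
print(S)
```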
Furthermore, the construction method of the labeled data set can expand the labeled data sets used for machine reading comprehension, and can automatically generate reading-comprehension questions and answers for teaching, bringing great convenience to users.
Further, as shown in fig. 5, based on the above construction method of the annotation data set, the present invention also provides an intelligent terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 5 shows only some of the components of the smart terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may be an internal storage unit of the intelligent terminal in some embodiments, such as a hard disk or a memory of the intelligent terminal. The memory 20 may also be an external storage device of the intelligent terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash memory card (Flash Card) provided on the intelligent terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used for storing application software installed in the intelligent terminal and various data, such as the program codes installed on the intelligent terminal, and may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a construction program 40 of the labeled data set, and the construction program 40 can be executed by the processor 10, so as to implement the construction method of the labeled data set in the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor or another data processing chip, used for running the program codes stored in the memory 20 or processing data, for example executing the construction method of the labeled data set.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the intelligent terminal and for displaying a visual user interface. The components 10-30 of the intelligent terminal communicate with each other via a system bus.
In one embodiment, the following steps are implemented when the processor 10 executes the construction program 40 for labeling data sets in the memory 20:
obtaining an article data set, wherein the article data set comprises a plurality of articles;
generating answers for the articles of the article data set by adopting a sequence labeling model; and
generating a question corresponding to each answer by adopting a deep learning generative model, constructing combinations of corresponding articles, questions and answers, and generating the labeled data set.
The manner of obtaining the article data set comprises at least one of the following: obtaining the article data set from a network by means of a web crawler; obtaining the article data set by querying a service database according to a specific rule; obtaining the article data set from a data set published and licensed on the network; and obtaining the article data set through authorization acquired from a third party.
The step of generating answers for the articles of the article data set by adopting the sequence labeling model includes:
defining a label of the sequence label, wherein the label is used for representing name information in the article;
selecting information of a person name, a place name and an organization name in the article as the answer according to the label;
and inputting each article into the sequence labeling model, the sequence labeling model outputting the candidate answer set corresponding to each article.
The label includes: a beginning part of a person name, a middle part of a person name, a beginning part of a place name, a middle part of a place name, a beginning part of an organization, a middle part of an organization, and non-entity information.
The step of generating the question corresponding to the answer by using the deep learning generative model comprises the following steps:
in the training stage of the deep learning generative model, feeding the word embeddings of the input article and the answer in sequence to the encoder part, and taking the word embeddings of the question as the output part of the decoder;
the loss function is defined as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end})$;
representing the probability that the question, as a piece of text, appears in a language model given the article and the answer;
wherein $\{q_1, \ldots, q_n\}$ is the text sequence of the question, $a_{start}$ and $a_{end}$ denote the start and end positions of the answer, and $p = p_1 \ldots p_n$ is the text sequence of the article;
modeling as a language model, wherein the language model is used for calculating the probability of a sentence, and the probability of a section of text is represented by the product of the probabilities of each word in the text;
the formula is described as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end}) = \prod_{i=1}^{n} P(q_i \mid q_1, \ldots, q_{i-1}, p_1 \ldots p_n, a_{start}, a_{end})$
in the prediction stage of the deep learning generative model, taking the output part of the decoder as the generated question.
The step of constructing combinations of corresponding articles, questions and answers and generating the labeled data set specifically includes:
constructing the set of articles, questions and answers as S = {(p1, q1, a1), (p2, q2, a2), (p3, q3, a3), ...};
wherein S represents the labeled data set, each element is a tuple (p, q, a), p represents an article, q represents a question, a represents an answer, and the article, the question and the answer in one tuple correspond to each other.
The corresponding relation between the answers and the articles is determined according to the input and the output of the sequence labeling model;
and the corresponding relation between the question and the answer is determined according to the input and the output of the deep learning generation model.
The generative model of deep learning includes an encoder and a decoder.
The present invention also provides a storage medium, wherein the storage medium stores a construction program of an annotated data set, and the construction program of the annotated data set realizes the steps of the construction method of the annotated data set as described above when executed by a processor.
In summary, the present invention provides a construction method of a labeled data set, an intelligent terminal and a storage medium, wherein the method comprises: obtaining an article data set, wherein the article data set comprises a plurality of articles; generating answers for the articles of the article data set by adopting a sequence labeling model; and generating a question corresponding to each answer by adopting a deep learning generative model, constructing combinations of corresponding articles, questions and answers, and generating the labeled data set. By constructing combinations of corresponding articles, questions and answers, the method automatically constructs the labeled data set required in deep learning, saving both time cost and economic cost.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware (such as a processor, a controller, etc.), and the program may be stored in a computer readable storage medium, and when executed, the program may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A construction method of an annotated data set is characterized by comprising the following steps:
obtaining an article data set, wherein the article data set comprises a plurality of articles;
generating answers for the articles of the article data set by adopting a sequence labeling model; and
generating a question corresponding to each answer by adopting a deep learning generative model, constructing combinations of corresponding articles, questions and answers, and generating the annotated data set.
2. The method for constructing an annotation data set according to claim 1, wherein the manner of obtaining the article data set comprises at least one of the following: obtaining the article data set from a network by means of a web crawler; obtaining the article data set by querying a service database according to a specific rule; obtaining the article data set from a data set published and licensed on the network; and obtaining the article data set through authorization acquired from a third party.
3. The method for constructing a labeled data set according to claim 1, wherein the step of generating answers to articles of the article data set by using a sequence labeling model comprises:
defining a label of the sequence label, wherein the label is used for representing name information in the article;
selecting information of a person name, a place name and an organization name in the article as the answer according to the label;
and inputting each article into the sequence labeling model, the sequence labeling model outputting the candidate answer set corresponding to each article.
4. The construction method of an annotation data set according to claim 3, wherein the tag comprises: at least one of a beginning part of a person name, a middle part of a person name, a beginning part of a place name, a middle part of a place name, a beginning part of an organization, a middle part of an organization, and non-entity information.
5. The method for constructing a labeled data set according to claim 4, wherein the step of generating the question corresponding to the answer by using the deep learning generative model comprises:
in the training stage of the deep learning generative model, feeding the word embeddings of the input article and the answer in sequence to the encoder part, and taking the word embeddings of the question as the output part of the decoder;
the loss function is defined as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end})$;
representing the probability that the question, as a piece of text, appears in a language model given the article and the answer;
wherein $\{q_1, \ldots, q_n\}$ is the text sequence of the question, $a_{start}$ and $a_{end}$ denote the start and end positions of the answer, and $p = p_1 \ldots p_n$ is the text sequence of the article;
modeling as a language model, wherein the language model is used for calculating the probability of a sentence, and the probability of a section of text is represented by the product of the probabilities of each word in the text;
the formula is described as:
$P(q_1, \ldots, q_n \mid p_1 \ldots p_n, a_{start}, a_{end}) = \prod_{i=1}^{n} P(q_i \mid q_1, \ldots, q_{i-1}, p_1 \ldots p_n, a_{start}, a_{end})$
in the prediction stage of the deep learning generative model, taking the output part of the decoder as the generated question.
6. The method for constructing a labeled data set according to claim 5, wherein the step of constructing a corresponding combination of articles, questions and answers and generating the labeled data set comprises:
constructing the set of articles, questions and answers as S = {(p1, q1, a1), (p2, q2, a2), (p3, q3, a3), ...};
wherein S represents the labeled data set, each element is a tuple (p, q, a), p represents an article, q represents a question, a represents an answer, and the article, the question and the answer in one tuple correspond to each other.
7. The method for constructing a labeled data set according to claim 1 or 6, wherein the correspondence between the answers and the articles is determined according to the input and output of the sequence labeling model;
and the corresponding relation between the question and the answer is determined according to the input and the output of the deep learning generation model.
8. The method of constructing an annotation data set according to claim 1 or 5, wherein the generative model of deep learning comprises an encoder and a decoder.
9. An intelligent terminal, characterized in that, intelligent terminal includes: memory, processor and a construction program of an annotation data set stored on the memory and executable on the processor, which when executed by the processor implements the steps of the construction method of an annotation data set according to any one of claims 1 to 8.
10. A storage medium characterized by storing a construction program of an annotation data set, which when executed by a processor implements the steps of the construction method of an annotation data set according to any one of claims 1 to 8.
CN202010100949.4A 2020-02-19 2020-02-19 Construction method of labeled data set, intelligent terminal and storage medium Pending CN113282719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010100949.4A CN113282719A (en) 2020-02-19 2020-02-19 Construction method of labeled data set, intelligent terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010100949.4A CN113282719A (en) 2020-02-19 2020-02-19 Construction method of labeled data set, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
CN113282719A (en) 2021-08-20

Family

ID=77274886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010100949.4A Pending CN113282719A (en) 2020-02-19 2020-02-19 Construction method of labeled data set, intelligent terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113282719A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232441A1 (en) * 2015-02-05 2016-08-11 International Business Machines Corporation Scoring type coercion for question answering
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method
CN110334184A (en) * 2019-07-04 2019-10-15 河海大学常州校区 The intelligent Answer System understood is read based on machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVID GOLUB: "Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension", arXiv *
孙孙 (Sun Sun): "The most intuitive introduction to the CRF layer in the BiLSTM-CRF model", Zhihu *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820