CN110569490A

CN110569490A - Method for constructing domain entity labeling corpus based on entity iteration

Info

Publication number: CN110569490A
Application number: CN201910665738.2A
Authority: CN
Inventors: 肖清林
Original assignee: Fujian Singularity Space-Time Digital Technology Co Ltd
Current assignee: Xiamen Useear Information Technology Co ltd
Priority date: 2019-07-23
Filing date: 2019-07-23
Publication date: 2019-12-13

Abstract

A method for constructing a domain entity annotation corpus based on entity iteration, characterized in that data information is collected from each terminal and a cloud server; the collected information is screened and classified; the screened information is sent to a network; information about the netizens is read and recorded; and the corpus is refined and classified to complete its construction. The invention sorts the large volume of information collected on the terminals and the cloud server, screens out and classifies the most frequently occurring and most frequently searched sentences, entries and articles, and sends them to the network for netizens to view and evaluate. The evaluations are screened with comprehensive consideration of each netizen's age, education level, geographical location and other information, and are compared with the sorted sentences, entries and articles, which are modified accordingly. Through continuous feedback the corpus is finally constructed by category, so that corpus construction is accurate, practical and valuable.

Description

Method for constructing domain entity labeling corpus based on entity iteration

Technical Field

The invention relates to the field of corpora, and in particular to a method for constructing a domain entity annotation corpus based on entity iteration.

Background

Constructing a corpus allows enterprise work to be evaluated positively, but an effective construction method has long been lacking. Evaluation has instead relied on subordinates periodically reporting to their superiors. Such reporting is isolated, cannot incorporate the genuine opinions of the public, and cannot express them objectively, which makes corpus construction still more difficult.

To solve the above problems, the present application provides a method for constructing a domain entity annotation corpus based on entity iteration.

Disclosure of Invention

(I) Objects of the invention

In order to solve the technical problems described in the background, the invention provides a method for constructing a domain entity annotation corpus based on entity iteration.

(II) Technical scheme

In order to solve the above problems, the invention provides a method for constructing a domain entity annotation corpus based on entity iteration, which comprises the following steps:

S1, collecting data information from each terminal and cloud server;

S2, screening and classifying the collected information;

S3, sending the screened information to a network;

S4, reading and recording information about the netizens;

S5, refining and classifying to construct a corpus.

Preferably, in S1, the collected data information includes articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and institutional construction.

Preferably, in S2, the collected information is screened to remove useless information, the screened information is assigned a series of labels, and the information is classified according to those labels.

Preferably, in S3, the articles, keywords and sentences that occur most frequently after screening are sent to the network and pushed to mobile terminals for viewing.

Preferably, in S4, the netizen information includes age information, comments, education level information and geographical information, and the comments from each age group, from each education level and from each geographical location are collated and recorded.

Preferably, in S5, the comments and insights of people at each stage are integrated, and the articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and institutional construction are continuously corrected to construct a complete corpus, which is divided into keyword, bilingual and multilingual corpora.

The technical scheme of the invention has the following beneficial technical effects:

The large volume of information collected on the terminals and the cloud server is sorted, and the most frequently occurring and most frequently searched sentences, entries and articles are screened out and classified. These are then sent to the network for netizens to view and evaluate. The evaluations are screened with comprehensive consideration of each netizen's age, education level, geographical location and other information, compared with the sorted sentences, entries and articles, and used to modify them. After continuous feedback the corpus is finally constructed by category, so that corpus construction is fast, accurate, practical and valuable.

Drawings

Fig. 1 is a schematic flowchart of a method for constructing a domain entity annotation corpus based on entity iteration according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

As shown in Fig. 1, the method for constructing a domain entity annotation corpus based on entity iteration provided by the invention comprises the following steps:

S1, collecting data information from each terminal and cloud server;

S2, screening and classifying the collected information;

S3, sending the screened information to a network;

S4, reading and recording information about the netizens;

S5, refining and classifying to construct a corpus.

In an alternative embodiment, in S1, the collected data information includes articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and institutional construction.

In an alternative embodiment, in S2, the collected information is screened to remove useless information, the screened information is assigned a series of labels, and the information is classified according to those labels.
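As a non-limiting illustration, the screening and labeling of S2 might be sketched as follows. The description does not define what counts as useless information or how labels are assigned, so the duplicate and minimum-length filter, the keyword-matching rule, and the names `LABEL_KEYWORDS`, `screen` and `classify` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical label vocabulary: each label maps to trigger keywords.
LABEL_KEYWORDS = {
    "enterprise culture": ["culture", "values"],
    "discipline": ["discipline", "compliance"],
    "organization": ["organization", "team"],
}


def screen(items, min_length=10):
    """Drop very short or duplicate texts (an assumed stand-in for 'useless information')."""
    seen = set()
    kept = []
    for text in items:
        normalized = " ".join(text.split()).lower()
        if len(normalized) < min_length or normalized in seen:
            continue
        seen.add(normalized)
        kept.append(text.strip())
    return kept


def classify(items, label_keywords=LABEL_KEYWORDS):
    """Group screened texts under every label whose keywords they mention."""
    groups = defaultdict(list)
    for text in items:
        lowered = text.lower()
        for label, keywords in label_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                groups[label].append(text)
    return dict(groups)


raw = [
    "Our team culture values openness.",
    "our team culture values openness.",  # duplicate after normalization
    "ok",                                 # too short to be useful
    "Discipline and compliance training schedule",
]
labelled = classify(screen(raw))
```

In this sketch a text may carry more than one label, which matches the wording "a series of labels" only loosely; a real embodiment could equally use a trained classifier.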

In an alternative embodiment, in S3, the articles, keywords and sentences that occur most frequently after screening are sent to the network and pushed to mobile terminals for viewing.
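Selecting the most frequently occurring items in S3 can be illustrated, under stated assumptions, with a simple occurrence count. The cutoff `top_n`, the function names and the payload layout are hypothetical, and the push mechanism itself is not specified in the description, so it is not modelled here.

```python
from collections import Counter


def most_frequent(items, top_n=3):
    """Return up to top_n distinct items, ordered by number of occurrences."""
    counts = Counter(items)
    return [item for item, _ in counts.most_common(top_n)]


def push_payload(articles, keywords, sentences, top_n=3):
    """Assemble what would be sent to the network (layout is an assumption)."""
    return {
        "articles": most_frequent(articles, top_n),
        "keywords": most_frequent(keywords, top_n),
        "sentences": most_frequent(sentences, top_n),
    }


keywords = ["culture", "discipline", "culture", "team", "discipline", "culture"]
payload = push_payload(["a1", "a1", "a2"], keywords, ["s1"], top_n=1)
```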

In an alternative embodiment, in S4, the netizen information includes age information, comments, education level information and geographical information, and the comments from each age group, from each education level and from each geographical location are collated and recorded.
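The collation in S4 amounts to grouping comments along three dimensions. A minimal sketch follows; the record fields (`age`, `education`, `location`, `comment`) and the ten-year age brackets are assumptions for illustration, since the description does not state how age groups are delimited.

```python
from collections import defaultdict


def age_group(age, width=10):
    """Map an age to a bracket such as '20-29' (the bracket width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def collate(feedback):
    """Group comments by age bracket, education level and location."""
    grouped = {
        "age": defaultdict(list),
        "education": defaultdict(list),
        "location": defaultdict(list),
    }
    for record in feedback:
        grouped["age"][age_group(record["age"])].append(record["comment"])
        grouped["education"][record["education"]].append(record["comment"])
        grouped["location"][record["location"]].append(record["comment"])
    return {dimension: dict(groups) for dimension, groups in grouped.items()}


feedback = [
    {"age": 24, "education": "bachelor", "location": "Xiamen", "comment": "clear"},
    {"age": 27, "education": "master", "location": "Fuzhou", "comment": "too long"},
    {"age": 35, "education": "bachelor", "location": "Xiamen", "comment": "useful"},
]
collated = collate(feedback)
```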

In an alternative embodiment, in S5, the comments and insights of people at each stage are integrated, and the articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and institutional construction are continuously corrected to construct a complete corpus, which is divided into keyword, bilingual and multilingual corpora.
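The continuous correction in S5 can be pictured as repeated feedback rounds applied to the corpus entries. The description does not say how conflicting comments are reconciled, so the sketch below assumes a simple rule: a proposed replacement is accepted once at least `min_votes` netizens suggest the same text. The entry identifiers, the voting rule and the function names are hypothetical.

```python
from collections import Counter


def apply_round(corpus, suggestions, min_votes=2):
    """Apply one feedback round.

    corpus: dict mapping entry id -> current text.
    suggestions: list of (entry id, proposed text) pairs from netizens.
    Returns a new corpus dict and the number of entries changed.
    """
    votes = {}
    for entry_id, proposed in suggestions:
        votes.setdefault(entry_id, Counter())[proposed] += 1

    updated = dict(corpus)  # leave the caller's corpus untouched
    changed = 0
    for entry_id, counter in votes.items():
        if entry_id not in updated:
            continue  # ignore feedback on unknown entries
        proposed, count = counter.most_common(1)[0]
        if count >= min_votes and proposed != updated[entry_id]:
            updated[entry_id] = proposed
            changed += 1
    return updated, changed


def refine(corpus, rounds, min_votes=2):
    """Apply every feedback round in order and return the corrected corpus."""
    for suggestions in rounds:
        corpus, _ = apply_round(corpus, suggestions, min_votes)
    return corpus


corpus = {"s1": "enterprise wind", "s2": "team discipline"}
rounds = [
    [("s1", "enterprise work style"), ("s1", "enterprise work style"), ("s2", "teem discipline")],
    [("s2", "team discipline rules"), ("s2", "team discipline rules"), ("s3", "x"), ("s3", "x")],
]
refined = refine(corpus, rounds)
```

Weighting votes by age group, education level or location, as the comprehensive consideration in the description suggests, would be a natural extension, but the weights are not given and are therefore left out.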

According to the method, the large volume of information collected on the terminals and the cloud server is sorted, and the most frequently occurring and most frequently searched sentences, entries and articles are screened out and classified. They are then sent to the network for netizens to view and evaluate. The evaluations are screened with comprehensive consideration of each netizen's age, education level, geographical location and other information, compared with the sorted sentences, entries and articles, and used to modify them. Through continuous feedback the corpus is finally constructed by category, so that corpus construction is accurate, practical and valuable.

It is to be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A method for constructing a domain entity annotation corpus based on entity iteration, characterized by comprising the following steps:

S1, collecting data information from each terminal and cloud server;

S2, screening and classifying the collected information;

S3, sending the screened information to a network;

S4, reading and recording information about the netizens;

S5, refining and classifying to construct a corpus.

2. The method for constructing a domain entity annotation corpus based on entity iteration as claimed in claim 1, wherein in S1, the collected data information includes articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction and discipline construction.

3. The method for constructing a domain entity annotation corpus based on entity iteration as claimed in claim 1, wherein in S2, the collected information is screened to remove useless information, the screened information is assigned a series of labels, and the information is classified according to those labels.

4. The method for constructing a domain entity annotation corpus based on entity iteration as claimed in claim 1, wherein in S3, the articles, keywords and sentences that occur most frequently after screening are sent to the network and pushed to mobile terminals for viewing.

5. The method for constructing a domain entity annotation corpus based on entity iteration as claimed in claim 1, wherein in S4, the netizen information includes age information, comments, education level information and geographical information, and the comments from each age group, from each education level and from each geographical location are collated and recorded.

6. The method for constructing a domain entity annotation corpus based on entity iteration as claimed in claim 1, wherein in S5, the comments and insights of people at each stage are integrated, and the articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and institutional construction are continuously corrected to construct a complete corpus, which is divided into keyword, bilingual and multilingual corpora.