CN110569490A - Method for constructing domain entity labeling corpus based on entity iteration - Google Patents

Publication number
CN110569490A
Authority
CN
China
Prior art keywords
information
corpus
entity
construction
articles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910665738.2A
Other languages
Chinese (zh)
Inventor
肖清林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Useear Information Technology Co ltd
Original Assignee
Fujian Singularity Space-Time Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-23
Filing date
2019-07-23
Publication date
2019-12-13
Application filed by Fujian Singularity Space-Time Digital Technology Co Ltd
Priority to CN201910665738.2A
Publication of CN110569490A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for constructing a domain entity annotation corpus based on entity iteration comprises: collecting data from each terminal and a cloud server; screening and classifying the collected information; sending the screened information to a network; reading and recording information about network users; and refining and classifying the results to construct the corpus. The invention sorts the large volume of information collected on the terminals and the cloud server, screens out and classifies the sentences, entries and articles with the highest output and search volume, and sends them to the network for network users to view and comment on. The comments are then screened with reference to the commenters' age, education level, geographic location and other information, compared with the sorted sentences, entries and articles, and used to revise them. Through this continuous feedback the corpus is finally constructed by category, so that the resulting corpus is accurate, practical and valuable.

Description

Method for constructing domain entity labeling corpus based on entity iteration
Technical Field
The invention relates to the field of corpora, and in particular to a method for constructing a domain entity annotation corpus based on entity iteration.
Background
Constructing a corpus is one way to obtain a sound evaluation of enterprise work, but an effective construction method has long been lacking. Evaluation has instead relied on subordinates reporting to superiors over a period of time. Such reports are isolated, do not reflect what people actually think, and cannot be expressed objectively, which in turn makes constructing a corpus more difficult.
In order to solve the above problems, the present application provides a method for constructing a domain entity annotation corpus based on entity iteration.
Disclosure of Invention
(I) Objects of the invention
In order to solve the technical problems described in the background art, the invention provides a method for constructing a domain entity annotation corpus based on entity iteration.
(II) Technical scheme
In order to solve the problems, the invention provides a method for constructing a domain entity annotation corpus based on entity iteration, which comprises the following steps:
S1, collecting data from each terminal and the cloud server;
S2, screening and classifying the collected information;
S3, sending the screened information to a network;
S4, reading and recording information about network users;
S5, refining and classifying the information to construct the corpus.
Preferably, in S1, the collected data include articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and system construction.
Preferably, in S2, the collected information is screened to remove useless information, the screened information is assigned a series of labels, and the information is classified according to those labels.
Preferably, in S3, the articles, keywords and sentences that appear most frequently after screening are sent to the network and pushed to mobile terminals for viewing.
Preferably, in S4, the network user information includes age information, comments, education-level information and geographic information, and the comments from each age group, each education level and each geographic location are collated and recorded.
Preferably, in S5, the comments and insights of people in each group are integrated, and the articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and system construction are continuously corrected to construct a complete corpus, which is divided into keyword, bilingual and multilingual corpora.
The technical scheme of the invention has the following beneficial technical effects:
The method sorts the large volume of information collected on the terminals and the cloud server, screens out and classifies the sentences, entries and articles with the highest search volume, and sends them to the network for network users to view and comment on. The comments are screened with reference to the users' age, education level, geographic location and other information, compared with the sorted sentences, entries and articles, and used to revise them. After continuous feedback the corpus is finally constructed by category, so that corpus construction is fast, accurate, practical and valuable.
Drawings
Fig. 1 is a schematic flowchart of the method for constructing a domain entity annotation corpus based on entity iteration according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
As shown in Fig. 1, the method for constructing a domain entity annotation corpus based on entity iteration provided by the invention comprises the following steps:
S1, collecting data from each terminal and the cloud server;
S2, screening and classifying the collected information;
S3, sending the screened information to a network;
S4, reading and recording information about network users;
S5, refining and classifying the information to construct the corpus.
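The five steps above can be read as a feedback loop. The following Python sketch is only an illustration of that reading, not the applicant's implementation: the function names, the callable-based structure and the fixed number of feedback rounds (`rounds`) are all assumptions introduced here, since the text does not say how many iterations are performed or how each step is realized.

```python
from typing import Callable, Dict, List

Item = str                          # an article, keyword or sentence
Feedback = Dict[Item, List[str]]    # comments recorded for each item


def build_corpus(
    collect: Callable[[], List[Item]],                        # S1
    screen: Callable[[List[Item]], List[Item]],               # S2
    publish_and_record: Callable[[List[Item]], Feedback],     # S3 and S4
    refine: Callable[[List[Item], Feedback], List[Item]],     # S5
    rounds: int = 2,                                          # assumed, not in the source
) -> List[Item]:
    """Run S1 and S2 once, then repeat publish/record/refine `rounds` times."""
    items = screen(collect())
    for _ in range(rounds):
        feedback = publish_and_record(items)
        items = refine(items, feedback)
    return items


# Toy stand-ins so the skeleton can be run end to end.
corpus = build_corpus(
    collect=lambda: ["enterprise culture", "", "discipline"],
    screen=lambda xs: [x for x in xs if x],
    publish_and_record=lambda xs: {x: ["ok"] for x in xs},
    refine=lambda xs, fb: [x for x in xs if fb.get(x)],
)
```

Passing each step in as a callable keeps the loop independent of how collection, publication and refinement are actually carried out.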
In an alternative embodiment, in S1, the collected data include articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and system construction.
In an alternative embodiment, in S2, the collected information is screened to remove useless information, the screened information is assigned a series of labels, and the information is classified according to those labels.
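As a rough illustration of S2, the sketch below screens out unusable text and then groups the remainder by label. The source does not define what counts as useless information or how labels are assigned, so the minimum-length rule and the keyword-to-label table `LABEL_KEYWORDS` are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical label -> trigger keywords; the source does not specify the labels.
LABEL_KEYWORDS: Dict[str, List[str]] = {
    "enterprise culture": ["culture"],
    "discipline": ["discipline"],
    "system": ["system", "regulation"],
}


def is_useful(text: str, min_chars: int = 5) -> bool:
    """Assumed rule: blank or very short strings are treated as useless."""
    return len(text.strip()) >= min_chars


def labels_for(text: str) -> List[str]:
    """Return every label whose keywords occur in the text."""
    lowered = text.lower()
    return [
        label
        for label, keywords in LABEL_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def screen_and_classify(texts: Iterable[str]) -> Dict[str, List[str]]:
    """Drop useless texts, then group the remaining texts by label."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        if not is_useful(text):
            continue
        for label in labels_for(text):
            by_label[label].append(text)
    return dict(by_label)
```

In this sketch a text that matches no label is dropped, and a text that matches several labels is filed under each of them; either choice could reasonably be made differently.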
In an alternative embodiment, in S3, the articles, keywords and sentences that appear most frequently after screening are sent to the network and pushed to mobile terminals for viewing.
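Selecting the most frequent items in S3 can be sketched with a simple counter. The cut-off `top_n` and the case-insensitive normalization are assumptions made for this example; the source only says that the most frequently occurring items are sent out, and the push to mobile terminals is not modeled here.

```python
from collections import Counter
from typing import Iterable, List


def most_frequent(items: Iterable[str], top_n: int = 3) -> List[str]:
    """Return the `top_n` most frequently occurring items, most frequent first."""
    # Normalize so that "Culture" and "culture " count as the same item (assumed).
    counts = Counter(item.strip().lower() for item in items if item.strip())
    return [item for item, _ in counts.most_common(top_n)]


selected = most_frequent(
    ["Culture", "discipline", "culture", "system", "culture", "discipline"],
    top_n=2,
)
```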
In an alternative embodiment, in S4, the network user information includes age information, comments, education-level information and geographic information, and the comments from each age group, each education level and each geographic location are collated and recorded.
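The collation in S4 amounts to grouping the same comments along three dimensions. The sketch below assumes a hypothetical `Comment` record and ten-year age bands; neither the record layout nor the band width is given in the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Comment:
    """Hypothetical record for one network user's comment."""
    text: str
    age: int
    education: str
    region: str


def age_band(age: int, width: int = 10) -> str:
    """Map an age to a band such as '20-29' (the band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def collate(comments: Iterable[Comment]) -> Dict[str, Dict[str, List[str]]]:
    """Group comment texts by age band, education level and region."""
    grouped: Dict[str, Dict[str, List[str]]] = {
        "age": defaultdict(list),
        "education": defaultdict(list),
        "region": defaultdict(list),
    }
    for comment in comments:
        grouped["age"][age_band(comment.age)].append(comment.text)
        grouped["education"][comment.education].append(comment.text)
        grouped["region"][comment.region].append(comment.text)
    return {dimension: dict(groups) for dimension, groups in grouped.items()}
```

Keeping the three groupings side by side lets a later step weigh a comment by any of the commenter's attributes without re-reading the raw records.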
In an alternative embodiment, in S5, the comments and insights of people in each group are integrated, and the articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and system construction are continuously corrected to construct a complete corpus, which is divided into keyword, bilingual and multilingual corpora.
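One way to picture the correction and division in S5 is shown below. Both rules are assumptions made for illustration: a suggested correction replaces an entry only when at least `min_votes` users propose the same wording, and an entry is filed as keyword, bilingual or multilingual according to how many languages it carries. The source states neither rule.

```python
from collections import Counter
from typing import Dict, List


def apply_correction(entry: str, suggestions: List[str], min_votes: int = 2) -> str:
    """Adopt the most-suggested wording if enough users proposed it (assumed rule)."""
    if not suggestions:
        return entry
    best, votes = Counter(suggestions).most_common(1)[0]
    return best if votes >= min_votes else entry


def sub_corpus(entry: Dict[str, str]) -> str:
    """Pick a sub-corpus from the number of languages in the entry (assumed rule).

    `entry` maps a language code to the text in that language.
    """
    if len(entry) <= 1:
        return "keyword"
    if len(entry) == 2:
        return "bilingual"
    return "multilingual"


corrected = apply_correction(
    "enterprize culture",
    ["enterprise culture", "enterprise culture", "corporate culture"],
)
```

Repeating `apply_correction` after each round of comments is what makes the construction iterative: an entry changes only when the feedback converges on a replacement.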
According to the method, the large volume of information collected on the terminals and the cloud server is sorted, and the sentences, entries and articles with the highest output are screened out, classified, and sent to the network for network users to view and comment on. The comments are screened with reference to the users' age, education level, geographic location and other information, compared with the sorted sentences, entries and articles, and used to revise them. Through this continuous feedback the corpus is finally constructed by category, so that the resulting corpus is accurate, practical and valuable.
It is to be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (6)

1. A method for constructing a domain entity labeling corpus based on entity iteration, characterized by comprising the following steps:
S1, collecting data from each terminal and the cloud server;
S2, screening and classifying the collected information;
S3, sending the screened information to a network;
S4, reading and recording information about network users;
S5, refining and classifying the information to construct the corpus.
2. The method for constructing a domain entity labeling corpus based on entity iteration as claimed in claim 1, wherein in S1, the collected data include articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction and discipline construction.
3. The method for constructing a domain entity labeling corpus based on entity iteration as claimed in claim 1, wherein in S2, the collected information is screened to remove useless information, the screened information is assigned a series of labels, and the information is classified according to those labels.
4. The method for constructing a domain entity labeling corpus based on entity iteration as claimed in claim 1, wherein in S3, the articles, keywords and sentences that appear most frequently after screening are sent to the network and pushed to mobile terminals for viewing.
5. The method for constructing a domain entity labeling corpus based on entity iteration as claimed in claim 1, wherein in S4, the network user information includes age information, comments, education-level information and geographic information, and the comments from each age group, each education level and each geographic location are collated and recorded.
6. The method for constructing a domain entity labeling corpus based on entity iteration as claimed in claim 1, wherein in S5, the comments and insights of people in each group are integrated, and the articles, keywords and sentences related to corpus construction, enterprise culture construction, enterprise organization construction, enterprise work-style construction, discipline construction and system construction are continuously corrected to construct a complete corpus, which is divided into keyword, bilingual and multilingual corpora.
CN201910665738.2A 2019-07-23 2019-07-23 Method for constructing domain entity labeling corpus based on entity iteration Pending CN110569490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910665738.2A CN110569490A (en) 2019-07-23 2019-07-23 Method for constructing domain entity labeling corpus based on entity iteration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910665738.2A CN110569490A (en) 2019-07-23 2019-07-23 Method for constructing domain entity labeling corpus based on entity iteration

Publications (1)

Publication Number Publication Date
CN110569490A true CN110569490A (en) 2019-12-13

Family

ID=68773834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910665738.2A Pending CN110569490A (en) 2019-07-23 2019-07-23 Method for constructing domain entity labeling corpus based on entity iteration

Country Status (1)

Country Link
CN (1) CN110569490A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127628A (en) * 2021-04-23 2021-07-16 北京达佳互联信息技术有限公司 Method, device, equipment and computer-readable storage medium for generating comments

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104954234A (en) * 2015-05-19 2015-09-30 中国地质大学(北京) Microblog data acquisition method, microblog data acquisition device and public opinion analysis method
US20180089164A1 (en) * 2016-09-28 2018-03-29 Microsoft Technology Licensing, Llc Entity-specific conversational artificial intelligence
CN109271477A (en) * 2018-09-05 2019-01-25 杭州数湾信息科技有限公司 A kind of method and system by internet building taxonomy library

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
富永军著: "《现代公共文化服务发展与建设研究》", 30 September 2018, 吉林美术出版社 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127628A (en) * 2021-04-23 2021-07-16 北京达佳互联信息技术有限公司 Method, device, equipment and computer-readable storage medium for generating comments
CN113127628B (en) * 2021-04-23 2024-03-19 北京达佳互联信息技术有限公司 Method, apparatus, device and computer readable storage medium for generating comments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220913

Address after: 361000 units 1702 and 1703, No. 59, Chengyi North Street, phase III, software park, Xiamen, Fujian

Applicant after: XIAMEN USEEAR INFORMATION TECHNOLOGY Co.,Ltd.

Address before: Unit 1701, unit 1704, No. 59, Chengyi North Street, phase III, software park, Xiamen City, Fujian Province, 361000

Applicant before: FUJIAN QIDIAN SPACE-TIME DIGITAL TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20191213