CN111090994A - Chinese-internet-forum-text-oriented event place attribution province identification method - Google Patents

Info

Publication number
CN111090994A
Authority
CN
China
Prior art keywords
text
event
forum
post
place
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911101388.3A
Other languages
Chinese (zh)
Inventor
陈进东
刘琳琳
杜雨璇
张健
齐林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-12
Filing date
2019-11-12
Publication date
2020-05-01
Application filed by Beijing Information Science and Technology University
Priority to CN201911101388.3A
Publication of CN111090994A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/35 Clustering; Classification
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for identifying the province to which the event location of a Chinese Internet forum text belongs. The method comprises the following steps. Step one, text word segmentation: (1) constructing a place-name-to-province query dictionary; (2) Chinese word segmentation with the jieba tool. Step two, event location identification: (1) extracting and constructing feature values; (2) identifying the event location of the text; (3) deduplicating multiple event locations. Step three, province determination: for the identified event location of a forum post, the place-name-to-province query dictionary is queried directly to determine the province to which the event location belongs. The invention provides a clear approach to segmenting complex text. Building on event location identification, it also offers a clear approach to deduplicating multiple event locations and to identifying the provinces to which they belong. The method is simple to implement and easy to generalize. Compared with traditional place name recognition, it offers markedly finer granularity and higher accuracy.

Description

Chinese-internet-forum-text-oriented event place attribution province identification method
Technical Field
The invention relates to computer science and technology, natural language processing, public opinion analysis and text mining. In particular, it relates to a method for identifying the province to which the event location described in a Chinese Internet forum text belongs.
Background
Identifying the province in which the events described in online texts occur has several uses. It makes it possible to count and analyze the main public opinion events and the overall public opinion of different provinces, and to compare public opinion levels and differences across provinces. It also supports precise government management and intelligent decision-making. The basis for identifying this province is identifying the event location of the online text. Some progress has already been made in identifying event locations in online texts such as news, using natural language processing tools, location dictionaries, classification models and the like.
Chinese web forums are becoming increasingly important, and large numbers of posts and comments appear every day. User posts often contain rich public sentiment and location information. This information is expressed through words of various types, such as simple place names, compound place names, organization names, enterprise names and landmark names. From forum content, the main public opinion events and the overall public opinion of different provinces can be identified. This reflects local public opinion more directly and accurately and supports government decision-making. However, most forum posts are informal and the corpus quality is lower than that of news. Accurately identifying event locations in such large volumes of forum text is therefore a difficult problem.
At present, no patent specifically addresses identifying the province to which the event location of a Chinese Internet forum post belongs. The patent "Incident location extraction method oriented to Chinese news texts" (CN104731768A) discloses a place name extraction method for news text. It extracts candidate event locations, constructs feature vectors and identifies the event location in the news text. However, that patent can only identify event locations. It offers no clear approach to segmenting complex text, deduplicating multiple event locations, or identifying the provinces to which event locations belong.
Disclosure of Invention
The invention aims to provide a method for identifying the province to which the event location of a Chinese Internet forum text belongs. The method works in three stages:
(1) post text in Chinese web forums is segmented with the jieba Chinese word segmentation tool;
(2) a support vector machine classifies the place names obtained from segmentation as event locations or non-event locations;
(3) the province of the post's event location is determined with the place-name-to-province query dictionary.
To achieve the above purposes, the invention adopts the following technical scheme:
A method for identifying the province to which the event location of a Chinese Internet forum text belongs comprises the following steps:
Step one: text word segmentation
1. Constructing a place-name-to-province query dictionary: a "city, county, town, sub-district office and landmark → province" query dictionary is constructed from three sources: the four-level administrative division place name lexicon of the Sogou input method lexicons, a comprehensive lexicon of government agencies and organizations, and a lexicon of Chinese scenic spots and historic sites;
2. Chinese word segmentation based on jieba: the jieba Chinese word segmentation tool is used with a custom dictionary. The four-level administrative division place name lexicon, the comprehensive lexicon of government agencies and organizations and the Chinese scenic spot lexicon are added to this dictionary, and the content of a forum post text T is segmented in accurate mode;
Step two: event location identification
1. Extracting and constructing feature values: part-of-speech tagging is performed with jieba. The place names, organization names and landmark nouns among the segmented words are extracted to form the candidate event location set W_T. For each candidate w'_i in W_T, two features are selected: the contextual feature c_i of w'_i in post text T, and the location feature p_i of w'_i in post text T;
2. Text event location identification
The event locations of forum posts are labeled manually, and 5000 posts are labeled. An SVM event location classifier is trained on the manual labels together with the contextual and location features of each w'_i in its post text. The trained classifier then classifies each candidate w'_i in W_T as an event location or a non-event location, thereby identifying the event location of the post;
3. Multiple event location deduplication
When several event locations are identified, a word2vec model is first trained without supervision to learn distributed word vectors. These vectors are used to compute the cosine similarity between the content of the forum post and each event location. The event location with the highest cosine similarity is selected as the single event location of the post;
Step three: province determination
For the identified event location of the forum post, the place-name-to-province query dictionary is queried directly to determine the province to which the event location belongs.
The contextual feature of each w'_i in W_T within post text T is represented by the weight of the regular expression that w'_i matches, denoted c_i:
(1) If w'_i matches one of the regular expressions (1) to (10) in post text T, let that be the k-th regular expression r_k (k = 1, ..., 10), where the r_k are as follows:
r_1 = ^\w+occurred$ (1)
r_2 = ^\,\w$ (2)
r_3 = ^at+\w$ (3)
r_4 = ^\:\w$ (4)
r_5 = ^report+\w$ (5)
r_6 = ^explosive+\w$ (6)
r_7 = ^is+\w$ (7)
r_8 = ^report+\w$ (8)
r_9 = ^name+\\w$ (9)
r_10 = ^located\w+$ (10)
The weight for matching the k-th regular expression is given by tfidf(k):
$$c_i = \mathrm{tfidf}_{i,j}(k) = \mathrm{tf}_{i,j}(k) \times \mathrm{idf}_{i,j}(k) \qquad (11)$$
Here tfidf names the TF-IDF weighting algorithm, tf is the term frequency, idf is the inverse document frequency, and k denotes a match with the k-th regular expression;
tf_{i,j}(k) is defined as:
$$\mathrm{tf}_{i,j}(k) = \frac{n_{i,j}(k)}{N(k)}$$
where n_{i,j}(k) is the number of times w'_i matches the k-th regular expression in post text j, and N(k) is the number of times w'_i matches the k-th regular expression across all post texts in the forum;
idf_{i,j}(k) is defined as:
$$\mathrm{idf}_{i,j}(k) = \log \frac{|D|}{1 + |\{\, j : r_k \in d_j \,\}|}$$
In this formula:
- |D| is the total number of posts in the forum;
- r_k is the k-th regular expression;
- d_j is the j-th post;
- |{j : r_k ∈ d_j}| is the number of posts in the forum that contain a match of r_k.
The +1 in the denominator prevents a zero denominator when no post in the corpus contains the k-th regular expression;
(2) If w'_i matches none of the regular expressions (1) to (10) in post text T, then c_i = 0.
The location feature p_i has two cases:
(1) If the location appears in the title of post text T, p_i = 0.99;
(2) If the location appears in the non-title text of post text T, p_i is computed from loc(p_i, T) and length(T):
[Formula image not reproduced: p_i as a function of loc(p_i, T) and length(T)]
Here loc(p_i, T) is the number of words from the start of post text T to the first occurrence of w'_i, and length(T) is the total number of words in post text T.
Beneficial effects of the invention:
The invention provides a method for identifying the province to which the event location of a Chinese Internet forum text belongs. It extracts candidate event locations from Chinese Internet forum posts, identifies the event location and determines the province to which it belongs. The method offers a clear approach to segmenting complex text. Building on event location identification, it also offers a clear approach to deduplicating multiple event locations and to identifying the provinces to which they belong. The method is simple to implement and easy to generalize. Compared with traditional place name recognition, it offers markedly finer granularity and higher accuracy.
Drawings
The drawings of the invention are as follows:
FIG. 1: schematic diagram of the process for identifying the province of the event location of Chinese Internet forum text according to the invention;
FIG. 2: schematic diagram of the support-vector-machine-based text classification process according to the invention;
FIG. 3: schematic diagram of the network structure of the word2vec model used in the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIGS. 1 to 3, the method of the invention for identifying the province to which the event location of a Chinese Internet forum text belongs comprises the following steps:
Step one: text word segmentation
1. Constructing the place-name-to-province query dictionary. A "city, county, town, sub-district office and landmark → province" query dictionary is constructed from three sources: the four-level administrative division place name lexicon of the Sogou input method lexicons, a comprehensive lexicon of government agencies and organizations, and a lexicon of Chinese scenic spots and historic sites.
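A minimal sketch of the place-name-to-province query dictionary, using only the Python standard library. It assumes the lexicons have already been flattened into hypothetical (place name, province) rows. The names `build_province_dictionary` and `province_of` are illustrative and do not come from the source. When the same name appears under several provinces, this sketch keeps the first province seen. That is an assumption; the source does not address the case. The same lookup is what step three performs on the identified event location.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_province_dictionary(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a "place name -> province" lookup from (place name, province) rows.

    The rows stand in for the administrative-division, organization and
    scenic-spot lexicons (hypothetical input format).
    """
    lookup: Dict[str, str] = {}
    for name, province in rows:
        name = name.strip()
        if name:
            # Assumption: if a name occurs under several provinces, keep the first one.
            lookup.setdefault(name, province.strip())
    return lookup


def province_of(place: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the province of an identified event place, or None if it is unknown."""
    return lookup.get(place.strip())
```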
2. Chinese word segmentation based on jieba. To achieve high segmentation accuracy, the jieba Chinese word segmentation tool is used with a custom dictionary (userdict). The four-level administrative division lexicon, the comprehensive lexicon of government agencies and organizations and the Chinese scenic spot lexicon are added to this dictionary. The content of a forum post text T (any single post in the forum) is then segmented in accurate mode.
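jieba reads a user dictionary as a text file with one entry per line. Each line holds the word, an optional frequency and an optional part-of-speech tag, separated by spaces. The sketch below formats lexicon entries into such lines; `userdict_lines` is a hypothetical helper. The tag choices come from jieba's tag set, but mapping the lexicons onto them is an assumption: `ns` for administrative places and scenic spots, `nt` for agencies and organizations. The resulting file would be loaded with `jieba.load_userdict(path)` before calling `jieba.lcut(text)`. Its default, `cut_all=False`, is the accurate mode. If the frequency is omitted, jieba chooses one that lets the word be segmented whole.

```python
from typing import Iterable, List, Optional, Tuple


def userdict_lines(entries: Iterable[Tuple[str, str]],
                   freq: Optional[int] = None) -> List[str]:
    """Format (word, POS tag) pairs as jieba user-dictionary lines: "word [freq] tag"."""
    seen = set()
    lines: List[str] = []
    for word, tag in entries:
        word = word.strip()
        # Skip blanks, duplicates and words containing spaces (spaces separate the fields).
        if not word or " " in word or word in seen:
            continue
        seen.add(word)
        fields = [word] if freq is None else [word, str(freq)]
        fields.append(tag)
        lines.append(" ".join(fields))
    return lines
```

Joining the lines with newlines and writing them to a UTF-8 file produces a dictionary that `jieba.load_userdict` can read.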
Step two: event location identification
1. Extracting and constructing feature values. Part-of-speech tagging is performed with jieba. The place names, organization names and landmark nouns among the segmented words are extracted to form the candidate event location set W_T. For each candidate w'_i in W_T, two features are selected: the contextual feature c_i of w'_i in post text T, and the location feature p_i of w'_i in post text T.
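The candidate set W_T can be sketched as a filter over (word, tag) pairs. Such pairs can be built from the `.word` and `.flag` attributes of `jieba.posseg.lcut` results. jieba tags place names `ns` and organizations `nt`. Treating `nz` (other proper nouns) as the landmark category is an assumption, as is keeping only the first occurrence of each word. `candidate_places` is a hypothetical name.

```python
from typing import FrozenSet, Iterable, List, Tuple

# jieba tags: "ns" = place name, "nt" = organization; using "nz" for landmarks is an assumption.
CANDIDATE_TAGS: FrozenSet[str] = frozenset({"ns", "nt", "nz"})


def candidate_places(tagged: Iterable[Tuple[str, str]],
                     tags: FrozenSet[str] = CANDIDATE_TAGS) -> List[str]:
    """Return W_T: candidate event places in order of first appearance, without duplicates."""
    seen = set()
    candidates: List[str] = []
    for word, flag in tagged:
        if flag in tags and word not in seen:
            seen.add(word)
            candidates.append(word)
    return candidates
```

A typical call would be `candidate_places((p.word, p.flag) for p in jieba.posseg.lcut(text))`.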
Feature one: contextual feature c_i
The contextual feature of each w'_i in W_T within post text T is represented by the weight of the regular expression that w'_i matches, denoted c_i.
(1) Following the idea of the TF-IDF algorithm, if w'_i matches one of the regular expressions (1) to (10) in post text T, let that be the k-th regular expression r_k (k = 1, ..., 10), where the r_k are as follows:
r_1 = ^\w+occurred$ (1)
r_2 = ^\,\w$ (2)
r_3 = ^at+\w$ (3)
r_4 = ^\:\w$ (4)
r_5 = ^report+\w$ (5)
r_6 = ^explosive+\w$ (6)
r_7 = ^is+\w$ (7)
r_8 = ^report+\w$ (8)
r_9 = ^name+\\w$ (9)
r_10 = ^located\w+$ (10)
The weight for matching the k-th regular expression is given by tfidf(k):
$$c_i = \mathrm{tfidf}_{i,j}(k) = \mathrm{tf}_{i,j}(k) \times \mathrm{idf}_{i,j}(k) \qquad (11)$$
Here tfidf names the TF-IDF weighting algorithm, tf is the term frequency, idf is the inverse document frequency, and k denotes a match with the k-th regular expression.
a) tf_{i,j}(k) is defined as:
$$\mathrm{tf}_{i,j}(k) = \frac{n_{i,j}(k)}{N(k)}$$
where n_{i,j}(k) is the number of times w'_i matches the k-th regular expression in post text j, and N(k) is the number of times w'_i matches the k-th regular expression across all post texts in the forum.
b) idf_{i,j}(k) is defined as:
$$\mathrm{idf}_{i,j}(k) = \log \frac{|D|}{1 + |\{\, j : r_k \in d_j \,\}|}$$
In this formula:
- |D| is the total number of posts in the forum;
- r_k is the k-th regular expression;
- d_j is the j-th post;
- |{j : r_k ∈ d_j}| is the number of posts in the forum that contain a match of r_k.
The +1 in the denominator prevents a zero denominator when no post in the corpus contains the k-th regular expression.
(2) If w'_i matches none of the regular expressions (1) to (10) in post text T, then c_i = 0.
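Equation (11) can be sketched directly once the match counts are known. The sketch assumes the counts n_{i,j}(k), N(k), |D| and the number of posts matching r_k have already been collected. It reads the source's definitions as tf_{i,j}(k) = n_{i,j}(k) / N(k) and idf_{i,j}(k) = log(|D| / (1 + number of posts matching r_k)). It uses the natural logarithm because the source does not state a base. The source also does not say which k applies when several patterns match, so the caller chooses. `context_feature` is a hypothetical name.

```python
import math


def context_feature(n_ijk: int, n_k_all: int, num_posts: int, posts_matching_rk: int) -> float:
    """c_i = tf_{i,j}(k) * idf_{i,j}(k) for the k-th regex matched by w'_i in post j.

    n_ijk: matches of w'_i with r_k in post j, i.e. n_{i,j}(k)
    n_k_all: matches of w'_i with r_k in all posts, i.e. N(k)
    num_posts: |D|, the number of posts in the forum
    posts_matching_rk: number of posts that contain a match of r_k
    """
    if n_ijk == 0:
        # w'_i matches none of r_1..r_10 in this post, so c_i = 0.
        return 0.0
    tf = n_ijk / n_k_all  # n_ijk > 0 implies n_k_all >= n_ijk > 0
    idf = math.log(num_posts / (1 + posts_matching_rk))  # +1 avoids a zero denominator
    return tf * idf
```

For example, 2 matches in post j, 4 matches in total, 10 posts and 4 posts matching r_k give tf = 0.5 and idf = log 2.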
Feature two: location feature p_i
(1) The location appears in the title of post text T. A single place name in a title usually indicates the event location, although reading the corpus shows that titles occasionally contain several place names. A location appearing in the title is therefore given a heavy weight: p_i = 0.99.
(2) The location appears in the non-title text of post text T. In this case p_i is computed from loc(p_i, T) and length(T):
[Formula image not reproduced: p_i as a function of loc(p_i, T) and length(T)]
Here loc(p_i, T) is the number of words from the start of post text T to the first occurrence of w'_i, and length(T) is the total number of words in post text T.
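A sketch of the location feature p_i. The title weight of 0.99 comes from the source. The body-text formula is given only as an image in the source, so the sketch takes it as a caller-supplied function of loc(p_i, T) and length(T) rather than guessing it. The sketch also makes three assumptions:
- the body of T is already segmented into a word list;
- loc is the number of words before the first occurrence, i.e. its zero-based index;
- a title match is a token match.

`position_feature` and `body_formula` are hypothetical names.

```python
from typing import Callable, Sequence

TITLE_WEIGHT = 0.99  # weight given in the source for a place that appears in the title


def position_feature(word: str,
                     title_tokens: Sequence[str],
                     body_tokens: Sequence[str],
                     body_formula: Callable[[int, int], float]) -> float:
    """Location feature p_i of candidate `word` in post text T.

    body_formula(loc, length) stands in for the source's body-text formula,
    which is not reproduced here.
    """
    if word in title_tokens:
        return TITLE_WEIGHT
    # loc(p_i, T): number of words before the first occurrence, i.e. its 0-based index.
    # list.index raises ValueError if the word does not occur in the body.
    loc = list(body_tokens).index(word)
    return body_formula(loc, len(body_tokens))
```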
2. Text event location identification
The event locations of forum posts are labeled manually; to ensure the accuracy of the classifier, 5000 posts are labeled. An SVM event location classifier is trained on the manual labels together with the contextual and location features of each w'_i in its post text. The trained classifier then classifies each candidate w'_i in W_T as an event location or a non-event location, thereby identifying the event location of the post.
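The source does not name an SVM library, kernel or hyperparameters. As a dependency-free illustration, the sketch below trains a linear SVM on two-dimensional [c_i, p_i] vectors. It uses Pegasos-style stochastic subgradient descent on the regularized hinge loss. It visits the samples in a fixed cyclic order so the result is deterministic. The bias is folded into the weight vector through a constant feature, which means the bias is also regularized; this is a simplification. In practice a library implementation such as scikit-learn's `sklearn.svm.SVC` would normally be used instead. Labels are +1 for an event location and -1 for a non-event location, and the function names are hypothetical.

```python
from typing import List, Sequence


def _score(w: Sequence[float], x: Sequence[float]) -> float:
    # x is extended with a constant 1.0 so that the last weight acts as the bias term.
    return sum(wk * xk for wk, xk in zip(w, list(x) + [1.0]))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 1000) -> List[float]:
    """Pegasos-style subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    X holds feature vectors such as [c_i, p_i]; y holds +1 (event place) or -1.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):  # fixed cyclic order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * _score(w, x)
            w = [(1.0 - eta * lam) * wk for wk in w]  # step for the L2 regularizer
            if margin < 1.0:  # hinge loss is active: move toward the sample
                w = [wk + eta * label * xk for wk, xk in zip(w, list(x) + [1.0])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (event place) or -1 (non-event place)."""
    return 1 if _score(w, x) >= 0.0 else -1
```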
3. Multiple event location deduplication
When several event locations are identified, a word2vec model is first trained without supervision to learn distributed word vectors. These vectors are used to compute the cosine similarity between the content of the forum post and each event location. The event location with the highest cosine similarity is selected as the single event location of the post.
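A sketch of the deduplication step. The source does not say how the post content is turned into a single vector. This sketch assumes the mean of the word2vec vectors of the post's tokens, which is a common choice. It then picks the candidate place whose vector has the highest cosine similarity to that mean. Vectors are passed in as a plain mapping from word to a list of floats, for example copied from a trained gensim `Word2Vec` model's `wv`. `pick_event_place` and `mean_vector` are hypothetical names.

```python
import math
from typing import List, Mapping, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(tokens: Sequence[str],
                vectors: Mapping[str, Sequence[float]]) -> Optional[List[float]]:
    """Average of the vectors of the tokens that have one (assumed post representation)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def pick_event_place(post_tokens: Sequence[str], places: Sequence[str],
                     vectors: Mapping[str, Sequence[float]]) -> Optional[str]:
    """Keep the candidate place most similar to the whole post; ties go to the earlier place."""
    doc = mean_vector(post_tokens, vectors)
    usable = [p for p in places if p in vectors]
    if doc is None or not usable:
        return None
    return max(usable, key=lambda p: cosine(doc, vectors[p]))
```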
Step three: province determination
For the identified event location of the forum post, the place-name-to-province query dictionary is queried directly to determine the province to which the event location belongs.
Anything not described in detail in this specification is within the common knowledge of those skilled in the art.

Claims (3)

1. A method for identifying the province to which the event location of a Chinese Internet forum text belongs, characterized by comprising the following steps:
Step one: text word segmentation
1. Constructing a place-name-to-province query dictionary: a "city, county, town, sub-district office and landmark → province" query dictionary is constructed from three sources: the four-level administrative division place name lexicon of the Sogou input method lexicons, a comprehensive lexicon of government agencies and organizations, and a lexicon of Chinese scenic spots and historic sites;
2. Chinese word segmentation based on jieba: the jieba Chinese word segmentation tool is used with a custom dictionary. The four-level administrative division place name lexicon, the comprehensive lexicon of government agencies and organizations and the Chinese scenic spot lexicon are added to this dictionary, and the content of a forum post text T is segmented in accurate mode;
Step two: event location identification
1. Extracting and constructing feature values: part-of-speech tagging is performed with jieba. The place names, organization names and landmark nouns among the segmented words are extracted to form the candidate event location set W_T. For each candidate w'_i in W_T, two features are selected: the contextual feature c_i of w'_i in post text T, and the location feature p_i of w'_i in post text T;
2. Text event location identification
The event locations of forum posts are labeled manually, and 5000 posts are labeled. An SVM event location classifier is trained on the manual labels together with the contextual and location features of each w'_i in its post text. The trained classifier then classifies each candidate w'_i in W_T as an event location or a non-event location, thereby identifying the event location of the post;
3. Multiple event location deduplication
When several event locations are identified, a word2vec model is first trained without supervision to learn distributed word vectors. These vectors are used to compute the cosine similarity between the content of the forum post and each event location. The event location with the highest cosine similarity is selected as the single event location of the post;
Step three: province determination
For the identified event location of the forum post, the place-name-to-province query dictionary is queried directly to determine the province to which the event location belongs.
2. The method for identifying the province to which the event location of a Chinese Internet forum text belongs according to claim 1, wherein:
the contextual feature of each w'_i in W_T within post text T is represented by the weight of the regular expression that w'_i matches, denoted c_i:
(1) If w'_i matches one of the regular expressions (1) to (10) in post text T, let that be the k-th regular expression r_k (k = 1, ..., 10), where the r_k are as follows:
r_1 = ^\w+occurred$ (1)
r_2 = ^\,\w$ (2)
r_3 = ^at+\w$ (3)
r_4 = ^\:\w$ (4)
r_5 = ^report+\w$ (5)
r_6 = ^explosive+\w$ (6)
r_7 = ^is+\w$ (7)
r_8 = ^report+\w$ (8)
r_9 = ^name+\\w$ (9)
r_10 = ^located\w+$ (10)
The weight for matching the k-th regular expression is given by tfidf(k):
$$c_i = \mathrm{tfidf}_{i,j}(k) = \mathrm{tf}_{i,j}(k) \times \mathrm{idf}_{i,j}(k) \qquad (11)$$
Here tfidf names the TF-IDF weighting algorithm, tf is the term frequency, idf is the inverse document frequency, and k denotes a match with the k-th regular expression;
tf_{i,j}(k) is defined as:
$$\mathrm{tf}_{i,j}(k) = \frac{n_{i,j}(k)}{N(k)}$$
where n_{i,j}(k) is the number of times w'_i matches the k-th regular expression in post text j, and N(k) is the number of times w'_i matches the k-th regular expression across all post texts in the forum;
idf_{i,j}(k) is defined as:
$$\mathrm{idf}_{i,j}(k) = \log \frac{|D|}{1 + |\{\, j : r_k \in d_j \,\}|}$$
In this formula:
- |D| is the total number of posts in the forum;
- r_k is the k-th regular expression;
- d_j is the j-th post;
- |{j : r_k ∈ d_j}| is the number of posts in the forum that contain a match of r_k.
The +1 in the denominator prevents a zero denominator when no post in the corpus contains the k-th regular expression;
(2) If w'_i matches none of the regular expressions (1) to (10) in post text T, then c_i = 0.
3. The method for identifying the province to which the event location of a Chinese Internet forum text belongs according to claim 1, wherein the location feature p_i has two cases:
(1) If the location appears in the title of post text T, p_i = 0.99;
(2) If the location appears in the non-title text of post text T, p_i is computed from loc(p_i, T) and length(T):
[Formula image not reproduced: p_i as a function of loc(p_i, T) and length(T)]
Here loc(p_i, T) is the number of words from the start of post text T to the first occurrence of w'_i, and length(T) is the total number of words in post text T.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911101388.3A CN111090994A (en) 2019-11-12 2019-11-12 Chinese-internet-forum-text-oriented event place attribution province identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911101388.3A CN111090994A (en) 2019-11-12 2019-11-12 Chinese-internet-forum-text-oriented event place attribution province identification method

Publications (1)

Publication Number Publication Date
CN111090994A true CN111090994A (en) 2020-05-01

Family

ID=70393557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911101388.3A Pending CN111090994A (en) 2019-11-12 2019-11-12 Chinese-internet-forum-text-oriented event place attribution province identification method

Country Status (1)

Country Link
CN (1) CN111090994A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731768A (en) * 2015-03-05 2015-06-24 西安交通大学城市学院 Incident location extraction method oriented to Chinese news texts
CN106055658A (en) * 2016-06-02 2016-10-26 中国人民解放军国防科学技术大学 Extraction method aiming at Twitter text event
CN106503150A (en) * 2016-10-21 2017-03-15 天津海量信息技术股份有限公司 Chinese Place Names administrative division belongs to recognition methods
CN110298039A (en) * 2019-06-20 2019-10-01 北京百度网讯科技有限公司 Recognition methods, system, equipment and the computer readable storage medium of event

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680500A (en) * 2020-06-10 2020-09-18 深圳前海微众银行股份有限公司 Address recognition method, device, equipment and computer readable storage medium
CN111680500B (en) * 2020-06-10 2023-07-14 深圳前海微众银行股份有限公司 Address recognition method, address recognition device, address recognition equipment and computer-readable storage medium
CN111813835A (en) * 2020-07-14 2020-10-23 上海元卓信息科技有限公司 Public activity center identification system based on mobile phone signaling and POI data
CN111813835B (en) * 2020-07-14 2023-09-26 上海元卓信息科技有限公司 Public activity center recognition system based on mobile phone signaling and POI data
CN113255352A (en) * 2021-05-12 2021-08-13 北京易华录信息技术股份有限公司 Street information determination method and device and computer equipment
CN113822057A (en) * 2021-08-06 2021-12-21 北京百度网讯科技有限公司 Location information determination method, location information determination device, electronic device, and storage medium
CN113822057B (en) * 2021-08-06 2022-10-18 北京百度网讯科技有限公司 Location information determination method, location information determination device, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN110442760B (en) Synonym mining method and device for question-answer retrieval system
CN108829658B (en) Method and device for discovering new words
CN109960724B (en) Text summarization method based on TF-IDF
CN107862070B (en) Online classroom discussion short text instant grouping method and system based on text clustering
CN111104794A (en) Text similarity matching method based on subject words
CN111090994A (en) Chinese-internet-forum-text-oriented event place attribution province identification method
CN106599054B (en) Method and system for classifying and pushing questions
CN107844559A (en) A kind of file classifying method, device and electronic equipment
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
US20060206306A1 (en) Text mining apparatus and associated methods
CN104199965A (en) Semantic information retrieval method
CN113962293B (en) LightGBM classification and representation learning-based name disambiguation method and system
CN104484380A (en) Personalized search method and personalized search device
CN108038099B (en) Low-frequency keyword identification method based on word clustering
CN113886604A (en) Job knowledge map generation method and system
CN109241277A (en) The method and system of text vector weighting based on news keyword
CN110750995A (en) File management method based on user-defined map
Gao et al. Sentiment classification for stock news
Tüselmann et al. Are end-to-end systems really necessary for NER on handwritten document images?
Fodil et al. Theme classification of Arabic text: A statistical approach
CN112597768B (en) Text auditing method, device, electronic equipment, storage medium and program product
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN116932736A (en) Patent recommendation method based on combination of user requirements and inverted list
CN110413985B (en) Related text segment searching method and device
Ziv et al. CompanyName2Vec: Company Entity Matching Based on Job Ads

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200501
