CN109597879B - Service behavior relation extraction method and device based on 'citation relation' data - Google Patents


Info

Publication number: CN109597879B
Authority: CN (China)
Prior art keywords: business, behavior, relation, words, corpus
Prior art date: 2018-11-30
Legal status: Active
Application number: CN201811463779.5A
Other languages: Chinese (zh)
Other versions: CN109597879A (en)
Inventor: 蓝建敏 (Lan Jianmin)
Current assignee: Excellence Information Technology Co ltd
Original assignee: Excellence Information Technology Co ltd
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2022-03-29
(Google Patents notes that the legal status, the assignees, and the priority date are assumptions, not legal conclusions, and makes no representation as to their accuracy.)

2018-11-30: Application filed by Excellence Information Technology Co ltd
2018-11-30: Priority to CN201811463779.5A
2019-04-09: Publication of CN109597879A
Application granted
2022-03-29: Publication of CN109597879B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Abstract

The invention discloses a business behavior relation extraction method and device based on "citation relation" data. The method comprises: collecting corpora, preprocessing them, and constructing a corpus; extracting business behavior words from all document titles in the corpus and classifying them by business field to form a business behavior lexicon for each field; extracting from the corpus the relation data between every document title and the titles of the documents it cites, to construct a citation relation database; and, based on the citation relation database, counting the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generating business behavior relations, and constructing a business behavior relation database. The invention makes the extracted correlations more relevant to business practice, reflects business reality more closely than the distance between individual words, and improves the accuracy of task-based knowledge retrieval.

Description

Service behavior relation extraction method and device based on 'citation relation' data
Technical Field
The invention relates to the technical field of big data mining, and in particular to a business behavior relation extraction method and device based on "citation relation" data.
Background
In studying and practicing the prior art, the inventors found the following problems.
The first traditional method: manually read a large number of documents, identify the business behaviors in them, and establish correlations among those behaviors. Building the relations between business behaviors entirely by hand involves a heavy workload, has narrow coverage, and yields inaccurate relation weights.
The second traditional method: use the word2vec algorithm to compute the distance between business behavior words and derive the correlation between business behaviors from it. Correlations computed this way have weak business relevance and cannot truly meet the need to retrieve related knowledge.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a business behavior relation extraction method and device based on "citation relation" data that make the extracted correlations more relevant to business practice, reflect business reality more closely than the distance between individual words, and improve the accuracy of task-based knowledge retrieval.
To solve the above problem, an embodiment of the invention provides a business behavior relation extraction method based on "citation relation" data, comprising:
collecting corpora, preprocessing the corpora and constructing a corpus;
extracting business behavior words from all document titles in the corpus, classifying the business behavior words according to business fields, and forming a business behavior word bank corresponding to each business field;
extracting, from the corpus, the relation data between all document titles and the titles of the documents they cite, to construct a citation relation database;
and, based on the citation relation database, counting the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generating business behavior relations, and constructing a business behavior relation database.
Further, collecting the corpora specifically comprises searching for existing corpora and downloading and crawling corpora from the internet; preprocessing the corpus specifically comprises corpus cleaning, word segmentation, part-of-speech tagging, and stop-word removal.
Further, extracting the business behavior words from all document titles in the corpus specifically comprises:
parsing all document titles in the corpus and segmenting them into words;
collecting business behavior words, including known business behavior words, continuously derived business behavior words, and business behavior words that need to be converted;
screening and testing the business behavior words;
and performing initial classification and inference on the business behavior words.
Further, extracting the relation data between all document titles and cited document titles from the corpus to construct a citation relation database specifically comprises:
analyzing the content of each document in the corpus and extracting the relation data between the document title and the titles of the documents it cites;
tagging each document with a business behavior label according to its title to form citation relation data, and constructing the citation relation database; each citation relation record comprises a document title, a behavior tag, a cited document title, and a cited behavior tag.
Another embodiment of the invention further provides a business behavior relation extraction device based on "citation relation" data, comprising:
a corpus module, configured to collect corpora, preprocess them, and construct a corpus;
a business behavior lexicon module, configured to extract business behavior words from all document titles in the corpus and classify them by business field to form a business behavior lexicon for each business field;
a citation relation database module, configured to extract the relation data between all document titles and cited document titles from the corpus and construct a citation relation database;
and a business behavior relation database module, configured to count, based on the citation relation database, the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generate business behavior relations, and construct a business behavior relation database.
Further, the corpus module is specifically configured to: search for existing corpora and download and crawl corpora from the network; and perform corpus cleaning, word segmentation, part-of-speech tagging, and stop-word removal on the corpus.
Further, the business behavior lexicon module is specifically configured to:
parse all document titles in the corpus and segment them into words;
collect business behavior words, including known business behavior words, continuously derived business behavior words, and business behavior words that need to be converted;
screen and test the business behavior words;
and perform initial classification and inference on the business behavior words.
Further, the citation relation database module is specifically configured to:
analyze the content of each document in the corpus and extract the relation data between the document title and the titles of the documents it cites;
tag each document with a business behavior label according to its title to form citation relation data, and construct the citation relation database; each citation relation record comprises a document title, a behavior tag, a cited document title, and a cited behavior tag.
Yet another embodiment of the invention further provides a business behavior relation extraction device based on "citation relation" data, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor; when the processor executes the computer program, the device implements the business behavior relation extraction method based on "citation relation" data described above.
Implementing the embodiments of the invention makes the extracted correlations more relevant to business practice, reflects business reality more closely than the distance between individual words, and improves the accuracy of task-based knowledge retrieval.
Drawings
Fig. 1 is a schematic flowchart of a business behavior relationship extraction method based on "citation relationship" data according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a business behavior relationship extraction method based on "citation relationship" data according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a business behavior relationship extraction apparatus based on "citation relationship" data according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a first aspect, referring to FIGS. 1-2, one embodiment of the invention provides a business behavior relation extraction method based on "citation relation" data, comprising:
S1: collect corpora, preprocess them, and construct a corpus.
Collecting the corpora specifically means searching for existing corpora and downloading and crawling corpora from the internet; preprocessing the corpus specifically means corpus cleaning, word segmentation, part-of-speech tagging, and stop-word removal.
In a specific embodiment, the corpus mainly consists of special policy documents and leaders' speeches issued by the central and provincial governments, collected and organized from official government websites.
It can be understood that many organizations, such as business departments and companies, accumulate a large amount of paper or electronic text as their business develops. Where permitted, these data can be consolidated, and paper documents digitized in full, to serve as the corpus.
Users can also obtain standard open datasets from China or abroad; for Chinese, examples include the Sogou corpus and the People's Daily corpus. Alternatively, a crawler can be written to capture data before proceeding with the subsequent steps.
In one embodiment, corpus preprocessing accounts for roughly 50%-70% of the workload of a complete Chinese natural language processing application, so developers spend most of their time on it. Preprocessing comprises four main parts: data cleaning, word segmentation, part-of-speech tagging, and stop-word removal.
1. Corpus cleaning
Data cleaning, as the name implies, means keeping the content of interest in the corpus and deleting uninteresting content as noise. This includes extracting titles, abstracts, and body text from the original documents and, for crawled web pages, removing advertisements, tags, HTML and JavaScript (JS) code, comments, and the like. Common cleaning methods include manual deduplication, alignment, deletion, and annotation; rule-based content extraction; regular-expression matching; extraction by part of speech and named entity; and batch processing with scripts or code.
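As a minimal sketch of the rule-based, regular-expression cleaning described above, the following Python removes scripts, styles, HTML comments, and tags from a crawled page, strips caller-supplied noise patterns (for example advertisement lines), and deduplicates the cleaned documents. The function names and the sample noise pattern are hypothetical; a production cleaner would more likely use a real HTML parser than regular expressions.

```python
import html
import re

# Patterns for content that is treated as noise in crawled pages.
SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def clean_page(page, noise_patterns=()):
    """Return the visible text of a crawled HTML page with noise removed (hypothetical helper)."""
    text = SCRIPT_OR_STYLE.sub(" ", page)  # drop JS/CSS code blocks entirely
    text = COMMENT.sub(" ", text)          # drop HTML comments
    text = TAG.sub(" ", text)              # drop remaining tags but keep their text
    text = html.unescape(text)             # turn entities such as &nbsp; into characters
    for pattern in noise_patterns:         # caller-supplied noise, e.g. advertisement lines
        text = re.sub(pattern, " ", text)
    return WHITESPACE.sub(" ", text).strip()


def deduplicate(docs):
    """Drop empty and exact-duplicate documents, keeping the first occurrence."""
    seen, unique = set(), []
    for doc in docs:
        if doc and doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique


page = ("<html><head><script>var x = 1;</script></head><body><!-- ad -->"
        "<h1>关于加强监管的通知</h1><p>广告：点击购买</p><p>正文&nbsp;内容</p></body></html>")
print(clean_page(page, noise_patterns=[r"广告：\S+"]))
# 关于加强监管的通知 正文 内容
```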
2. Word segmentation
Chinese corpus data consist of short or long texts, such as a sentence, an abstract, a paragraph, or an entire article. Within sentences and paragraphs, words are written contiguously without delimiters, yet each carries its own meaning. For text mining, the smallest unit of processing is usually expected to be a character or a word, so the whole text must first be segmented into words.
Common word segmentation approaches are string-matching-based, understanding-based, statistics-based, and rule-based methods, each with several concrete algorithms.
The main difficulties in current Chinese word segmentation are ambiguity resolution and new-word recognition. For example, 羽毛球拍卖完了 can be segmented as 羽毛球拍 / 卖 / 完了 ("the badminton rackets are sold out") or as 羽毛球 / 拍卖 / 完了 ("the badminton auction is over"); without relying on the surrounding sentences, it is hard to tell which reading is intended.
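The ambiguity above can be reproduced with forward and backward maximum matching, two classic string-matching segmenters. This is an illustrative sketch: the tiny dictionary and the maximum word length of 4 are assumptions, and real segmenters combine large dictionaries with statistical disambiguation.

```python
def forward_max_match(text, vocab, max_len=4):
    """Forward maximum matching: repeatedly take the longest dictionary word from the left."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len=4):
    """Backward maximum matching: repeatedly take the longest dictionary word from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words


# Hypothetical toy dictionary.
VOCAB = {"羽毛球", "羽毛球拍", "球拍", "拍卖", "卖", "完了"}
print(forward_max_match("羽毛球拍卖完了", VOCAB))   # ['羽毛球拍', '卖', '完了']
print(backward_max_match("羽毛球拍卖完了", VOCAB))  # ['羽毛球', '拍卖', '完了']
```

Because the two directions disagree on this sentence, a dictionary alone cannot resolve the ambiguity; this is one reason string matching is usually combined with statistical or context-based methods.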
3. Part-of-speech tagging
Part-of-speech tagging assigns a part-of-speech class, such as adjective, verb, or noun, to each word, so that later processing can draw on more linguistic information. It is a classic sequence labeling problem, although not every Chinese natural language processing task requires it: ordinary text classification, for example, ignores parts of speech, whereas tasks such as sentiment analysis and knowledge inference need them. The following figure shows a common Chinese part-of-speech classification.
Common part-of-speech tagging methods are either rule-based or statistics-based. Statistics-based methods include maximum-entropy tagging, tagging by maximum statistical probability, and HMM-based tagging.
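A minimal sketch of tagging by maximum statistical probability: each word receives the tag it carried most often in a tagged training corpus, and unseen words fall back to a default tag. The training data, the tag names (v, vn, n), and the noun fallback are illustrative assumptions; HMM or maximum-entropy taggers additionally model the surrounding context.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def pos_tag(tokens, model, default_tag="n"):
    """Tag tokens; words unseen in training get default_tag (assumed to be noun)."""
    return [(token, model.get(token, default_tag)) for token in tokens]


# Hypothetical toy training corpus (v = verb, vn = verbal noun, n = noun).
TRAINING = [
    [("加强", "v"), ("管理", "vn")],
    [("管理", "v"), ("企业", "n")],
    [("加强", "v"), ("监管", "vn")],
    [("企业", "n"), ("管理", "vn")],
]
model = train_most_frequent_tag(TRAINING)
print(pos_tag(["加强", "管理", "制度"], model))
# [('加强', 'v'), ('管理', 'vn'), ('制度', 'n')]
```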
4. Stop words
Stop words are words that contribute little to text features, such as punctuation, modal particles, and personal pronouns, so in general text processing the step after word segmentation is stop-word removal. For Chinese, however, stop-word removal is not fixed: the stop-word dictionary depends on the scenario. In sentiment analysis, for example, modal particles and exclamation marks should be kept, because they help convey the intensity of tone and the emotional coloring.
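Because the stop-word list depends on the scenario, a simple approach is to pair one base list with a per-scenario set of words to keep. The sketch below assumes a tiny hypothetical stop list; real lists are much larger.

```python
def remove_stop_words(tokens, stop_words, keep=frozenset()):
    """Drop tokens found in stop_words, except those the scenario explicitly keeps."""
    return [t for t in tokens if t not in stop_words or t in keep]


# Hypothetical stop list and segmented sentence.
STOP_WORDS = {"的", "了", "啊", "！", "这个"}
tokens = ["这个", "服务", "太", "差", "了", "啊", "！"]

print(remove_stop_words(tokens, STOP_WORDS))
# ['服务', '太', '差']
print(remove_stop_words(tokens, STOP_WORDS, keep={"啊", "！"}))  # sentiment analysis
# ['服务', '太', '差', '啊', '！']
```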
S2: extract business behavior words from all document titles in the corpus and classify them by business field to form a business behavior lexicon for each business field.
Extracting the business behavior words from all document titles in the corpus specifically comprises:
parsing all document titles in the corpus and segmenting them into words;
collecting business behavior words, including known business behavior words, continuously derived business behavior words, and business behavior words that need to be converted;
screening and testing the business behavior words;
and performing initial classification and inference on the business behavior words.
In a specific embodiment, well-targeted business behavior words help high-conversion users find an entry point.
The method mainly comprises the following steps:
1. Collect business behavior words.
(1) Continuously derived business behavior words;
(2) existing business behavior words that most people are unaware of (that is, words not known to have a conversion rate). Such words only surface when users search for them, and the core words are identified from those searches.
2. Screen the business behavior words.
As new words keep appearing and old ones keep disappearing, the system continually checks whether new words have emerged. Not every word is useful: words that clearly do not meet users' needs should be cut, and obviously useless business behavior words are removed. Words whose usefulness cannot be judged at this stage are evaluated in the testing step.
3. Test the business behavior words.
A testing tool is used to check the conversion rate of each business behavior word. Conversion rate should not be judged in isolation, however, because every link in the chain, such as customer service and website content, affects whether it is high or low. Words that pass the test are retained as valid business behavior words.
4. Classify the business behavior words and perform inference on them.
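The collect, screen, test, and classify sub-steps of S2 can be sketched as one filtering pass over segmented titles. This is only an illustration under stated assumptions: the seed lexicon, the substring rule used to detect "derived" words, the rejection list, the conversion scores, and the 0.1 threshold are all hypothetical, and the inference part of step 4 is not modeled.

```python
from collections import defaultdict


def build_lexicons(segmented_titles, known_words, rejected, conversion, threshold=0.1):
    """Return {business_field: sorted business behavior words}.

    segmented_titles: token lists for each document title (after word segmentation)
    known_words:      hypothetical seed lexicon {word: business_field}
    rejected:         words screened out as obviously useless
    conversion:       hypothetical {word: score} reported by a testing tool
    threshold:        hypothetical minimum score for a word to count as valid
    """
    lexicons = defaultdict(set)
    for tokens in segmented_titles:
        for token in tokens:
            # Collect: exact match with a known word, or a word derived from one.
            field = known_words.get(token)
            if field is None:
                field = next((f for w, f in known_words.items() if w in token), None)
            # Screen: skip non-behavior tokens and words judged useless.
            if field is None or token in rejected:
                continue
            # Test: keep only words whose measured score passes the threshold.
            if conversion.get(token, 0.0) >= threshold:
                lexicons[field].add(token)
    # Classify: one lexicon per business field.
    return {field: sorted(words) for field, words in lexicons.items()}


titles = [["关于", "加强", "安全监管", "的", "通知"], ["企业", "登记", "管理", "办法"]]
known = {"监管": "market_supervision", "登记": "business_registration"}
print(build_lexicons(titles, known, rejected=set(), conversion={"安全监管": 0.3, "登记": 0.5}))
# {'market_supervision': ['安全监管'], 'business_registration': ['登记']}
```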
S3: extract the relation data between all document titles and cited document titles from the corpus to construct a citation relation database. Specifically:
analyze the content of each document in the corpus and extract the relation data between the document title and the titles of the documents it cites;
tag each document with a business behavior label according to its title to form citation relation data, and construct the citation relation database; each citation relation record comprises a document title, a behavior tag, a cited document title, and a cited behavior tag.
In a specific embodiment, when a document is processed, its citation relation data are extracted from the library of referenced documents into the citation relation database.
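A minimal sketch of S3, assuming (hypothetically) that cited documents are referred to by titles wrapped in Chinese title marks 《》, as is common in official documents, and that a document's behavior tag is simply the first lexicon word found in its title. Each record carries the four fields listed above.

```python
import re
from typing import NamedTuple

# Assumption: cited titles appear in the body wrapped in 《》 title marks.
CITED_TITLE = re.compile(r"《([^《》]+)》")


class Citation(NamedTuple):
    title: str
    behavior_tag: str
    cited_title: str
    cited_behavior_tag: str


def tag_title(title, lexicon):
    """Hypothetical tagging rule: first lexicon word contained in the title, else ''."""
    return next((word for word in lexicon if word in title), "")


def extract_citations(title, body, lexicon):
    """Build one citation record per distinct cited title, in order of first appearance."""
    records = []
    for cited in dict.fromkeys(CITED_TITLE.findall(body)):  # dedupe, keep order
        records.append(Citation(title, tag_title(title, lexicon), cited, tag_title(cited, lexicon)))
    return records


lexicon = ["监管", "登记", "食品安全"]  # hypothetical behavior words
for record in extract_citations("关于加强食品安全监管的通知",
                                "根据《食品安全法》和《企业登记管理办法》，依照《食品安全法》执行。",
                                lexicon):
    print(record)
```

Tagging by first match is a deliberate simplification; a title that contains several behavior words would, in practice, need a ranking or multi-label rule.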
S4: based on the citation relation database, count the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generate business behavior relations, and construct a business behavior relation database.
In a specific embodiment, the correlation between business behaviors is evaluated by the number of times they co-occur.
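A sketch of S4 under stated assumptions: the input is the (behavior tag, cited behavior tag) pairs from the citation relation database, and occurrences and co-occurrences are counted with `collections.Counter`. As a hypothetical weighting not specified in the source, each relation is also scored as the share of records with the citing tag that cite the given tag; the raw co-occurrence count, which the source uses to evaluate correlation, is returned alongside it.

```python
from collections import Counter


def build_relations(tag_pairs):
    """tag_pairs: (behavior_tag, cited_behavior_tag) pairs from the citation relation database.

    Returns (relations, tag_count, cited_count), where each relation is
    (tag, cited_tag, co_occurrences, weight) and weight is a hypothetical
    normalization: co_occurrences / occurrences of tag.
    """
    tag_count, cited_count, pair_count = Counter(), Counter(), Counter()
    for tag, cited_tag in tag_pairs:
        if not tag or not cited_tag:
            continue  # skip records where either side has no behavior tag
        tag_count[tag] += 1
        cited_count[cited_tag] += 1
        pair_count[(tag, cited_tag)] += 1
    relations = [
        (tag, cited_tag, n, n / tag_count[tag])
        for (tag, cited_tag), n in pair_count.most_common()  # strongest first
    ]
    return relations, tag_count, cited_count


pairs = [("监管", "食品安全"), ("监管", "食品安全"), ("监管", "登记"), ("登记", "监管"), ("", "登记")]
relations, tag_count, cited_count = build_relations(pairs)
print(relations[0])
```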
Compared with building business behavior relations manually or with the word2vec algorithm, the method has the following advantages:
(1) it combines human work with machine processing, which is more efficient than purely manual construction;
(2) the strong business relations implied by citation relation data are closer to business reality than the distance between individual words, so they are stronger than business relations built with the word2vec algorithm.
Implementing the embodiments of the invention makes the extracted correlations more relevant to business practice, reflects business reality more closely than the distance between individual words, and improves the accuracy of task-based knowledge retrieval.
In a second aspect, as shown in Fig. 3, another embodiment of the invention further provides a business behavior relation extraction device based on "citation relation" data, comprising:
a corpus module 21, configured to collect corpora, preprocess them, and construct a corpus.
The corpus module 21 is specifically configured to: search for existing corpora and download and crawl corpora from the network; and perform corpus cleaning, word segmentation, part-of-speech tagging, and stop-word removal on the corpus.
In a specific embodiment, the corpus mainly consists of special policy documents and leaders' speeches issued by the central and provincial governments, collected and organized from official government websites.
It can be understood that many organizations, such as business departments and companies, accumulate a large amount of paper or electronic text as their business develops. Where permitted, these data can be consolidated, and paper documents digitized in full, to serve as the corpus.
Users can also obtain standard open datasets from China or abroad; for Chinese, examples include the Sogou corpus and the People's Daily corpus. Alternatively, a crawler can be written to capture data before proceeding with the subsequent steps.
In one embodiment, corpus preprocessing accounts for roughly 50%-70% of the workload of a complete Chinese natural language processing application, so developers spend most of their time on it. Preprocessing comprises four main parts: data cleaning, word segmentation, part-of-speech tagging, and stop-word removal.
A business behavior lexicon module 22, configured to extract business behavior words from all document titles in the corpus and classify them by business field to form a business behavior lexicon for each business field.
The business behavior lexicon module 22 is specifically configured to:
parse all document titles in the corpus and segment them into words;
collect business behavior words, including known business behavior words, continuously derived business behavior words, and business behavior words that need to be converted;
screen and test the business behavior words;
and perform initial classification and inference on the business behavior words.
In a specific embodiment, well-targeted business behavior words help high-conversion users find an entry point.
The method mainly comprises the following steps:
1. Collect business behavior words.
(1) Continuously derived business behavior words;
(2) existing business behavior words that most people are unaware of (that is, words not known to have a conversion rate). Such words only surface when users search for them, and the core words are identified from those searches.
2. Screen the business behavior words.
As new words keep appearing and old ones keep disappearing, the system continually checks whether new words have emerged. Not every word is useful: words that clearly do not meet users' needs should be cut, and obviously useless business behavior words are removed. Words whose usefulness cannot be judged at this stage are evaluated in the testing step.
3. Test the business behavior words.
A testing tool is used to check the conversion rate of each business behavior word. Conversion rate should not be judged in isolation, however, because every link in the chain, such as customer service and website content, affects whether it is high or low. Words that pass the test are retained as valid business behavior words.
4. Classify the business behavior words and perform inference on them.
A citation relation database module 23, configured to extract the relation data between all document titles and cited document titles from the corpus to construct a citation relation database.
The citation relation database module 23 is specifically configured to:
analyze the content of each document in the corpus and extract the relation data between the document title and the titles of the documents it cites;
tag each document with a business behavior label according to its title to form citation relation data, and construct the citation relation database; each citation relation record comprises a document title, a behavior tag, a cited document title, and a cited behavior tag.
In a specific embodiment, when a document is processed, its citation relation data are extracted from the library of referenced documents into the citation relation database.
And a business behavior relation database module 24, configured to count, based on the citation relation database, the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generate business behavior relations, and construct a business behavior relation database.
In a specific embodiment, the correlation between business behaviors is evaluated by the number of times they co-occur.
Implementing the embodiments of the invention makes the extracted correlations more relevant to business practice, reflects business reality more closely than the distance between individual words, and improves the accuracy of task-based knowledge retrieval.
Yet another embodiment of the invention further provides a business behavior relation extraction device based on "citation relation" data, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor; when the processor executes the computer program, the device implements the business behavior relation extraction method based on "citation relation" data described above.
The foregoing is directed to the preferred embodiment of the present invention, and it is understood that various changes and modifications may be made by one skilled in the art without departing from the spirit of the invention, and it is intended that such changes and modifications be considered as within the scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (7)

1. A business behavior relation extraction method based on "citation relation" data, characterized by comprising the following steps:
collecting corpora, preprocessing the corpora, and constructing a corpus, wherein collecting the corpora comprises searching for existing corpora and downloading and crawling corpora from the internet, and preprocessing the corpus specifically comprises corpus cleaning, word segmentation, part-of-speech tagging, and stop-word removal;
extracting business behavior words from all document titles in the corpus, classifying the business behavior words according to business fields, and forming a business behavior word bank corresponding to each business field;
extracting, from the corpus, the relation data between all document titles and the titles of the cited documents to construct a citation relation database;
and, based on the citation relation database, counting the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generating business behavior relations, and constructing a business behavior relation database.
2. The business behavior relation extraction method based on "citation relation" data according to claim 1, wherein extracting the business behavior words from all document titles in the corpus specifically comprises:
parsing all document titles in the corpus and segmenting them into words;
collecting business behavior words, including known business behavior words, continuously derived business behavior words, and business behavior words that need to be converted;
screening and testing the business behavior words;
and performing initial classification and inference on the business behavior words.
3. The business behavior relation extraction method based on "citation relation" data according to claim 1, wherein extracting the relation data between all document titles and cited document titles from the corpus to construct a citation relation database specifically comprises:
analyzing the content of each document in the corpus and extracting the relation data between the document title and the titles of the documents it cites;
tagging each document with a business behavior label according to its title to form citation relation data, and constructing the citation relation database; each citation relation record comprises a document title, a behavior tag, a cited document title, and a cited behavior tag.
4. A business behavior relation extraction device based on "citation relation" data, characterized by comprising:
a corpus module, configured to collect corpora, preprocess them, and construct a corpus;
the corpus module being specifically configured to: search for existing corpora and download and crawl corpora from the network; and perform corpus cleaning, word segmentation, part-of-speech tagging, and stop-word removal on the corpus;
a business behavior lexicon module, configured to extract business behavior words from all document titles in the corpus and classify them by business field to form a business behavior lexicon for each business field;
a citation relation database module, configured to extract the relation data between all document titles and cited document titles from the corpus and construct a citation relation database;
and a business behavior relation database module, configured to count, based on the citation relation database, the occurrences of the business behavior words and the cited business behavior words and the number of times they co-occur, generate business behavior relations, and construct a business behavior relation database.
5. The device for extracting business behavior relationship based on "citation relationship" data as claimed in claim 4, wherein the business behavior thesaurus module is specifically configured to:
analyzing and word segmentation are carried out on all file titles in the corpus;
collecting service behavior words comprising known service behavior words, continuously derived service behavior words and service behavior words needing to be converted;
screening and testing business behavior words;
and carrying out initial classification and reasoning on the business behavior words.
6. The business behavior relation extraction device based on 'citation relation' data according to claim 4, wherein the citation relation database module is specifically configured for:
analyzing the content of each file in the corpus, and extracting relation data between the file title and the titles of the files it cites;
labeling each file with a business behavior tag according to its title to form citation relation data, and constructing a citation relation database, wherein the citation relation data comprises a file title, a behavior tag, a cited file title and a cited behavior tag.
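Claim 6 does not specify how cited titles are located in a file's content. One plausible approach, sketched here in Python under that assumption, relies on the convention that Chinese documents enclose the titles of other documents in book-title marks 《 》. `extract_citation_pairs` and `CITED_TITLE` are hypothetical names; a production extractor would likely also need to handle abbreviated titles and article-level references.

```python
import re
from typing import List, Tuple

# Assumption: cited document titles appear between Chinese book-title marks 《 》.
CITED_TITLE = re.compile(r"《([^《》]+)》")


def extract_citation_pairs(file_title: str, content: str) -> List[Tuple[str, str]]:
    """Return (file title, cited file title) pairs found in one document's text."""
    cited_titles: List[str] = []
    for cited in CITED_TITLE.findall(content):
        # Drop self-references and repeated mentions, keeping first-seen order.
        if cited != file_title and cited not in cited_titles:
            cited_titles.append(cited)
    return [(file_title, cited) for cited in cited_titles]
```

Each returned pair is exactly the (file title, cited file title) relation data that the tagging step then enriches with behavior tags.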
7. A business behavior relation extraction device based on 'citation relation' data, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the business behavior relation extraction method based on 'citation relation' data according to any one of claims 1 to 3.
CN201811463779.5A 2018-11-30 2018-11-30 Service behavior relation extraction method and device based on 'citation relation' data Active CN109597879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811463779.5A CN109597879B (en) 2018-11-30 2018-11-30 Service behavior relation extraction method and device based on 'citation relation' data

Publications (2)

Publication Number Publication Date
CN109597879A 2019-04-09
CN109597879B 2022-03-29

Family

ID=65959447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811463779.5A Active CN109597879B (en) 2018-11-30 2018-11-30 Service behavior relation extraction method and device based on 'citation relation' data

Country Status (1)

Country Link
CN (1) CN109597879B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516069B (en) * 2019-08-28 2023-07-25 中南大学 Fasttext-CRF-based quotation metadata extraction method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104537063A (en) * 2014-12-29 2015-04-22 北京理工大学 Knowledge venation map construction system and method based on thesis citation network
CN105631018A (en) * 2015-12-29 2016-06-01 上海交通大学 Article feature extraction method based on topic model
CN108509481A (en) * 2018-01-18 2018-09-07 天津大学 Research frontier visual analysis method based on document co-citation clustering

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP2000222427A (en) * 1999-02-02 2000-08-11 Mitsubishi Electric Corp Related word extracting device, related word extracting method and recording medium with related word extraction program recorded therein
CN104035975B (en) * 2014-05-23 2017-07-25 华东师范大学 Method for distantly supervised person relation extraction using Chinese online resources
CN105653706B (en) * 2015-12-31 2018-04-06 北京理工大学 Multi-layer citation recommendation method based on a literature content knowledge graph

Non-Patent Citations (2)

Title
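Keyword extraction using citation information; Chen Chong (陈翀) et al.; Library and Information Service (《图书情报工作》); 2014-01-31; Vol. 58, No. 1; full text *
Research on an automatic keyword extraction method based on citation context information; Song Ning (宋宁) et al.; Information Studies: Theory & Application (《情报理论与实践》); 2016-11-30; Vol. 39, No. 11; pp. 121-123, Figs. 1 and 4 *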

Also Published As

Publication number Publication date
CN109597879A (en) 2019-04-09

Similar Documents

Publication Publication Date Title
CN109189942B (en) Construction method and device of patent data knowledge graph
Gokulakrishnan et al. Opinion mining and sentiment analysis on a twitter data stream
CN110263248B (en) Information pushing method, device, storage medium and server
RU2704531C1 (en) Method and apparatus for analyzing semantic information
CN107102993B (en) User appeal analysis method and device
CN111177532A (en) Vertical search method, device, computer system and readable storage medium
CN111310476B (en) Public opinion monitoring method and system using aspect-based emotion analysis method
CN111783394A (en) Training method of event extraction model, event extraction method, system and equipment
Rashid et al. Feature level opinion mining of educational student feedback data using sequential pattern mining and association rule mining
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN111460162B (en) Text classification method and device, terminal equipment and computer readable storage medium
CN111325018A (en) Domain dictionary construction method based on web retrieval and new word discovery
Bhakuni et al. Evolution and evaluation: Sarcasm analysis for twitter data using sentiment analysis
Bouchlaghem et al. A machine learning approach for classifying sentiments in Arabic tweets
CN111966792A (en) Text processing method and device, electronic equipment and readable storage medium
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN109597879B (en) Service behavior relation extraction method and device based on 'citation relation' data
Shah et al. An automatic text summarization on Naive Bayes classifier using latent semantic analysis
CN110705285A (en) Government affair text subject word bank construction method, device, server and readable storage medium
CN115640439A (en) Method, system and storage medium for network public opinion monitoring
CN111753540B (en) Method and system for collecting text data to perform Natural Language Processing (NLP)
Patel et al. Influence of Gujarati STEmmeR in supervised learning of web page categorization
Altınel et al. Performance Analysis of Different Sentiment Polarity Dictionaries on Turkish Sentiment Detection
TWI534640B (en) Chinese network information monitoring and analysis system and its method
CN112560425A (en) Template generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant