CN110619117B - Keyword extraction method and device - Google Patents

Keyword extraction method and device

Info

Publication number
CN110619117B
Authority
CN
China
Prior art keywords
word
article
value
words
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810630477.6A
Other languages
Chinese (zh)
Other versions
CN110619117A (en)
Inventor
潘岸腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN201810630477.6A
Publication of CN110619117A
Application granted
Publication of CN110619117B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention provides a keyword extraction method and device. The method comprises the following steps: performing word segmentation on each article in the article set to be processed to acquire words contained in each article; determining, for each of the terms, a set of articles containing the term; determining a value of each word according to the set, wherein the value is used for representing the capability of each word to distinguish a theme; and determining the keywords of each article in the article set to be processed according to the value of each word. The keywords which can most represent the article theme can be obtained without establishing a keyword database in advance, and the reliability of keyword extraction is improved.

Description

Keyword extraction method and device
Technical Field
The present invention relates to data processing technologies, and in particular, to a keyword extraction method and apparatus.
Background
In information-type mobile applications, articles need to be classified or labeled so that they can be categorized, which provides basic data support for later delivering personalized push services to different users. The most basic task in classifying or labeling articles is extracting the keywords of the articles.
The prior art provides a keyword extraction method in which at least one group of phrases is extracted from the text, each group of phrases is matched against a preset keyword database, and the words in the group with the highest matching degree are used as the keywords of the text.
However, the keywords extracted by this method cannot accurately represent the subject of the text, and their reliability is low.
Disclosure of Invention
The invention provides a keyword extraction method and a keyword extraction device, which are used for improving the reliability of keywords.
The invention provides a keyword extraction method, which comprises the following steps:
performing word segmentation on each article in the article set to be processed to acquire words contained in each article;
determining, for each of the terms, a set of articles containing the term;
determining a value of each word according to the set, wherein the value is used for representing the capability of each word to distinguish a theme;
and determining the keywords of each article in the article set to be processed according to the value of each word.
Optionally, the determining the value of each word according to the set includes:
according to the set, determining a correlation coefficient of each word and other words, wherein the correlation coefficient is used for characterizing the capability of distinguishing a theme by each word and other words together;
and determining the value of each word according to the correlation coefficient.
Optionally, the determining, according to the set, a correlation coefficient of each word and other words includes:
using the formula:
$$\mathrm{Sim}_{k,l} = \frac{|U_k \cap U_l|}{|U_k \cup U_l|}$$
to calculate the correlation coefficient of each word and other words, wherein $\mathrm{Sim}_{k,l}$ is the correlation coefficient of word k and word l, $U_k$ is the set of articles containing word k, and $U_l$ is the set of articles containing word l.
Optionally, the determining the value of each word according to the correlation coefficient includes:
using the formula:
$$\mathrm{Value}_k = \frac{\sum_{l \neq k} \mathrm{Sim}_{k,l}}{|Q|}$$
to determine the value of each of the words, wherein $\mathrm{Value}_k$ is the value of word k, and Q is the set of all words.
Optionally, the determining, according to the value of each word, a keyword of each article in the article set to be processed includes:
using the formula:
$$\mathrm{Kvalue}_{i,k} = \mathrm{Value}_k \times \mathrm{count}_{i,k}$$
to calculate the value of each word in each article, wherein $\mathrm{Kvalue}_{i,k}$ is the value of word k in article i, and $\mathrm{count}_{i,k}$ is the number of occurrences of word k in article i;
and, for each article, sorting the words contained in the article by value in descending order and taking a preset number of top-ranked words as the keywords.
Optionally, the method further comprises:
each article is classified based on its keywords.
Optionally, the term is a noun, a verb, or an adjective.
The invention provides a keyword extraction device, comprising:
the acquisition module is used for carrying out word segmentation processing on each article in the article collection to be processed to acquire words contained in each article;
a determining module, configured to determine, for each of the terms, a set of articles that include the term;
the determining module is further configured to determine a value of each term according to the set, where the value is used to characterize an ability of each term to distinguish topics;
and the determining module is also used for determining the keywords of each article in the article set to be processed according to the value of each word.
Optionally, the determining module is specifically configured to determine, according to the set, a correlation coefficient of each word and other words, where the correlation coefficient indicates the probability that each word and the other words occur at the same time; and is further specifically configured to determine the value of each word according to the correlation coefficient.
The invention provides a computer readable storage medium, wherein computer execution instructions are stored in the computer readable storage medium, and when a processor executes the computer execution instructions, the keyword extraction method is realized.
The invention provides a keyword extraction device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to implement the keyword extraction method described above via execution of the executable instructions.
According to the keyword extraction method provided by the invention, firstly, word segmentation is carried out on each article in the article set to be processed, so that words contained in each article are obtained; then, for each word, determining a set of articles containing the word; then determining the value of each word according to the set; and finally, determining the keywords of each article in the article set to be processed according to the value of each word. The keywords which can most represent the article theme can be obtained without establishing a keyword database in advance, and the reliability of keyword extraction is improved.
Drawings
FIG. 1 is a flowchart of a first embodiment of a keyword extraction method provided by the present invention;
FIG. 2 is a flowchart of a second embodiment of a keyword extraction method provided by the present invention;
FIG. 3 is a schematic structural diagram of a first embodiment of a keyword extraction apparatus provided by the present invention;
FIG. 4 is a schematic hardware structure diagram of the keyword extraction apparatus provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
For information-type mobile applications, the most basic task in classifying or labeling articles is extracting article keywords. The prior art provides a keyword extraction method in which at least one group of phrases is extracted from the text, each group of phrases is matched against a preset keyword database, and the words in the group with the highest matching degree are used as the keywords of the text. However, this method depends to some extent on a pre-established keyword database. When the keyword database cannot be updated in time, the extracted keywords often fail to accurately represent the subject the text is meant to express, and classifying the text based on those keywords is then prone to classification errors.
The invention provides a keyword extraction method, which can obtain keywords which can most represent the subject of an article by directly calculating the capability of distinguishing the subject of each word contained in the article without establishing a keyword database in advance, thereby improving the reliability of keyword extraction.
The following describes the technical scheme of the present invention and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a first embodiment of a keyword extraction method provided by the present invention, as shown in fig. 1, where the keyword extraction method provided by the present embodiment includes:
S101, performing word segmentation processing on each article in an article set to be processed to acquire the words contained in each article.
Word segmentation, as the name implies, refers to segmenting a sequence of Chinese characters into individual words. In this embodiment, the words obtained after word segmentation may be nouns, verbs, or adjectives.
For example, given the Chinese character sequence "the input method with the most accurate typing and the most personalized interface", the words obtained after word segmentation are: typing, accurate, interface, personalized, and input method.
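The part-of-speech restriction described above can be sketched as a simple filter. This is a minimal illustration only: it assumes a segmenter has already produced (word, tag) pairs, and the tag letters and the hand-assigned tags below are hypothetical, not taken from the source.

```python
# Hypothetical tag set: "n" noun, "v" verb, "a" adjective, "d" adverb.
KEPT_POS = {"n", "v", "a"}

def keep_content_words(tagged_tokens):
    """Return the words whose part of speech is noun, verb, or adjective."""
    return [word for word, pos in tagged_tokens if pos in KEPT_POS]

# Tokens of the example phrase, tagged by hand for illustration only.
tagged = [
    ("typing", "v"), ("most", "d"), ("accurate", "a"),
    ("interface", "n"), ("most", "d"), ("personalized", "a"),
    ("input method", "n"),
]
words = keep_content_words(tagged)
print(words)
```

Only the filtering step is shown; the segmentation itself would come from whatever Chinese word segmenter is in use.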
The article set to be processed contains at least one article. Taking a set of 4 articles as an example, the words obtained after word segmentation of the 4 articles are shown in Table 1.
Article 1 | Word a | Word b | Word k | Word l
Article 2 | Word a | Word k | Word l | Word c
Article 3 | Word a | Word c | Word d | Word e
Article 4 | Word a | Word b | Word c | Word l
TABLE 1
As shown in table 1, after the article 1 is subjected to word segmentation, the obtained words are: word a, word b, word k, word l; after the article 2 is subjected to word segmentation, the obtained words are as follows: word a, word k, word l, word c; after the article 3 is subjected to word segmentation, the obtained words are as follows: word a, word c, word d, word e; after the article 4 is subjected to word segmentation, the obtained words are as follows: word a, word b, word c, word l.
It should be noted that: table 1 is only illustrative, and the number of articles in the article set to be processed in this embodiment may be more, which is not limited in the present invention.
S102, determining, for each word, a set formed by the articles containing the word.
Continuing with Table 1 from S101, a set of articles containing the word is determined for each word in Table 1, namely word a, word b, word c, word d, word e, word k, and word l. As can be seen from Table 1:
for word a, the set of articles containing word a is $U_a$ = {article 1, article 2, article 3, article 4};
for word b, the set of articles containing word b is $U_b$ = {article 1, article 4};
for word c, the set of articles containing word c is $U_c$ = {article 2, article 3, article 4};
for word d, the set of articles containing word d is $U_d$ = {article 3};
for word e, the set of articles containing word e is $U_e$ = {article 3};
for word k, the set of articles containing word k is $U_k$ = {article 1, article 2};
for word l, the set of articles containing word l is $U_l$ = {article 1, article 2, article 4}.
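The construction of these sets from Table 1 can be sketched as an inverted index. The function and variable names are illustrative assumptions; articles are identified by their numbers and words by their letters.

```python
from collections import defaultdict

# Table 1: the words obtained from each article after word segmentation.
articles = {
    1: ["a", "b", "k", "l"],
    2: ["a", "k", "l", "c"],
    3: ["a", "c", "d", "e"],
    4: ["a", "b", "c", "l"],
}

def build_article_sets(articles):
    """Map each word to the set of article numbers that contain it."""
    sets = defaultdict(set)
    for article_id, words in articles.items():
        for word in words:
            sets[word].add(article_id)
    return dict(sets)

U = build_article_sets(articles)
for word in sorted(U):
    print(f"U_{word} = {sorted(U[word])}")
```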
S103, determining the value of each word according to the set.
Wherein the value is used for representing the capability of distinguishing the theme of each word, and the higher the value of the word is, the stronger the capability of distinguishing the theme of the word is.
S104, determining keywords of each article in the article set to be processed according to the value of each word.
After the value of each term is obtained in S103, the value of each term in the corresponding article may be calculated according to the value, and then, for each article, the values of the terms contained in the article are ranked from large to small, and the top several terms are used as keywords of the article.
According to the keyword extraction method provided by the embodiment, firstly, word segmentation is carried out on each article in the article to be processed set, so that words contained in each article are obtained; then, for each word, determining a set of articles containing the word; then determining the value of each word according to the set; and finally, determining the keywords of each article in the article set to be processed according to the value of each word. The keywords which can most represent the article theme can be obtained without establishing a keyword database in advance, and the reliability of keyword extraction is improved.
Fig. 2 is a flowchart of a second embodiment of the keyword extraction method provided by the present invention, and on the basis of the foregoing embodiment, as an implementation manner, in the keyword extraction method provided by the present embodiment, S103 specifically includes:
S201, determining, according to the set, the correlation coefficient of each word and other words, wherein the correlation coefficient is used for representing the capability of each word and other words to jointly distinguish a theme.
Wherein, from the set, methods of determining the correlation coefficient of each term and other terms include, but are not limited to, the following methods.
Specifically, the following formula can be used:
$$\mathrm{Sim}_{k,l} = \frac{|U_k \cap U_l|}{|U_k \cup U_l|} \tag{1}$$
to calculate the correlation coefficient of each word and other words, wherein $\mathrm{Sim}_{k,l}$ is the correlation coefficient of word k and word l, $U_k$ is the set of articles containing word k, and $U_l$ is the set of articles containing word l.
Taking the words in Table 1 of the above embodiment as an example, to determine the correlation coefficient between word k and word l: from S102, the set of articles containing word k is $U_k$ = {article 1, article 2}, and the set of articles containing word l is $U_l$ = {article 1, article 2, article 4}. Since the intersection of $U_k$ and $U_l$ is {article 1, article 2} and the union of $U_k$ and $U_l$ is {article 1, article 2, article 4}, the correlation coefficient of word k and word l determined by formula (1) is 2/3.
The larger the correlation coefficient of word k and word l, the greater the probability that word k and word l appear together, the greater the probability that they point to a particular theme, and the stronger their ability to jointly distinguish the theme.
According to the above method, the correlation coefficient of any one word and other words in table 1 can be calculated.
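The ratio in formula (1) is the Jaccard index of the two article sets. A minimal sketch follows, using exact fractions so the result can be compared with the worked value 2/3. The function name and the handling of an empty union are assumptions, since the source does not discuss that case.

```python
from fractions import Fraction

def correlation(u_k, u_l):
    """Formula (1): |U_k ∩ U_l| / |U_k ∪ U_l| as an exact fraction."""
    union = u_k | u_l
    if not union:
        # Assumption: two words that occur nowhere are treated as uncorrelated.
        return Fraction(0)
    return Fraction(len(u_k & u_l), len(union))

U_k = {1, 2}      # articles containing word k
U_l = {1, 2, 4}   # articles containing word l
sim_kl = correlation(U_k, U_l)
print(sim_kl)  # 2/3
```

The coefficient is symmetric, so each pair of words only needs to be computed once.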
S202, determining the value of each word according to the correlation coefficient.
After obtaining the correlation coefficient of any word with other words in S201, the value of the word may be determined according to the correlation coefficient, and a specific determination method includes, but is not limited to, the following implementation manner.
The following formula is used:
$$\mathrm{Value}_k = \frac{\sum_{l \neq k} \mathrm{Sim}_{k,l}}{|Q|} \tag{2}$$
to determine the value of each of the words, wherein $\mathrm{Value}_k$ is the value of word k, and Q is the set of all words.
Taking the determination of the value of word k in Table 1 as an example, according to formula (1) for calculating the correlation coefficient, the correlation coefficient of word k and word l is 2/3, that of word k and word a is 1/2, that of word k and word b is 1/3, that of word k and word c is 1/4, that of word k and word d is 0, and that of word k and word e is 0. On this basis, since Q = {article 1, article 2, article 3, article 4}, |Q| = 4. The value of word k calculated according to formula (2) is $\mathrm{Value}_k$ = (2/3 + 1/2 + 1/3 + 1/4 + 0 + 0)/4 = 7/16.
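The arithmetic of this worked example can be checked with a short sketch. It assumes the denominator |Q| is the number of articles, because that is what the worked example uses (|Q| = 4). The function names are illustrative.

```python
from fractions import Fraction

# Article sets from S102, keyed by word.
U = {
    "a": {1, 2, 3, 4}, "b": {1, 4}, "c": {2, 3, 4}, "d": {3},
    "e": {3}, "k": {1, 2}, "l": {1, 2, 4},
}

def correlation(u_k, u_l):
    """Formula (1) for two non-empty article sets."""
    return Fraction(len(u_k & u_l), len(u_k | u_l))

def word_value(word, article_sets):
    """Sum of correlation coefficients with every other word, divided by |Q|."""
    total = sum(
        correlation(article_sets[word], other_set)
        for other, other_set in article_sets.items()
        if other != word
    )
    # Assumption: |Q| counts the articles, as in the worked example.
    q_size = len(set().union(*article_sets.values()))
    return total / q_size

value_k = word_value("k", U)
print(value_k)  # 7/16
```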
On the basis of obtaining the value of each word, S104 specifically includes, as one possible way:
S203, using the formula:
$$\mathrm{Kvalue}_{i,k} = \mathrm{Value}_k \times \mathrm{count}_{i,k} \tag{3}$$
to calculate the value of each word in each article, wherein $\mathrm{Kvalue}_{i,k}$ is the value of word k in article i, and $\mathrm{count}_{i,k}$ is the number of occurrences of word k in article i.
Continuing with the description of the word k in Table 1, it is known from S202 that the value of the word k is 7/16, the number of occurrences of the word k in the article 1 and the article 2 is 1, the value of the word k in the article 1 and the article 2 is 7/16 according to the above formula (3), the number of occurrences of the word k in the article 3 and the article 4 is 0, and the value of the word k in the article 3 and the article 4 is 0 according to the above formula (3).
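A minimal sketch of formula (3) for word k, assuming each article is available as a list of its segmented words; the helper name is illustrative.

```python
from collections import Counter
from fractions import Fraction

def kvalue(value, word, article_tokens):
    """Formula (3): the word's value times its number of occurrences in the article."""
    return value * Counter(article_tokens)[word]

value_k = Fraction(7, 16)  # value of word k from the worked example in S202
articles = {
    1: ["a", "b", "k", "l"], 2: ["a", "k", "l", "c"],
    3: ["a", "c", "d", "e"], 4: ["a", "b", "c", "l"],
}
kvalues = {i: kvalue(value_k, "k", tokens) for i, tokens in articles.items()}
print(kvalues)
```

A `Counter` returns 0 for a word that does not occur, so articles 3 and 4 get the value 0 without a special case.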
S204, for each article, arranging the values of the words contained in the article in descending order and taking a preset number of top-ranked words as keywords.
Taking article 1 as an example, the values of the other words in article 1 are calculated in the same way as the value of word k in article 1. For example, suppose the value of word a in article 1 is x, the value of word b in article 1 is y, the value of word k in article 1 is 7/16, and the value of word l in article 1 is z, with x > 7/16 > y > z. Arranging all the words contained in article 1 in descending order of value gives: word a, word k, word b, word l. Optionally, the first two words, word a and word k, may be taken as the keywords of the article.
Optionally, after the keywords of each article are obtained, each article may be classified based on its keywords; for example, the categories may be finance, entertainment, politics, and the like.
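The ranking in S204 can be sketched as a sort followed by a cut-off. The numeric values for x, y, and z below are placeholders chosen only to satisfy the illustrative ordering x > 7/16 > y > z. They are not computed from the formulas, and the function name is an assumption.

```python
def top_keywords(kvalues, n):
    """Return the n words with the largest in-article value, largest first."""
    ranked = sorted(kvalues.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]

# Placeholder values for article 1: x = 0.5, y = 0.4, z = 0.3 (illustrative only).
article_1 = {"a": 0.5, "b": 0.4, "k": 7 / 16, "l": 0.3}
keywords = top_keywords(article_1, 2)
print(keywords)  # ['a', 'k']
```

Python's sort is stable, so words with equal values keep their original relative order; a real implementation might prefer an explicit tie-break rule.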
The keyword extraction method provided by the embodiment provides a realizable mode of determining the value of the words and a realizable mode of determining the keywords of each article according to the value of each word, and the keywords extracted by the keyword extraction method provided by the embodiment can more accurately represent the subjects to be expressed by the articles and have higher reliability.
Fig. 3 is a schematic structural diagram of a first embodiment of a keyword extraction apparatus provided in the present invention, as shown in fig. 3, where the keyword extraction apparatus provided in the present embodiment includes:
the obtaining module 10 is configured to perform word segmentation on each article in the article set to be processed, and obtain a word included in each article;
a determining module 11, configured to determine, for each of the terms, a set of articles including the terms;
the determining module 11 is further configured to determine a value of each of the terms according to the set, where the value is used to characterize an ability of each of the terms to distinguish topics;
the determining module 11 is further configured to determine a keyword of each article in the article set to be processed according to the value of each word.
Wherein the determining module 11 is specifically configured to determine, according to the set, a correlation coefficient of each term and other terms, where the correlation coefficient is used to characterize an ability of each term to distinguish a topic from other terms together; and determining the value of each word according to the correlation coefficient.
The determining module 11 is specifically configured to use the formula:
$$\mathrm{Sim}_{k,l} = \frac{|U_k \cap U_l|}{|U_k \cup U_l|}$$
to calculate the correlation coefficient of each word and other words, wherein $\mathrm{Sim}_{k,l}$ is the correlation coefficient of word k and word l, $U_k$ is the set of articles containing word k, and $U_l$ is the set of articles containing word l.
The determining module 11 is further configured to use the formula:
$$\mathrm{Value}_k = \frac{\sum_{l \neq k} \mathrm{Sim}_{k,l}}{|Q|}$$
to determine the value of each of the words, wherein $\mathrm{Value}_k$ is the value of word k, and Q is the set of all words.
The determining module 11 is further configured to use the formula:
$$\mathrm{Kvalue}_{i,k} = \mathrm{Value}_k \times \mathrm{count}_{i,k}$$
to calculate the value of each word in each article, wherein $\mathrm{Kvalue}_{i,k}$ is the value of word k in article i, and $\mathrm{count}_{i,k}$ is the number of occurrences of word k in article i;
and, for each article, to sort the words contained in the article by value in descending order and take a preset number of top-ranked words as the keywords.
The apparatus of this embodiment may further include: and the classification module is used for classifying each article based on the keywords of each article.
Optionally, the term is a noun, a verb, or an adjective.
The keyword extraction apparatus provided in this embodiment may be used to perform the method in the embodiment shown in fig. 1 or fig. 2, and its implementation principle and technical effects are similar, and are not described herein again.
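The module split described for this apparatus might be organized along the following lines. This is a sketch under stated assumptions rather than the apparatus itself: the class and method names are hypothetical, whitespace splitting stands in for real word segmentation, and the value is divided by the number of articles, as in the worked example.

```python
from collections import Counter, defaultdict

class AcquisitionModule:
    """Obtains the words of each article (stand-in for real word segmentation)."""
    def segment(self, articles):
        # Assumption: whitespace splitting replaces a Chinese word segmenter.
        return {i: text.split() for i, text in articles.items()}

class DeterminingModule:
    """Builds the article sets, scores the words, and picks the keywords."""
    def article_sets(self, tokens):
        sets = defaultdict(set)
        for i, words in tokens.items():
            for w in words:
                sets[w].add(i)
        return sets

    def values(self, sets, n_articles):
        # Assumption: divide by the number of articles, as in the worked example.
        return {
            k: sum(len(sets[k] & sets[l]) / len(sets[k] | sets[l])
                   for l in sets if l != k) / n_articles
            for k in sets
        }

    def keywords(self, tokens, top_n):
        sets = self.article_sets(tokens)
        values = self.values(sets, len(tokens))
        result = {}
        for i, words in tokens.items():
            counts = Counter(words)
            ranked = sorted(counts, key=lambda w: values[w] * counts[w], reverse=True)
            result[i] = ranked[:top_n]
        return result

articles = {1: "a b k l", 2: "a k l c", 3: "a c d e", 4: "a b c l"}
tokens = AcquisitionModule().segment(articles)
keywords = DeterminingModule().keywords(tokens, top_n=2)
print(keywords)
```

The ordering here comes from values computed over the Table 1 data, so it need not match the placeholder ordering x > 7/16 > y > z, which the text gives only as an illustration.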
Fig. 4 is a schematic hardware structure diagram of the keyword extraction apparatus provided by the present invention. As shown in Fig. 4, the keyword extraction apparatus of this embodiment may include a processor 401 and a memory 402.
The memory 402 is configured to store program instructions.
The processor 401 is configured to implement the method described in any of the foregoing embodiments when the program instructions are executed, and specific implementation principles may be referred to the foregoing embodiments, which are not repeated herein.
The present invention also provides a program product comprising a computer program stored in a readable storage medium, from which at least one processor can read the computer program, the at least one processor executing the computer program causing a terminal to implement the keyword extraction method described above.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (11)

1. A keyword extraction method, comprising:
word segmentation is carried out on each article in the article collection to be processed, words contained in each article are obtained, and each article corresponds to a theme to be expressed;
determining, for each of the terms, a set of articles containing the term;
according to the set, determining the value of each word, wherein the value of the word is the ratio of the sum of the correlation coefficients of the word and other words to the number of article sets formed by all words, the value is used for representing the capability of distinguishing the theme of each word, and the higher the value of the word is, the stronger the capability of distinguishing the theme of the word is;
and determining keywords of each article in the article set to be processed according to the value of each word, wherein the keywords are a preset number of words with values arranged in front from big to small in each article.
2. The method of claim 1, wherein said determining a value for each of said terms from said collection comprises:
according to the set, determining a correlation coefficient of each word and other words, wherein the correlation coefficient is a ratio of an intersection and a union of the set of articles where the words are located and the set of articles where the other words are located, and the correlation coefficient is used for representing the capability of distinguishing a theme together with the other words;
and determining the value of each word according to the correlation coefficient.
3. The method of claim 2, wherein determining the correlation coefficient for each term and other terms from the set comprises:
using the formula:
$$\mathrm{Sim}_{k,l} = \frac{|U_k \cap U_l|}{|U_k \cup U_l|}$$
to calculate the correlation coefficient of each word and other words, wherein $\mathrm{Sim}_{k,l}$ is the correlation coefficient of word k and word l, $U_k$ is the set of articles containing word k, and $U_l$ is the set of articles containing word l.
4. A method according to claim 3, wherein said determining the value of each of said terms based on said correlation coefficients comprises:
using the formula:
$$\mathrm{Value}_k = \frac{\sum_{l \neq k} \mathrm{Sim}_{k,l}}{|Q|}$$
to determine the value of each of the words, wherein $\mathrm{Value}_k$ is the value of word k, and Q is the set of all words.
5. The method of claim 4, wherein determining keywords for each article in the set of articles to be processed based on the value of each of the terms comprises:
using the formula:
$$\mathrm{Kvalue}_{i,k} = \mathrm{Value}_k \times \mathrm{count}_{i,k}$$
to calculate the value of each word in each article, wherein $\mathrm{Kvalue}_{i,k}$ is the value of word k in article i, and $\mathrm{count}_{i,k}$ is the number of occurrences of word k in article i;
and, for each article, sorting the words contained in the article by value in descending order and taking a preset number of top-ranked words as the keywords.
6. The method of any one of claims 1-5, further comprising:
each article is classified based on its keywords.
7. The method of claim 6, wherein the term is a noun, a verb, or an adjective.
8. A keyword extraction apparatus, characterized by comprising:
the acquisition module is used for carrying out word segmentation on each article in the article collection to be processed to acquire words contained in each article, wherein each article corresponds to a theme to be expressed;
a determining module, configured to determine, for each of the terms, a set of articles that include the term;
the determining module is further configured to determine, according to the set, a value of each term, where the value of a term is a ratio of a sum of correlation coefficients of the term and other terms to a number of article sets formed by all terms, the value is used to characterize an ability of each term to distinguish a topic, and the higher the value of the term is, the stronger the ability of the term to distinguish a topic is;
the determining module is further configured to determine, according to the value of each term, a keyword of each article in the article set to be processed, where the keyword is a preset number of terms with values arranged in front from big to small in each article.
9. The apparatus of claim 8, wherein
the determining module is specifically configured to determine, according to the set, a correlation coefficient of each term and other terms, where the correlation coefficient is a ratio of an intersection and a union of a set of articles where the term is located and a set of articles where the other terms are located, and the correlation coefficient indicates the probability that each term and the other terms occur simultaneously; and is further specifically configured to determine the value of each term according to the correlation coefficient.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor implement the method of any of claims 1 to 7.
11. A keyword extraction apparatus, characterized by comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to implement the method of any of claims 1-7 via execution of the executable instructions.
CN201810630477.6A 2018-06-19 2018-06-19 Keyword extraction method and device Active CN110619117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810630477.6A CN110619117B (en) 2018-06-19 2018-06-19 Keyword extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810630477.6A CN110619117B (en) 2018-06-19 2018-06-19 Keyword extraction method and device

Publications (2)

Publication Number Publication Date
CN110619117A CN110619117A (en) 2019-12-27
CN110619117B CN110619117B (en) 2024-03-19

Family

ID=68920231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810630477.6A Active CN110619117B (en) 2018-06-19 2018-06-19 Keyword extraction method and device

Country Status (1)

Country Link
CN (1) CN110619117B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621391A (en) * 2009-08-07 2010-01-06 北京百问百答网络技术有限公司 Method and system for classifying short texts based on probability topic
CN106611041A (en) * 2016-09-29 2017-05-03 四川用联信息技术有限公司 New text similarity solution method
CN106708880A (en) * 2015-11-16 2017-05-24 北京国双科技有限公司 Topic associated word obtaining method and apparatus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10592541B2 (en) * 2015-05-29 2020-03-17 Intel Corporation Technologies for dynamic automated content discovery

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN101621391A (en) * 2009-08-07 2010-01-06 北京百问百答网络技术有限公司 Method and system for classifying short texts based on probability topic
CN106708880A (en) * 2015-11-16 2017-05-24 北京国双科技有限公司 Topic associated word obtaining method and apparatus
CN106611041A (en) * 2016-09-29 2017-05-03 四川用联信息技术有限公司 New text similarity solution method

Also Published As

Publication number Publication date
CN110619117A (en) 2019-12-27

Similar Documents

Publication Publication Date Title
US11093854B2 (en) Emoji recommendation method and device thereof
WO2019153612A1 (en) Question and answer data processing method, electronic device and storage medium
CN109885773B (en) Personalized article recommendation method, system, medium and equipment
US8744839B2 (en) Recognition of target words using designated characteristic values
CN102227724B (en) Machine learning for transliteration
CN111814770B (en) Content keyword extraction method of news video, terminal device and medium
US20150186503A1 (en) Method, system, and computer readable medium for interest tag recommendation
WO2015149533A1 (en) Method and device for word segmentation processing on basis of webpage content classification
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
KR102373146B1 (en) Device and Method for Cluster-based duplicate document removal
CN112632257A (en) Question processing method and device based on semantic matching, terminal and storage medium
CN107885717B (en) Keyword extraction method and device
US20210240751A1 (en) Machine learning approach to cross-language translation and search
CN111651986A (en) Event keyword extraction method, device, equipment and medium
WO2015062377A1 (en) Device and method for detecting similar text, and application
CN111325033B (en) Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN112328735A (en) Hot topic determination method and device and terminal equipment
CN110019556B (en) Topic news acquisition method, device and equipment thereof
CN114298007A (en) Text similarity determination method, device, equipment and medium
CN112581297B (en) Information pushing method and device based on artificial intelligence and computer equipment
CN110619117B (en) Keyword extraction method and device
CN114417883B (en) Data processing method, device and equipment
WO2018205460A1 (en) Target user acquisition method and apparatus, electronic device and medium
CN109947947B (en) Text classification method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200420

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 510627 Unit 02, 15th Floor, Tower B, Guangdian Pingyun Plaza, No. 163 Pingyun Road, Huangpu Avenue West, Tianhe District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU UC NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant