WO2019041521A1 - User keyword extraction apparatus and method, and computer readable storage medium - Google Patents
- Publication number
- WO2019041521A1 (PCT/CN2017/108797)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- keyword
- word
- preset
- keywords
- score
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Definitions
- the present application relates to the field of computer technologies, and in particular, to a social network-based user keyword extraction apparatus, method, and computer readable storage medium.
- the current recommendation methods are mainly friend recommendation based on identical tag information, friend recommendation based on shared attention to a topic, and topic recommendation based on topic popularity; such recommendation methods are limited and can hardly make recommendations according to the user's interests. Therefore, how to extract keywords that can effectively represent the user's interests from massive blog post data, and to analyze and determine the user's real interests, is an urgent problem to be solved.
- the present application provides a social network-based user keyword extraction apparatus, method, and computer readable storage medium, whose main purpose is to solve the prior-art problem that it is difficult to extract keywords that can effectively represent a user's interests from the user's blog posts.
- the present application provides a social network-based user keyword extraction apparatus, the apparatus comprising a memory and a processor, wherein the memory stores a user keyword extraction program executable on the processor; when the user keyword extraction program is executed by the processor, the following steps are implemented:
- a keyword whose score meets the preset condition is used as an interest keyword of the target user.
- the step of constructing a semantic similarity map according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set includes:
- the keyword in the candidate keyword set is used as a word node, wherein one keyword corresponds to one word node;
- the semantic similarity map is composed of all word nodes and established edges.
- the step of calculating the context similarity between each two word nodes according to the corresponding word vector comprises:
- the word vectors of the two word nodes are obtained, the cosine similarity between the two word vectors is calculated, and the cosine similarity is used as the context similarity between the two word nodes.
- the step of extracting the keyword corresponding to the blog post from the word list of the blog post by the keyword extraction algorithm includes:
- the repeated keywords in the keywords extracted by the plurality of keyword extraction algorithms are used as keywords corresponding to the blog posts.
- the step of using the keyword that meets the preset condition as the interest keyword of the target user includes:
- a keyword having a score greater than a preset score is used as a keyword of interest of the target user
- a keyword having a score greater than the preset score is used as an interest keyword of the target user, wherein, when the number of keywords whose scores are greater than the preset score exceeds a first preset number, only the top second preset number of those keywords are used as the interest keywords of the target user, the first preset number being greater than the second preset number.
- the present application further provides a social network-based user keyword extraction method, including:
- the PageRank algorithm is run on the semantic similarity graph to score each keyword, and keywords whose scores meet the preset condition are used as the interest keywords of the target user.
- the step of constructing a semantic similarity map according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set includes:
- the keyword in the candidate keyword set is used as a word node, wherein one keyword corresponds to one word node;
- the semantic similarity map is composed of all word nodes and established edges.
- the step of calculating the context similarity between each two word nodes according to the corresponding word vector comprises:
- the word vectors of the two word nodes are obtained, the cosine similarity between the two word vectors is calculated, and the cosine similarity is used as the context similarity between the two word nodes.
- the step of extracting the keyword corresponding to the blog post from the word list of the blog post by the keyword extraction algorithm includes:
- the repeated keywords in the keywords extracted by the plurality of keyword extraction algorithms are used as keywords corresponding to the blog posts.
- the present application further provides a computer readable storage medium having a user keyword extraction program stored thereon, the user keyword extraction program being executable by at least one processor to implement the following steps:
- the PageRank algorithm is run on the semantic similarity graph to score each keyword, and keywords whose scores meet the preset condition are used as the interest keywords of the target user.
- the social network-based user keyword extraction apparatus, method and computer readable storage medium proposed by the present application perform word segmentation on each blog post published by the target user within a preset time interval to obtain a word list corresponding to each blog post; the word lists are input into the Word2Vec model for training to obtain a word vector model; corresponding keywords are extracted from the word list of each blog post by a keyword extraction algorithm to form a candidate keyword set; the word vector of each keyword in the set is calculated based on the word vector model; a semantic similarity graph is constructed from the keywords and word vectors in the set; and the PageRank algorithm is run on the graph to score the keywords, keywords whose scores satisfy the preset condition being taken as the target user's interest keywords.
- by synthesizing the user's published blog posts in the above manner, the present application extracts keywords that can effectively represent the user's interests.
- FIG. 1 is a schematic diagram of a preferred embodiment of a user keyword extraction apparatus based on a social network according to the present application
- FIG. 2 is a schematic diagram of a program module of a user keyword extraction program in an embodiment of a social network-based user keyword extraction apparatus according to the present application;
- FIG. 3 is a flowchart of a preferred embodiment of a method for extracting user keywords based on a social network.
- the application provides a social network based user keyword extraction device.
- FIG. 1 it is a schematic diagram of a preferred embodiment of a social network based user keyword extraction apparatus according to the present application.
- the social network-based user keyword extraction device may be a PC (Personal Computer), or may be a terminal device such as a smart phone, a tablet computer, an e-book reader, or a portable computer.
- the social network based user keyword extraction device includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
- the memory 11 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (for example, an SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like.
- the memory 11 may in some embodiments be an internal storage unit of a social network based user keyword extraction device, such as a hard disk of the social network based user keyword extraction device.
- the memory 11 may also be an external storage device of the social network-based user keyword extraction device in other embodiments, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the social network-based user keyword extraction device.
- the memory 11 may also include both an internal storage unit of the social network based user keyword extraction device and an external storage device.
- the memory 11 can be used not only to store application software and various types of data installed in the social network-based user keyword extraction device, such as the code of the user keyword extraction program, but also to temporarily store data that has been output or is to be output.
- the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip for running program code stored in the memory 11 or processing data, such as executing the user keyword extraction program.
- Communication bus 13 is used to implement connection communication between these components.
- the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the device and other electronic devices.
- Figure 1 shows only a social network-based user keyword extraction device with components 11-14 and a user keyword extraction program, but it should be understood that not all illustrated components are required; more or fewer components may be implemented instead.
- the device may further include a user interface
- the user interface may include a display
- an input unit such as a keyboard
- the optional user interface may further include a standard wired interface and a wireless interface.
- the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like.
- the display may also be referred to as a display screen or display unit, for displaying information processed in the social network-based user keyword extraction device and for displaying a visualized user interface.
- a user keyword extraction program is stored in the memory 11; when the processor 12 executes the user keyword extraction program stored in the memory 11, the following steps are implemented:
- A. Obtain a blog post published by the target user in a preset time interval, perform a word segmentation process on the obtained blog post using a preset word segmentation tool, and respectively obtain a word list corresponding to each blog post;
- the scheme of the present application is explained by taking Weibo as an example.
- the blog post that the user has published is obtained for word segmentation processing.
- the published blog posts are filtered along the time dimension by setting a preset time interval, and only the blog posts published within this period are analyzed, for example, only the blog posts published in the past year.
- if the number of blog posts published by the user in the preset time interval is small, all the blog posts that the user has ever published may instead be analyzed.
- the word segmentation tool is then used to perform word segmentation on each of the obtained blog posts one by one; for example, a tool such as the Stanford Chinese word segmenter or the jieba segmenter may be used.
- of the word segmentation results, only words that can express the user's interests are retained, and words such as adverbs and adjectives that cannot are removed; for example, for a blog post reading "Went to see a movie last night", only the word "movie" is kept.
- if the word segmentation result of a blog post is empty after filtering, that blog post is filtered out; each blog post whose segmentation result is not empty yields a corresponding word list, giving the word lists corresponding to all the blog posts in the above time interval.
- these word lists are input into the Word2Vec model for training, and a word vector model is obtained, which is used to convert keywords into word vectors.
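As a dependency-free illustration of the filtering step above, the sketch below keeps only interest-bearing words (approximated here as nouns) from part-of-speech-tagged segmentation output and drops posts whose word list becomes empty. The tag set and the segmented posts are made-up assumptions, not data from the application; in practice a segmenter such as jieba would supply the tagged tokens.

```python
# Hypothetical segmenter output: (word, part-of-speech) pairs per post.
segmented_posts = [
    [("last night", "t"), ("went", "v"), ("see", "v"), ("movie", "n")],
    [("very", "d"), ("happy", "a")],  # no nouns: this post will be dropped
]

KEEP_POS = {"n"}  # keep nouns; drop adverbs ("d"), adjectives ("a"), etc.

def build_word_lists(posts):
    """Return one word list per post, discarding posts that end up empty."""
    word_lists = []
    for post in posts:
        words = [w for w, pos in post if pos in KEEP_POS]
        if words:  # filter out blog posts with an empty segmentation result
            word_lists.append(words)
    return word_lists

print(build_word_lists(segmented_posts))  # → [['movie']]
```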
- the Word2Vec model is a tool for word vector calculation; there are mature methods for training the model and using it to calculate the word vector of a word, which will not be described here.
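Word2Vec itself is typically trained with a library such as gensim; since its internals are out of scope here, the following dependency-free stand-in builds simple co-occurrence count vectors, which illustrates the same underlying idea that a word is represented by the contexts it appears in. This is an illustrative sketch, not the application's model.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_vectors(word_lists):
    """Toy stand-in for Word2Vec: one vector per word, whose entries are
    co-occurrence counts against the sorted vocabulary."""
    vocab = sorted({w for words in word_lists for w in words})
    counts = {w: Counter() for w in vocab}
    for words in word_lists:
        for a, b in combinations(set(words), 2):  # co-occurrence within a post
            counts[a][b] += 1
            counts[b][a] += 1
    # dense vector per word, dimension = vocabulary size
    return {w: [counts[w][v] for v in vocab] for w in vocab}

vectors = cooccurrence_vectors([["movie", "cinema"], ["movie", "football"]])
# sorted vocabulary is ['cinema', 'football', 'movie']
print(vectors["movie"])  # → [1, 1, 0]
```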
- keyword extraction is then performed for each blog post using a keyword extraction algorithm; for example, any one of the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, the LSA (Latent Semantic Analysis) algorithm, or the PLSA (Probabilistic Latent Semantic Analysis) algorithm may be used to score the word list of each blog post, the one or more highest-scoring words being taken as the keywords corresponding to that blog post; the above word vector model is then used to convert each keyword into a corresponding word vector.
- alternatively, a plurality of keyword extraction algorithms may be combined.
- in this case, the step of extracting the keywords corresponding to a blog post from its word list based on the keyword extraction algorithm includes: extracting keywords from the word list of the blog post with each of a plurality of preset keyword extraction algorithms, and taking the keywords repeated across the algorithms' outputs as the keywords corresponding to the blog post.
- for example, keyword extraction may be performed with the TF-IDF algorithm, the LSA algorithm and the PLSA algorithm separately, and the keywords of the overlapping portion taken as the keywords corresponding to the blog post.
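The multi-algorithm combination above can be sketched as follows. Only TF-IDF is actually implemented; the second extractor is a plain frequency ranking standing in for LSA/PLSA, whose real implementations are considerably more involved, and the documents are invented for illustration.

```python
import math
from collections import Counter

def tfidf_keywords(word_lists, doc_index, top_n=2):
    """Top-n words of one document by a smoothed TF-IDF score."""
    words = word_lists[doc_index]
    tf = Counter(words)
    n_docs = len(word_lists)
    def score(w):
        df = sum(1 for doc in word_lists if w in doc)  # document frequency
        return (tf[w] / len(words)) * math.log((1 + n_docs) / (1 + df))
    return set(sorted(tf, key=score, reverse=True)[:top_n])

def freq_keywords(word_lists, doc_index, top_n=2):
    """Stand-in second extractor: plain term-frequency ranking."""
    return {w for w, _ in Counter(word_lists[doc_index]).most_common(top_n)}

docs = [["movie", "movie", "ticket", "today"],
        ["today", "weather"],
        ["movie", "director"]]

# keywords of document 0 = intersection of both extractors' outputs
keywords = tfidf_keywords(docs, 0) & freq_keywords(docs, 0)
print(keywords)
```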
- in either case, the keyword extraction algorithm extracts keywords that serve as candidate keywords, a candidate keyword set is established, and the keyword set is then processed by the subsequent algorithm to obtain keywords that reflect the user's interests.
- the keywords corresponding to each blog post published by the target user in the preset time interval constitute a candidate keyword set of the target user, and the word vector of each keyword in the set is calculated using the above word vector model.
- a semantic similarity graph is constructed according to the above candidate keyword set and the word vector.
- specifically, the step of constructing the semantic similarity graph may include the following refinement steps: using each keyword in the candidate keyword set as a word node, wherein one keyword corresponds to one word node; traversing all word nodes and calculating the context similarity between each two word nodes according to their word vectors; whenever the context similarity between two word nodes is greater than a preset threshold, establishing an edge between the two word nodes; and forming the semantic similarity graph from all word nodes and the established edges.
- the word vectors of the two word nodes are obtained, the cosine similarity between the two word vectors is calculated, and the cosine similarity is used as the context similarity between the two word nodes.
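A minimal sketch of this graph construction, assuming made-up word vectors and an arbitrary threshold: each keyword becomes a node, and an edge is created whenever the cosine similarity of two keywords' vectors exceeds the threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_similarity_graph(word_vectors, threshold=0.7):
    words = list(word_vectors)
    edges = set()
    for i, a in enumerate(words):        # traverse every pair of word nodes
        for b in words[i + 1:]:
            if cosine(word_vectors[a], word_vectors[b]) > threshold:
                edges.add((a, b))        # undirected edge for simplicity
    return {"nodes": set(words), "edges": edges}

graph = build_similarity_graph({
    "movie":    [0.9, 0.1, 0.0],   # illustrative vectors, not trained ones
    "cinema":   [0.8, 0.2, 0.1],
    "football": [0.0, 0.1, 0.9],
}, threshold=0.7)
print(sorted(graph["edges"]))  # → [('movie', 'cinema')]
```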
- the edges established between word nodes may be directed or undirected, wherein the direction of a directed edge may run from the node of the word that appears earlier to the node of the word that appears later; the two choices have different advantages.
- a directed edge requires iterative calculation when running the PageRank algorithm, so the amount of calculation is slightly larger, but the advantage is that the denoising effect is good.
- for example, suppose the keywords obtained after analyzing a user are Cristiano Ronaldo, Real Madrid, La Liga, football, and sweepstakes; in the semantic similarity graph, however the first four words point to one another, they reinforce one another's scores in the PageRank iteration, whereas an unrelated word, such as "snacks", gains no such reinforcement even if it has directed edges to other words; the word "sweepstakes" is thus scored lower and can be excluded.
- with undirected edges, running the PageRank algorithm is faster and needs no such iterative reinforcement, but the denoising effect is not as good: the word "sweepstakes" may not be excluded.
- other methods may also be used to calculate the semantic similarity between two words; for example, calculating semantic similarity based on a large-scale corpus is a relatively mature method of computing inter-word similarity, whose specific principle is not described here.
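The scoring step can be illustrated with a generic power-iteration PageRank over a small directed graph; this is a textbook formulation, not necessarily the exact variant used in the application, and the graph below is invented. It shows the denoising behavior described above: mutually linked keywords reinforce one another, while a word nothing points to ends up with a low score.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """adj maps each node to the list of nodes it links to (directed edges)."""
    nodes = set(adj) | {t for targets in adj.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nodes}
        for node, targets in adj.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: spread its rank uniformly
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank

# "football" is reinforced by related words; "sweepstakes" points out
# but nothing points back, so it keeps a low score and can be excluded.
scores = pagerank({
    "Real Madrid": ["football"],
    "La Liga": ["football"],
    "football": ["Real Madrid", "La Liga"],
    "sweepstakes": ["football"],
})
assert scores["football"] > scores["sweepstakes"]
```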
- the step of using a keyword that satisfies a preset condition as a keyword of interest of the target user may include:
- a keyword having a score greater than a preset score is used as a keyword of interest of the target user
- a keyword having a score greater than the preset score is used as an interest keyword of the target user, wherein, when the number of keywords whose scores are greater than the preset score exceeds a first preset number, only the top second preset number of those keywords are used as the interest keywords of the target user, the first preset number being greater than the second preset number.
- the preset threshold, the preset score, the first preset number, and the second preset number involved in the foregoing embodiments may all be preset according to actual conditions.
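A sketch of the selection rule above, with all preset values chosen arbitrarily for illustration: keywords scoring above the preset score qualify, and if more than the first preset number qualify, only the top second preset number are kept.

```python
def select_interest_keywords(scores, preset_score=0.1,
                             first_preset=3, second_preset=2):
    """Keywords above preset_score, capped at second_preset when more
    than first_preset of them qualify."""
    qualified = [w for w, s in sorted(scores.items(),
                                      key=lambda kv: kv[1], reverse=True)
                 if s > preset_score]
    if len(qualified) > first_preset:
        return qualified[:second_preset]  # cap at the second preset number
    return qualified

scores = {"football": 0.4, "Real Madrid": 0.3, "La Liga": 0.2,
          "movie": 0.15, "sweepstakes": 0.05}
print(select_interest_keywords(scores))  # → ['football', 'Real Madrid']
```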
- the social network-based user keyword extraction apparatus proposed in the above embodiment performs word segmentation on each blog post published by the target user within the preset time interval to obtain a word list corresponding to each blog post; inputs the word lists into the Word2Vec model for training to obtain a word vector model; extracts corresponding keywords from the word list of each blog post based on the keyword extraction algorithm to form a candidate keyword set; calculates the word vector of each keyword in the set based on the word vector model; constructs a semantic similarity graph from the keywords and word vectors in the set; and runs the PageRank algorithm on the graph to score the keywords, keywords whose scores meet the preset condition being taken as the user's interest keywords.
- by synthesizing the user's published blog posts in the above manner, the present application extracts keywords that can effectively represent the user's interests.
- alternatively, the user keyword extraction program may be divided into one or more modules, the one or more modules being stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to complete the present application.
- a module referred to herein refers to a series of computer program instructions that are capable of performing a particular function.
- FIG. 2 it is a schematic diagram of a program module of a user keyword extraction program in an embodiment of a social network-based user keyword extraction apparatus according to the present application.
- by way of example, the user keyword extraction program may be divided into an acquisition module 10, a training module 20, an extraction module 30, a mapping module 40, and a scoring module 50:
- the obtaining module 10 is configured to obtain a blog post that has been published by the target user in a preset time interval, and perform a word segmentation process on the obtained blog post by using a preset word segmentation tool, and respectively obtain a word list corresponding to each blog post;
- the training module 20 is configured to input the obtained word list corresponding to each blog post into the Word2Vec model for training to obtain a word vector model;
- the extraction module 30 is configured to extract the keywords corresponding to each blog post from its word list based on the keyword extraction algorithm, to combine the keywords accumulated from the blog posts published by the target user within the preset time interval into a candidate keyword set of the target user, and to calculate the word vector of each keyword in the candidate keyword set based on the word vector model;
- the mapping module 40 is configured to construct a semantic similarity map according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set;
- the scoring module 50 is configured to run the PageRank algorithm on the semantic similarity graph to score each keyword, and to use keywords whose scores meet the preset condition as the interest keywords of the target user.
- the present application also provides a social network based user keyword extraction method.
- referring to FIG. 3, it is a flowchart of a preferred embodiment of a method for extracting user keywords based on a social network according to the present application; the method can be performed by an apparatus, which can be implemented by software and/or hardware.
- the social network-based user keyword extraction method includes:
- Step S10: obtain the blog posts published by the target user in a preset time interval, perform word segmentation on the obtained blog posts using a preset word segmentation tool, and obtain a word list corresponding to each blog post respectively;
- Step S20: input the obtained word list corresponding to each blog post into the Word2Vec model for training, to obtain a word vector model;
- Step S30: extract, based on the keyword extraction algorithm, the keywords corresponding to each blog post from its word list; combine the keywords accumulated from the blog posts published by the target user within the preset time interval into a candidate keyword set of the target user; and calculate the word vector of each keyword in the candidate keyword set based on the word vector model.
- the scheme is explained by taking Weibo as an example.
- first, the blog posts that the user has published are obtained for word segmentation. It can be understood that, since the user's hobbies may change over time, in order to improve the accuracy of keyword extraction, the published blog posts are filtered along the time dimension by setting a preset time interval.
- the word segmentation tool is then used to perform word segmentation on each of the obtained blog posts one by one; for example, a tool such as the Stanford Chinese word segmenter or the jieba segmenter may be used.
- for example, segmenting the content of the blog post "Went to see a movie last night" yields words such as "yesterday", "night", "see" and "movie".
- of the word segmentation results, only words that can express the user's interests are retained, and words such as adverbs and adjectives that cannot are removed; in the above example, only the word "movie" is kept.
- if the word segmentation result of a blog post is empty after filtering, that blog post is filtered out; each blog post whose segmentation result is not empty yields a corresponding word list, giving the word lists corresponding to all the blog posts in the above time interval.
- these word lists are input into the Word2Vec model for training, and a word vector model is obtained, which is used to convert keywords into word vectors.
- the Word2Vec model is a tool for word vector calculation. There are mature calculation methods for training the model and using it to calculate the word vector of a word, and will not be described here.
- keyword extraction is then performed for each blog post using a keyword extraction algorithm; for example, any one of the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, the LSA (Latent Semantic Analysis) algorithm, or the PLSA (Probabilistic Latent Semantic Analysis) algorithm may be used to score the word list of each blog post, the one or more highest-scoring words being taken as the keywords corresponding to that blog post; the above word vector model is then used to convert each keyword into a corresponding word vector.
- alternatively, a plurality of keyword extraction algorithms may be combined.
- in this case, the step of extracting the keywords corresponding to a blog post from its word list based on the keyword extraction algorithm includes: extracting keywords from the word list of the blog post with each of a plurality of preset keyword extraction algorithms, and taking the keywords repeated across the algorithms' outputs as the keywords corresponding to the blog post.
- for example, keyword extraction may be performed with the TF-IDF algorithm, the LSA algorithm and the PLSA algorithm separately, and the keywords of the overlapping portion taken as the keywords corresponding to the blog post.
- in either case, the keyword extraction algorithm extracts keywords that serve as candidate keywords, a candidate keyword set is established, and the keyword set is then processed by the subsequent algorithm to obtain keywords that reflect the user's interests.
- Step S40: construct a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set.
- the keywords corresponding to each blog post published by the target user in the preset time interval constitute a candidate keyword set of the target user, and the word vector of each keyword in the set is calculated using the above word vector model.
- a semantic similarity graph is constructed according to the above candidate keyword set and the word vector.
- specifically, the step of constructing the semantic similarity graph may include the following refinement steps: using each keyword in the candidate keyword set as a word node, wherein one keyword corresponds to one word node; traversing all word nodes and calculating the context similarity between each two word nodes according to their word vectors; whenever the context similarity between two word nodes is greater than a preset threshold, establishing an edge between the two word nodes; and forming the semantic similarity graph from all word nodes and the established edges.
- the word vectors of the two word nodes are obtained, the cosine similarity between the two word vectors is calculated, and the cosine similarity is used as the context similarity between the two word nodes.
- the edges established between word nodes may be directed or undirected, wherein the direction of a directed edge may run from the node of the word that appears earlier to the node of the word that appears later; the two choices have different advantages.
- a directed edge requires iterative calculation when running the PageRank algorithm, so the amount of calculation is slightly larger, but the advantage is that the denoising effect is good.
- for example, suppose the keywords obtained after analyzing a user are Cristiano Ronaldo, Real Madrid, La Liga, football, and sweepstakes; in the semantic similarity graph, however the first four words point to one another, they reinforce one another's scores in the PageRank iteration, whereas an unrelated word, such as "snacks", gains no such reinforcement even if it has directed edges to other words; the word "sweepstakes" is thus scored lower and can be excluded.
- with undirected edges, running the PageRank algorithm is faster and needs no such iterative reinforcement, but the denoising effect is not as good: the word "sweepstakes" may not be excluded.
- other methods may also be used to calculate the semantic similarity between two words; for example, calculating semantic similarity based on a large-scale corpus is a relatively mature method of computing inter-word similarity, whose specific principle is not described here.
- in step S50, the PageRank algorithm is run on the semantic similarity graph to score each keyword, and keywords whose scores meet the preset condition are used as the interest keywords of the target user.
- the step of using keywords whose scores satisfy the preset condition as the interest keywords of the target user may include:
- using keywords having a score greater than a preset score as the interest keywords of the target user; or
- using keywords having a score greater than the preset score as the interest keywords of the target user, where, when the number of keywords whose score is greater than the preset score exceeds a first preset number, a second preset number of those keywords are used as the interest keywords of the target user, the first preset number being greater than the second preset number.
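A sketch of the two selection rules just described (the names, scores, and thresholds are illustrative only):

```python
def select_interest_keywords(scores, preset_score, first_n=None, second_n=None):
    """Rule 1: keep every keyword scoring above preset_score.
    Rule 2 (optional): if more than first_n keywords qualify, keep only
    the top second_n of them (with first_n > second_n)."""
    qualified = sorted((kw for kw, s in scores.items() if s > preset_score),
                       key=lambda kw: scores[kw], reverse=True)
    if first_n is not None and len(qualified) > first_n:
        return qualified[:second_n]
    return qualified

scores = {"football": 0.9, "Real Madrid": 0.8, "La Liga": 0.7,
          "Ronaldo": 0.6, "lottery": 0.1}
# Rule 1: every keyword above 0.5.
print(select_interest_keywords(scores, 0.5))
# Rule 2: four keywords exceed the first preset number (3), so only the
# second preset number (2) of top-scoring keywords are kept.
print(select_interest_keywords(scores, 0.5, first_n=3, second_n=2))
```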
- the preset threshold, the preset word count, the first preset number, and the second preset number involved in the foregoing embodiments may all be set in advance according to actual conditions.
- the social-network-based user keyword extraction method proposed in the above embodiment performs word segmentation on each blog post published by the target user within a preset time interval to obtain the word list corresponding to each post, and inputs these word lists into the Word2Vec model for training to obtain a word vector model; keywords are extracted from each post's word list based on a keyword extraction algorithm to form a candidate keyword set, and the word vector of each keyword in the set is calculated based on the word vector model; a semantic similarity graph is then constructed from the keywords in the set and their word vectors, the PageRank algorithm is run on the graph to score each keyword, and keywords whose scores meet the preset condition are used as the user's interest keywords. Through the above method, the application performs word segmentation on the blog posts published by the user and extracts keywords that effectively represent the user's interests.
- an embodiment of the present application further provides a computer-readable storage medium on which a user keyword extraction program is stored; the user keyword extraction program may be executed by one or more processors to implement the following operations:
- the PageRank algorithm is run on the semantic similarity graph to score each keyword, and keywords whose scores meet the preset condition are used as the interest keywords of the target user;
- the keywords in the candidate keyword set are used as word nodes, where one keyword corresponds to one word node;
- the semantic similarity graph is formed by all word nodes and the established edges;
- the word vectors of two word nodes are obtained, the cosine similarity between the two word vectors is calculated, and the cosine similarity is used as the context similarity between the two word nodes;
- the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms are used as the keywords corresponding to the blog post.
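One plausible reading of "repeated keywords" is the set of keywords returned by at least two of the preset extraction algorithms. A minimal sketch under that assumption (the algorithm outputs shown are hypothetical):

```python
from collections import Counter

def repeated_keywords(extractions):
    """Keywords that appear in the output of at least two of the
    keyword extraction algorithms (one reading of 'repeated')."""
    counts = Counter(kw for kws in extractions for kw in set(kws))
    return {kw for kw, c in counts.items() if c >= 2}

# Hypothetical outputs of two preset algorithms (e.g. TF-IDF and TextRank).
tfidf_out = ["football", "La Liga", "weather"]
textrank_out = ["football", "La Liga", "Ronaldo"]
print(sorted(repeated_keywords([tfidf_out, textrank_out])))
# → ['La Liga', 'football']
```

Deduplicating each algorithm's list first (`set(kws)`) ensures a keyword must be confirmed by distinct algorithms, not merely repeated within one list.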
- the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as the ROM/RAM, magnetic disk, or optical disk described above), including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the various embodiments of the present application.
Abstract
Description
Claims (20)
- A social-network-based user keyword extraction device, characterized in that the device comprises a memory and a processor, the memory storing a user keyword extraction program executable on the processor, the user keyword extraction program, when executed by the processor, implementing the following steps: acquiring blog posts published by a target user within a preset time interval, performing word segmentation on the acquired posts with a preset word segmentation tool, and obtaining the word list corresponding to each post; inputting the obtained word list corresponding to each post into a Word2Vec model for training, to obtain a word vector model; extracting, based on a keyword extraction algorithm, the keywords corresponding to each post from the post's word list, forming the target user's candidate keyword set from the keywords accumulated over the blog posts published by the target user within the preset time interval, and calculating the word vector of each keyword in the candidate keyword set based on the word vector model; constructing a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set; and running the PageRank algorithm on the semantic similarity graph to score each keyword, using keywords whose scores satisfy a preset condition as the target user's interest keywords.
- The social-network-based user keyword extraction device according to claim 1, characterized in that the step of constructing a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set comprises: using the keywords in the candidate keyword set as word nodes, where one keyword corresponds to one word node; traversing all word nodes, calculating the context similarity between each pair of word nodes according to their corresponding word vectors, and establishing an edge between two word nodes whenever the context similarity between them is greater than a preset threshold; and forming the semantic similarity graph from all word nodes and the established edges.
- The social-network-based user keyword extraction device according to claim 2, characterized in that the step of calculating the context similarity between each pair of word nodes according to their corresponding word vectors comprises: obtaining the word vectors of the two word nodes, calculating the cosine similarity between the two word vectors, and using the cosine similarity as the context similarity between the two word nodes.
- The social-network-based user keyword extraction device according to claim 1, characterized in that, when the number of characters contained in a blog post is greater than or equal to a preset character count, the step of extracting, based on a keyword extraction algorithm, the keywords corresponding to the post from the post's word list comprises: extracting keywords from the post's word list according to each of a plurality of preset keyword extraction algorithms; and using the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms as the keywords corresponding to the post.
- The social-network-based user keyword extraction device according to claim 2, characterized in that, when the number of characters contained in a blog post is greater than or equal to a preset character count, the step of extracting, based on a keyword extraction algorithm, the keywords corresponding to the post from the post's word list comprises: extracting keywords from the post's word list according to each of a plurality of preset keyword extraction algorithms; and using the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms as the keywords corresponding to the post.
- The social-network-based user keyword extraction device according to claim 1, characterized in that the step of using keywords whose scores satisfy a preset condition as the target user's interest keywords comprises: using keywords with a score greater than a preset score as the target user's interest keywords; or using keywords with a score greater than a preset score as the target user's interest keywords, where, when the number of keywords with a score greater than the preset score exceeds a first preset number, a second preset number of keywords among those first-preset-number keywords are used as the target user's interest keywords, the first preset number being greater than the second preset number.
- The social-network-based user keyword extraction device according to claim 2, characterized in that the step of using keywords whose scores satisfy a preset condition as the target user's interest keywords comprises: using keywords with a score greater than a preset score as the target user's interest keywords; or using keywords with a score greater than a preset score as the target user's interest keywords, where, when the number of keywords with a score greater than the preset score exceeds a first preset number, a second preset number of keywords among those first-preset-number keywords are used as the target user's interest keywords, the first preset number being greater than the second preset number.
- A social-network-based user keyword extraction method, characterized in that the method comprises: acquiring blog posts published by a target user within a preset time interval, performing word segmentation on the acquired posts with a preset word segmentation tool, and obtaining the word list corresponding to each post; inputting the obtained word list corresponding to each post into a Word2Vec model for training, to obtain a word vector model; extracting, based on a keyword extraction algorithm, the keywords corresponding to each post from the post's word list, forming the target user's candidate keyword set from the keywords accumulated over the blog posts published by the target user within the preset time interval, and calculating the word vector of each keyword in the candidate keyword set based on the word vector model; constructing a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set; and running the PageRank algorithm on the semantic similarity graph to score each keyword, using keywords whose scores satisfy a preset condition as the target user's interest keywords.
- The social-network-based user keyword extraction method according to claim 8, characterized in that the step of constructing a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set comprises: using the keywords in the candidate keyword set as word nodes, where one keyword corresponds to one word node; traversing all word nodes, calculating the context similarity between each pair of word nodes according to their corresponding word vectors, and establishing an edge between two word nodes whenever the context similarity between them is greater than a preset threshold; and forming the semantic similarity graph from all word nodes and the established edges.
- The social-network-based user keyword extraction method according to claim 9, characterized in that the step of calculating the context similarity between each pair of word nodes according to their corresponding word vectors comprises: obtaining the word vectors of the two word nodes, calculating the cosine similarity between the two word vectors, and using the cosine similarity as the context similarity between the two word nodes.
- The social-network-based user keyword extraction method according to claim 8, characterized in that, when the number of characters contained in a blog post is greater than or equal to a preset character count, the step of extracting, based on a keyword extraction algorithm, the keywords corresponding to the post from the post's word list comprises: extracting keywords from the post's word list according to each of a plurality of preset keyword extraction algorithms; and using the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms as the keywords corresponding to the post.
- The social-network-based user keyword extraction method according to claim 9, characterized in that, when the number of characters contained in a blog post is greater than or equal to a preset character count, the step of extracting, based on a keyword extraction algorithm, the keywords corresponding to the post from the post's word list comprises: extracting keywords from the post's word list according to each of a plurality of preset keyword extraction algorithms; and using the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms as the keywords corresponding to the post.
- The social-network-based user keyword extraction method according to claim 8, characterized in that the step of using keywords whose scores satisfy a preset condition as the target user's interest keywords comprises: using keywords with a score greater than a preset score as the target user's interest keywords; or using keywords with a score greater than a preset score as the target user's interest keywords, where, when the number of keywords with a score greater than the preset score exceeds a first preset number, a second preset number of keywords among those first-preset-number keywords are used as the target user's interest keywords, the first preset number being greater than the second preset number.
- The social-network-based user keyword extraction method according to claim 9, characterized in that the step of using keywords whose scores satisfy a preset condition as the target user's interest keywords comprises: using keywords with a score greater than a preset score as the target user's interest keywords; or using keywords with a score greater than a preset score as the target user's interest keywords, where, when the number of keywords with a score greater than the preset score exceeds a first preset number, a second preset number of keywords among those first-preset-number keywords are used as the target user's interest keywords, the first preset number being greater than the second preset number.
- A computer-readable storage medium, characterized in that a user keyword extraction program is stored on the computer-readable storage medium, the user keyword extraction program being executable by at least one processor to implement the following steps: acquiring blog posts published by a target user within a preset time interval, performing word segmentation on the acquired posts with a preset word segmentation tool, and obtaining the word list corresponding to each post; inputting the obtained word list corresponding to each post into a Word2Vec model for training, to obtain a word vector model; extracting, based on a keyword extraction algorithm, the keywords corresponding to each post from the post's word list, forming the target user's candidate keyword set from the keywords accumulated over the blog posts published by the target user within the preset time interval, and calculating the word vector of each keyword in the candidate keyword set based on the word vector model; constructing a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set; and running the PageRank algorithm on the semantic similarity graph to score each keyword, using keywords whose scores satisfy a preset condition as the target user's interest keywords.
- The computer-readable storage medium according to claim 15, characterized in that the step of constructing a semantic similarity graph according to the candidate keyword set and the word vector corresponding to each keyword in the candidate keyword set comprises: using the keywords in the candidate keyword set as word nodes, where one keyword corresponds to one word node; traversing all word nodes, calculating the context similarity between each pair of word nodes according to their corresponding word vectors, and establishing an edge between two word nodes whenever the context similarity between them is greater than a preset threshold; and forming the semantic similarity graph from all word nodes and the established edges.
- The computer-readable storage medium according to claim 16, characterized in that the step of calculating the context similarity between each pair of word nodes according to their corresponding word vectors comprises: obtaining the word vectors of the two word nodes, calculating the cosine similarity between the two word vectors, and using the cosine similarity as the context similarity between the two word nodes.
- The computer-readable storage medium according to claim 15, characterized in that, when the number of characters contained in a blog post is greater than or equal to a preset character count, the step of extracting, based on a keyword extraction algorithm, the keywords corresponding to the post from the post's word list comprises: extracting keywords from the post's word list according to each of a plurality of preset keyword extraction algorithms; and using the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms as the keywords corresponding to the post.
- The computer-readable storage medium according to claim 16, characterized in that, when the number of characters contained in a blog post is greater than or equal to a preset character count, the step of extracting, based on a keyword extraction algorithm, the keywords corresponding to the post from the post's word list comprises: extracting keywords from the post's word list according to each of a plurality of preset keyword extraction algorithms; and using the repeated keywords among the keywords extracted by the plurality of keyword extraction algorithms as the keywords corresponding to the post.
- The computer-readable storage medium according to claim 15, characterized in that the step of using keywords whose scores satisfy a preset condition as the target user's interest keywords comprises: using keywords with a score greater than a preset score as the target user's interest keywords; or using keywords with a score greater than a preset score as the target user's interest keywords, where, when the number of keywords with a score greater than the preset score exceeds a first preset number, a second preset number of keywords among those first-preset-number keywords are used as the target user's interest keywords, the first preset number being greater than the second preset number.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2017408801A AU2017408801B2 (en) | 2017-08-29 | 2017-10-31 | User keyword extraction device and method, and computer-readable storage medium |
US16/084,988 US20210097238A1 (en) | 2017-08-29 | 2017-10-31 | User keyword extraction device and method, and computer-readable storage medium |
JP2018538141A JP2019533205A (ja) | 2017-08-29 | 2017-10-31 | ユーザキーワード抽出装置、方法、及びコンピュータ読み取り可能な記憶媒体 |
KR1020187024862A KR102170929B1 (ko) | 2017-08-29 | 2017-10-31 | 사용자 키워드 추출장치, 방법 및 컴퓨터 판독 가능한 저장매체 |
EP17904351.8A EP3477495A4 (en) | 2017-08-29 | 2017-10-31 | APPARATUS AND METHOD FOR USER KEYWORD EXTRACTION AND COMPUTER-READABLE MEMORY MEDIUM |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710754314.4 | 2017-08-29 | ||
CN201710754314.4A CN107704503A (zh) | 2017-08-29 | 2017-08-29 | 用户关键词提取装置、方法及计算机可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019041521A1 true WO2019041521A1 (zh) | 2019-03-07 |
Family
ID=61169937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/108797 WO2019041521A1 (zh) | 2017-08-29 | 2017-10-31 | 用户关键词提取装置、方法及计算机可读存储介质 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20210097238A1 (zh) |
EP (1) | EP3477495A4 (zh) |
JP (1) | JP2019533205A (zh) |
KR (1) | KR102170929B1 (zh) |
CN (1) | CN107704503A (zh) |
AU (1) | AU2017408801B2 (zh) |
WO (1) | WO2019041521A1 (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110489758A (zh) * | 2019-09-10 | 2019-11-22 | 深圳市和讯华谷信息技术有限公司 | 应用程序的价值观计算方法及装置 |
CN111160193A (zh) * | 2019-12-20 | 2020-05-15 | 中国平安财产保险股份有限公司 | 关键信息提取方法、装置及存储介质 |
CN111191119A (zh) * | 2019-12-16 | 2020-05-22 | 绍兴市上虞区理工高等研究院 | 一种基于神经网络的科技成果自学习方法及装置 |
CN111581492A (zh) * | 2020-04-01 | 2020-08-25 | 车智互联(北京)科技有限公司 | 一种内容推荐方法、计算设备及可读存储介质 |
CN111858834A (zh) * | 2020-07-30 | 2020-10-30 | 平安国际智慧城市科技股份有限公司 | 基于ai的案件争议焦点确定方法、装置、设备及介质 |
CN112101012A (zh) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | 互动领域确定方法、装置、电子设备及存储介质 |
CN112800771A (zh) * | 2020-02-17 | 2021-05-14 | 腾讯科技(深圳)有限公司 | 文章识别方法、装置、计算机可读存储介质和计算机设备 |
CN112988971A (zh) * | 2021-03-15 | 2021-06-18 | 平安科技(深圳)有限公司 | 基于词向量的搜索方法、终端、服务器及存储介质 |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596789B (zh) * | 2018-03-29 | 2022-08-30 | 时时同云科技(成都)有限责任公司 | 一种菜品标准化的方法 |
CN108573134A (zh) * | 2018-04-04 | 2018-09-25 | 阿里巴巴集团控股有限公司 | 一种识别身份的方法、装置及电子设备 |
CN109635273B (zh) * | 2018-10-25 | 2023-04-25 | 平安科技(深圳)有限公司 | 文本关键词提取方法、装置、设备及存储介质 |
CN109408826A (zh) * | 2018-11-07 | 2019-03-01 | 北京锐安科技有限公司 | 一种文本信息提取方法、装置、服务器及存储介质 |
CN111259656A (zh) * | 2018-11-15 | 2020-06-09 | 武汉斗鱼网络科技有限公司 | 短语相似度计算方法、存储介质、电子设备及系统 |
CN109508423A (zh) * | 2018-12-14 | 2019-03-22 | 平安科技(深圳)有限公司 | 基于语义识别的房源推荐方法、装置、设备及存储介质 |
CN110298029B (zh) * | 2019-05-22 | 2022-07-12 | 平安科技(深圳)有限公司 | 基于用户语料的好友推荐方法、装置、设备及介质 |
JP7451917B2 (ja) * | 2019-09-26 | 2024-03-19 | 株式会社Jvcケンウッド | 情報提供装置、情報提供方法及びプログラム |
KR102326744B1 (ko) * | 2019-11-21 | 2021-11-16 | 강원오픈마켓 주식회사 | 사용자 참여형 키워드 선정 시스템의 제어 방법, 장치 및 프로그램 |
CN111274428B (zh) * | 2019-12-19 | 2023-06-30 | 北京创鑫旅程网络技术有限公司 | 一种关键词的提取方法及装置、电子设备、存储介质 |
CN111460099B (zh) * | 2020-03-30 | 2023-04-07 | 招商局金融科技有限公司 | 关键词提取方法、装置及存储介质 |
KR102476334B1 (ko) * | 2020-04-22 | 2022-12-09 | 인하대학교 산학협력단 | 딥러닝 기반 일기 생성 방법 및 장치 |
CN111737523B (zh) * | 2020-04-22 | 2023-11-14 | 聚好看科技股份有限公司 | 一种视频标签、搜索内容的生成方法及服务器 |
CN111724196A (zh) * | 2020-05-14 | 2020-09-29 | 天津大学 | 一种基于用户体验的提高汽车产品质量的方法 |
CN112069232B (zh) * | 2020-09-08 | 2023-08-01 | 中国移动通信集团河北有限公司 | 宽带业务覆盖范围的查询方法及装置 |
CN112347778B (zh) * | 2020-11-06 | 2023-06-20 | 平安科技(深圳)有限公司 | 关键词抽取方法、装置、终端设备及存储介质 |
CN112329462B (zh) * | 2020-11-26 | 2024-02-20 | 北京五八信息技术有限公司 | 一种数据排序方法、装置、电子设备及存储介质 |
CN113919342A (zh) * | 2021-09-18 | 2022-01-11 | 暨南大学 | 一种会计术语共现网络图构建的方法 |
CN115080718B (zh) * | 2022-06-21 | 2024-04-09 | 浙江极氪智能科技有限公司 | 一种文本关键短语的抽取方法、系统、设备及存储介质 |
CN115344679A (zh) * | 2022-08-16 | 2022-11-15 | 中国平安财产保险股份有限公司 | 问题数据的处理方法、装置、计算机设备及存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020073095A1 (en) * | 2000-12-07 | 2002-06-13 | Patentmall Ltd. | Patent classification displaying method and apparatus |
CN104778161A (zh) * | 2015-04-30 | 2015-07-15 | 车智互联(北京)科技有限公司 | 基于Word2Vec和Query log抽取关键词方法 |
CN106997382A (zh) * | 2017-03-22 | 2017-08-01 | 山东大学 | 基于大数据的创新创意标签自动标注方法及系统 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5088096B2 (ja) * | 2007-11-02 | 2012-12-05 | 富士通株式会社 | 情報抽出プログラムおよび情報抽出装置 |
CN103201718A (zh) * | 2010-11-05 | 2013-07-10 | 乐天株式会社 | 关于关键词提取的系统和方法 |
US9798818B2 (en) * | 2015-09-22 | 2017-10-24 | International Business Machines Corporation | Analyzing concepts over time |
CN105893410A (zh) * | 2015-11-18 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | 一种关键词提取方法和装置 |
US20170139899A1 (en) | 2015-11-18 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Keyword extraction method and electronic device |
CN105447179B (zh) * | 2015-12-14 | 2019-02-05 | 清华大学 | 基于微博社交网络的话题自动推荐方法及其系统 |
CN105912524B (zh) * | 2016-04-09 | 2019-08-20 | 北京交通大学 | 基于低秩矩阵分解的文章话题关键词提取方法和装置 |
CN106372064B (zh) * | 2016-11-18 | 2019-04-19 | 北京工业大学 | 一种文本挖掘的特征词权重计算方法 |
CN106970910B (zh) * | 2017-03-31 | 2020-03-27 | 北京奇艺世纪科技有限公司 | 一种基于图模型的关键词提取方法及装置 |
-
2017
- 2017-08-29 CN CN201710754314.4A patent/CN107704503A/zh active Pending
- 2017-10-31 JP JP2018538141A patent/JP2019533205A/ja active Pending
- 2017-10-31 EP EP17904351.8A patent/EP3477495A4/en not_active Withdrawn
- 2017-10-31 KR KR1020187024862A patent/KR102170929B1/ko active IP Right Grant
- 2017-10-31 US US16/084,988 patent/US20210097238A1/en not_active Abandoned
- 2017-10-31 AU AU2017408801A patent/AU2017408801B2/en active Active
- 2017-10-31 WO PCT/CN2017/108797 patent/WO2019041521A1/zh unknown
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020073095A1 (en) * | 2000-12-07 | 2002-06-13 | Patentmall Ltd. | Patent classification displaying method and apparatus |
CN104778161A (zh) * | 2015-04-30 | 2015-07-15 | 车智互联(北京)科技有限公司 | 基于Word2Vec和Query log抽取关键词方法 |
CN106997382A (zh) * | 2017-03-22 | 2017-08-01 | 山东大学 | 基于大数据的创新创意标签自动标注方法及系统 |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110489758B (zh) * | 2019-09-10 | 2023-04-18 | 深圳市和讯华谷信息技术有限公司 | 应用程序的价值观计算方法及装置 |
CN110489758A (zh) * | 2019-09-10 | 2019-11-22 | 深圳市和讯华谷信息技术有限公司 | 应用程序的价值观计算方法及装置 |
CN111191119A (zh) * | 2019-12-16 | 2020-05-22 | 绍兴市上虞区理工高等研究院 | 一种基于神经网络的科技成果自学习方法及装置 |
CN111191119B (zh) * | 2019-12-16 | 2023-12-12 | 绍兴市上虞区理工高等研究院 | 一种基于神经网络的科技成果自学习方法及装置 |
CN111160193A (zh) * | 2019-12-20 | 2020-05-15 | 中国平安财产保险股份有限公司 | 关键信息提取方法、装置及存储介质 |
CN111160193B (zh) * | 2019-12-20 | 2024-02-09 | 中国平安财产保险股份有限公司 | 关键信息提取方法、装置及存储介质 |
CN112800771B (zh) * | 2020-02-17 | 2023-11-07 | 腾讯科技(深圳)有限公司 | 文章识别方法、装置、计算机可读存储介质和计算机设备 |
CN112800771A (zh) * | 2020-02-17 | 2021-05-14 | 腾讯科技(深圳)有限公司 | 文章识别方法、装置、计算机可读存储介质和计算机设备 |
CN111581492A (zh) * | 2020-04-01 | 2020-08-25 | 车智互联(北京)科技有限公司 | 一种内容推荐方法、计算设备及可读存储介质 |
CN111581492B (zh) * | 2020-04-01 | 2024-02-23 | 车智互联(北京)科技有限公司 | 一种内容推荐方法、计算设备及可读存储介质 |
CN111858834B (zh) * | 2020-07-30 | 2023-12-01 | 平安国际智慧城市科技股份有限公司 | 基于ai的案件争议焦点确定方法、装置、设备及介质 |
CN111858834A (zh) * | 2020-07-30 | 2020-10-30 | 平安国际智慧城市科技股份有限公司 | 基于ai的案件争议焦点确定方法、装置、设备及介质 |
CN112101012A (zh) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | 互动领域确定方法、装置、电子设备及存储介质 |
CN112101012B (zh) * | 2020-09-25 | 2024-04-26 | 北京百度网讯科技有限公司 | 互动领域确定方法、装置、电子设备及存储介质 |
CN112988971A (zh) * | 2021-03-15 | 2021-06-18 | 平安科技(深圳)有限公司 | 基于词向量的搜索方法、终端、服务器及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
EP3477495A4 (en) | 2019-12-11 |
KR102170929B1 (ko) | 2020-10-29 |
AU2017408801B2 (en) | 2020-04-02 |
KR20190038751A (ko) | 2019-04-09 |
CN107704503A (zh) | 2018-02-16 |
AU2017408801A1 (en) | 2019-03-14 |
EP3477495A1 (en) | 2019-05-01 |
JP2019533205A (ja) | 2019-11-14 |
US20210097238A1 (en) | 2021-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019041521A1 (zh) | 用户关键词提取装置、方法及计算机可读存储介质 | |
WO2019200806A1 (zh) | 文本分类模型的生成装置、方法及计算机可读存储介质 | |
CN108287864B (zh) | 一种兴趣群组划分方法、装置、介质及计算设备 | |
US10026021B2 (en) | Training image-recognition systems using a joint embedding model on online social networks | |
CN107609152B (zh) | 用于扩展查询式的方法和装置 | |
WO2019218514A1 (zh) | 网页目标信息的提取方法、装置及存储介质 | |
US11797620B2 (en) | Expert detection in social networks | |
CN110334272B (zh) | 基于知识图谱的智能问答方法、装置及计算机存储介质 | |
US10083379B2 (en) | Training image-recognition systems based on search queries on online social networks | |
JP6661790B2 (ja) | テキストタイプを識別する方法、装置及びデバイス | |
KR20200094627A (ko) | 텍스트 관련도를 확정하기 위한 방법, 장치, 기기 및 매체 | |
WO2020000717A1 (zh) | 网页分类方法、装置及计算机可读存储介质 | |
WO2019205373A1 (zh) | 相似用户查找装置、方法及计算机可读存储介质 | |
CN110413787B (zh) | 文本聚类方法、装置、终端和存储介质 | |
JP2019519019A5 (zh) | ||
WO2020056977A1 (zh) | 知识点推送方法、装置及计算机可读存储介质 | |
CN110275962B (zh) | 用于输出信息的方法和装置 | |
WO2021068681A1 (zh) | 标签分析方法、装置及计算机可读存储介质 | |
WO2020258481A1 (zh) | 个性化文本智能推荐方法、装置及计算机可读存储介质 | |
CN110019763B (zh) | 文本过滤方法、系统、设备及计算机可读存储介质 | |
WO2018205459A1 (zh) | 获取目标用户的方法、装置、电子设备及介质 | |
CN113626704A (zh) | 基于word2vec模型的推荐信息方法、装置及设备 | |
CN115248890A (zh) | 用户兴趣画像的生成方法、装置、电子设备以及存储介质 | |
WO2019085118A1 (zh) | 基于主题模型的关联词分析方法、电子装置及存储介质 | |
WO2018205460A1 (zh) | 获取目标用户的方法、装置、电子设备及介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2018538141 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20187024862 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2017904351 Country of ref document: EP Effective date: 20181008 |
|
ENP | Entry into the national phase |
Ref document number: 2017408801 Country of ref document: AU Date of ref document: 20171031 Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |