CN110888990A - Text recommending methods, devices, equipment and media - Google Patents
- Publication number
- CN110888990A (CN201911179808.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- preset
- node
- candidate
- texts
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical field
The present invention relates to the technical field of financial technology (Fintech), and in particular to a text recommendation method, apparatus, device, and medium.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). Content recommendation technology is no exception, but the financial industry's requirements for security, real-time performance, and accuracy place higher demands on it. At present, content recommendation relies entirely on keywords configured by the user. This narrows the user's preferences, so the news and other content pushed to the user is too homogeneous and the recommendations have low accuracy. For example, suppose a user's monitoring target is long-term rental apartments and all configured keywords relate to long-term rental apartments. Because P2P lending platform collapses can indirectly cause long-term rental apartment operators to collapse, a user who has not configured P2P-related keywords will not receive such news and therefore cannot anticipate the risk that long-term rental apartment operators may collapse.
Summary of the invention
The main purpose of the present invention is to provide a text recommendation method, apparatus, device, and medium, aiming to solve the technical problems that existing keyword-based content recommendation is too homogeneous and has low recommendation accuracy.
To achieve the above object, an embodiment of the present invention provides a text recommendation method, which includes:
monitoring the operation behavior of a target user, and determining keywords associated with the target user according to the operation behavior;
retrieving, from a preset set of text databases, one or more updated texts that each contain at least one of the keywords, as first candidate texts;
retrieving a preset event logic graph and, according to the preset event logic graph, selecting from the preset set of text databases updated texts whose total degree of association with the first candidate texts is not less than a preset association threshold, as second candidate texts, wherein the event logic graph contains association relationships between texts, and each association relationship has a corresponding degree of association;
screening selected texts out of the first candidate texts and the second candidate texts according to the operation behavior, and recommending the selected texts to the target user.
Optionally, before the step of retrieving the preset event logic graph and selecting, according to the preset event logic graph, from the preset set of text databases the updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold, the method includes:
collecting texts to be processed from the preset set of text databases at preset time intervals;
performing HTML tag filtering, symbol filtering, and sentence splitting on the texts to be processed using preset regular expressions, to obtain preprocessed text consisting of a list of clauses;
generating the preset event logic graph from the preprocessed text.
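The preprocessing step above can be illustrated with a minimal sketch. The three regular expressions and the function name `preprocess` are assumptions made for illustration; the source only says that preset regular expressions perform tag filtering, symbol filtering, and sentence splitting.

```python
import re

# Hypothetical patterns; the patent only refers to "preset regular expressions".
TAG_RE = re.compile(r"<[^>]+>")                      # HTML tags
SYMBOL_RE = re.compile(r"[^\w\s。！？!?；;，,、]")      # symbols other than basic punctuation
CLAUSE_RE = re.compile(r"[^。！？!?；;]+[。！？!?；;]?")  # a clause with its optional terminator


def preprocess(raw: str) -> list[str]:
    """Strip tags and stray symbols, then split the text into a list of clauses."""
    text = TAG_RE.sub(" ", raw)
    text = SYMBOL_RE.sub("", text)
    clauses = (c.strip() for c in CLAUSE_RE.findall(text))
    return [c for c in clauses if c]


print(preprocess("<p>P2P平台爆雷。</p><p>长租公寓#资金链断裂！</p>"))
```

The ASCII full stop is deliberately left out of the clause terminators in this sketch so that decimal numbers are not split; a production rule set would need to handle it.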
Optionally, the step of generating the preset event logic graph from the preprocessed text includes:
identifying multiple preset text association relationships in each clause in the clause list to obtain node texts to be processed, wherein the preset text association relationships include, but are not limited to, succession, causal, conditional, and parallel relationships;
segmenting the node texts to be processed into words using a preset word segmentation tool, obtaining a word vector for each word, and deriving a node vector for each node text to be processed from the word vectors of its words;
calculating, from the node vectors, a first node distance between each node text to be processed and every other node text to be processed;
iteratively grafting pairs of node texts to be processed whose node distance is less than a first preset distance, until every node text to be processed reaches a converged state in which its node text relationship edges no longer change, wherein each node text to be processed in the converged state is set as a converged node text;
generating the preset event logic graph based on the converged node texts and the node text relationship edges between the converged node texts.
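The graph construction steps above can be sketched as follows. Several choices here are assumptions, not taken from the source: the node vector is the mean of the word vectors, the node distance is cosine distance, and "grafting" is modeled as merging one node into another while keeping the surviving node's vector unchanged. All function names are hypothetical.

```python
import math


def node_vector(words, word_vectors):
    """Average the word vectors of the known words (assumed aggregation)."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    if not vecs:
        raise ValueError("no known words in node text")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def graft_until_stable(vectors, edges, max_dist):
    """Merge close nodes repeatedly until the edge set stops changing.

    vectors: node name -> node vector
    edges:   set of (source, relation, target) triples
    """
    edges = set(edges)
    while True:
        names = sorted({n for s, _, d in edges for n in (s, d)})
        rename = {}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if (a not in rename and b not in rename
                        and cosine_distance(vectors[a], vectors[b]) < max_dist):
                    rename[b] = a  # graft b onto a
        merged = {(rename.get(s, s), r, rename.get(d, d)) for s, r, d in edges}
        merged = {e for e in merged if e[0] != e[2]}  # drop self-loops
        if merged == edges:
            return edges  # converged: relationship edges no longer change
        edges = merged
```

For example, two near-duplicate "P2P default" nodes that both point to a "rental operator fails" node collapse into a single causal edge after one grafting pass.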
Optionally, the step of retrieving the preset event logic graph and selecting, according to the preset event logic graph, from the preset set of text databases the updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold, as second candidate texts, includes:
retrieving the preset event logic graph, and determining whether the event logic graph contains a converged node text whose corresponding clause contains the keyword;
if so, setting the converged node text whose corresponding clause contains the keyword as a user-focused node text, selecting, from the texts updated in the preset text database within a preset time period, third candidate texts other than the first candidate texts, and segmenting the title of each third candidate text using the preset word segmentation tool to obtain a title vector for each third candidate text;
calculating a second node distance between each title vector and the node vector of each converged node text in the event logic graph;
selecting from the third candidate texts first target texts, for which the second node distance is less than a second preset distance and the converged node text corresponding to that distance is a user-focused node text, or selecting from the third candidate texts second target texts, for which the second node distance is less than the second preset distance and a user-focused node text exists within a preset screening logic depth of the converged node text corresponding to that distance, wherein the screening logic depth is determined according to the degrees of association of the association relationships in the event logic graph;
setting the first target texts and the second target texts as the second candidate texts.
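The depth check in the selection step above can be sketched as a bounded breadth-first search. This assumes the title has already been matched to its nearest converged node, that edges are followed in their forward direction, and that the depth limit is supplied by the caller; the source derives the screening logic depth from the degrees of association but does not give the rule here. The function name `depth_to_focus` is hypothetical.

```python
from collections import deque


def depth_to_focus(graph, start, focus_nodes, max_depth):
    """Return the hop count from `start` to the nearest user-focused node,
    or None if no such node lies within `max_depth` hops.

    graph: node -> iterable of successor nodes
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in focus_nodes:
            return depth
        if depth < max_depth:
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return None


graph = {"p2p_default": ["rental_fail"], "rental_fail": []}
print(depth_to_focus(graph, "p2p_default", {"rental_fail"}, max_depth=2))
```

Under this reading, a result of 0 corresponds to a first target text (the matched node is itself user-focused), and a positive result corresponds to a second target text.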
Optionally, the step of screening selected texts out of the first candidate texts and the second candidate texts according to the operation behavior, and recommending the selected texts to the target user, includes:
obtaining the dissemination volume of each text among the first candidate texts and the second candidate texts, obtaining the relevance of each text to the target user, and obtaining the target user's preference degree according to the operation behavior;
screening selected texts out of the first candidate texts and the second candidate texts according to the dissemination volume, the relevance, and the preference degree, and recommending the selected texts to the target user.
Optionally, the step of obtaining the relevance of each text to the target user includes:
obtaining the number of times the keyword appears in each first candidate text, and setting that number as the word count;
obtaining the position at which the keyword appears in each first candidate text, setting that position as the word position, and obtaining the preset position weight corresponding to the word position, wherein different word positions have different position weights, and the word positions include the first sentence of the first paragraph, the first sentence of the last paragraph, a non-first sentence of the first paragraph, a non-first sentence of the last paragraph, the first sentence of a paragraph other than the first, and the first sentence of a paragraph other than the last;
obtaining the ratio of the number of sentences between the first and last occurrences of the keyword in each first candidate text to the total number of sentences in the full text, and setting that ratio as the word span;
obtaining the target body text between the first and last occurrences of the keyword in each first candidate text, obtaining the average number of keyword occurrences per preset number of sentences in the target body text, and setting that average as the word density;
obtaining a first relevance for each first candidate text from the word count, the preset position weight corresponding to the word position, the word span, and the word density;
obtaining the screening logic depth of each second candidate text, and determining a second relevance for each second candidate text according to the screening logic depth.
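Three of the features above (word count, word span, and word density) can be computed directly from a list of sentences, as in the following sketch. The function name `keyword_features`, the parameter `per` (the "preset number of sentences"), and the choice to measure the span as the index difference between the first and last matching sentences are assumptions. Position weights and the formula that combines the features into the first relevance are not given in this passage, so they are left out.

```python
def keyword_features(sentences, keyword, per=10):
    """Compute word count, word span, and word density for one text.

    sentences: the text as a list of sentence strings
    per:       hypothetical "preset number of sentences" for the density
    """
    hits = [i for i, s in enumerate(sentences) if keyword in s]
    count = sum(s.count(keyword) for s in sentences)
    if not hits:
        return {"count": 0, "span": 0.0, "density": 0.0}
    first, last = hits[0], hits[-1]
    span = (last - first) / len(sentences)
    body = sentences[first:last + 1]  # text between first and last occurrence
    density = sum(s.count(keyword) for s in body) * per / len(body)
    return {"count": count, "span": span, "density": density}


doc = ["rental apartments expand", "rates rise",
       "rental apartments default", "market reacts"]
print(keyword_features(doc, "rental", per=3))
```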
Optionally, the step of obtaining the target user's preference degree according to the operation behavior includes:
obtaining the target user's historical browsing texts from the operation behavior, obtaining a first document vector for each historical browsing text, and obtaining a second document vector for each text among the first candidate texts and the second candidate texts;
obtaining a first Pearson correlation coefficient between the second document vector and the first document vector, and obtaining the target user's preference degree from the first Pearson correlation coefficient.
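As an illustration of the step above, the Pearson correlation coefficient between two document vectors can be computed as below. The aggregation in `preference`, which averages the coefficient over all historical browsing texts, is an assumption; the source only states that the preference degree is obtained from the coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sx * sy)


def preference(candidate_vec, history_vecs):
    """Assumed aggregation: mean correlation with the browsing history."""
    return sum(pearson(candidate_vec, h) for h in history_vecs) / len(history_vecs)
```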
Optionally, the step of obtaining a first document vector for each historical browsing text includes:
obtaining, according to a preset clustering algorithm, a first probability matrix of each historical browsing text being assigned to first preset categories;
obtaining the segmented words of each historical browsing text according to a preset word segmentation algorithm, and obtaining a second probability matrix of the segmented words being assigned to second preset categories;
obtaining the optimized word vectors corresponding to each historical browsing text from the first probability matrix and the second probability matrix;
obtaining the first document vector of each historical browsing text from the optimized word vectors.
Optionally, the step of obtaining the first Pearson correlation coefficient between the second document vector and the first document vector, and obtaining the target user's preference degree from the first Pearson correlation coefficient, includes:
obtaining, for each historical browsing text, the historical browsing time elapsed between the moment the text was clicked and browsed and the current moment;
obtaining the first Pearson correlation coefficient between the second document vector and the first document vector, and applying interest down-weighting to the first Pearson correlation coefficient according to the historical browsing time, to obtain a second Pearson correlation coefficient;
obtaining the target user's preference degree from the second Pearson correlation coefficient.
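The interest down-weighting above could take many forms; this passage does not give the formula. The sketch below assumes an exponential decay with a half-life, so that a text browsed one half-life ago contributes half of its original correlation. The function name and the `half_life_days` parameter are hypothetical.

```python
def down_weighted(first_pearson, age_days, half_life_days=7.0):
    """Apply an assumed exponential interest decay to a Pearson coefficient.

    age_days: time elapsed since the historical text was browsed
    """
    if age_days < 0 or half_life_days <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return first_pearson * 0.5 ** (age_days / half_life_days)


print(down_weighted(0.8, age_days=7.0))  # one half-life old
```

A decay of this kind lets recently browsed texts dominate the preference degree while older interests fade gradually instead of being cut off.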
Optionally, the step of screening selected texts out of the first candidate texts and the second candidate texts according to the dissemination volume, the relevance, and the preference degree, and recommending the selected texts to the target user, includes:
calculating a value score for each text among the first candidate texts and the second candidate texts from the dissemination volume, the first relevance, the second relevance, and the preference degree;
selecting a preset number of texts in descending order of value score as the selected texts, and recommending the selected texts to the target user.
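The final ranking step above can be sketched as a weighted sum followed by a top-N selection. The weights, the assumption that all three inputs are normalized to the range 0 to 1, and the dictionary layout of a candidate are illustrative only; this passage does not specify the scoring formula.

```python
import heapq


def value_score(spread, relevance, preference, weights=(0.3, 0.4, 0.3)):
    """Assumed linear combination of normalized inputs."""
    w_spread, w_rel, w_pref = weights
    return w_spread * spread + w_rel * relevance + w_pref * preference


def select_top(candidates, n):
    """Return the n candidates with the highest value score, best first."""
    return heapq.nlargest(
        n,
        candidates,
        key=lambda c: value_score(c["spread"], c["relevance"], c["preference"]),
    )


candidates = [
    {"id": "a", "spread": 0.9, "relevance": 0.2, "preference": 0.1},
    {"id": "b", "spread": 0.1, "relevance": 0.9, "preference": 0.8},
    {"id": "c", "spread": 0.5, "relevance": 0.5, "preference": 0.5},
]
print([c["id"] for c in select_top(candidates, 2)])
```

Here `relevance` stands for the first relevance of a first candidate text or the second relevance of a second candidate text, whichever applies to the given text.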
The present invention also provides a text recommendation apparatus, which includes:
a monitoring module, configured to monitor the operation behavior of a target user and determine keywords associated with the target user according to the operation behavior;
a retrieval module, configured to retrieve, from a preset set of text databases, one or more updated texts that each contain at least one of the keywords, as first candidate texts;
a selection module, configured to retrieve a preset event logic graph and, according to the preset event logic graph, select from the preset set of text databases updated texts whose total degree of association with the first candidate texts is not less than a preset association threshold, as second candidate texts, wherein the event logic graph contains association relationships between texts, and each association relationship has a corresponding degree of association;
a screening module, configured to screen selected texts out of the first candidate texts and the second candidate texts according to the operation behavior, and recommend the selected texts to the target user.
Optionally, the text recommendation apparatus further includes:
a collection module, configured to collect texts to be processed from the preset set of text databases at preset time intervals;
a preprocessing module, configured to perform HTML tag filtering, symbol filtering, and sentence splitting on the texts to be processed using preset regular expressions, to obtain preprocessed text consisting of a list of clauses;
a generation module, configured to generate the preset event logic graph from the preprocessed text.
Optionally, the generation module includes:
an identification unit, configured to identify multiple preset text association relationships in each clause in the clause list to obtain node texts to be processed, wherein the preset text association relationships include, but are not limited to, succession, causal, conditional, and parallel relationships;
a first obtaining unit, configured to segment the node texts to be processed into words using a preset word segmentation tool, obtain a word vector for each word, and derive a node vector for each node text to be processed from the word vectors of its words;
a first calculation unit, configured to calculate, from the node vectors, a first node distance between each node text to be processed and every other node text to be processed;
a grafting unit, configured to iteratively graft pairs of node texts to be processed whose node distance is less than a first preset distance, until every node text to be processed reaches a converged state in which its node text relationship edges no longer change, wherein each node text to be processed in the converged state is set as a converged node text;
a generation unit, configured to generate the preset event logic graph based on the converged node texts and the node text relationship edges between the converged node texts.
Optionally, the selection module includes:
a retrieving unit, configured to retrieve the preset event logic graph and determine whether the event logic graph contains a converged node text whose corresponding clause contains the keyword;
a first setting unit, configured to, if such a converged node text exists, set the converged node text whose corresponding clause contains the keyword as a user-focused node text, select, from the texts updated in the preset text database within a preset time period, third candidate texts other than the first candidate texts, and segment the title of each third candidate text using the preset word segmentation tool to obtain a title vector for each third candidate text;
a second calculation unit, configured to calculate a second node distance between each title vector and the node vector of each converged node text in the event logic graph;
a selection unit, configured to select from the third candidate texts first target texts, for which the second node distance is less than a second preset distance and the converged node text corresponding to that distance is a user-focused node text, or to select from the third candidate texts second target texts, for which the second node distance is less than the second preset distance and a user-focused node text exists within a preset screening logic depth of the converged node text corresponding to that distance, wherein the screening logic depth is determined according to the degrees of association of the association relationships in the event logic graph;
a second setting unit, configured to set the first target texts and the second target texts as the second candidate texts.
Optionally, the screening module includes:
a second obtaining unit, configured to obtain the dissemination volume of each text among the first candidate texts and the second candidate texts, obtain the relevance of each text to the target user, and obtain the target user's preference degree according to the operation behavior;
a recommendation unit, configured to screen selected texts out of the first candidate texts and the second candidate texts according to the dissemination volume, the relevance, and the preference degree, and recommend the selected texts to the target user.
Optionally, the second obtaining unit includes:
a first obtaining subunit, configured to obtain the number of times the keyword appears in each first candidate text, and set that number as the word count;
a second obtaining subunit, configured to obtain the position at which the keyword appears in each first candidate text, set that position as the word position, and obtain the preset position weight corresponding to the word position, wherein different word positions have different position weights, and the word positions include the first sentence of the first paragraph, the first sentence of the last paragraph, a non-first sentence of the first paragraph, a non-first sentence of the last paragraph, the first sentence of a paragraph other than the first, and the first sentence of a paragraph other than the last;
a third obtaining subunit, configured to obtain the ratio of the number of sentences between the first and last occurrences of the keyword in each first candidate text to the total number of sentences in the full text, and set that ratio as the word span;
a fourth obtaining subunit, configured to obtain the target body text between the first and last occurrences of the keyword in each first candidate text, obtain the average number of keyword occurrences per preset number of sentences in the target body text, and set that average as the word density;
a fifth obtaining subunit, configured to obtain a first relevance for each first candidate text from the word count, the preset position weight corresponding to the word position, the word span, and the word density;
a sixth obtaining subunit, configured to obtain the screening logic depth of each second candidate text, and determine a second relevance for each second candidate text according to the screening logic depth.
Optionally, the second obtaining unit includes:
a seventh obtaining subunit, configured to obtain the target user's historical browsing texts from the operation behavior, obtain a first document vector for each historical browsing text, and obtain a second document vector for each text among the first candidate texts and the second candidate texts;
an eighth obtaining subunit, configured to obtain a first Pearson correlation coefficient between the second document vector and the first document vector, and obtain the target user's preference degree from the first Pearson correlation coefficient.
Optionally, the seventh obtaining subunit is configured to:
obtain, according to a preset clustering algorithm, a first probability matrix of each historical browsing text being assigned to first preset categories;
obtain the segmented words of each historical browsing text according to a preset word segmentation algorithm, and obtain a second probability matrix of the segmented words being assigned to second preset categories;
obtain the optimized word vectors corresponding to each historical browsing text from the first probability matrix and the second probability matrix;
obtain the first document vector of each historical browsing text from the optimized word vectors.
Optionally, the eighth obtaining subunit is configured to:
obtain, for each historical browsing text, the historical browsing time elapsed between the moment the text was clicked and browsed and the current moment;
obtain the first Pearson correlation coefficient between the second document vector and the first document vector, and apply interest down-weighting to the first Pearson correlation coefficient according to the historical browsing time, to obtain a second Pearson correlation coefficient;
obtain the target user's preference degree from the second Pearson correlation coefficient.
Optionally, the screening module includes:
a third calculation unit, configured to calculate a value score for each text among the first candidate texts and the second candidate texts from the dissemination volume, the first relevance, the second relevance, and the preference degree;
a screening unit, configured to select a preset number of texts in descending order of value score as the selected texts, and recommend the selected texts to the target user.
The present invention also provides a medium on which a text recommendation program is stored; when executed by a processor, the text recommendation program implements the steps of the text recommendation method described above.
The present invention monitors the operation behavior of a target user and determines keywords associated with the target user according to that behavior. After obtaining the keywords, it retrieves from a preset set of text databases one or more updated texts that each contain at least one of the keywords, as first candidate texts. It then retrieves a preset event logic graph and, according to the preset event logic graph, selects from the preset set of text databases updated texts whose total degree of association with the first candidate texts is not less than a preset association threshold, as second candidate texts; obtaining the second candidate texts widens the pool of texts considered during recommendation. The event logic graph contains association relationships between texts, and each association relationship has a corresponding degree of association. According to the operation behavior, selected texts are screened out of the first candidate texts and the second candidate texts and recommended to the target user. That is, in this application the selected texts are not drawn only from the first candidate texts found by keyword search, but from the combined set of the first candidate texts and the second candidate texts obtained via the preset event logic graph, which avoids homogeneous recommendations. Moreover, because this application recommends content by taking into account the association relationships between texts rather than keywords alone, it can improve recommendation accuracy.
Description of the drawings
FIG. 1 is a schematic flowchart of a first embodiment of the text recommendation method of the present invention;
FIG. 2 is a schematic flowchart, in a second embodiment of the text recommendation method of the present invention, of the refinement steps that precede the step of retrieving the preset event logic graph and selecting, from the preset text database set according to the preset event logic graph, updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold;
FIG. 3 is a schematic diagram of the device structure of the hardware operating environment involved in the method of the embodiments of the present invention;
FIG. 4 is a schematic diagram of a first scenario involved in the text recommendation method of the present invention;
FIG. 5 is a schematic diagram of a second scenario involved in the text recommendation method of the present invention;
FIG. 6 is a schematic diagram of a third scenario involved in the text recommendation method of the present invention;
FIG. 7 is a schematic diagram of a fourth scenario involved in the text recommendation method of the present invention;
FIG. 8 is a schematic diagram of a fifth scenario involved in the text recommendation method of the present invention;
FIG. 9 is a schematic diagram of a sixth scenario involved in the text recommendation method of the present invention;
FIG. 10 is a schematic diagram of a seventh scenario involved in the text recommendation method of the present invention;
FIG. 11 is a schematic diagram of an eighth scenario involved in the text recommendation method of the present invention;
FIG. 12 is a schematic diagram of a ninth scenario involved in the text recommendation method of the present invention;
FIG. 13 is a schematic flowchart involved in the text recommendation method of the present invention.
The realization of the objects, the functional features and the advantages of the present invention are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
The present invention provides a text recommendation method. In an embodiment of the text recommendation method, referring to FIG. 1, the text recommendation method includes:
Step S10, monitoring the operation behavior of a target user, and determining keywords associated with the target user according to the operation behavior;
Step S20, retrieving, from a preset text database set, one or more updated texts containing at least one of the keywords as first candidate texts;
Step S30, retrieving a preset event logic graph, and selecting, from the preset text database set according to the preset event logic graph, updated texts whose total degree of association with the first candidate texts is not less than a preset association threshold as second candidate texts, wherein the event logic graph contains association relationships between texts and each association relationship has a corresponding degree of association;
Step S40, screening out, according to the operation behavior, selected texts from the first candidate texts and the second candidate texts, and recommending the selected texts to the target user.
The specific steps are as follows:
Step S10, monitoring the operation behavior of a target user, and determining keywords associated with the target user according to the operation behavior;
At present, more and more public opinion monitoring systems are appearing on the market to meet the needs of enterprises for monitoring online public opinion and of individuals for tracking hot events by topic. Specifically, a public opinion monitoring system can help an enterprise listen to its target audience, analyze industry trends, manage brand reputation and issue crisis warnings. Current systems generally implement these functions through the following process:
1. Data collection: collect all information sources across the network, including online news media, forums, blogs, microblogs and various news clients.
2. Data screening: filter news data according to the monitoring-task keywords configured in the system. For example, if the body of a news item contains a keyword configured by the user, the item is retained for subsequent processing.
3. Data processing: for every news item containing a keyword, calculate in turn the sentiment polarity of the text, the reach of the news item, the relevance of the body text to the keywords, and so on.
4. Data push: taking into account the sentiment, reach and relevance of each news item, together with the user's click preferences on previously pushed news, rank the news data processed in steps 1 to 3 and push the items the user is most likely to be interested in.
In other words, the prior art recommends target content relying entirely on the keywords configured by the user and on the user's click preferences on previously pushed news. When user preferences are analyzed during this push process, the analysis is based on word2vec-type word vectors (word2vec is a word vector model that can describe the semantic similarity of Chinese words by the distance between their word vectors). This narrows the user's preferences and makes the pushed content homogeneous. For example, suppose a user clicks several times within some period on a news item such as "Enterprise A reaches a cooperation agreement with Bank B". After word2vec-type processing, the system learns that the user prefers news about Enterprise A and Bank B. If another news item appears, "Enterprise A and Bank C jointly invest hundreds of millions to build a laboratory at a university", the system will most likely not regard it as news the user prefers, whereas the user's actual interest is Enterprise A's positioning in the financial sector. Pushing only news about Enterprise A and Bank B clearly makes the pushed content homogeneous.
In addition, relying entirely on user-configured keywords and on click preferences for previously pushed news causes the recommended content to be overly homogeneous and the recommendation accuracy to be low. For example, suppose the user's monitoring target is long-term rental apartments and all configured keywords relate to long-term rental apartments. Because collapses of P2P lending platforms can indirectly cause long-term rental apartment operators to fail, a user who has not configured P2P-related keywords will not receive such news and therefore cannot anticipate the risk that long-term rental apartment operators may fail. Data screening that depends entirely on keywords therefore yields low recommendation accuracy in many scenarios.
To solve the above technical problems, this embodiment monitors the operation behavior of the target user and determines keywords associated with the target user according to the operation behavior. The operation behavior includes a sliding behavior or a search behavior triggered by entering keywords. If the operation behavior is a sliding behavior, determining the keywords associated with the target user may consist of extracting pre-stored keywords associated with the target user, the pre-stored keywords having been obtained from the texts the target user browsed in the past. If the operation behavior is a search behavior in which keywords are entered, the keywords associated with the target user may be the entered keywords, or a combination of the entered keywords and the pre-stored keywords associated with the target user.
Step S20, retrieving, from a preset text database set, one or more updated texts containing at least one of the keywords as first candidate texts;
The preset text database set includes a collection of databases built from online news media, forums, blogs, microblogs and various other news clients.
Retrieving one or more updated texts containing at least one of the keywords from the preset text database set as first candidate texts may be done in real time (to support real-time recommendation), at fixed intervals (to support scheduled recommendation), or only on the current occasion (to support recommendation when the target user performs a search), and so on.
Step S30, retrieving a preset event logic graph, and selecting, from the preset text database set according to the preset event logic graph, updated texts whose total degree of association with the first candidate texts is not less than a preset association threshold as second candidate texts, wherein the event logic graph contains association relationships between texts and each association relationship has a corresponding degree of association;
The preset event logic graph has already been generated and is updated in real time or periodically. The total degree of association may be determined from, for example, the logical depth between words or the distance between words; for instance, the total degree of association being not less than the preset association threshold may mean that the logical depth is not less than 2 depth units, or that the distance between words is not less than 10 preset distance units. The association relationships between texts may be causal, succession or other relationships, and different relationship types have different degrees of association. The preset event logic graph is retrieved, and updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold are selected from the preset text database set according to the preset event logic graph as second candidate texts. The second candidate texts broaden the range of content considered during recommendation.
As shown in FIG. 2, before the step of retrieving the preset event logic graph and selecting, from the preset text database set according to the preset event logic graph, updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold, the method includes:
Step A1, collecting texts to be processed from the preset text database set at preset time intervals;
Step A2, performing HTML tag filtering, symbol filtering and sentence splitting on the texts to be processed using preset regular expressions, to obtain preprocessed text consisting of a sentence list;
Step A3, generating the preset event logic graph according to the preprocessed text.
In this embodiment, texts to be processed are collected from the preset text database set at preset time intervals (which may include real time). Because the volume of texts collected each day is on the order of tens of millions, a preset collection model such as a preset Spark Streaming model may be used for collection. After the texts to be processed are obtained, HTML tag filtering, symbol filtering and sentence splitting are performed on each text using preset regular expressions, to obtain preprocessed text consisting of a sentence list.
Specifically, the HTML tags in the body of each text are filtered out using the following four regular expressions:
first: '//<!\[CDATA\[[^>]*//\]\]>'
second: '<\s*script[^>]*>[^<]*<\s*/\s*script\s*>'
third: '<\s*style[^>]*>[^<]*<\s*/\s*style\s*>'
fourth: '<!--[^>]*-->'
After the HTML tags have been filtered out of the body, the emoji in each text are filtered out using the following four regular expressions:
first: "\U0001F600-\U0001F64F"
second: "\U0001F300-\U0001F5FF"
third: "\U0001F680-\U0001F6FF"
fourth: "\U0001F1E0-\U0001F1FF"
After filtering, the body of each text is split into sentences: the body is cut into a sentence list at punctuation marks such as "。", "?" and "!". In this embodiment, although each text is split into sentences separately, the sentences from different texts may be mixed in the sentence list rather than kept apart by source text.
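The preprocessing described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `preprocess` and the constant names are hypothetical, the four emoji ranges are assumed to be combined into a single character class, and both full-width and half-width question and exclamation marks are assumed to end a sentence. Note that the four tag patterns quoted in the text remove CDATA, script, style and comment blocks only; other tags would pass through unchanged.

```python
import re

# The four tag-filtering patterns quoted in the text, applied in order.
TAG_PATTERNS = [
    r'//<!\[CDATA\[[^>]*//\]\]>',
    r'<\s*script[^>]*>[^<]*<\s*/\s*script\s*>',
    r'<\s*style[^>]*>[^<]*<\s*/\s*style\s*>',
    r'<!--[^>]*-->',
]

# Assumption: the four emoji ranges are merged into one character class.
EMOJI_RE = re.compile(
    "[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF"
    "\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]"
)

# Assumption: a sentence ends at a full stop, question mark or exclamation
# mark, in either full-width or half-width form.
SENTENCE_RE = re.compile(r'[^。?!?!]+[。?!?!]?')


def preprocess(text):
    """Strip the listed markup and emoji, then split the body into sentences."""
    for pattern in TAG_PATTERNS:
        text = re.sub(pattern, '', text)
    text = EMOJI_RE.sub('', text)
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]
```

Calling `preprocess` on each collected text and concatenating the results gives the mixed sentence list that the later steps operate on.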
The step of generating the preset event logic graph according to the preprocessed text includes:
Step A31, identifying a plurality of preset text association relationships in each sentence of the sentence list to obtain node texts to be processed, wherein the preset text association relationships include, but are not limited to, succession, causal, conditional and parallel relationships;
Each sentence in the sentence list is examined for a plurality of preset text association relationships, which include, but are not limited to, the succession, causal, conditional and parallel types. After this identification has been performed on each sentence in the sentence list, the node texts to be processed are obtained.
Specifically, a preset event segmentation tool may be used to identify, in each sentence of each text of the preprocessed text, two events (two event phrases) that stand in a succession, causal, conditional or parallel relationship. The two event phrases are set as node texts to be processed in the event logic graph, and the event logic graph connects the two node texts with a directed edge (a line with a direction). For example, the node texts shown in FIG. 4 can be obtained from the sentence "央行降息将使得贷款成本变低" ("a central bank interest rate cut will lower the cost of loans").
In this embodiment, the node texts standing in a succession, causal, conditional or parallel relationship may also be identified in each sentence of each text of the preprocessed text by a preset text association extraction model. The preset word pairs that express succession are: (首先, 其次), (首先, 然后), (一方面, 一方面), (先是, 进而), (先是, 然后), (先是, 再), and so on. If a sentence contains both words of one of these pairs, and the two words appear in the sentence in the order defined by the pair, the preset guiding-clause model in the preset text association extraction model extracts the two clauses introduced by the two words. All preset punctuation marks, modal particles, auxiliary words and stop words are removed from the clauses, which then serve as two node texts to be processed in the event logic graph, and the two node texts are connected by a directed edge (an edge expressing a logical relationship) of the succession type. For example, the sentence "首先A,其次B" ("first A, then B") is processed into the result shown in FIG. 5.
Similarly, the preset word pairs that express causality are: (因为, 所以), (因为, 导致), (因为, 使得), (因为, 故而), (正因为, 所以), (正因为, 导致), (正因为, 使得), (正因为, 故而), (既然, 那么), (既然, 就), (一旦, 就), (由于, 因此), (由于, 所以), (由于, 导致), (由于, 因而), (由于, 使得), (由于, 故而), (_, 因此), (_, 所以), (_, 导致), (_, 因而), (_, 使得), (_, 故而), and so on. The underscore "_" in a pair denotes an empty word, which can be skipped during matching. If a sentence contains both words of one of these pairs, and the two words appear in the sentence in the order defined by the pair, the preset guiding-clause model extracts the two clauses introduced by the two words. All preset punctuation marks, modal particles, auxiliary words and stop words are removed from the clauses, which then serve as two node texts to be processed in the graph, and the two node texts are connected by a directed edge (an edge expressing a logical relationship) of the causal type. For example, the sentence "因为A,所以B" ("because A, therefore B") is processed into FIG. 6.
Similarly, the preset word pairs that express a condition are: (如果, 那么), (如果, 就), (假如, 那么), (假如, 就), (假使, 那么), (假使, 就), (假若, 那么), (假若, 就), (一旦, 就), (只要, 就), (要是, 就), (只有, 才), and so on. If a sentence contains both words of one of these pairs, and the two words appear in the sentence in the order defined by the pair, the preset guiding-clause model extracts the two clauses introduced by the two words. All preset punctuation marks, modal particles, auxiliary words and stop words are removed from the clauses, which then serve as two node texts to be processed in the graph, and the two node texts are connected by a directed edge (an edge expressing a logical relationship) of the conditional type. For example, the sentence "如果A,那么B" ("if A, then B") is processed into FIG. 7.
Similarly, the preset word pairs that express a parallel relationship are: (不但, 而且), (不但, 并且), (不但, 还), (不但, 也), (不只, 而且), (不只, 并且), (不只, 还), (不只, 也), (不仅, 而且), (不仅, 并且), (不仅, 还), (不仅, 也), (不单, 而且), (不单, 并且), (不单, 还), (不单, 也), (要么, 要么), (要么, 或者), (或者, 或者), and so on. If a sentence contains both words of one of these pairs, and the two words appear in the sentence in the order defined by the pair, the preset guiding-clause model extracts the two clauses introduced by the two words. All preset punctuation marks, modal particles, auxiliary words and stop words are removed from the clauses, which then serve as two node texts to be processed in the graph, and the two node texts are connected by a directed edge (an edge expressing a logical relationship) of the parallel type. For example, the sentence "不但A,而且B" ("not only A, but also B") is processed into FIG. 8.
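The word-pair matching described in the preceding paragraphs can be illustrated with a short Python sketch. It is a simplification under stated assumptions: plain substring search stands in for the preset guiding-clause model, only punctuation is stripped (the removal of modal particles, auxiliary words and stop words is omitted), and only a small subset of the listed pairs is included. The names `PAIRS`, `PUNCT_RE` and `extract_edge` are hypothetical.

```python
import re

# A small subset of the connective pairs listed above.
# An empty string stands for the empty word written "_" in the text.
PAIRS = {
    "succession": [("首先", "其次"), ("首先", "然后"), ("先是", "然后")],
    "causal": [("因为", "所以"), ("由于", "因此"), ("", "导致")],
    "condition": [("如果", "那么"), ("只要", "就")],
    "parallel": [("不但", "而且"), ("不仅", "还")],
}

# Assumption: only punctuation is stripped from the extracted clauses.
PUNCT_RE = re.compile(r"[,。;、!?,.;!?]")


def extract_edge(sentence):
    """Return (relation, head_text, tail_text) for the first matching pair, else None."""
    for relation, pairs in PAIRS.items():
        for first, second in pairs:
            # An empty first word matches at the start of the sentence.
            start = sentence.find(first) if first else 0
            if start < 0:
                continue
            after_first = start + len(first)
            # The second word must appear after the first one.
            pos = sentence.find(second, after_first)
            if pos < 0:
                continue
            head = PUNCT_RE.sub("", sentence[after_first:pos])
            tail = PUNCT_RE.sub("", sentence[pos + len(second):])
            if head and tail:
                return relation, head, tail
    return None
```

Each returned triple corresponds to one directed edge of the given relation type from the head node text to the tail node text.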
It should be noted that if the preprocessed text contains more than a preset number of text items, for example more than 100,000, a preset bidirectional extraction network model may be used to extract, from each sentence of the preprocessed text, the event clauses that stand in the plurality of preset text association relationships.
Step A32, segmenting the node texts to be processed into words with a preset word segmentation tool, obtaining the word vector of each segmented word, and obtaining the node vector of each node text to be processed from the word vectors of its segmented words;
The node texts to be processed are segmented with a preset word segmentation tool such as jieba (an open-source Chinese word segmentation tool that can segment input Chinese text and tag parts of speech). After segmentation, the word vector of each segmented word is obtained with a preset word2vec model (a word vector model that maps each Chinese word to a high-dimensional vector, for example a 200-dimensional vector). Suppose five word vectors a, b, c, d and e are obtained for a node text; adding these five word vectors dimension by dimension, taking the dimension weights into account, gives the vector of the segmented text. In this way the node vector of each node text to be processed is obtained from the word vectors of its segmented words.
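As a rough illustration of how word vectors might be combined into a node vector, the sketch below sums the vectors dimension by dimension. The text does not specify what the dimension weights are, so the optional `dim_weights` parameter (one weight per dimension, defaulting to 1.0) is an assumption, as is the function name `node_vector`. Segmentation and the word2vec lookup are assumed to have happened already, so the input is simply a list of equal-length vectors.

```python
def node_vector(word_vectors, dim_weights=None):
    """Combine word vectors into one node vector by a weighted per-dimension sum."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(vec) != dim for vec in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    if dim_weights is None:
        # Assumption: without explicit weights every dimension counts equally.
        dim_weights = [1.0] * dim
    return [dim_weights[d] * sum(vec[d] for vec in word_vectors) for d in range(dim)]
```

With 200-dimensional word2vec vectors the same function applies unchanged; the toy vectors used to exercise it can be of any common length.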
Step A33, calculating, from the node vector of each node text to be processed, a first node distance between each node text to be processed and the other node texts to be processed;
For any two Chinese words, the closer they are semantically, the closer the vectors obtained after mapping; the distance between word vectors can therefore be used to describe the semantic similarity of Chinese words.
In this embodiment, the first node distance between each node text to be processed and the other node texts to be processed is calculated from the node vectors. Specifically, a preset formula for the node-text Pearson correlation coefficient is obtained, and the Pearson correlation coefficient between the vectors of two node texts to be processed is calculated from their node vectors using that formula. Denoting this coefficient by θ, the first node distance between the two node texts can be expressed as 1-(θ+1)/2. For each node text to be processed, the distances between it and all other node texts to be processed are calculated in turn.
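The distance 1-(θ+1)/2 maps a Pearson coefficient in [-1, 1] to a distance in [0, 1], with perfectly correlated vectors at distance 0 and perfectly anti-correlated vectors at distance 1. A minimal Python sketch follows; the function names `pearson` and `node_distance` are hypothetical, and the standard sample Pearson formula is assumed to be the "preset formula" the text refers to.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return cov / (norm_u * norm_v)


def node_distance(u, v):
    """First node distance 1 - (theta + 1) / 2, where theta is the Pearson coefficient."""
    theta = pearson(u, v)
    return 1 - (theta + 1) / 2
```

Computing `node_distance` for every pair of node vectors yields the pairwise distances used in the grafting step.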
Step A34, iteratively grafting pairs of node texts to be processed whose node distance is less than a first preset distance, until every node text to be processed is in a converged state in which the node-text relationship edges no longer change, wherein each node text to be processed in the converged state is set as a converged node text;
Step A35, generating the preset event logic graph based on the converged node texts and the node-text relationship edges between the converged node texts.
Pairs of node texts to be processed whose node distance is less than the first preset distance are grafted iteratively until every node text to be processed is in a converged state in which the node-text relationship edges no longer change, and each node text in the converged state is set as a converged node text. For example, if the distance between node text A and some node B is found to be less than the first preset distance, for example less than 0.3, all relationships of node text A are grafted onto node text B and node text A is deleted, as shown in FIG. 9. If the distance between node text A and node text C is less than the first preset distance, for example less than 0.3, the relationships between node texts shown in FIG. 10 are obtained. This grafting of pairs whose node distance is less than the first preset distance is executed iteratively until every node text to be processed has converged, that is, until the relationship edges that the node texts form in the event logic graph no longer change. At that point the event logic graph with directed edges has been generated and is considered to have converged.
It should be noted that, during this fusion process, the relationship edges of the parallel type in the event logic graph also need to be handled. For example, given the situation in FIG. 11, the tail node D of the parallel directed edge is grafted onto node text A, the parent of the edge's head node B: a new directed edge identical to the one between A and B is created to connect A and D, giving FIG. 12.
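The iterative grafting and the handling of parallel edges can be sketched as below. This is an illustrative simplification under several assumptions: the graph is represented as a set of `(head, tail, relation)` triples, the distance between two nodes is supplied as a callable, self-loops created by a merge are dropped, and the original parallel edge is kept after its tail is grafted onto the parent (the text does not say whether it is removed). All function names are hypothetical.

```python
def graft(edges, a, b):
    """Move every edge touching node a onto node b, which deletes a."""
    merged = set()
    for head, tail, rel in edges:
        head = b if head == a else head
        tail = b if tail == a else tail
        if head != tail:  # assumption: drop self-loops created by the merge
            merged.add((head, tail, rel))
    return merged


def merge_until_converged(edges, distance, threshold=0.3):
    """Graft close node pairs repeatedly until the edge set stops changing."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        nodes = sorted({n for h, t, _ in edges for n in (h, t)})
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                if distance(a, b) < threshold:
                    edges = graft(edges, a, b)
                    changed = True
                    break
            if changed:
                break  # node set changed, so recompute it before continuing
    return edges


def expand_parallel(edges):
    """For each parallel edge B -> D, copy every non-parallel edge A -> B as A -> D."""
    extra = set()
    for b, d, rel in edges:
        if rel != "parallel":
            continue
        for a, tail, parent_rel in edges:
            if tail == b and parent_rel != "parallel":
                extra.add((a, d, parent_rel))
    return set(edges) | extra
```

Each merge removes one node, so the loop in `merge_until_converged` terminates after at most as many merges as there are nodes.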
Step S40: according to the operation behavior, screen out selected texts from the first candidate texts and the second candidate texts, and recommend the selected texts to the target user.
After the second candidate texts and the first candidate texts are obtained, the selected texts are screened from the first and second candidate texts taken together and recommended to the target user, rather than being screened from the first candidate texts alone.
The present invention monitors the operation behavior of the target user and determines keywords associated with the target user according to that behavior. After the keywords are acquired, one or more updated texts containing at least one of the keywords are retrieved from the preset text database set as first candidate texts. After the first candidate texts are acquired, the preset event graph is retrieved and, according to it, updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold are selected from the preset text database set as second candidate texts. Obtaining the second candidate texts widens the pool of texts available for recommendation. The event graph contains the association relationships between texts, and each association relationship has a corresponding degree of association. According to the operation behavior, selected texts are screened from the first and second candidate texts and recommended to the target user. That is, in this application the selected texts are not chosen solely from the first candidate texts found by keyword search, but from the combined set of the first candidate texts and the second candidate texts obtained via the preset event graph, which avoids monotonous recommendations. Moreover, because content is recommended with reference to the association relationships between texts rather than keywords alone, this application can improve recommendation accuracy.
Further, on the basis of the first embodiment, the present invention provides another embodiment of the text recommendation method, in which the step of retrieving the preset event graph and selecting from the preset text database set, according to the preset event graph, updated texts whose total degree of association with the first candidate texts is not less than the preset association threshold as second candidate texts includes:
Step S31: retrieve the preset event graph, and determine whether the event graph contains a convergence node text whose corresponding clause contains the keyword;
Step S32: if so, set the convergence node text whose corresponding clause contains the keyword as a user-attention node text, select third candidate texts, other than the first candidate texts, from the texts updated in the preset text database within a preset time period, and segment the title of each third candidate text with a preset word segmentation tool to obtain a title vector for each third candidate text;
The preset event graph is retrieved, and it is determined whether the event graph contains a convergence node text whose corresponding clause contains the keyword; the clause corresponding to a given node may or may not contain the keyword. If no such convergence node text exists in the event graph, no further processing is performed and the selected texts can be chosen for recommendation directly from the first candidate texts. If such a convergence node text exists, that node text is marked as a "user-attention node text", and third candidate texts, other than the first candidate texts, are selected from the texts updated in the preset text database within the preset time period. The title of each third candidate text is segmented with the preset word segmentation tool to obtain its title vector. Specifically, the title of each third candidate text is segmented with the preset jieba word segmenter, and the title vector of each text is obtained with the preset word2vec tool. The title vectors are obtained in order to calculate the second node distance.
Step S33: calculate the second node distance between the title vector and the node vector of each convergence node text in the event graph;
Step S34: select, from the third candidate texts, first target texts whose second node distance is less than the second preset distance and for which the convergence node text corresponding to that distance is a user-attention node text; or select, from the third candidate texts, second target texts whose second node distance is less than the second preset distance and for which a user-attention node text exists within the preset screening logic depth range of the corresponding convergence node text, wherein the screening logic depth is determined according to the degrees of association of the association relationships in the event graph;
The second node distance between the title vector and the node vector of each convergence node text in the event graph is calculated. If the distance is less than the second preset distance (e.g., less than 0.4) and that node text is a "user-attention node text", the text corresponding to the title vector is determined to be a first target text. If the distance is less than the second preset distance (e.g., less than 0.4) and another node marked as "user-attention node text" exists within the preset screening logic depth range of that node text (e.g., within a logic depth of 2), the text corresponding to the title vector is retained as a second target text. The screening logic depth is determined according to the degrees of association of the association relationships in the event graph and may be defined as follows: an edge expressing a "parallel" logical relation has a logic depth of 0.5; an edge expressing a "sequential" logical relation has a logic depth of 0.7; an edge expressing a "causal" or "conditional" logical relation has a logic depth of 1; and the logic depth between two node texts is the sum of the logic depths of all edges between them. For example, if node text B can reach user-attention node text C most quickly through two edges, one expressing a "causal" relation and the other a "sequential" relation, then the screening logic depth of node text B (and of the text corresponding to the title vector) is 1.7, which is within the logic depth range of 2.
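The logic depth between two nodes can be sketched as a shortest-path computation over the per-relation edge depths listed above. The sketch assumes that edges are followed in their stated direction and that "most quickly" means the smallest summed depth; both are interpretations, and the function and variable names are hypothetical.

```python
import heapq

# Edge depths as given in the description.
EDGE_DEPTH = {"parallel": 0.5, "sequential": 0.7, "causal": 1.0, "conditional": 1.0}

def logical_depth(edges, source, target):
    """Smallest summed edge depth from source to target, or None if unreachable.

    edges: iterable of (head, tail, relation) triples.
    """
    adjacency = {}
    for head, tail, relation in edges:
        adjacency.setdefault(head, []).append((tail, EDGE_DEPTH[relation]))

    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:  # Dijkstra's algorithm; all edge depths are positive
        depth, node = heapq.heappop(queue)
        if node == target:
            return depth
        if depth > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in adjacency.get(node, []):
            candidate = depth + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None
```

A title whose nearest node is B would then be kept as a second target text when `logical_depth(edges, "B", attention_node)` is not `None` and falls within the preset range (2 in the example above).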
Step S35: set the first target texts and the second target texts as the second candidate texts.
After the first target texts and the second target texts are obtained, they are together set as the second candidate texts.
In this embodiment, the preset event graph is retrieved and it is determined whether the event graph contains a convergence node text whose corresponding clause contains the keyword. If so, that convergence node text is set as a user-attention node text, third candidate texts other than the first candidate texts are selected from the texts updated in the preset text database within the preset time period, and the title of each third candidate text is segmented with the preset word segmentation tool to obtain its title vector. The second node distance between the title vector and the node vector of each convergence node text in the event graph is calculated. First target texts are selected from the third candidate texts as those whose second node distance is less than the second preset distance and whose corresponding convergence node text is a user-attention node text; second target texts are selected as those whose second node distance is less than the second preset distance and for which a user-attention node text exists within the preset screening logic depth range of the corresponding convergence node text, the screening logic depth being determined according to the degrees of association of the association relationships in the event graph. The first and second target texts are set as the second candidate texts. This embodiment obtains the second candidate texts accurately and lays a foundation for accurate text recommendation.
Further, on the basis of the above embodiment, the present invention provides another embodiment of the text recommendation method, in which the step of screening out selected texts from the first candidate texts and the second candidate texts according to the operation behavior and recommending the selected texts to the target user includes:
Step S41: obtain the spread of each text in the first candidate texts and the second candidate texts, obtain the relevance of each text to the target user, and obtain the preference degree of the target user according to the operation behavior;
The title of each text in the first and second candidate texts is searched in a preset search engine to obtain the spread of each text, which reflects how popular that text is. This embodiment assumes that two news items with the same title are two reposts of the same item. The spread may be calculated as follows. First, all punctuation marks are removed from the titles of the first and second candidate texts. (During text collection some punctuation marks may be converted from half-width to full-width, and some media outlets also convert punctuation between half-width and full-width when reposting, so differences in title punctuation are ignored when calculating the spread.) Next, the punctuation-free title is used to retrieve a preset number of texts, e.g., 1000, from the preset search engine. (A text is usually reposted at most on the order of a hundred times, and not more than 1000 times.) Finally, punctuation is removed from the titles of the 1000 retrieved texts in turn, and the number of titles that exactly match the title of the current candidate text is counted as the spread α of the current news item.
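The counting step can be sketched as below. The search-engine query itself is not modeled: `retrieved_titles` stands in for the titles of the (up to 1000) texts it returns. Treating "punctuation" as every character in a Unicode punctuation category is an assumption, and the function names are hypothetical.

```python
import unicodedata

def strip_punctuation(title):
    # Remove every character whose Unicode category starts with "P",
    # which covers both half-width and full-width punctuation marks.
    return "".join(ch for ch in title
                   if not unicodedata.category(ch).startswith("P"))

def spread(candidate_title, retrieved_titles):
    """Count retrieved titles identical to the candidate once punctuation is removed."""
    key = strip_punctuation(candidate_title)
    return sum(1 for title in retrieved_titles if strip_punctuation(title) == key)
```

Whitespace and currency or math symbols are left untouched here; whether they should also be ignored is not stated in the description.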
The step of obtaining the relevance of each text to the target user includes:
Step S41: obtain the number of times the keyword appears in each text of the first candidate texts, and set this number as the word count;
Step S42: obtain the positions at which the keyword appears in each text of the first candidate texts, set these as the word positions, and obtain the preset position weight corresponding to each word position, wherein different word positions have different position weights, and the word positions include the first sentence of the first paragraph, the first sentence of the last paragraph, a non-first sentence of the first paragraph, a non-first sentence of the last paragraph, the first sentence of a non-first paragraph, and the first sentence of a non-last paragraph;
Step S43: obtain the ratio of the number of sentences between the first and last occurrences of the keyword in each text of the first candidate texts to the total number of sentences in the text, and set this ratio as the word span;
Step S44: obtain the target body text between the first and last occurrences of the keyword in each text of the first candidate texts, obtain the average number of keyword occurrences per preset number of sentences in the target body text, and set this average as the word density;
Step S45: obtain the first relevance of each text in the first candidate texts according to the word count, the preset position weight corresponding to the word position, the word span, and the word density;
The relevance is calculated differently for the first candidate texts and the second candidate texts, as shown in FIG. 13.
The first relevance of a first candidate text is calculated from the word count, word position, word span, and word density of the keyword in that text, defined as follows. Word count a: the total number of occurrences of the keyword in the body of the text. Word position b: b is initially 0; if the keyword appears in the first sentence of the first paragraph or the first sentence of the last paragraph, 2 is added to b; if the keyword appears in a non-first sentence of the first paragraph or of the last paragraph, 1 is added to b; if the keyword appears in the first sentence of any paragraph other than the first and last paragraphs, 0.5 is added to b. Word span c: the ratio of the number of sentences between the first and last occurrences of the keyword in the body to the total number of sentences in the text. Word density d: taking the portion of the body between the first and last occurrences of the keyword, the average number of keyword occurrences per preset number of sentences (e.g., per 10 sentences) in that portion. The relevance is then calculated as β = 0.3a + b + 0.1c*d.
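One possible reading of the formula β = 0.3a + b + 0.1c*d is sketched below. Several details are assumptions rather than statements from the description: the text is supplied as non-empty paragraphs already split into sentences, each position rule contributes to b at most once, the span counts sentences as the difference between the indices of the first and last sentences containing the keyword, and the density portion includes both of those sentences. The function name is hypothetical.

```python
def first_relevance(paragraphs, keyword, window=10):
    """paragraphs: list of paragraphs, each a non-empty list of sentence strings."""
    sentences = [s for paragraph in paragraphs for s in paragraph]
    hits = [i for i, s in enumerate(sentences) if keyword in s]
    if not hits:
        return 0.0

    # a: total number of keyword occurrences in the body.
    a = sum(s.count(keyword) for s in sentences)

    # b: position weight; each rule is applied at most once (an assumption).
    first, last, middle = paragraphs[0], paragraphs[-1], paragraphs[1:-1]
    b = 0.0
    if keyword in first[0] or keyword in last[0]:
        b += 2
    if any(keyword in s for s in first[1:] + last[1:]):
        b += 1
    if any(keyword in paragraph[0] for paragraph in middle):
        b += 0.5

    # c: sentences between first and last occurrence, relative to all sentences.
    c = (hits[-1] - hits[0]) / len(sentences)

    # d: average keyword occurrences per `window` sentences within that portion.
    portion = sentences[hits[0]:hits[-1] + 1]
    d = sum(s.count(keyword) for s in portion) / len(portion) * window

    return 0.3 * a + b + 0.1 * c * d
```

For a six-sentence text with three keyword occurrences spread over the first five sentences and one of them in the opening sentence, this gives a = 3, b = 2, c = 4/6 and d = 6, so β = 0.9 + 2 + 0.4 = 3.3.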
Step S46: obtain the screening logic depth of each text in the second candidate texts, and determine the second relevance of each text in the second candidate texts according to the screening logic depth.
The second candidate texts do not contain the keyword, so the second relevance of each of them is determined from the screening logic depth. Specifically, the second relevance is defined as β = 6 × 0.8^l, where l denotes the screening logic depth. The screening logic depth is the logic depth, found while screening the second candidate texts, between the initially matched node text in the event graph and the corresponding user-attention node text, or the depth of the fewest logic edges between them.
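As a small sketch, the second relevance can be computed directly from the screening logic depth. It reads l as an exponent, so that relevance decays as the logic depth grows; that reading of the formula is an interpretation, and the function name is hypothetical.

```python
def second_relevance(logic_depth):
    # beta = 6 * 0.8 ** l: a depth of 0 gives 6, and every additional unit
    # of logic depth multiplies the relevance by 0.8.
    return 6 * 0.8 ** logic_depth
```

Under this reading, a second candidate text reached through one "causal" edge (depth 1) scores 4.8, and a text at the example depth of 1.7 scores lower still.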
The step of obtaining the preference degree of the target user according to the operation behavior includes:
Step S47: obtain the historical browsing texts of the target user from the operation behavior, obtain a first document vector for each historical browsing text, and obtain a second document vector for each text in the first candidate texts and the second candidate texts;
Step S48: obtain a first Pearson correlation coefficient between the second document vector and the first document vector, and obtain the preference degree of the target user according to the first Pearson correlation coefficient.
In this embodiment, the historical browsing texts of the target user are obtained; these may be the texts browsed within the past month. A first document vector is obtained for each historical browsing text, and a second document vector is obtained for each text in the first and second candidate texts. The first Pearson correlation coefficient between the second document vector and the first document vector is then obtained, and the preference degree of the target user is obtained from it. The user preference degree is thus derived from the historical browsing texts together with the first and second candidate texts.
Step S42: according to the spread, the relevance, and the preference degree, screen out selected texts from the first candidate texts and the second candidate texts, and recommend the selected texts to the target user.
In this embodiment, the spread, the relevance, and the preference degree are considered together when screening out the selected texts from the first and second candidate texts, and the selected texts are recommended to the target user.
In this embodiment, the spread of each text in the first and second candidate texts and the relevance of each text to the target user are obtained, and the preference degree of the target user is obtained according to the operation behavior; the selected texts are then screened from the first and second candidate texts according to the spread, the relevance, and the preference degree, and recommended to the target user. Because three factors are weighed when screening the selected texts, the accuracy of the recommendation is improved.
Further, on the basis of the above embodiment, the present invention provides another embodiment of the text recommendation method, in which the step of obtaining the first document vector of each historical browsing text includes:
Step B1: obtain, according to a preset clustering algorithm, a first probability matrix of each historical browsing text being assigned to the first preset categories;
Step B2: obtain the segmented words of each historical browsing text according to a preset word segmentation algorithm, and obtain a second probability matrix of the segmented words being assigned to the second preset categories;
Step B3: obtain the optimized word vectors corresponding to each historical browsing text according to the first probability matrix and the second probability matrix;
Step B4: obtain the first document vector of each historical browsing text according to the optimized word vectors.
A first probability matrix of each historical browsing text being assigned to the first preset categories (comprising 200 text subcategories) is obtained according to the preset clustering algorithm. Specifically, the LDA (Latent Dirichlet Allocation) algorithm may be used to cluster the historical browsing texts without supervision (the number of clusters may be set to 200), yielding the first probability matrix p of each text being assigned to the first preset categories. The segmented words of each historical browsing text are obtained according to the preset word segmentation algorithm, and a second probability matrix q of the segmented words being assigned to the second preset categories (comprising 200 word subcategories) is obtained. The optimized word vectors W of each historical browsing text are obtained from the first and second probability matrices as W = 0.6p + 0.4q, and the first document vector of each historical browsing text is obtained from the optimized word vectors. Specifically, the optimized word vectors of all segmented words are summed to give the document vector of the corresponding news item.
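The combination W = 0.6p + 0.4q and the summation into a document vector can be sketched as follows. The probabilities are taken as given inputs, since LDA training itself is not shown; the sketch assumes that p is the document's distribution over the 200 text categories, that each q is one word's distribution over the 200 word categories, and that the two align element by element. The function names are hypothetical.

```python
def optimized_word_vector(p, q, doc_weight=0.6, word_weight=0.4):
    # W = 0.6p + 0.4q, element by element over the shared category axis.
    return [doc_weight * pi + word_weight * qi for pi, qi in zip(p, q)]

def document_vector(p, word_category_probs):
    """Sum the optimized word vectors of all segmented words in one text.

    p: category probabilities of the text.
    word_category_probs: one category-probability list per segmented word.
    """
    total = [0.0] * len(p)
    for q in word_category_probs:
        for i, value in enumerate(optimized_word_vector(p, q)):
            total[i] += value
    return total
```

With only two categories for brevity, a text with p = [0.5, 0.5] and two words with q = [1, 0] and q = [0, 1] yields word vectors [0.7, 0.3] and [0.3, 0.7], whose sum is the document vector [1.0, 1.0].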
The first Pearson correlation coefficient between the second document vector and the first document vector is obtained, and the preference degree of the target user is obtained from it, as follows. First, the set of texts (N1, N2, ..., Nk) that the current target user has clicked and browsed in the past, k texts in total, is retrieved from the preset text database set. Then the first candidate texts, the second candidate texts, and the k retrieved texts (e.g., news items) are each segmented with the preset jieba word segmenter, and the word vectors of all segmented words are summed to give the document vector V of the corresponding text. Let (V1, V2, ..., Vk) denote the document vectors of the texts in the historically browsed set (N1, N2, ..., Nk), let Vv denote the document vector of each text in the current first and second candidate texts, and let ρ(a, b) denote the first Pearson correlation coefficient between two document vectors a and b. The user preference degree of the first and second candidate texts can then be expressed as
where Vv in the formula denotes the document vector of each text in the first and second candidate texts, and Vj denotes the document vector of each text in the historically browsed text set.
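A sketch of the Pearson-based preference follows. The Pearson correlation coefficient ρ is standard, but the preference formula itself is not reproduced in the text above, so averaging ρ(Vv, Vj) over the k historical texts is purely an assumption made for illustration; the function names are hypothetical as well.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if spread_u == 0 or spread_v == 0:
        return 0.0  # undefined for a constant vector; treated as no correlation
    return covariance / (spread_u * spread_v)

def preference(candidate_vector, history_vectors):
    # Assumed aggregation: the mean of rho(Vv, Vj) over all historical texts.
    if not history_vectors:
        return 0.0
    return sum(pearson(candidate_vector, vj) for vj in history_vectors) / len(history_vectors)
```

A candidate whose document vector moves in step with most of the user's browsing history scores close to 1 under this aggregation, and one that moves against it scores close to -1.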
The step of obtaining the first Pearson correlation coefficient between the second document vector and the first document vector and obtaining the preference degree of the target user according to the first Pearson correlation coefficient includes:
Step C1: obtain, for each historical browsing text, the historical browsing time elapsed between the moment the text was clicked and browsed and the current moment;
Step C2: obtain the first Pearson correlation coefficient between the second document vector and the first document vector, and down-weight the first Pearson correlation coefficient for interest decay according to the historical browsing time, obtaining a second Pearson correlation coefficient;
Step C3: obtain the preference degree of the target user according to the second Pearson correlation coefficient.
The preferences of the target user may shift considerably over time. For example, the text an operator cared about most a week ago may have been the company's new product launch, whereas what the operator cares about most now is the public's evaluation of the new product. The texts the user has clicked and browsed in the past therefore need to be down-weighted over time. First, the historical browsing time between the moment each historical browsing text was clicked and the current moment is obtained; specifically, tk denotes that text Nk was clicked and browsed tk days ago, i.e., its historical browsing time is tk. The first Pearson correlation coefficient between the second document vector and the first document vector is obtained and down-weighted for interest decay according to the historical browsing time to obtain the second Pearson correlation coefficient, from which the preference degree of the target user is obtained. The user preference degree of the first and second candidate texts is then finally expressed as
In this embodiment, the historical browsing time between the moment each historical browsing text was clicked and the current moment is obtained; the first Pearson correlation coefficient between the second document vector and the first document vector is obtained and down-weighted for interest decay according to the historical browsing time to obtain the second Pearson correlation coefficient; and the preference degree of the target user is obtained according to the second Pearson correlation coefficient. This embodiment obtains the preference degree of the target user accurately and lays a foundation for accurate recommendation.
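The interest down-weighting can be sketched as a decay applied to each first Pearson coefficient. The decay formula is not reproduced in the text above, so the exponential half-life decay used here, the 7-day half-life, the averaging step, and the function names are all hypothetical choices made only to illustrate the idea.

```python
def downweight(first_coefficient, days_since_click, half_life_days=7.0):
    # Hypothetical decay: the coefficient halves every `half_life_days` days,
    # giving the "second Pearson correlation coefficient" for that text.
    return first_coefficient * 0.5 ** (days_since_click / half_life_days)

def decayed_preference(coefficients_with_age, half_life_days=7.0):
    """coefficients_with_age: list of (first Pearson coefficient, t_k in days)."""
    second = [downweight(rho, days, half_life_days)
              for rho, days in coefficients_with_age]
    # Assumed aggregation: mean of the second coefficients.
    return sum(second) / len(second) if second else 0.0
```

With this choice, a text browsed today keeps its full coefficient while one browsed a week ago contributes half as much, which mirrors the product-launch example above.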
Further, on the basis of the above embodiment, the present invention provides another embodiment of the text recommendation method, in which the step of screening out selected texts from the first candidate texts and the second candidate texts according to the spread, the relevance, and the preference degree and recommending the selected texts to the target user includes:
Step D1: calculate a value score for each text in the first candidate texts and the second candidate texts according to the spread, the first relevance, the second relevance, and the preference degree;
Step D2: select a preset number of texts in descending order of value score as the selected texts, and recommend the selected texts to the target user.
根据所述传播量、所述第一相关度、所述第二相关度与所述用户偏好度从所述第一候选文本与所述第二候选文本中选取被选文本,并将所述被选文本推送给目标用户,具体地,通过预设计算公式如s=0.7αβ+0.3γ得到第一候选文本与所述第二候选文本中每篇文本的价值分数,根据所述价值分数从高至低依次选取预设数量的文本作为被选文本,并将所述被选文本推荐给所述目标用户,如选择分数最大的10篇新闻作为目标内容推送给目标用户,可以每天进行一次推送。The selected text is selected from the first candidate text and the second candidate text according to the spread, the first relevancy, the second relevancy and the user preference, and the selected text is The selected text is pushed to the target user. Specifically, the value score of each text in the first candidate text and the second candidate text is obtained through a preset calculation formula such as s=0.7αβ+0.3γ. A preset number of texts are selected in sequence at the lowest level as selected texts, and the selected texts are recommended to the target user. For example, 10 news articles with the highest scores are selected as the target content to be pushed to the target user, which can be pushed once a day.
在本实施例中,通过根据所述传播量、所述第一相关度、所述第二相关度与所述偏好度,计算所述第一候选文本与所述第二候选文本中每篇文本的价值分数;根据所述价值分数从高至低依次选取预设数量的文本作为被选文本,并将所述被选文本推荐给所述目标用户。本实施例中根据价值分数进行文本的精准推荐。In this embodiment, each piece of text in the first candidate text and the second candidate text is calculated according to the spread, the first correlation degree, the second correlation degree, and the preference degree value score; according to the value score, a preset number of texts are sequentially selected as selected texts from high to low, and the selected texts are recommended to the target user. In this embodiment, accurate recommendation of text is performed according to the value score.
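Steps D1 and D2 can be sketched as follows using the example formula s=0.7αβ+0.3γ. This passage does not define the symbols, so reading α as the spread, β as the relevance (first or second, depending on the candidate), and γ as the preference degree is an assumption, as are the dictionary keys and the function names `value_score` and `select_top`.

```python
def value_score(alpha, beta, gamma):
    """Example formula from the text: s = 0.7 * alpha * beta + 0.3 * gamma.

    Assumed reading: alpha = spread, beta = relevance, gamma = preference.
    """
    return 0.7 * alpha * beta + 0.3 * gamma


def select_top(candidates, limit=10):
    """Return the ids of the `limit` highest-scoring candidate texts.

    Each candidate is a dict with the hypothetical keys
    "id", "spread", "relevance" and "preference".
    """
    scored = [
        (value_score(c["spread"], c["relevance"], c["preference"]), c["id"])
        for c in candidates
    ]
    # Sort by score only, highest first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:limit]]
```

The sketch assumes the three inputs are already on comparable scales (for example, normalized to the range 0 to 1); otherwise a raw spread count would dominate the product term.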
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a device in the hardware operating environment involved in the embodiments of the present invention.
The text recommendation device in the embodiments of the present invention may be a PC, or a terminal device such as a smartphone, a tablet computer, or a portable computer.
As shown in FIG. 3, the text recommendation device may include a processor 1001 (for example, a CPU), a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM, or a stable memory (non-volatile memory) such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Optionally, the text recommendation device may further include a target user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The target user interface may include a display and an input unit such as a keyboard; optionally, it may also include a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the text recommendation device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 3, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, and a text recommendation program. The operating system is a program that manages and controls the hardware and software resources of the text recommendation device and supports the running of the text recommendation program and other software and/or programs. The network communication module implements communication between the components inside the memory 1005, as well as communication with other hardware and software in the text recommendation device.
In the text recommendation device shown in FIG. 3, the processor 1001 executes the text recommendation program stored in the memory 1005 to implement the steps of any of the text recommendation methods described above.
The specific implementation of the text recommendation device of the present invention is basically the same as the embodiments of the text recommendation method described above, and is not repeated here.
In addition, an embodiment of the present invention further provides a text recommendation apparatus, which includes:
a monitoring module, configured to monitor the operation behavior of a target user and determine keywords associated with the target user according to the operation behavior;
a retrieval module, configured to retrieve, from a preset text database set, one or more updated texts containing at least one of the keywords as the first candidate text;
a selection module, configured to retrieve a preset event logic graph and, according to the preset event logic graph, select from the preset text database set the updated texts whose total degree of association with the first candidate text is not less than a preset association threshold as the second candidate text, wherein the event logic graph contains the association relationships between texts, and each association relationship has a corresponding degree of association;
a screening module, configured to screen out the selected text from the first candidate text and the second candidate text according to the operation behavior, and recommend the selected text to the target user.
Optionally, the text recommendation apparatus further includes:
a collection module, configured to collect texts to be processed from the preset text database set at preset time intervals;
a preprocessing module, configured to apply HTML tag filtering, symbol filtering, and sentence splitting to the texts to be processed using preset regular expressions, to obtain preprocessed text in the form of a clause list;
a generation module, configured to generate the preset event logic graph from the preprocessed text.
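The preprocessing module can be illustrated with a minimal sketch. The regular expressions below are illustrative assumptions (the patent only says "preset regular expressions"), as is the function name `preprocess`; the sentence splitter is deliberately naive and would also split on decimal points.

```python
import re

# Matches any HTML tag such as <p>, </div> or <a href="...">.
TAG_RE = re.compile(r"<[^>]+>")
# Matches symbols that are neither word characters, whitespace,
# nor a small whitelist of Chinese and Western punctuation.
SYMBOL_RE = re.compile(r"[^\w\s。！？!?.,，、；;：:]")
# A clause: a run of non-terminator characters plus an optional terminator.
CLAUSE_RE = re.compile(r"[^。！？!?.]+[。！？!?.]?")


def preprocess(raw_text):
    """Strip HTML tags and stray symbols, then split into a clause list."""
    text = TAG_RE.sub(" ", raw_text)
    text = SYMBOL_RE.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    return [c.strip() for c in CLAUSE_RE.findall(text) if c.strip()]
```

The resulting clause list is the input that the generation module would then scan for sequential, causal, conditional, and parallel relationships.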
Optionally, the generation module includes:
a recognition unit, configured to recognize multiple preset text association relationships in each clause of the clause list to obtain node texts to be processed, wherein the preset text association relationships include, but are not limited to, sequential, causal, conditional, and parallel relationships;
a first obtaining unit, configured to perform word segmentation on the node texts to be processed using a preset word segmentation tool, obtain the word vector of each segmented word, and obtain the node vector of each node text to be processed based on the word vectors of its segmented words;
a first calculation unit, configured to calculate, from the node vector of each node text to be processed, the first node distance between that node text and every other node text to be processed;
a grafting unit, configured to iteratively graft pairs of node texts to be processed whose node distance is less than a first preset distance, until every node text to be processed reaches a converged state in which its node text relationship edges no longer change, wherein each node text to be processed in the converged state is set as a converged node text;
a generation unit, configured to generate the preset event logic graph based on the converged node texts and the node text relationship edges between them.
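The node vector and node distance computations performed by the first obtaining unit and the first calculation unit can be sketched as below. The text does not say how word vectors are combined or which distance is used, so averaging the word vectors and using cosine distance are assumptions, as are the function names. The sketch only finds the pairs that would qualify for grafting; the iterative grafting itself is not shown.

```python
import math
from itertools import combinations


def node_vector(word_vectors):
    """Average the word vectors of one node text (an assumed aggregation)."""
    if not word_vectors:
        raise ValueError("a node text needs at least one word vector")
    dim = len(word_vectors[0])
    return [sum(v[i] for v in word_vectors) / len(word_vectors) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def close_pairs(node_vectors, max_distance):
    """Pairs of node names whose distance is below `max_distance`.

    node_vectors maps a node name to its node vector.
    """
    return [
        (p, q)
        for p, q in combinations(sorted(node_vectors), 2)
        if cosine_distance(node_vectors[p], node_vectors[q]) < max_distance
    ]
```

Here `max_distance` plays the role of the first preset distance: each returned pair is a candidate for one grafting step.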
Optionally, the selection module includes:
a retrieving unit, configured to retrieve the preset event logic graph and determine whether the event logic graph contains a converged node text whose corresponding clause contains the keyword;
a first setting unit, configured to, if such a text exists, set the converged node text whose corresponding clause contains the keyword as the user-attention node text, select from the texts updated in the preset text database within a preset time period a third candidate text other than the first candidate text, and perform word segmentation on the title of each text in the third candidate text using a preset word segmentation tool to obtain the title vector of each text in the third candidate text;
a second calculation unit, configured to calculate the second node distance between the title vector and the node vector of each converged node text in the event logic graph;
a selection unit, configured to select from the third candidate text a first target text, whose second node distance is less than a second preset distance and for which the converged node text corresponding to that distance is a user-attention node text; or to select from the third candidate text a second target text, whose second node distance is less than the second preset distance and for which a user-attention node text exists within a preset screening logic depth of the converged node text corresponding to that distance, wherein the screening logic depth is determined according to the degrees of association of the association relationships in the event logic graph;
a second setting unit, configured to set the first target text and the second target text as the second candidate text.
Optionally, the screening module includes:
a second obtaining unit, configured to obtain the spread of each text in the first candidate text and the second candidate text, obtain the relevance of each text to the target user, and obtain the preference degree of the target user according to the operation behavior;
a recommendation unit, configured to screen out the selected text from the first candidate text and the second candidate text according to the spread, the relevance, and the preference degree, and recommend the selected text to the target user.
Optionally, the second obtaining unit includes:
a first obtaining subunit, configured to obtain the number of times the keyword appears in each text of the first candidate text, and set that number as the word count;
a second obtaining subunit, configured to obtain the position at which the keyword appears in each text of the first candidate text, set that position as the word position, and obtain the preset position weight corresponding to the word position, wherein different word positions have different position weights, and the word positions include the first sentence of the first paragraph, the first sentence of the last paragraph, a non-first sentence of the first paragraph, a non-first sentence of the last paragraph, the first sentence of a paragraph other than the first, and the first sentence of a paragraph other than the last;
a third obtaining subunit, configured to obtain the ratio of the number of sentences between the first and last occurrences of the keyword in each text of the first candidate text to the total number of sentences in the full text, and set that ratio as the word span;
a fourth obtaining subunit, configured to obtain the target body text between the first and last occurrences of the keyword in each text of the first candidate text, obtain the average number of keyword occurrences per preset number of sentences in the target body text, and set that average as the word density;
a fifth obtaining subunit, configured to obtain the first relevance of each text in the first candidate text according to the word count, the preset position weight corresponding to the word position, the word span, and the word density;
a sixth obtaining subunit, configured to obtain the screening logic depth of each text in the second candidate text, and determine the second relevance of each text in the second candidate text according to the screening logic depth.
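Three of the features listed above (word count, word span, and word density) can be computed from a clause list as in the following sketch. Several details are assumptions: the span counts sentences inclusively from the first to the last occurrence, `window` stands in for the "preset number of sentences", and the function name `keyword_features` is hypothetical. Position weights and the way the features are combined into the first relevance are not specified in this passage and are therefore left out.

```python
def keyword_features(sentences, keyword, window=5):
    """Word count, word span and word density of `keyword` in one text.

    sentences is the text as a list of sentences (clauses).
    """
    hits = [i for i, s in enumerate(sentences) if keyword in s]
    if not hits:
        return {"count": 0, "span": 0.0, "density": 0.0}

    count = sum(s.count(keyword) for s in sentences)
    first, last = hits[0], hits[-1]

    # Span: sentences from first to last occurrence (inclusive, an assumption)
    # divided by the total number of sentences.
    target = sentences[first:last + 1]
    span = len(target) / len(sentences)

    # Density: average keyword occurrences per `window` sentences
    # inside the target body text.
    occurrences = sum(s.count(keyword) for s in target)
    density = occurrences * window / len(target)

    return {"count": count, "span": span, "density": density}
```

A keyword that appears throughout a text yields a span close to 1, while one that is mentioned only in passing yields a small span even if it is repeated within one sentence.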
Optionally, the second obtaining unit includes:
a seventh obtaining subunit, configured to obtain the historical browsing texts of the target user from the operation behavior, obtain the first document vector of each historical browsing text, and obtain the second document vector of each text in the first candidate text and the second candidate text;
an eighth obtaining subunit, configured to obtain the first Pearson correlation coefficient between the second document vector and the first document vector, and obtain the preference degree of the target user according to the first Pearson correlation coefficient.
Optionally, the seventh obtaining subunit is configured to:
obtain, according to a preset clustering algorithm, a first probability matrix of each historical browsing text being classified under a first preset category;
obtain the segmented words of each historical browsing text according to a preset word segmentation algorithm, and obtain a second probability matrix of the segmented words being classified under a second preset category;
obtain the optimized word vectors corresponding to each historical browsing text according to the first probability matrix and the second probability matrix;
obtain the first document vector of each historical browsing text according to the optimized word vectors.
Optionally, the eighth obtaining subunit is configured to:
obtain the historical browsing time between the moment each historical browsing text was clicked and browsed and the current moment;
obtain the first Pearson correlation coefficient between the second document vector and the first document vector, and apply interest down-weighting to the first Pearson correlation coefficient according to the historical browsing time to obtain a second Pearson correlation coefficient;
obtain the preference degree of the target user according to the second Pearson correlation coefficient.
Optionally, the screening module includes:
a third calculation unit, configured to calculate the value score of each text in the first candidate text and the second candidate text according to the spread, the first relevance, the second relevance, and the preference degree;
a screening unit, configured to select a preset number of texts as the selected text in descending order of value score, and recommend the selected text to the target user.
The specific implementation of the text recommendation apparatus is basically the same as the embodiments of the text recommendation method described above, and is not repeated here.
In addition, an embodiment of the present invention further provides a text recommendation device, which includes a memory 109, a processor 110, and a text recommendation program stored in the memory 109 and executable on the processor 110. When executed by the processor 110, the text recommendation program implements the steps of the embodiments of the text recommendation method described above.
In addition, the present invention further provides a computer medium storing one or more programs, which can be executed by one or more processors to implement the steps of the embodiments of the text recommendation method described above.
The specific implementations of the device and the medium (that is, the computer medium) of the present invention are basically the same as the embodiments of the text recommendation method described above, and are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element qualified by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific implementations described, which are merely illustrative rather than restrictive. Guided by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (13)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911179808.XA CN110888990B (en) | 2019-11-22 | 2019-11-22 | Text recommendation method, device, equipment and medium |
| PCT/CN2020/129115 WO2021098648A1 (en) | 2019-11-22 | 2020-11-16 | Text recommendation method, apparatus and device, and medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911179808.XA CN110888990B (en) | 2019-11-22 | 2019-11-22 | Text recommendation method, device, equipment and medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110888990A (en) | 2020-03-17 |
| CN110888990B (en) | 2024-04-12 |
Family
ID=69748961
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201911179808.XA Active CN110888990B (en) | 2019-11-22 | 2019-11-22 | Text recommendation method, device, equipment and medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN110888990B (en) |
| WO (1) | WO2021098648A1 (en) |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111400456A (en) * | 2020-03-20 | 2020-07-10 | 北京百度网讯科技有限公司 | Information recommendation method and device |
| CN111428092A (en) * | 2020-03-20 | 2020-07-17 | 北京中亦安图科技股份有限公司 | Accurate bank marketing method based on graph model |
| CN112000795A (en) * | 2020-08-04 | 2020-11-27 | 中国建设银行股份有限公司 | Official document recommendation method and device |
| CN112561581A (en) * | 2020-12-14 | 2021-03-26 | 珠海格力电器股份有限公司 | Recommendation method and device, electronic equipment and storage medium |
| CN112749344A (en) * | 2021-02-04 | 2021-05-04 | 北京百度网讯科技有限公司 | Information recommendation method and device, electronic equipment, storage medium and program product |
| CN112836061A (en) * | 2021-01-12 | 2021-05-25 | 平安科技(深圳)有限公司 | Method, device and computer equipment for intelligent recommendation |
| WO2021098648A1 (en) * | 2019-11-22 | 2021-05-27 | 深圳前海微众银行股份有限公司 | Text recommendation method, apparatus and device, and medium |
| CN113505587A (en) * | 2021-06-23 | 2021-10-15 | 科大讯飞华南人工智能研究院(广州)有限公司 | Entity extraction method, related device, equipment and storage medium |
| CN113535939A (en) * | 2020-04-17 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Text processing method and apparatus, electronic device, and computer-readable storage medium |
| CN113987160A (en) * | 2021-11-15 | 2022-01-28 | 京东科技控股股份有限公司 | Text information pushing method and device |
| CN114153965A (en) * | 2021-12-08 | 2022-03-08 | 深圳市网联安瑞网络科技有限公司 | Content and map combined public opinion event recommendation method, system and terminal |
| CN114625747A (en) * | 2022-05-13 | 2022-06-14 | 杭银消费金融股份有限公司 | Wind control updating method and system based on information security |
| US11977841B2 (en) | 2021-12-22 | 2024-05-07 | Bank Of America Corporation | Classification of documents |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114239828B (en) * | 2021-09-14 | 2024-11-15 | 华信宸安(北京)科技有限公司 | A method for constructing a supply chain event graph based on causality |
| CN114153988B (en) * | 2021-12-02 | 2025-08-19 | 青岛科技大学 | Logic map-based chemical abnormal event detection method and system |
| CN114020936B (en) * | 2022-01-06 | 2022-04-01 | 北京融信数联科技有限公司 | Construction method and system of multi-modal affair map and readable storage medium |
| CN114817678B (en) * | 2022-01-27 | 2024-08-20 | 武汉理工大学 | An automatic text collection method for specific fields |
| CN118194864B (en) * | 2024-05-17 | 2024-07-19 | 上海通创信息技术股份有限公司 | A potential user mining method and system based on speech analysis |
| CN119271807B (en) * | 2024-12-09 | 2025-03-18 | 成都信息工程大学 | Public data story method, device, equipment and storage medium |
| CN121031758B (en) * | 2025-10-24 | 2026-01-27 | 鱼快创领智能科技(南京)有限公司 | Mixed search knowledge base construction system and method based on keyword base and vector |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014194689A1 (en) * | 2013-06-06 | 2014-12-11 | Tencent Technology (Shenzhen) Company Limited | Method, server, browser, and system for recommending text information |
| US20170139899A1 (en) * | 2015-11-18 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Keyword extraction method and electronic device |
| WO2017084362A1 (en) * | 2015-11-18 | 2017-05-26 | 百度在线网络技术(北京)有限公司 | Model generation method, recommendation method and corresponding apparatuses, device and storage medium |
| CN107944911A (en) * | 2017-11-18 | 2018-04-20 | 电子科技大学 | A kind of recommendation method of the commending system based on text analyzing |
| CN108153901A (en) * | 2018-01-16 | 2018-06-12 | 北京百度网讯科技有限公司 | The information-pushing method and device of knowledge based collection of illustrative plates |
| CN108733694A (en) * | 2017-04-18 | 2018-11-02 | 北京国双科技有限公司 | Method and apparatus are recommended in retrieval |
| CN109165350A (en) * | 2018-08-23 | 2019-01-08 | 成都品果科技有限公司 | A kind of information recommendation method and system based on deep knowledge perception |
| CN109408826A (en) * | 2018-11-07 | 2019-03-01 | 北京锐安科技有限公司 | A kind of text information extracting method, device, server and storage medium |
| CN109597878A (en) * | 2018-11-13 | 2019-04-09 | 北京合享智慧科技有限公司 | A kind of method and relevant apparatus of determining text similarity |
| CN110413875A (en) * | 2019-06-26 | 2019-11-05 | 腾讯科技(深圳)有限公司 | A method and related device for pushing text information |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150310073A1 (en) * | 2014-04-29 | 2015-10-29 | Microsoft Corporation | Finding patterns in a knowledge base to compose table answers |
| CN109033132B (en) * | 2018-06-05 | 2020-12-11 | 中证征信(深圳)有限公司 | Method and device for calculating text and subject correlation by using knowledge graph |
| CN110888990B (en) * | 2019-11-22 | 2024-04-12 | 深圳前海微众银行股份有限公司 | Text recommendation method, device, equipment and medium |
- 2019-11-22: CN CN201911179808.XA, patent CN110888990B (en), Active
- 2020-11-16: WO PCT/CN2020/129115, patent WO2021098648A1 (en), Ceased
Non-Patent Citations (1)
| Title |
|---|
| 邱利茂;刘嘉勇;: "基于文档词典的文本关联关键词推荐技术", 现代计算机(专业版), no. 07, 5 March 2018 (2018-03-05) * |
Cited By (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021098648A1 (en) * | 2019-11-22 | 2021-05-27 | 深圳前海微众银行股份有限公司 | Text recommendation method, apparatus and device, and medium |
| CN111428092B (en) * | 2020-03-20 | 2023-05-02 | 北京中亦安图科技股份有限公司 | Bank accurate marketing method based on graph model |
| CN111428092A (en) * | 2020-03-20 | 2020-07-17 | 北京中亦安图科技股份有限公司 | Accurate bank marketing method based on graph model |
| CN111400456A (en) * | 2020-03-20 | 2020-07-10 | 北京百度网讯科技有限公司 | Information recommendation method and device |
| CN111400456B (en) * | 2020-03-20 | 2023-09-26 | 北京百度网讯科技有限公司 | Information recommendation method and device |
| CN113535939A (en) * | 2020-04-17 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Text processing method and apparatus, electronic device, and computer-readable storage medium |
| CN112000795A (en) * | 2020-08-04 | 2020-11-27 | 中国建设银行股份有限公司 | Official document recommendation method and device |
| CN112000795B (en) * | 2020-08-04 | 2024-12-27 | 中国建设银行股份有限公司 | Official document recommendation method and device |
| CN112561581A (en) * | 2020-12-14 | 2021-03-26 | 珠海格力电器股份有限公司 | Recommendation method and device, electronic equipment and storage medium |
| CN112836061A (en) * | 2021-01-12 | 2021-05-25 | 平安科技(深圳)有限公司 | Method, device and computer equipment for intelligent recommendation |
| WO2022151594A1 (en) * | 2021-01-12 | 2022-07-21 | 平安科技(深圳)有限公司 | Intelligent recommendation method and apparatus, and computer device |
| CN112749344B (en) * | 2021-02-04 | 2023-08-01 | 北京百度网讯科技有限公司 | Information recommendation method, device, electronic device, storage medium and program product |
| CN112749344A (en) * | 2021-02-04 | 2021-05-04 | 北京百度网讯科技有限公司 | Information recommendation method and device, electronic equipment, storage medium and program product |
| CN113505587A (en) * | 2021-06-23 | 2021-10-15 | 科大讯飞华南人工智能研究院(广州)有限公司 | Entity extraction method, related device, equipment and storage medium |
| CN113505587B (en) * | 2021-06-23 | 2024-04-09 | 科大讯飞华南人工智能研究院(广州)有限公司 | Entity extraction method and related device, equipment and storage medium |
| CN113987160A (en) * | 2021-11-15 | 2022-01-28 | 京东科技控股股份有限公司 | Text information pushing method and device |
| CN114153965A (en) * | 2021-12-08 | 2022-03-08 | 深圳市网联安瑞网络科技有限公司 | Content and map combined public opinion event recommendation method, system and terminal |
| CN114153965B (en) * | 2021-12-08 | 2026-02-13 | 深圳市网联安瑞网络科技有限公司 | A method, system, and terminal for recommending public opinion events that combines content and graphs. |
| US11977841B2 (en) | 2021-12-22 | 2024-05-07 | Bank Of America Corporation | Classification of documents |
| CN114625747A (en) * | 2022-05-13 | 2022-06-14 | 杭银消费金融股份有限公司 | Wind control updating method and system based on information security |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2021098648A1 (en) | 2021-05-27 |
| CN110888990B (en) | 2024-04-12 |
Similar Documents
| Publication | Title |
|---|---|
| CN110888990A (en) | Text recommending methods, devices, equipment and media |
| US11580104B2 (en) | Method, apparatus, device, and storage medium for intention recommendation |
| US11663254B2 (en) | System and engine for seeded clustering of news events |
| CN108154395B (en) | Big data-based customer network behavior portrait method |
| US9268843B2 (en) | Personalization engine for building a user profile |
| US8140515B2 (en) | Personalization engine for building a user profile |
| US10217058B2 (en) | Predicting interesting things and concepts in content |
| US20130060769A1 (en) | System and method for identifying social media interactions |
| CN104111941B (en) | The method and apparatus that information is shown |
| CN107357793B (en) | Information recommendation method and device |
| US20080319973A1 (en) | Recommending content using discriminatively trained document similarity |
| WO2018151856A1 (en) | Intelligent matching system with ontology-aided relation extraction |
| WO2016179938A1 (en) | Method and device for question recommendation |
| CN111008265A (en) | Enterprise information searching method and device |
| CN110232126B (en) | Hot spot mining method and server and computer-readable storage medium |
| EP2307951A1 (en) | Method and apparatus for relating datasets by using semantic vectors and keyword analyses |
| US10002187B2 (en) | Method and system for performing topic creation for social data |
| US20140006369A1 (en) | Processing structured and unstructured data |
| CA2956627C (en) | System and engine for seeded clustering of news events |
| CN109299277A (en) | Public opinion analysis method, server and computer-readable storage medium |
| CN115906858A (en) | Text processing method, system and electronic device |
| KR20190109628A (en) | Method for providing personalized article contents and apparatus for the same |
| CN116431895A (en) | Personalized recommendation method and system for safety production knowledge |
| CN106503064A (en) | A kind of generation method of self adaptation microblog topic summary |
| US8195458B2 (en) | Open class noun classification |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |

