WO2021213448A1 - Determination of map for information recommendation - Google Patents

Publication number
WO2021213448A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
vector representation
word
knowledge point
Prior art date
Application number
PCT/CN2021/088763
Other languages
French (fr)
Chinese (zh)
Inventor
杨明晖
崔恒斌
陈显玲
陈晓军
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021213448A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method, system and apparatus for determining a map for information recommendation. The method comprises: acquiring a plurality of nodes used for constructing a target map, wherein the nodes at least comprise a word node and a knowledge point node (202), wherein if the node is a word node, a vector representation of a word corresponding to the node is taken as a vector representation of the node, and if the node is a knowledge point node, on the basis of a vector representation of a word related to the knowledge point node, a vector representation corresponding to the knowledge point node is determined; for any two nodes, determining an edge weight between the two nodes on the basis of the types of the two nodes, and taking the edge weight as an association relationship between the two nodes (204); and on the basis of the vector representation of the node and the association relationship between the two nodes, performing at least one round of map aggregation iteration, and updating the vector representation of the node in the map (206).

Description

Determining a graph for information recommendation
Technical field
This specification relates to the field of data processing, and in particular to a method, system, and apparatus for determining a graph for information recommendation.
Background
With the development of technology, artificial intelligence has provided new solutions for industries that traditionally incurred high labor costs, such as human customer service. Intelligent customer service robots can answer simple text questions from users, but they are not good at handling complex or vague ones. When users send complex or ambiguous questions, the robot cannot recommend accurate information, which makes such questions harder for the robot to process and degrades the user experience.
Summary of the invention
One embodiment of this specification provides a method for determining a graph for information recommendation. The method includes: obtaining a plurality of nodes for constructing a graph, the nodes including at least word nodes and knowledge point nodes; if a node is a word node, using the vector representation of the word corresponding to the node as the vector representation of the node; if a node is a knowledge point node, determining the vector representation of the knowledge point node based on the vector representations of the words related to it; for any two nodes, determining an edge weight between the two nodes based on their types, and using the edge weight as the association relationship between the two nodes; and performing at least one round of graph aggregation iteration, based on the vector representations of the nodes and the association relationships between nodes, to update the vector representations of the nodes in the graph.
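The final step of the method, a round of graph aggregation over weighted edges, can be illustrated with a minimal sketch. The update rule used here (blending each node's vector with the edge-weighted mean of its neighbors' vectors), the function name `aggregate_once`, and the `self_weight` parameter are assumptions made for illustration; the specification does not fix a particular aggregation formula at this point.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def aggregate_once(vectors: Dict[str, Vector],
                   edges: Dict[Tuple[str, str], float],
                   self_weight: float = 0.5) -> Dict[str, Vector]:
    """Run one round of neighbor aggregation (hypothetical update rule)."""
    # Build a symmetric adjacency list from the undirected weighted edges.
    neighbors: Dict[str, List[Tuple[str, float]]] = {n: [] for n in vectors}
    for (a, b), weight in edges.items():
        neighbors[a].append((b, weight))
        neighbors[b].append((a, weight))

    updated: Dict[str, Vector] = {}
    for node, vec in vectors.items():
        total = sum(weight for _, weight in neighbors[node])
        if total == 0:
            # Isolated node: keep a copy of its current vector.
            updated[node] = list(vec)
            continue
        # Edge-weighted mean of the neighbors' current vectors.
        mean = [
            sum(weight * vectors[n][i] for n, weight in neighbors[node]) / total
            for i in range(len(vec))
        ]
        # Blend the node's own vector with the neighborhood mean.
        updated[node] = [
            self_weight * vec[i] + (1.0 - self_weight) * mean[i]
            for i in range(len(vec))
        ]
    return updated
```

Calling `aggregate_once` repeatedly on its own output corresponds to performing several rounds of iteration; every round reads only the vectors from the previous round, so the result does not depend on node order.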
One embodiment of this specification provides a method for recommending information using the determined graph. The method includes: obtaining input information; determining the node in the graph that corresponds to the input information, the graph being determined by the method for determining a graph for information recommendation described above; determining a recommended node based on the vector representation of that node and the vector representations of its adjacent nodes; and outputting information related to the recommended node.
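The recommendation step can be sketched as ranking the adjacent nodes of the matched node by vector similarity. Cosine similarity, the function names `cosine` and `recommend`, and the `top_k` parameter are assumptions for illustration; the text above only says that the recommended node is determined from the vector representations of the node and its adjacent nodes.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query_node: str,
              vectors: Dict[str, List[float]],
              adjacency: Dict[str, List[str]],
              top_k: int = 1) -> List[str]:
    """Return the top_k adjacent nodes most similar to query_node."""
    candidates = adjacency.get(query_node, [])
    ranked = sorted(
        candidates,
        key=lambda n: cosine(vectors[query_node], vectors[n]),
        reverse=True,
    )
    return ranked[:top_k]
```

A node with no recorded neighbors yields an empty list, which the caller could treat as "no recommendation".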
One embodiment of this specification provides a system for determining a graph for information recommendation. The system includes a first acquisition module, a first determination module, and an update module. The first acquisition module is configured to obtain a plurality of nodes for constructing a graph, the nodes including at least word nodes and knowledge point nodes; if a node is a word node, the vector representation of the word corresponding to the node is used as the vector representation of the node; if a node is a knowledge point node, the vector representation of the knowledge point node is determined based on the vector representations of the words related to it. For any two nodes, the first determination module is configured to determine an edge weight between the two nodes based on their types, and to use the edge weight as the association relationship between the two nodes. The update module is configured to perform at least one round of graph aggregation iteration, based on the vector representations of the nodes and the association relationships between nodes, to update the vector representations of the nodes in the graph.
One embodiment of this specification provides a system for recommending information using a graph. The system includes a second acquisition module, a second determination module, a third determination module, and an output module. The second acquisition module is configured to obtain input information. The second determination module is configured to determine the node in the graph that corresponds to the input information, the graph being determined by the method for determining a graph for information recommendation. The third determination module is configured to determine a recommended node based on the vector representation of that node and the vector representations of its adjacent nodes. The output module is configured to output information related to the recommended node.
One embodiment of this specification provides an apparatus for determining a graph for information recommendation. The apparatus includes a processor configured to execute the method for determining a graph for information recommendation described above.
One embodiment of this specification provides an apparatus for recommending information using the determined graph. The apparatus includes a processor configured to execute the method for recommending information using the determined graph described above.
Description of the drawings
This specification is further described by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in them, the same reference numerals denote the same structures, wherein:
FIG. 1 is a schematic diagram of an application scenario 100 of an information recommendation system according to some embodiments of this specification;
FIG. 2 is an exemplary flowchart of determining a graph for information recommendation according to some embodiments of this specification;
FIG. 3 is an exemplary flowchart of updating an initial graph according to some embodiments of this specification;
FIG. 4 is an exemplary flowchart of recommending information using a target graph according to some embodiments of this specification;
FIG. 5 is a block diagram of a system for determining a graph for information recommendation according to some embodiments of this specification;
FIG. 6 is a block diagram of a system for recommending information using a target graph according to some embodiments of this specification; and
FIG. 7 is a schematic diagram of a graph according to some embodiments of this specification.
Detailed description
To describe the technical solutions of the embodiments of this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some examples or embodiments of this specification; a person of ordinary skill in the art can, without creative effort, apply this specification to other similar scenarios based on these drawings. Unless it is obvious from the context or otherwise stated, the same reference numerals in the figures denote the same structures or operations.
It should be understood that the terms "system", "device", "unit", and/or "module" as used herein are one way of distinguishing components, elements, parts, sections, or assemblies at different levels. These terms may be replaced by other expressions if those expressions achieve the same purpose.
As used in this specification and the claims, unless the context clearly indicates otherwise, the words "a", "an", "one", and/or "the" do not refer specifically to the singular and may also include the plural. In general, the terms "include" and "comprise" only indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or device may also contain other steps or elements.
Flowcharts are used in this specification to illustrate the operations performed by a system according to the embodiments of this specification. It should be understood that the operations are not necessarily performed exactly in the order shown. The steps may instead be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.
In some application scenarios, an intelligent customer service robot can provide a bubble recommendation function, in which users obtain knowledge or services by clicking on bubbles. A bubble can be understood as a text box with a certain shape, such as a circle or a rectangle, that contains text with a specific meaning. In some embodiments, users are provided with fixed bubbles, each corresponding to a fixed function. This requires dedicated configuration and development. In other embodiments, when a user clicks a bubble, more specific knowledge or services associated with that bubble are recommended to the user. However, this approach relies on manual annotation to generate bubbles, and it cannot connect words that never co-occur. The methods disclosed in further embodiments of this specification, for determining a graph for information recommendation and for recommending information based on that graph, rely on unsupervised data and need no manual annotation. Because these methods use a graph structure, connections can also be established between words that do not co-occur, and deeper representation information can be mined.
FIG. 1 is a schematic diagram of an application scenario 100 of an information recommendation system according to some embodiments of this specification.
As shown in FIG. 1, the application scenario 100 may include a processing device 110, a network 120, a user terminal 130, and a storage device 140. The application scenario 100 may include at least a cloud customer service scenario. A user sends consultation data to the processing device 110 through the user terminal 130; the processing device 110 can determine the recommendation information most relevant to the received consultation data and return it to the user terminal 130.
The processing device 110 may perform one or more of the functions described in this specification. For example, the processing device 110 may be used to construct a target graph and to use the target graph to recommend information to users. The operator of the processing device 110 may be a service provider, which can construct the target graph from the services it provides or from the historical consultation data of multiple users, and recommend information to new and existing users based on the target graph. The recommended information may be knowledge related to the services offered by the service provider, a link for requesting a service, and so on. In some embodiments, the processing device 110 may be a standalone server or a server group. The server group may be centralized or distributed (for example, the processing device 110 may be a distributed system). In some embodiments, the processing device 110 may be local or remote. For example, the processing device 110 may access information and/or data stored in the user terminal 130 and the storage device 140 via the network. In some embodiments, the processing device 110 may be connected directly to the user terminal 130 and the storage device 140 to access the information and/or data stored in them. In some embodiments, the processing device 110 may be implemented on a cloud platform. For example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, or any combination thereof.
In some embodiments, the processing device 110 may include one or more processing devices (for example, a single-core or multi-core processing device). Merely by way of example, the processing device may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or any combination thereof.
The network 120 can facilitate the exchange of data and/or information among the components of the application scenario 100. For example, the processing device 110 may send recommended information to the user terminal 130 through the network 120. In some embodiments, one or more components of the application scenario 100 (the user terminal 130, the storage device 140) may send data and/or information to other components of the application scenario 100 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network. For example, the network 120 may include one or a combination of: a wired network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, a Global System for Mobile Communications (GSM) network, a code division multiple access (CDMA) network, a time division multiple access (TDMA) network, a General Packet Radio Service (GPRS) network, an Enhanced Data rates for GSM Evolution (EDGE) network, a wideband code division multiple access (WCDMA) network, a High-Speed Downlink Packet Access (HSDPA) network, a Long-Term Evolution (LTE) network, a User Datagram Protocol (UDP) network, a Transmission Control Protocol/Internet Protocol (TCP/IP) network, a Short Message Service (SMS) network, a Wireless Application Protocol (WAP) network, an ultra-wideband (UWB) network, a mobile communication (1G, 2G, 3G, 4G, 5G) network, Wi-Fi, Li-Fi, Narrowband Internet of Things (NB-IoT), infrared communication, and the like. In some embodiments, the network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points, through which one or more components of the application scenario 100 can connect to the network 120 to exchange data and/or information.
The user terminal 130 may be a device capable of sending and/or receiving information. For example, the user terminal 130 may send consultation data entered by the user to the processing device 110 and receive the reply to that consultation data returned by the processing device 110. In some embodiments, the user terminal may include one or any combination of a smartphone 130-1, a tablet computer 130-2, a notebook computer 130-3, and the like. These examples only illustrate the breadth of what the user terminal 130 may be and do not limit its scope. In some embodiments, various applications may be installed on the user terminal 130, for example, computer programs or mobile applications (mobile apps). An application may be produced and published by a service provider, and then downloaded and installed on the user terminal 130 by the user, who can consult the service provider through it.
The storage device 140 may store data and/or instructions. The data may include the data required to construct the graph, the constructed graph, knowledge points, and user-facing recommendation data, such as descriptions of the services offered by the service provider. The instructions may be those required by the processing device 110 to implement the functions disclosed in this specification. In some embodiments, the storage device 140 may also obtain data from the user terminal 130, for example, consultation or query data previously entered by users. In some embodiments, the storage device 140 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or any combination thereof. Exemplary mass storage may include magnetic disks, optical disks, solid-state drives, and the like. Exemplary removable storage may include flash drives, floppy disks, optical disks, memory cards, Zip disks, magnetic tapes, and the like. Exemplary volatile read-write memory may include random access memory (RAM). Exemplary RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and the like. Exemplary ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM, and the like. In some embodiments, the storage device 140 may be implemented on a single central server, or on multiple servers or multiple personal devices connected through communication links. The storage device 140 may also be implemented by multiple personal devices together with cloud servers, or on a cloud platform. For example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, or any combination thereof.
In some embodiments, the storage device 140 may be connected to the network 120 to communicate with one or more components of the application scenario 100 (for example, the processing device 110 or the user terminal 130). One or more components of the application scenario 100 can access data or instructions stored in the storage device 140 via the network 120. In some embodiments, the storage device 140 may connect to or communicate with one or more components of the application scenario 100 (for example, the processing device 110 or the user terminal 130) directly. In some embodiments, the storage device 140 may be part of the processing device 110.
It should be noted that the above description of the components of the application scenario 100 is provided only for example and illustration, and does not limit the scope of this specification. A person skilled in the art may, under the guidance of this specification, add components to or remove components from the application scenario 100. Such changes remain within the scope of this specification.
FIG. 2 is an exemplary flowchart of determining a graph for information recommendation (also called the target graph) according to some embodiments of this specification. In some embodiments, the process 200 may be implemented by the information recommendation system 500 or by the processing device 110 shown in FIG. 1. For example, the process 200 may be stored in a storage device (such as the storage device 140) in the form of a program or instructions that implement the process 200 when executed. As shown in FIG. 2, the process 200 may include the following steps.
Step 202: obtain a plurality of nodes for constructing the target graph; the nodes include at least word nodes and knowledge point nodes.
This step may be performed by the first acquisition module 510.
In some embodiments, the target graph is the graph used when recommending information to users. It contains a plurality of nodes and the association information between them, and each node may correspond to one piece of information. In use, the most relevant node in the target graph can be determined from the user's input, and the information corresponding to that node is recommended to the user. The nodes that make up the target graph may include at least word nodes and knowledge point nodes. The information corresponding to a word node may be a word; during recommendation, the word corresponding to a word node can be recommended to the user directly. The information corresponding to a knowledge point node may be a knowledge point. A knowledge point may consist of a title and a body, where the title may be a question and the body may be the answer to that question. During recommendation, the title can be used to decide whether the knowledge point is the most relevant to the user's input, and if so, the body is recommended to the user. In the target graph, any two nodes have some association relationship. During recommendation, the association relationships between nodes can be used to determine the node most relevant to the user's input.
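The two node types described above could be modeled as follows. This is a minimal sketch: the class names `WordNode` and `KnowledgePointNode`, the field names, and the helper `recommended_text` are hypothetical and chosen only to mirror the description (a word node carries a word; a knowledge point node carries a title and a body).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class WordNode:
    """A node whose information is a single word."""
    word: str


@dataclass(frozen=True)
class KnowledgePointNode:
    """A node whose information is a knowledge point (question and answer)."""
    title: str  # the question, used for matching against user input
    body: str   # the answer, shown to the user when recommended


Node = Union[WordNode, KnowledgePointNode]


def recommended_text(node: Node) -> str:
    """Return the text shown to the user when this node is recommended."""
    if isinstance(node, WordNode):
        return node.word   # a word node recommends the word itself
    return node.body       # a knowledge point node recommends its body
```

Freezing the dataclasses makes nodes hashable, so they can be used directly as dictionary keys for vectors or adjacency lists.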
FIG. 7 is a schematic diagram of a target graph according to some embodiments of this specification. As shown in FIG. 7, rectangular boxes represent word nodes, and the content of each box is the word corresponding to that node, such as "album", "size", "photo", "free shipping", and "coupon". The word corresponding to a word node may be a keyword used in information recommendation that is closely related to the information to be recommended. It may also be a high-frequency word entered by users during consultation, which can link to one or more pieces of information to be recommended. The round boxes in FIG. 7 represent knowledge point nodes, and the content of each round box is the title of that knowledge point node, such as "What sizes do albums come in?", "What sizes do photos come in?", "Is shipping free?", and "How do I use a coupon?". The question-and-answer content (that is, the title and body) of a knowledge point node may be the supporting information that the user wants to obtain from the recommendation, and it may be determined by the service scope or service content of the party using the application scenario 100 (for example, the service provider). For example, if the party using the application scenario 100 is a photography studio, the question-and-answer content of the knowledge point nodes may concern photography-related matters such as opening hours, the types of photography offered (for example, ID photos or portrait photos), the sizes of finished photo frames, whether items can be mailed, and whether shipping is free.
A line between nodes in FIG. 7 represents an association between the two nodes. For example, the line between the word nodes "album" and "size" may represent how frequently the two words co-occur, for instance within a sentence or a paragraph. The higher the frequency, the closer the relationship between the two. As another example, the line between the word node "album" and the knowledge point node "What sizes do albums come in" may represent whether "album" plays a key role in the answer to, or explanation of, "What sizes do albums come in". If it does, the answer or explanation is closely related to albums. During information recommendation, the content in the target graph that is most relevant to the user's input can be recommended to the user. The construction of the target graph is described in detail later in this flow, and reference may also be made to FIG. 3. For a description of information recommendation, see the FIG. 4 part of this specification.
In some embodiments, the plurality of nodes may be pre-stored in a storage device, for example, a storage device built into the processing device 110, or the storage device 140. The nodes may be determined and stored in advance based on users' historical consultations or on the service provider's own service scope. The first acquisition module 510 may read the plurality of nodes after communicating with the storage device.
In some embodiments, the vector representation of each node may be determined according to the node's type. The content corresponding to each node (for example, a word or a knowledge point) can be represented as a vector. For example, word embedding maps words, phrases, sentences, or paragraphs to numbers so that they can be expressed mathematically in a vector space, which facilitates data processing. In this specification, the association between nodes may likewise be expressed as a quantitative value that indicates how closely two nodes are related.
In some embodiments, the first acquisition module 510 may determine a vector representation for each node according to its node type (word node or knowledge point node). If the node is a word node, the first acquisition module 510 may use the vector representation of the word corresponding to the node as the node's vector representation, and may determine that vector using a word vector representation model. The word vector representation model includes a machine learning model, for example, an artificial neural network. An exemplary word vector representation model may be a word embedding model, including but not limited to word2vec, GloVe, ELMo, and BERT. Its input may be a word, and its output may be the word vector corresponding to that word. The first acquisition module 510 may determine the vector for each word through the word embedding model and then use that vector as the vector representation of the word node corresponding to the word. For example, assume that two word nodes correspond to the words "album" and "size". The first acquisition module 510 may input the two words into the word embedding model to obtain the word vectors $V_1$ and $V_2$ for "album" and "size", respectively, and use $V_1$ and $V_2$ as the vector representations of the two word nodes.
In some embodiments, if the node is a knowledge point node, the first acquisition module 510 may determine the vector representation of the knowledge point node based on the vector representations of words related to that node. The words related to a knowledge point node may be words contained in the knowledge point, or words corresponding to word nodes that have an association with the knowledge point node. For example, the knowledge point node "What sizes do albums come in" contains the words "album" and "size", so the words related to this knowledge point node are "album" and "size". As another example, if the word nodes associated with the knowledge point node "What sizes do albums come in" correspond to the words "album" and "size", these two words can serve as the words related to the knowledge point node.
In some embodiments, the first acquisition module 510 may first obtain one or more words from the knowledge point corresponding to the knowledge point node and determine the vector representations of those words. The first acquisition module 510 may then perform an operation on the one or more vector representations and use the result as the vector representation of the knowledge point node. The operation may be a sum or an average of the vector representations, and the average may be a weighted average or an arithmetic average. As an example, assume that the words from the knowledge point node "What sizes do albums come in" include "album" and "size", and that the corresponding word vectors are $V_1$ and $V_2$, which may be determined by the word vector representation model. The first acquisition module 510 may average the two word vectors, for example by arithmetic averaging, to obtain the vector $V_3$. $V_3$ is then used as the vector representation of the knowledge point node "What sizes do albums come in".
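The averaging described above can be sketched as follows. This is a minimal illustration only: the function name `average_vectors` and the three-dimensional toy embeddings are assumptions made for the example, and a real system would take the word vectors from a word embedding model such as those named earlier.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def average_vectors(vectors: Sequence[Vector],
                    weights: Optional[Sequence[float]] = None) -> Vector:
    """Return the arithmetic mean of the vectors, or the weighted mean if weights are given."""
    if not vectors:
        raise ValueError("at least one vector is required")
    if weights is None:
        weights = [1.0] * len(vectors)  # equal weights give the arithmetic mean
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total
            for k in range(dim)]


# Hypothetical toy embeddings (assumption); real word vectors would come from an embedding model.
word_vectors: Dict[str, Vector] = {
    "album": [1.0, 0.0, 2.0],  # plays the role of V1
    "size": [3.0, 2.0, 0.0],   # plays the role of V2
}

# V3: vector representation of the knowledge point node built from its words.
v3 = average_vectors([word_vectors["album"], word_vectors["size"]])
print(v3)  # [2.0, 1.0, 1.0]
```

Passing a `weights` argument switches the same function to the weighted average mentioned in the text.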
Step 204: for any two nodes, determine an edge weight between the two nodes based on their types, and use the edge weight as the association between the nodes. This step may be performed by the first determination module 520.
In some embodiments, when determining the association between nodes, the first determination module 520 may perform different processing depending on the types of the two nodes. The first determination module 520 may first determine whether the two nodes are of the same type, determine the edge weight between the two nodes based on the result, and then use the edge weight as the association between the two nodes.
In some embodiments, if the two nodes are both word nodes, the first determination module 520 may determine the edge weight between them based on the co-occurrence frequency of the words corresponding to the two word nodes. The co-occurrence frequency may refer to the probability that the two words appear together in a text. The greater the probability, the closer the relationship between the two words and the higher their degree of association. The first determination module 520 may determine the co-occurrence frequency using pointwise mutual information (PMI). If one of the two nodes is a word node and the other is a knowledge point node, the first determination module 520 may determine the edge weight based on the importance of the word node's word relative to the knowledge point (including title and body) corresponding to the knowledge point node. This importance can be understood as the degree to which the word is explained in the content of the knowledge point node. For example, if the content of a knowledge point node is an explanation of a word (for instance, the word names a service offered by the service provider and the knowledge point node describes it), the word can be considered highly important to that knowledge point node. Conversely, if the word is merely one constituent element of the knowledge point node, the word can be considered of low importance to that knowledge point node. The first determination module 520 may use term frequency-inverse document frequency (TF-IDF) to measure the importance of the word node's word relative to the knowledge point. If the two nodes are both knowledge point nodes, the first determination module 520 may directly set the edge weight between them to 0. Referring to FIG. 7, in the target graph a line between two nodes indicates that the two nodes are associated; the association may be expressed by a PMI value (a line between two boxes, i.e., two word nodes) or by a TF-IDF value (a line between a box and a round box, i.e., a word node and a knowledge point node). Two nodes may also have no line between them; for example, the association between two knowledge point nodes is 0, so no line exists between them.
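The two edge-weight measures named above can be sketched in a few lines. The text does not fix the exact formulas, so this sketch assumes the common textbook definitions: PMI as log(p(a,b) / (p(a)·p(b))) with probabilities estimated as document fractions, and TF-IDF as term frequency times log(N / document frequency). The function names, the toy corpus, and the choice to return 0 when two words never co-occur are all illustrative assumptions.

```python
import math
from typing import List

Document = List[str]  # a tokenized sentence, paragraph, or knowledge point


def pmi(word_a: str, word_b: str, documents: List[Document]) -> float:
    """Pointwise mutual information of two words, estimated from document-level co-occurrence."""
    n = len(documents)
    count_a = sum(1 for doc in documents if word_a in doc)
    count_b = sum(1 for doc in documents if word_b in doc)
    count_ab = sum(1 for doc in documents if word_a in doc and word_b in doc)
    if count_ab == 0:
        return 0.0  # assumption: words that never co-occur get no edge
    return math.log((count_ab / n) / ((count_a / n) * (count_b / n)))


def tf_idf(word: str, knowledge_point: Document, documents: List[Document]) -> float:
    """TF-IDF of a word with respect to one knowledge point within a collection."""
    doc_freq = sum(1 for doc in documents if word in doc)
    if doc_freq == 0 or not knowledge_point:
        return 0.0
    term_freq = knowledge_point.count(word) / len(knowledge_point)
    return term_freq * math.log(len(documents) / doc_freq)


# Hypothetical toy corpus (assumption), already tokenized.
corpus = [
    ["album", "size"],
    ["album", "size", "photo"],
    ["coupon", "use"],
    ["photo", "free", "shipping"],
]

word_word_weight = pmi("album", "size", corpus)                 # log(2)
word_knowledge_weight = tf_idf("album", corpus[0], corpus)      # 0.5 * log(2)
```

Note that PMI defined this way can be negative for words that co-occur less often than chance; whether such values are kept, clipped, or dropped is a design choice the text leaves open.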
Step 206: based on the vector representations of the nodes and the associations between nodes, perform at least one round of graph aggregation iteration to update the vector representations of the nodes in the graph.
This step may be performed by the update module 530.
In some embodiments, the node vector representations and edge weights determined in steps 202 and 204 can be regarded as the initial expression of the graph. A graph with only its initial expression can be understood as one that does not yet support information recommendation; the vector representations of its nodes need to be updated to obtain a more complete expression of the graph.
In some embodiments, the initial expression of the graph may be represented by matrices. As an example, a graph matrix X formed from the vector representations of the nodes and a relationship matrix R formed from the associations between the nodes can together represent the initial expression of the graph. Assuming the graph consists of N nodes and each node's vector is 300-dimensional, the graph matrix X may be an N*300 matrix or a 300*N matrix. The relationship matrix R may be an N*N matrix in which each row or column holds the associations (for example, edge weights) between one node and the other nodes. The edge weight of a node with respect to itself may be 1.
In some embodiments, the update module 530 may perform at least one round of graph aggregation iteration on the expression of the graph to update it. In some embodiments, graph aggregation can be understood as a process of performing an operation on the vector representation of at least one node and/or edge weight in the graph and using the result to update the vector representation of at least one other node and/or edge weight. For example, for each node, in one round of iteration the update module 530 may use the vector representations of the node's adjacent nodes to update the node's vector representation. As an example, the update module 530 may perform an operation on the vector representations that the adjacent nodes have in the current iteration round, for example a weighted average (with the edge weights between the node and its adjacent nodes as the weights), and use the result to update the node's vector representation.
In some embodiments, the update module 530 may use the relationship matrix R to update the vector representations of the nodes in the graph, thereby updating the expression of the graph. In one round of iteration, the update module 530 may use the vector representations of the nodes in the current iteration round to obtain a vector representation matrix, for example, the graph matrix X in the foregoing example. The update module 530 may also determine the adjacency matrix corresponding to the nodes based on the associations between nodes, for example, the relationship matrix R in the foregoing example. The update module 530 may then perform an operation on the vector representation matrix and the adjacency matrix and use the result to update the vector representation of each node in the graph. For example, the relationship matrix R is used to perform weighted aggregation on the graph matrix X in order to update X.
In some embodiments, the update module 530 uses a neural-network-based aggregation model to update the vector representations of the nodes in the graph. The update module 530 may use the aggregation model to process the vector representation matrix obtained from the vector representations of the nodes, together with the adjacency matrix determined from the associations between nodes, to obtain an updated vector representation matrix, and then update the vector representations of the nodes in the graph based on the updated matrix. The neural-network-based aggregation model may include a GCN (Graph Convolutional Network), a GAT (Graph Attention Network), and the like. Let the vector representation matrix be denoted X (for example, the graph matrix X) and the adjacency matrix be denoted R (for example, the relationship matrix R). Taking the GCN as an example, the update module 530 may input X and R into the GCN. Inside the GCN, the vector representation matrix X, the adjacency matrix R, and the GCN model parameter W are combined, and the GCN transforms the vector representations of the graph nodes from X to X'. X' denotes the updated vector representation matrix. How accurately the updated matrix X' represents the information in the graph depends to some extent on the accuracy of the GCN model parameter W.
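To make the role of X, R, and W concrete, the sketch below implements a single graph-convolution layer in plain Python. The text does not specify the propagation rule, so this assumes one common form, X' = ReLU(R̂ · X · W), where R̂ is the row-normalized adjacency matrix; the row normalization, the ReLU activation, and the helper names are assumptions of the example rather than details taken from the specification.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def row_normalize(a: Matrix) -> Matrix:
    """Scale each row to sum to 1 so that aggregation is a weighted average."""
    out = []
    for row in a:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


def gcn_layer(R: Matrix, X: Matrix, W: Matrix) -> Matrix:
    """One layer: aggregate neighbor vectors with R, transform with W, apply ReLU."""
    H = matmul(matmul(row_normalize(R), X), W)
    return [[max(0.0, v) for v in row] for row in H]


# Toy inputs (assumptions): 2 nodes, 2-dimensional vectors, a 2 x 2 parameter matrix.
R = [[1.0, 1.0], [1.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, -2.0], [2.0, 0.0]]

X_updated = gcn_layer(R, X, W)
print(X_updated)  # [[2.0, 0.0], [2.0, 0.0]]
```

In practice W would be learned by training, as described next, and a library implementation of GCN or GAT would normally be used instead of hand-written loops.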
In some embodiments, the GCN needs to be trained to optimize its model parameter W. In practice, the prediction task of the GCN may be chosen according to the specific application scenario, and the GCN is trained on that task. Taking prediction of the relevance between two nodes as the task, the GCN can serve as part of a prediction model whose input is two nodes; the prediction model computes and outputs the relevance of the two nodes based on the vector representations that the GCN produces for them (such as the vector representation matrix X'). At the start of GCN training, the model parameter W has random initial values, so X' is also inaccurate. The input layer of the prediction model receives the input nodes A and B of a training sample, the relevance y of the two nodes is determined from their vector representations in X', a loss function is constructed from the difference between y and the true relevance value of the training sample, and the GCN model parameter W is adjusted to minimize the loss function. The true value may be "0" or "1". For example, if a recommendation system outputs A to a user and the user then clicks B, node A is related to node B and the true value is 1; otherwise it is 0. As training proceeds, the model parameter W becomes well trained, and the vector representation matrix X' of the graph nodes reflects the information in the graph more faithfully. The loss function may be chosen according to the specific training task, and this specification places no restriction on it.
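Because the specification leaves the loss function open, the following is only one possible instance of the training objective described above. It assumes that the predicted relevance y is the sigmoid of the dot product of the two node vectors and that the loss is binary cross-entropy against the 0/1 true value; both choices, and all names, are assumptions of this sketch. Adjusting W to minimize the loss would normally be done with an automatic differentiation framework and is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Sample = Tuple[int, int, int]  # (index of node A, index of node B, true value 0 or 1)


def predict_relevance(x_a: List[float], x_b: List[float]) -> float:
    """Predicted relevance y in (0, 1): sigmoid of the dot product (an assumed scoring rule)."""
    dot = sum(a * b for a, b in zip(x_a, x_b))
    return 1.0 / (1.0 + math.exp(-dot))


def relevance_loss(samples: List[Sample], X_updated: Matrix) -> float:
    """Mean binary cross-entropy between predicted relevance and the true values."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for a, b, label in samples:
        y = predict_relevance(X_updated[a], X_updated[b])
        total -= label * math.log(y + eps) + (1 - label) * math.log(1.0 - y + eps)
    return total / len(samples)


# Toy node vectors standing in for X' (assumption).
X_prime = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]

# Node 0 was followed by a click on node 1 (true value 1) but not on node 2 (true value 0).
good_fit = relevance_loss([(0, 1, 1), (0, 2, 0)], X_prime)
bad_fit = relevance_loss([(0, 1, 0), (0, 2, 1)], X_prime)
```

With these toy vectors the loss is lower when the labels agree with the geometry of X' (`good_fit`) than when they contradict it (`bad_fit`), which is the signal training uses to adjust W.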
For further description of updating the vector representations of the nodes in the graph, see the FIG. 3 part of this specification.
It should be noted that the above description of process 200 is provided for illustration and explanation only and does not limit the scope of application of this specification. Those skilled in the art may make various modifications and changes to process 200 under the guidance of this specification. Such modifications and changes remain within the scope of this specification.
FIG. 3 is an exemplary flowchart of updating the initial expression of the graph according to some embodiments of this specification. In some embodiments, process 300 may be implemented by the information recommendation system 500 or by the processing device 110 shown in FIG. 1. For example, process 300 may be stored in a storage apparatus (such as the storage device 140) in the form of a program or instructions, and the program or instructions, when executed, implement process 300. In some embodiments, process 300 may describe the specific procedure of one round of iteration. In some embodiments, process 300 may be performed by the update module 530. As shown in FIG. 3, process 300 may include the following steps.
Step 302: obtain a vector representation matrix from the vector representations of the plurality of nodes in the current iteration round.
In some embodiments, the update module 530 may arrange the vector representations of the nodes in the current iteration round to obtain the vector representation matrix. As an example, assuming the graph consists of N nodes and each node's vector is 300-dimensional, the update module 530 may arrange the node vectors as rows to form an N*300 vector representation matrix, or as columns to form a 300*N vector representation matrix.
Step 304: determine the adjacency matrix corresponding to the plurality of nodes based on the associations between nodes.
In some embodiments, the associations among the nodes may be expressed in matrix form, such as the relationship matrix R mentioned in step 206. In this specification, the relationship matrix R may also be called the adjacency matrix A, which represents the association between each node and all other nodes. Assuming there are N nodes in total, the adjacency matrix A is an N*N matrix. The entry in row i and column j represents the association between node i and node j, such as the edge weight. For illustrative purposes, a simplified adjacency matrix A is shown below:
[Figure PCTCN2021088763-appb-000001: simplified adjacency matrix A]
Here, $A_{ij}$ represents the association between the i-th node and the j-th node. When the i-th and j-th nodes are both word nodes, $A_{ij} = \mathrm{PMI}(i, j)$; when the i-th node is a word node and the j-th node is a knowledge point node, $A_{ij} = \mathrm{TF\text{-}IDF}(i, j)$; when $i = j$, that is, for the i-th node with respect to itself, $A_{ij} = 1$; and when the i-th and j-th nodes are both knowledge point nodes, $A_{ij} = 0$, indicating that there is no association between the two knowledge point nodes.
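The case analysis for $A_{ij}$ can be sketched as a small builder function. The node encoding as `(kind, text)` pairs, the function name `build_adjacency`, and the stub scoring functions in the usage example are assumptions for illustration; real PMI and TF-IDF scorers would be supplied in their place. The sketch also assumes the matrix is symmetric, so a (knowledge point, word) pair receives the same TF-IDF weight as the (word, knowledge point) pair.

```python
from typing import Callable, List, Tuple

Node = Tuple[str, str]  # (kind, text), where kind is "word" or "knowledge"


def build_adjacency(nodes: List[Node],
                    pmi: Callable[[str, str], float],
                    tf_idf: Callable[[str, str], float]) -> List[List[float]]:
    """Build the N x N adjacency matrix A from node types and the two scoring functions."""
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for i, (kind_i, text_i) in enumerate(nodes):
        for j, (kind_j, text_j) in enumerate(nodes):
            if i == j:
                A[i][j] = 1.0                       # a node with respect to itself
            elif kind_i == "word" and kind_j == "word":
                A[i][j] = pmi(text_i, text_j)       # word-word edge
            elif kind_i == "word" and kind_j == "knowledge":
                A[i][j] = tf_idf(text_i, text_j)    # word-knowledge edge
            elif kind_i == "knowledge" and kind_j == "word":
                A[i][j] = tf_idf(text_j, text_i)    # assumption: symmetric weight
            # knowledge-knowledge pairs keep the default 0.0
    return A


nodes = [
    ("word", "album"),
    ("word", "size"),
    ("knowledge", "what sizes do albums come in"),
    ("knowledge", "how to use coupons"),
]

# Stub scorers (assumptions) standing in for real PMI and TF-IDF computations.
A = build_adjacency(
    nodes,
    pmi=lambda a, b: 0.5,
    tf_idf=lambda word, title: 0.3 if word in title else 0.0,
)
```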
Step 306: perform an operation on the vector representation matrix and the adjacency matrix, and use the result to update the vector representation of each node in the graph.
In some embodiments, the update module 530 may use the adjacency matrix A to perform a weighted average calculation on the vector representation matrix (denoted X here). For example, following the weighted-average formula aggregate(X) = A*X, the vector representation matrix X is multiplied by the adjacency matrix A, and the vectors contained in the result X' are used as the node vector representations updated in the current iteration round.
In some embodiments, within one round of iteration the update module 530 may also update each node individually. For any node, the update module 530 may determine the node's adjacent nodes based on the associations between nodes. An adjacent node is a node directly connected to the node, that is, a node with which it has an association (for example, an edge weight such as a PMI or TF-IDF value exists between the two nodes). Referring to FIG. 7, the adjacent nodes of the word node "photo" may be the word node "size", the word node "free shipping", and the knowledge point node "What sizes do photos come in". The word node "photo" is directly connected to each of these nodes by a single line. After determining the node's adjacent nodes, the update module 530 may compute a weighted average of the adjacent nodes' vector representations using the edge weights between the node and its adjacent nodes, and use the result as the node's updated vector representation. For example, because the adjacent nodes of the word node "photo" are the word node "size", the word node "free shipping", and the knowledge point node "What sizes do photos come in", the vector representation of the word node "photo" can be updated by taking a weighted average of the vector representations of these three adjacent nodes and using the result as the updated vector representation of "photo". The weight of each adjacent node's vector representation in the weighted average may be determined from the association between the node and that adjacent node. For example, the values of the elements of the adjacency matrix A may be used as the weights.
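The per-node update can be sketched as follows for the "photo" example. The edge weights and two-dimensional vectors are made-up values, and the function name `update_node` is hypothetical. The sketch treats any non-zero off-diagonal entry of A as an edge and assumes the weights of a node's edges are positive, so that their sum can be used as the normalizer.

```python
from typing import List

Matrix = List[List[float]]


def update_node(i: int, A: Matrix, X: Matrix) -> List[float]:
    """Weighted average of the vectors of node i's adjacent nodes, using edge weights from A."""
    neighbors = [j for j in range(len(A)) if j != i and A[i][j] != 0.0]
    if not neighbors:
        return list(X[i])  # isolated node: keep its current vector
    total = sum(A[i][j] for j in neighbors)  # assumes positive weights
    dim = len(X[i])
    return [sum(A[i][j] * X[j][k] for j in neighbors) / total for k in range(dim)]


# Node order (toy example): 0 "photo", 1 "size", 2 "free shipping",
# 3 "What sizes do photos come in". All weights and vectors are assumptions.
A = [
    [1.0, 1.0, 1.0, 2.0],
    [1.0, 1.0, 0.0, 2.0],
    [1.0, 0.0, 1.0, 0.0],
    [2.0, 2.0, 0.0, 1.0],
]
X = [
    [0.0, 0.0],
    [4.0, 0.0],
    [0.0, 4.0],
    [2.0, 2.0],
]

photo_updated = update_node(0, A, X)
print(photo_updated)  # [2.0, 2.0]
```

Unlike the matrix form aggregate(X) = A*X, this per-node variant excludes the node's own vector and divides by the sum of the weights, so it is a true weighted average of the adjacent nodes only.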
The above describes the procedure of one round of iteration. Following this description, the update module 530 may update the initial expression of the graph over one or more iterations (for example, update the node vector representations one or more times) to obtain the final expression of the graph. The vector representation of each node in the graph can be updated in the manner of step 306, and the update can be considered complete once every node has been updated a set number of times. Alternatively, updating may continue until the change in each node's vector representation is smaller than a set threshold. As an example, after one update iteration the graph matrix X is updated to X' = aggregate(X) = A*X. In the next iteration it is updated to X'' = aggregate(X') = A*X'. In the third iteration it is updated to X''' = aggregate(X'') = A*X'', and so on. The number of iteration rounds may be preset, for example, 3, 5, or 7, and is not limited by this specification. After the iterations are complete, the graph matrix X as updated over these rounds, together with the relationship matrix R (i.e., the adjacency matrix A), can serve as the target graph.
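Both stopping rules mentioned above (a preset number of rounds, or a change threshold) can be sketched in one loop. The names `iterate`, `max_rounds`, and `tol` are assumptions of this example, as is measuring the change by the largest absolute difference between entries. Note that with an un-normalized A the product A*X generally grows from round to round, so the threshold rule is only meaningful when A has been normalized (for example, each row scaled to sum to 1); that normalization is an assumption not stated in the text.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def iterate(A: Matrix, X: Matrix, max_rounds: int = 3,
            tol: Optional[float] = None) -> Matrix:
    """Repeat X <- A*X for max_rounds rounds, or stop early once the change falls below tol."""
    for _ in range(max_rounds):
        X_new = matmul(A, X)
        change = max(abs(new - old)
                     for row_new, row_old in zip(X_new, X)
                     for new, old in zip(row_new, row_old))
        X = X_new
        if tol is not None and change < tol:
            break
    return X


# Fixed number of rounds with an un-normalized toy adjacency matrix.
A = [[1.0, 0.5], [0.5, 1.0]]
X0 = [[1.0], [0.0]]
X3 = iterate(A, X0, max_rounds=3)
print(X3)  # [[1.75], [1.625]]

# Threshold-based stopping with a row-normalized matrix, which converges.
A_norm = [[0.5, 0.5], [0.5, 0.5]]
X_conv = iterate(A_norm, X0, max_rounds=10, tol=1e-6)
print(X_conv)  # [[0.5], [0.5]]
```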
It should be noted that the above description of process 300 is provided for illustration and explanation only and does not limit the scope of application of this specification. Those skilled in the art may make various modifications and changes to process 300 under the guidance of this specification. Such modifications and changes remain within the scope of this specification.
FIG. 4 is an exemplary flowchart of information recommendation using the target graph according to some embodiments of this specification. In some embodiments, process 400 may be implemented by the information recommendation system 600 or by the processing device 110 shown in FIG. 1. For example, process 400 may be stored in a storage apparatus (such as the storage device 140) in the form of a program or instructions, and the program or instructions, when executed, implement process 400. As shown in FIG. 4, process 400 may include the following steps.
Step 402: obtain input information.
This step may be performed by the second acquisition module 610.
In some embodiments, the input information may be one or more words that the user selects from candidate words provided to the user in advance. For example, during information recommendation, the processing device 110 (or the information recommendation system 600) may send the candidate words to the user terminal 130 for display. The candidates may be displayed as multiple recommendation bubbles, each corresponding to one candidate word. The user can give click feedback to the processing device 110 (or the information recommendation system 600) by clicking one or more of the candidate words. The feedback content is the input information. For example, if the candidate words provided in advance are "photo", "top", "shoes", "size", and so on, and the user selects the word "photo", the input information is the word "photo". If the user selects the two words "photo" and "size", the input information is the words "photo" and "size". In some embodiments, the candidate words provided to the user in advance may be high-frequency words that appeared in users' historical consultations, or words related to the services offered by the user of the processing device 110 (or the information recommendation system 600), such as a service provider. Assuming the service provider sells clothing online, the candidate words provided to the user in advance may include "size", "discount", "free shipping", and so on.
Step 404: using the graph, determine the node corresponding to the input information in the graph.
This step may be performed by the second determining module 620.
In some embodiments, the graph may be the target graph. For a detailed description of the target graph, refer to the related content of FIG. 2 and FIG. 3 of this specification.
In some embodiments, the second determining module 620 may compare the words in the input information with the words corresponding to the word nodes in the target graph to determine the node corresponding to the input information. For example, assuming that the input information includes the word "photo", the second determining module 620 may determine the word node "photo" in the target graph as the node corresponding to the input information. Assuming that the input information includes the words "photo" and "size", the second determining module 620 may determine the word node "photo" and the word node "size" in the target graph as the nodes corresponding to the input information.
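The comparison in step 404 can be sketched as a dictionary lookup. This is a minimal illustration, not the implementation of module 620; the function name `match_word_nodes`, the `word_to_node` mapping, and the integer node ids are all hypothetical.

```python
from typing import Dict, Iterable, List


def match_word_nodes(selected_words: Iterable[str],
                     word_to_node: Dict[str, int]) -> List[int]:
    """Return the ids of word nodes whose word equals a selected word.

    Words with no word node in the graph are skipped (an assumption;
    the specification does not say how unmatched words are handled).
    """
    return [word_to_node[w] for w in selected_words if w in word_to_node]


# Hypothetical word-node index for a small target graph.
word_to_node = {"photo": 0, "size": 1, "free shipping": 2}
matched = match_word_nodes(["photo", "size"], word_to_node)
```

An exact string match is the simplest reading of "compare"; a real system might normalize or tokenize the words first.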
Step 406: determine a recommended node based on the vector representation of the node and the vector representations of the adjacent nodes of the node.
This step may be performed by the third determining module 630.
For the vector representation of a node and the vector representations of its adjacent nodes, refer to the related descriptions of FIG. 2 and FIG. 3 in this specification.
In some embodiments, the third determining module 630 may determine the distance between the vector representation of the node and the vector representation of each of its adjacent nodes. The distance may be the Minkowski distance, Euclidean distance, Manhattan distance, Chebyshev distance, cosine of the included angle, Hamming distance, Jaccard similarity coefficient, or the like. The third determining module 630 may determine the closest node (for example, the node with the smallest distance value) as the recommended node. Referring to FIG. 7, assuming that the input information is the word "photo", the third determining module 630 may determine the distances between the vector representation of the word node "photo" and the vector representations of its adjacent nodes, namely the word node "size", the word node "free shipping", and the knowledge point node "what sizes do photos come in", and determine the one or more nodes with the smallest distances as the recommended nodes.
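The nearest-neighbor selection in step 406 can be sketched as follows, using the Euclidean distance, one of the measures listed above. The function names, the `top_k` parameter, and the two-dimensional vectors are illustrative assumptions; the actual vectors would come from the graph aggregation described for FIG. 2 and FIG. 3.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(node: str,
              vectors: Dict[str, List[float]],
              neighbors: Dict[str, List[str]],
              top_k: int = 1) -> List[str]:
    """Return the top_k adjacent nodes closest to `node` in vector space."""
    ranked = sorted(neighbors[node],
                    key=lambda n: euclidean(vectors[node], vectors[n]))
    return ranked[:top_k]


# Made-up vectors for the nodes of the FIG. 7 example.
vectors = {
    "photo": [1.0, 0.0],
    "size": [0.9, 0.1],
    "free shipping": [0.0, 1.0],
    "what sizes do photos come in": [1.0, 0.05],
}
neighbors = {"photo": ["size", "free shipping", "what sizes do photos come in"]}
closest = recommend("photo", vectors, neighbors, top_k=2)
```

If a similarity measure such as the cosine or the Jaccard coefficient were used instead, the ranking would have to be reversed, since larger values then mean closer nodes.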
Step 408: output the information related to the recommended node.
This step may be performed by the output module 640.
In some embodiments, when the recommended nodes include only knowledge point nodes, the output module 640 may output the knowledge point text corresponding to the knowledge point node as the related information. For example, assume that the user selects the two words "photo" and "size" as the input information, and the knowledge point node "what sizes do photos come in" is determined as the recommended node according to steps 404 and 406. Because the recommended nodes include only a knowledge point node, the output module 640 may output the text of the knowledge point "what sizes do photos come in", for example "1-inch: 2.5*3.5 (cm); 2-inch: 3.6*4.7 (cm); 3-inch: 5.8*8.4 (cm)", and recommend it to the user.
In some embodiments, when the recommended nodes include word nodes, the processing device 110 (or the information recommendation system 600) may recommend the words corresponding to the recommended nodes to the user again, let the user select among them, and determine the recommended nodes again based on the user's selection. For example, when the recommended nodes are the word node "size" and the word node "photo", the processing device 110 (or the information recommendation system 600) may recommend the words "size" and "photo" to the user again for selection. If the user selects the word "photo" again, the processing device 110 (or the information recommendation system 600) may repeat steps 402 to 406 to re-determine the recommended nodes. If the re-determined recommended nodes include the knowledge point node "what sizes do photos come in", the output module 640 may output the text of that knowledge point, for example "1-inch: 2.5*3.5 (cm); 2-inch: 3.6*4.7 (cm); 3-inch: 5.8*8.4 (cm)", and recommend it to the user. If the re-determined recommended nodes still do not include a knowledge point node, the above process is repeated until the recommended nodes include at least one knowledge point node.
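The repeat-until-knowledge-point behavior of steps 402 to 408 can be sketched as a loop. This is one possible reading, under stated assumptions: the loop stops as soon as the recommended nodes contain at least one knowledge point node, the `recommend_nodes` and `choose` callables stand in for steps 404 to 406 and for the user's click respectively, and the `max_rounds` safety limit does not appear in the specification.

```python
from typing import Callable, Dict, List


def recommend_until_knowledge_point(
    selected: List[str],
    recommend_nodes: Callable[[List[str]], List[str]],
    node_type: Dict[str, str],
    choose: Callable[[List[str]], List[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Repeat recommendation until a knowledge point node is recommended."""
    for _ in range(max_rounds):
        recommended = recommend_nodes(selected)
        knowledge = [n for n in recommended
                     if node_type[n] == "knowledge_point"]
        if knowledge:
            return knowledge  # step 408 would output their text
        # Only word nodes so far: offer them again and let the user pick.
        selected = choose(recommended)
    return []  # hypothetical fallback when the limit is reached


# Toy stand-ins: a lookup table as recommender, "pick the first word" as user.
node_type = {
    "size": "word",
    "free shipping": "word",
    "what sizes do photos come in": "knowledge_point",
}
table = {
    ("photo",): ["size", "free shipping"],
    ("size",): ["what sizes do photos come in"],
}
result = recommend_until_knowledge_point(
    ["photo"], lambda sel: table[tuple(sel)], node_type, lambda rec: [rec[0]]
)
```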
It should be noted that the above description of the process is only for example and illustration, and does not limit the scope of application of this specification. Those skilled in the art may make various modifications and changes to the process under the guidance of this specification. However, these modifications and changes remain within the scope of this specification. For example, other steps, such as a storage step or a verification step, may be added to the process.
FIG. 5 is a block diagram of a system 500 for determining a graph for information recommendation according to some embodiments of this specification.
As shown in FIG. 5, the system 500 for determining a graph for information recommendation may include a first acquisition module 510, a first determining module 520, and an update module 530.
The first acquisition module 510 may be used to acquire multiple nodes for constructing the target graph. The target graph is the graph used when recommending information to users; it contains multiple nodes, and each node may correspond to one piece of information. The nodes include at least word nodes and knowledge point nodes. The information corresponding to a word node may be a word. The information corresponding to a knowledge point node may be a knowledge point. A knowledge point may consist of a title and a text; the title may be a question, and the text may be the answer to that question. In some embodiments, the multiple nodes may be stored in advance in a storage device, for example, the storage device built into the processing device 110, or the storage device 140. The nodes may be determined and stored in advance according to users' historical consultations or the scope of the services provided. The first acquisition module 510 may read the multiple nodes after communicating with the storage device.
In some embodiments, the first acquisition module 510 may determine the vector representation of each node according to its node type (word node or knowledge point node). If the node is a word node, the first acquisition module 510 may use the vector representation of the word corresponding to the node as the vector representation of the node. If the node is a knowledge point node, the first acquisition module 510 may determine the vector representation of the knowledge point node based on the vector representations of the words related to the knowledge point node.
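The type-dependent initialization of node vectors can be sketched as below. The specification only says that the knowledge point vector is obtained by an operation on the vectors of related words; the element-wise mean used here is an assumed choice of operation, and the names `mean_vector` and `node_vector` are hypothetical.

```python
from typing import Dict, List


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def node_vector(node_type: str,
                words: List[str],
                word_vectors: Dict[str, List[float]]) -> List[float]:
    """Initial vector of a node.

    A word node takes the vector of its single word; a knowledge point
    node takes the mean of the vectors of its related words (assumption).
    """
    if node_type == "word":
        return list(word_vectors[words[0]])
    return mean_vector([word_vectors[w] for w in words])


# Made-up two-dimensional word vectors.
word_vectors = {"photo": [1.0, 0.0], "size": [0.0, 1.0]}
word_vec = node_vector("word", ["photo"], word_vectors)
kp_vec = node_vector("knowledge_point", ["photo", "size"], word_vectors)
```

In practice the word vectors would come from a word vector representation model, as recited in claim 2.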
The first determining module 520 may determine the edge weight between two nodes based on the types of the two nodes, and use the edge weight as the association relationship between the two nodes. The first determining module 520 may perform this operation for any two nodes. In some embodiments, when determining the association relationship between nodes, the first determining module 520 may perform different processing depending on the types of the two nodes. The first determining module 520 may first determine whether the two nodes are of the same type, determine the edge weight between the two nodes based on the result, and then use the edge weight as the association relationship between the two nodes. If both nodes are word nodes, the first determining module 520 may determine the edge weight between them based on the co-occurrence frequency of the words corresponding to the two word nodes. If one node is a word node and the other is a knowledge point node, the first determining module 520 may determine the edge weight between them based on the importance of the word node's word to the knowledge point (including its title and text) corresponding to the knowledge point node. If both nodes are knowledge point nodes, the first determining module 520 may directly set the edge weight between them to zero.
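The three cases above can be sketched as a single function. Two details are assumptions made for illustration: co-occurrence frequency is taken as the fraction of corpus entries containing both words, and the importance of a word to a knowledge point is taken as its relative term frequency in the knowledge point's tokens. The dictionary layout of a node is also hypothetical.

```python
from typing import Dict, List


def edge_weight(a: Dict, b: Dict, corpus: List[List[str]]) -> float:
    """Edge weight between two nodes, chosen by their types."""
    types = {a["type"], b["type"]}
    if types == {"word"}:
        # Word-word: fraction of corpus entries in which both words occur.
        both = sum(1 for doc in corpus
                   if a["word"] in doc and b["word"] in doc)
        return both / len(corpus) if corpus else 0.0
    if types == {"knowledge_point"}:
        # Knowledge point-knowledge point: weight fixed at zero.
        return 0.0
    # Word-knowledge point: relative term frequency as an importance proxy.
    word, kp = (a, b) if a["type"] == "word" else (b, a)
    tokens = kp["tokens"]
    return tokens.count(word["word"]) / len(tokens) if tokens else 0.0


# Hypothetical tokenized consultation history and nodes.
corpus = [["photo", "size"], ["photo", "free shipping"],
          ["size"], ["photo", "size", "discount"]]
photo = {"type": "word", "word": "photo"}
size = {"type": "word", "word": "size"}
kp = {"type": "knowledge_point", "tokens": ["photo", "size", "photo", "inch"]}

w_word_word = edge_weight(photo, size, corpus)
w_word_kp = edge_weight(kp, photo, corpus)
w_kp_kp = edge_weight(kp, kp, corpus)
```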
The update module 530 may perform at least one round of graph aggregation iteration based on the vector representations of the nodes and the association relationships between the nodes, to update the vector representations of the nodes in the graph. In some embodiments, for each node, the update module 530 may update the node's vector representation using the vector representations of its adjacent nodes. As an example, the update module 530 may perform an operation on the vector representations of the adjacent nodes, for example, a weighted average, and update the node's vector representation with the result. The update module 530 may also use the association relationships between nodes to update the vector representations of the nodes in the graph, so as to determine the target graph. The update module 530 may also use a neural-network-based aggregation model to update the vector representations of the nodes in the initial graph.
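One round of aggregation with the weighted-average operation mentioned above can be sketched as follows. Two points are assumptions: the new vector replaces the old one entirely (the node's own vector is not mixed in), and a node with no weighted neighbors keeps its vector. The neural-network-based aggregation model is not shown.

```python
from typing import Dict, List


def aggregate_once(vectors: Dict[str, List[float]],
                   weights: Dict[str, Dict[str, float]]
                   ) -> Dict[str, List[float]]:
    """Replace each node vector by the edge-weighted mean of its neighbors."""
    updated = {}
    for node, vec in vectors.items():
        nbrs = weights.get(node, {})
        total = sum(nbrs.values())
        if total == 0:
            updated[node] = list(vec)  # isolated node: keep its vector
            continue
        updated[node] = [
            sum(w * vectors[n][i] for n, w in nbrs.items()) / total
            for i in range(len(vec))
        ]
    return updated


# Made-up vectors and edge weights (weights[node][neighbor]).
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
weights = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 2.0}, "c": {}}
new_vectors = aggregate_once(vectors, weights)
```

All updates in a round read the vectors from before the round, so the result does not depend on the order in which nodes are visited. Calling `aggregate_once` repeatedly gives further iteration rounds.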
For more description of the modules of the system 500, refer to the flowchart part of this specification, for example, FIG. 2 to FIG. 3.
FIG. 6 is a block diagram of a system 600 for information recommendation using the target graph according to some embodiments of this specification.
As shown in FIG. 6, the system 600 for information recommendation using the determined graph may include a second acquisition module 610, a second determining module 620, a third determining module 630, and an output module 640.
The second acquisition module 610 may be used to acquire input information. In some embodiments, the input information may be one or more words selected by the user from candidate words provided to the user in advance. For example, when performing information recommendation, the processing device 110 (or the information recommendation system 600) may send the candidate words to the user terminal 130 for display. The candidates may be displayed as multiple recommendation bubbles, each bubble corresponding to one candidate word. The user may click one or more of the candidate words to provide click feedback to the processing device 110 (or the information recommendation system 600). The feedback content is the input information.
The second determining module 620 may be used to determine, using the graph, the node corresponding to the input information in the graph. In some embodiments, the graph may be the target graph. The second determining module 620 may compare the words in the input information with the words corresponding to the word nodes in the target graph to determine the node corresponding to the input information.
The third determining module 630 may be used to determine a recommended node based on the vector representation of the node and the vector representations of the adjacent nodes of the node. In some embodiments, the third determining module 630 may determine the distance between the vector representation of the node and the vector representation of each of its adjacent nodes, and determine the closest node (for example, the node with the smallest distance value) as the recommended node.
The output module 640 may be used to output the information related to the recommended node. In some embodiments, when the recommended nodes include only knowledge point nodes, the output module 640 may output the knowledge point text corresponding to the knowledge point node as the related information. When the recommended nodes include word nodes, the system 600 may acquire the user's input information again and re-determine the recommended nodes until they include at least one knowledge point node. The output module 640 may then output the at least one knowledge point node to the user.
For more description of the modules of the system 600, refer to the flowchart part of this specification, for example, FIG. 4.
It should be understood that the systems and modules shown in FIG. 5 and FIG. 6 can be implemented in various ways. For example, in some embodiments, a system and its modules may be implemented by hardware, software, or a combination of software and hardware. The hardware part may be implemented using dedicated logic; the software part may be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and systems may be implemented using computer-executable instructions and/or processor control code, for example, code provided on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory such as a read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The systems and modules of this specification may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of the above hardware circuits and software (for example, firmware).
It should be noted that the above description of the candidate display and determination system and its modules is only for convenience of description, and does not limit this specification to the scope of the illustrated embodiments. It can be understood that those skilled in the art, after understanding the principle of the system, may combine the modules arbitrarily or form subsystems connected to other modules without departing from this principle. For example, the first determining module 520 and the update module 530 disclosed in FIG. 5, or the second determining module 620 and the third determining module 630 disclosed in FIG. 6, may be different modules in one system, or a single module may implement the functions of two or more of these modules. As another example, the modules may share one storage module, or each module may have its own storage module. Such variations are all within the protection scope of this specification.
The beneficial effects that the embodiments of this specification may bring include, but are not limited to: (1) by recommending more accurate and more discriminative words for the user to choose from, and then replying to the user with more accurate information, this specification improves the accuracy of the reply information, reduces the processing difficulty for the cloud customer service robot, and improves the user experience; (2) by using the adjacent nodes of each node to optimize the node's vector representation, this specification obtains a more accurate degree of association between two nodes, making the words recommended to the user and the reply information more accurate; (3) by training the model with the adjacency information of the graph, this specification relies on unsupervised data and avoids dependence on manually labeled data. It should be noted that different embodiments may produce different beneficial effects; in a given embodiment, the beneficial effects may be any one or a combination of the above, or any other beneficial effects that may be obtained.
The basic concepts have been described above. Obviously, for those skilled in the art, the above detailed disclosure is only an example and does not constitute a limitation on this specification. Although not explicitly stated here, those skilled in the art may make various modifications, improvements, and corrections to this specification. Such modifications, improvements, and corrections are suggested by this specification, so they remain within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, this specification uses specific words to describe its embodiments. For example, "one embodiment", "an embodiment", and/or "some embodiments" mean a certain feature, structure, or characteristic related to at least one embodiment of this specification. Therefore, it should be emphasized and noted that "an embodiment", "one embodiment", or "an alternative embodiment" mentioned two or more times in different places in this specification does not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics of one or more embodiments of this specification may be combined as appropriate.
In addition, those skilled in the art will understand that the aspects of this specification may be illustrated and described in a number of patentable categories or contexts, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, the aspects of this specification may be implemented entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software. The above hardware or software may all be referred to as a "data block", "module", "engine", "unit", "component", or "system". In addition, aspects of this specification may take the form of a computer product located in one or more computer-readable media, the product including computer-readable program code.
A computer storage medium may contain a propagated data signal with computer program code embodied in it, for example, in baseband or as part of a carrier wave. The propagated signal may take many forms, including electromagnetic, optical, and the like, or any suitable combination. The computer storage medium may be any computer-readable medium other than a computer-readable storage medium, and the medium may be connected to an instruction execution system, apparatus, or device to communicate, propagate, or transmit a program for use. Program code located on a computer storage medium may be transmitted through any suitable medium, including radio, cable, fiber optic cable, RF, or similar media, or any combination of the foregoing.
The computer program code required for the operation of each part of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may run entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or connected to an external computer (for example, via the Internet), or used in a cloud computing environment, or used as a service such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of processing elements and sequences, the use of numbers and letters, and the use of other names in this specification are not intended to limit the order of the processes and methods of this specification. Although the above disclosure discusses, through various examples, some embodiments of the invention currently considered useful, it should be understood that such details are for illustrative purposes only, and the appended claims are not limited to the disclosed embodiments. On the contrary, the claims are intended to cover all modifications and equivalent combinations that conform to the essence and scope of the embodiments of this specification. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that, in order to simplify the presentation of this disclosure and thereby aid the understanding of one or more embodiments of the invention, the foregoing description of the embodiments sometimes groups multiple features into a single embodiment, drawing, or description thereof. However, this method of disclosure does not mean that the subject matter of this specification requires more features than are recited in the claims. In fact, the features of an embodiment may be fewer than all the features of a single embodiment disclosed above.
Some embodiments use numbers describing quantities of components and attributes. It should be understood that such numbers used in describing the embodiments are, in some examples, modified by "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated number is allowed to vary by ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may change depending on the characteristics required by individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and use ordinary rounding. Although the numerical ranges and parameters used to establish the breadth of the scope in some embodiments of this specification are approximations, in specific embodiments such values are set as precisely as is feasible.
Each patent, patent application, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, is hereby incorporated into this specification by reference in its entirety. Application history documents that are inconsistent with or conflict with the content of this specification are excluded, as are documents (currently or later attached to this specification) that limit the broadest scope of the claims of this specification. It should be noted that if the description, definition, and/or use of terms in the materials accompanying this specification is inconsistent with or conflicts with the content of this specification, the description, definition, and/or use of terms in this specification shall prevail.
Finally, it should be understood that the embodiments described in this specification are only used to illustrate the principles of the embodiments of this specification. Other variations may also fall within the scope of this specification. Therefore, by way of example and not limitation, alternative configurations of the embodiments of this specification may be regarded as consistent with the teachings of this specification. Accordingly, the embodiments of this specification are not limited to those explicitly introduced and described herein.

Claims (20)

  1. A method for determining a graph for information recommendation, wherein the method comprises:
    acquiring multiple nodes for constructing the graph, the nodes including at least word nodes and knowledge point nodes; if a node is a word node, using the vector representation of the word corresponding to the node as the vector representation of the node; if a node is a knowledge point node, determining the vector representation corresponding to the knowledge point node based on the vector representations of words related to the knowledge point node;
    for any two nodes: determining the edge weight between the two nodes based on the types of the two nodes, and using the edge weight as the association relationship between the two nodes;
    performing at least one round of graph aggregation iteration based on the vector representations of the nodes and the association relationships between the nodes, to update the vector representations of the nodes in the graph.
  2. The method according to claim 1, wherein
    the vector representation of a word is determined as follows:
    determining the vector representation corresponding to the word by using a word vector representation model, the word vector representation model including a machine learning model;
    and the determining the vector representation corresponding to the knowledge point node based on the vector representations of words related to the knowledge point node comprises:
    obtaining one or more words from the knowledge point corresponding to the knowledge point node;
    determining the vector representations of the one or more words;
    performing an operation on the one or more vector representations, and using the result of the operation as the vector representation corresponding to the knowledge point node.
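For illustration only, the operation in claim 2 could be as simple as an element-wise average. The sketch below assumes averaging (the claim only requires some operation on the word vectors) and uses a hypothetical helper `knowledge_point_vector` with made-up two-dimensional toy vectors; a real system would take its word vectors from a trained word vector representation model.

```python
from typing import Dict, List


def knowledge_point_vector(words: List[str],
                           word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the words extracted from a knowledge point.

    Averaging is an assumption made for this sketch; the claim leaves the
    operation open.
    """
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError("no known words for this knowledge point")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy word vectors, invented for illustration.
toy_vectors = {"refund": [1.0, 0.0], "order": [0.0, 1.0], "cancel": [1.0, 1.0]}
kp_vec = knowledge_point_vector(["refund", "order", "cancel"], toy_vectors)
```

Words missing from the vocabulary are skipped here; whether to skip them or to fall back to a default vector is a design choice the claim does not address.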
  3. The method according to claim 1, wherein the determining the edge weight between the two nodes based on the types of the two nodes comprises:
    if both nodes are word nodes, determining the edge weight between the two nodes based on the co-occurrence frequency of the words corresponding to the two nodes;
    if one of the two nodes is a word node and the other is a knowledge point node, determining the edge weight between the two nodes based on the importance of the word corresponding to the word node relative to the knowledge point corresponding to the knowledge point node;
    if both nodes are knowledge point nodes, determining that the edge weight between the two nodes is zero.
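The three cases of claim 3 can be sketched as a single dispatch on node type. This is a minimal sketch under stated assumptions: the names `edge_weight`, `cooccurrence`, and `importance` are hypothetical, and the co-occurrence frequencies and importance scores are assumed to be precomputed elsewhere (the claim does not say how they are obtained).

```python
from typing import Dict, FrozenSet, Tuple

WORD = "word"
KNOWLEDGE = "knowledge_point"


def edge_weight(a: str, b: str,
                node_type: Dict[str, str],
                cooccurrence: Dict[FrozenSet[str], float],
                importance: Dict[Tuple[str, str], float]) -> float:
    """Return the edge weight between nodes a and b based on their types."""
    type_a, type_b = node_type[a], node_type[b]
    if type_a == WORD and type_b == WORD:
        # Word-word: co-occurrence frequency, stored under an unordered pair.
        return cooccurrence.get(frozenset((a, b)), 0.0)
    if type_a == KNOWLEDGE and type_b == KNOWLEDGE:
        # Knowledge point-knowledge point: always zero.
        return 0.0
    # Mixed pair: importance of the word relative to the knowledge point.
    word, kp = (a, b) if type_a == WORD else (b, a)
    return importance.get((word, kp), 0.0)


# Toy data, invented for illustration.
node_type = {"refund": WORD, "order": WORD, "kp1": KNOWLEDGE, "kp2": KNOWLEDGE}
cooccurrence = {frozenset(("refund", "order")): 0.4}
importance = {("refund", "kp1"): 0.7}
```

With this toy data, `edge_weight("order", "refund", ...)` looks up the unordered pair, so argument order does not matter for word-word edges.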
  4. The method according to claim 1, wherein one round of the at least one round of graph aggregation iteration comprises:
    for any node:
    determining the adjacent nodes of the node based on the association relationships between nodes;
    performing a weighted operation on the vector representations of the adjacent nodes in the current iteration round, based on the edge weights between the node and the adjacent nodes, and updating the vector representation of the node with the result of the operation.
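One round of the per-node aggregation in claim 4 might look like the sketch below. It assumes a weighted average as the weighted operation and leaves a node unchanged when it has no adjacent nodes; both choices, and the name `aggregate_round`, are assumptions of this sketch rather than details from the claim.

```python
from typing import Dict, List


def aggregate_round(vectors: Dict[str, List[float]],
                    weights: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """Run one synchronous aggregation round over all nodes.

    vectors: node -> vector in the current round.
    weights: node -> {adjacent node -> edge weight}.
    """
    updated: Dict[str, List[float]] = {}
    for node, vec in vectors.items():
        # Adjacent nodes are those connected by a positive edge weight.
        neighbors = {n: w for n, w in weights.get(node, {}).items() if w > 0}
        total = sum(neighbors.values())
        if total == 0:
            updated[node] = list(vec)  # isolated node: keep its vector
            continue
        # Weighted average of the neighbors' current-round vectors.
        updated[node] = [
            sum(w * vectors[n][i] for n, w in neighbors.items()) / total
            for i in range(len(vec))
        ]
    return updated


# Toy graph, invented for illustration.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
weights = {"a": {"b": 1.0, "c": 3.0}, "b": {"a": 1.0}, "c": {"a": 3.0}}
next_vectors = aggregate_round(vectors, weights)
```

All reads go to `vectors` and all writes go to `updated`, so every node is updated from the same iteration round regardless of processing order.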
  5. The method according to claim 1, wherein one round of the at least one round of graph aggregation iteration comprises:
    obtaining a vector representation matrix from the vector representations of the plurality of nodes in the current iteration round;
    determining an adjacency matrix corresponding to the plurality of nodes based on the association relationships between nodes;
    performing an operation on the vector representation matrix and the adjacency matrix, and updating the vector representation of each node in the graph with the result of the operation.
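The matrix form in claim 5 can be sketched as a product of the adjacency matrix and the vector representation matrix. The sketch assumes plain, unnormalized multiplication as the operation; the claim does not fix the operation, and the helper names `matmul` and `propagate` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x d)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def propagate(adjacency: Matrix, features: Matrix, rounds: int = 1) -> Matrix:
    """Apply the update H <- A * H for the given number of rounds."""
    for _ in range(rounds):
        features = matmul(adjacency, features)
    return features


# Row i of `adjacency` holds the edge weights of node i; row i of
# `features` holds the current vector of node i (toy values).
adjacency = [[0.0, 1.0, 3.0],
             [1.0, 0.0, 0.0],
             [3.0, 0.0, 0.0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
updated = propagate(adjacency, features)
```

Each row of the product is the edge-weighted sum of the adjacent nodes' vectors, so this updates all nodes at once; dividing each row by its weight total would turn the sum into a weighted average.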
  6. The method according to claim 1, wherein the performing at least one round of graph aggregation iteration, based on the vector representations of the nodes and the association relationships between nodes, to update the vector representations of the nodes in the graph comprises:
    obtaining a vector representation matrix from the vector representations of the plurality of nodes;
    determining an adjacency matrix corresponding to the plurality of nodes based on the association relationships between nodes;
    processing the vector representation matrix and the adjacency matrix with a neural-network-based aggregation model to obtain an updated vector representation matrix, the neural-network-based aggregation model including at least a GCN or a GAT;
    updating the vector representations of the nodes in the graph based on the updated vector representation matrix.
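As a rough illustration of the GCN option in claim 6, the sketch below implements a single graph convolution layer using the widely used propagation rule ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The claim does not specify the layer form, so this rule is an assumption; the weight matrix `weight` would normally be learned and is fixed by hand here, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adjacency: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the (weighted) node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Two connected nodes with toy features and a hand-picked weight matrix.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, -1.0]]
hidden = gcn_layer(adjacency, features, weight)
```

In practice such a layer would come from a graph learning library and be trained end to end; the hand-written version only shows how the adjacency matrix and the vector representation matrix enter the computation.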
  7. A method for information recommendation using a graph, wherein the method comprises:
    obtaining input information;
    determining, using the graph, the node corresponding to the input information in the graph, the graph being determined by the method according to any one of claims 1-6;
    determining a recommended node based on the vector representation of the node and the vector representations of the adjacent nodes of the node;
    outputting information related to the recommended node.
  8. The method according to claim 7, wherein the input information is one or more words selected by a user from candidate words provided to the user in advance.
  9. The method according to claim 7, wherein the information related to the recommended node includes a knowledge point related to the recommended node.
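The recommendation step of claims 7-9 can be sketched as ranking the adjacent nodes of the input node by vector similarity. Cosine similarity and top-k selection are assumptions of this sketch (the claims only say the recommended node is determined from the vector representations), and the node names and helper functions are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(node: str,
              vectors: Dict[str, List[float]],
              neighbors: Dict[str, List[str]],
              top_k: int = 1) -> List[str]:
    """Rank the adjacent nodes of `node` by similarity and keep the top k."""
    scored = [(cosine(vectors[node], vectors[n]), n)
              for n in neighbors.get(node, [])]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored[:top_k]]


# Toy graph: one word node adjacent to two knowledge point nodes.
vectors = {
    "refund": [1.0, 0.0],
    "kp_refund_policy": [0.9, 0.1],
    "kp_shipping": [0.1, 0.9],
}
neighbors = {"refund": ["kp_shipping", "kp_refund_policy"]}
recommended = recommend("refund", vectors, neighbors)
```

Here the word the user selected maps to the node `refund`, and the knowledge point whose vector lies closest to it is returned as the recommendation.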
  10. A system for determining a graph for information recommendation, wherein the system comprises a first acquisition module, a first determination module, and an update module;
    the first acquisition module is configured to obtain a plurality of nodes for constructing the graph, the nodes including at least word nodes and knowledge point nodes; if a node is a word node, to use the vector representation of the word corresponding to the node as the vector representation of the node; and if a node is a knowledge point node, to determine the vector representation corresponding to the knowledge point node based on the vector representations of words related to the knowledge point node;
    for any two nodes: the first determination module is configured to determine an edge weight between the two nodes based on the types of the two nodes, and to use the edge weight as the association relationship between the two nodes;
    the update module is configured to perform at least one round of graph aggregation iteration, based on the vector representations of the nodes and the association relationships between nodes, to update the vector representations of the nodes in the graph.
  11. The system according to claim 10, wherein, to obtain the vector representation of a word, the first acquisition module is configured to:
    determine the vector representation corresponding to the word by using a word vector representation model, the word vector representation model including a machine learning model;
    and, to determine the vector representation corresponding to the knowledge point node based on the vector representations of words related to the knowledge point node, the first acquisition module is configured to:
    obtain one or more words from the knowledge point corresponding to the knowledge point node;
    determine the vector representations of the one or more words;
    perform an operation on the one or more vector representations, and use the result of the operation as the vector representation corresponding to the knowledge point node.
  12. The system according to claim 10, wherein, to determine the edge weight between the two nodes based on the types of the two nodes, the first determination module is configured to:
    if both nodes are word nodes, determine the edge weight between the two nodes based on the co-occurrence frequency of the words corresponding to the two nodes;
    if one of the two nodes is a word node and the other is a knowledge point node, determine the edge weight between the two nodes based on the importance of the word corresponding to the word node relative to the knowledge point corresponding to the knowledge point node;
    if both nodes are knowledge point nodes, determine that the edge weight between the two nodes is zero.
  13. The system according to claim 10, wherein, to perform one round of the at least one round of graph aggregation iteration, the update module is configured to:
    for any node:
    determine the adjacent nodes of the node based on the association relationships between nodes;
    perform a weighted operation on the vector representations of the adjacent nodes in the current iteration round, based on the edge weights between the node and the adjacent nodes, and update the vector representation of the node with the result of the operation.
  14. The system according to claim 10, wherein, to perform one round of the at least one round of graph aggregation iteration, the update module is configured to:
    obtain a vector representation matrix from the vector representations of the plurality of nodes in the current iteration round;
    determine an adjacency matrix corresponding to the plurality of nodes based on the association relationships between nodes;
    perform an operation on the vector representation matrix and the adjacency matrix, and update the vector representation of each node in the graph with the result of the operation.
  15. The system according to claim 10, wherein, to perform at least one round of graph aggregation iteration, based on the vector representations of the nodes and the association relationships between nodes, to update the vector representations of the nodes in the graph, the update module is configured to:
    obtain a vector representation matrix from the vector representations of the plurality of nodes;
    determine an adjacency matrix corresponding to the plurality of nodes based on the association relationships between nodes;
    process the vector representation matrix and the adjacency matrix with a neural-network-based aggregation model to obtain an updated vector representation matrix, the neural-network-based aggregation model including at least a GCN or a GAT;
    update the vector representations of the nodes in the graph based on the updated vector representation matrix.
  16. A system for information recommendation using a graph, wherein the system comprises a second acquisition module, a second determination module, a third determination module, and an output module;
    the second acquisition module is configured to obtain input information;
    the second determination module is configured to determine, using the graph, the node corresponding to the input information in the graph, the graph being determined by the method according to any one of claims 1-6;
    the third determination module is configured to determine a recommended node based on the vector representation of the node and the vector representations of the adjacent nodes of the node;
    the output module is configured to output information related to the recommended node.
  17. The system according to claim 16, wherein the input information is one or more words selected by a user from candidate words provided to the user in advance.
  18. The system according to claim 16, wherein the information related to the recommended node includes a knowledge point related to the recommended node.
  19. An apparatus for determining a graph for information recommendation, wherein the apparatus comprises a processor configured to execute the method according to any one of claims 1-6.
  20. An apparatus for information recommendation using a graph, wherein the apparatus comprises a processor configured to execute the method according to any one of claims 7-9.
PCT/CN2021/088763 2020-04-24 2021-04-21 Determination of map for information recommendation WO2021213448A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010329694.9A CN111241412B (en) 2020-04-24 2020-04-24 Method, system and device for determining map for information recommendation
CN202010329694.9 2020-04-24

Publications (1)

Publication Number Publication Date
WO2021213448A1 true WO2021213448A1 (en) 2021-10-28

Family

ID=70864714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088763 WO2021213448A1 (en) 2020-04-24 2021-04-21 Determination of map for information recommendation

Country Status (2)

Country Link
CN (1) CN111241412B (en)
WO (1) WO2021213448A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241412B (en) * 2020-04-24 2020-08-07 支付宝(杭州)信息技术有限公司 Method, system and device for determining map for information recommendation
CN111695501B (en) * 2020-06-11 2021-08-10 青岛大学 Equipment soft fault detection method based on operating system kernel calling data
CN111723292B (en) * 2020-06-24 2023-07-07 携程计算机技术(上海)有限公司 Recommendation method, system, electronic equipment and storage medium based on graph neural network
CN112256834B (en) * 2020-10-28 2021-06-08 中国科学院声学研究所 Marine science data recommendation system based on content and literature
CN117094529B (en) * 2023-10-16 2024-02-13 浙江挚典科技有限公司 Reinforcement avoiding scheme recommendation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086434A (en) * 2018-08-13 2018-12-25 华中师范大学 A kind of knowledge polymerizing method and system based on thematic map
CN110516697A (en) * 2019-07-15 2019-11-29 清华大学 Statement verification method and system based on evidence figure polymerization and reasoning
US20200084084A1 (en) * 2018-09-06 2020-03-12 Ca, Inc. N-gram based knowledge graph for semantic discovery model
CN111241412A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Method, system and device for determining map for information recommendation

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102713B (en) * 2014-07-16 2018-01-19 百度在线网络技术(北京)有限公司 Recommendation results show method and apparatus
CN105824802B (en) * 2016-03-31 2018-10-30 清华大学 It is a kind of to obtain the method and device that knowledge mapping vectorization indicates
CN107545000A (en) * 2016-06-28 2018-01-05 百度在线网络技术(北京)有限公司 The information-pushing method and device of knowledge based collection of illustrative plates
CN108846104B (en) * 2018-06-20 2022-03-11 北京师范大学 Question-answer analysis and processing method and system based on education knowledge graph
CN109670051A (en) * 2018-12-14 2019-04-23 北京百度网讯科技有限公司 Knowledge mapping method for digging, device, equipment and storage medium
CN110362723B (en) * 2019-05-31 2022-06-21 平安国际智慧城市科技股份有限公司 Topic feature representation method, device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080706A (en) * 2022-08-18 2022-09-20 京华信息科技股份有限公司 Method and system for constructing enterprise relationship map
CN115080706B (en) * 2022-08-18 2022-11-08 京华信息科技股份有限公司 Method and system for constructing enterprise relationship map

Also Published As

Publication number Publication date
CN111241412A (en) 2020-06-05
CN111241412B (en) 2020-08-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21793125

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21793125

Country of ref document: EP

Kind code of ref document: A1