WO2020143186A1 - Recommendation system training method and apparatus, computer device, and storage medium - Google Patents

Recommendation system training method and apparatus, computer device, and storage medium

Info

Publication number
WO2020143186A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
sample
search
word vector
input
Application number
PCT/CN2019/093138
Inventor
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
2019-01-10
Application filed by 平安科技(深圳)有限公司
Publication of WO2020143186A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular, to a recommendation system training method, device, computer equipment, and storage medium.
  • A recommendation system can recommend personalized results to a user based on the search information the user enters.
  • To make the recommendation results more accurate, a large amount of data is generally required to train the recommendation system.
  • Embodiments of the present application provide a recommendation system training method and apparatus, a computer device, and a storage medium, aimed at improving the accuracy of the recommendation system.
  • An embodiment of the present application provides a training method for a recommendation system, which includes:
  • providing a recommendation result to the user based on the search information and the trained recommendation system.
  • An embodiment of the present application also provides a recommendation system training apparatus, which includes:
  • a first word segmentation unit, configured to obtain pre-stored historical search record samples and to perform word segmentation on the search word samples and answer word samples of the historical search record samples, so as to obtain a search word sample segmentation set and an answer word sample segmentation set, respectively;
  • a first training unit, configured to perform word vector training on the search word sample segmentation set and the answer word sample segmentation set, so as to obtain a search word sample word vector set and an answer word sample word vector set, respectively;
  • a first input unit, configured to input all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set, wherein the knowledge graph is established based on the historical search record samples, and each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph;
  • a second training unit, configured to train a preset recommendation system using the input word vector set and the answer word sample word vector set;
  • a first recommendation unit, configured to, upon receiving search information entered by a user, provide the user with a recommendation result based on the search information and the trained recommendation system.
  • An embodiment of the present application further provides a computer device, which includes a memory and a processor connected to the memory; the memory is used to store a computer program, and the processor is used to run the computer program stored in the memory to perform the following steps:
  • providing a recommendation result to the user based on the search information and the trained recommendation system.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
  • providing a recommendation result to the user based on the search information and the trained recommendation system.
  • FIG. 1 is a schematic flowchart of a recommendation system training method provided by an embodiment of the present application;
  • FIG. 2 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
  • FIG. 3 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
  • FIG. 4 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
  • FIG. 5 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
  • FIG. 6 is a schematic block diagram of a recommendation system training apparatus provided by an embodiment of the present application;
  • FIG. 7 is a schematic block diagram of the first word segmentation unit of a recommendation system training apparatus provided by an embodiment of the present application;
  • FIG. 8 is a schematic block diagram of the first input unit of a recommendation system training apparatus provided by an embodiment of the present application;
  • FIG. 9 is a schematic block diagram of the second training unit of a recommendation system training apparatus provided by an embodiment of the present application;
  • FIG. 10 is a schematic block diagram of the first recommendation unit of a recommendation system training apparatus provided by an embodiment of the present application;
  • FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of the recommendation system training method provided by an embodiment of the present application. As shown in FIG. 1, the method includes the following steps S1-S5.
  • The user's historical search records are collected in advance to obtain historical search record samples.
  • Each historical search record sample includes a search word sample and a corresponding answer word sample.
  • Word segmentation is performed on the search word samples to obtain the search word sample segmentation set.
  • The search word sample segmentation set is the set of word segmentation results of the search word samples.
  • Correspondingly, word segmentation is performed on the answer word samples to obtain the answer word sample segmentation set.
  • The answer word sample segmentation set is the set of word segmentation results of the answer word samples.
  • Step S1 may include steps S11-S12.
  • A commonly used word segmentation tool is jieba (结巴).
  • The search word samples and answer word samples are segmented with jieba to obtain the initial search word sample segmentation set and the initial answer word sample segmentation set, respectively.
  • Alternatively, in other embodiments, other word segmentation tools are used to segment the search word samples and answer word samples; this application does not specifically limit this.
  • S12: Remove the stop words from the initial search word sample segmentation set and the initial answer word sample segmentation set to obtain the search word sample segmentation set and the answer word sample segmentation set, respectively.
  • Stop words are usually prepositions, adverbs, or conjunctions. Because they carry no real meaning, they need to be removed. Common stop words include "在" (in), "里面" (inside), "也" (also), "的" (of), "它" (it), and "为" (for).
  • S2: Perform word vector training on the search word sample segmentation set and the answer word sample segmentation set to obtain the search word sample word vector set and the answer word sample word vector set, respectively.
  • The word vector tool word2vec is used to train word vectors on the search word sample segmentation set and the answer word sample segmentation set, respectively, yielding the search word sample word vector set and the answer word sample word vector set.
  • The search word sample word vector set is the set of word vectors of the samples in the search word sample segmentation set; the answer word sample word vector set is the set of word vectors of the samples in the answer word sample segmentation set.
  • word2vec is a commonly used natural language processing tool that converts words in natural language into word vectors that a computer can process.
  • This embodiment uses word2vec to obtain the word vectors because the similarity between words can then be measured by computing the distance between their vectors.
  • Alternatively, other word vector tools may be used to perform word vector training on the search word sample segmentation set and the answer word sample segmentation set; this application does not specifically limit this.
  • A knowledge graph is established in advance based on the historical search record samples.
  • A top-down approach is used to construct the knowledge graph.
  • Ontology and schema information are extracted from the user's historical search record samples and added to a knowledge base to obtain the knowledge graph.
  • All search word sample word vectors in the search word sample word vector set are input into the pre-established knowledge graph to obtain the input word vector set.
  • Each input word vector in the input word vector set is the output obtained after a search word sample word vector from the search word sample word vector set is input into the knowledge graph.
  • Step S3 may include steps S31-S34.
  • A search word sample word vector is taken from the search word sample word vector set as the target search word sample word vector and input into the knowledge graph, and the output of the knowledge graph is added to the input word vector set as an input word vector.
  • The target search word sample word vector is removed from the search word sample word vector set to avoid repeated input.
  • If search word sample word vectors remain in the set, step S31 is executed again: another search word sample word vector is taken from the set as the target search word sample word vector and input into the knowledge graph, and the output of the knowledge graph is added to the input word vector set as an input word vector.
  • Otherwise, step S4 is executed.
  • The recommendation system is trained using the input word vectors in the input word vector set as input data and the corresponding answer word vectors in the answer word sample word vector set as reference output data.
  • The recommendation system is built on deep neural networks (DNN). Alternatively, in other embodiments, a recommendation system with another architecture may be used; this application does not specifically limit this.
  • Step S4 may include steps S41-S45.
  • An input word vector is taken from the input word vector set as the target input word vector and input into the recommendation system.
  • S42: Determine whether the answer word vector output by the recommendation system is the same as the target answer word vector, where the target answer word vector is the answer word sample word vector in the answer word sample word vector set that corresponds to the input word vector.
  • If the answer word vector output by the recommendation system differs from the target answer word vector, that output is input into the knowledge graph, and the output of the knowledge graph is input into the recommendation system as the new target input word vector; these steps are repeated until the answer word vector output by the recommendation system is the same as the target answer word vector.
  • If they are the same, the target input word vector is removed from the input word vector set to avoid repeated input.
  • It is then determined whether any input word vectors remain in the input word vector set, that is, whether all input word vectors in the set have been input into the recommendation system. If input word vectors remain, the process returns to step S41 to take the next input word vector and input it into the recommendation system; if none remain, training of the recommendation system ends and step S5 is executed.
  • When search information entered by the user is received, it is converted into word vectors and input into the recommendation system trained through the above steps, and the output of the recommendation system is used as the recommendation result provided to the user.
  • Step S5 may include steps S51-S53.
  • S51: Perform word segmentation and word vector training on the search information to obtain the search word vectors of the search information.
  • The search information is segmented to obtain search words, and word vector training is performed on the search words to obtain search word vectors.
  • The search word vectors are input into the knowledge graph, and the output of the knowledge graph is input into the trained recommendation system.
  • In this way, the recommendation system can share the data features of the knowledge graph, making its recommendation results more accurate.
  • The output of the recommendation system is used as the recommendation result provided to the user.
  • In the technical solution of the embodiments of the present application, the search word sample word vectors in the search word sample word vector set are input into a pre-established knowledge graph to obtain input word vectors, and the recommendation system is trained with these input word vectors. This lets the recommendation system share the data features of the knowledge graph and achieves a degree of data augmentation during training, which improves the precision of the recommendation system, reduces the likelihood of overfitting to some extent, and thereby improves its accuracy.
  • FIG. 6 is a schematic block diagram of a recommendation system training apparatus 60 provided by an embodiment of the present application. As shown in FIG. 6, corresponding to the above recommendation system training method, the present application also provides a recommendation system training apparatus 60.
  • The recommendation system training apparatus 60 includes units for performing the above recommendation system training method and may be configured in a server. Specifically, referring to FIG. 6, the apparatus 60 includes a first word segmentation unit 61, a first training unit 62, a first input unit 63, a second training unit 64, and a first recommendation unit 65.
  • The first word segmentation unit 61 is configured to obtain pre-stored historical search record samples and to perform word segmentation on the search word samples and answer word samples of the historical search record samples to obtain the search word sample segmentation set and the answer word sample segmentation set, respectively;
  • the first training unit 62 is configured to perform word vector training on the search word sample segmentation set and the answer word sample segmentation set to obtain the search word sample word vector set and the answer word sample word vector set, respectively;
  • the first input unit 63 is configured to input all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set, wherein the knowledge graph is established based on the historical search record samples, and each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph;
  • the second training unit 64 is configured to train a preset recommendation system using the input word vector set and the answer word sample word vector set;
  • the first recommendation unit 65 is configured to, upon receiving search information entered by a user, provide the user with a recommendation result based on the search information and the trained recommendation system.
  • The first word segmentation unit 61 includes a second word segmentation unit 611 and a removal unit 612.
  • The second word segmentation unit 611 is configured to segment the search word samples and the answer word samples with a preset word segmentation tool to obtain the initial search word sample segmentation set and the initial answer word sample segmentation set, respectively;
  • the removal unit 612 is configured to remove the stop words from the initial search word sample segmentation set and the initial answer word sample segmentation set to obtain the search word sample segmentation set and the answer word sample segmentation set, respectively.
  • The first input unit 63 includes a second input unit 631, a first removal unit 632, a first judgment unit 633, and a first notification unit 634.
  • The second input unit 631 is configured to take a search word sample word vector from the search word sample word vector set as the target search word sample word vector, input it into the knowledge graph, and add the output of the knowledge graph to the input word vector set as an input word vector;
  • the first removal unit 632 is configured to remove the target search word sample word vector from the search word sample word vector set;
  • the first judgment unit 633 is configured to determine whether any search word sample word vectors remain in the search word sample word vector set;
  • the first notification unit 634 is configured to, if search word sample word vectors remain in the search word sample word vector set, notify the second input unit 631 to return to the step of taking a search word sample word vector from the set as the target search word sample word vector, inputting it into the knowledge graph, and adding the output of the knowledge graph to the input word vector set as an input word vector.
  • The second training unit 64 includes a third input unit 641, a second judgment unit 642, a second notification unit 643, a second removal unit 644, a third judgment unit 645, and a third notification unit 646.
  • The third input unit 641 is configured to take an input word vector from the input word vector set as the target input word vector and input it into the recommendation system;
  • the second judgment unit 642 is configured to determine whether the answer word vector output by the recommendation system is the same as the target answer word vector, where the target answer word vector is the answer word sample word vector in the answer word sample word vector set that corresponds to the input word vector;
  • the second notification unit 643 is configured to, if the answer word vector output by the recommendation system differs from the target answer word vector, input that output into the knowledge graph and notify the third input unit 641 to input the output of the knowledge graph into the recommendation system as the new target input word vector;
  • the second removal unit 644 is configured to remove the target input word vector from the input word vector set if the answer word vector output by the recommendation system is the same as the target answer word vector;
  • the third judgment unit 645 is configured to determine whether any input word vectors remain in the input word vector set;
  • the third notification unit 646 is configured to, if input word vectors remain in the input word vector set, notify the third input unit 641 to return to the step of taking an input word vector from the set as the target input word vector and inputting it into the recommendation system.
  • The first recommendation unit 65 includes a processing unit 651, a fourth input unit 652, and a second recommendation unit 653.
  • The processing unit 651 is configured to perform word segmentation and word vector training on the search information to obtain the search word vectors of the search information;
  • the fourth input unit 652 is configured to input the search word vectors into the knowledge graph and to input the output of the knowledge graph into the trained recommendation system;
  • the second recommendation unit 653 is configured to recommend the output of the recommendation system to the user as the recommendation result.
  • The above recommendation system training apparatus may be implemented in the form of a computer program, which can run on the computer device shown in FIG. 11.
  • The computer device 500 may be a terminal or a server.
  • The terminal may be a smartphone, tablet computer, notebook computer, desktop computer, personal digital assistant, wearable device, or another electronic device with communication capabilities.
  • The server may be an independent server or a server cluster composed of multiple servers.
  • The computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • When executed, the computer program 5032 may cause the processor 502 to perform a recommendation system training method.
  • The processor 502 provides computing and control capabilities to support the operation of the entire computer device 500.
  • The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503.
  • The processor 502 can then perform a recommendation system training method.
  • The network interface 505 is used for network communication with other devices.
  • FIG. 11 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied.
  • A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • The processor 502 is used to run the computer program 5032 stored in the memory to implement the recommendation system training method of the embodiments of the present application.
  • The processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • A person of ordinary skill in the art will understand that all or part of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing the relevant hardware.
  • The computer program may be stored in a storage medium, which is a computer-readable storage medium.
  • The computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiments.
  • The embodiments of the present application also provide a storage medium.
  • The storage medium may be a computer-readable storage medium.
  • The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of the recommendation system training method described in the above embodiments.
  • The storage medium may be any computer-readable storage medium capable of storing a computer program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

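The embodiments of the present application disclose a recommendation system training method and apparatus, a computer device, and a storage medium. The method belongs to artificial intelligence technology and includes: performing word segmentation on search word samples and answer word samples to obtain a search word sample segmentation set and an answer word sample segmentation set; performing word vector training on the search word sample segmentation set and the answer word sample segmentation set to obtain a search word sample word vector set and an answer word sample word vector set; inputting the search word sample word vectors in the search word sample word vector set into a knowledge graph to obtain an input word vector set; training the recommendation system with the input word vector set and the answer word sample word vector set; and, upon receiving search information entered by a user, providing a recommendation result to the user according to the trained recommendation system.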

Description

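Recommendation system training method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 201910023778.7, filed with the China Patent Office on January 10, 2019 and entitled "Recommendation system training method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a recommendation system training method and apparatus, a computer device, and a storage medium.
Background
With the development of e-commerce, recommendation systems are used more and more widely. A recommendation system can recommend personalized results to a user based on the search information the user enters. To make the recommended results more accurate, a large amount of data is generally needed to train the recommendation system.
However, because the volume of user data is small, the recommendation system is trained poorly and overfitting often occurs, which in turn makes its recommendation results inaccurate.
Summary
The embodiments of the present application provide a recommendation system training method and apparatus, a computer device, and a storage medium, aiming to improve the accuracy of the recommendation system.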
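In a first aspect, an embodiment of the present application provides a recommendation system training method, which includes:
obtaining pre-stored historical search record samples, and performing word segmentation on the search word samples and the answer word samples of the historical search record samples, respectively, to obtain a search word sample segmentation set and an answer word sample segmentation set;
performing word vector training on the search word sample segmentation set and the answer word sample segmentation set, respectively, to obtain a search word sample word vector set and an answer word sample word vector set;
inputting all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set, wherein the knowledge graph is established based on the historical search record samples, and each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph;
training a preset recommendation system with the input word vector set and the answer word sample word vector set; and
upon receiving search information entered by a user, providing a recommendation result to the user based on the search information and the trained recommendation system.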
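In a second aspect, an embodiment of the present application further provides a recommendation system training apparatus, which includes:
a first word segmentation unit, configured to obtain pre-stored historical search record samples, and to perform word segmentation on the search word samples and the answer word samples of the historical search record samples, respectively, to obtain a search word sample segmentation set and an answer word sample segmentation set;
a first training unit, configured to perform word vector training on the search word sample segmentation set and the answer word sample segmentation set, respectively, to obtain a search word sample word vector set and an answer word sample word vector set;
a first input unit, configured to input all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set, wherein the knowledge graph is established based on the historical search record samples, and each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph;
a second training unit, configured to train a preset recommendation system with the input word vector set and the answer word sample word vector set; and
a first recommendation unit, configured to, upon receiving search information entered by a user, provide a recommendation result to the user based on the search information and the trained recommendation system.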
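In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory and a processor connected to the memory; the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory to perform the following steps:
obtaining pre-stored historical search record samples, and performing word segmentation on the search word samples and the answer word samples of the historical search record samples, respectively, to obtain a search word sample segmentation set and an answer word sample segmentation set;
performing word vector training on the search word sample segmentation set and the answer word sample segmentation set, respectively, to obtain a search word sample word vector set and an answer word sample word vector set;
inputting all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set, wherein the knowledge graph is established based on the historical search record samples, and each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph;
training a preset recommendation system with the input word vector set and the answer word sample word vector set; and
upon receiving search information entered by a user, providing a recommendation result to the user based on the search information and the trained recommendation system.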
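In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
obtaining pre-stored historical search record samples, and performing word segmentation on the search word samples and the answer word samples of the historical search record samples, respectively, to obtain a search word sample segmentation set and an answer word sample segmentation set;
performing word vector training on the search word sample segmentation set and the answer word sample segmentation set, respectively, to obtain a search word sample word vector set and an answer word sample word vector set;
inputting all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set, wherein the knowledge graph is established based on the historical search record samples, and each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph;
training a preset recommendation system with the input word vector set and the answer word sample word vector set; and
upon receiving search information entered by a user, providing a recommendation result to the user based on the search information and the trained recommendation system.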
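Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a recommendation system training method provided by an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
FIG. 4 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
FIG. 5 is a schematic sub-flowchart of a recommendation system training method provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a recommendation system training apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of the first word segmentation unit of a recommendation system training apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of the first input unit of a recommendation system training apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of the second training unit of a recommendation system training apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of the first recommendation unit of a recommendation system training apparatus provided by an embodiment of the present application; and
FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.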
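Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the recommendation system training method provided by an embodiment of the present application. As shown in FIG. 1, the method includes the following steps S1-S5.
S1: Obtain pre-stored historical search record samples, and perform word segmentation on the search word samples and the answer word samples of the historical search record samples, respectively, to obtain a search word sample segmentation set and an answer word sample segmentation set.
In this embodiment of the present application, the user's historical search records are collected in advance to obtain historical search record samples. Each historical search record sample includes a search word sample and a corresponding answer word sample.
In a specific implementation, word segmentation is performed on the search word samples to obtain the search word sample segmentation set, which is the set of word segmentation results of the search word samples.
Correspondingly, word segmentation is performed on the answer word samples to obtain the answer word sample segmentation set, which is the set of word segmentation results of the answer word samples.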
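As a minimal illustration of the sample structure just described, a historical search record sample can be modeled as a query/answer pair. The class and field names below (`SearchRecord`, `query`, `answer`, `split_samples`) are hypothetical and not taken from the application; the helper only separates the samples into the two streams that S1 segments independently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SearchRecord:
    """One historical search record sample (hypothetical field names)."""
    query: str   # the search word sample entered by the user
    answer: str  # the corresponding answer word sample


def split_samples(records: List[SearchRecord]) -> Tuple[List[str], List[str]]:
    """Return (search word samples, answer word samples) in matching order,
    so that the i-th query stays aligned with the i-th answer."""
    queries = [r.query for r in records]
    answers = [r.answer for r in records]
    return queries, answers
```

Keeping the two lists index-aligned is the design point: S4 later pairs each input word vector with the answer word vector of the same record, so the order must survive every intermediate step.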
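In an embodiment, as shown in FIG. 2, step S1 may include steps S11-S12.
S11: Perform word segmentation on the search word samples and the answer word samples with a preset word segmentation tool to obtain an initial search word sample segmentation set and an initial answer word sample segmentation set, respectively.
A commonly used word segmentation tool is jieba (结巴). In this embodiment of the present application, the search word samples and the answer word samples are segmented with jieba to obtain the initial search word sample segmentation set and the initial answer word sample segmentation set, respectively.
Alternatively, in other embodiments, other word segmentation tools are used to segment the search word samples and the answer word samples; the present application does not specifically limit this.
S12: Remove the stop words from the initial search word sample segmentation set and the initial answer word sample segmentation set to obtain the search word sample segmentation set and the answer word sample segmentation set, respectively.
It should be noted that stop words are usually prepositions, adverbs, conjunctions, and the like. Stop words carry no real meaning and therefore need to be removed. Common stop words include "在" (in), "里面" (inside), "也" (also), "的" (of), "它" (it), and "为" (for).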
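Steps S11-S12 can be sketched as below. The application names jieba as its segmenter; in a real pipeline `jieba.lcut` (which returns a list of tokens) would be passed as `segment`. To keep the example self-contained, the default segmenter here is a hypothetical whitespace splitter that only works on pre-spaced text, and the stop word list is just the six examples given above. Each sample keeps its own token list rather than being merged into one flat set, an assumption made so that word2vec (S2) still sees word order within each sample.

```python
from typing import Callable, Iterable, List, Set

# The six stop words listed in the description; a production list would be longer.
STOP_WORDS: Set[str] = {"在", "里面", "也", "的", "它", "为"}


def whitespace_segment(text: str) -> List[str]:
    """Hypothetical stand-in segmenter that splits pre-spaced text.

    For raw Chinese text, pass jieba.lcut instead.
    """
    return text.split()


def segment_samples(
    samples: Iterable[str],
    segment: Callable[[str], List[str]] = whitespace_segment,
    stop_words: Set[str] = STOP_WORDS,
) -> List[List[str]]:
    """Return one filtered token list per sample."""
    result: List[List[str]] = []
    for sample in samples:
        initial_tokens = segment(sample)  # S11: initial segmentation result
        kept = [tok for tok in initial_tokens if tok not in stop_words]  # S12: drop stop words
        result.append(kept)
    return result
```

The same function is applied separately to the search word samples and to the answer word samples, yielding the two segmentation sets.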
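S2: Perform word vector training on the search word sample segmentation set and the answer word sample segmentation set, respectively, to obtain a search word sample word vector set and an answer word sample word vector set.
In a specific implementation, the word vector tool word2vec is used to perform word vector training on the search word sample segmentation set and the answer word sample segmentation set, respectively, to obtain the search word sample word vector set and the answer word sample word vector set.
The search word sample word vector set is the set of word vectors of the samples in the search word sample segmentation set; the answer word sample word vector set is the set of word vectors of the samples in the answer word sample segmentation set.
It should be noted that word2vec is a commonly used natural language processing tool whose function is to convert words in natural language into word vectors that a computer can process.
Traditional word vectors (such as one-hot representations) are prone to the curse of dimensionality, and any two words are isolated from each other, so the relationships between words cannot be captured. This embodiment therefore uses word2vec to obtain the word vectors, so that the similarity between words can be reflected by computing the distance between their vectors.
Alternatively, in other embodiments, other word vector tools may be used to perform word vector training on the search word sample segmentation set and the answer word sample segmentation set; the present application does not specifically limit this.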
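The distance-based similarity mentioned above is commonly measured with cosine similarity. The sketch below assumes the word vectors have already been trained; in practice they might come from gensim's `Word2Vec` class (for example `Word2Vec(sentences=segmented_samples, vector_size=100, window=5, min_count=1)` in gensim 4.x), but the toy vectors here are made up for illustration, and choosing cosine rather than Euclidean distance is an assumption, not something the application specifies.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(word: str, vectors: Dict[str, Sequence[float]]) -> str:
    """Return the other vocabulary word whose vector is closest to `word`'s."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


# Hypothetical toy vectors standing in for word2vec output.
toy_vectors = {
    "天气": (0.9, 0.1, 0.0),
    "气温": (0.8, 0.2, 0.1),
    "故宫": (0.0, 0.1, 0.9),
}
```

With these toy vectors, `most_similar("天气", toy_vectors)` picks "气温", because the two vectors point in nearly the same direction while "故宫" is almost orthogonal to "天气".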
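S3: Input all search word sample word vectors in the search word sample word vector set into a pre-established knowledge graph to obtain an input word vector set.
In a specific implementation, a knowledge graph is established in advance based on the historical search record samples. In this solution, the knowledge graph is constructed in a top-down manner.
Specifically, with the help of structured data sources such as encyclopedia websites, ontology and schema information is extracted from the user's historical search record samples and added to a knowledge base to obtain the knowledge graph.
It should be noted that the knowledge graph can be established with reference to existing technical literature; the present application does not specifically limit this.
After the knowledge graph has been established, all search word sample word vectors in the search word sample word vector set are input into it to obtain the input word vector set, where each input word vector in the input word vector set is the output obtained after a search word sample word vector in the search word sample word vector set is input into the knowledge graph.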
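The application does not say how the knowledge graph turns a word vector into an output vector. One plausible reading, assumed here purely for illustration, is: link the vector to the nearest entity in the graph, then return the mean embedding of that entity and its directly related entities, so the output carries some of the graph's relational features. `ToyKnowledgeGraph`, its entity embeddings, and the nearest-entity linking rule are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


class ToyKnowledgeGraph:
    """Hypothetical stand-in for the pre-established knowledge graph.

    Each entity has an embedding and a list of directly related entities.
    """

    def __init__(self, embeddings: Dict[str, Vector], edges: Dict[str, List[str]]):
        self.embeddings = embeddings
        self.edges = edges  # directed adjacency lists; neighbors must have embeddings

    def nearest_entity(self, vec: Vector) -> str:
        # Entity linking by smallest squared Euclidean distance (an assumption).
        return min(
            self.embeddings,
            key=lambda e: sum((a - b) ** 2 for a, b in zip(self.embeddings[e], vec)),
        )

    def output(self, vec: Vector) -> Vector:
        # Average the linked entity's embedding with those of its neighbors.
        entity = self.nearest_entity(vec)
        group = [entity] + self.edges.get(entity, [])
        dim = len(vec)
        return tuple(
            sum(self.embeddings[e][i] for e in group) / len(group) for i in range(dim)
        )
```

For example, with entities `A = (1.0, 0.0)` and `B = (0.0, 1.0)` and an edge from A to B, the input `(0.9, 0.1)` links to A and comes back as the blend `(0.5, 0.5)`, which is the kind of graph-derived variation the application relies on for data augmentation.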
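In an embodiment, as shown in FIG. 3, step S3 may include steps S31-S34.
S31: Take a search word sample word vector from the search word sample word vector set as the target search word sample word vector, input it into the knowledge graph, and add the output of the knowledge graph to the input word vector set as an input word vector.
S32: Remove the target search word sample word vector from the search word sample word vector set.
In a specific implementation, after the target search word sample word vector has been input into the knowledge graph and the corresponding input word vector obtained, the target search word sample word vector is removed from the search word sample word vector set to avoid repeated input.
S33: Determine whether any search word sample word vectors remain in the search word sample word vector set.
If search word sample word vectors remain in the set, step S31 is executed again, that is, the process returns to the step of taking a search word sample word vector from the set as the target search word sample word vector, inputting it into the knowledge graph, and adding the output of the knowledge graph to the input word vector set as an input word vector.
In a specific implementation, this loop is repeated until every search word sample word vector in the search word sample word vector set has been input into the knowledge graph.
If no search word sample word vectors remain in the set, the following step S4 is executed.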
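Steps S31-S34 amount to draining a work list: take one vector, map it through the knowledge graph, remove it, and repeat until the list is empty. The sketch below accepts any callable as the knowledge-graph mapping (for example the `output` method of a graph object); the function name `build_input_vectors` is hypothetical.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def build_input_vectors(
    search_vectors: Sequence[Vector],
    kg_output: Callable[[Vector], Vector],
) -> List[Vector]:
    """Run S31-S34: map every search word sample vector through the knowledge graph."""
    remaining = deque(search_vectors)  # working copy; the caller's sequence is untouched
    input_vectors: List[Vector] = []
    while remaining:                   # S33: do any search word vectors remain?
        target = remaining.popleft()   # S31 takes a target; S32 removes it from the set
        input_vectors.append(kg_output(target))  # S31: the KG output joins the input set
    return input_vectors               # the set is empty, so S4 can start
```

Taking vectors from the front preserves the original order, so the i-th input word vector stays aligned with the i-th answer word vector used in S4; a `deque` makes each removal O(1).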
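S4: Train a preset recommendation system with the input word vector set and the answer word sample word vector set.
In a specific implementation, the recommendation system is trained using the input word vectors in the input word vector set as input data and the corresponding answer word vectors in the answer word sample word vector set as reference output data.
It should be noted that in this embodiment the recommendation system is built on deep neural networks (DNN). Alternatively, in other embodiments, a recommendation system with another architecture may be used; the present application does not specifically limit this.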
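In one embodiment, as shown in Figure 4, step S4 may include steps S41-S45.
S41: Take one input word vector from the input word vector set as the target input word vector and input it into the recommendation system.
S42: Determine whether the answer word vector output by the recommendation system is the same as the target answer word vector, where the target answer word vector is the answer sample word vector in the answer sample word vector set that corresponds to the input word vector.
S43: If the answer word vector output by the recommendation system differs from the target answer word vector, input the output answer word vector into the knowledge graph, and input the knowledge graph's output into the recommendation system as the new target input word vector.
In a specific implementation, whenever the output answer word vector differs from the target answer word vector, it is input into the knowledge graph and the graph's output is fed back into the recommendation system as the new target input word vector; these steps repeat until the answer word vector output by the recommendation system is the same as the target answer word vector.
S44: If the answer word vector output by the recommendation system is the same as the target answer word vector, remove the target input word vector from the input word vector set.
In a specific implementation, the target input word vector is removed from the input word vector set in this case so that it is not input again.
S45: Determine whether any input word vectors remain in the input word vector set.
In a specific implementation, this determines whether all input word vectors in the input word vector set have been input into the recommendation system. If input word vectors remain, step S41 is executed again to take the next input word vector and input it into the recommendation system; if none remain, training of the recommendation system ends and step S5 is executed.
That is, while input word vectors remain in the set, the process returns to step S41, and the above steps repeat until the input word vector set is empty.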
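Two details of steps S41-S45 need assumptions before they can run. First, exact equality of floating-point vectors in S42 almost never holds, so a distance tolerance is used. Second, the S43 loop has no exit for a sample whose output never matches, so a round cap is added. The sketch shows only the control flow; the model update is hidden behind a hypothetical `model_step` callable, and `graph_fn`, `tol`, and `max_rounds` are likewise assumptions.

```python
import math


def is_same(u, v, tol=1e-3):
    """S42 'same vector' test. Exact float equality almost never holds, so a
    Euclidean-distance tolerance is ASSUMED here."""
    return math.dist(u, v) <= tol


def train_with_graph_feedback(input_vectors, answer_vectors, model_step,
                              graph_fn, max_rounds=10):
    """Control flow of steps S41-S45.

    model_step(x, target) -> output: hypothetical callable that runs the model
        on x, applies one parameter update against target, and returns the
        model's output vector.
    graph_fn(v) -> vector: hypothetical knowledge-graph mapping.
    max_rounds: ASSUMED safety cap; the described loop has no exit for a
        sample whose output never matches.
    Returns, per sample, how many times the model was called.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    # Keep each input paired with its target answer vector (needed by S42).
    pending = list(zip(input_vectors, answer_vectors))
    calls = []
    while pending:                          # S45: any input vectors left?
        x, target = pending.pop(0)          # S41: take the next target input
        for round_no in range(1, max_rounds + 1):
            output = model_step(x, target)
            if is_same(output, target):     # S42: output matches the answer
                break
            x = graph_fn(output)            # S43: route output through the KG
        # S44: the sample leaves the pending list (here also when the cap is hit).
        calls.append(round_no)
    return calls
```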
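S5: If search information input by a user is received, provide recommendation results to the user according to the search information and the trained recommendation system.
In a specific implementation, when search information input by a user is received, it is converted into word vectors and input into the recommendation system trained in the steps above, and the system's output is provided to the user as the recommendation result.
In one embodiment, as shown in Figure 5, step S5 may include steps S51-S53.
S51: Perform word segmentation and word vector training on the search information to obtain the search word vectors of the search information.
In a specific implementation, the search information is segmented into search terms, and word vector training is performed on the search terms to obtain the search word vectors.
The word segmentation and word vector training processes have been described in detail in steps S1-S2 above and are not repeated here.
S52: Input the search word vectors into the knowledge graph, and input the knowledge graph's output into the trained recommendation system.
In a specific implementation, routing the search word vectors through the knowledge graph before the trained recommendation system lets the recommendation system share the data features in the knowledge graph, which makes its recommendation results more accurate.
S53: Recommend the output of the recommendation system to the user as the recommendation result.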
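A minimal sketch of the serving path S51-S53 follows. The text says the query undergoes "word vector training"; the sketch instead looks up vectors already learned in step S2 and averages them, and it ranks candidate answers by cosine similarity to the model output. Both choices, and the callables `graph_fn` and `model_fn`, are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query_tokens, word_vectors, graph_fn, model_fn, answers, top_k=1):
    """Sketch of S51-S53 under stated assumptions.

    query_tokens: the segmented query with stop words removed (S51).
    word_vectors: trained lookup table word -> vector; averaging the token
        vectors into one query vector is an ASSUMPTION.
    graph_fn, model_fn: hypothetical knowledge-graph mapping and trained model.
    answers: candidate answer name -> answer vector; ranking candidates by
        cosine similarity to the model output is also an ASSUMPTION.
    """
    known = [word_vectors[t] for t in query_tokens if t in word_vectors]
    if not known:
        return []  # no token has a vector: nothing to recommend
    dim = len(known[0])
    query_vec = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    output = model_fn(graph_fn(query_vec))  # S52: KG first, then the trained model
    ranked = sorted(answers, key=lambda name: cosine(output, answers[name]),
                    reverse=True)
    return ranked[:top_k]  # S53: the top results go back to the user
```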
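In the technical solution of the embodiments of this application, the query sample word vectors in the query sample word vector set are input into the pre-built knowledge graph to obtain input word vectors, and these input word vectors are used to train the recommendation system. This lets the recommendation system share the data features in the knowledge graph and achieves a degree of data augmentation during training, which improves the precision of the recommendation system, reduces the likelihood of overfitting to some extent, and thus improves its accuracy.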
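Figure 6 is a schematic block diagram of a recommendation system training apparatus 60 provided by an embodiment of this application. As shown in Figure 6, corresponding to the recommendation system training method above, this application also provides a recommendation system training apparatus 60. The apparatus 60 includes units for performing the recommendation system training method above and may be deployed on a server. Specifically, referring to Figure 6, the apparatus 60 includes a first word segmentation unit 61, a first training unit 62, a first input unit 63, a second training unit 64, and a first recommendation unit 65.
First word segmentation unit 61, configured to obtain pre-stored historical search record samples and perform word segmentation on the query samples and the answer samples of the historical search record samples to obtain a query sample token set and an answer sample token set, respectively;
First training unit 62, configured to perform word vector training on the query sample token set and the answer sample token set to obtain a query sample word vector set and an answer sample word vector set, respectively;
First input unit 63, configured to input all query sample word vectors in the query sample word vector set into a pre-built knowledge graph to obtain an input word vector set, where the knowledge graph is built from the historical search record samples, and each input word vector in the input word vector set is the output of the knowledge graph for a query sample word vector in the query sample word vector set;
Second training unit 64, configured to train a preset recommendation system using the input word vector set and the answer sample word vector set;
First recommendation unit 65, configured to, if search information input by a user is received, provide recommendation results to the user according to the search information and the trained recommendation system.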
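Read as software architecture, apparatus 60 is a pipeline of five pluggable stages. The hypothetical dataclass below wires units 61-65 together; each callable is a placeholder for the corresponding unit, and the class name `TrainingApparatus` and the method names `fit` and `serve` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class TrainingApparatus:
    """Hypothetical wiring of units 61-65; every callable is a placeholder."""
    segment: Callable[[str], List[str]]                  # unit 61: segmentation
    embed: Callable[[List[str]], Vector]                 # unit 62: word vectors
    graph: Callable[[Vector], Vector]                    # unit 63: knowledge graph
    train: Callable[[List[Vector], List[Vector]], None]  # unit 64: model training
    infer: Callable[[Vector], str]                       # unit 65: trained model

    def fit(self, records: Sequence[Tuple[str, str]]) -> None:
        """Units 61-64 over (query, answer) historical search records."""
        queries = [self.embed(self.segment(q)) for q, _ in records]
        answers = [self.embed(self.segment(a)) for _, a in records]
        self.train([self.graph(v) for v in queries], answers)

    def serve(self, query: str) -> str:
        """Unit 65: answer one user query with the trained pipeline."""
        return self.infer(self.graph(self.embed(self.segment(query))))
```

Keeping each unit behind a callable mirrors the text's statement that the segmentation tool, word vector tool, and recommender architecture are interchangeable.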
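In one embodiment, referring to Figure 7, the first word segmentation unit 61 includes a second word segmentation unit 611 and a removal unit 612.
Second word segmentation unit 611, configured to perform word segmentation on the query samples and the answer samples with a preset word segmentation tool to obtain an initial query sample token set and an initial answer sample token set, respectively;
Removal unit 612, configured to remove the stop words from the initial query sample token set and the initial answer sample token set to obtain the query sample token set and the answer sample token set, respectively.
In one embodiment, referring to Figure 8, the first input unit 63 includes a second input unit 631, a first removal unit 632, a first judgment unit 633, and a first notification unit 634.
Second input unit 631, configured to take one query sample word vector from the query sample word vector set as the target query sample word vector, input it into the knowledge graph, and add the knowledge graph's output to the input word vector set as an input word vector;
First removal unit 632, configured to remove the target query sample word vector from the query sample word vector set;
First judgment unit 633, configured to determine whether any query sample word vectors remain in the query sample word vector set;
First notification unit 634, configured to, if query sample word vectors remain in the query sample word vector set, notify the second input unit 631 to return to the step of taking one query sample word vector from the query sample word vector set as the target query sample word vector, inputting it into the knowledge graph, and adding the knowledge graph's output to the input word vector set as an input word vector.
In one embodiment, referring to Figure 9, the second training unit 64 includes a third input unit 641, a second judgment unit 642, a second notification unit 643, a second removal unit 644, a third judgment unit 645, and a third notification unit 646.
Third input unit 641, configured to take one input word vector from the input word vector set as the target input word vector and input it into the recommendation system;
Second judgment unit 642, configured to determine whether the answer word vector output by the recommendation system is the same as the target answer word vector, where the target answer word vector is the answer sample word vector in the answer sample word vector set that corresponds to the input word vector;
Second notification unit 643, configured to, if the answer word vector output by the recommendation system differs from the target answer word vector, input the output answer word vector into the knowledge graph and notify the third input unit 641 to input the knowledge graph's output into the recommendation system as the new target input word vector;
Second removal unit 644, configured to, if the answer word vector output by the recommendation system is the same as the target answer word vector, remove the target input word vector from the input word vector set;
Third judgment unit 645, configured to determine whether any input word vectors remain in the input word vector set;
Third notification unit 646, configured to, if input word vectors remain in the input word vector set, notify the third input unit 641 to return to the step of taking one input word vector from the input word vector set as the target input word vector and inputting it into the recommendation system.
In one embodiment, referring to Figure 10, the first recommendation unit 65 includes a processing unit 651, a fourth input unit 652, and a second recommendation unit 653.
Processing unit 651, configured to perform word segmentation and word vector training on the search information to obtain the search word vectors of the search information;
Fourth input unit 652, configured to input the search word vectors into the knowledge graph and input the knowledge graph's output into the trained recommendation system;
Second recommendation unit 653, configured to recommend the output of the recommendation system to the user as the recommendation result.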
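It should be noted that those skilled in the art will clearly understand that, for the specific implementation of the recommendation system training apparatus 60 and its units, reference may be made to the corresponding descriptions in the method embodiments above; for convenience and brevity, they are not repeated here.
The recommendation system training apparatus above may be implemented in the form of a computer program that can run on a computer device such as the one shown in Figure 11.
Referring to Figure 11, Figure 11 is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device 500 may be a terminal or a server. The terminal may be an electronic device with communication capabilities, such as a smartphone, tablet computer, laptop computer, desktop computer, personal digital assistant, or wearable device. The server may be a standalone server or a server cluster composed of multiple servers.
Referring to Figure 11, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 causes the processor 502 to perform a recommendation system training method.
The processor 502 provides computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it causes the processor 502 to perform a recommendation system training method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in Figure 11 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the recommendation system training method of the embodiments of this application.
It should be understood that, in the embodiments of this application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium, and is executed by at least one processor in the computer system to implement the process steps of the method embodiments above.
Accordingly, an embodiment of this application further provides a storage medium, which may be a computer-readable storage medium. The storage medium stores a computer program that, when executed by a processor, causes the processor to perform the steps of the recommendation system training method described in the embodiments above.
The storage medium may be any computer-readable storage medium capable of storing a computer program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The above are only specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and such modifications or substitutions shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.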

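Claims (20)

  1. A recommendation system training method, comprising:
    obtaining pre-stored historical search record samples, and performing word segmentation on the query samples and the answer samples of the historical search record samples to obtain a query sample token set and an answer sample token set, respectively;
    performing word vector training on the query sample token set and the answer sample token set to obtain a query sample word vector set and an answer sample word vector set, respectively;
    inputting all query sample word vectors in the query sample word vector set into a pre-built knowledge graph to obtain an input word vector set, wherein the knowledge graph is built from the historical search record samples, and each input word vector in the input word vector set is the output of the knowledge graph for a query sample word vector in the query sample word vector set;
    training a preset recommendation system using the input word vector set and the answer sample word vector set; and
    if search information input by a user is received, providing recommendation results to the user according to the search information and the trained recommendation system.
  2. The method according to claim 1, wherein performing word segmentation on the query samples and the answer samples of the historical search record samples to obtain the query sample token set and the answer sample token set comprises:
    performing word segmentation on the query samples and the answer samples with a preset word segmentation tool to obtain an initial query sample token set and an initial answer sample token set, respectively; and
    removing the stop words from the initial query sample token set and the initial answer sample token set to obtain the query sample token set and the answer sample token set, respectively.
  3. The method according to claim 1, wherein inputting all query sample word vectors in the query sample word vector set into the pre-built knowledge graph to obtain the input word vector set comprises:
    taking one query sample word vector from the query sample word vector set as a target query sample word vector, inputting it into the knowledge graph, and adding the output of the knowledge graph to the input word vector set as an input word vector;
    removing the target query sample word vector from the query sample word vector set;
    determining whether any query sample word vectors remain in the query sample word vector set; and
    if query sample word vectors remain in the query sample word vector set, returning to the step of taking one query sample word vector from the query sample word vector set as the target query sample word vector, inputting it into the knowledge graph, and adding the output of the knowledge graph to the input word vector set as an input word vector.
  4. The method according to claim 1, wherein training the preset recommendation system using the input word vector set and the answer sample word vector set comprises:
    taking one input word vector from the input word vector set as a target input word vector and inputting it into the recommendation system;
    determining whether the answer word vector output by the recommendation system is the same as a target answer word vector, wherein the target answer word vector is the answer sample word vector in the answer sample word vector set that corresponds to the input word vector; and
    if the answer word vector output by the recommendation system differs from the target answer word vector, inputting the output answer word vector into the knowledge graph, and inputting the output of the knowledge graph into the recommendation system as a new target input word vector.
  5. The method according to claim 4, wherein training the preset recommendation system using the input word vector set and the answer sample word vector set further comprises:
    if the answer word vector output by the recommendation system is the same as the target answer word vector, removing the target input word vector from the input word vector set;
    determining whether any input word vectors remain in the input word vector set; and
    if input word vectors remain in the input word vector set, returning to the step of taking one input word vector from the input word vector set as the target input word vector and inputting it into the recommendation system.
  6. The method according to claim 1, wherein providing recommendation results to the user according to the search information and the trained recommendation system comprises:
    performing word segmentation and word vector training on the search information to obtain search word vectors of the search information;
    inputting the search word vectors into the knowledge graph, and inputting the output of the knowledge graph into the trained recommendation system; and
    recommending the output of the recommendation system to the user as the recommendation result.
  7. The method according to claim 3, wherein inputting all query sample word vectors in the query sample word vector set into the pre-built knowledge graph to obtain the input word vector set further comprises:
    if no query sample word vectors remain in the query sample word vector set, proceeding to the step of training the preset recommendation system using the input word vector set and the answer sample word vector set.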
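  8. The method according to claim 5, wherein training the preset recommendation system using the input word vector set and the answer sample word vector set further comprises:
    if no input word vectors remain in the input word vector set, proceeding to the step of providing, if search information input by a user is received, recommendation results to the user according to the search information and the trained recommendation system.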
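  9. The method according to claim 1, wherein performing word vector training on the query sample token set and the answer sample token set to obtain the query sample word vector set and the answer sample word vector set comprises:
    performing word vector training on the query sample token set and the answer sample token set with the word vector tool word2vec to obtain the query sample word vector set and the answer sample word vector set, respectively.
  10. A recommendation system training apparatus, comprising:
    a first word segmentation unit, configured to obtain pre-stored historical search record samples and perform word segmentation on the query samples and the answer samples of the historical search record samples to obtain a query sample token set and an answer sample token set, respectively;
    a first training unit, configured to perform word vector training on the query sample token set and the answer sample token set to obtain a query sample word vector set and an answer sample word vector set, respectively;
    a first input unit, configured to input all query sample word vectors in the query sample word vector set into a pre-built knowledge graph to obtain an input word vector set, wherein the knowledge graph is built from the historical search record samples, and each input word vector in the input word vector set is the output of the knowledge graph for a query sample word vector in the query sample word vector set;
    a second training unit, configured to train a preset recommendation system using the input word vector set and the answer sample word vector set; and
    a first recommendation unit, configured to, if search information input by a user is received, provide recommendation results to the user according to the search information and the trained recommendation system.
  11. A computer device, comprising a memory and a processor connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory to perform the following steps:
    obtaining pre-stored historical search record samples, and performing word segmentation on the query samples and the answer samples of the historical search record samples to obtain a query sample token set and an answer sample token set, respectively;
    performing word vector training on the query sample token set and the answer sample token set to obtain a query sample word vector set and an answer sample word vector set, respectively;
    inputting all query sample word vectors in the query sample word vector set into a pre-built knowledge graph to obtain an input word vector set, wherein the knowledge graph is built from the historical search record samples, and each input word vector in the input word vector set is the output of the knowledge graph for a query sample word vector in the query sample word vector set;
    training a preset recommendation system using the input word vector set and the answer sample word vector set; and
    if search information input by a user is received, providing recommendation results to the user according to the search information and the trained recommendation system.
  12. The computer device according to claim 11, wherein the step of performing word segmentation on the query samples and the answer samples of the historical search record samples to obtain the query sample token set and the answer sample token set comprises:
    performing word segmentation on the query samples and the answer samples with a preset word segmentation tool to obtain an initial query sample token set and an initial answer sample token set, respectively; and
    removing the stop words from the initial query sample token set and the initial answer sample token set to obtain the query sample token set and the answer sample token set, respectively.
  13. The computer device according to claim 11, wherein the step of inputting all query sample word vectors in the query sample word vector set into the pre-built knowledge graph to obtain the input word vector set comprises:
    taking one query sample word vector from the query sample word vector set as a target query sample word vector, inputting it into the knowledge graph, and adding the output of the knowledge graph to the input word vector set as an input word vector;
    removing the target query sample word vector from the query sample word vector set;
    determining whether any query sample word vectors remain in the query sample word vector set; and
    if query sample word vectors remain in the query sample word vector set, returning to the step of taking one query sample word vector from the query sample word vector set as the target query sample word vector, inputting it into the knowledge graph, and adding the output of the knowledge graph to the input word vector set as an input word vector.
  14. The computer device according to claim 11, wherein the step of training the preset recommendation system using the input word vector set and the answer sample word vector set comprises:
    taking one input word vector from the input word vector set as a target input word vector and inputting it into the recommendation system;
    determining whether the answer word vector output by the recommendation system is the same as a target answer word vector, wherein the target answer word vector is the answer sample word vector in the answer sample word vector set that corresponds to the input word vector; and
    if the answer word vector output by the recommendation system differs from the target answer word vector, inputting the output answer word vector into the knowledge graph, and inputting the output of the knowledge graph into the recommendation system as a new target input word vector.
  15. The computer device according to claim 14, wherein the step of training the preset recommendation system using the input word vector set and the answer sample word vector set further comprises:
    if the answer word vector output by the recommendation system is the same as the target answer word vector, removing the target input word vector from the input word vector set;
    determining whether any input word vectors remain in the input word vector set; and
    if input word vectors remain in the input word vector set, returning to the step of taking one input word vector from the input word vector set as the target input word vector and inputting it into the recommendation system.
  16. The computer device according to claim 11, wherein the step of providing recommendation results to the user according to the search information and the trained recommendation system comprises:
    performing word segmentation and word vector training on the search information to obtain search word vectors of the search information;
    inputting the search word vectors into the knowledge graph, and inputting the output of the knowledge graph into the trained recommendation system; and
    recommending the output of the recommendation system to the user as the recommendation result.
  17. The computer device according to claim 13, wherein the step of inputting all query sample word vectors in the query sample word vector set into the pre-built knowledge graph to obtain the input word vector set further comprises:
    if no query sample word vectors remain in the query sample word vector set, proceeding to the step of training the preset recommendation system using the input word vector set and the answer sample word vector set.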
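  18. The computer device according to claim 15, wherein the step of training the preset recommendation system using the input word vector set and the answer sample word vector set further comprises:
    if no input word vectors remain in the input word vector set, proceeding to the step of providing, if search information input by a user is received, recommendation results to the user according to the search information and the trained recommendation system.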
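  19. The computer device according to claim 11, wherein the step of performing word vector training on the query sample token set and the answer sample token set to obtain the query sample word vector set and the answer sample word vector set comprises:
    performing word vector training on the query sample token set and the answer sample token set with the word vector tool word2vec to obtain the query sample word vector set and the answer sample word vector set, respectively.
  20. A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps:
    obtaining pre-stored historical search record samples, and performing word segmentation on the query samples and the answer samples of the historical search record samples to obtain a query sample token set and an answer sample token set, respectively;
    performing word vector training on the query sample token set and the answer sample token set to obtain a query sample word vector set and an answer sample word vector set, respectively;
    inputting all query sample word vectors in the query sample word vector set into a pre-built knowledge graph to obtain an input word vector set, wherein the knowledge graph is built from the historical search record samples, and each input word vector in the input word vector set is the output of the knowledge graph for a query sample word vector in the query sample word vector set;
    training a preset recommendation system using the input word vector set and the answer sample word vector set; and
    if search information input by a user is received, providing recommendation results to the user according to the search information and the trained recommendation system.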
PCT/CN2019/093138 2019-01-10 2019-06-27 Recommendation system training method and apparatus, computer device, and storage medium WO2020143186A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910023778.7 2019-01-10
CN201910023778.7A CN109858528B (zh) 2019-01-10 2019-01-10 Recommendation system training method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020143186A1 true WO2020143186A1 (zh) 2020-07-16

Family

ID=66894421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093138 WO2020143186A1 (zh) 2019-01-10 2019-06-27 Recommendation system training method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN109858528B (zh)
WO (1) WO2020143186A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
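CN109858528B (zh) * 2019-01-10 2024-05-14 Ping An Technology (Shenzhen) Co., Ltd. Recommendation system training method and apparatus, computer device, and storage medium
CN110825855B (zh) * 2019-09-18 2023-02-14 Ping An Technology (Shenzhen) Co., Ltd. Artificial-intelligence-based response method and apparatus, computer device, and storage medium
CN111368048A (zh) * 2020-02-26 2020-07-03 BOE Technology Group Co., Ltd. Information acquisition method and apparatus, electronic device, and computer-readable storage medium
CN111797257B (zh) * 2020-06-29 2024-02-23 BOE Technology Group Co., Ltd. Word-vector-based image recommendation method and related device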

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070299824A1 (en) * 2006-06-27 2007-12-27 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
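CN104156359A (zh) * 2013-05-13 2014-11-19 Tencent Technology (Shenzhen) Co., Ltd. Internal link information recommendation method and apparatus
CN105989040A (zh) * 2015-02-03 2016-10-05 Alibaba Group Holding Ltd. Intelligent question answering method, apparatus, and system
CN108427707A (zh) * 2018-01-23 2018-08-21 Shenzhen Asimov Technology Co., Ltd. Human-machine question answering method and apparatus, computer device, and storage medium
CN109858528A (zh) * 2019-01-10 2019-06-07 Ping An Technology (Shenzhen) Co., Ltd. Recommendation system training method and apparatus, computer device, and storage medium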

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
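CN105095444A (zh) * 2015-07-24 2015-11-25 Baidu Online Network Technology (Beijing) Co., Ltd. Information acquisition method and apparatus
CN106055623A (zh) * 2016-05-26 2016-10-26 China Academic Journals (CD Edition) Electronic Publishing House Co., Ltd. Cross-language recommendation method and system
CN108446286B (zh) * 2017-02-16 2023-04-25 Alibaba Group Holding Ltd. Method, apparatus, and server for generating answers to natural-language questions
CN109165350A (zh) * 2018-08-23 2019-01-08 Chengdu Pinguo Technology Co., Ltd. Information recommendation method and system based on deep knowledge awareness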

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
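CN112487141A (zh) * 2020-11-26 2021-03-12 Beijing Sankuai Online Technology Co., Ltd. Method, apparatus, device, and storage medium for generating recommendation copy
CN112527998A (zh) * 2020-12-22 2021-03-19 UBTECH Robotics Corp Ltd Reply recommendation method, reply recommendation apparatus, and intelligent device
CN113268604A (zh) * 2021-05-19 2021-08-17 State Grid Liaoning Electric Power Co., Ltd. Adaptive knowledge base expansion method and system
CN115329200A (zh) * 2022-08-26 2022-11-11 The Open University of China Teaching resource recommendation method based on knowledge graph and user similarity
CN115329200B (zh) * 2022-08-26 2024-04-26 The Open University of China Teaching resource recommendation method based on knowledge graph and user similarity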

Also Published As

Publication number Publication date
CN109858528A (zh) 2019-06-07
CN109858528B (zh) 2024-05-14

Similar Documents

Publication Publication Date Title
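WO2020143186A1 (zh) Recommendation system training method and apparatus, computer device, and storage medium
US10586155B2 (en) Clarification of submitted questions in a question and answer system
TWI700632B (zh) User intent recognition method and apparatus
US20200250378A1 (en) Methods and apparatuses for identifying a user intent of a statement
US10891322B2 (en) Automatic conversation creator for news
CN115485690A (zh) Batching techniques for handling unbalanced training data for chatbots
EP4006909B1 (en) Method, apparatus and device for quality control and storage medium
US20190114711A1 (en) Financial analysis system and method for unstructured text data
US10810056B2 (en) Adding descriptive metadata to application programming interfaces for consumption by an intelligent agent
WO2020048296A1 (zh) Machine learning method, device, and storage medium
WO2020233360A1 (zh) Method and device for generating a product evaluation model
US11688393B2 (en) Machine learning to propose actions in response to natural language questions
WO2020143303A1 (zh) Deep learning model training method and apparatus, computer device, and storage medium
WO2022267168A1 (zh) Speech recognition method and apparatus, computer device, and storage medium
WO2021098876A1 (zh) Knowledge-graph-based question answering method and apparatus
CN113312554B (zh) Method and apparatus for evaluating a recommendation system, electronic device, and medium
US20230004988A1 (en) Systems and methods for utilizing feedback data
CN111061523B (zh) Software package call management method, system, apparatus, and storage medium
CN111382246B (zh) Text matching method, matching apparatus, terminal, and computer-readable storage medium
CN111858899A (zh) Sentence processing method, apparatus, system, and medium
CN112784032A (zh) Conversation corpus recommendation evaluation method and apparatus, storage medium, and electronic device
WO2021184579A1 (zh) Intelligent multi-solution selection method and apparatus, computer device, and storage medium
US11144836B1 (en) Processing and re-using assisted support data to increase a self-support knowledge base
US20240152933A1 (en) Automatic mapping of a question or compliance controls associated with a compliance standard to compliance controls associated with another compliance standard
WO2022053018A1 (zh) Text clustering system, method, apparatus, device, and medium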

Legal Events

Date Code Title Description

121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19909266; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 19909266; Country of ref document: EP; Kind code of ref document: A1