WO2019192122A1 - Document topic parameter extraction method, product recommendation method, device, and storage medium - Google Patents


Info

Publication number
WO2019192122A1
WO2019192122A1 PCT/CN2018/100312 CN2018100312W WO2019192122A1 WO 2019192122 A1 WO2019192122 A1 WO 2019192122A1 CN 2018100312 W CN2018100312 W CN 2018100312W WO 2019192122 A1 WO2019192122 A1 WO 2019192122A1
Authority
WO
WIPO (PCT)
Prior art keywords
product
topic
target
topics
theme
Application number
PCT/CN2018/100312
Other languages
English (en)
French (fr)
Inventor
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2019192122A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a document topic parameter extraction method, a product recommendation method, a device, and a storage medium.
  • The rapid development of the Internet has generated massive amounts of information and made big data an unavoidable trend in current information technology, so valuable data must be extracted from many types of information quickly and effectively.
  • Current product recommendation relies on large numbers of keywords: products whose content is similar to the user's input, or that contain the keywords, are recommended to the user. Products that are not similar to the user's description but are related to it in topic are missed. For example, the keywords "health" and "gene" are unrelated as keywords but related in topic; when the keyword "health" is entered, the prior art cannot find products related to "gene", which lowers recommendation accuracy.
  • A document topic parameter extraction method, comprising:
  • The trained correlated topic model is obtained by training on a document sample set and contains a plurality of topics.
  • A product recommendation method, comprising:
  • Recommending to the user a target product related to the topics of the product description, based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • An electronic device, comprising a memory configured to store at least one instruction and a processor configured to execute the at least one instruction to implement the document topic parameter extraction method according to any one of the embodiments and/or the product recommendation method according to any one of the embodiments.
  • A non-volatile readable storage medium storing at least one instruction which, when executed by a processor, implements the document topic parameter extraction method according to any one of the embodiments.
  • The present application provides a document topic parameter extraction method in which a target document is input into a correlated topic model trained on a document training set, yielding the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics.
  • In the product recommendation method, an input product description is obtained and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • The present application can therefore find products that are dissimilar in content but related in topic and recommend products with closely related topics, rather than only products with similar content, which improves recommendation accuracy.
  • FIG. 1 is a flowchart of a first preferred embodiment of the document topic parameter extraction method of the present application.
  • FIG. 2 is a flowchart of a first preferred embodiment of the product recommendation method of the present application.
  • FIG. 3 is a program module diagram of a first preferred embodiment of the document topic parameter extraction device of the present application.
  • FIG. 4 is a program module diagram of a first preferred embodiment of the product recommendation device of the present application.
  • FIG. 5 is a schematic structural diagram of a preferred embodiment of the electronic device according to at least one example of the present application.
  • FIG. 1 shows the flowchart of the first preferred embodiment of the document topic parameter extraction method of the present application.
  • Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
  • The electronic device preprocesses the target document to obtain the word set of the target document.
  • Preprocessing the target document to obtain its word set comprises:
  • The special words include web links, user-name tags, special characters, place-name tags, punctuation marks, and the like.
  • The processed document is segmented, and n-grams are extracted from it (n is a positive integer, for example n < 4).
  • Chinese text corpora are segmented with the Chinese lexical analysis system ICTCLAS.
  • Three types of tuples, namely unigrams, bigrams, and trigrams, are extracted from the text corpus.
  • The method further includes: removing from the tuple set the high-frequency tuples (i.e., high-frequency words) that rank within a preset number of positions by occurrence count in the text corpus (e.g., the top 50) and the low-frequency tuples (i.e., low-frequency words) that occur fewer than a preset number of times (e.g., 3 times), and taking the processed tuple set as the word set of the target document.
  • N-tuples that are not words can also be removed.
  • Word segmentation itself is prior art, and the present application does not limit how it is performed. Removing these tuples improves the precision of the dictionary without impairing the effectiveness of the overall method.
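The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: whitespace splitting stands in for ICTCLAS segmentation, the regular expressions for "special words" are hypothetical and cover only web links, user-name tags, and punctuation, and the helper names `clean`, `ngrams`, and `build_word_set` are illustrative. The defaults `top_k=50` and `min_count=3` mirror the example values in the text.

```python
import re
from collections import Counter

# Hypothetical patterns for some of the "special words" named in the text.
URL_RE = re.compile(r"https?://\S+")   # web links
USER_RE = re.compile(r"@\w+")          # user-name tags
PUNCT_RE = re.compile(r"[^\w\s]")      # punctuation and special characters


def clean(text):
    """Replace links, user-name tags and punctuation with spaces."""
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    return PUNCT_RE.sub(" ", text)


def ngrams(tokens, max_n=3):
    """All 1-, 2- and 3-grams (n < 4) of an already segmented token list."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(tuple(tokens[i:i + n]))
    return out


def build_word_set(docs, top_k=50, min_count=3):
    """Word set after dropping the top_k most frequent and the rare n-grams."""
    counts = Counter()
    for doc in docs:
        # Whitespace splitting is a stand-in for a real segmenter such as ICTCLAS.
        counts.update(ngrams(clean(doc).split()))
    too_frequent = {gram for gram, _ in counts.most_common(top_k)}
    return {gram for gram, count in counts.items()
            if gram not in too_frequent and count >= min_count}
```

Filtering of non-word n-grams is left out because it depends on a dictionary the text does not specify.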
  • The electronic device inputs the target document into the trained correlated topic model (CTM, Correlated Topic Model) and obtains the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics. The trained correlated topic model is obtained by training on a document sample set and contains a plurality of topics.
  • The correlated topic model models topic weights with a logistic normal distribution whose covariance matrix captures how topics co-vary, so it discovers both the topic distribution of each document and the associations between topics.
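For reference, the logistic-normal construction is usually written as follows for a document d with K topics. The notation (μ, Σ, η, θ, β, ρ) is assumed rather than taken from the application, and reading the "relationship between any two topics" as the correlation ρ between the topics' log-weights, derived from Σ, is an interpretation.

```latex
\begin{aligned}
\boldsymbol{\eta}_d &\sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), &
\theta_{d,k} &= \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}, \\
z_{d,n} &\sim \operatorname{Mult}(\boldsymbol{\theta}_d), &
w_{d,n} &\sim \operatorname{Mult}(\boldsymbol{\beta}_{z_{d,n}}), \\
\rho_{ij} &= \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\,\Sigma_{jj}}}.
\end{aligned}
```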
  • The correlated topic model is a generative probabilistic model that can automatically extract latent semantic topics from a discrete data set, where a topic is content that frequently appears in the data set.
  • It describes the relationships between variables with a probabilistic graphical model and computes the topic-related probability distributions by sampling or variational inference.
  • It automatically discovers the topics latent in a document collection, each topic being a probability distribution over words.
  • It provides a convenient tool for unsupervised analysis of documents and for prediction on new documents.
  • Its basic idea is that a document is a random mixture of several topics, where each topic is a multinomial distribution over words.
  • In other words, a topic is a probability distribution over the vocabulary of the corpus.
  • If a corpus has K topics, the K topics appear in different proportions in each document. Training the correlated topic model on a document set therefore yields the distributions among the topics and the distribution relationships between products and topics.
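To make the "random mixture of topics" idea concrete, the sketch below samples a toy document from the correlated topic model's generative process using only the Python standard library. It illustrates generation, not training (fitting a CTM needs the variational or sampling inference mentioned above). The two-topic parameters and three-word vocabulary are invented, and `topic_correlation` assumes the topic-to-topic relationship is read off the covariance matrix.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def softmax(xs):
    """Map real-valued log-weights to proportions that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_document(mu, sigma, beta, vocab, length, rng):
    """Sample topic proportions and words for one document (CTM generative process)."""
    k = len(mu)
    low = cholesky(sigma)
    # eta ~ N(mu, Sigma), drawn as mu + L @ standard-normal noise
    noise = [rng.gauss(0.0, 1.0) for _ in range(k)]
    eta = [mu[i] + sum(low[i][j] * noise[j] for j in range(i + 1)) for i in range(k)]
    theta = softmax(eta)  # logistic-normal topic proportions
    words = []
    for _ in range(length):
        z = rng.choices(range(k), weights=theta)[0]  # topic assignment
        words.append(rng.choices(vocab, weights=beta[z])[0])  # word from topic z
    return theta, words


def topic_correlation(sigma):
    """Correlation matrix of the topic log-weights implied by the covariance Sigma."""
    k = len(sigma)
    return [[sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]) for j in range(k)]
            for i in range(k)]


# Toy parameters: two positively correlated topics over a three-word vocabulary.
rng = random.Random(0)
sigma = [[1.0, 0.8], [0.8, 1.0]]
beta = [[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]]
theta, words = generate_document([0.0, 0.0], sigma, beta,
                                 ["yield", "return", "liquidity"], 10, rng)
```

Because the off-diagonal covariance is positive, documents that draw a high weight for one topic tend to draw a high weight for the other, which is the kind of topic association the text relies on.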
  • The correlated topic model is trained as follows:
  • (a1) Acquire a document sample set and divide it into a training set and a test set, for example 70% of the document samples as the training set and 30% as the test set.
  • The optimal number of topics is used as the number of topics in the correlated topic model.
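Step (a1) and the choice of topic count can be sketched as below. The text does not say how the optimal number of topics is chosen; this sketch assumes a common approach of fitting one model per candidate K on the training set and keeping the K with the best held-out score (for example, perplexity on the test set). `fit` and `score` are hypothetical callables supplied by the caller, not part of any specific library.

```python
import random


def split_samples(docs, train_ratio=0.7, seed=0):
    """Shuffle the document samples and split them, e.g. 70% train / 30% test."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def choose_topic_count(train, test, candidates, fit, score):
    """Return the candidate topic count K whose fitted model scores lowest on `test`.

    `fit(train, k)` and `score(model, test)` are hypothetical callables, e.g. a CTM
    trainer and held-out perplexity (lower is better).
    """
    best_k, best_score = None, float("inf")
    for k in candidates:
        current = score(fit(train, k), test)
        if current < best_score:
            best_k, best_score = k, current
    return best_k
```

Keeping the scorer pluggable leaves the actual selection criterion open, since the application does not fix one.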
  • By inputting the target document into the correlated topic model trained on the document training set, the present application obtains the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics. It can thus extract the topic parameter information of a document, which makes it easier to later use the correlations between document topic parameters to recommend topic-related products to the user.
  • FIG. 2 shows the flowchart of the first preferred embodiment of the product recommendation method of the present application.
  • Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
  • The electronic device obtains the input product description and uses it as the target document.
  • The product description includes, but is not limited to, one or more of the following: characters, words, paragraphs, and the like.
  • The product description may take voice form, text form, or a combination of the two.
  • The products include, but are not limited to, wealth management products, goods purchased online, and the like.
  • For example, banks currently group their wealth management products into several modules, such as high-yield, withdraw-anytime, and fixed-term monthly modules, among other types.
  • The user can input a description of the wealth management product they want to buy, for example by voice, to find wealth management products whose topics are similar to those of the input description.
  • The electronic device processes the product description to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • The electronic device processes the product description using the document topic parameter extraction method.
  • The training samples of the correlated topic model include the product descriptions of individual products, each product description serving as one sample document.
  • The correlated topic model is trained with the method of the first preferred embodiment.
  • The distribution of the product description over topics indicates the proportion of each topic contained in the product description.
  • The relationships between topics indicate the degree of association between any two topics in the correlated topic model. For example, with three topics, the association between topics A and B might be 0.2, between topics A and C 0.8, and between topics B and C 0.4.
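Using the three-topic example above, the pairwise association degrees can be held in a symmetric lookup, from which the most related topic for any topic follows directly. The dictionary layout and the function names are illustrative assumptions, not part of the application.

```python
# Association degrees from the three-topic example; frozenset keys make lookup symmetric.
ASSOCIATION = {
    frozenset({"A", "B"}): 0.2,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.4,
}


def association(t1, t2):
    """Degree of association between two distinct topics (0.0 if unknown)."""
    return ASSOCIATION.get(frozenset({t1, t2}), 0.0)


def most_related(topic, topics):
    """The other topic with the highest association to `topic`."""
    return max((t for t in topics if t != topic), key=lambda t: association(topic, t))


print(most_related("A", ["A", "B", "C"]))  # C
```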
  • The electronic device recommends to the user a target product related to the topics of the product description, based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • Recommending to the user a target product related to the topics of the product description comprises one or a combination of the following:
  • For each target topic in the product description, the topic most highly associated with it is determined; then, according to the probability distribution between products and topics in the correlated topic model, the preset number of products in which the determined topic has the largest share are taken as part of the target products.
  • For example, the user's description of a wealth management product includes the topics "high yield" and "short term".
  • The topic most related to "high yield" is "annualized return above 5%", and the topic most related to "short term" is "withdraw anytime". The "annualized return above 5%" topic accounts for the highest proportion in wealth management products A and C.
  • The "withdraw anytime" topic accounts for the highest proportion in wealth management products A and D, so wealth management products A, C, and D are the target products. In this way, for each topic in the product description, the products most relevant to that topic are recommended to the user, achieving personalized product recommendation.
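The first strategy and the wealth management example can be reproduced with a short sketch. The association degrees and topic shares below are invented toy numbers chosen so that the outcome matches the example (products A, C, and D); the topic labels, the top-`n` cutoff of 2, and the function names are assumptions, not values from the application.

```python
# Toy association degrees from description topics to other topics (invented).
ASSOC = {
    "high_yield": {"return_over_5pct": 0.9, "withdraw_anytime": 0.1},
    "short_term": {"return_over_5pct": 0.2, "withdraw_anytime": 0.85},
}

# Toy share of each topic within each product (invented).
PRODUCT_TOPIC = {
    "A": {"return_over_5pct": 0.5, "withdraw_anytime": 0.4},
    "B": {"return_over_5pct": 0.1, "withdraw_anytime": 0.1},
    "C": {"return_over_5pct": 0.6, "withdraw_anytime": 0.05},
    "D": {"return_over_5pct": 0.05, "withdraw_anytime": 0.7},
}


def related_topic(topic, assoc):
    """Topic with the highest association degree to `topic`."""
    scores = assoc[topic]
    return max(scores, key=scores.get)


def top_products(topic, product_topic, n):
    """The n products in which `topic` has the largest share."""
    return sorted(product_topic,
                  key=lambda p: product_topic[p].get(topic, 0.0),
                  reverse=True)[:n]


def recommend(description_topics, assoc, product_topic, n=2):
    """Union of the top-n products for each description topic's most related topic."""
    targets = []
    for topic in description_topics:
        for product in top_products(related_topic(topic, assoc), product_topic, n):
            if product not in targets:
                targets.append(product)
    return targets


print(recommend(["high_yield", "short_term"], ASSOC, PRODUCT_TOPIC))  # ['C', 'A', 'D']
```

Passing only the dominant description topic, as in the second example that follows in the text, gives `['C', 'A']`.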
  • In another example, the user's description of wealth management products includes the topics "high yield" and "short term", of which "high yield" accounts for the highest proportion; the topic most related to it is "annualized return above 5%".
  • The "annualized return above 5%" topic accounts for the highest proportion in wealth management products A and C, so wealth management products A and C are the target products.
  • If the product description includes topic A, topic C is related to topic A, and topic D is associated only with topic C, then topic D is strongly associated with topic C; therefore, the preset number of products in which topic D has the largest share are also taken as part of the target products.
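The indirect-association case can be sketched as a small graph check: starting from the description's topics, collect their associated topics, then add any topic whose only association is one of those. The adjacency sets (which might come from thresholding association degrees) and the helper name are hypothetical.

```python
def second_order_topics(description_topics, links):
    """Topics linked only to a topic that is itself linked to the description.

    `links` maps each topic to the set of topics it is associated with; how
    those links are obtained (e.g. thresholding association degrees) is assumed.
    """
    described = set(description_topics)
    first_order = set()
    for topic in described:
        first_order |= links.get(topic, set())
    return {
        topic
        for topic, neighbours in links.items()
        if len(neighbours) == 1
        and neighbours <= first_order
        and topic not in described
        and topic not in first_order
    }


# Topic A is in the description, C is related to A, and D is linked only to C.
LINKS = {
    "A": {"C"},
    "C": {"A", "D"},
    "D": {"C"},
    "B": {"E"},
    "E": {"B"},
}
print(second_order_topics(["A"], LINKS))  # {'D'}
```

The products in which each returned topic has the largest share would then be added to the target products, in the same way as for directly related topics.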
  • The products associated with the topics in the product description are displayed by category, together with how each category of products is recommended, for example the product category most associated with topic A, the product category most related to topic C, and so on. The user can thus see at a glance the products associated with the topics of interest and make personalized choices from the recommended product plan.
  • The method further includes: obtaining the product the user selects from the recommended target products, determining the topics contained in the selected product, and taking the preset number of products in which those topics rank highest as part of the target products. Recommendations can thus draw on the products the user is interested in, better meeting the user's needs and achieving personalized recommendation.
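The feedback step can be sketched as follows. The text does not define when a topic counts as "contained" in the selected product, so the share threshold `min_share` is an assumption, as is excluding the already selected product from the new recommendations; the data and function names are illustrative.

```python
def topics_of(product, product_topic, min_share=0.2):
    """Topics whose share in the selected product reaches `min_share` (assumed threshold)."""
    return [t for t, share in product_topic[product].items() if share >= min_share]


def feedback_recommend(selected, product_topic, n=2, min_share=0.2):
    """Top-n products for each topic of the user's selected product, excluding it."""
    targets = []
    for topic in topics_of(selected, product_topic, min_share):
        ranked = sorted(product_topic,
                        key=lambda p: product_topic[p].get(topic, 0.0),
                        reverse=True)
        for product in ranked[:n]:
            if product != selected and product not in targets:
                targets.append(product)
    return targets


# Invented topic shares for four products over two topics.
SHARES = {
    "A": {"x": 0.6, "y": 0.4},
    "B": {"x": 0.7, "y": 0.3},
    "C": {"x": 0.1, "y": 0.9},
    "D": {"x": 0.0, "y": 0.1},
}
print(feedback_recommend("A", SHARES))  # ['B', 'C']
```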
  • The present application provides a document topic parameter extraction method in which a target document is input into a correlated topic model trained on a document training set, yielding the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics.
  • In the product recommendation method, an input product description is obtained and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • The present application can therefore find products that are dissimilar in content but related in topic and recommend products with closely related topics, rather than only products with similar content, which improves recommendation accuracy.
  • The document topic parameter extraction device 3 includes, but is not limited to, one or more of the following modules: a preprocessing module 30, a calculation module 31, and a training module 32.
  • A unit in the present application is a series of computer-readable instruction segments, stored in the memory, that can be executed by the processor of the document topic parameter extraction device 3 to perform a fixed function. The function of each unit is detailed in the following embodiments.
  • The preprocessing module 30 preprocesses the target document to obtain the word set of the target document.
  • The preprocessing module 30 preprocessing the target document to obtain its word set comprises:
  • The special words include web links, user-name tags, special characters, place-name tags, punctuation marks, and the like.
  • The processed document is segmented, and n-grams are extracted from it (n is a positive integer, for example n < 4).
  • Chinese text corpora are segmented with the Chinese lexical analysis system ICTCLAS.
  • Three types of tuples, namely unigrams, bigrams, and trigrams, are extracted from the text corpus.
  • The preprocessing module 30 is further configured to remove from the tuple set the high-frequency tuples (i.e., high-frequency words) that rank within a preset number of positions by occurrence count in the text corpus (e.g., the top 50) and the low-frequency tuples (i.e., low-frequency words) that occur fewer than a preset number of times (e.g., 3 times), and to take the processed tuple set as the word set of the target document.
  • N-tuples that are not words can also be removed.
  • Word segmentation itself is prior art, and the present application does not limit how it is performed. Removing these tuples improves the precision of the dictionary without impairing the effectiveness of the overall method.
  • The calculation module 31 inputs the target document into the trained correlated topic model (CTM, Correlated Topic Model) and obtains the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics.
  • The trained correlated topic model is obtained by training on a document sample set and contains a plurality of topics.
  • The correlated topic model models topic weights with a logistic normal distribution whose covariance matrix captures how topics co-vary, so it discovers both the topic distribution of each document and the associations between topics.
  • The correlated topic model is a generative probabilistic model that can automatically extract latent semantic topics from a discrete data set, where a topic is content that frequently appears in the data set.
  • It describes the relationships between variables with a probabilistic graphical model and computes the topic-related probability distributions by sampling or variational inference.
  • It automatically discovers the topics latent in a document collection, each topic being a probability distribution over words.
  • It provides a convenient tool for unsupervised analysis of documents and for prediction on new documents.
  • Its basic idea is that a document is a random mixture of several topics, where each topic is a multinomial distribution over words.
  • In other words, a topic is a probability distribution over the vocabulary of the corpus.
  • If a corpus has K topics, the K topics appear in different proportions in each document. Training the correlated topic model on a document set therefore yields the distributions among the topics and the distribution relationships between products and topics.
  • The training module 32 trains the correlated topic model as follows:
  • (a1) Acquire a document sample set and divide it into a training set and a test set, for example 70% of the document samples as the training set and 30% as the test set.
  • The optimal number of topics is used as the number of topics in the correlated topic model.
  • The product recommendation device 4 includes, but is not limited to, one or more of the following modules: an acquisition module 40, a data calculation module 41, a recommendation module 42, and a display module 43.
  • A unit in the present application is a series of computer-readable instruction segments, stored in a memory, that can be executed by a processor of the product recommendation device 4 to perform a fixed function. The function of each unit is detailed in the following embodiments.
  • The acquisition module 40 obtains the input product description and uses it as the target document.
  • The product description includes, but is not limited to, one or more of the following: characters, words, paragraphs, and the like.
  • The product description may take voice form, text form, or a combination of the two.
  • The products include, but are not limited to, wealth management products, goods purchased online, and the like.
  • For example, banks currently group their wealth management products into several modules, such as high-yield, withdraw-anytime, and fixed-term monthly modules, among other types.
  • The user can input a description of the wealth management product they want to buy, for example by voice, to find wealth management products whose topics are similar to those of the input description.
  • The data calculation module 41 processes the product description to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • The electronic device processes the product description using the document topic parameter extraction method.
  • The training samples of the correlated topic model include the product descriptions of individual products, each product description serving as one sample document.
  • The correlated topic model is trained with the method of the first preferred embodiment.
  • The distribution of the product description over topics indicates the proportion of each topic contained in the product description.
  • The relationships between topics indicate the degree of association between any two topics in the correlated topic model. For example, with three topics, the association between topics A and B might be 0.2, between topics A and C 0.8, and between topics B and C 0.4.
  • The recommendation module 42 recommends to the user a target product related to the topics of the product description, based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • The recommendation module 42 recommending to the user a target product related to the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, comprises one or a combination of the following:
  • For each target topic in the product description, the topic most highly associated with it is determined; then, according to the probability distribution between products and topics in the correlated topic model, the preset number of products in which the determined topic has the largest share are taken as part of the target products.
  • For example, the user's description of a wealth management product includes the topics "high yield" and "short term".
  • The topic most related to "high yield" is "annualized return above 5%", and the topic most related to "short term" is "withdraw anytime". The "annualized return above 5%" topic accounts for the highest proportion in wealth management products A and C.
  • The "withdraw anytime" topic accounts for the highest proportion in wealth management products A and D, so wealth management products A, C, and D are the target products. In this way, for each topic in the product description, the products most relevant to that topic are recommended to the user, achieving personalized product recommendation.
  • In another example, the user's description of wealth management products includes the topics "high yield" and "short term", of which "high yield" accounts for the highest proportion; the topic most related to it is "annualized return above 5%".
  • The "annualized return above 5%" topic accounts for the highest proportion in wealth management products A and C, so wealth management products A and C are the target products.
  • If the product description includes topic A, topic C is related to topic A, and topic D is associated only with topic C, then topic D is strongly associated with topic C; therefore, the preset number of products in which topic D has the largest share are also taken as part of the target products.
  • The display module 43 displays the products associated with the topics in the product description by category, together with how each category of products is recommended, for example the product category most associated with topic A, the product category most related to topic C, and so on. The user can thus see at a glance the products associated with the topics of interest and make personalized choices from the recommended product plan.
  • The recommendation module 42 is further configured to obtain the product the user selects from the recommended target products, determine the topics contained in the selected product, and take the preset number of products in which those topics rank highest as part of the target products. Recommendations can thus draw on the products the user is interested in, better meeting the user's needs and achieving personalized recommendation.
  • The present application provides a document topic parameter extraction method in which a target document is input into a correlated topic model trained on a document training set, yielding the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics.
  • In the product recommendation method, an input product description is obtained and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
  • The present application can therefore find products that are dissimilar in content but related in topic and recommend products with closely related topics, rather than only products with similar content, which improves recommendation accuracy.
  • The above integrated unit, when implemented as a software program module, may be stored in a non-volatile readable storage medium.
  • The software program module is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present application.
  • The electronic device 5 comprises at least one sending device 51, at least one memory 52, at least one processor 53, at least one receiving device 54, and at least one communication bus.
  • The communication bus provides connection and communication between these components.
  • The electronic device 5 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
  • The electronic device 5 may also comprise a network device and/or a user device.
  • The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • The electronic device 5 may be, but is not limited to, any electronic product that can interact with a user through a keyboard, a touch pad, a voice-control device, or the like, such as a tablet computer, a smartphone, a personal digital assistant (PDA), a smart wearable device, camera equipment, monitoring equipment, or another terminal.
  • The network in which the electronic device 5 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
  • The receiving device 54 and the sending device 51 may be wired transmission ports or wireless devices (for example, including antenna devices) for data communication with other devices.
  • The memory 52 is used to store program code.
  • The memory 52 may be a circuit with a storage function that has no separate physical form within an integrated circuit, such as RAM (Random-Access Memory) or a FIFO (First In First Out) buffer.
  • The memory 52 may also be a memory with a physical form, such as a memory stick, a TF card (TransFlash card), a smart media card, a secure digital card, a flash memory card, or another storage device.
  • the processor 53 can include one or more microprocessors, digital processors.
  • the processor 53 can call program code stored in the memory 52 to perform related functions.
  • the modules described in FIG. 3 are program code stored in the memory 52 and executed by the processor 53 to implement a document topic parameter extraction method; and/or the modules described in FIG. 4 are program code stored in the memory 52 and executed by the processor 53 to implement a product recommendation method.
  • the processor 53, also known as a central processing unit (CPU), is a very-large-scale integrated circuit that serves as the computing core (Core) and the control core (Control Unit).
  • the embodiment of the present application further provides a non-volatile readable storage medium having stored thereon computer instructions that, when executed by an electronic device including one or more processors, cause the electronic device to perform the method as described above.
  • the memory 52 in the electronic device 5 stores a plurality of instructions to implement a document topic parameter extraction method, and the processor 53 can execute the plurality of instructions to implement:
  • preprocessing a target document to obtain a word set of the target document, and inputting the word set into a trained correlated topic model (CTM) to obtain the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics, where the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
  • the plurality of instructions executed by the processor 53 further implement: removing special words from the target document to obtain a processed document, and segmenting the processed document to obtain a tuple set.
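  • the plurality of instructions executed by the processor 53 further implement: removing, from the tuple set, the high-frequency tuples whose occurrence counts in the text corpus rank within a preset number of top positions and the low-frequency tuples that occur fewer than a preset number of times, and taking the processed tuple set as the word set of the target document.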
  • the instructions corresponding to the document topic parameter extraction method in any of the embodiments are stored in the memory 52 and executed by the processor 53; details are not repeated here.
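  • the memory 52 in the electronic device 5 stores a plurality of instructions to implement a product recommendation method, and the processor 53 can execute the plurality of instructions to implement: acquiring an input product description as the target document, processing it to obtain its distribution over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, and recommending to the user target products associated with the topics of the product description.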
  • the plurality of instructions executed by the processor 53 further implement: displaying the products associated with the topics in the product description by category, and displaying how each category of products is recommended.
  • the plurality of instructions executed by the processor further implement: acquiring a product that the user selects from the recommended target products, determining the topics contained in the selected product, and taking the products in which those topics' proportions rank within a preset number of top positions as part of the target products.
  • the above characteristic means of the present application can be implemented by an integrated circuit that controls the functions of the document topic parameter extraction method in any of the above embodiments. That is, the integrated circuit of the present application is installed in the electronic device so that the electronic device performs the following functions: preprocessing the target document to obtain a word set of the target document; and inputting the word set into the trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of the plurality of topics, and the distribution between products and topics, where the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
  • all functions that the document topic parameter extraction method in any embodiment can implement can be realized by installing the integrated circuit of the present application in the electronic device, so that the electronic device performs those functions; details are not repeated here.
  • the above characteristic means of the present application can be implemented by an integrated circuit that controls the functions of the product recommendation method in any of the above embodiments. That is, the integrated circuit of the present application is installed in the electronic device so that the electronic device performs the following functions: acquiring an input product description and taking it as the target document; processing the product description with the document topic parameter extraction method of any embodiment to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics; and, based on these distributions and relationships, recommending to the user target products associated with the topics of the product description.
  • all functions that the product recommendation method in any embodiment can implement can be realized by installing the integrated circuit of the present application in the electronic device, so that the electronic device performs those functions; details are not repeated here.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a non-volatile readable storage medium.
  • the computer device may be a personal computer, a server, a network device, or the like.
  • the foregoing storage medium includes: a U disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and the like.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

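The present application provides a document topic parameter extraction method in which a correlated topic model is trained on a document training set, and the trained correlated topic model yields the distribution of a target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics. The present application further provides a product recommendation method: an input product description is acquired and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics. The present application further provides an electronic device and a storage medium. The present application avoids finding only products with similar content and improves accuracy, thereby achieving more accurate product recommendations.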

Description

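Document topic parameter extraction method, product recommendation method, device and storage medium
This application claims priority to Chinese patent application No. 201810287788.7, filed with the China Patent Office on April 3, 2018 and entitled "Document topic parameter extraction method, product recommendation method, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a document topic parameter extraction method, a product recommendation method, a device, and a storage medium.
Background
The rapid development of the Internet has spurred the generation of massive amounts of information and made big data an inevitable trend in information technology, so valuable data must be extracted quickly and effectively from all kinds of information. Current product recommendation, however, relies on content similarity, or uses keywords to find, among a huge number of products, those containing the keywords and recommends them to the user. This misses products whose content is not similar to the user's description but whose topic is related. For example, the keywords "health" and "gene" are unrelated as keywords but related in topic; with existing techniques, entering the keyword "health" cannot find products related to "gene", which reduces recommendation accuracy.
Summary
In view of the above, it is necessary to provide a document topic parameter extraction method, a product recommendation method, and an electronic device that avoid finding only products with similar content, improve accuracy, and thereby achieve more accurate product recommendations.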
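A document topic parameter extraction method, the method comprising:
preprocessing a target document to obtain a word set of the target document;
inputting the word set of the target document into a trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
A product recommendation method, the method comprising:
acquiring an input product description and taking the acquired product description as the target document;
processing the product description with the document topic parameter extraction method described in any embodiment to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics;
based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, recommending to the user target products associated with the topics of the product description.
An electronic device comprising a memory and a processor, wherein the memory is configured to store at least one instruction and the processor is configured to execute the at least one instruction to implement the document topic parameter extraction method according to any embodiment and/or the product recommendation method according to any embodiment.
A non-volatile readable storage medium storing at least one instruction which, when executed by a processor, implements the document topic parameter extraction method according to any embodiment and/or the product recommendation method according to any embodiment.
It can be seen from the above technical solutions that the present application provides a document topic parameter extraction method in which a correlated topic model is trained on a document training set, and the trained model yields the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics. An input product description is acquired and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics. Based on the correlated topic model, the above embodiments can find products whose content is not similar but whose topics are related, and thus recommend products closely related in topic. This avoids finding only products with similar content, improves accuracy, and achieves more accurate product recommendations.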
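Brief Description of the Drawings
FIG. 1 is a flowchart of a first preferred embodiment of the document topic parameter extraction method of the present application.
FIG. 2 is a flowchart of a first preferred embodiment of the product recommendation method of the present application.
FIG. 3 is a program module diagram of a first preferred embodiment of the document topic parameter extraction apparatus of the present application.
FIG. 4 is a program module diagram of a first preferred embodiment of the product recommendation apparatus of the present application.
FIG. 5 is a schematic structural diagram of a preferred embodiment of an electronic device in at least one example of the present application.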
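Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
To make the above objectives, features and advantages of the present application more apparent and understandable, the present application is further described in detail below with reference to the accompanying drawings and specific embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third" and the like in the specification, claims and drawings of the present application are used to distinguish different objects rather than to describe a specific order. In addition, the term "include" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.
FIG. 1 is a flowchart of a first preferred embodiment of the document topic parameter extraction method of the present application. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.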
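S10: The electronic device preprocesses the target document to obtain a word set of the target document.
Preferably, preprocessing the target document to obtain the word set of the target document comprises:
(1) Removing special words from the target document to obtain a processed document.
Further, the special words include URLs, username tags, special characters, place-name tags, punctuation marks, and the like.
(2) Segmenting the processed document to obtain a tuple set.
The processed document is segmented by extracting n-grams (n is a positive integer, for example n < 4). For example, Chinese text corpora are segmented with the ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) tool. Space-delimited text corpora (such as English) can be split directly on spaces, whereas corpora without space delimiters, such as Chinese and Japanese, must first be segmented with such a tool.
Further, three kinds of tuple sets, namely unigrams, bigrams and trigrams, are extracted from the text corpus.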
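A minimal Python sketch of steps (1) and (2), assuming whitespace-delimited input. The regular expressions for the special words are assumptions, since the text lists only their categories; place-name tags would need a named-entity tagger and are not handled, and Chinese text would need a segmenter such as ICTCLAS before `ngrams` is applied. The function names `clean`, `ngrams` and `tuple_set` are hypothetical.

```python
from __future__ import annotations

import re

# Assumed patterns for the "special words" of step (1); the text gives no
# exact rules. Place-name tags would need a named-entity tagger (not shown).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
PUNCT_RE = re.compile(r"[^\w\s]")  # punctuation / special characters


def clean(text: str) -> str:
    """Step (1): strip special words, returning the processed document."""
    for pattern in (URL_RE, USER_RE, PUNCT_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def ngrams(tokens: list[str], max_n: int = 3) -> list[tuple[str, ...]]:
    """Step (2): all 1-, 2- and 3-grams (n < 4) of a token sequence."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def tuple_set(text: str) -> list[tuple[str, ...]]:
    # Space-delimited text is split on whitespace; Chinese or Japanese text
    # would first go through a segmenter such as ICTCLAS (not shown).
    return ngrams(clean(text).split())


print(tuple_set("high yield, see https://example.com/p?id=1"))
# [('high',), ('yield',), ('see',), ('high', 'yield'), ('yield', 'see'), ('high', 'yield', 'see')]
```

Keeping the n-gram builder separate from tokenization means the same function can be reused unchanged once a proper Chinese segmenter supplies the tokens.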
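Preferably, after the tuple set is obtained, the method further comprises: removing from the tuple set the high-frequency tuples (i.e., high-frequency words) whose occurrence counts in the text corpus rank within a preset number of top positions (for example, the top 50), as well as the low-frequency tuples (i.e., low-frequency words) that occur fewer than a preset number of times (for example, 3), and taking the processed tuple set as the word set of the target document.
In an optional embodiment, in view of the linguistic properties of words, a certain proportion of high-frequency tuples (usually stop words and the like) and low-frequency tuples (usually personal names, non-words and the like) are removed, and only the remaining mid-frequency tuples are kept as candidate words for the sentiment lexicon. High-frequency tuples are usually stop words that co-occur frequently with all kinds of words and therefore contribute little to expressing sentiment; low-frequency tuples are usually non-words or usernames that carry no linguistic meaning and therefore need to be removed. In this way, the tuples with intermediate occurrence counts are used as part of the candidate words.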
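The frequency filter can be sketched as follows, using the example thresholds from the text (drop the top 50 tuples by count and any tuple seen fewer than 3 times). `build_word_set` is a hypothetical name, and counting over the whole corpus rather than a single document is an assumption based on the phrase "occurrence counts in the text corpus".

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def build_word_set(corpus_tuples: Iterable[Hashable], top_k: int = 50,
                   min_count: int = 3) -> set:
    """Keep only mid-frequency tuples.

    top_k=50 and min_count=3 are the example values from the text: the
    top_k most frequent tuples (mostly stop words) and tuples seen fewer
    than min_count times (mostly non-words, user names) are dropped.
    """
    counts = Counter(corpus_tuples)
    high_freq = {t for t, _ in counts.most_common(top_k)}
    return {t for t, c in counts.items()
            if t not in high_freq and c >= min_count}


corpus = ["the"] * 10 + ["yield"] * 5 + ["fund"] * 4 + ["xq7"]
print(sorted(build_word_set(corpus, top_k=1)))  # ['fund', 'yield']
```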
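In other embodiments, word segmentation is performed first and the candidate word set is then generated in combination with n-grams, which removes n-grams that do not form words. Word segmentation techniques are prior art and are not limited in the present application. This improves the precision of the lexicon, and such processing does not impair the effectiveness of the overall method.
S11: The electronic device inputs the word set of the target document into a trained correlated topic model CTM (Correlated Topic Model) to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
In the present application, the correlated topic model CTM (Correlated Topic Model) uses the covariance matrix of a logistic normal distribution to model topic proportions, thereby discovering the distribution of document topics and the correlations between topics.
The correlated topic model is a generative probabilistic model that can automatically extract latent semantic topics from a discrete data set, where a topic refers to content that frequently co-occurs in the data set. The model describes the relationships among variables with a probabilistic graphical model and computes topic-related probability distributions by sampling or variational inference.
The correlated topic model can automatically discover the topics hidden in a document collection, where a topic is a probability distribution over words. It provides a convenient tool for analyzing documents without supervision and for making predictions on new documents. Its basic idea is that a document is a random mixture of several topics, each of which is a multinomial distribution over words. In a document collection, a topic is a probability distribution over the vocabulary of the corpus; assuming the corpus has K topics, the proportions of the K topics differ from document to document. Therefore, training the correlated topic model on a document collection yields the distribution among the topics and the distribution relationship between products and topics.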
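The paragraphs above do not spell out the model itself. As an illustration, the sketch below shows the standard CTM generative process (Blei and Lafferty): each document draws η ~ N(μ, Σ), sets its topic proportions to θ = softmax(η), and then draws a topic and a word for every token; the off-diagonal entries of Σ are what allow topics to be correlated. It only samples synthetic documents; fitting μ, Σ and the topic-word distributions β by variational inference or sampling is not shown, and all function names and parameters are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L @ L.T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]


def generate_document(mu, cov, beta, n_words, rng):
    """CTM generative process for one document.

    mu, cov: mean and covariance of the logistic normal over K topics.
    beta: K x V topic-word probabilities. Returns (theta, word ids).
    """
    K = len(mu)
    L = cholesky(cov)
    eps = [rng.gauss(0.0, 1.0) for _ in range(K)]
    # eta ~ N(mu, cov) via eta = mu + L @ eps
    eta = [mu[i] + sum(L[i][k] * eps[k] for k in range(K)) for i in range(K)]
    theta = softmax(eta)  # topic proportions of this document
    words = []
    for _ in range(n_words):
        z = rng.choices(range(K), weights=theta)[0]               # topic
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]  # word
        words.append(w)
    return theta, words


theta, words = generate_document([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]],
                                 [[0.9, 0.1], [0.1, 0.9]], 20,
                                 random.Random(0))
```

Replacing the logistic normal with a Dirichlet prior would give plain LDA, which cannot express that two topics tend to appear together; that is the property the recommendation steps below rely on.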
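Preferably, the correlated topic model is trained as follows:
(a1) Acquiring a document sample set and dividing it into a training set and a test set, for example 70% of the document samples as the training set and 30% as the test set.
(a2) Configuring the optimal number of topics for the training set.
The optimal number of topics is the number of topics in the correlated topic model.
(a3) Based on the training set and the optimal number of topics, modeling the documents in the training set with the correlated topic model to obtain the parameters of the correlated topic model.
(a4) Inputting the word sets of the document samples in the test set into the correlated topic model trained in step (a3) to obtain the topic representation of each document in the test set.
(a5) Evaluating the accuracy of the trained correlated topic model. If its accuracy is lower than a preset accuracy (for example, 99%), adding samples to the training set and/or adjusting the optimal number of topics step by step, and repeating the above training steps until the accuracy of the trained correlated topic model is greater than or equal to the preset accuracy (for example, 99%).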
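Steps (a1) to (a5) amount to a retrain-until-good-enough loop. The sketch below assumes caller-supplied `fit` and `evaluate` functions (hypothetical stand-ins for CTM training and for the unspecified accuracy measure) and only varies the topic count; adding training samples, the other option in step (a5), is not shown. The starting count, step size and round limit are illustrative values.

```python
import random


def split_samples(docs, train_ratio=0.7, seed=0):
    """(a1): shuffle the document samples and split them 70/30."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    cut = int(round(len(docs) * train_ratio))
    return docs[:cut], docs[cut:]


def train_until_accurate(docs, fit, evaluate, k_start=10, k_step=5,
                         target=0.99, max_rounds=20):
    """(a2)-(a5): retrain with a stepped topic count until the test score
    reaches `target`. `fit(train_docs, k)` and `evaluate(model, test_docs)`
    are hypothetical callables supplied by the caller."""
    train, test = split_samples(docs)
    k = k_start                          # (a2) initial topic count
    for _ in range(max_rounds):
        model = fit(train, k)            # (a3) fit a CTM with k topics
        score = evaluate(model, test)    # (a4)+(a5) infer and score test docs
        if score >= target:
            return model, k, score
        k += k_step                      # (a5) adjust the topic count stepwise
    raise RuntimeError("preset accuracy not reached within max_rounds")
```

For example, with a dummy `fit` that returns `k` and an `evaluate` that returns 1.0 once `k >= 20`, the loop stops at 20 topics after three rounds.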
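By training a correlated topic model on a document training set, the present application obtains the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics. The present application can therefore extract the topic parameter information of documents, making it convenient to later use the correlations among document topic parameters to recommend topic-related products to users.
FIG. 2 is a flowchart of a first preferred embodiment of the product recommendation method of the present application. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S20: The electronic device acquires an input product description and takes the acquired product description as the target document.
In an optional embodiment, the product description includes, but is not limited to, one or a combination of the following: characters, words, a passage, and the like. The product description may be in speech form, text form, or a combination of the two.
Preferably, the products include, but are not limited to, wealth management products, goods purchased online, and the like.
For example, banks currently classify wealth management products into several modules, such as a high-yield module, a withdraw-anytime module and a one-month fixed-term module. When purchasing a wealth management product, a user can enter a description of the product they want, for example by speaking a passage, so as to find wealth management products whose topics are similar to those of the user's description.
S21: The electronic device processes the product description to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
In a preferred embodiment, the electronic device processes the product description with the document topic parameter extraction method.
In an optional embodiment, the training samples for the correlated topic model include the product descriptions of the individual products, with each product description serving as one document sample. The correlated topic model is trained with the method of the first preferred embodiment.
Further, the distribution of the product description over topics represents the proportions of the topics contained in the product description. For example, the product description contains three topics, topic A, topic B and topic C, in the proportion topic A : topic B : topic C = 16 : 2 : 1.
Further, the relationship between topics represents the degree of association between any two topics in the correlated topic model. For example, with three topics, the association degree between topic A and topic B is 0.2, between topic A and topic C is 0.8, and between topic B and topic C is 0.4.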
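A short numeric illustration of the two quantities above. Normalizing 16 : 2 : 1 gives proportions of about 0.842, 0.105 and 0.053. The text does not define the association degree; one natural choice for a CTM, assumed here, is the correlation implied by the covariance matrix Σ of the logistic normal, ρ_ij = Σ_ij / sqrt(Σ_ii Σ_jj). The covariance values below are made up so that they reproduce the example degrees, and the function names are hypothetical.

```python
import math


def normalize(weights):
    """Turn a ratio such as A:B:C = 16:2:1 into proportions summing to 1."""
    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}


def topic_correlation(cov, i, j):
    """Correlation of topics i and j from a CTM covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


theta = normalize({"A": 16, "B": 2, "C": 1})
print({t: round(p, 3) for t, p in theta.items()})
# {'A': 0.842, 'B': 0.105, 'C': 0.053}

# Illustrative covariance over topics A, B, C, chosen so that the
# correlations reproduce the example: A-B 0.2, A-C 0.8, B-C 0.4.
cov = [[4.0, 0.4, 4.8],
       [0.4, 1.0, 1.2],
       [4.8, 1.2, 9.0]]
print(round(topic_correlation(cov, 0, 2), 3))  # 0.8
```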
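S22: Based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, the electronic device recommends to the user target products associated with the topics of the product description.
Preferably, recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, comprises one or a combination of the following:
(1) Based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, the topic with the highest association degree with each of the at least one target topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the determined topic's proportion ranks within a preset number of top positions, as part of the target products.
For example, the description of a wealth management product entered by the user contains two topics, high yield and short term. The topic most associated with high yield is an annualized yield above 5%, and the topic most associated with short term is withdraw anytime. The annualized-yield-above-5% topic has the highest proportion in wealth management products A and C, and the withdraw-anytime topic has the highest proportion in wealth management products A and D, so wealth management products A, C and D are the target products. In this way, every topic in the product description leads to recommending the products most associated with it, achieving personalized product recommendation.
(2) Based on the distribution of the product description over topics, obtaining the topic with the highest proportion in the product description; determining, according to the relationships between topics in the correlated topic model, the target topic with the highest association degree with that topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the target topic's proportion ranks within a preset number of top positions, as part of the target products.
For example, the description of a wealth management product entered by the user contains two topics, high yield and short term, of which high yield has the highest proportion. The topic most associated with high yield is an annualized yield above 5%, which has the highest proportion in wealth management products A and C, so wealth management products A and C are the target products.
(3) Based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the probability distribution between products and topics in the correlated topic model, the products that contain the at least one target topic; and taking the determined products as part of the target products.
(4) Based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, a first topic associated with the at least one target topic, and then a second topic associated only with the first topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the second topic's proportion ranks within a preset number of top positions, as part of the target products. This captures indirect relationships between topics, finding indirectly but strongly associated topics and recommending personalized products to the user.
For example, the product description contains topic A; in the correlated topic model, topic C is related to topic A, and topic D is associated only with topic C, which indicates that topic D is strongly associated with topic C. Therefore, the products in which topic D's proportion ranks within a preset number of top positions are taken as part of the target products.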
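The four strategies can be sketched in Python as follows. The data layout is an assumption: `theta` maps each topic of the product description to its share (every listed topic counts as contained), `corr` maps a topic to its association degrees with other topics, `neighbours` maps each topic to the set of topics it is associated with (for example, those above some correlation threshold), and `products` maps each product to its topic shares; the "preset number of top positions" is the parameter `k`. All names and values are hypothetical and only loosely follow the wealth-management example.

```python
def top_products(topic, product_topic, k):
    """Products whose share of `topic` ranks within the top k."""
    ranked = sorted(product_topic,
                    key=lambda p: product_topic[p].get(topic, 0.0),
                    reverse=True)
    return ranked[:k]


def most_associated(topic, corr):
    """Topic with the highest association degree to `topic`."""
    others = {t: v for t, v in corr[topic].items() if t != topic}
    return max(others, key=others.get)


def dedupe(items):
    return list(dict.fromkeys(items))  # keep first-occurrence order


def strategy_1(theta, corr, product_topic, k):
    """(1) Expand every topic of the description to its most associated topic."""
    out = []
    for t in theta:
        out += top_products(most_associated(t, corr), product_topic, k)
    return dedupe(out)


def strategy_2(theta, corr, product_topic, k):
    """(2) Expand only the topic with the largest share in the description."""
    dominant = max(theta, key=theta.get)
    return top_products(most_associated(dominant, corr), product_topic, k)


def strategy_3(theta, product_topic):
    """(3) Products that directly contain any topic of the description."""
    return [p for p, dist in product_topic.items()
            if any(dist.get(t, 0.0) > 0.0 for t in theta)]


def strategy_4(theta, neighbours, product_topic, k):
    """(4) Second-order topics: associated only with a first-order topic."""
    out = []
    for t in theta:
        for first in neighbours.get(t, set()):
            for second, nbrs in neighbours.items():
                if second not in (t, first) and nbrs == {first}:
                    out += top_products(second, product_topic, k)
    return dedupe(out)


# Illustrative data loosely following the wealth-management example.
theta = {"high_yield": 0.7, "short_term": 0.3}
corr = {"high_yield": {"yield_over_5pct": 0.9, "anytime": 0.3},
        "short_term": {"yield_over_5pct": 0.2, "anytime": 0.8}}
products = {"A": {"yield_over_5pct": 0.5, "anytime": 0.5},
            "B": {"high_yield": 0.8, "yield_over_5pct": 0.1, "anytime": 0.1},
            "C": {"yield_over_5pct": 0.6, "other": 0.4},
            "D": {"anytime": 0.7, "other": 0.3}}
print(strategy_1(theta, corr, products, k=2))  # ['C', 'A', 'D']
print(strategy_2(theta, corr, products, k=2))  # ['C', 'A']
print(strategy_3(theta, products))             # ['B']

# Strategy (4): T_C is associated with T_A, and T_D only with T_C.
neighbours = {"T_A": {"T_C"}, "T_C": {"T_A", "T_D"}, "T_D": {"T_C"}}
print(strategy_4({"T_A": 1.0}, neighbours,
                 {"p1": {"T_D": 0.9}, "p2": {"T_D": 0.2}}, k=1))  # ['p1']
```

Because the strategies return plain lists, "one or a combination" can be realized by concatenating their outputs and passing them through `dedupe`.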
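Preferably, the products associated with the topics in the product description are displayed by category, together with the way each category of products was recommended. For example, the product type most associated with topic A, the product type most associated with topic C, and so on are shown, so that users can intuitively see the products associated with the topics they are interested in and make personalized choices based on the recommended product schemes.
Preferably, the method further comprises: acquiring a product that the user selects from the recommended target products, determining the topics contained in the selected product, and taking the products in which those topics' proportions rank within a preset number of top positions as part of the target products. Recommendations can thus take into account the products the user is interested in, better matching the user's needs and achieving personalized product recommendation.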
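The feedback step can be sketched the same way. This is a hypothetical helper that assumes a mapping from each product to its topic shares; how many of the selected product's topics to use (`n_topics`) and excluding the selected product from the results are assumptions of the sketch, not details from the text.

```python
def recommend_from_selection(selected, product_topic, k=2, n_topics=1):
    """Take the n_topics largest topics of the product the user picked and
    return the other products whose share of those topics ranks in the top k.
    Excluding the picked product itself is an assumption of this sketch."""
    dist = product_topic[selected]
    topics = sorted(dist, key=dist.get, reverse=True)[:n_topics]
    out = []
    for topic in topics:
        candidates = [p for p in product_topic if p != selected]
        candidates.sort(key=lambda p: product_topic[p].get(topic, 0.0),
                        reverse=True)
        out += candidates[:k]
    return list(dict.fromkeys(out))  # de-duplicate, keep order


product_topic = {"A": {"yield_over_5pct": 0.9, "anytime": 0.1},
                 "B": {"yield_over_5pct": 0.5, "anytime": 0.5},
                 "C": {"yield_over_5pct": 0.7, "anytime": 0.3},
                 "D": {"anytime": 0.9}}
print(recommend_from_selection("A", product_topic))  # ['C', 'B']
```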
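Based on the correlated topic model, the above embodiments can find products whose content is not similar but whose topics are related, and thus recommend products closely related in topic. This avoids finding only products with similar content, improves accuracy, and achieves more accurate product recommendations.
Through the above embodiments, the present application provides a document topic parameter extraction method in which a correlated topic model is trained on a document training set, and the trained model yields the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics. An input product description is acquired and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics. Based on the correlated topic model, the above embodiments can find products whose content is not similar but whose topics are related, and thus recommend products closely related in topic. This avoids finding only products with similar content, improves accuracy, and achieves more accurate product recommendations.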
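FIG. 3 is a program module diagram of a first preferred embodiment of the document topic parameter extraction apparatus of the present application. The document topic parameter extraction apparatus 3 includes, but is not limited to, one or more of the following modules: a preprocessing module 30, a computing module 31 and a training module 32. A unit in the present application refers to a series of computer-readable instruction segments that are stored in a memory and can be executed by the processor of the document topic parameter extraction apparatus 3 to perform a fixed function. The functions of each unit are described in detail in the following embodiments.
The preprocessing module 30 preprocesses the target document to obtain a word set of the target document.
Preferably, the preprocessing module 30 preprocessing the target document to obtain the word set of the target document comprises:
(1) Removing special words from the target document to obtain a processed document.
Further, the special words include URLs, username tags, special characters, place-name tags, punctuation marks, and the like.
(2) Segmenting the processed document to obtain a tuple set.
The processed document is segmented by extracting n-grams (n is a positive integer, for example n < 4). For example, Chinese text corpora are segmented with the ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) tool. Space-delimited text corpora (such as English) can be split directly on spaces, whereas corpora without space delimiters, such as Chinese and Japanese, must first be segmented with such a tool.
Further, three kinds of tuple sets, namely unigrams, bigrams and trigrams, are extracted from the text corpus.
Preferably, after the tuple set is obtained, the preprocessing module 30 is further configured to: remove from the tuple set the high-frequency tuples (i.e., high-frequency words) whose occurrence counts in the text corpus rank within a preset number of top positions (for example, the top 50), as well as the low-frequency tuples (i.e., low-frequency words) that occur fewer than a preset number of times (for example, 3), and take the processed tuple set as the word set of the target document.
In an optional embodiment, in view of the linguistic properties of words, a certain proportion of high-frequency tuples (usually stop words and the like) and low-frequency tuples (usually personal names, non-words and the like) are removed, and only the remaining mid-frequency tuples are kept as candidate words for the sentiment lexicon. High-frequency tuples are usually stop words that co-occur frequently with all kinds of words and therefore contribute little to expressing sentiment; low-frequency tuples are usually non-words or usernames that carry no linguistic meaning and therefore need to be removed. In this way, the tuples with intermediate occurrence counts are used as part of the candidate words.
In other embodiments, word segmentation is performed first and the candidate word set is then generated in combination with n-grams, which removes n-grams that do not form words. Word segmentation techniques are prior art and are not limited in the present application. This improves the precision of the lexicon, and such processing does not impair the effectiveness of the overall method.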
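The computing module 31 inputs the word set of the target document into the trained correlated topic model CTM (Correlated Topic Model) to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
In the present application, the correlated topic model CTM (Correlated Topic Model) uses the covariance matrix of a logistic normal distribution to model topic proportions, thereby discovering the distribution of document topics and the correlations between topics.
The correlated topic model is a generative probabilistic model that can automatically extract latent semantic topics from a discrete data set, where a topic refers to content that frequently co-occurs in the data set. The model describes the relationships among variables with a probabilistic graphical model and computes topic-related probability distributions by sampling or variational inference.
The correlated topic model can automatically discover the topics hidden in a document collection, where a topic is a probability distribution over words. It provides a convenient tool for analyzing documents without supervision and for making predictions on new documents. Its basic idea is that a document is a random mixture of several topics, each of which is a multinomial distribution over words. In a document collection, a topic is a probability distribution over the vocabulary of the corpus; assuming the corpus has K topics, the proportions of the K topics differ from document to document. Therefore, training the correlated topic model on a document collection yields the distribution among the topics and the distribution relationship between products and topics.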
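Preferably, the training module 32 trains the correlated topic model as follows:
(a1) Acquiring a document sample set and dividing it into a training set and a test set, for example 70% of the document samples as the training set and 30% as the test set.
(a2) Configuring the optimal number of topics for the training set.
The optimal number of topics is the number of topics in the correlated topic model.
(a3) Based on the training set and the optimal number of topics, modeling the documents in the training set with the correlated topic model to obtain the parameters of the correlated topic model.
(a4) Inputting the word sets of the document samples in the test set into the correlated topic model trained in step (a3) to obtain the topic representation of each document in the test set.
(a5) Evaluating the accuracy of the trained correlated topic model. If its accuracy is lower than a preset accuracy (for example, 99%), adding samples to the training set and/or adjusting the optimal number of topics step by step, and repeating the above training steps until the accuracy of the trained correlated topic model is greater than or equal to the preset accuracy (for example, 99%).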
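FIG. 4 is a program module diagram of a first preferred embodiment of the product recommendation apparatus of the present application. The product recommendation apparatus 4 includes, but is not limited to, one or more of the following modules: an acquisition module 40, a data computing module 41, a recommendation module 42 and a display module 43. A unit in the present application refers to a series of computer-readable instruction segments that are stored in a memory and can be executed by the processor of the product recommendation apparatus 4 to perform a fixed function. The functions of each unit are described in detail in the following embodiments.
The acquisition module 40 acquires an input product description and takes the acquired product description as the target document.
In an optional embodiment, the product description includes, but is not limited to, one or a combination of the following: characters, words, a passage, and the like. The product description may be in speech form, text form, or a combination of the two.
Preferably, the products include, but are not limited to, wealth management products, goods purchased online, and the like.
For example, banks currently classify wealth management products into several modules, such as a high-yield module, a withdraw-anytime module and a one-month fixed-term module. When purchasing a wealth management product, a user can enter a description of the product they want, for example by speaking a passage, so as to find wealth management products whose topics are similar to those of the user's description.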
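The data computing module 41 processes the product description to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics.
In a preferred embodiment, the electronic device processes the product description with the document topic parameter extraction method.
In an optional embodiment, the training samples for the correlated topic model include the product descriptions of the individual products, with each product description serving as one document sample. The correlated topic model is trained with the method of the first preferred embodiment.
Further, the distribution of the product description over topics represents the proportions of the topics contained in the product description. For example, the product description contains three topics, topic A, topic B and topic C, in the proportion topic A : topic B : topic C = 16 : 2 : 1.
Further, the relationship between topics represents the degree of association between any two topics in the correlated topic model. For example, with three topics, the association degree between topic A and topic B is 0.2, between topic A and topic C is 0.8, and between topic B and topic C is 0.4.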
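The recommendation module 42 recommends to the user, based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, target products associated with the topics of the product description.
Preferably, the recommendation module 42 recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, comprises one or a combination of the following:
(1) Based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, the topic with the highest association degree with each of the at least one target topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the determined topic's proportion ranks within a preset number of top positions, as part of the target products.
For example, the description of a wealth management product entered by the user contains two topics, high yield and short term. The topic most associated with high yield is an annualized yield above 5%, and the topic most associated with short term is withdraw anytime. The annualized-yield-above-5% topic has the highest proportion in wealth management products A and C, and the withdraw-anytime topic has the highest proportion in wealth management products A and D, so wealth management products A, C and D are the target products. In this way, every topic in the product description leads to recommending the products most associated with it, achieving personalized product recommendation.
(2) Based on the distribution of the product description over topics, obtaining the topic with the highest proportion in the product description; determining, according to the relationships between topics in the correlated topic model, the target topic with the highest association degree with that topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the target topic's proportion ranks within a preset number of top positions, as part of the target products.
For example, the description of a wealth management product entered by the user contains two topics, high yield and short term, of which high yield has the highest proportion. The topic most associated with high yield is an annualized yield above 5%, which has the highest proportion in wealth management products A and C, so wealth management products A and C are the target products.
(3) Based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the probability distribution between products and topics in the correlated topic model, the products that contain the at least one target topic; and taking the determined products as part of the target products.
(4) Based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, a first topic associated with the at least one target topic, and then a second topic associated only with the first topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the second topic's proportion ranks within a preset number of top positions, as part of the target products. This captures indirect relationships between topics, finding indirectly but strongly associated topics and recommending personalized products to the user.
For example, the product description contains topic A; in the correlated topic model, topic C is related to topic A, and topic D is associated only with topic C, which indicates that topic D is strongly associated with topic C. Therefore, the products in which topic D's proportion ranks within a preset number of top positions are taken as part of the target products.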
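Preferably, the display module 43 displays the products associated with the topics in the product description by category, together with the way each category of products was recommended. For example, the product type most associated with topic A, the product type most associated with topic C, and so on are shown, so that users can intuitively see the products associated with the topics they are interested in and make personalized choices based on the recommended product schemes.
Preferably, the recommendation module 42 is further configured to: acquire a product that the user selects from the recommended target products, determine the topics contained in the selected product, and take the products in which those topics' proportions rank within a preset number of top positions as part of the target products. Recommendations can thus take into account the products the user is interested in, better matching the user's needs and achieving personalized product recommendation.
Based on the correlated topic model, the above embodiments can find products whose content is not similar but whose topics are related, and thus recommend products closely related in topic. This avoids finding only products with similar content, improves accuracy, and achieves more accurate product recommendations.
Through the above embodiments, the present application provides a document topic parameter extraction method in which a correlated topic model is trained on a document training set, and the trained model yields the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics. An input product description is acquired and processed to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics. Based on the correlated topic model, the above embodiments can find products whose content is not similar but whose topics are related, and thus recommend products closely related in topic. This avoids finding only products with similar content, improves accuracy, and achieves more accurate product recommendations.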
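The integrated units implemented in the form of software program modules may be stored in a non-volatile readable storage medium. The software program modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application.
As shown in FIG. 5, the electronic device 5 includes at least one sending apparatus 51, at least one memory 52, at least one processor 53, at least one receiving apparatus 54 and at least one communication bus, where the communication bus implements connection and communication among these components.
The electronic device 5 is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like. The electronic device 5 may also include network devices and/or user equipment. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) that consists of a large number of hosts or network servers, where cloud computing is a form of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
The electronic device 5 may be, but is not limited to, any electronic product that can interact with the user through a keyboard, a touch pad, a voice control device or the like, for example a tablet computer, a smartphone, a personal digital assistant (Personal Digital Assistant, PDA), a smart wearable device, a camera device, a monitoring device or another terminal.
The network in which the electronic device 5 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN) and the like.
The receiving apparatus 54 and the sending apparatus 51 may be wired transmission ports or wireless devices, for example including antenna apparatuses, for data communication with other devices.
The memory 52 is used to store program code. The memory 52 may be a circuit with a storage function that has no physical form within an integrated circuit, such as a RAM (Random-Access Memory) or a FIFO (First In First Out) buffer. Alternatively, the memory 52 may be a memory with a physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card, a secure digital card, a flash card or another storage device.
The processor 53 may include one or more microprocessors and digital processors. The processor 53 can call the program code stored in the memory 52 to perform related functions. For example, the modules described in FIG. 3 are program code stored in the memory 52 and executed by the processor 53 to implement a document topic parameter extraction method; and/or the modules described in FIG. 4 are program code stored in the memory 52 and executed by the processor 53 to implement a product recommendation method. The processor 53, also called a central processing unit (CPU, Central Processing Unit), is a very-large-scale integrated circuit that serves as the computing core (Core) and the control core (Control Unit).
An embodiment of the present application further provides a non-volatile readable storage medium storing computer instructions which, when executed by an electronic device including one or more processors, cause the electronic device to perform the document topic parameter extraction method and/or the product recommendation method described in the method embodiments above.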
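With reference to FIG. 1, the memory 52 in the electronic device 5 stores a plurality of instructions for implementing a document topic parameter extraction method, and the processor 53 can execute the plurality of instructions to implement:
preprocessing a target document to obtain a word set of the target document; and inputting the word set of the target document into a trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
In an optional embodiment of the present application, the plurality of instructions executed by the processor 53 further implement:
removing special words from the target document to obtain a processed document;
segmenting the processed document to obtain a tuple set.
In an optional embodiment of the present application, the plurality of instructions executed by the processor 53 further implement:
removing, from the tuple set, the high-frequency tuples whose occurrence counts in the text corpus rank within a preset number of top positions and the low-frequency tuples that occur fewer than a preset number of times, and taking the processed tuple set as the word set of the target document.
The instructions corresponding to the document topic parameter extraction method in any embodiment are stored in the memory 52 and executed by the processor 53; details are not repeated here.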
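With reference to FIG. 2, the memory 52 in the electronic device 5 stores a plurality of instructions for implementing a product recommendation method, and the processor 53 can execute the plurality of instructions to implement:
acquiring an input product description and taking the acquired product description as the target document; processing the product description with the document topic parameter extraction method of any embodiment to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics; and, based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, recommending to the user target products associated with the topics of the product description.
In an optional embodiment of the present application, the plurality of instructions executed by the processor 53 further implement:
based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, the topic with the highest association degree with each of the at least one target topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the determined topic's proportion ranks within a preset number of top positions, as part of the target products;
based on the distribution of the product description over topics, obtaining the topic with the highest proportion in the product description; determining, according to the relationships between topics in the correlated topic model, the target topic with the highest association degree with that topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the target topic's proportion ranks within a preset number of top positions, as part of the target products;
based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the probability distribution between products and topics in the correlated topic model, the products that contain the at least one target topic; and taking the determined products as part of the target products.
In an optional embodiment of the present application, the plurality of instructions executed by the processor 53 further implement:
based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, a first topic associated with the at least one target topic, and then a second topic associated only with the first topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the second topic's proportion ranks within a preset number of top positions, as part of the target products.
In an optional embodiment of the present application, the plurality of instructions executed by the processor 53 further implement: displaying the products associated with the topics in the product description by category, and displaying how each category of products is recommended.
In an optional embodiment of the present application, the plurality of instructions executed by the processor 53 further implement: acquiring a product that the user selects from the recommended target products, determining the topics contained in the selected product, and taking the products in which those topics' proportions rank within a preset number of top positions as part of the target products.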
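The above characteristic means of the present application can be implemented by an integrated circuit that controls the functions of the document topic parameter extraction method in any of the above embodiments. That is, the integrated circuit of the present application is installed in the electronic device so that the electronic device performs the following functions: preprocessing a target document to obtain a word set of the target document; and inputting the word set of the target document into a trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
All functions that the document topic parameter extraction method in any embodiment can implement can be realized by installing the integrated circuit of the present application in the electronic device, so that the electronic device performs those functions; details are not repeated here.
The above characteristic means of the present application can be implemented by an integrated circuit that controls the functions of the product recommendation method in any of the above embodiments. That is, the integrated circuit of the present application is installed in the electronic device so that the electronic device performs the following functions: acquiring an input product description and taking the acquired product description as the target document; processing the product description with the document topic parameter extraction method of any embodiment to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics; and, based on these distributions and relationships, recommending to the user target products associated with the topics of the product description.
All functions that the product recommendation method in any embodiment can implement can be realized by installing the integrated circuit of the present application in the electronic device, so that the electronic device performs those functions; details are not repeated here.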
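It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a non-volatile readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage media include various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.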

Claims (20)

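  1. A document topic parameter extraction method, characterized in that the method comprises:
    preprocessing a target document to obtain a word set of the target document;
    inputting the word set of the target document into a trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
  2. The document topic parameter extraction method according to claim 1, characterized in that preprocessing the target document to obtain the word set of the target document comprises:
    removing special words from the target document to obtain a processed document;
    segmenting the processed document to obtain a tuple set.
  3. The document topic parameter extraction method according to claim 2, characterized in that the method further comprises:
    removing, from the tuple set, high-frequency tuples whose occurrence counts in the text corpus rank within a preset number of top positions and low-frequency tuples that occur fewer than a preset number of times, and taking the processed tuple set as the word set of the target document.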
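  4. A product recommendation method, characterized in that the method comprises:
    acquiring an input product description and taking the acquired product description as the target document;
    processing the product description with the document topic parameter extraction method according to any one of claims 1 to 3 to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics;
    based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, recommending to the user target products associated with the topics of the product description.
  5. The product recommendation method according to claim 4, characterized in that recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, comprises one or a combination of the following:
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, the topic with the highest association degree with each of the at least one target topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the determined topic's proportion ranks within a preset number of top positions, as part of the target products;
    based on the distribution of the product description over topics, obtaining the topic with the highest proportion in the product description; determining, according to the relationships between topics in the correlated topic model, the target topic with the highest association degree with that topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the target topic's proportion ranks within a preset number of top positions, as part of the target products;
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the probability distribution between products and topics in the correlated topic model, the products that contain the at least one target topic; and taking the determined products as part of the target products.
  6. The product recommendation method according to claim 4, characterized in that recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, further comprises:
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, a first topic associated with the at least one target topic, and then a second topic associated only with the first topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the second topic's proportion ranks within a preset number of top positions, as part of the target products.
  7. The product recommendation method according to claim 4, characterized in that the method further comprises: displaying the products associated with the topics in the product description by category, and displaying how each category of products is recommended.
  8. The product recommendation method according to claim 4, characterized in that the method further comprises: acquiring a product that the user selects from the recommended target products, determining the topics contained in the selected product, and taking the products in which those topics' proportions rank within a preset number of top positions as part of the target products.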
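  9. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory being configured to store at least one instruction and the processor being configured to execute the at least one instruction to implement the following steps:
    preprocessing a target document to obtain a word set of the target document;
    inputting the word set of the target document into a trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
  10. The electronic device according to claim 9, characterized in that preprocessing the target document to obtain the word set of the target document comprises:
    removing special words from the target document to obtain a processed document;
    segmenting the processed document to obtain a tuple set.
  11. The electronic device according to claim 10, characterized in that the processor is further configured to execute the at least one instruction to implement the following step:
    removing, from the tuple set, high-frequency tuples whose occurrence counts in the text corpus rank within a preset number of top positions and low-frequency tuples that occur fewer than a preset number of times, and taking the processed tuple set as the word set of the target document.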
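  12. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory being configured to store at least one instruction and the processor being configured to execute the at least one instruction to implement the following steps:
    acquiring an input product description and taking the acquired product description as the target document;
    processing the product description with the document topic parameter extraction method according to any one of claims 1 to 3 to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics;
    based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, recommending to the user target products associated with the topics of the product description.
  13. The electronic device according to claim 12, characterized in that recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, comprises one or a combination of the following:
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, the topic with the highest association degree with each of the at least one target topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the determined topic's proportion ranks within a preset number of top positions, as part of the target products;
    based on the distribution of the product description over topics, obtaining the topic with the highest proportion in the product description; determining, according to the relationships between topics in the correlated topic model, the target topic with the highest association degree with that topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the target topic's proportion ranks within a preset number of top positions, as part of the target products;
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the probability distribution between products and topics in the correlated topic model, the products that contain the at least one target topic; and taking the determined products as part of the target products.
  14. The electronic device according to claim 13, characterized in that recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, further comprises:
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, a first topic associated with the at least one target topic, and then a second topic associated only with the first topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the second topic's proportion ranks within a preset number of top positions, as part of the target products.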
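  15. A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium stores at least one instruction which, when executed by a processor, implements the following steps:
    preprocessing a target document to obtain a word set of the target document;
    inputting the word set of the target document into a trained correlated topic model CTM to obtain the distribution of the target document over topics, the relationship distribution between any two of a plurality of topics, and the distribution between products and topics, wherein the trained correlated topic model is trained on a document sample set and contains a plurality of topics.
  16. The storage medium according to claim 15, characterized in that preprocessing the target document to obtain the word set of the target document comprises:
    removing special words from the target document to obtain a processed document;
    segmenting the processed document to obtain a tuple set.
  17. The storage medium according to claim 16, characterized in that the at least one instruction, when executed by the processor, further implements the following step:
    removing, from the tuple set, high-frequency tuples whose occurrence counts in the text corpus rank within a preset number of top positions and low-frequency tuples that occur fewer than a preset number of times, and taking the processed tuple set as the word set of the target document.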
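  18. A non-volatile readable storage medium, characterized in that the non-volatile readable storage medium stores at least one instruction which, when executed by a processor, implements the following steps:
    acquiring an input product description and taking the acquired product description as the target document;
    processing the product description with the document topic parameter extraction method according to any one of claims 1 to 3 to obtain the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics;
    based on the distribution of the product description over topics, the relationships between topics in the correlated topic model, and the probability distribution between products and topics, recommending to the user target products associated with the topics of the product description.
  19. The storage medium according to claim 18, characterized in that recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, comprises one or a combination of the following:
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, the topic with the highest association degree with each of the at least one target topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the determined topic's proportion ranks within a preset number of top positions, as part of the target products;
    based on the distribution of the product description over topics, obtaining the topic with the highest proportion in the product description; determining, according to the relationships between topics in the correlated topic model, the target topic with the highest association degree with that topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the target topic's proportion ranks within a preset number of top positions, as part of the target products;
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the probability distribution between products and topics in the correlated topic model, the products that contain the at least one target topic; and taking the determined products as part of the target products.
  20. The storage medium according to claim 18, characterized in that recommending to the user target products associated with the topics of the product description, based on the distribution of the product description over topics and the relationships between topics, further comprises:
    based on the distribution of the product description over topics, obtaining at least one target topic contained in the product description; determining, according to the relationships between topics in the correlated topic model, a first topic associated with the at least one target topic, and then a second topic associated only with the first topic; and determining, according to the probability distribution between products and topics in the correlated topic model, the products in which the second topic's proportion ranks within a preset number of top positions, as part of the target products.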
PCT/CN2018/100312 2018-04-03 2018-08-14 Document topic parameter extraction method, product recommendation method, device and storage medium WO2019192122A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810287788.7A CN108763258B (zh) 2018-04-03 Document topic parameter extraction method, product recommendation method, device and storage medium
CN201810287788.7 2018-04-03

Publications (1)

Publication Number Publication Date
WO2019192122A1 true WO2019192122A1 (zh) 2019-10-10

Family

ID=63980754

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100312 WO2019192122A1 (zh) 2018-04-03 2018-08-14 Document topic parameter extraction method, product recommendation method, device and storage medium

Country Status (2)

Country Link
CN (1) CN108763258B (zh)
WO (1) WO2019192122A1 (zh)


Also Published As

Publication number Publication date
CN108763258A (zh) 2018-11-06
CN108763258B (zh) 2023-01-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18913309

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 21.01.2021.)

122 Ep: pct application non-entry in european phase

Ref document number: 18913309

Country of ref document: EP

Kind code of ref document: A1