WO2021004124A1 - Data comparison-based information recommendation method and device, and storage medium - Google Patents

Data comparison-based information recommendation method and device, and storage medium

Info

Publication number
WO2021004124A1
WO2021004124A1 PCT/CN2020/086286 CN2020086286W WO2021004124A1 WO 2021004124 A1 WO2021004124 A1 WO 2021004124A1 CN 2020086286 W CN2020086286 W CN 2020086286W WO 2021004124 A1 WO2021004124 A1 WO 2021004124A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
preset
information
product information
theme
Application number
PCT/CN2020/086286
Other languages
French (fr)
Chinese (zh)
Inventor
郭鸿程
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021004124A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Definitions

  • This application relates to the field of big data technology, and in particular to an information recommendation method, device and computer-readable storage medium based on data comparison.
  • This application provides an information recommendation method, device, and computer-readable storage medium based on data comparison, the main purpose of which is not only to protect the user's private data, but also to accurately perform personalized recommendations.
  • this application also provides an information recommendation method based on data comparison, which includes:
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
  • if the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme;
  • the obtaining recommended product information corresponding to the preset theme includes:
  • the target histogram feature vector is input to the naive Bayes classifier used to construct the preset BOW model, and the related information of the preset theme is classified by the naive Bayes classifier to obtain the category of the related information of the preset theme;
  • the product information to be recommended corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme.
  • the present application also provides an information recommendation device based on data comparison.
  • the device includes a memory and a processor.
  • the memory stores an information recommendation program based on data comparison that can run on the processor; when the information recommendation program based on data comparison is executed by the processor, the following steps are implemented:
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
  • if the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme;
  • this application also provides a computer device, including:
  • one or more processors;
  • one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute an information recommendation method based on data comparison, wherein the information recommendation method based on data comparison includes:
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
  • if the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme;
  • the present application also provides a computer-readable storage medium that stores an information recommendation program based on data comparison, and the information recommendation program based on data comparison can be executed by one or more processors to implement the steps of the information recommendation method based on data comparison as described above.
  • the method, device, and computer-readable storage medium for information recommendation based on data comparison proposed in this application acquire first data of a target user and second data of a comparison user, where the first data and the second data are data related to a preset theme and are homomorphically encrypted; compare the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result; acquire the ranking of the target user from the sorting result, and return the sorting result to the target user; if the ranking of the target user is a preset ranking, acquire recommended product information corresponding to the preset theme; and send the recommended product information corresponding to the preset theme to the target user.
  • this application protects the details of the data from being disclosed while comparing the data; at the same time, users can still be ranked accurately without obtaining the details of the users' data, and personalized recommendations can then be made according to each user's ranking. Therefore, this application achieves the purpose of not only protecting the user's private data, but also accurately performing personalized recommendation.
  • FIG. 1 is a schematic flowchart of an information recommendation method based on data comparison provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of the internal structure of an information recommendation device based on data comparison provided by an embodiment of the application;
  • FIG. 3 is a schematic diagram of modules of an information recommendation program based on data comparison in an information recommendation device based on data comparison provided by an embodiment of the application.
  • This application provides an information recommendation method based on data comparison.
  • Referring to FIG. 1, a schematic flowchart of an information recommendation method based on data comparison provided by an embodiment of this application is shown.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the device is a data comparison center.
  • the data comparison center is built mainly around the big data technologies Hadoop and Spark.
  • Hadoop is composed of HDFS (responsible for cluster storage management) and YARN (responsible for system resource scheduling), and Spark is used for the specific calculation logic.
  • the data comparison center applies network security protection measures on top of the big data architecture provided by cloud computing.
  • the adopted network security protection measures may include:
  • Deploying IDS/IPS intrusion detection and prevention equipment: an IPS intrusion prevention device is deployed between the platform core router and the egress firewall, and an IDS intrusion detection device is attached to the side of the core switch, to defend the application layer, for example against worms, viruses, Trojan horses, denial-of-service attacks, spyware, VoIP attacks, and peer-to-peer application abuse, blocking malicious traffic before losses occur and preventing external application-layer attacks.
  • the information recommendation method based on data comparison includes:
  • Step S101 Obtain first data of a target user and second data of a comparison user.
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data.
  • the target user is a user who wants to compare data.
  • the number of the comparison users may be multiple, and the second data is the second data of each comparison user.
  • the preset theme may be consumption amount, height, age, etc. within a period of time.
  • for example, the target user is user A and the comparison user is user B; the first data of the target user is the first cumulative consumption amount of user A in the past six months, and the second data of the comparison user is the second cumulative consumption amount of user B in the past six months.
  • the first data and the second data are both homomorphically encrypted data.
  • the first data is transmitted to the data comparison center after being homomorphically encrypted by the client of the target user
  • the second data is transmitted to the data comparison center after being homomorphically encrypted by the client of the comparison user.
  • homomorphic encryption works as follows: a given plaintext (x_1, x_2, ..., x_n) is encrypted with a homomorphic encryption algorithm to obtain the ciphertext c.
  • fully homomorphic encryption allows anyone to perform an arbitrary operation f on the ciphertext c; the ciphertext f(c) obtained after the operation, once decrypted, is the same as the result of f(x_1, x_2, ..., x_n).
  • throughout the computation, f(x_1, x_2, ..., x_n) and any intermediate plaintext are not leaked; the input value, output value, and intermediate values always remain encrypted.
  • Homomorphic encryption includes semi-homomorphic encryption and fully homomorphic encryption.
  • Semi-homomorphic encryption means that data encryption meets additive homomorphism or multiplicative homomorphism.
  • the RSA algorithm satisfies the multiplicative homomorphism, and the Paillier algorithm satisfies the additive homomorphism.
  • the first data and the second data may be obtained through encryption using an asymmetric encryption (RSA) algorithm.
  • the first data and the second data may be encrypted with a public key provided by the data comparison center.
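The multiplicative homomorphism of RSA mentioned above can be illustrated with a minimal sketch. It uses textbook RSA without padding and deliberately tiny primes; the parameters and function names are illustrative assumptions, not values taken from the application, and a scheme this small offers no real security.

```python
# Toy textbook RSA (no padding), only to illustrate multiplicative homomorphism.
# P, Q and E are illustrative assumptions and far too small for real use.
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_ciphertexts(c1: int, c2: int) -> int:
    """Multiply two ciphertexts; the result decrypts to the product of the plaintexts mod N."""
    return (c1 * c2) % N


product_cipher = multiply_ciphertexts(encrypt(7), encrypt(6))
print(decrypt(product_cipher))  # the product of the two plaintexts, computed on ciphertexts
```

The party multiplying the ciphertexts never sees 7 or 6; only the holder of the private exponent can read the product.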
  • Step S201 Comparing the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result.
  • a homomorphic operation is performed on the first data and the second data.
  • the homomorphic operation is to add the first data and the second data to a standard number, or multiply them by it, and then compare their sizes.
  • the sorting result is which one of the first data and the second data is larger and which one is smaller.
  • when the second data is data of multiple comparison users, the first data is compared with each of the multiple second data respectively to obtain the sorting result.
  • the comparing the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result includes:
  • the negative of the first data is added to the second data to obtain a second calculation result; if the second calculation result is a positive number, the sorting result is that the first data is less than the second data; if the second calculation result is a negative number, the sorting result is that the first data is greater than the second data.
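The sign test described above (add the negative of the first data to the second data, then inspect the sign) can be sketched with an additively homomorphic scheme. The following toy Paillier implementation is an assumption for illustration: the application names Paillier only as an example of additive homomorphism, and the primes, function names, and the choice to let the key holder decrypt only the difference are hypothetical. The parameters are far too small for real use.

```python
import math
import random


def keygen(p: int = 293, q: int = 433):
    """Return (n, private_key) for a toy Paillier scheme with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # inverse of lam mod n
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (reduced mod n) under the public modulus n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Decrypt ciphertext c to an integer in [0, n)."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_difference(n: int, c_first: int, c_second: int) -> int:
    """Ciphertext of (second - first): multiply by the modular inverse of c_first."""
    n2 = n * n
    return (c_second * pow(c_first, -1, n2)) % n2


def compare(n: int, private_key, c_first: int, c_second: int) -> str:
    """Decrypt only the difference and map its sign to a sorting result."""
    diff = decrypt(n, private_key, encrypted_difference(n, c_first, c_second))
    if diff > n // 2:          # values above n/2 represent negative numbers
        diff -= n
    if diff > 0:
        return "first < second"
    if diff < 0:
        return "first > second"
    return "first == second"


n, private_key = keygen()
print(compare(n, private_key, encrypt(n, 1200), encrypt(n, 950)))
```

The sketch only works while the absolute difference stays below n/2, so a real deployment would need a modulus much larger than any value being compared.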
  • Step S301 Obtain the ranking of the target user from the ranking result, and return the ranking result to the target user.
  • the ranking of the target user obtained from the ranking result is the first; or the ranking of the target user obtained from the ranking result is the second; the ranking of the target user obtained from the ranking result is the third.
  • returning the sorting result to the target user includes returning the sorting result to the client of the target user, and the client of the target user may share and display the sorting result through the user's sharing operation.
  • the returning the sorting result to the target user includes:
  • when returning the sorting result to the target user, the sorting result can be encrypted with the public key of the target user.
  • the client of the target user decrypts it with the private key to obtain the specific sorting result, which improves security during data transmission.
  • Step S401 If the target user's ranking is a preset ranking, obtain recommended product information corresponding to the preset theme.
  • the ranking of the target users as the preset ranking includes: the ranking of the target users is the first.
  • the ranking of the target users as the preset ranking includes: the ranking of the target users is the top third.
  • the order of the target users as the preset order includes: the order of the target users is the first third and the first third.
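Steps S301 and S401 can be read as turning pairwise comparison outcomes into a rank and then testing that rank against a preset rule. The sketch below assumes that a larger value ranks higher and that "top third" means a rank no greater than the ceiling of one third of all participants; both conventions, and the function names, are hypothetical.

```python
import math


def rank_of_target(comparison_user_is_greater):
    """Rank of the target user: 1 plus the number of comparison users with a larger value."""
    return 1 + sum(1 for greater in comparison_user_is_greater if greater)


def is_preset_rank(rank, total_users, rule="first"):
    """Check the rank against a preset rule ('first' or 'top_third')."""
    if rule == "first":
        return rank == 1
    if rule == "top_third":
        return rank <= math.ceil(total_users / 3)
    raise ValueError(f"unknown rule: {rule}")


# One of five comparison users has a larger value than the target user.
outcomes = [True, False, False, False, False]
rank = rank_of_target(outcomes)
print(rank, is_preset_rank(rank, total_users=len(outcomes) + 1, rule="top_third"))
```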
  • the recommended product information corresponding to the preset theme may be preset.
  • the preset recommended products corresponding to the preset theme are low-fat snacks and low-fat drinks. When the user's body fat percentage ranks the lowest, the low-fat snacks and low-fat drinks are recommended to the user.
  • different product information can also be recommended according to different rankings of preset themes and target users.
  • the obtaining recommended product information corresponding to the preset theme includes:
  • the target histogram feature vector is input to the naive Bayes classifier used to construct the preset BOW model, and the related information of the preset theme is classified by the naive Bayes classifier to obtain the category of the related information of the preset theme;
  • the product information to be recommended corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme.
  • the information related to the preset theme is information related to the preset theme.
  • the preset theme is a consumption theme
  • the consumption-related information of the preset theme includes consumption history records (such as information about commodities purchased at different consumption times and locations).
  • the BOW model is established in advance; specifically, the BOW model is constructed with a clustering algorithm (such as the k-means algorithm) and a naive Bayes classifier.
  • the preset BOW can be constructed in the following manner:
  • clustering refers to dividing data objects with higher similarity into the same cluster, and dividing data objects with higher dissimilarity into different clusters according to the principle of similarity.
  • k in the k-means algorithm represents the number of clusters, and means represents the mean value of the data objects in the cluster (this mean is a description of the center of the cluster); therefore, the k-means algorithm is also called the k-average algorithm.
  • the k-means algorithm is a clustering algorithm based on partitioning. It uses distance as a measure of similarity between data objects.
  • the distance between data objects is calculated using the Euclidean distance. Assuming that x_i and x_j are data objects and D represents the number of attributes of a data object, the distance between the two is:
$$d(x_i, x_j) = \sqrt{\sum_{d=1}^{D} \left(x_{i,d} - x_{j,d}\right)^2}$$
  • x_{i,d} represents the d-th dimension coordinate of the i-th point
  • x_{j,d} represents the d-th dimension coordinate of the j-th point
  • the center of the k-th cluster is the mean of the data objects in that cluster:
$$Center_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$$
  • |C_k| represents the number of data objects in the k-th cluster
  • Center_k represents a vector containing D attributes
  • the training data are mapped onto the cluster centers, and a low-dimensional representation of each training datum in the cluster-center space is obtained.
  • the final clustering result J is used as the basis of the histogram: the basis vectors are used to construct other vectors, and the mapping yields the histogram statistics of each category. This process is also the process of extracting the features of the BOW model.
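The clustering and histogram mapping described above can be sketched as follows. This is a minimal pure-Python k-means with Euclidean distance, followed by a histogram that counts how many feature vectors fall nearest to each cluster center. The function names, the fixed iteration count, and the sample points are illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points with the same number of attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)), key=lambda k: euclidean(point, centers[k]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest center, then recompute each center as a mean."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for idx, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster ends up empty
                centers[idx] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(vectors, centers):
    """Count how many vectors map to each cluster center (the BOW 'dictionary')."""
    hist = [0] * len(centers)
    for v in vectors:
        hist[nearest(v, centers)] += 1
    return hist


training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(training, k=2)
print(bow_histogram([(0, 0), (1, 1), (10, 10)], centers))
```

The resulting histogram is the kind of feature vector that would then be passed to the classifier.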
  • c k represents the number of possible values of the k-th dimension feature
  • is the coefficient.
  • D_i represents the set of training samples of class w_i
  • the numerator represents the number of samples in D_i, the training sample set of class w_i, whose k-th feature takes the value x_k.
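The symbol definitions above are consistent with a naive Bayes classifier whose conditional probabilities are smoothed. The sketch below assumes the usual Laplace form, (count + alpha) / (|D_i| + alpha * c_k); that form, the parameter name `alpha`, and the toy categories are assumptions, since the formula itself is not preserved in this text.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels, alpha=1.0):
    """Collect the counts needed for smoothed naive Bayes on categorical features."""
    n_features = len(samples[0])
    counts = defaultdict(int)  # (label, feature index, value) -> number of samples
    for sample, label in zip(samples, labels):
        for k, value in enumerate(sample):
            counts[(label, k, value)] += 1
    return {
        "class_counts": Counter(labels),
        # c_k: number of possible values of the k-th feature
        "n_values": [len({s[k] for s in samples}) for k in range(n_features)],
        "counts": counts,
        "alpha": alpha,
        "total": len(labels),
    }


def predict(model, sample):
    """Return the class with the highest log posterior for the given sample."""
    best_label, best_score = None, -math.inf
    for label, n_label in model["class_counts"].items():
        score = math.log(n_label / model["total"])
        for k, value in enumerate(sample):
            numerator = model["counts"].get((label, k, value), 0) + model["alpha"]
            denominator = n_label + model["alpha"] * model["n_values"][k]
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


samples = [("high", "low"), ("high", "mid"), ("low", "high"), ("low", "mid")]
labels = ["electronics", "electronics", "groceries", "groceries"]
model = train(samples, labels)
print(predict(model, ("high", "low")))
```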
  • the related information of the preset theme is mapped to the target dictionary of the preset BOW model, where the target dictionary of the preset BOW model is the cluster-center space obtained by clustering when constructing the BOW model.
  • the product information to be recommended corresponding to the category of the related information of the preset theme may be preset; that is, the correspondence between categories and the product information to be recommended is preset, and once the category of the related information of the preset theme has been obtained, the product information to be recommended is looked up according to that category.
  • the determining that the product information corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme includes:
  • the product information whose similarity with the word frequency vector is greater than the preset similarity is the recommended product information corresponding to the preset theme.
  • the similarity can be calculated by the cosine similarity.
  • the cosine similarity uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals. The closer the cosine value is to 1, the closer the angle is to 0 degrees, that is, the more similar the two vectors are. For the topic-related information posted by the customer and the recommended product information, the following formula is used:
$$\cos\theta = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
  • X is the vector representation of the topic-related information posted by the customer
  • Y is the vector representation of the recommended product information
  • X_i represents a component of the vector X
  • Y_i represents a component of the vector Y.
  • the similarity is judged based on the calculated value, so that recommended product information with high similarity is recommended to the target user, so that products that are more suitable for the user can be recommended.
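The similarity filter described above can be sketched as follows, assuming the topic information and each candidate product are already represented as word-frequency vectors of equal length. The function names, the threshold value, and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def recommend(topic_vector, product_vectors, threshold=0.8):
    """Names of products whose similarity to the topic vector exceeds the preset threshold."""
    return [
        name
        for name, vector in product_vectors.items()
        if cosine_similarity(topic_vector, vector) > threshold
    ]


products = {"product_a": [2, 2, 0], "product_b": [0, 0, 1]}
print(recommend([1, 1, 0], products))
```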
  • before the mapping of the related information of the preset theme to the target dictionary of the preset BOW model, the method further includes:
  • performing text processing on the related information of the preset theme, where the text processing includes performing word segmentation on the related information of the preset theme through a hidden Markov model, and rewriting the segmented information through a preset keyword extraction algorithm.
  • text processing is performed on the related information of the preset theme first, and then the operation of mapping to the target dictionary of the preset BOW model is performed according to the related information of the preset theme obtained after processing.
  • the text rewriting means that the text is first segmented with Chinese word segmentation and then cleaned up so that only the main words are retained, and the main words are semantically enhanced (supplemented with synonyms and related words).
  • this application performs word segmentation processing on the related information of the preset theme by constructing a hidden Markov model. The text is taken to satisfy the Markov property, that is, the probability of the m-th word appearing depends only on the words that precede it and not on the words that follow it, so an N-gram model is used.
  • the purpose of the N-gram model is to give the probability of the m-th word appearing given the words before it, specifically expressed as:
$$P(W_m \mid W_1, \ldots, W_{m-1}) \approx P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$$
  • W_m represents any word in the text, and W_{m-n+1}, ..., W_{m-1} represent the n-1 words preceding the m-th word
  • the probability that the sentence S = W_1 W_2 ... W_M, where M is the number of words, is arranged according to the word order is:
$$P(S) = \prod_{m=1}^{M} P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$$
  • P(W_m | W_{m-n+1}, ..., W_{m-1}) means the probability that W_m appears when the character string W_{m-n+1}, ..., W_{m-1} appears. Based on large-scale corpus training, the bigram (binary grammar) model is used. Therefore, the probability model of the sentence is:
$$P(S) \approx \prod_{m=1}^{M} P(W_m \mid W_{m-1})$$
  • the sentence S is segmented using the full segmentation method to obtain all possible Chinese word segmentation methods, and then the probability of each word segmentation method is calculated, and the word segmentation method with the highest probability is selected as the final text segmentation result.
  • the selection process is to find the segmentation that maximizes P(S):
$$\hat{S} = \arg\max_{S} P(S)$$
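The full-segmentation-then-maximize procedure above can be sketched with a bigram model. The vocabulary, the bigram probabilities, the `<s>` start marker, and the small floor probability for unseen bigrams are all made-up assumptions; an unspaced English string stands in for Chinese text, which has the same segmentation ambiguity.

```python
def all_segmentations(text, vocab):
    """Enumerate every way to split text into words that all appear in vocab."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            for rest in all_segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


def sentence_probability(words, bigram, floor=1e-6):
    """Bigram sentence probability: product of P(word | previous word)."""
    prob = 1.0
    previous = "<s>"  # sentence-start marker
    for word in words:
        prob *= bigram.get((previous, word), floor)
        previous = word
    return prob


def best_segmentation(text, vocab, bigram):
    """Pick the candidate segmentation with the highest sentence probability."""
    candidates = all_segmentations(text, vocab)
    if not candidates:
        return None  # no segmentation is possible with this vocabulary
    return max(candidates, key=lambda ws: sentence_probability(ws, bigram))


vocab = {"no", "now", "here", "where"}
bigram = {
    ("<s>", "no"): 0.4,
    ("<s>", "now"): 0.6,
    ("no", "where"): 0.1,
    ("now", "here"): 0.5,
}
print(best_segmentation("nowhere", vocab, bigram))
```

Enumerating every segmentation grows exponentially with text length, so a practical system would use dynamic programming (for example a Viterbi-style search) instead of the brute-force enumeration shown here.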
  • this application performs keyword extraction in the case of word segmentation based on the hidden Markov model.
  • the keyword extraction algorithm uses statistical information, word vector information, and dependency syntax information between words; it calculates the correlation strength between words by constructing a dependency graph, and iteratively calculates the importance score of each word using the TextRank algorithm.
  • first, an undirected graph over all non-stop words is constructed from the dependency parsing result of the sentence, and then the edge weights are calculated from the gravity value between words and their dependency correlation degree. The dependency correlation degree of any two words W_i and W_j is:
  • len(W i , W j ) represents the length of the dependency path between words W i and W j
  • b is a hyperparameter
  • tfidf(W) is the TF-IDF value of word W
  • d is the Euclidean distance between the word vectors of words W i and W j .
  • several words with the highest scores can be selected as the main words, and the main words can be semantically enhanced.
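The iterative scoring step can be sketched as a weighted TextRank over an undirected word graph. In this sketch the edge weights are simply given as constants; in the application they would come from the dependency correlation degree and the gravity value, whose formulas are not reproduced here. The function name, damping factor, iteration count, and sample words are assumptions.

```python
def textrank(edges, damping=0.85, iterations=50):
    """Weighted TextRank on an undirected graph given as {(word_a, word_b): weight}."""
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor passes on its score in proportion to the edge weight.
            incoming = sum(
                weight / sum(neighbors[other].values()) * scores[other]
                for other, weight in neighbors[word].items()
            )
            updated[word] = (1 - damping) + damping * incoming
        scores = updated
    return scores


edges = {("loan", "rate"): 1.0, ("loan", "bank"): 1.0, ("loan", "term"): 1.0}
scores = textrank(edges)
top_words = sorted(scores, key=scores.get, reverse=True)
print(top_words[0])
```

The highest-scoring words would then be kept as the main words for semantic enhancement.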
  • Step S501 Send recommended product information corresponding to the preset theme to the target user.
  • the recommended product information corresponding to the preset consumption theme is the information of the m electronic product and the information of the n electronic product
  • the information of the m electronic product and the information of the n electronic product are sent to the user.
  • after obtaining the recommended product information corresponding to the preset theme, the recommended product information is sent to the target user, so that information can be accurately recommended to the target user.
  • the information recommendation method based on data comparison proposed in this embodiment obtains first data of a target user and second data of a comparison user.
  • the first data and the second data are data related to a preset theme and are homomorphically encrypted; the sizes of the first data and the second data are compared through a homomorphic operation to obtain the sorting result; the ranking of the target user is obtained from the sorting result, and the sorting result is returned to the target user; if the ranking of the target user is a preset ranking, recommended product information corresponding to the preset theme is obtained; and the recommended product information corresponding to the preset theme is sent to the target user.
  • this application protects the details of the data from being disclosed while comparing the data; at the same time, users can still be ranked accurately without obtaining the details of the users' data, and personalized recommendations can then be made according to each user's ranking. Therefore, this application achieves the purpose of not only protecting the user's private data, but also accurately performing personalized recommendation.
  • This application also provides an information recommendation device based on data comparison.
  • Referring to FIG. 2, a schematic diagram of the internal structure of an information recommendation device based on data comparison provided by an embodiment of this application is shown.
  • the information recommendation device 1 based on data comparison may be a PC (Personal Computer, personal computer), or a terminal device such as a smart phone, a tablet computer, or a portable computer.
  • the information recommendation device 1 based on data comparison at least includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 11 may be an internal storage unit of the information recommendation device 1 based on data comparison, such as a hard disk of the information recommendation device 1 based on data comparison.
  • the memory 11 may also be an external storage device of the information recommendation device 1 based on data comparison, such as a plug-in hard disk equipped on the information recommendation device 1 based on data comparison, a smart media card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the memory 11 may also include both an internal storage unit of the information recommendation device 1 based on data comparison and an external storage device.
  • the memory 11 can be used not only to store application software installed in the information recommendation device 1 based on data comparison and various data, such as the code of the information recommendation program 01 based on data comparison, but also to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip in some embodiments, and is used to run the program code stored in the memory 11 or to process data, for example to execute the information recommendation program 01 based on data comparison.
  • the communication bus 13 is used to realize the connection and communication between these components.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the device 1 and other electronic devices.
  • the device 1 may also include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • the optional user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) touch device, and the like.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the information recommendation device 1 based on data comparison and to display a visualized user interface.
  • FIG. 2 only shows the data comparison-based information recommendation device 1 with components 11-14 and the data comparison-based information recommendation program 01. Those skilled in the art can understand that the structure shown in FIG. 2 does not constitute a limitation on the information recommendation device 1 based on data comparison, which may include fewer or more components than shown, a combination of certain components, or a different component arrangement.
  • the memory 11 stores the information recommendation program 01 based on data comparison; the processor 12 implements the following steps when executing the information recommendation program 01 based on the data comparison stored in the memory 11:
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data.
  • the target user is a user who wants to compare data.
  • the number of the comparison users may be multiple, and the second data is the second data of each comparison user.
  • the preset theme may be consumption amount, height, age, etc. within a period of time.
  • for example, the target user is user A and the comparison user is user B; the first data of the target user is the first cumulative consumption amount of user A in the past six months, and the second data of the comparison user is the second cumulative consumption amount of user B in the past six months.
  • first data and the second data are both homomorphically encrypted data.
  • first data is transmitted to the data comparison center after being homomorphically encrypted by the client of the target user
  • second data is transmitted to the data comparison center after being homomorphically encrypted by the client of the comparison user.
  • the homomorphic encryption refers to the following: a given plaintext (x_1, x_2, ..., x_n) is encrypted with a homomorphic encryption algorithm to obtain the ciphertext c.
  • Fully homomorphic encryption allows anyone to perform any operation f on the ciphertext c.
  • the ciphertext f(c) obtained after the operation f, once decrypted, is the same as the result f(x_1, x_2, ..., x_n).
  • f(x_1, x_2, ..., x_n) and any intermediate plaintext are not leaked; the input value, output value, and intermediate values are always in an encrypted state.
  • Homomorphic encryption includes semi-homomorphic encryption and fully homomorphic encryption.
  • Semi-homomorphic encryption means that data encryption meets additive homomorphism or multiplicative homomorphism.
  • the RSA algorithm satisfies the multiplicative homomorphism, and the Paillier algorithm satisfies the additive homomorphism.
  • the first data and the second data may be obtained through encryption using an asymmetric encryption (RSA) algorithm.
  • the first data and the second data may be encrypted with the public key provided by the data comparison center.
  • the size of the first data and the second data is compared through a homomorphic operation to obtain a sorting result.
  • a homomorphic operation is performed on the first data and the second data.
  • the homomorphic operation is to add or multiply the first data and the second data with a standard number and then compare their sizes.
  • the sorting result is which one of the first data and the second data is larger and which one is smaller.
  • the second data is data of multiple comparison users
  • the first data is compared with each of the multiple second data respectively to obtain the sorting result.
  • the comparing the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result includes:
  • the negative of the first data is added to the second data to obtain a second calculation result; if the second calculation result is a positive number, a sorting result in which the first data is less than the second data is obtained; if the second calculation result is a negative number, a sorting result in which the first data is greater than the second data is obtained.
  • the ranking of the target user obtained from the ranking result is the first; or the ranking of the target user obtained from the ranking result is the second; the ranking of the target user obtained from the ranking result is the third.
  • returning the sorting result to the target user includes returning the sorting result to the client of the target user, and the client of the target user may share and display the sorting result through the user's sharing operation.
  • the returning the sorting result to the target user includes:
  • the sorting result when returning the sorting result to the target user, can be encrypted with the public key of the target user.
  • the client of the target user decrypts it with the private key to obtain the specific sorting result, which improves security during data transmission.
  • the ranking of the target user is a preset ranking, obtain recommended product information corresponding to the preset theme.
  • the ranking of the target users as the preset ranking includes: the ranking of the target users is the first.
  • the ranking of the target users as the preset ranking includes: the ranking of the target users is the top third.
  • the order of the target users as the preset order includes: the order of the target users is the first third and the first third.
  • the recommended product information corresponding to the preset theme may be preset.
  • the preset recommended products corresponding to the preset theme are low-fat snacks and low-fat drinks. When the user's body fat percentage ranks the lowest, the low-fat snacks and low-fat drinks are recommended to the user.
  • different product information can also be recommended according to different rankings of preset themes and target users.
  • the obtaining recommended product information corresponding to the preset theme includes:
  • the target histogram feature vector is input to the naive Bayes classifier used to construct the preset BOW model, and the related information of the preset theme is classified by the naive Bayes classifier to obtain the category to which the related information of the preset theme belongs;
  • the product information to be recommended corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme.
  • the information related to the preset theme is information related to the preset theme.
  • the preset theme is a consumption theme
  • the consumption-related information of the preset theme includes consumption history records (such as information on commodities purchased at different consumption times and locations).
  • the BOW model is established in advance; specifically, the BOW model is constructed by a clustering algorithm (such as the k-means algorithm) and the naive Bayes classifier.
  • the preset BOW can be constructed in the following manner:
  • clustering refers to dividing data objects with higher similarity into the same cluster, and dividing data objects with higher dissimilarity into different clusters according to the principle of similarity.
  • k in the k-means algorithm represents the number of clusters, and means represents the mean value of the data objects in a cluster (this mean is a description of the cluster center); hence the name k-means.
  • the k-means algorithm is a clustering algorithm based on partitioning. It uses distance as a measure of similarity between data objects.
  • the distance between data objects is calculated using Euclidean distance; assuming that x_i and x_j are data objects and D represents the number of attributes of a data object, the distance between the two is: $\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{d=1}^{D} (x_{i,d} - x_{j,d})^2}$
  • x_{i,d} represents the d-th coordinate of the i-th point
  • x_{j,d} represents the d-th coordinate of the j-th point
  • C k represents the number of data objects in the k-th cluster
  • Center k represents a vector containing D attributes
  • the training data is mapped to the cluster centers, and a low-dimensional representation of each training data item in the cluster center space is obtained.
  • the final clustering result J is used as the basis of the histogram: the basis vectors are used to construct other vectors, and mapping is performed to obtain the histogram statistics of each of the different categories. This process is also the process of extracting the features of the BOW model.
  • c k represents the number of possible values of the k-th dimension feature
  • is the coefficient.
  • D i represents the set of training samples of class w i
  • the numerator represents the number of samples in the training sample set D_i of class w_i whose k-th feature takes the value x_k.
  • the related information of the preset theme is mapped to the target dictionary of the preset BOW model, where the target dictionary of the preset BOW model is the cluster center space obtained by clustering when constructing the BOW model.
  • the product information to be recommended corresponding to the category of the related information of the preset theme may be preset, that is, the corresponding relationship of the product information to be recommended corresponding to different categories is preset, and then the preset theme is obtained After the category of the related information belongs to, the product information to be recommended corresponding to the category is obtained according to the category.
  • the determining that the product information corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme includes:
  • the product information whose similarity with the word frequency vector is greater than the preset similarity is the recommended product information corresponding to the preset theme.
  • the similarity can be calculated by the cosine similarity.
  • the cosine similarity uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals: $\cos\theta = \frac{\sum_i X_i Y_i}{\sqrt{\sum_i X_i^2}\,\sqrt{\sum_i Y_i^2}}$. The closer the cosine value is to 1, the closer the angle is to 0 degrees, that is, the more similar the two vectors are.
  • X is the vector representation of the related information of the preset theme
  • Y is the vector representation of the recommended product information
  • X_i represents a component of the vector X
  • Y_i represents a component of the vector Y.
  • the similarity is judged based on the calculated value, so that recommended product information with high similarity is recommended to the target user, so that products that are more suitable for the user can be recommended.
  • text processing is performed on the related information of the preset topic, and the text processing includes performing word segmentation processing on the related information of the preset topic through a hidden Markov model, and performing text rewriting on the word-segmented information through a preset keyword extraction algorithm.
  • text processing is performed on the related information of the preset theme first, and then the operation of mapping to the target dictionary of the preset BOW model is performed according to the related information of the preset theme obtained after processing.
  • the text rewriting refers to first performing Chinese word segmentation on a text, then cleaning it up, retaining the main words, and performing semantic enhancement (synonym/related-word supplementation) on the main words.
  • this application performs word segmentation processing on the related information of the preset topic by constructing a hidden Markov model. The text satisfies the Markov property, that is, the probability of the m-th word occurring in the text depends only on the words that precede it and is independent of the words that follow it. The purpose of the N-gram model is therefore to give the probability of the m-th word occurring given that the first m-1 words have occurred, specifically expressed as:
  • n any word in the text
  • n represents the previous word of the m-th word
  • the probability that the sentence is arranged according to the word order is:
  • P(W_m | W_{m-n+1}, ..., W_{m-1}) means the probability that W_m appears given that the character string W_{m-n+1}, ..., W_{m-1} has appeared. Based on large-scale corpus training, the bigram model is used; therefore, the probability model of the sentence is:
  • the sentence S is segmented using the full segmentation method to obtain all possible Chinese word segmentation methods, and then the probability of each word segmentation method is calculated, and the word segmentation method with the highest probability is selected as the final text segmentation result.
  • the selection process is to find the maximum value of P(S):
  • this application performs keyword extraction in the case of word segmentation based on the hidden Markov model.
  • the keyword extraction algorithm uses statistical information, word vector information, and dependency syntax information between words; it calculates the correlation strength between words by constructing a dependency graph and iteratively calculates the importance score of each word using the TextRank algorithm. First, an undirected graph is constructed over all non-stop words based on the result of the dependency syntactic analysis of the sentence; then the weight of each edge is calculated from the gravity value between the words and their dependency correlation degree. The dependency correlation degree of any two words W_i and W_j is:
  • len(W i , W j ) represents the length of the dependency path between words W i and W j
  • b is a hyperparameter
  • tfidf(W) is the TF-IDF value of word W
  • d is the Euclidean distance between the word vectors of words W i and W j .
  • several words with the highest scores can be selected as the main words, and the main words can be semantically enhanced.
  • Sending recommended product information corresponding to the preset theme to the target user.
  • the recommended product information corresponding to the preset consumption theme is the information of the m electronic product and the information of the n electronic product
  • the information of the m electronic product and the information of the n electronic product are sent to the user.
  • After the recommended product information corresponding to the preset theme is obtained, it is sent to the target user, so that information can be accurately recommended to the target user.
  • the information recommendation device based on data comparison proposed in this embodiment obtains the first data of the target user and the second data of the comparison user.
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data; the sizes of the first data and the second data are compared through a homomorphic operation to obtain the sorting result; the ranking of the target user is obtained from the sorting result, and the sorting result is returned to the target user; if the ranking of the target user is a preset ranking, the recommended product information corresponding to the preset theme is obtained; and the recommended product information corresponding to the preset theme is sent to the target user.
  • this application protects the details of the data from being disclosed while comparing the data; at the same time, because users can still be sorted accurately without obtaining the details of the user data, and personalized recommendation can then be made according to the user's ranking, this application achieves the purpose of both protecting the user's private data and accurately performing personalized recommendation.
  • the information recommendation program based on data comparison may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete this application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions, and is used to describe the execution process of the information recommendation program based on data comparison in the information recommendation device based on data comparison.
  • FIG. 3 is a schematic diagram of program modules of an information recommendation program based on data comparison in an embodiment of the information recommendation device based on data comparison of this application.
  • the information recommendation program based on data comparison can be divided into the first acquisition module 10, the comparison module 20, the first transmission module 30, the second acquisition module 40, and the second transmission module 50. Exemplarily:
  • the first acquisition module 10 is configured to acquire first data of a target user and second data of a comparison user.
  • the first data and the second data are data related to a preset theme, and the first data and the second data are data that has been homomorphically encrypted;
  • the comparison module 20 is configured to compare the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result
  • the first transmission module 30 is configured to: obtain the ranking of the target user from the ranking result, and return the ranking result to the target user;
  • the second acquiring module 40 is configured to: if the ranking of the target user is a preset ranking, acquire recommended product information corresponding to the preset theme;
  • the second transmission module 50 is configured to send recommended product information corresponding to the preset theme to the target user.
  • the present application also provides a computer device, including: one or more processors; a memory; one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute an information recommendation method based on data comparison, wherein the information recommendation method based on data comparison includes:
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
  • the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme
  • an embodiment of the present application also proposes a computer-readable storage medium that stores an information recommendation program based on data comparison, and the information recommendation program based on data comparison can be executed by one or more processors to implement the following operations:
  • the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
  • the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme
  • the storage medium of the computer-readable storage medium of the present application is a volatile storage medium or a non-volatile storage medium; its specific implementation is basically the same as the foregoing embodiments of the information recommendation device and method based on data comparison, and is not repeated here.
  • the technical solution of the present invention, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium as described above (such as ROM/RAM, magnetic disk, or optical disk) and includes several instructions to make a terminal device (which can be a mobile phone, computer, server, network device, etc.) execute the method described in each embodiment of the present invention.
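The maximum-probability word segmentation described above (full segmentation of a sentence, scoring each candidate with a bigram model, and keeping the candidate with the largest P(S)) can be illustrated with a minimal sketch. The vocabulary, the bigram probabilities, the `<s>` start marker, and the floor probability for unseen word pairs are all hypothetical values chosen for illustration; they are not taken from this application.

```python
import math

# Hypothetical vocabulary and bigram probabilities P(current | previous).
VOCAB = {"low", "fat", "lowfat", "snack"}
BIGRAM = {
    ("<s>", "lowfat"): 0.3, ("lowfat", "snack"): 0.6,
    ("<s>", "low"): 0.2, ("low", "fat"): 0.1, ("fat", "snack"): 0.1,
}


def all_segmentations(text, vocab):
    """Full segmentation: every way to split text into vocabulary words."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            for rest in all_segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


def sentence_log_prob(words, bigram, floor=1e-6):
    """log P(S) under a bigram model; unseen pairs get an assumed floor value."""
    prev, total = "<s>", 0.0
    for word in words:
        total += math.log(bigram.get((prev, word), floor))
        prev = word
    return total


def best_segmentation(text, vocab, bigram):
    """Return the candidate with the largest P(S), or [] if none exists."""
    candidates = all_segmentations(text, vocab)
    if not candidates:
        return []
    return max(candidates, key=lambda ws: sentence_log_prob(ws, bigram))


best = best_segmentation("lowfatsnack", VOCAB, BIGRAM)
```

Log probabilities are summed instead of multiplying raw probabilities so that long sentences do not underflow; the ranking of candidates is unchanged.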

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a data comparison-based information recommendation method, comprising: acquiring first data of a target user and second data of a compared user, wherein the first data and the second data are related to a preconfigured theme, and the first data and the second data are homomorphically encrypted; comparing the size of the first data and the size of the second data by means of a homomorphic operation, and obtaining a ranking result; acquiring, from the ranking result, the position of the target user in the ranking, and returning the ranking result to the target user; if the position of the target user is a preconfigured position, acquiring recommended product information corresponding to the preconfigured theme; and transmitting to the target user the recommended product information corresponding to the preconfigured theme. The present application further provides a data comparison-based information recommendation device, and a storage medium. The present application protects private data of a user, and accurately performs personalized recommendation.

Description

基于数据比较的信息推荐方法、装置及存储介质Information recommendation method, device and storage medium based on data comparison
本申请要求于2019年7月5日提交中国专利局、申请号为201910605697.8,发明名称为“基于数据比较的信息推荐方法、装置及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on July 5, 2019, the application number is 201910605697.8, and the invention title is "Data comparison-based information recommendation method, device and storage medium", the entire content of which is incorporated by reference Incorporated in this application.
技术领域Technical field
本申请涉及大数据技术领域,尤其涉及一种基于数据比较的信息推荐方法、装置及计算机可读存储介质。This application relates to the field of big data technology, and in particular to an information recommendation method, device and computer-readable storage medium based on data comparison.
背景技术Background technique
近年以来,随着朋友圈、微博等社交平台的出现,人们越来越喜欢通过这些社交平台进行信息分享,诸如晒工资、晒消费、晒体重、晒年龄等排名分享行为。发明人意识到这种排名分享行为将用户的个人数据与他人的数据进行比较,进而得到比较结果进行公布,容易造成用户个人隐私数据的泄露,为用户带来信息安全的问题。另一方面,当无法获取用户个人隐私数据时,也无法有效的针对对用户进行个性化的产品推荐。因此,如何能够既保护用户的个人信息安全且又能够准确地进行个性化推荐是一个亟待解决的问题。In recent years, with the emergence of social platforms such as Moments of Friends and Weibo, people increasingly like to share information through these social platforms, such as ranking sharing behaviors such as salary posting, posting consumption, posting weight, and posting age. The inventor realizes that this ranking sharing behavior compares the user's personal data with other people's data, and then obtains the comparison result for publication, which is likely to cause the leakage of the user's personal privacy data and cause information security problems for the user. On the other hand, when the user's personal privacy data cannot be obtained, it is also impossible to effectively recommend personalized products to the user. Therefore, how to protect the user's personal information security and accurately perform personalized recommendations is an urgent problem to be solved.
发明内容Summary of the invention
本申请提供一种基于数据比较的信息推荐方法、装置及计算机可读存储介质,其主要目的在于不仅能够保护用户的隐私数据,而且能够准确地进行个性化推荐。This application provides an information recommendation method, device, and computer-readable storage medium based on data comparison, the main purpose of which is not only to protect the user's private data, but also to accurately perform personalized recommendations.
为实现上述目的,本申请还提供一种基于数据比较的信息推荐方法,该方法包括:To achieve the above objective, this application also provides an information recommendation method based on data comparison, which includes:
获取目标用户的第一数据和比较用户的第二数据,所述第一数据和所述第二数据是有关预设主题的数据,且所述第一数据和所述第二数据是经过同态加密的数据;Obtaining first data of a target user and second data of a comparison user, wherein the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
通过同态操作比较所述第一数据和所述第二数据的大小,得到排序结果;Comparing the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result;
从所述排序结果中获取所述目标用户的排序,以及将所述排序结果返回至所述目标用户;Acquiring the ranking of the target user from the ranking result, and returning the ranking result to the target user;
若所述目标用户的排序为预设排序,获取与所述预设主题对应的推荐产品信息;If the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme;
向所述目标用户发送与所述预设主题对应的推荐产品信息。Sending recommended product information corresponding to the preset theme to the target user.
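The comparison step above can be sketched with an additively homomorphic scheme. The following is a toy Paillier illustration, not the implementation of this application: the primes, the function names, and the choice that the key holder decrypts only the encrypted difference are assumptions made for the example. It follows the described rule of adding the negative of the first data to the second data and reading the sign of the result.

```python
import math
import secrets

# Toy primes (the 10,000th and 100,000th primes); far too small for real use.
P, Q = 104729, 1299709


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)) for toy Paillier."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (may be negative) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Decrypt c and map the result into the signed range around zero."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = ((u - 1) // n * mu) % n
    return m - n if m > n // 2 else m


def compare_encrypted(n, priv, c_first, c_second):
    """Compute Enc(second - first) on ciphertexts, then read only its sign."""
    n2 = n * n
    c_neg_first = pow(c_first, -1, n2)       # Enc(-first)
    c_diff = (c_second * c_neg_first) % n2   # Enc(second - first)
    diff = decrypt(n, priv, c_diff)
    if diff > 0:
        return "first < second"
    if diff < 0:
        return "first > second"
    return "first == second"


n, priv = keygen()
c_first = encrypt(n, 3200)   # e.g. user A's six-month consumption amount
c_second = encrypt(n, 4500)  # e.g. user B's six-month consumption amount
result = compare_encrypted(n, priv, c_first, c_second)
```

In this sketch the party holding the private key learns the difference between the two values, not only its sign; a real deployment would need additional blinding or a dedicated secure comparison protocol, which is outside what the text specifies.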
可选地,所述获取与所述预设主题对应的推荐产品信息包括:Optionally, the obtaining recommended product information corresponding to the preset theme includes:
获取所述预设主题的相关信息;Acquiring relevant information about the preset theme;
将所述预设主题的相关信息映射到预设BOW模型的目标词典,得到目标直方图特征向量,所述目标词典为通过训练样本进行聚类处理得到的;Mapping the relevant information of the preset theme to the target dictionary of the preset BOW model to obtain the target histogram feature vector, the target dictionary being obtained by clustering processing of training samples;
将所述目标直方图特征向量输入至用于构建所述预设BOW模型的朴素贝叶斯分类器,通过所述朴素贝叶斯分类器对所述预设主题的相关信息进行分类,得到所述预设主题的相关信息的所属类别;Inputting the target histogram feature vector into the naive Bayes classifier used to construct the preset BOW model, and classifying the related information of the preset theme by the naive Bayes classifier to obtain the category to which the related information of the preset theme belongs;
获取所述预设主题的相关信息的所属类别对应的待推荐产品信息;Obtaining the product information to be recommended corresponding to the category of the related information of the preset theme;
确定与所述预设主题的相关信息的所属类别对应的待推荐产品信息为所述预设主题对应的推荐产品信息。It is determined that the product information to be recommended corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme.
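The optional steps above (mapping the related information onto the cluster-center dictionary to obtain a histogram feature vector, classifying it with naive Bayes, and looking up the products preset for the resulting category) can be sketched as follows. This is an illustrative sketch under stated assumptions: the cluster centers are taken as already computed, a multinomial naive Bayes variant with add-one smoothing is used in place of whatever variant the application intends, and the feature points, category labels, and product table are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_histogram(points, centers):
    """Count how many feature points fall nearest to each cluster center."""
    counts = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
        counts[nearest] += 1
    return counts


def train_naive_bayes(histograms, labels, alpha=1.0):
    """Multinomial naive Bayes with additive (Laplace) smoothing alpha."""
    model = {}
    k = len(histograms[0])
    for c in sorted(set(labels)):
        rows = [h for h, y in zip(histograms, labels) if y == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * k
        model[c] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_like": [math.log((t + alpha) / denom) for t in totals],
        }
    return model


def classify(model, hist):
    """Return the class with the highest posterior log score."""
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(n * ll for n, ll in zip(hist, m["log_like"]))
    return max(model, key=score)


# Hypothetical dictionary (cluster centers), training data, and product table.
centers = [(0.0, 0.0), (10.0, 10.0)]
train_hists = [[5, 1], [4, 0], [1, 5], [0, 6]]
train_labels = ["budget", "budget", "premium", "premium"]
products_by_category = {"budget": ["m electronic product"],
                        "premium": ["n electronic product"]}

model = train_naive_bayes(train_hists, train_labels)
target_hist = to_histogram([(9.0, 9.0), (11.0, 10.0), (1.0, 0.0)], centers)
category = classify(model, target_hist)
recommended = products_by_category[category]
```

Here the histogram has one bin per cluster center, so the dictionary size fixes the length of the feature vector regardless of how many items the user's related information contains.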
此外,为实现上述目的,本申请还提供一种基于数据比较的信息推荐装置,该装置包括存储器和处理器,所述存储器中存储有可在所述处理器上运行的基于数据比较的信息推荐程序,所述基于数据比较的信息推荐程序被所述处理器执行时实现如下步骤:In addition, in order to achieve the above object, the present application also provides an information recommendation device based on data comparison. The device includes a memory and a processor. The memory stores information recommendation based on data comparison that can run on the processor. A program, when the information recommendation program based on data comparison is executed by the processor, the following steps are implemented:
获取目标用户的第一数据和比较用户的第二数据,所述第一数据和所述第二数据是有关预设主题的数据,且所述第一数据和所述第二数据是经过同态加密的数据;Obtaining first data of a target user and second data of a comparison user, wherein the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
通过同态操作比较所述第一数据和所述第二数据的大小,得到排序结果;Comparing the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result;
从所述排序结果中获取所述目标用户的排序,以及将所述排序结果返回至所述目标用户;Acquiring the ranking of the target user from the ranking result, and returning the ranking result to the target user;
若所述目标用户的排序为预设排序,获取与所述预设主题对应的推荐产品信息;If the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme;
向所述目标用户发送与所述预设主题对应的推荐产品信息。Sending recommended product information corresponding to the preset theme to the target user.
此外,为实现上述目的,本申请还提供一种计算机设备,包括:In addition, in order to achieve the above objective, this application also provides a computer device, including:
一个或多个处理器;One or more processors;
存储器;Memory
一个或多个计算机程序,其中所述一个或多个计算机程序被存储在所述存储器中并被配置为由所述一个或多个处理器执行,所述一个或多个计算机程序配置用于执行一种基于数据比较的信息推荐方法,其中,所述基于数据比较的信息推荐方法包括:One or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute An information recommendation method based on data comparison, wherein the information recommendation method based on data comparison includes:
获取目标用户的第一数据和比较用户的第二数据,所述第一数据和所述第二数据是有关预设主题的数据,且所述第一数据和所述第二数据是经过同态加密的数据;Obtaining first data of a target user and second data of a comparison user, wherein the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data;
通过同态操作比较所述第一数据和所述第二数据的大小,得到排序结果;Comparing the sizes of the first data and the second data through a homomorphic operation to obtain a sorting result;
从所述排序结果中获取所述目标用户的排序,以及将所述排序结果返回至所述目标用户;Acquiring the ranking of the target user from the ranking result, and returning the ranking result to the target user;
若所述目标用户的排序为预设排序,获取与所述预设主题对应的推荐产品信息;If the ranking of the target user is a preset ranking, obtaining recommended product information corresponding to the preset theme;
向所述目标用户发送与所述预设主题对应的推荐产品信息。Sending recommended product information corresponding to the preset theme to the target user.
此外,为实现上述目的,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有基于数据比较的信息推荐程序,所述基于数据比较的信息推荐程序可被一个或者多个处理器执行,以实现如上所述的基于数据比较的信息推荐方法的步骤。In addition, in order to achieve the above-mentioned object, the present application also provides a computer-readable storage medium that stores an information recommendation program based on data comparison, and the information recommendation program based on data comparison can be used by one or A plurality of processors are executed to implement the steps of the information recommendation method based on data comparison as described above.
本申请提出的基于数据比较的信息推荐方法、装置及计算机可读存储介质,获取目标用户的第一数据和比较用户的第二数据,所述第一数据和所述第二数据是有关预设主题的数据,且所述第一数据和所述第二数据是经过同态加密的数据;通过同态操作比较所述第一数据和所述第二数据的大小,得到排序结果;从所述排序结果中获取所述目标用户的排序,以及将所述排序结果返回至所述目标用户;若所述目标用户的排序为预设排序,获取与所述预设主题对应的推荐产品信息;向所述目标用户发送与所述预设主题对应的推荐产品信息。由于目标用户的第一数据和其他用户的第二数据是经过加密的数据,且通过同态操作进行数据比较,因此,本申请在数据比较的同时保护数据的细节不被公开;同时,由于在未获取用户数的细节的情况下仍能准确的进行排序,进而根据用户的排序进行个性化推荐,因此本申请实现了不仅能够保护用户的隐私数据,而且能够准确地进行个性化推荐的目的。The information recommendation method, device, and computer-readable storage medium based on data comparison proposed in this application acquire first data of a target user and second data of a comparison user, wherein the first data and the second data are data related to a preset theme, and the first data and the second data are homomorphically encrypted data; the sizes of the first data and the second data are compared through a homomorphic operation to obtain a sorting result; the ranking of the target user is acquired from the sorting result, and the sorting result is returned to the target user; if the ranking of the target user is a preset ranking, recommended product information corresponding to the preset theme is acquired; and the recommended product information corresponding to the preset theme is sent to the target user. Since the first data of the target user and the second data of other users are encrypted data, and the data is compared through homomorphic operations, this application protects the details of the data from being disclosed while comparing the data; at the same time, because users can still be sorted accurately without obtaining the details of the user data, and personalized recommendation can then be made according to the user's ranking, this application achieves the purpose of both protecting the user's private data and accurately performing personalized recommendation.
Description of the drawings
FIG. 1 is a schematic flowchart of an information recommendation method based on data comparison provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the internal structure of an information recommendation device based on data comparison provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the modules of the information recommendation program based on data comparison in the information recommendation device based on data comparison provided by an embodiment of this application.
Detailed description
This application provides an information recommendation method based on data comparison. FIG. 1 is a schematic flowchart of the information recommendation method based on data comparison provided by an embodiment of this application. The method may be executed by a device, and the device may be implemented in software and/or hardware.
Optionally, the device is a data comparison center built mainly on the big data technologies Hadoop and Spark. Hadoop consists of two parts, HDFS (responsible for cluster storage management) and YARN (responsible for system resource scheduling), while Spark handles the specific computation logic.
Preferably, the data comparison center applies network security protection measures on top of a big data architecture built on cloud computing.
In an optional embodiment, the network security protection measures may include:
(1) East-west traffic monitoring. With virtual firewall technology, all traffic passes through a virtual firewall, which forwards the data to the target virtual host. This isolates, controls, and inspects traffic between different virtual machines on the same physical host and between different physical hosts. Each virtual firewall provides the same functions as a physical firewall and can be divided into different security zones such as Trust, Untrust, Local, and DMZ (demilitarized zone). Security policies can be flexibly configured in advance for each security zone to control the flow of data packets and provide security protection. By default, the networks of different virtual firewalls are not connected to each other, which to some extent solves the problem of controlling lateral traffic on core devices.
(2) Deployment of IDS/IPS intrusion detection and prevention devices. An IPS intrusion prevention device is deployed between the platform's core router and the egress firewall, and an IDS intrusion detection device is attached in bypass mode on the core switch side. Together they defend the application layer, for example by blocking worms, viruses, Trojans, denial-of-service attacks, spyware, VoIP attacks, and peer-to-peer application abuse, cutting off malicious traffic before damage occurs and protecting against external application-layer attacks.
In this embodiment, the information recommendation method based on data comparison includes:
Step S101: acquire first data of a target user and second data of a comparison user, where the first data and the second data are data related to a preset theme and are both homomorphically encrypted.
In this embodiment, the target user is the user whose data is to be compared. There may be multiple comparison users, in which case the second data comprises the second data of each comparison user.
The preset theme may be, for example, the amount spent over a period of time, height, or age.
For example, the target user is user A and the comparison user is user B; the first data of the target user is user A's first cumulative spending over the past six months, and the second data of the comparison user is user B's second cumulative spending over the past six months.
In this embodiment, both the first data and the second data are homomorphically encrypted. In an optional embodiment, the first data is homomorphically encrypted on the target user's client and then transmitted to the data comparison center, and the second data is homomorphically encrypted on the comparison user's client and then transmitted to the data comparison center.
Homomorphic encryption works as follows: a given plaintext (x_1, x_2, …, x_n) is encrypted with a homomorphic encryption algorithm to obtain a ciphertext c. Fully homomorphic encryption allows anyone to perform any operation f on the ciphertext c, and decrypting the resulting ciphertext f(c) yields the same result as f(x_1, x_2, …, x_n). Throughout this process, (x_1, x_2, …, x_n), f(x_1, x_2, …, x_n), and any intermediate plaintext are never leaked; the input, output, and intermediate values remain encrypted at all times. Different schemes place different requirements on the final ciphertext form of f(x_1, x_2, …, x_n). The minimum requirement is that it decrypts correctly to f(x_1, x_2, …, x_n), and the different ciphertext computation properties that a scheme satisfies give rise to different forms of homomorphic encryption.
Homomorphic encryption includes partially homomorphic encryption and fully homomorphic encryption. Partially homomorphic encryption means that the encryption satisfies either additive homomorphism or multiplicative homomorphism. The RSA algorithm is multiplicatively homomorphic, and the Paillier algorithm is additively homomorphic.
For example, in the RSA algorithm the public key is (e, N), and the encryption of a plaintext M is C = E(M) = M^e mod N.
For any M_1 and M_2:
E(M_1) * E(M_2) = M_1^e * M_2^e mod N = (M_1 * M_2)^e mod N = E(M_1 * M_2)
That is, for any plaintexts M_1, M_2, …, M_n:
E(M_1) * E(M_2) * … * E(M_n) = E(M_1 * M_2 * … * M_n), so the RSA algorithm satisfies multiplicative homomorphism.
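The multiplicative property can be checked numerically. The following is a minimal sketch using textbook RSA without padding; the tiny primes, the exponent, and the function names `encrypt` and `decrypt` are illustrative assumptions and are far too small to be secure.

```python
# Toy textbook RSA (no padding) illustrating E(M1) * E(M2) = E(M1 * M2) mod N.
# The primes and exponent below are illustrative assumptions, not secure values.
p, q = 61, 53
N = p * q                              # public modulus
e = 17                                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)


def encrypt(m):
    """Encrypt plaintext m with the public key (e, N)."""
    return pow(m, e, N)


def decrypt(c):
    """Decrypt ciphertext c with the private exponent d."""
    return pow(c, d, N)


m1, m2 = 7, 11
# Multiply the two ciphertexts without ever decrypting them.
c_product = (encrypt(m1) * encrypt(m2)) % N
```

Decrypting `c_product` gives `m1 * m2`, provided the product is smaller than N. Padded RSA as used in practice deliberately removes this property, so the sketch applies only to the unpadded form described above.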
In an optional embodiment of this application, the first data and the second data may be encrypted using the RSA asymmetric encryption algorithm. Specifically, the first data and the second data may be encrypted with a public key provided by the data comparison center.
Step S201: compare the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result.
In this embodiment, a homomorphic operation is performed on the first data and the second data. For example, the homomorphic operation adds a standard number to, or multiplies a standard number by, each of the first data and the second data, and the magnitudes of the results are then compared.
In this embodiment, the ranking result indicates which of the first data and the second data is larger and which is smaller.
When the second data comprises data of multiple comparison users, the first data is compared with each of those data items to obtain the ranking result.
Optionally, in another embodiment of this application, comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result includes:
adding the first data to the negative of the second data to obtain a first calculation result; if the first calculation result is positive, obtaining a ranking result in which the first data is greater than the second data, and if the first calculation result is negative, obtaining a ranking result in which the first data is less than the second data; or
adding the negative of the first data to the second data to obtain a second calculation result; if the second calculation result is positive, obtaining a ranking result in which the first data is less than the second data, and if the second calculation result is negative, obtaining a ranking result in which the first data is greater than the second data.
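Adding one value to the negative of another on ciphertexts requires an additively homomorphic scheme. The sketch below uses a toy Paillier implementation, the additive scheme named earlier in the text. The key sizes, the function names, and the assumption that a key holder decrypts only the difference (never the individual values) are illustrative; the source does not specify who performs the sign check.

```python
import math
import random

# Toy Paillier parameters; the primes are illustrative and insecure.
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)                                # valid because g = n + 1


def encrypt(m):
    """Encrypt an integer m (negative values are stored modulo n)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2


def decrypt_signed(c):
    """Decrypt c and map residues above n/2 back to negative integers."""
    m = ((pow(c, lam, n2) - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def encrypted_difference(c_first, c_second):
    """Return a ciphertext of (first - second) without decrypting the inputs."""
    return (c_first * pow(c_second, -1, n2)) % n2


def compare(c_first, c_second):
    """Report how the first value relates to the second from the sign of the difference."""
    diff = decrypt_signed(encrypted_difference(c_first, c_second))
    return "greater" if diff > 0 else "less" if diff < 0 else "equal"
```

For example, `compare(encrypt(5200), encrypt(4800))` reports that the first value is greater. The sketch only works while the absolute difference stays below n / 2, and the decrypted difference still reveals how far apart the two values are, which a real design would need to address.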
Step S301: obtain the target user's rank from the ranking result, and return the ranking result to the target user.
For example, the target user's rank obtained from the ranking result may be first, second, or third.
In an optional embodiment, returning the ranking result to the target user includes returning it to the target user's client, which can share and display the ranking result when the user performs a sharing operation.
Optionally, in another embodiment of this application, returning the ranking result to the target user includes:
encrypting the ranking result with a public key received from the target user to obtain an encrypted ranking result;
returning the encrypted ranking result to the target user.
In this embodiment, the ranking result may be encrypted with the target user's public key before it is returned. After receiving the encrypted ranking result, the target user's client decrypts it with the private key to obtain the actual ranking result, which improves security during data transmission.
Step S401: if the target user's rank is a preset rank, acquire recommended product information corresponding to the preset theme.
In an optional embodiment, the target user's rank being a preset rank includes: the target user ranks first.
In another optional embodiment, the target user's rank being a preset rank includes: the target user ranks in the top third.
In yet another optional embodiment, the target user's rank being a preset rank includes: the target user ranks in the top third or the bottom third.
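The three preset-rank variants above can be expressed as a small check. This is a sketch under assumptions: the function names and rule labels are hypothetical, rank 1 is taken to mean the largest value, and ties are ignored.

```python
def rank_of_target(target_is_greater):
    """Rank of the target user (1 = largest value) from pairwise outcomes.

    target_is_greater holds one boolean per comparison user: True when the
    target user's value was found to be greater than that user's value.
    """
    return 1 + sum(1 for greater in target_is_greater if not greater)


def matches_preset_rank(rank, total_users, rule):
    """Check a rank against one of the preset-rank rules described in the text."""
    if rule == "first":
        return rank == 1
    if rule == "top_third":
        return rank * 3 <= total_users
    if rule == "top_or_bottom_third":
        return rank * 3 <= total_users or rank * 3 > 2 * total_users
    raise ValueError(f"unknown rule: {rule}")


# Eight comparison users; exactly one of them is ahead of the target user.
outcomes = [True, True, False, True, True, True, True, True]
rank = rank_of_target(outcomes)
total = len(outcomes) + 1          # comparison users plus the target user
```

With these outcomes the target user is second of nine, which satisfies the top-third rule but not the first-place rule.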
The recommended product information corresponding to the preset theme may be preset. For example, if the preset recommended products corresponding to the preset theme are low-fat snacks and low-fat drinks, then when the user's body fat percentage ranks lowest, low-fat snacks and low-fat drinks are recommended to the user.
In other embodiments of this application, different product information may also be recommended depending on the preset theme and the target user's rank.
Optionally, in another embodiment of this application, acquiring the recommended product information corresponding to the preset theme includes:
acquiring information related to the preset theme;
mapping the information related to the preset theme to a target dictionary of a preset BOW (bag-of-words) model to obtain a target histogram feature vector, the target dictionary being obtained by clustering training samples;
inputting the target histogram feature vector into the naive Bayes classifier used to construct the preset BOW model, and classifying the information related to the preset theme with the naive Bayes classifier to obtain the category to which that information belongs;
acquiring the product information to be recommended that corresponds to the category to which the information related to the preset theme belongs;
determining that the product information to be recommended that corresponds to that category is the recommended product information corresponding to the preset theme.
In this embodiment, the information related to the preset theme is any information associated with the preset theme.
For example, if the preset theme is a consumption theme, the information related to it includes consumption history records (such as information about goods purchased at different times and places).
In this embodiment, the BOW model is built in advance. Specifically, the BOW model is constructed with a clustering algorithm (such as the k-means algorithm) and a naive Bayes classifier.
In an optional embodiment, the preset BOW model can be constructed as follows:
(1) Use a clustering algorithm (for example, k-means) to cluster the large data set and find the cluster centers (that is, the vocabulary). Clustering follows the similarity principle: data objects with high similarity are assigned to the same cluster, and data objects with high dissimilarity are assigned to different clusters. In k-means, k is the number of clusters and "means" refers to the mean of the data objects in a cluster (this mean describes the cluster center), hence the name k-means. It is a partition-based clustering algorithm that uses distance as the measure of similarity between data objects: the smaller the distance between two data objects, the higher their similarity and the more likely they belong to the same cluster. In the embodiments of this application, the distance between data objects is the Euclidean distance. Let x_i and x_j be data objects and let D be the number of attributes of a data object; the distance between them is:
$$d(x_i, x_j) = \sqrt{\sum_{d=1}^{D} \left(x_{i,d} - x_{j,d}\right)^2}$$
where x_{i,d} is the d-th coordinate of the i-th point and x_{j,d} is the d-th coordinate of the j-th point.
The center of the k-th cluster is defined as Center_k and is updated as follows:
Figure PCTCN2020086286-appb-000003
where C_k is the number of data objects in the k-th cluster and Center_k is a vector with D attributes.
Finally, the sum-of-squared-errors criterion function is used to obtain the final clustering result J:
Figure PCTCN2020086286-appb-000004
The training data are mapped to the cluster centers, giving each training sample a low-dimensional representation in the cluster-center space. The final clustering result J serves as the basis of the histogram: other vectors are constructed from these basis vectors and mapped onto them to obtain histogram statistics for each category. This process is also the feature extraction step of the BOW model.
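The clustering and histogram mapping just described can be sketched in a few lines. This is a minimal illustration under assumptions: the function names are hypothetical, the initial centers are supplied by the caller, and the center update and squared-error criterion use the standard k-means forms (mean of the cluster members, sum of squared Euclidean distances), since the source shows those equations only as images.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points with the same number of attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)), key=lambda k: euclidean(point, centers[k]))


def kmeans(points, centers, iterations=10):
    """Alternate assignment and mean update; return the centers and the squared error."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for point in points:
            clusters[nearest_center(point, centers)].append(point)
        # Each center becomes the mean of its members; empty clusters keep their center.
        centers = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    sse = sum(euclidean(p, centers[k]) ** 2 for k, members in enumerate(clusters) for p in members)
    return centers, sse


def bow_histogram(features, centers):
    """Count how many features fall nearest to each center (the dictionary)."""
    histogram = [0] * len(centers)
    for feature in features:
        histogram[nearest_center(feature, centers)] += 1
    return histogram


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, sse = kmeans(points, [(0, 0), (10, 10)])
```

Calling `bow_histogram` on new features with the learned `centers` yields the histogram feature vector that is later passed to the classifier.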
After the low-dimensional representation of each training sample is obtained, a multinomial naive Bayes classifier is selected for training. Naive Bayes is a low-variance, high-bias classifier that assumes conditional independence among features: given the class, all features are mutually independent. For a given sample x = (x_1, x_2, …, x_d)^T, the posterior probability that it belongs to class w_i is:
Figure PCTCN2020086286-appb-000005
where d is the feature dimension and x_k is the value of the sample on the k-th feature. To avoid the data sparsity problem, smoothing can first be applied to the data:
Figure PCTCN2020086286-appb-000006
where c_k is the number of possible values of the k-th feature and α is a coefficient. Using maximum likelihood estimation (MLE), this application obtains:
Figure PCTCN2020086286-appb-000007
where D_i is the set of training samples of class w_i, and the numerator
Figure PCTCN2020086286-appb-000008
is the number of samples in D_i whose k-th feature takes the value x_k.
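A count-based naive Bayes classifier with additive smoothing can be sketched as follows. The smoothed estimate used here, (count + α) / (|D_i| + α · c_k), is the common Laplace form and is an assumption, because the source gives its smoothing formula only as an image. The function names and the category labels in the usage lines are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(samples, labels, alpha=1.0):
    """Collect the counts needed for smoothed class-conditional probabilities."""
    n_features = len(samples[0])
    # c_k: number of distinct values observed for the k-th feature.
    value_counts = [len({s[k] for s in samples}) for k in range(n_features)]
    class_sizes = Counter(labels)                 # |D_i| for each class w_i
    feature_counts = Counter()                    # (class, k, value) -> count
    for sample, label in zip(samples, labels):
        for k, value in enumerate(sample):
            feature_counts[(label, k, value)] += 1
    return {"alpha": alpha, "value_counts": value_counts,
            "class_sizes": class_sizes, "feature_counts": feature_counts,
            "total": len(samples)}


def predict(model, x):
    """Return the class with the highest log posterior under feature independence."""
    best_label, best_score = None, -math.inf
    for label, size in model["class_sizes"].items():
        score = math.log(size / model["total"])   # log prior P(w_i)
        for k, value in enumerate(x):
            num = model["feature_counts"][(label, k, value)] + model["alpha"]
            den = size + model["alpha"] * model["value_counts"][k]
            score += math.log(num / den)          # smoothed log P(x_k | w_i)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical histogram feature vectors and categories.
histograms = [(2, 0), (3, 0), (0, 2), (0, 3)]
categories = ["electronics", "electronics", "food", "food"]
model = train_naive_bayes(histograms, categories)
```

The smoothing coefficient must be positive so that feature values never seen with a class still receive a non-zero probability.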
In this embodiment, after the preset BOW model has been constructed and the information related to the preset theme has been acquired, that information is mapped to the target dictionary of the preset BOW model. The target dictionary is the cluster-center space obtained by clustering when the BOW model was constructed.
In this embodiment, the product information to be recommended that corresponds to each category may be preset; that is, a correspondence between categories and product information to be recommended is configured in advance. Once the category of the information related to the preset theme has been obtained, the product information to be recommended for that category is looked up.
Optionally, in another embodiment of this application, determining that the product information corresponding to the category of the information related to the preset theme is the recommended product information corresponding to the preset theme includes:
extracting a word-frequency feature vector from the information related to the preset theme to obtain a word-frequency vector;
calculating the similarity between the word-frequency vector and the product information to be recommended;
determining that the product information to be recommended whose similarity with the word-frequency vector exceeds a preset similarity is the recommended product information corresponding to the preset theme.
In this embodiment, the similarity can be calculated as the cosine similarity.
Cosine similarity uses the cosine of the angle between two vectors in a vector space to measure the difference between two individuals. The closer the cosine is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are. For the information related to the theme shared by the customer and the recommended product information, the similarity is calculated as follows:
$$\cos\theta = \frac{\sum_{i} X_i Y_i}{\sqrt{\sum_{i} X_i^2}\,\sqrt{\sum_{i} Y_i^2}}$$
where X is the vector representation of the information related to the shared theme, Y is the vector representation of the recommended product information, X_i is a component of X, and Y_i is a component of Y.
The similarity obtained from this formula ranges from -1 to 1: -1 means the two vectors point in exactly opposite directions, 1 means they point in exactly the same direction, and 0 usually means they are independent.
In this embodiment, the similarity is judged from the calculated value, and recommended product information with high similarity is recommended to the target user, so that the recommended products suit the user better.
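The similarity filter can be illustrated with a short sketch. The function names, the product names, and the threshold are hypothetical, and the vectors are assumed to have already been built over a shared vocabulary.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def recommend(topic_vector, candidates, threshold):
    """Keep the candidate products whose similarity exceeds the preset threshold."""
    return [name for name, vector in candidates.items()
            if cosine_similarity(topic_vector, vector) > threshold]


# Hypothetical word-frequency vectors over a shared three-word vocabulary.
topic = [1, 1, 0]
candidates = {"low-fat snack": [2, 2, 0], "umbrella": [0, 0, 1]}
selected = recommend(topic, candidates, threshold=0.8)
```

Here only the first candidate points in the same direction as the topic vector, so it is the only one kept.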
Optionally, in another embodiment of this application, before the information related to the preset theme is mapped to the target dictionary of the preset BOW model, the method further includes:
performing text processing on the information related to the preset theme, where the text processing includes segmenting that information into words with a hidden Markov model and rewriting the segmented text with a preset keyword extraction algorithm.
In this embodiment, the information related to the preset theme is first text-processed, and the processed information is then mapped to the target dictionary of the preset BOW model.
Text rewriting (Rewrite) means that a text is first segmented into Chinese words, then cleaned so that only the main words are retained, and the main words are then semantically enhanced (supplemented with synonyms and related words).
First, this application segments the information related to the preset theme by constructing a hidden Markov model. Text satisfies the Markov property: the probability that the m-th word appears depends only on a limited number of words immediately preceding it and not on any other words before or after it. The purpose of the N-gram model is therefore to give the probability of the m-th word given the words that precede it, expressed as:
P(W_m | W_1, …, W_{m-1}) = P(W_m | W_{m-n+1}, …, W_{m-1})
where m is the position of an arbitrary word in the text and n is the order of the N-gram model, so that the m-th word is conditioned on the n-1 words before it.
If a sentence S consists of the word sequence {W_1, W_2, …, W_m}, the probability of the sentence with the words in this order is:
P(S) = P(W_1 W_2 … W_m) = P(W_1) P(W_2 | W_1) … P(W_m | W_{m-n+1}, …, W_{m-1})
where the conditional probability P(W_m | W_{m-n+1}, …, W_{m-1}) is the probability that W_m appears given that the string W_{m-n+1}, …, W_{m-1} has appeared. After training on a large-scale corpus, a bigram model is used, so the probability model of the sentence is:
$$P(S) = P(W_1) \prod_{i=2}^{m} P(W_i \mid W_{i-1})$$
The sentence S is segmented with the full segmentation method to obtain all possible Chinese word segmentations. The probability of each segmentation is then calculated, and the segmentation with the highest probability is selected as the final segmentation result. The selection amounts to finding the maximum of P(S):
Figure PCTCN2020086286-appb-000011
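Full segmentation followed by selection of the most probable candidate under a bigram model can be sketched as follows. The vocabulary, the bigram probabilities, the start marker `<s>`, the probability floor for unseen pairs, and the function names are all hypothetical; English tokens are used only to keep the example readable.

```python
import math


def segmentations(sentence, vocab):
    """Enumerate every way to split the sentence into words from vocab."""
    if not sentence:
        return [[]]
    results = []
    for end in range(1, len(sentence) + 1):
        word = sentence[:end]
        if word in vocab:
            for rest in segmentations(sentence[end:], vocab):
                results.append([word] + rest)
    return results


def log_probability(words, bigram, floor=1e-6):
    """Sum of log P(W_i | W_{i-1}), starting from the marker <s>."""
    total, previous = 0.0, "<s>"
    for word in words:
        total += math.log(bigram.get((previous, word), floor))
        previous = word
    return total


def best_segmentation(sentence, vocab, bigram):
    """Pick the candidate segmentation with the highest bigram probability."""
    candidates = segmentations(sentence, vocab)
    if not candidates:
        return [sentence]          # no dictionary split exists; keep the text whole
    return max(candidates, key=lambda words: log_probability(words, bigram))


vocab = {"low", "fat", "lowfat", "snack"}
bigram = {("<s>", "low"): 0.4, ("low", "fat"): 0.5, ("fat", "snack"): 0.5,
          ("<s>", "lowfat"): 0.1, ("lowfat", "snack"): 0.2}
words = best_segmentation("lowfatsnack", vocab, bigram)
```

Enumerating every segmentation grows exponentially with sentence length, so a practical system would use dynamic programming; the brute-force form is kept here because it mirrors the description most directly.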
Because the information related to the preset theme contains statements unrelated to the theme, this application performs keyword extraction after segmenting the text with the hidden Markov model.
The keyword extraction algorithm uses statistical information, word-vector information, and dependency-syntax information between words. It builds a dependency graph to calculate the strength of association between words and uses the TextRank algorithm to compute word importance scores iteratively. First, an undirected graph over all non-stop words is constructed from the dependency parse of the sentence; the edge weights are then calculated from the attraction between words and their degree of dependency association. The degree of dependency association of any two words W_i and W_j is:
Figure PCTCN2020086286-appb-000012
where len(W_i, W_j) is the length of the dependency path between words W_i and W_j, and b is a hyperparameter.
In addition, the IDF value is introduced and the word frequency is replaced with the TF-IDF value so that more global information is taken into account. This yields a new formula for the attraction between words. The attraction between text words W_i and W_j is:
Figure PCTCN2020086286-appb-000013
where tfidf(W) is the TF-IDF value of word W, and d is the Euclidean distance between the word vectors of W_i and W_j.
The degree of association between the two words is therefore:
weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
Finally, this application uses the TextRank algorithm to build an undirected graph G = (V, E), where V is the set of vertices and E is the set of edges, and calculates the score WS(W_i) of vertex W_i according to the following formula:
Figure PCTCN2020086286-appb-000015
where
Figure PCTCN2020086286-appb-000014
is the set of vertices related to vertex W_i (the vertices pointing to it), η is the damping coefficient, W_k denotes a vertex in the undirected graph G, and WS(W_j) is the score of vertex W_j. In this embodiment, the several words with the highest scores can be selected as the main words, and the main words are then semantically enhanced.
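The iterative scoring can be sketched with the commonly used weighted TextRank update, WS(W_i) = (1 − η) + η · Σ_j [weight(W_j, W_i) / Σ_k weight(W_j, W_k)] · WS(W_j). That update form is an assumption, since the source shows its formula only as an image. The function names and the example graph are hypothetical, and the edge weights are taken as already computed from the dependency and attraction terms.

```python
def textrank(edge_weights, eta=0.85, iterations=50):
    """Score the vertices of an undirected weighted graph with the TextRank update.

    edge_weights maps a pair of words to the positive weight of the edge between them.
    """
    neighbors = {}
    for (a, b), weight in edge_weights.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Every new score is computed from the scores of the previous round.
        scores = {
            word: (1 - eta) + eta * sum(
                weight / sum(neighbors[other].values()) * scores[other]
                for other, weight in adjacent.items()
            )
            for word, adjacent in neighbors.items()
        }
    return scores


def top_keywords(scores, count):
    """Return the count highest-scoring words, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:count]


# Hypothetical graph in which "fat" is linked to every other word.
edges = {("fat", "snack"): 1.0, ("fat", "drink"): 1.0, ("fat", "low"): 1.0}
scores = textrank(edges)
```

In this small graph the hub word receives the highest score, so `top_keywords(scores, 1)` selects it as the main word.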
Step S501: send the recommended product information corresponding to the preset theme to the target user.
For example, if the recommended product information corresponding to the preset consumption theme is information about electronic product m and electronic product n, then information about both products is sent to the user.
After the recommended product information corresponding to the preset theme has been acquired, it is sent to the target user, so that information can be recommended to the target user accurately.
The information recommendation method based on data comparison proposed in this embodiment acquires first data of a target user and second data of a comparison user, where the first data and the second data relate to a preset theme and are both homomorphically encrypted; compares the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result; obtains the target user's rank from the ranking result and returns the ranking result to the target user; if the target user's rank is a preset rank, acquires recommended product information corresponding to the preset theme; and sends that recommended product information to the target user. Because the first data of the target user and the second data of the other users are encrypted and are compared through homomorphic operations, this application keeps the details of the data undisclosed while the comparison is performed. Because users can still be ranked accurately without access to the details of their data, and personalized recommendations are then made according to each user's rank, this application both protects users' private data and makes accurate personalized recommendations.
This application also provides an information recommendation device based on data comparison. FIG. 2 is a schematic diagram of the internal structure of the information recommendation device based on data comparison provided by an embodiment of this application.
In this embodiment, the information recommendation device 1 based on data comparison may be a PC (personal computer) or a terminal device such as a smartphone, tablet computer, or portable computer. The device 1 includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
其中,存储器11至少包括一种类型的可读存储介质,所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、磁性存储器、磁盘、光盘等。存储器11在一些实施例中可以是基于数据比较的信息推荐装置1的内部存储单元,例如该基于数据比较的信息推荐装置1的硬盘。存储器11在另一些实施例中也可以是基于数据比较的信息推荐装置1的外部存储设备,例如基于数据比较的信息推荐装置1上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,存储器11还可以既包括基于数据比较的信息推荐装置1的内部存储单元也包括外部存储设备。存储器11不仅可以用于存储安装于基于数据比较的信息推荐装置1的应用软件及各类数据,例如基于数据比较的信息推荐程序01的代码等,还可以用于暂时地存储已经输出或者将要输出的数据。Wherein, the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the information recommendation device 1 based on data comparison, such as a hard disk of the information recommendation device 1 based on data comparison. In some other embodiments, the memory 11 may also be an external storage device of the information recommendation device 1 based on data comparison, such as a plug-in hard disk equipped on the information recommendation device 1 based on data comparison, and a smart media card (SMC). ), Secure Digital (SD) card, Flash Card, etc. Further, the memory 11 may also include both an internal storage unit of the information recommendation device 1 based on data comparison and an external storage device. The memory 11 can not only be used to store application software and various data installed in the information recommendation device 1 based on data comparison, such as the code of the information recommendation program 01 based on data comparison, etc., but also can be used to temporarily store what has been output or will be output The data.
处理器12在一些实施例中可以是一中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器或其他数据处理芯片,用于运行存储器11中存储的程序代码或处理数据,例如执行基于数据比较的信息推荐程序01等。通信总线13用于实现这些组件之间的连接通信。网络接口14可选的可以包括标准的有线接口、无线接口(如WI-FI接口),通常用于在该装置1与其他电子设备之间建立通信连接。The processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip in some embodiments, and is used to run the program code or processing stored in the memory 11 Data, for example, the information recommendation program 01 based on data comparison is executed. The communication bus 13 is used to realize the connection and communication between these components. The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the device 1 and other electronic devices.
可选地,该装置1还可以包括用户接口,用户接口可以包括显示器(Display)、输入单元比如键盘(Keyboard),可选的用户接口还可以包括标准的有线接口、无线接口。可选地,在一些实施例中,显示器可以是LED显示器、液晶显示器、触控式液晶显示器以及有机发光二极管(Organic Light-Emitting Diode,OLED)触摸器等。其中,显示器也可以适当的称为显示屏或显示单元,用于显示在基于数据比较的信息推荐装置1中处理的信息以及用于显示可视化的用户界面。Optionally, the device 1 may also include a user interface. The user interface may include a display (Display) and an input unit such as a keyboard (Keyboard). The optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) touch device, and the like. Among them, the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the information recommendation device 1 based on data comparison and to display a visualized user interface.
图2仅示出了具有组件11-14以及基于数据比较的信息推荐程序01的基于数据比较的信息推荐装置1,本领域技术人员可以理解的是,图2示出的结构并不构成对基于数据比较的信息推荐装置1的限定,可以包括比图示更少或者更多的部件,或者组合某些部件,或者不同的部件布置。Figure 2 only shows the data comparison-based information recommendation device 1 with components 11-14 and the data-comparison-based information recommendation program 01. Those skilled in the art can understand that the structure shown in Figure 2 does not constitute a The limitation of the information recommendation device 1 for data comparison may include fewer or more components than shown, or a combination of certain components, or a different component arrangement.
在图2所示的装置1实施例中,存储器11中存储有基于数据比较的信息推荐程序01;处理器12执行存储器11中存储的基于数据比较的信息推荐程序01时实现如下步骤:In the embodiment of the device 1 shown in FIG. 2, the memory 11 stores the information recommendation program 01 based on data comparison; the processor 12 implements the following steps when executing the information recommendation program 01 based on the data comparison stored in the memory 11:
Obtain first data of a target user and second data of a comparison user, where the first data and the second data are data related to a preset theme and are both homomorphically encrypted.
In this embodiment, the target user is the user whose data is to be compared. There may be multiple comparison users, in which case the second data comprises the second data of each comparison user.
The preset theme may be, for example, the amount spent over a period of time, height, or age.
For example, the target user is user A and the comparison user is user B; the first data of the target user is the first cumulative consumption amount of user A over the past six months, and the second data of the comparison user is the second cumulative consumption amount of user B over the past six months.
In this embodiment, both the first data and the second data are homomorphically encrypted. In an optional embodiment, the first data is homomorphically encrypted on the target user's client and then transmitted to the data comparison center, and the second data is homomorphically encrypted on the comparison user's client and then transmitted to the data comparison center.
Homomorphic encryption means that a given plaintext $(x_1, x_2, \ldots, x_n)$ is encrypted with a homomorphic encryption algorithm to obtain a ciphertext $c$. Fully homomorphic encryption allows anyone to perform any operation $f$ on the ciphertext $c$, and decrypting the resulting ciphertext $f(c)$ gives the same result as $f(x_1, x_2, \ldots, x_n)$. Throughout this process, $(x_1, x_2, \ldots, x_n)$, $f(x_1, x_2, \ldots, x_n)$, and every intermediate plaintext remain undisclosed; the input, output, and intermediate values are always in encrypted form. Different requirements can be placed on the final ciphertext form of $f(x_1, x_2, \ldots, x_n)$; the minimum requirement is that it decrypts correctly to $f(x_1, x_2, \ldots, x_n)$, and satisfying different ciphertext computation properties leads to different forms of homomorphic encryption.
Homomorphic encryption includes partially homomorphic encryption and fully homomorphic encryption. Partially homomorphic encryption means that the encryption scheme satisfies either additive homomorphism or multiplicative homomorphism. The RSA algorithm satisfies multiplicative homomorphism, and the Paillier algorithm satisfies additive homomorphism.
For example, in the RSA algorithm the public key is $(e, N)$, and the encryption of a plaintext $M$ is $C = E(M) = M^e \bmod N$.
For any $M_1$ and $M_2$:
$$E(M_1) \cdot E(M_2) \equiv M_1^e \cdot M_2^e \equiv (M_1 M_2)^e \equiv E(M_1 M_2) \pmod{N}$$
That is, for any plaintexts $M_1, M_2, \ldots, M_n$:
$$E(M_1) \cdot E(M_2) \cdots E(M_n) \equiv E(M_1 \cdot M_2 \cdots M_n) \pmod{N}$$
so the RSA algorithm satisfies multiplicative homomorphism.
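As a minimal sketch of the multiplicative property stated above, the following Python fragment uses textbook RSA without padding. The small primes, the exponent, and the helper names `encrypt` and `decrypt` are illustrative assumptions and are not taken from this application; keys of this size offer no security.

```python
# Toy textbook RSA (no padding), only to illustrate E(M1) * E(M2) = E(M1 * M2) mod N.
# p, q and e are illustrative assumptions; real keys are thousands of bits long.
p, q = 61, 53
N = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """C = M^e mod N."""
    return pow(m, e, N)


def decrypt(c: int) -> int:
    """M = C^d mod N."""
    return pow(c, d, N)


m1, m2 = 7, 12
# Multiply the two ciphertexts without ever decrypting them.
product_cipher = (encrypt(m1) * encrypt(m2)) % N
# Decrypting the product of ciphertexts yields the product of the plaintexts
# (as long as m1 * m2 is smaller than N).
recovered = decrypt(product_cipher)
```

The product of the plaintexts must stay below `N` for the decrypted value to equal `m1 * m2` exactly; otherwise it is recovered only modulo `N`.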
In an optional embodiment of this application, the first data and the second data may be encrypted with the asymmetric RSA algorithm; specifically, they may be encrypted with a public key provided by the data comparison center.
Compare the magnitudes of the first data and the second data through a homomorphic operation to obtain a sorting result.
In this embodiment, a homomorphic operation is performed on the first data and the second data; for example, the homomorphic operation adds a standard number to, or multiplies a standard number by, each of the first data and the second data, and the results are then compared in magnitude.
In this embodiment, the sorting result indicates which of the first data and the second data is larger and which is smaller.
When the second data comprises the data of multiple comparison users, the first data is compared with each of them in turn to obtain the sorting result.
Optionally, in another embodiment of this application, comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a sorting result includes:
adding the first data to the negative of the second data to obtain a first calculation result; if the first calculation result is positive, obtaining a sorting result in which the first data is greater than the second data, and if the first calculation result is negative, obtaining a sorting result in which the first data is less than the second data; or
adding the negative of the first data to the second data to obtain a second calculation result; if the second calculation result is positive, obtaining a sorting result in which the first data is less than the second data, and if the second calculation result is negative, obtaining a sorting result in which the first data is greater than the second data.
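The subtraction-based comparison above can be sketched with an additively homomorphic scheme. The sketch below assumes Paillier encryption (named earlier in this description as additively homomorphic) with toy-sized primes, and it assumes that the holder of the private key decrypts only the encrypted difference in order to read its sign. The function names and parameters are hypothetical and are not taken from this application.

```python
import math
import secrets

# Toy Paillier key; the primes are illustrative assumptions and far too small for real use.
p, q = 104729, 1299709
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)                                 # valid because g = n + 1 is used


def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1: c = (1 + m*n) * r^n mod n^2."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt_signed(c: int) -> int:
    """Decrypt and map residues above n/2 to negative numbers."""
    u = pow(c, lam, n2)
    m = ((u - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def encrypted_difference(c_first: int, c_second: int) -> int:
    """E(first) * E(second)^(-1) mod n^2 is an encryption of first - second."""
    return (c_first * pow(c_second, -1, n2)) % n2


def compare(c_first: int, c_second: int) -> str:
    """Only the difference is decrypted; the two original values stay encrypted."""
    diff = decrypt_signed(encrypted_difference(c_first, c_second))
    if diff > 0:
        return "first > second"
    if diff < 0:
        return "first < second"
    return "equal"
```

For example, `compare(encrypt(5200), encrypt(4800))` reports that the first value is larger. In this simplified form the key holder also learns the size of the difference, not only its sign.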
Obtain the ranking of the target user from the sorting result, and return the sorting result to the target user.
For example, the ranking of the target user obtained from the sorting result may be first, second, or third.
In an optional embodiment, returning the sorting result to the target user includes returning it to the target user's client, which can share and display the sorting result in response to a sharing operation by the user.
Optionally, in another embodiment of this application, returning the sorting result to the target user includes:
encrypting the sorting result with a received public key sent by the target user to obtain an encrypted sorting result; and
returning the encrypted sorting result to the target user.
In this embodiment, when the sorting result is returned to the target user, it can be encrypted with the target user's public key. After receiving the encrypted sorting result, the target user's client decrypts it with the private key to obtain the specific sorting result, which improves security during data transmission.
If the ranking of the target user is a preset ranking, obtain recommended product information corresponding to the preset theme.
In an optional embodiment, the ranking of the target user being a preset ranking includes: the target user ranks first.
In another optional embodiment, the ranking of the target user being a preset ranking includes: the target user ranks in the top third.
In yet another optional embodiment, the ranking of the target user being a preset ranking includes: the target user ranks in the top third or the bottom third.
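The three preset-ranking variants above can be expressed as one small predicate. This is only a sketch: the rule names, the 1-based rank convention, and rounding the size of a third up are assumptions, since the description does not specify them.

```python
import math


def is_preset_rank(rank: int, total: int, rule: str) -> bool:
    """Return True if a 1-based rank among `total` users matches the preset rule.

    The rule names ("first", "top_third", "top_or_bottom_third") are hypothetical
    labels for the three optional embodiments.
    """
    third = math.ceil(total / 3)   # assumption: round the size of a third up
    if rule == "first":
        return rank == 1
    if rule == "top_third":
        return rank <= third
    if rule == "top_or_bottom_third":
        return rank <= third or rank > total - third
    raise ValueError(f"unknown rule: {rule}")
```

With nine users, ranks 1 to 3 satisfy `"top_third"`, and ranks 1 to 3 and 7 to 9 satisfy `"top_or_bottom_third"`.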
The recommended product information corresponding to the preset theme may be preset. For example, the preset recommended products corresponding to the preset theme are low-fat snacks and low-fat drinks; when the user's body fat percentage ranks lowest, low-fat snacks and low-fat drinks are recommended to the user.
In other embodiments of this application, different product information may also be recommended according to the preset theme and the target user's ranking.
Optionally, in another embodiment of this application, obtaining the recommended product information corresponding to the preset theme includes:
obtaining related information of the preset theme;
mapping the related information of the preset theme to a target dictionary of a preset BOW (bag-of-words) model to obtain a target histogram feature vector, the target dictionary being obtained by clustering training samples;
inputting the target histogram feature vector into the naive Bayes classifier used to construct the preset BOW model, and classifying the related information of the preset theme with the naive Bayes classifier to obtain the category to which the related information belongs;
obtaining product information to be recommended corresponding to the category to which the related information of the preset theme belongs; and
determining that the product information to be recommended corresponding to that category is the recommended product information corresponding to the preset theme.
In this embodiment, the related information of the preset theme is information associated with the preset theme.
For example, if the preset theme is a consumption theme, the related information of the preset theme includes consumption history records (such as information on goods purchased at different times and places).
In this embodiment, the BOW model is established in advance; specifically, it is constructed using a clustering algorithm (such as the k-means algorithm) and a naive Bayes classifier.
In an optional embodiment, the preset BOW model can be constructed as follows:
(1) Use a clustering algorithm (for example, the k-means algorithm) to cluster the large data set and find the cluster center points (that is, the vocabulary). Clustering means dividing data objects with high similarity into the same cluster and data objects with high dissimilarity into different clusters, according to the principle of similarity. In the k-means algorithm, k is the number of clusters and "means" refers to the mean of the data objects in a cluster (this mean describes the cluster center); the algorithm is therefore also called the k-mean-value algorithm. k-means is a partition-based clustering algorithm that uses distance as the measure of similarity between data objects: the smaller the distance between two data objects, the higher their similarity and the more likely they belong to the same cluster. In the embodiments of this application, the Euclidean distance is used to calculate the distance between data objects. Let $x_i$ and $x_j$ be data objects and let $D$ be the number of attributes of a data object; the distance between them is:
$$\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{d=1}^{D} \left(x_{i,d} - x_{j,d}\right)^2}$$
where $x_{i,d}$ is the $d$-th coordinate of the $i$-th point and $x_{j,d}$ is the $d$-th coordinate of the $j$-th point.
The center of the $k$-th cluster is defined as $Center_k$ and is updated as:
$$Center_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$$
where $C_k$ is the $k$-th cluster, $|C_k|$ is the number of data objects in it, and $Center_k$ is a vector with $D$ attributes.
Finally, the sum-of-squared-errors criterion function is used to obtain the final clustering result $J$, where $K$ is the number of clusters:
$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \mathrm{dist}(x_i, Center_k)^2$$
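The clustering step (1) can be illustrated with a small pure-Python k-means. This is a sketch under stated assumptions: the standard Euclidean distance, mean-based center update, and sum of squared errors are used, and the choice of the first k points as initial centers and the iteration cap are simplifications that the description does not prescribe.

```python
import math


def dist(a, b):
    """Euclidean distance between two points with the same number of attributes."""
    return math.sqrt(sum((ad - bd) ** 2 for ad, bd in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points (the cluster center)."""
    count = len(points)
    return tuple(sum(column) / count for column in zip(*points))


def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for pt in points:
        nearest = min(range(len(centers)), key=lambda c: dist(pt, centers[c]))
        clusters[nearest].append(pt)
    return clusters


def kmeans(points, k, iterations=100):
    # Assumption: the first k points serve as initial centers.
    centers = [tuple(pt) for pt in points[:k]]
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    clusters = assign(points, centers)
    # Sum of squared errors of every point to its own cluster center.
    sse = sum(dist(pt, centers[i]) ** 2 for i, c in enumerate(clusters) for pt in c)
    return centers, clusters, sse
```

The returned centers play the role of the vocabulary, and the returned sum of squared errors corresponds to the criterion value J.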
The training data is mapped onto the cluster centers to obtain a low-dimensional representation of each training sample in the cluster-center space. The final clustering result J is used as the basis of the histogram; other vectors are constructed from this basis and mapped onto it to obtain histogram statistics for each category. This process is also the feature extraction process of the BOW model.
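The mapping onto cluster centers can be sketched as a nearest-center count. The function name `bow_histogram` and the sample values are hypothetical; the sketch assumes each item of related information has already been turned into a list of numeric feature vectors.

```python
import math


def bow_histogram(features, centers):
    """Count, for each cluster center, how many feature vectors fall nearest to it.

    `centers` stands for the dictionary (vocabulary) produced by clustering;
    the returned list is the histogram feature vector.
    """
    histogram = [0] * len(centers)
    for feature in features:
        nearest = min(range(len(centers)),
                      key=lambda c: math.dist(feature, centers[c]))
        histogram[nearest] += 1
    return histogram


# Illustrative values only.
example_centers = [(1.0, 1.5), (9.5, 9.0)]
example_features = [(0.0, 1.0), (2.0, 2.0), (9.0, 10.0)]
example_histogram = bow_histogram(example_features, example_centers)
```

The histogram has one entry per cluster center, so its length is fixed by the dictionary regardless of how many feature vectors an item contains.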
After the low-dimensional representation of each training sample is obtained, a multinomial naive Bayes classifier is trained. Naive Bayes is a low-variance, high-bias classifier that assumes conditional independence among features: for a given category, all features are mutually independent. For a given sample $x = (x_1, x_2, \ldots, x_d)^T$, the posterior probability that it belongs to category $w_i$ is:
$$P(w_i \mid x) = \frac{P(w_i)\,P(x \mid w_i)}{P(x)} = \frac{P(w_i)}{P(x)} \prod_{k=1}^{d} P(x_k \mid w_i)$$
where $d$ is the feature dimension and $x_k$ is the value of the sample on the $k$-th feature. To avoid the problem of sparse data, smoothing may first be applied to the data:
$$P(x_k \mid w_i) = \frac{|D_{i,x_k}| + \alpha}{|D_i| + \alpha\, c_k}$$
where $c_k$ is the number of possible values of the $k$-th feature and $\alpha$ is a coefficient. Using maximum likelihood estimation (MLE), this application obtains:
$$P(x_k \mid w_i) = \frac{|D_{i,x_k}|}{|D_i|}$$
where $D_i$ is the set of training samples of category $w_i$, and the numerator $|D_{i,x_k}|$ is the number of samples in $D_i$ whose $k$-th feature takes the value $x_k$.
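A naive Bayes classifier with smoothing can be sketched as follows. The sketch treats each feature as a discrete category and assumes the standard additive (Laplace) smoothing form, in which the coefficient α is added to each count and α times the number of possible values is added to the class size; the class name, method names, and training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with additive smoothing (assumed form)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples, labels):
        dims = len(samples[0])
        self.class_counts = Counter(labels)          # |D_i| for each category
        self.total = len(labels)
        # num_values[k]: number of distinct values seen for feature k (c_k)
        self.num_values = [len({s[k] for s in samples}) for k in range(dims)]
        # value_counts[label][k][value]: samples of the category with feature k == value
        self.value_counts = defaultdict(lambda: [Counter() for _ in range(dims)])
        for sample, label in zip(samples, labels):
            for k, value in enumerate(sample):
                self.value_counts[label][k][value] += 1
        return self

    def log_posterior(self, sample, label):
        """log P(w_i) + sum_k log P(x_k | w_i), ignoring the constant P(x)."""
        score = math.log(self.class_counts[label] / self.total)
        for k, value in enumerate(sample):
            numerator = self.value_counts[label][k][value] + self.alpha
            denominator = self.class_counts[label] + self.alpha * self.num_values[k]
            score += math.log(numerator / denominator)
        return score

    def predict(self, sample):
        return max(self.class_counts, key=lambda label: self.log_posterior(sample, label))
```

Logarithms are used so that the product over features does not underflow; because P(x) is the same for every category, it can be dropped when only the most probable category is needed.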
In this embodiment, after the preset BOW model has been constructed and the related information of the preset theme has been obtained, the related information is mapped to the target dictionary of the preset BOW model, where the target dictionary is the cluster-center space obtained by clustering when the BOW model was constructed.
In this embodiment, the product information to be recommended corresponding to each category may be preset; that is, a correspondence between categories and product information to be recommended is set in advance. Once the category of the related information of the preset theme has been obtained, the product information to be recommended corresponding to that category is obtained.
Optionally, in another embodiment of this application, determining that the product information corresponding to the category of the related information of the preset theme is the recommended product information corresponding to the preset theme includes:
extracting a word-frequency feature vector from the related information of the preset theme to obtain a word-frequency vector;
calculating the similarity between the word-frequency vector and the product information to be recommended; and
determining that, among the product information to be recommended, the product information whose similarity to the word-frequency vector is greater than a preset similarity is the recommended product information corresponding to the preset theme.
In this embodiment, the similarity may be calculated as the cosine similarity.
Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. The closer the cosine is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are. For the related information $X$ of the theme shared by the customer and the recommended product information $Y$, the similarity is calculated as:
$$\cos\theta = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where $X$ is the vector representation of the related information of the shared theme, $Y$ is the vector representation of the recommended product information, $X_i$ is a component of the vector $X$, and $Y_i$ is a component of the vector $Y$.
The similarity obtained from the above formula ranges from -1 to 1, where -1 means the two vectors point in exactly opposite directions, 1 means they point in exactly the same direction, and 0 usually means they are independent.
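The cosine similarity and the threshold-based selection described above can be sketched as follows. The function names, the candidate vectors, and the threshold value are illustrative assumptions; returning 0.0 for a zero vector is a convention chosen here, not something the description specifies.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0   # assumption: a zero vector is treated as unrelated
    return dot / (norm_x * norm_y)


def select_products(theme_vector, candidates, threshold):
    """Keep the candidate products whose similarity exceeds the preset similarity."""
    return [name for name, vector in candidates.items()
            if cosine_similarity(theme_vector, vector) > threshold]


# Illustrative word-frequency vectors only.
theme = [2, 1, 0]
products = {"phone": [4, 2, 0], "snack": [0, 0, 3]}
chosen = select_products(theme, products, threshold=0.8)
```

Here `"phone"` points in the same direction as the theme vector and is kept, while `"snack"` is orthogonal to it and is dropped.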
In this embodiment, the similarity is judged from the calculated value, and recommended product information with high similarity is recommended to the target user, so that products better suited to the user can be recommended.
Optionally, in another embodiment of this application, before the related information of the preset theme is mapped to the target dictionary of the preset BOW model, text processing is performed on it. The text processing includes segmenting the related information into words with a hidden Markov model and rewriting the segmented text with a preset keyword extraction algorithm.
In this embodiment, text processing is first performed on the related information of the preset theme, and the processed related information is then mapped to the target dictionary of the preset BOW model.
Text rewriting means that a text is first segmented into Chinese words and then cleaned so that only the main words are kept, after which the main words are semantically enhanced (supplemented with synonyms and related words).
First, this application segments the related information of the preset theme into words by constructing a hidden Markov model. Text satisfies the Markov property: the probability that the $m$-th word appears depends only on the $n-1$ words immediately preceding it, and not on any earlier words or on any words after the $m$-th word. The purpose of the N-gram model is therefore to give the probability of the $m$-th word given the words that precede it, specifically:
$$P(W_m \mid W_1, \ldots, W_{m-1}) = P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$$
where $m$ is the index of an arbitrary word in the text and $n$ is the order of the N-gram model, so that $W_{m-n+1}, \ldots, W_{m-1}$ are the $n-1$ words preceding the $m$-th word.
If a sentence $S$ consists of the word sequence $\{W_1, W_2, \ldots, W_m\}$, the probability that the sentence is arranged in this word order is:
$$P(S) = P(W_1 W_2 \cdots W_m) = P(W_1)\,P(W_2 \mid W_1) \cdots P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$$
where the conditional probability $P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$ is the probability that $W_m$ appears given that the string $W_{m-n+1}, \ldots, W_{m-1}$ has appeared. On the basis of training on a large-scale corpus, a bigram model is used, so the probability model of the sentence is:
$$P(S) \approx P(W_1) \prod_{i=2}^{m} P(W_i \mid W_{i-1})$$
The sentence $S$ is segmented with the full segmentation method to obtain all possible Chinese word segmentations; the probability of each segmentation is then calculated, and the segmentation with the highest probability is selected as the final segmentation result. The selection amounts to maximizing $P(S)$:
$$\hat{S} = \arg\max_{S} P(S)$$
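The full-segmentation-plus-maximum-probability selection can be sketched with a bigram model. The dictionary, the probability tables, and the floor value used for unseen words are made-up illustrative assumptions; in practice they would be estimated from a large corpus.

```python
def all_segmentations(text, vocab):
    """Enumerate every way of splitting `text` into dictionary words (full segmentation)."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            for rest in all_segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


def sentence_probability(words, unigram, bigram, floor=1e-8):
    """Bigram estimate: P(W1) times the product of P(Wi | Wi-1)."""
    if not words:
        return 1.0
    prob = unigram.get(words[0], floor)
    for prev, cur in zip(words, words[1:]):
        prob *= bigram.get((prev, cur), floor)   # floor for unseen pairs (assumption)
    return prob


def best_segmentation(text, vocab, unigram, bigram):
    """Pick the candidate segmentation with the highest sentence probability."""
    candidates = all_segmentations(text, vocab)
    return max(candidates, key=lambda ws: sentence_probability(ws, unigram, bigram))


# Illustrative dictionary and probabilities only.
vocab = {"研究", "研究生", "生命", "命", "起源"}
unigram = {"研究": 0.02, "研究生": 0.01}
bigram = {("研究", "生命"): 0.1, ("生命", "起源"): 0.2,
          ("研究生", "命"): 0.001, ("命", "起源"): 0.01}
segmented = best_segmentation("研究生命起源", vocab, unigram, bigram)
```

Enumerating every segmentation is exponential in the worst case; it is used here only because it mirrors the description, and a dynamic-programming search would give the same maximum more efficiently.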
Because the related information of the preset theme contains statements unrelated to the theme, this application performs keyword extraction after word segmentation based on the hidden Markov model.
The keyword extraction algorithm uses statistical information, word-vector information, and dependency-syntax information between words. It builds a dependency graph to calculate the strength of association between words and uses the TextRank algorithm to iteratively calculate an importance score for each word. First, an undirected graph over all non-stop words is constructed from the dependency parse of the sentence; then the edge weights are calculated from the gravity value between words and their dependency relevance. The dependency relevance of any two words $W_i$ and $W_j$ is:
Figure PCTCN2020086286-appb-000027
where $\mathrm{len}(W_i, W_j)$ is the length of the dependency path between the words $W_i$ and $W_j$, and $b$ is a hyperparameter.
In addition, the IDF value is introduced and the word frequency is replaced with the TF-IDF value, so that more global information is taken into account. This gives a new formula for the word gravity value. The gravity between the text words $W_i$ and $W_j$ is:
$$f_{grav}(W_i, W_j) = \frac{tfidf(W_i)\; tfidf(W_j)}{d^2}$$
where $tfidf(W)$ is the TF-IDF value of the word $W$, and $d$ is the Euclidean distance between the word vectors of the words $W_i$ and $W_j$.
Therefore, the relevance between two words is:
$$weight(W_i, W_j) = Dep(W_i, W_j) \cdot f_{grav}(W_i, W_j)$$
Finally, this application uses the TextRank algorithm to build an undirected graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges, and calculates the score $WS(W_i)$ of the vertex $W_i$ according to the following formula:
$$WS(W_i) = (1 - \eta) + \eta \sum_{W_j \in C(W_i)} \frac{weight(W_j, W_i)}{\sum_{W_k \in C(W_j)} weight(W_j, W_k)}\, WS(W_j)$$
where $C(W_i)$ is the set of vertices associated with the vertex $W_i$ (the vertices pointing to it), $\eta$ is the damping coefficient, $W_k$ denotes a vertex in the undirected graph $G$, and $WS(W_j)$ is the score of the vertex $W_j$. In this embodiment, the several highest-scoring words may be selected as the main words, and the main words are then semantically enhanced.
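The iterative scoring can be sketched as follows, assuming the standard weighted TextRank update, in which each vertex passes its score to its neighbours in proportion to edge weight and a damping coefficient mixes in a constant term. The edge weights below are made-up numbers standing in for the word-relevance values, and all names are hypothetical.

```python
def build_graph(edges):
    """Turn {(u, v): weight} into a symmetric adjacency map for an undirected graph."""
    graph = {}
    for (u, v), weight in edges.items():
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j w(j,i) / sum_k w(j,k) * WS(j)."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            total = 0.0
            for neighbour, weight in graph[node].items():
                # Total weight leaving the neighbour; positive weights are assumed.
                out_sum = sum(graph[neighbour].values())
                total += weight / out_sum * scores[neighbour]
            new_scores[node] = (1 - damping) + damping * total
        scores = new_scores
    return scores


# Illustrative edge weights only.
example_edges = {("phone", "screen"): 2.0, ("phone", "battery"): 2.0,
                 ("screen", "battery"): 0.5, ("battery", "weather"): 0.1}
word_scores = textrank(build_graph(example_edges))
main_words = sorted(word_scores, key=word_scores.get, reverse=True)[:2]
```

The words with the highest scores would then be kept as main words; a fixed iteration count is used here for simplicity, whereas a convergence threshold is the more common stopping rule.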
Sending recommended product information corresponding to the preset theme to the target user.
For example, if the recommended product information corresponding to the preset consumption theme is information on electronic product m and information on electronic product n, that information is sent to the user.
After the recommended product information corresponding to the preset theme is obtained, it is sent to the target user, so that information can be recommended to the target user accurately.
The information recommendation device based on data comparison proposed in this embodiment obtains first data of a target user and second data of a comparison user, where the first data and the second data relate to a preset theme and are both homomorphically encrypted; compares the magnitudes of the first data and the second data through a homomorphic operation to obtain a sorting result; obtains the ranking of the target user from the sorting result and returns the sorting result to the target user; if the ranking of the target user is a preset ranking, obtains recommended product information corresponding to the preset theme; and sends that recommended product information to the target user. Because the first data of the target user and the second data of the other users are encrypted and are compared through homomorphic operations, this application keeps the details of the data from being disclosed while the comparison is performed. Because users can still be ranked accurately without the details of their data being obtained, and personalized recommendations are then made according to that ranking, this application both protects users' private data and makes accurate personalized recommendations.
可选地,在其他实施例中,基于数据比较的信息推荐程序还可以被分割为一个或者多个模块,一个或者多个模块被存储于存储器11中,并由一个或多个处理器(本实施例为处理器12)所执行以完成本申请,本申请所称的模块是指能够完成特定功能的一系列计算机程序指令段,用于描述基于数据比较的信息推荐程序在基于数据比较的信息推荐装置中的执行过程。Optionally, in other embodiments, the information recommendation program based on data comparison may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and run by one or more processors (this The embodiment is executed by the processor 12) to complete this application. The module referred to in this application refers to a series of computer program instruction segments that can complete specific functions, and is used to describe the information recommendation program based on data comparison in the information based on data comparison. Recommend the implementation process in the device.
For example, FIG. 3 is a schematic diagram of the program modules of the data comparison-based information recommendation program in an embodiment of the data comparison-based information recommendation device of this application. In this embodiment, the program may be divided into a first acquisition module 10, a comparison module 20, a first transmission module 30, a second acquisition module 40, and a second transmission module 50. By way of example:
The first acquisition module 10 is configured to acquire first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
The comparison module 20 is configured to compare the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
The first transmission module 30 is configured to obtain the target user's rank from the ranking result, and to return the ranking result to the target user;
The second acquisition module 40 is configured to acquire, if the target user's rank is a preset rank, recommended product information corresponding to the preset theme;
The second transmission module 50 is configured to send the recommended product information corresponding to the preset theme to the target user.
The functions or operation steps implemented when program modules such as the first acquisition module 10, the comparison module 20, the first transmission module 30, the second acquisition module 40, and the second transmission module 50 are executed are substantially the same as those of the foregoing embodiments and are not repeated here.
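By way of illustration only, the division of work among the five modules can be sketched in Python as follows. The function names, the dictionary-based product catalog, and the reading of "preset rank" as a set of rank positions are assumptions made for this sketch and do not come from the application; the comparison itself is passed in as a callback so that a homomorphic comparison can be substituted for it.

```python
from functools import cmp_to_key
from typing import Callable, Dict, Hashable, List, Set, Tuple


def rank_users(data: Dict[str, Hashable],
               compare: Callable[[Hashable, Hashable], int]) -> List[str]:
    """Comparison module: order user ids from largest to smallest value.

    `compare(a, b)` returns 1, 0 or -1 and is the only way values are inspected,
    so it may operate on encrypted values.
    """
    return sorted(data, key=cmp_to_key(lambda u, v: compare(data[v], data[u])))


def recommend(target: str,
              data: Dict[str, Hashable],
              compare: Callable[[Hashable, Hashable], int],
              preset_ranks: Set[int],
              catalog: Dict[str, List[str]],
              theme: str) -> Tuple[List[str], List[str]]:
    """Run the five steps and return (ranking result, recommended products)."""
    # First acquisition module: `data` holds the target's and the comparison users' values.
    ranking = rank_users(data, compare)
    # First transmission module: the target user's 1-based position in the ranking.
    position = ranking.index(target) + 1
    # Second acquisition module: look up products only if the rank is a preset rank.
    products = catalog.get(theme, []) if position in preset_ranks else []
    # Second transmission module: here the results are simply returned to the caller.
    return ranking, products


if __name__ == "__main__":
    # Plain integers stand in for ciphertexts in this usage example.
    plain_compare = lambda a, b: (a > b) - (a < b)
    spending = {"alice": 3, "bob": 7, "carol": 5}
    print(recommend("alice", spending, plain_compare, {3},
                    {"savings": ["deposit plan"]}, "savings"))
```

Passing the comparison in as a parameter keeps the ranking logic independent of whichever homomorphic scheme is used.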
In addition, this application further provides a computer device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to execute a data comparison-based information recommendation method, wherein the data comparison-based information recommendation method comprises:
acquiring first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
obtaining the target user's rank from the ranking result, and returning the ranking result to the target user;
if the target user's rank is a preset rank, acquiring recommended product information corresponding to the preset theme; and
sending the recommended product information corresponding to the preset theme to the target user.
The specific implementation of the computer device of this application is substantially the same as the foregoing embodiments of the data comparison-based information recommendation device and method, and is not repeated here.
In addition, an embodiment of this application further provides a computer-readable storage medium storing a data comparison-based information recommendation program, which can be executed by one or more processors to implement the following operations:
acquiring first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
obtaining the target user's rank from the ranking result, and returning the ranking result to the target user;
if the target user's rank is a preset rank, acquiring recommended product information corresponding to the preset theme; and
sending the recommended product information corresponding to the preset theme to the target user.
In the computer-readable storage medium of this application, the storage medium is a volatile storage medium or a non-volatile storage medium. Its specific implementation is substantially the same as the foregoing embodiments of the data comparison-based information recommendation device and method, and is not repeated here.
It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. The terms "comprise", "include", and any variants thereof used herein are intended to cover a non-exclusive inclusion, so that a process, device, article, or method comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, device, article, or method that comprises it.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (20)

  1. A data comparison-based information recommendation method, wherein the method comprises:
    acquiring first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
    comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
    obtaining the target user's rank from the ranking result, and returning the ranking result to the target user;
    if the target user's rank is a preset rank, acquiring recommended product information corresponding to the preset theme; and
    sending the recommended product information corresponding to the preset theme to the target user.
  2. The data comparison-based information recommendation method according to claim 1, wherein said acquiring recommended product information corresponding to the preset theme comprises:
    acquiring information related to the preset theme;
    mapping the information related to the preset theme onto a target dictionary of a preset bag-of-words (BOW) model to obtain a target histogram feature vector, the target dictionary being obtained by clustering training samples;
    inputting the target histogram feature vector into the naive Bayes classifier used to build the preset BOW model, and classifying the information related to the preset theme with the naive Bayes classifier to obtain the category to which that information belongs;
    acquiring the product information to be recommended that corresponds to the category of the information related to the preset theme; and
    determining the product information to be recommended that corresponds to that category to be the recommended product information corresponding to the preset theme.
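For illustration, the mapping onto a dictionary and the naive Bayes classification recited above might look like the following minimal Python sketch. It assumes a small hand-written dictionary instead of one obtained by clustering, a multinomial naive Bayes model with add-one smoothing, and hypothetical tokens and category labels; none of these specifics are stated in the application.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def histogram(tokens: Sequence[str], dictionary: Sequence[str]) -> List[int]:
    """Count how often each dictionary word occurs in the tokens (histogram feature vector)."""
    counts = Counter(t for t in tokens if t in dictionary)
    return [counts[word] for word in dictionary]


class NaiveBayes:
    """Multinomial naive Bayes over histogram vectors, with add-one (Laplace) smoothing."""

    def fit(self, vectors: List[List[int]], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior: Dict[str, float] = {}
        self.log_likelihood: Dict[str, List[float]] = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            # Smoothed per-word counts for this class, normalised to probabilities.
            totals = [sum(column) + 1 for column in zip(*rows)]
            denominator = sum(totals)
            self.log_likelihood[c] = [math.log(t / denominator) for t in totals]
        return self

    def predict(self, vector: List[int]) -> str:
        def score(c: str) -> float:
            return self.log_prior[c] + sum(
                x * lp for x, lp in zip(vector, self.log_likelihood[c]))
        return max(self.classes, key=score)


if __name__ == "__main__":
    dictionary = ["loan", "rate", "insurance", "claim"]  # hypothetical target dictionary
    samples = [["loan", "rate", "rate"], ["loan", "loan"],
               ["insurance", "claim"], ["claim", "claim", "insurance"]]
    labels = ["credit", "credit", "insurance", "insurance"]
    model = NaiveBayes().fit([histogram(s, dictionary) for s in samples], labels)
    print(model.predict(histogram(["rate", "loan", "today"], dictionary)))
```

The predicted category would then serve as the key for looking up the product information to be recommended.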
  3. The data comparison-based information recommendation method according to claim 2, wherein said determining the product information corresponding to the category of the information related to the preset theme to be the recommended product information corresponding to the preset theme comprises:
    extracting a word-frequency feature vector from the information related to the preset theme to obtain a word-frequency vector;
    calculating the similarity between the word-frequency vector and the product information to be recommended; and
    determining, among the product information to be recommended, the product information whose similarity to the word-frequency vector is greater than a preset similarity to be the recommended product information corresponding to the preset theme.
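As a hedged illustration of the word-frequency and similarity steps above, the sketch below uses cosine similarity over term-frequency vectors. The claim does not name a similarity measure, so cosine similarity, the vocabulary, the product names, and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def term_frequency(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary word among the tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[word] / total for word in vocabulary]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_products(theme_tokens: Sequence[str],
                    candidates: Dict[str, Sequence[str]],
                    vocabulary: Sequence[str],
                    threshold: float) -> List[str]:
    """Keep the candidate products whose similarity to the theme exceeds the threshold."""
    theme_vector = term_frequency(theme_tokens, vocabulary)
    return [name for name, tokens in candidates.items()
            if cosine(theme_vector, term_frequency(tokens, vocabulary)) > threshold]


if __name__ == "__main__":
    vocabulary = ["loan", "rate", "insurance"]
    candidates = {"credit_line": ["loan", "rate"], "car_cover": ["insurance"]}
    print(select_products(["loan", "rate", "loan"], candidates, vocabulary, 0.5))
```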
  4. The data comparison-based information recommendation method according to claim 2, wherein, before said mapping the information related to the preset theme onto the target dictionary of the preset BOW model, the method further comprises:
    performing text processing on the information related to the preset theme, the text processing comprising segmenting the information related to the preset theme into words with a hidden Markov model, and rewriting the text of the segmented information with a preset keyword extraction algorithm.
  5. The data comparison-based information recommendation method according to any one of claims 1 to 4, wherein said comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result comprises:
    adding the first data to the negative of the second data to obtain a first calculation result; if the first calculation result is positive, obtaining a ranking result in which the first data is greater than the second data; if the first calculation result is negative, obtaining a ranking result in which the first data is less than the second data; or
    adding the negative of the first data to the second data to obtain a second calculation result; if the second calculation result is positive, obtaining a ranking result in which the first data is less than the second data; if the second calculation result is negative, obtaining a ranking result in which the first data is greater than the second data.
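The "add the negative, then inspect the sign" comparison can be illustrated with an additively homomorphic scheme. The sketch below uses a toy Paillier cryptosystem, which is an assumption: the claim does not name a scheme, and it does not say which party learns the sign. Here the sign is read by a holder of the private key after decrypting only the difference. The key is built from two small Mersenne primes for readability and offers no real security; all function names are hypothetical.

```python
import math
import random
from typing import Tuple

# Toy parameters: 2**31 - 1 and 2**61 - 1 are Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p: int = P, q: int = Q) -> Tuple[int, Tuple[int, int]]:
    """Return (public modulus n, private key (phi, mu)) for Paillier with g = n + 1."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse; needs Python 3.8+
    return n, (phi, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m; negative m is represented modulo n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt_signed(n: int, private_key: Tuple[int, int], c: int) -> int:
    """Decrypt c and map the result into the signed range around zero."""
    phi, mu = private_key
    m = ((pow(c, phi, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


def negate_encrypted(n: int, c: int) -> int:
    """Homomorphic negation: the inverse ciphertext encrypts the negative."""
    return pow(c, -1, n * n)


def compare_encrypted(n: int, private_key: Tuple[int, int], c_first: int, c_second: int) -> int:
    """Return 1 if first > second, -1 if first < second, 0 if equal."""
    difference = add_encrypted(n, c_first, negate_encrypted(n, c_second))
    value = decrypt_signed(n, private_key, difference)  # only the difference is decrypted
    return (value > 0) - (value < 0)


if __name__ == "__main__":
    n, private_key = keygen()
    first, second = encrypt(n, 5200), encrypt(n, 4100)
    print(compare_encrypted(n, private_key, first, second))
```

The second alternative in the claim, adding the negative of the first data to the second data, corresponds to swapping the two ciphertext arguments and inverting the interpretation of the sign.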
  6. The data comparison-based information recommendation method according to any one of claims 1 to 4, wherein said returning the ranking result to the target user comprises:
    encrypting the ranking result with a received public key sent by the target user to obtain an encrypted ranking result; and
    returning the encrypted ranking result to the target user.
  7. A data comparison-based information recommendation device, wherein the device comprises a memory and a processor, the memory storing a data comparison-based information recommendation program executable on the processor, and the data comparison-based information recommendation program, when executed by the processor, implements the following steps:
    acquiring first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
    comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
    obtaining the target user's rank from the ranking result, and returning the ranking result to the target user;
    if the target user's rank is a preset rank, acquiring recommended product information corresponding to the preset theme; and
    sending the recommended product information corresponding to the preset theme to the target user.
  8. The data comparison-based information recommendation device according to claim 7, wherein said acquiring recommended product information corresponding to the preset theme comprises:
    acquiring information related to the preset theme;
    mapping the information related to the preset theme onto a target dictionary of a preset bag-of-words (BOW) model to obtain a target histogram feature vector, the target dictionary being obtained by clustering training samples;
    inputting the target histogram feature vector into the naive Bayes classifier used to build the preset BOW model, and classifying the information related to the preset theme with the naive Bayes classifier to obtain the category to which that information belongs;
    acquiring the product information to be recommended that corresponds to the category of the information related to the preset theme; and
    determining the product information to be recommended that corresponds to that category to be the recommended product information corresponding to the preset theme.
  9. The data comparison-based information recommendation device according to claim 7 or 8, wherein said comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result comprises:
    adding the first data to the negative of the second data to obtain a first calculation result; if the first calculation result is positive, obtaining a ranking result in which the first data is greater than the second data; if the first calculation result is negative, obtaining a ranking result in which the first data is less than the second data; or
    adding the negative of the first data to the second data to obtain a second calculation result; if the second calculation result is positive, obtaining a ranking result in which the first data is less than the second data; if the second calculation result is negative, obtaining a ranking result in which the first data is greater than the second data.
  10. A computer device, comprising:
    one or more processors;
    a memory; and
    one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to execute a data comparison-based information recommendation method, wherein the data comparison-based information recommendation method comprises:
    acquiring first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
    comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
    obtaining the target user's rank from the ranking result, and returning the ranking result to the target user;
    if the target user's rank is a preset rank, acquiring recommended product information corresponding to the preset theme; and
    sending the recommended product information corresponding to the preset theme to the target user.
  11. The computer device according to claim 10, wherein said acquiring recommended product information corresponding to the preset theme comprises:
    acquiring information related to the preset theme;
    mapping the information related to the preset theme onto a target dictionary of a preset bag-of-words (BOW) model to obtain a target histogram feature vector, the target dictionary being obtained by clustering training samples;
    inputting the target histogram feature vector into the naive Bayes classifier used to build the preset BOW model, and classifying the information related to the preset theme with the naive Bayes classifier to obtain the category to which that information belongs;
    acquiring the product information to be recommended that corresponds to the category of the information related to the preset theme; and
    determining the product information to be recommended that corresponds to that category to be the recommended product information corresponding to the preset theme.
  12. The computer device according to claim 11, wherein said determining the product information corresponding to the category of the information related to the preset theme to be the recommended product information corresponding to the preset theme comprises:
    extracting a word-frequency feature vector from the information related to the preset theme to obtain a word-frequency vector;
    calculating the similarity between the word-frequency vector and the product information to be recommended; and
    determining, among the product information to be recommended, the product information whose similarity to the word-frequency vector is greater than a preset similarity to be the recommended product information corresponding to the preset theme.
  13. The computer device according to claim 11, wherein, before said mapping the information related to the preset theme onto the target dictionary of the preset BOW model, the method further comprises:
    performing text processing on the information related to the preset theme, the text processing comprising segmenting the information related to the preset theme into words with a hidden Markov model, and rewriting the text of the segmented information with a preset keyword extraction algorithm.
  14. The computer device according to any one of claims 10 to 13, wherein said comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result comprises:
    adding the first data to the negative of the second data to obtain a first calculation result; if the first calculation result is positive, obtaining a ranking result in which the first data is greater than the second data; if the first calculation result is negative, obtaining a ranking result in which the first data is less than the second data; or
    adding the negative of the first data to the second data to obtain a second calculation result; if the second calculation result is positive, obtaining a ranking result in which the first data is less than the second data; if the second calculation result is negative, obtaining a ranking result in which the first data is greater than the second data.
  15. The computer device according to any one of claims 10 to 13, wherein said returning the ranking result to the target user comprises:
    encrypting the ranking result with a received public key sent by the target user to obtain an encrypted ranking result; and
    returning the encrypted ranking result to the target user.
  16. A computer-readable storage medium storing a computer program which, when executed by a processor, implements a data comparison-based information recommendation method, wherein the data comparison-based information recommendation method comprises the following steps:
    acquiring first data of a target user and second data of a comparison user, wherein the first data and the second data relate to a preset theme and are both homomorphically encrypted;
    comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result;
    obtaining the target user's rank from the ranking result, and returning the ranking result to the target user;
    if the target user's rank is a preset rank, acquiring recommended product information corresponding to the preset theme; and
    sending the recommended product information corresponding to the preset theme to the target user.
  17. The computer-readable storage medium according to claim 16, wherein said acquiring recommended product information corresponding to the preset theme comprises:
    acquiring information related to the preset theme;
    mapping the information related to the preset theme onto a target dictionary of a preset bag-of-words (BOW) model to obtain a target histogram feature vector, the target dictionary being obtained by clustering training samples;
    inputting the target histogram feature vector into the naive Bayes classifier used to build the preset BOW model, and classifying the information related to the preset theme with the naive Bayes classifier to obtain the category to which that information belongs;
    acquiring the product information to be recommended that corresponds to the category of the information related to the preset theme; and
    determining the product information to be recommended that corresponds to that category to be the recommended product information corresponding to the preset theme.
  18. The computer-readable storage medium according to claim 17, wherein said determining the product information corresponding to the category of the information related to the preset theme to be the recommended product information corresponding to the preset theme comprises:
    extracting a word-frequency feature vector from the information related to the preset theme to obtain a word-frequency vector;
    calculating the similarity between the word-frequency vector and the product information to be recommended; and
    determining, among the product information to be recommended, the product information whose similarity to the word-frequency vector is greater than a preset similarity to be the recommended product information corresponding to the preset theme.
  19. The computer-readable storage medium according to claim 17, wherein, before said mapping the information related to the preset theme onto the target dictionary of the preset BOW model, the method further comprises:
    performing text processing on the information related to the preset theme, the text processing comprising segmenting the information related to the preset theme into words with a hidden Markov model, and rewriting the text of the segmented information with a preset keyword extraction algorithm.
  20. The computer-readable storage medium according to any one of claims 16 to 19, wherein said comparing the magnitudes of the first data and the second data through a homomorphic operation to obtain a ranking result comprises:
    adding the first data to the negative of the second data to obtain a first calculation result; if the first calculation result is positive, obtaining a ranking result in which the first data is greater than the second data; if the first calculation result is negative, obtaining a ranking result in which the first data is less than the second data; or
    adding the negative of the first data to the second data to obtain a second calculation result; if the second calculation result is positive, obtaining a ranking result in which the first data is less than the second data; if the second calculation result is negative, obtaining a ranking result in which the first data is greater than the second data.
PCT/CN2020/086286 2019-07-05 2020-04-23 Data comparison-based information recommendation method and device, and storage medium WO2021004124A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910605697.8A CN110457574A (en) 2019-07-05 2019-07-05 Information recommendation method, device and the storage medium compared based on data
CN201910605697.8 2019-07-05

Publications (1)

Publication Number Publication Date
WO2021004124A1 true WO2021004124A1 (en) 2021-01-14

Family

ID=68482310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086286 WO2021004124A1 (en) 2019-07-05 2020-04-23 Data comparison-based information recommendation method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN110457574A (en)
WO (1) WO2021004124A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987369A (en) * 2021-12-27 2022-01-28 北京多氪信息科技有限公司 Information display method, device, equipment and medium for concerned user

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457574A (en) * 2019-07-05 2019-11-15 深圳壹账通智能科技有限公司 Information recommendation method, device and the storage medium compared based on data
CN111275091B (en) * 2020-01-16 2024-05-10 平安科技(深圳)有限公司 Text conclusion intelligent recommendation method and device and computer readable storage medium
CN113708930B (en) * 2021-10-20 2022-01-21 杭州趣链科技有限公司 Data comparison method, device, equipment and medium for private data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984156A (en) * 2012-11-30 2013-03-20 无锡赛思汇智科技有限公司 Verifiable distributed privacy data comparing and sorting method and device
CN103064931A (en) * 2012-12-21 2013-04-24 清华大学 Verifiable privacy data comparison and ranking query method
CN104796475A (en) * 2015-04-24 2015-07-22 苏州大学 Social recommendation method based on homomorphic encryption
CN108319734A (en) * 2018-04-11 2018-07-24 中国计量大学 A kind of product feature structure tree method for auto constructing based on linear combiner
CN110457574A (en) * 2019-07-05 2019-11-15 深圳壹账通智能科技有限公司 Information recommendation method, device and the storage medium compared based on data

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831234B (en) * 2012-08-31 2015-04-22 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN107203530A (en) * 2016-03-16 2017-09-26 北大方正集团有限公司 Information recommendation method
CN106202331B (en) * 2016-07-01 2019-08-30 中国传媒大学 The recommender system of secret protection and the operational method based on the recommender system by different level
CN109117442B (en) * 2017-06-23 2023-03-24 腾讯科技(深圳)有限公司 Application recommendation method and device
CN107506459A (en) * 2017-08-29 2017-12-22 环球智达科技(北京)有限公司 A kind of film recommendation method based on film similarity
CN109840321B (en) * 2017-11-29 2022-02-01 腾讯科技(深圳)有限公司 Text recommendation method and device and electronic equipment
CN109063509A (en) * 2018-08-07 2018-12-21 上海海事大学 It is a kind of that encryption method can search for based on keywords semantics sequence
CN109271806A (en) * 2018-08-14 2019-01-25 同济大学 Research on Privacy Preservation Mechanism based on user behavior
CN109726747B (en) * 2018-12-20 2021-09-28 西安电子科技大学 Data fusion ordering method based on social network recommendation platform

Also Published As

Publication number Publication date
CN110457574A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
WO2021004124A1 (en) Data comparison-based information recommendation method and device, and storage medium
US11455427B2 (en) Systems, methods, and apparatuses for implementing a privacy-preserving social media data outsourcing model
US11379608B2 (en) Monitoring entity behavior using organization specific security policies
US11349873B2 (en) User model-based data loss prevention
US11275900B2 (en) Systems and methods for automatically assigning one or more labels to discussion topics shown in online forums on the dark web
US11595430B2 (en) Security system using pseudonyms to anonymously identify entities and corresponding security risk related behaviors
Ramanathan et al. phishGILLNET—phishing detection methodology using probabilistic latent semantic analysis, AdaBoost, and co-training
WO2021212968A1 (en) Unstructured data processing method, apparatus, and device, and medium
US8434126B1 (en) Methods and systems for aiding parental control policy decisions
US11483319B2 (en) Security model
US10956476B2 (en) Entropic classification of objects
US20130212111A1 (en) System and method for text categorization based on ontologies
US9043247B1 (en) Systems and methods for classifying documents for data loss prevention
US11250256B2 (en) Binary linear classification
Verma et al. Cybersecurity analytics
US9552494B1 (en) Protected indexing and querying of large sets of textual data
EP3161676A1 (en) Identification of intents from query reformulations in search
US20240320276A1 (en) Using a machine learning system to process a corpus of documents associated with a user to determine a user-specific and/or process-specific consequence index
Park et al. Ontological detection of phishing emails
Rasheed et al. Adversarial attacks on featureless deep learning malicious urls detection
Jia et al. 10 security and privacy problems in large foundation models
Moussaileb et al. Watch out! Doxware on the way…
Zobaed AI-Driven Confidential Computing across Edge-to-Cloud Continuum
Alhindi et al. Preventing Data Loss by Harnessing Semantic Similarity and Relevance.
Umapathy et al. PPHE-automatic detection of sensitive attributes in a privacy preserved Hadoop environment using data mining techniques

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20837464; Country of ref document: EP; Kind code of ref document: A1)
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 16/05/2022))
122 EP: PCT application non-entry in European phase (Ref document number: 20837464; Country of ref document: EP; Kind code of ref document: A1)