WO2018068648A1 - Information matching method and related apparatus - Google Patents

Information matching method and related apparatus

Info

Publication number
WO2018068648A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
matching degree
branch
node
user evaluation
Prior art date
Application number
PCT/CN2017/103858
Other languages
English (en)
French (fr)
Inventor
张一昌
赵争超
张建伟
蔡仁贵
林君
肖谦
潘林林
Original Assignee
阿里巴巴集团控股有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2018068648A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Definitions

  • The present application relates to the field of computer technology, and in particular to an information matching method and related apparatus.
  • Information matching is a commonly used computer technique for obtaining the degree of matching between multiple pieces of information.
  • Information matching is widely used in Internet scenarios. For example, given a plurality of pieces of evaluation information entered by buyers on an e-commerce website, information matching obtains the matching degree between each piece of evaluation information and the merchant subscription information, so that reviews of interest to the merchant can be located quickly.
  • A commonly used information matching method includes: segmenting each piece of information to be matched into words, judging whether the pieces share any word segmentation result, and calculating the matching degree between the pieces of information according to the shared word segmentation results.
  • However, this method can only judge whether the pieces of information share a word segmentation result; it cannot reflect whether the pieces of information are related.
  • For example, the evaluation information input by the buyer is “service is not good” and the merchant subscription information is “customer service attitude”. Both describe the service, so they have a certain relevance, yet the matching degree calculated by the above method is 0, and the matching accuracy is obviously low.
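  • The limitation of the word-overlap method can be illustrated with a minimal Python sketch. The source does not give the baseline's formula, so the Jaccard overlap used here is an assumption, and the token lists are hypothetical segmentation results chosen so that, as in the example, the two texts share no segmented word.

```python
def overlap_match(tokens_a, tokens_b):
    """Baseline matching degree from shared word segmentation results.

    Jaccard overlap is an assumed stand-in; the source only says the degree
    is calculated from the identical word segmentation results.
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical segmentation results for the two texts in the example.
review_tokens = ["service", "not_good"]
subscription_tokens = ["customer_service", "attitude"]

baseline = overlap_match(review_tokens, subscription_tokens)
print(baseline)  # no token is shared, so the baseline degree is 0.0
```

  • Although both texts concern the service, the overlap score carries no trace of that relationship, which is the gap the tag category tree is meant to close.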
  • the technical problem to be solved by the present application is to provide an information matching method and related apparatus, so that the calculated matching degree can reflect the correlation between the information, thereby improving the matching accuracy.
  • the application provides a method for information matching, including:
  • the label category tree includes at least two layers, each layer includes at least one label node, and a parent label node of each label node is a parent category of the label node;
  • the matching degree of the merchant subscription information and the user evaluation information is calculated according to at least a matching degree of the first branch and the second branch at each layer.
  • the matching degree between the merchant subscription information and the user evaluation information is calculated according to at least a matching degree of the first branch and the second branch at each layer, including:
  • the first matching degree is calculated according to at least the matching degree of the first branch and the second branch in each layer, including:
  • the first matching degree is calculated according to at least a matching degree of the first branch and the second branch at each layer, and a weight value of each layer.
  • the method further includes:
  • the matching degree of the user evaluation information and the merchant subscription information is calculated according to at least a matching degree of the first branch and the second branch at each layer and the approximate degree.
  • the method further includes:
  • the matching degree between the user evaluation information and the merchant subscription information is calculated according to at least a matching degree of the first branch and the second branch at each layer and the approximate degree, including:
  • the matching degree of the user evaluation information and the merchant subscription information is calculated according to at least the matching degree of the first branch and the second branch in each layer respectively;
  • if the degree of approximation is less than the first threshold, the degree of matching between the user evaluation information and the merchant subscription information is zero.
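  • The threshold rule can be sketched in Python as follows. This is only an illustration under stated assumptions: sentiment indices are taken to lie in [0, 1], the degree of approximation is taken to be one minus their absolute difference, and the default threshold value is arbitrary. The source fixes none of these, and the function and parameter names are hypothetical.

```python
def gated_match(branch_match, review_sentiment, target_sentiment,
                first_threshold=0.5):
    """Apply the sentiment check to a branch-based matching degree.

    Assumed: indices in [0, 1]; approximation = 1 - |difference|.
    """
    approximation = 1.0 - abs(review_sentiment - target_sentiment)
    if approximation < first_threshold:
        # Sentiments are too far apart: the texts are treated as not matching.
        return 0.0
    # Otherwise the branch-based matching degree is kept unchanged.
    return branch_match
```

  • Under these assumptions, a strongly positive review (index 0.9) checked against a negative target (index 0.1) yields 0 regardless of how well the branches match.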
  • a statistical model after training including:
  • obtaining the category corresponding to the user evaluation information including:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
  • the scene node serves as a category corresponding to the user evaluation information.
  • the method further includes:
  • the method further includes:
  • Machine learning is performed according to the degree of matching between the plurality of tag nodes, and the tag category tree is generated or corrected according to the result of machine learning.
  • the application also provides an information matching method, including:
  • the method further includes:
  • calculating the matching degree between the user evaluation information and the merchant subscription information based at least on the degree of approximation between the sentiment index of the user evaluation information and the target sentiment index, including:
  • calculating, according to the approximation degree and the initial matching degree, the matching degree between the user evaluation information and the merchant subscription information including:
  • if the degree of approximation is less than the first threshold, the degree of matching between the user evaluation information and the merchant subscription information is zero.
  • a statistical model after training including:
  • obtaining the category corresponding to the user evaluation information including:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
  • the scene node serves as a category corresponding to the user evaluation information.
  • the method further includes:
  • the application also provides a method for inputting information, including:
  • the client obtains user evaluation information or merchant subscription information input by the user;
  • the client sends the user evaluation information or merchant subscription information to a computing unit, and the computing unit is configured to calculate a matching degree of the user evaluation information and the merchant subscription information.
  • the application also provides an information matching method, including:
  • the label category tree includes at least two layers, each layer includes at least one label node, and a parent label node of each label node is a parent category of the label node;
  • the label node of the lowest layer of the first branch matches the content of the first information, and the label node of the lowest layer of the second branch matches the content of the second information;
  • the matching degree of the first information and the second information is calculated according to at least a matching degree of the first branch and the second branch in each layer.
  • the matching degree of the first information and the second information is calculated according to at least a matching degree of the first branch and the second branch in each layer, including:
  • the first matching degree is calculated according to at least the matching degree of the first branch and the second branch in each layer, including:
  • the first matching degree is calculated according to at least a matching degree of the first branch and the second branch at each layer, and a weight value of each layer.
  • the method further includes:
  • the matching degree of the first information and the second information is calculated according to at least a matching degree of the first branch and the second branch at each layer and the approximate degree.
  • the method further includes:
  • the matching degree of the first information and the second information is calculated according to at least the matching degree of the first branch and the second branch at each layer and the approximate degree, including:
  • the matching degree of the first information and the second information is calculated according to at least a matching degree of the first branch and the second branch in each layer respectively;
  • the matching degree of the first information and the second information is 0.
  • a statistical model after training including:
  • obtaining the category corresponding to the first information including:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
  • the scene node serves as a category corresponding to the first information.
  • the training feature of the trained statistical model includes a word segmentation result of the input information
  • the method further includes: segmenting the first information to obtain a word segmentation result of the first information;
  • calculating the sentiment index of the first information according to the statistical model comprises: inputting the word segmentation result of the first information into the statistical model to obtain a sentiment index of the first information.
  • the word segmentation result of the input information is a word segmentation result obtained by segmenting each two adjacent characters in the input information
  • the segmentation of the first information includes: segmenting each two adjacent characters in the first information.
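  • Segmenting every two adjacent characters can be read as producing overlapping two-character units (character bigrams). That reading is an assumption, and the function name below is hypothetical. A minimal Python sketch:

```python
def adjacent_pairs(text):
    """Split text into overlapping two-character units."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(adjacent_pairs("abcd"))  # ['ab', 'bc', 'cd']
```

  • Such units need no dictionary, which can make them a convenient training feature for short review texts.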
  • the training feature of the trained statistical model further includes an emotional feature of the context
  • the method also includes extracting an emotional feature of a context of the first information
  • inputting the word segmentation result of the first information into the statistical model to obtain a sentiment index of the first information comprises: inputting the word segmentation result of the first information and the emotional feature of the context of the first information into the statistical model to obtain the sentiment index of the first information.
  • the emotional features of the context include any one or more of the following:
  • the sentiment index of the previous sentence, the topic similarity between the previous sentence and the current sentence, the overall sentiment distribution of the preceding text, and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between the at least one related sentence and the current sentence is greater than a second threshold.
  • the trained statistical model includes a trained first statistical model and a trained second statistical model; the training feature of the first statistical model includes a word segmentation result of the input information, and the training feature of the second statistical model includes emotional features of the context.
  • the trained statistical model is a maximum entropy model after training.
  • the method further includes:
  • the method further includes:
  • Machine learning is performed according to the degree of matching between the plurality of tag nodes, and the tag category tree is generated or corrected according to the result of machine learning.
  • the application also provides an information matching device, including:
  • the information obtaining unit is configured to obtain the merchant subscription information and the user evaluation information to be matched;
  • a category tree obtaining unit configured to obtain a label category tree
  • the label category tree includes at least two layers, each layer includes at least one label node, and a parent label node of each label node is a parent category of the label node;
  • a branch obtaining unit configured to obtain, from the tag category tree, a first branch and a second branch, wherein the tag node of the lowest layer of the first branch matches the content of the user evaluation information, and the tag node of the lowest layer of the second branch matches the content of the merchant subscription information;
  • the matching degree calculation unit is configured to calculate a matching degree of the merchant subscription information and the user evaluation information according to at least a matching degree of the first branch and the second branch at each layer.
  • the matching degree calculation unit is configured to calculate a first matching degree according to at least the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree of the merchant subscription information and the user evaluation information according to at least the first matching degree.
  • when calculating the first matching degree, the matching degree calculation unit is specifically configured to calculate the first matching degree according to at least the matching degree of the first branch and the second branch at each layer and the weight value of each layer.
  • it also includes:
  • a model acquisition unit configured to acquire a statistical model after training
  • An emotion calculation unit configured to calculate an emotional index of the user evaluation information according to the statistical model
  • An approximation calculation unit configured to calculate an approximation degree of the sentiment index of the user evaluation information and the target sentiment index
  • the matching degree calculation unit is configured to calculate the matching degree of the user evaluation information and the merchant subscription information according to at least the matching degree of the first branch and the second branch at each layer and the approximate degree.
  • the sentiment calculation unit is further configured to calculate a sentiment index of the merchant subscription information according to the statistical model, and the sentiment index of the merchant subscription information is used as the target sentiment index.
  • the matching degree calculation unit is specifically used to:
  • the matching degree of the user evaluation information and the merchant subscription information is calculated according to at least the matching degree of the first branch and the second branch in each layer respectively;
  • if the degree of approximation is less than the first threshold, the degree of matching between the user evaluation information and the merchant subscription information is zero.
  • the model obtaining unit is specifically configured to obtain a category corresponding to the user evaluation information, and obtain a trained statistical model corresponding to the category.
  • the model obtaining unit when acquiring the category corresponding to the user evaluation information, is specifically configured to:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
  • the scene node serves as a category corresponding to the user evaluation information.
  • the device further includes: a word vector obtaining unit, configured to acquire a word vector of the user evaluation information and a word vector of the merchant subscription information;
  • the matching degree calculation unit is further configured to calculate a matching degree between the word vector of the user evaluation information and the word vector of the merchant subscription information as a second matching degree;
  • the matching degree calculation unit is specifically configured to calculate the matching degree between the user evaluation information and the merchant subscription information according to at least the matching degree of the first branch and the second branch at each layer and the second matching degree.
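  • One way to picture the second matching degree is a similarity score between two word vectors that is then blended with the branch-based degree. The sketch below is hypothetical: the source names neither the similarity measure nor the way the two degrees are combined, so cosine similarity, the linear blend, and the `alpha` parameter are all assumptions made for illustration.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Assumed second matching degree: cosine of the angle between vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def combined_match(first_match, second_match, alpha=0.5):
    """Assumed linear blend of the branch-based and vector-based degrees."""
    return alpha * first_match + (1.0 - alpha) * second_match
```

  • With `alpha=0.5`, a full branch match (1.0) and orthogonal word vectors (0.0) would blend to 0.5; other weightings or combination rules are equally compatible with the text.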
  • it also includes:
  • a correcting unit configured to acquire a matching degree between the plurality of tag nodes in the tag category tree, perform machine learning according to the matching degree between the plurality of tag nodes, and generate or correct the tag according to the result of the machine learning Category tree.
  • the application also provides an information matching device, including:
  • the information obtaining unit is configured to obtain the merchant subscription information and the user evaluation information to be matched;
  • a model acquisition unit configured to acquire a statistical model after training
  • An emotion calculation unit configured to calculate an emotional index of the user evaluation information according to the statistical model
  • the matching degree calculation unit is configured to calculate a matching degree of the user evaluation information and the merchant subscription information according to at least an approximation degree of the sentiment index of the user evaluation information and the target sentiment index.
  • it also includes:
  • a matching degree obtaining unit configured to acquire an initial matching degree between the user evaluation information and the merchant subscription information
  • the matching degree calculation unit is specifically configured to calculate the degree of matching between the user evaluation information and the merchant subscription information according to at least the degree of approximation and the initial matching degree.
  • the matching degree calculation unit is specifically configured to:
  • if the degree of approximation is less than the first threshold, the degree of matching between the user evaluation information and the merchant subscription information is zero.
  • the model obtaining unit is specifically configured to obtain a category corresponding to the user evaluation information, and obtain a trained statistical model corresponding to the category.
  • the model obtaining unit when acquiring the category corresponding to the user evaluation information, is specifically configured to:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
  • the scene node serves as a category corresponding to the user evaluation information.
  • the sentiment calculation unit is further configured to calculate a sentiment index of the merchant subscription information according to the statistical model, and use the sentiment index of the merchant subscription information as the target sentiment index.
  • the application also provides a client, including:
  • the information obtaining unit is configured to obtain user evaluation information or merchant subscription information input by the user;
  • a sending unit configured to send the user evaluation information or the merchant subscription information to the computing unit, where the calculating unit is configured to calculate a matching degree of the user evaluation information and the merchant subscription information.
  • the application also provides an information matching device, including:
  • An information acquiring unit configured to acquire first information and second information to be matched
  • a category tree obtaining unit configured to obtain a label category tree
  • the label category tree includes at least two layers, each layer includes at least one label node, and a parent label node of each label node is a parent category of the label node;
  • a branch obtaining unit configured to obtain, from the tag category tree, a first branch and a second branch, wherein the tag node of the lowest layer of the first branch matches the content of the first information, and the tag node of the lowest layer of the second branch matches the content of the second information;
  • the matching degree calculation unit is configured to calculate a matching degree of the first information and the second information according to at least a matching degree corresponding to each of the first branch and the second branch in each layer.
  • the matching degree calculation unit is configured to calculate a first matching degree according to at least the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree of the first information and the second information according to at least the first matching degree.
  • when calculating the first matching degree, the matching degree calculation unit is specifically configured to calculate the first matching degree according to at least the matching degree of the first branch and the second branch at each layer and the weight value of each layer.
  • it also includes:
  • a model acquisition unit configured to acquire a statistical model after training
  • An emotion calculation unit configured to calculate an emotional index of the first information according to the statistical model
  • An approximation calculation unit configured to calculate an approximation degree of the sentiment index of the first information and the target sentiment index
  • the matching degree calculation unit is specifically configured to calculate the matching degree of the first information and the second information according to at least the matching degree of the first branch and the second branch at each layer and the approximate degree.
  • the sentiment calculation unit is further configured to calculate a sentiment index of the second information according to the statistical model, and the sentiment index of the second information is used as the target sentiment index.
  • the matching degree calculation unit is specifically used to:
  • the matching degree of the first information and the second information is calculated according to at least a matching degree of the first branch and the second branch in each layer respectively;
  • the matching degree of the first information and the second information is 0.
  • the model obtaining unit is configured to acquire a category corresponding to the first information, and obtain a trained statistical model corresponding to the category.
  • the model obtaining unit when acquiring the category corresponding to the first information, is specifically configured to:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
  • the scene node serves as a category corresponding to the first information.
  • the training feature of the trained statistical model includes a word segmentation result of the input information
  • the device further includes: a word segmentation unit, configured to perform segmentation on the first information to obtain a word segmentation result of the first information;
  • the emotion calculation unit is specifically configured to input the word segmentation result of the first information into the statistical model to obtain an emotion index of the first information.
  • the word segmentation result of the input information is a word segmentation result obtained by segmenting each two adjacent characters in the input information
  • the word segmentation unit is specifically configured to perform word segmentation on every two adjacent characters in the first information.
  • the training feature of the trained statistical model further includes an emotional feature of the context
  • the device further includes: an emotion extraction unit, configured to extract an emotional feature of a context of the first information;
  • the emotion calculation unit is specifically configured to input the word segmentation result of the first information and the emotional feature of the context of the first information into the statistical model to obtain a sentiment index of the first information.
  • the emotional features of the context include any one or more of the following:
  • the sentiment index of the previous sentence, the topic similarity between the previous sentence and the current sentence, the overall sentiment distribution of the preceding text, and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between the at least one related sentence and the current sentence is greater than a second threshold.
  • the trained statistical model includes a trained first statistical model and a trained second statistical model; the training feature of the first statistical model includes a word segmentation result of the input information, and the training feature of the second statistical model includes emotional features of the context.
  • the trained statistical model is a maximum entropy model after training.
  • the device further includes: a word vector obtaining unit, configured to acquire a word vector of the first information and a word vector of the second information;
  • the matching degree calculation unit is further configured to calculate a matching degree of the word vector of the first information and the word vector of the second information as a second matching degree;
  • the matching degree calculation unit is specifically configured to calculate the matching degree between the first information and the second information according to at least the matching degree of the first branch and the second branch at each layer and the second matching degree.
  • the device further includes: a correction unit, configured to acquire the matching degree between a plurality of label nodes in the label category tree, perform machine learning according to the matching degree between the plurality of label nodes, and generate or correct the label category tree according to the result of the machine learning.
  • When the first information and the second information are matched, they are not matched directly after word segmentation. Instead, a first branch corresponding to the first information and a second branch corresponding to the second information are obtained from the tag category tree. The tag node at the lowest layer of the first branch matches the content of the first information, and the tag node at the lowest layer of the second branch matches the content of the second information. Because the parent tag node of each tag node in the tag category tree is the parent category of that tag node, the first branch includes not only the tag node that matches the content of the first information but also the layer-by-layer parent categories of that node; likewise, the second branch includes not only the tag node that matches the content of the second information but also its layer-by-layer parent categories. Therefore, the matching degree of the first information and the second information, calculated from the matching degree of the first branch and the second branch at each layer, reflects not only how well the two pieces of information match but also how well their layer-by-layer parent categories match. It thus reflects the association between the layer-by-layer parent categories of the first information and the second information, which improves the matching accuracy.
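  • The per-layer comparison with layer weights can be sketched in Python. This is a minimal illustration, not the method as claimed: each branch is written as a list of tag labels from the root down, the per-layer matching degree is assumed to be 1 for identical labels and 0 otherwise, and the weights and the parent assignments of the labels are hypothetical.

```python
def first_matching_degree(branch_a, branch_b, layer_weights):
    """Weighted sum of per-layer matching degrees of two branches.

    Assumed per-layer degree: 1.0 if the two tag labels are equal, else 0.0.
    """
    if not (len(branch_a) == len(branch_b) == len(layer_weights)):
        raise ValueError("branches and weights must cover the same layers")
    per_layer = [1.0 if a == b else 0.0 for a, b in zip(branch_a, branch_b)]
    return sum(w * m for w, m in zip(layer_weights, per_layer))


# Hypothetical branches, listed from the root layer to the lowest layer.
branch_1 = ["service", "pre-sales", "customer attitude"]
branch_2 = ["service", "pre-sales", "response speed"]
weights = [0.2, 0.3, 0.5]  # assumed weights, one per layer

degree = first_matching_degree(branch_1, branch_2, weights)
print(degree)
```

  • Here the two branches differ only at the lowest layer, so the shared parent categories still contribute to the degree, whereas a pure word-overlap comparison of the two leaf labels would contribute nothing.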
  • FIG. 1 is a schematic flow chart of an embodiment of a method provided by the present application.
  • FIG. 2 is a schematic diagram of a tag category tree provided by the present application.
  • FIG. 3 is a schematic flow chart of another method embodiment provided by the present application.
  • FIG. 4 is a schematic diagram of a scenario category tree provided by the present application.
  • FIG. 5 is a schematic flowchart diagram of another method embodiment provided by the present application.
  • FIG. 6 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another apparatus embodiment provided by the present application.
  • FIG. 8 is a schematic structural diagram of another apparatus embodiment provided by the present application.
  • FIG. 9 is a schematic structural diagram of another apparatus embodiment provided by the present application.
  • FIG. 10 is a schematic structural diagram of another apparatus embodiment provided by the present application.
  • FIG. 11 is a schematic structural diagram of another apparatus embodiment provided by the present application.
  • the evaluation information refers to feedback information input by the user on a web platform such as a website or an application (application). For example, after a buyer purchases an item on an e-commerce website, the buyer can evaluate the service flow of the item, the logistics provided by the merchant, and the service. By entering the merchant subscription information, the merchant can extract the evaluation information of interest to the merchant and push it to the merchant.
  • the specific process includes: the buyer inputs a plurality of evaluation information, the merchant inputs the merchant subscription information, separates the merchant subscription information and the evaluation information, and determines whether the two have the same word segmentation result, and calculates the plurality of information according to the same word segmentation result. The degree of matching.
  • The above information matching method can only determine whether the evaluation information and the merchant subscription information share the same word segmentation results; it cannot reflect whether the two are related, for example whether their parent categories are related.
  • For example, the evaluation information input by the buyer is "the service is not good" and the merchant subscription information is "customer service attitude". The parent category of both "the service is not good" and "customer service attitude" is service, so the two have a certain relevance.
  • However, the matching degree calculated according to the above information matching method is 0. The matching accuracy is evidently low, so the merchant has to obtain the related evaluation information through an additional algorithm, which wastes system resources.
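  • The shortcoming described above can be illustrated with a minimal sketch. The function name, the use of a shared-token ratio as the matching degree, and the two segmentation results are assumptions made for illustration; the text only states that the matching degree is calculated from the shared word segmentation results.

```python
def overlap_matching_degree(tokens_a, tokens_b):
    """Share of distinct tokens that the two segmentation results have in common."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical segmentation results for the two pieces of information.
evaluation_tokens = ["service", "not good"]
subscription_tokens = ["customer service attitude"]

# No token is shared, so the related pair still scores 0.
print(overlap_matching_degree(evaluation_tokens, subscription_tokens))  # 0.0
```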
  • The embodiment of the present application provides an information matching method and related apparatus, so that the calculated matching degree can reflect the correlation between pieces of information, specifically the correlation between their layer-by-layer parent categories, thereby improving the matching accuracy.
  • An embodiment of the present application provides a method embodiment of an information matching method. The method of this embodiment includes:
  • S101 Acquire first information and second information to be matched.
  • the first information and/or the second information may be information such as words, phrases, and the like input by the user.
  • the first information may be user evaluation information input by a buyer
  • the second information may be merchant subscription information input by a merchant.
  • the tag category tree in the embodiment of the present application includes at least two layers, each layer includes at least one tag node, and a parent tag node of each tag node is a parent class of the tag node.
  • The tag category tree shown in FIG. 2 includes three layers. The first layer includes one tag node, "service", which is the root node of the tag category tree; the second layer includes two tag nodes, "pre-sales" and "after-sales"; the third layer includes four tag nodes: "customer service attitude", "response speed", "cash back" and "warranty".
  • As the layer number increases, the categories in the tag category tree are refined layer by layer; that is, the parent tag node of each tag node is the parent category of that tag node.
  • For example, "pre-sales" is the parent category of "customer service attitude", and "service" is the parent category of "pre-sales".
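  • One possible in-memory representation of such a tree is a child-to-parent map, sketched below. The names TAG_PARENT and layer_of are hypothetical, and placing "cash back" and "warranty" under "after-sales" is an assumption, since the text only states the parents of the pre-sales nodes.

```python
# Child -> parent map for a three-layer tag category tree; the root has parent None.
TAG_PARENT = {
    "service": None,
    "pre-sales": "service",
    "after-sales": "service",
    "customer service attitude": "pre-sales",
    "response speed": "pre-sales",
    "cash back": "after-sales",   # assumed placement
    "warranty": "after-sales",    # assumed placement
}


def layer_of(node, parent=TAG_PARENT):
    """Return the 1-based layer of a tag node (the root is layer 1)."""
    layer = 1
    while parent[node] is not None:
        node = parent[node]
        layer += 1
    return layer


print(layer_of("service"), layer_of("pre-sales"), layer_of("response speed"))  # 1 2 3
```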
  • S103 Obtain the first branch and the second branch from the label category tree.
  • the first branch and/or the second branch include at least one tag node.
  • The label node of the lowest layer of the first branch matches the content of the first information. Because the parent label node of each label node in the label category tree is the parent category of that label node, if the first information does not match the root node, the first branch includes not only the label node that matches the content of the first information but also the layer-by-layer parent categories of the matched label node.
  • The process of obtaining the first branch may include: matching the first information with each node in the tag category tree to obtain a matched tag node, and using the matched tag node together with its layer-by-layer parent nodes as the first branch.
  • Before matching against the tag category tree, the first information may be segmented, and the word segmentation results are then matched against the tag category tree.
  • For example, the first information is "The service is not good". The first information is segmented to obtain the word segmentation results "service" and "bad", and these results are matched with each node in the tag category tree to obtain the matched tag node "service". Since the tag node "service" is the root node and has no parent node, "service" alone is taken as the first branch.
  • As another example, the first information is "The customer service attitude is not good". The matched tag node "customer service attitude" is obtained in the above manner, and "customer service attitude" together with its layer-by-layer parent nodes "pre-sales" and "service" is taken as the first branch.
  • the label node of the lowest layer of the second branch matches the content of the second information. If the second information does not match the root node, the second branch includes not only the tag node that matches the content of the second information, but also the layer-by-layer parent category of the matched tag node.
  • The process of obtaining the second branch is similar to the process of obtaining the first branch, and may include: matching the second information with each node in the tag category tree to obtain a matched node, and using the matched node together with its layer-by-layer parent nodes as the second branch.
  • Before matching against the tag category tree, the second information may be segmented, and the word segmentation results are then matched against the tag category tree.
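  • The branch acquisition described above can be sketched as follows. This is only an illustration: get_branch and TAG_PARENT are hypothetical names, matching a token to a node by exact string equality is an assumption, and the placement of the after-sales nodes is assumed.

```python
# Child -> parent map for the tag category tree; the root has parent None.
TAG_PARENT = {
    "service": None,
    "pre-sales": "service",
    "after-sales": "service",
    "customer service attitude": "pre-sales",
    "response speed": "pre-sales",
    "cash back": "after-sales",
    "warranty": "after-sales",
}


def get_branch(tokens, parent=TAG_PARENT):
    """Return the branch, root first, for the first token that equals a tag node."""
    for token in tokens:
        if token in parent:
            branch = []
            node = token
            while node is not None:      # climb from the matched node to the root
                branch.append(node)
                node = parent[node]
            return branch[::-1]          # root first, matched node last
    return []


print(get_branch(["service", "bad"]))
# ['service']
print(get_branch(["customer service attitude", "not good"]))
# ['service', 'pre-sales', 'customer service attitude']
```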
  • S104 Calculate a matching degree of the first information and the second information according to at least a matching degree of the first branch and the second branch in each layer.
  • The step may include: calculating a first matching degree according to the matching degree of the first branch and the second branch in each layer; and calculating the matching degree of the first information and the second information according to the first matching degree.
  • The first matching degree may be used directly as the matching degree of the first information and the second information, or the matching degree of the first information and the second information may be calculated from the first matching degree in combination with other parameters.
  • The first branch includes at least one layer of label nodes and the second branch includes at least one layer of label nodes; the label nodes of the first branch and the second branch in each corresponding layer are matched to obtain the matching degree of each layer.
  • For example, the first branch includes "service" and the second branch includes, in order, "service" and "pre-sales"; the matching degree of the first layer is 100% and the matching degree of the second layer is 0. The first matching degree is calculated according to the matching degrees of the two layers: for example, one half of the sum of the matching degrees of the two layers is used as the matching degree between the first information and the second information, which gives 50% in this example.
  • As another example, the first branch includes, in order, "service", "pre-sales" and "customer service attitude", and the second branch includes, in order, "service", "pre-sales" and "response speed"; one third of the sum of the matching degrees of the three layers is used as the matching degree between the first information and the second information, and the calculated matching degree is 67%.
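  • The two worked examples can be reproduced with a short sketch. The function name layer_average_match is hypothetical, and treating the per-layer matching degree as 100% for identical nodes and 0 otherwise is an assumption consistent with the examples.

```python
def layer_average_match(branch_a, branch_b):
    """Average the per-layer matching degrees (1 for identical nodes, else 0)."""
    depth = max(len(branch_a), len(branch_b))
    if depth == 0:
        return 0.0
    # zip stops at the shorter branch, so a missing layer counts as a mismatch.
    matched = sum(1 for a, b in zip(branch_a, branch_b) if a == b)
    return matched / depth


print(layer_average_match(["service"], ["service", "pre-sales"]))  # 0.5
print(round(layer_average_match(
    ["service", "pre-sales", "customer service attitude"],
    ["service", "pre-sales", "response speed"],
), 2))  # 0.67
```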
  • the weight value of each layer may also be considered.
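  • For example, the first matching degree Tagsim is calculated as a weighted combination, over the layers, of the indicator values $I(P_i)$ with the weight values $w_i$, where:
  • $w_i$ is the weight value of the i-th layer;
  • $P_i$ is the matching degree of the first branch and the second branch in the i-th layer;
  • the function $I(P_i)$ is equal to 1 when $P_i = 100\%$, and equal to 0 when $P_i \neq 100\%$.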
  • the weight values of the layers may all be equal to 1, or may be incremented layer by layer, and the weight values may be set and/or adjusted by means of machine learning.
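  • A layer-weighted first matching degree might be sketched as below. The function name tagsim is hypothetical, and dividing by the sum of the weights is an assumption made here so that the result stays within [0, 1] and agrees with the 50% and 67% examples when all weights equal 1.

```python
def tagsim(layer_matches, weights):
    """Weighted share of layers whose matching degree P_i is exactly 100%.

    layer_matches: per-layer matching degrees P_i expressed in [0, 1].
    weights: per-layer weight values w_i.
    """
    if len(layer_matches) != len(weights):
        raise ValueError("one weight is required per layer")
    total = sum(weights)
    if total == 0:
        return 0.0
    # Indicator I(P_i): 1 when the layer matches completely, otherwise 0.
    indicator = [1 if p == 1.0 else 0 for p in layer_matches]
    return sum(w * i for w, i in zip(weights, indicator)) / total  # assumed normalisation


print(tagsim([1.0, 1.0, 0.0], [1, 2, 3]))  # 0.5
```

  • With weights that increase layer by layer, as in the call above, a mismatch in a deeper and more specific layer lowers the result more than it would with equal weights.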
  • When the first information and the second information are matched in this embodiment, they are not matched directly after word segmentation. Instead, the first branch corresponding to the first information and the second branch corresponding to the second information are obtained from the tag category tree.
  • The first branch includes not only the label node that matches the content of the first information but also the layer-by-layer parent categories of that label node, and the second branch likewise includes not only the label node that matches the content of the second information but also the layer-by-layer parent categories of that label node.
  • Therefore, the matching degree between the first information and the second information, calculated according to the matching degree of the first branch and the second branch in each layer, reflects not only how well the first information and the second information themselves match but also how well their layer-by-layer parent categories match. It thus reflects the correlation between the layer-by-layer parent categories of the two pieces of information, which improves the matching accuracy.
  • The embodiment of the present application is in effect equivalent to adding at least one layer of category labels to the first information and to the second information, and calculating the matching degree of the first information and the second information according to the matching degree of the category labels of corresponding layers. Therefore, the embodiment of the present application can calculate a matching degree between pieces of information whose categories are related, for example the matching degree between synonyms or between pieces of information belonging to the same category.
  • For example, the evaluation information input by the buyer is "the service is not good" and the merchant subscription information is "customer service attitude". Since both describe service, they have a certain relevance. When the two are matched directly, the matching degree is 0 and the matching accuracy is low. When the matching degree is calculated by the embodiment of the present application, the first branch includes "service" and the second branch includes, in order, "service" and "pre-sales"; the matching degree of the first layer is 100%, the matching degree of the second layer is 0, and the final calculated matching degree can be 50%. The matching degree calculated in the embodiment of the present application thus reflects the correlation between the two, which improves the matching accuracy.
  • the first information and the second information may also be information in other application scenarios.
  • For example, the first information is chat information input by a user in a WeChat group or a DingTalk group, and the second information is specific subscription information, such as a subscription word or a subscription phrase input by the group administrator; this is not limited by the embodiment of the present application. A specific example follows.
  • For a WeChat group of a movie interest group, the tag category tree consists of two layers: the first layer includes one tag node, "movie", and the second layer includes two tag nodes, "comedy" and "action drama".
  • As the layer number increases, the categories in the tag category tree are refined layer by layer; that is, the parent tag node of each tag node is the parent category of that tag node. For example, "movie" is the parent category of "comedy" and "action drama". If the group administrator enters the subscription word "movie" and the chat information input by the user is "I like to watch comedy", the matching degree is 0 when the two are matched directly, and the matching accuracy is low.
  • When the matching degree is calculated by the embodiment of the present application, the first branch includes "movie" and "comedy", the second branch includes "movie", and the final calculated matching degree may be 50%, which improves the matching accuracy.
  • If the first information and/or the second information matches multiple branches in the tag category tree, one branch may be selected from the branches matched by the first information and one branch from the branches matched by the second information, the matching degree between the two selected branches is calculated for each such pair, and the highest calculated matching degree is used as the matching degree between the first information and the second information.
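  • Taking the highest matching degree over all branch pairs can be sketched as follows. Both function names are hypothetical, and the equal-weight per-layer comparison used for each pair is an assumption.

```python
from itertools import product


def branch_match(branch_a, branch_b):
    """Equal-weight share of layers on which two root-first branches agree."""
    depth = max(len(branch_a), len(branch_b))
    if depth == 0:
        return 0.0
    return sum(1 for a, b in zip(branch_a, branch_b) if a == b) / depth


def best_branch_match(branches_a, branches_b):
    """Highest matching degree over every pair of candidate branches."""
    return max(
        (branch_match(a, b) for a, b in product(branches_a, branches_b)),
        default=0.0,
    )


first_branches = [
    ["service", "pre-sales", "response speed"],
    ["service", "after-sales", "warranty"],
]
second_branches = [["service", "after-sales"]]
print(round(best_branch_match(first_branches, second_branches), 2))  # 0.67
```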
  • The method may further include: acquiring a word vector of the first information and a word vector of the second information; and calculating the matching degree between the word vector of the first information and the word vector of the second information as a second matching degree. The matching degree of the first information and the second information is then calculated according to the first matching degree, that is, the matching degree calculated from the first branch and the second branch in each layer, and the second matching degree.
  • the word vector of each word is extracted, and the word vectors of the respective words are added to obtain a word vector of the first information, and the word vector of the second information can be obtained in a similar manner.
  • the degree of matching between the word vector of the first information and the word vector of the second information is calculated by calculating a cosine similarity or the like.
  • the word vector can be a word vector extracted by a technique such as word2vec.
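  • A minimal sketch of the second matching degree follows. The three-dimensional vectors are toy values invented for illustration; in practice they would come from a trained embedding model such as word2vec. The names WORD_VECTORS, sentence_vector and cosine_similarity are hypothetical.

```python
import math

# Toy word vectors (assumed values, not the output of any real model).
WORD_VECTORS = {
    "service": [0.9, 0.1, 0.0],
    "bad": [0.1, 0.8, 0.2],
    "customer": [0.7, 0.2, 0.1],
    "attitude": [0.6, 0.3, 0.2],
}


def sentence_vector(tokens, vectors=WORD_VECTORS):
    """Add up the word vectors of the tokens; unknown tokens contribute nothing."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for token in tokens:
        for k, value in enumerate(vectors.get(token, [0.0] * dims)):
            total[k] += value
    return total


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


first = sentence_vector(["service", "bad"])
second = sentence_vector(["customer", "service", "attitude"])
print(cosine_similarity(first, second))
```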
  • When the matching degree of the first information and the second information is calculated according to the first matching degree and the second matching degree, the sum of the first matching degree and the second matching degree may be used as the final matching degree. A corresponding weight value may also be set for each of the two matching degrees, and the weight values can be set and/or adjusted by machine learning.
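  • The combination of the two matching degrees might look like the sketch below. The function name, the parameter names and the default weight of 0.5 for each term are assumptions; the text only states that the two degrees may be summed and that weight values may be set.

```python
def combined_matching_degree(tagsim, vecsim, w_tag=0.5, w_vec=0.5):
    """Weighted sum of the first (branch-based) and second (vector-based) degrees."""
    return w_tag * tagsim + w_vec * vecsim


# Plain sum, as in the unweighted case described in the text.
print(combined_matching_degree(0.5, 0.8, w_tag=1.0, w_vec=1.0))
# Weighted sum with the assumed default weights (approximately 0.65).
print(combined_matching_degree(0.5, 0.8))
```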
  • The principle of word embedding technology is to use machine learning to learn from a large amount of information so that each word is represented by a corresponding word vector. The word vector actually represents the context in which the word occurs, so in some cases the matching degree calculated from word vectors has low accuracy.
  • First, some words occur in the same contexts although their semantics are quite different, so in many cases the word vector cannot accurately represent the semantics of a word. For example, the semantics of "good" and "bad" are opposite, but the cosine similarity between their word vectors is high.
  • Second, the same word expresses different meanings in different environments. For example, "very thin" is a positive expression when describing a mobile phone and a negative expression when describing a down jacket, yet the matching degree calculated from the word vector is the same in both cases.
  • Moreover, since it is difficult to interpret the meaning of the numerical values in a word vector, the word vector itself cannot be adjusted to solve the above problems.
  • To address this, the embodiment of the present application may also calculate a sentiment index of the information according to a statistical model. The sentiment index indicates whether the information is positive, negative or neutral, and it is taken into account when the final matching degree is calculated.
  • the method in this embodiment of the present application may further include:
  • the statistical model can be trained according to a large amount of training data, and each training data is marked with a corresponding emotional index.
  • the training data is 200,000 statements, each of which is labeled with a corresponding sentiment index.
  • the statistical model may be any mathematical model such as a maximum entropy model.
  • the maximum entropy model can make the calculated sentiment index more suitable for semantics, which can improve the accuracy of information matching.
  • S302 Calculate a sentiment index of the first information according to the statistical model.
  • The first information is input to the trained statistical model to obtain the sentiment index of the first information. The interval in which the sentiment index falls indicates whether the emotion corresponding to the first information is positive, negative or neutral.
  • S303 Calculate an approximate degree of the sentiment index of the first information and the target sentiment index.
  • the target emotion index may be a preset emotion index, or may be calculated according to the second information.
  • an emotional index of the second information is calculated according to the statistical model, and an emotional index of the second information is used as the target emotional index.
  • the target sentiment index can indicate whether the target emotion is positive, negative or neutral.
  • The approximation degree may be expressed in any form such as a difference or a ratio, or may be determined according to whether the emotion indicated by the sentiment index of the first information and the emotion indicated by the target sentiment index are the same. For example, if the emotion indicated by the sentiment index of the first information and the emotion indicated by the target sentiment index are both negative, the approximation degree of the two is high.
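  • Both ways of expressing the approximation degree can be sketched as follows. The value range of the sentiment index (assumed here to be [-1, 1]), the interval boundaries of -0.2 and 0.2, and all function names are assumptions made for illustration.

```python
def polarity(sentiment_index, negative_below=-0.2, positive_above=0.2):
    """Map a sentiment index to an emotion according to the interval it falls in."""
    if sentiment_index < negative_below:
        return "negative"
    if sentiment_index > positive_above:
        return "positive"
    return "neutral"


def approximation_by_difference(index, target_index):
    """Difference-based form: 1.0 for identical indexes, smaller as they diverge."""
    return 1.0 / (1.0 + abs(index - target_index))


def same_polarity(index, target_index):
    """Emotion-based form: True when both indexes indicate the same emotion."""
    return polarity(index) == polarity(target_index)


print(polarity(-0.7), same_polarity(-0.7, -0.4))  # negative True
```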
  • a matching degree of the first information and the second information is calculated according to at least a matching degree of the first branch and the second branch in each layer and the approximate degree.
  • In this embodiment, the approximation degree between the sentiment index of the first information and the target sentiment index is also considered when the matching degree of the first information and the second information is calculated: the greater the approximation degree, that is, the closer the sentiment index of the first information is to the target sentiment index, the higher the final calculated matching degree.
  • For example, if the merchant cares about negative evaluation information, the target sentiment index may be preset to a sentiment index corresponding to negative emotion. If the sentiment index of the user evaluation information is close to the target sentiment index, the final calculated matching degree is high, so that the negative evaluation information that the merchant cares about is extracted in this way.
  • For example, if the emotion indicated by the sentiment index of the first information and the emotion indicated by the target sentiment index are both negative, the matching degree of the first information and the second information is calculated at least according to the matching degree of the first branch and the second branch in each layer, that is, according to the first matching degree Tagsim.
  • Otherwise, the matching degree of the first information and the second information is 0.
  • The matching degree of the first information and the second information may instead take another low value in this case, which is not limited by the embodiment of the present application.
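  • The gating behaviour described above can be sketched as below. The function and parameter names are hypothetical, and returning Tagsim unchanged when the emotions agree is a simplifying assumption, since other parameters may also enter the calculation.

```python
def gated_matching_degree(tagsim, info_emotion, target_emotion="negative", fallback=0.0):
    """Return Tagsim when the emotions agree, otherwise a low fallback value."""
    if info_emotion == target_emotion:
        return tagsim
    return fallback


print(gated_matching_degree(0.67, "negative"))  # 0.67
print(gated_matching_degree(0.67, "positive"))  # 0.0
```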
  • A statistical model may be set for each of multiple categories, and each statistical model calculates the sentiment index of the first information under its category.
  • The different statistical models are trained on training data corresponding to different scene categories. For example, the same sentence may be labeled with different sentiment indexes under different scene categories, so that the sentiment index calculated by each statistical model corresponds to its scene category.
  • the obtaining the trained statistical model may include: acquiring a category corresponding to the first information, and acquiring a trained statistical model corresponding to the category.
  • The category corresponding to the first information may refer to the category to which the evaluation object of the first information belongs. For example, if the buyer purchases an item of clothing on the e-commerce website and inputs user evaluation information to evaluate that item, the category corresponding to the user evaluation information is clothing.
  • the category corresponding to the first information may be obtained by using a scenario category tree.
  • Acquiring the category corresponding to the first information includes: acquiring a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node; obtaining, from the scene category tree, a scene node that matches the first information; determining the parent scene node one level or multiple levels above the matched scene node; and using that parent scene node as the category corresponding to the first information.
  • the upper-level or multi-level parent scene node may refer to the root scene node, that is, directly obtain the root scene node as the corresponding category.
  • For example, the buyer purchases a skirt on the e-commerce website and inputs user evaluation information to evaluate the skirt. The matched scene node "skirt" is therefore obtained from the scene category tree, the root scene node corresponding to that scene node is determined to be "clothing", the trained statistical model corresponding to the clothing category is acquired, and that statistical model is used to calculate the sentiment index of the first information.
  • Therefore, when calculating the sentiment index of "very thin", this embodiment selects the statistical model according to the scene category to which "very thin" applies, for example mobile phones or clothing, so that the sentiment index of "very thin" is calculated for that scene category, which improves the accuracy of information matching.
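  • Selecting a category-specific model through the scene category tree might be sketched as follows. The tree contents, the node "smartphone", the lookup tables standing in for trained statistical models, and the index values are all hypothetical.

```python
# Child -> parent map for a small scene category tree; roots have parent None.
SCENE_PARENT = {
    "clothing": None,
    "skirt": "clothing",
    "down jacket": "clothing",
    "mobile phone": None,
    "smartphone": "mobile phone",
}

# Stand-ins for trained per-category statistical models: phrase -> sentiment index.
CATEGORY_MODELS = {
    "clothing": {"very thin": -0.8},
    "mobile phone": {"very thin": 0.8},
}


def root_category(node, parent=SCENE_PARENT):
    """Climb from the matched scene node to its root scene node."""
    while parent[node] is not None:
        node = parent[node]
    return node


def sentiment_index(text, evaluated_object):
    """Look up the model for the object's root category and score the text."""
    model = CATEGORY_MODELS[root_category(evaluated_object)]
    return model.get(text, 0.0)  # 0.0 is treated as neutral for unknown text


print(sentiment_index("very thin", "down jacket"))  # -0.8
print(sentiment_index("very thin", "smartphone"))   # 0.8
```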
  • The training features of the statistical model in this embodiment include a word segmentation result of the input information.
  • The method further includes: performing word segmentation on the first information to obtain a word segmentation result of the first information. Calculating the sentiment index of the first information according to the statistical model then includes: inputting the word segmentation result of the first information to the statistical model to obtain the sentiment index of the first information.
  • The word segmentation can be performed in bigram mode, that is, every two adjacent characters in the first information form one segment of the word segmentation result. For example, segmenting the four-character evaluation "the service is not good" in this way yields three two-character segments, rendered here as "service", "do not" and "not good". Word segmentation based on this method can yield a higher information matching accuracy.
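  • Character-level bigram segmentation can be sketched as below. The function name is hypothetical, and the Chinese input string is an assumed original form of the evaluation in the example; skipping whitespace is also an assumption.

```python
def bigram_segments(text):
    """Return every pair of adjacent characters, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


# Assumed four-character original of "the service is not good".
print(bigram_segments("服务不好"))  # ['服务', '务不', '不好']
```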
  • the training characteristics of the statistical model may also include the emotional features of the context, so that the emotion index can be calculated by synthesizing the words themselves and the context information.
  • The method further includes: extracting an emotional feature of the context of the first information. Inputting the word segmentation result of the first information to the statistical model to obtain the sentiment index of the first information then includes: inputting the word segmentation result of the first information and the emotional feature of the context of the first information to the statistical model to obtain the sentiment index of the first information.
  • The emotional features of the context may include: the sentiment index of the previous sentence; the topic similarity between the previous sentence and the current sentence; the overall emotional distribution of the preceding text; and the emotional distribution of at least one related sentence in the preceding text, where a related sentence is one whose topic similarity with the current sentence is greater than a second threshold.
  • The sentiment index of the previous sentence indicates whether the emotion of the previous sentence is positive, negative or neutral. The topic similarity with the previous sentence indicates whether the previous sentence and the current sentence describe the same or similar topics. The overall emotional distribution of the preceding text may refer to the numbers of positive, negative and neutral sentences in the preceding text. A related sentence is a sentence that describes the same topic as the current sentence or a similar one, and the emotional distribution of the related sentences may refer to the numbers of positive, negative and neutral sentences among the sentences in the preceding text that describe the same or similar topics.
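  • Extraction of these context features might be sketched as follows. The input layout (one dictionary per preceding sentence with a precomputed topic similarity to the current sentence), the threshold of 0.5 and all names are assumptions made for illustration.

```python
from collections import Counter


def context_features(preceding, threshold=0.5):
    """Build context features from preceding sentences, listed oldest first.

    Each item is a dict with keys 'index' (sentiment index), 'emotion'
    ('positive', 'negative' or 'neutral') and 'similarity' (topic similarity
    to the current sentence).
    """
    if not preceding:
        return {"prev_index": 0.0, "prev_similarity": 0.0, "overall": {}, "related": {}}
    last = preceding[-1]
    overall = Counter(s["emotion"] for s in preceding)
    # Related sentences: topic similarity above the threshold.
    related = Counter(s["emotion"] for s in preceding if s["similarity"] > threshold)
    return {
        "prev_index": last["index"],
        "prev_similarity": last["similarity"],
        "overall": dict(overall),
        "related": dict(related),
    }


history = [
    {"index": 0.6, "emotion": "positive", "similarity": 0.2},
    {"index": -0.5, "emotion": "negative", "similarity": 0.8},
    {"index": -0.7, "emotion": "negative", "similarity": 0.9},
]
print(context_features(history))
```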
  • The embodiment of the present application may specifically calculate the sentiment index of the first information by using two statistical models. That is, the trained statistical model includes a trained first statistical model and a trained second statistical model; the training features of the first statistical model include a word segmentation result of the input information, and the training features of the second statistical model include emotional features of the context.
  • the embodiment of the present application provides another method embodiment of the information matching method, where the method in this embodiment includes:
  • S501 Obtain user evaluation information input by the buyer and merchant subscription information input by the merchant.
  • the user evaluation information input by the buyer is used to evaluate the skirt purchased by the buyer, that is, the evaluation object is a skirt.
  • For example, the user evaluation information is "slow response" and the merchant subscription information is "customer service attitude".
  • S502 Obtain a label category tree as shown in FIG. 2.
  • The label category tree in the embodiment of the present application may be modified, for example by manually adding nodes.
  • S503 Obtain the first branch and the second branch from the label category tree.
  • The label node of the lowest layer of the first branch matches the user evaluation information, and the first branch specifically includes: service, pre-sales, response speed.
  • The label node of the lowest layer of the second branch matches the merchant subscription information, and the second branch specifically includes: service, pre-sales, customer service attitude.
  • S503 Calculate a first matching degree according to at least a matching degree of each of the first branch and the second branch in each layer.
  • The first matching degree is calculated as a weighted combination, over the layers, of the indicator values $I(P_i)$ with the weight values $w_i$, where:
  • $w_i$ is the weight value of the i-th layer;
  • $P_i$ is the matching degree of the first branch and the second branch in the i-th layer;
  • the function $I(P_i)$ is equal to 1 when $P_i = 100\%$, and equal to 0 when $P_i \neq 100\%$.
  • S504 Acquire a word vector of the user evaluation information and a word vector of the merchant subscription information respectively, and calculate the matching degree between the two word vectors as the second matching degree.
  • S505 Acquire a scene category tree as shown in FIG. 4.
  • The scene category tree in the embodiment of the present application may be modified, for example by manually adding nodes.
  • S506 Obtain a scene node that matches the evaluation object from the scene category tree: a skirt, and determine a root scene node corresponding to the scene node: a clothing class.
  • S507 Obtain a trained maximum entropy model A and a maximum entropy model B corresponding to the clothing category.
  • the training feature of the maximum entropy model A includes a word segmentation result based on a bigram mode, and the training feature of the maximum entropy model B includes an emotional feature of the context.
  • S508 Perform word segmentation on the user evaluation information based on the bigram mode, input the word segmentation result into the maximum entropy model A, and obtain a sentiment index of the user evaluation information.
  • S509 Extract the emotional feature of the context of the user evaluation information, input the emotional feature of the context and the sentiment index obtained in S508 to the maximum entropy model B, and obtain the corrected emotional index.
  • The emotional features of the context include the following:
  • the sentiment index of the previous sentence (positive, negative or neutral, and the corresponding intensity); whether the previous sentence and the current sentence describe the same subject; the numbers of sentences in the preceding text whose emotions are positive, negative and neutral respectively; and, among the sentences in the preceding text that describe the same subject, the numbers of positive, negative and neutral sentences respectively.
  • S510 Calculate a matching degree between the user evaluation information and the merchant subscription information according to the modified sentiment index, the first matching degree, and the second matching degree.
  • For example, the target emotion is negative. If the emotion indicated by the corrected sentiment index obtained in S509 is not negative, the matching degree is 0.
  • Otherwise, the matching degree is calculated from Tagsim and Vecsim and their corresponding weight values, where Tagsim is the first matching degree calculated in S503 and Vecsim is the second matching degree calculated in S504.
  • Referring to FIG. 6, the embodiment of the present application further provides another method embodiment of the information matching method.
  • the method of this embodiment includes:
  • S601 Acquire first information and second information to be matched.
  • the first information and/or the second information may be information such as words, phrases, and the like input by the user.
  • the first information may be user evaluation information input by a buyer
  • the second information may be merchant subscription information input by a merchant.
  • S603 Calculate a sentiment index of the first information according to the statistical model.
  • S604 Calculate a matching degree of the first information and the second information according to at least an approximation degree of the sentiment index of the first information and the target sentiment index.
  • the method further includes: acquiring an initial matching degree between the first information and the second information; step S604 includes: calculating the first information according to at least the approximate degree and the initial matching degree The degree of matching with the second information.
  • the initial matching degree may be the first matching degree in the foregoing embodiment, that is, the matching degree of the first branch and the second branch respectively corresponding to each layer.
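  • Calculating, according to the approximation degree and the initial matching degree, the matching degree of the first information and the second information includes: if the emotion indicated by the sentiment index of the first information differs from the emotion indicated by the target sentiment index, the matching degree of the first information and the second information is 0.
  • Obtaining the trained statistical model includes: acquiring a category corresponding to the first information, and acquiring a trained statistical model corresponding to the category.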
  • obtaining the category corresponding to the first information including:
  • obtaining a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
  • obtaining, from the scene category tree, a scene node that matches the first information, determining the parent scene node one level or multiple levels above the matched scene node, and using that parent scene node as the category corresponding to the first information.
  • the method further includes:
  • the present application also provides an embodiment of an information input method.
  • the method of this embodiment includes:
  • S701 The client acquires the first information or the second information.
  • the client sends the first information or the second information to a computing unit, where the computing unit is configured to calculate a matching degree of the first information and the second information.
  • the calculation unit may use any one of the foregoing information matching methods to calculate the matching degree of the first information and the second information.
  • the present application also provides corresponding device embodiments, which are specifically described below.
  • an embodiment of the present application provides an apparatus embodiment of an information matching apparatus.
  • the device of this embodiment includes:
  • the information obtaining unit 801 is configured to obtain the merchant subscription information and the user evaluation information to be matched.
  • the category tree obtaining unit 802 is configured to obtain a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node.
  • a branch obtaining unit 803 configured to obtain a first branch and a second branch from the label category tree, where the lowest-layer label node of the first branch matches the content of the user evaluation information and the lowest-layer label node of the second branch matches the content of the merchant subscription information.
  • the matching degree calculation unit 804 is configured to calculate a matching degree of the merchant subscription information and the user evaluation information according to at least a matching degree of the first branch and the second branch at each layer.
  • the matching degree calculation unit is specifically configured to calculate a first matching degree at least according to the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree of the merchant subscription information and the user evaluation information at least according to the first matching degree.
  • the matching degree calculation unit is specifically configured to calculate a first matching degree according to at least a matching degree of the first branch and the second branch in each layer, and a weight value of each layer.
  • it also includes:
  • a model acquisition unit configured to acquire a statistical model after training
  • An emotion calculation unit configured to calculate an emotional index of the user evaluation information according to the statistical model
  • An approximation calculation unit configured to calculate an approximation degree of the sentiment index of the user evaluation information and the target sentiment index
  • the matching degree calculation unit is specifically configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation.
  • the sentiment calculation unit is further configured to calculate a sentiment index of the merchant subscription information according to the statistical model, and the sentiment index of the merchant subscription information is used as the target sentiment index.
  • the matching degree calculation unit is specifically used to:
  • if the degree of approximation is greater than or equal to a first threshold, the matching degree of the user evaluation information and the merchant subscription information is calculated at least according to the matching degree of the first branch and the second branch at each layer;
  • if the degree of approximation is less than the first threshold, the matching degree of the user evaluation information and the merchant subscription information is 0.
  • the model obtaining unit is specifically configured to obtain a category corresponding to the user evaluation information, and obtain a trained statistical model corresponding to the category.
  • the model obtaining unit when acquiring the category corresponding to the user evaluation information, is specifically configured to:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
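  • Obtaining, from the scene category tree, a scene node that matches the user evaluation information, determining the one or more levels of parent scene nodes corresponding to the matched scene node, and using the one or more levels of parent scene nodes as the category corresponding to the user evaluation information.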
  • the method further includes: a word vector obtaining unit, configured to acquire a word vector of the user evaluation information and a word vector of the merchant subscription information;
  • the matching degree calculation unit is further configured to calculate a matching degree between the word vector of the user evaluation information and the word vector of the merchant subscription information as a second matching degree;
  • the matching degree calculation unit is specifically configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
  • it also includes:
  • a correction unit configured to acquire matching degrees between a plurality of label nodes in the label category tree, perform machine learning according to those matching degrees, and generate or correct the label category tree according to the result of the machine learning.
  • an embodiment of the present application provides another apparatus embodiment of an information matching apparatus.
  • the device of this embodiment includes:
  • the information obtaining unit 901 is configured to obtain the merchant subscription information and the user evaluation information to be matched;
  • the model obtaining unit 902 is configured to obtain a statistical model after training
  • the emotion calculation unit 903 is configured to calculate an emotion index of the user evaluation information according to the statistical model
  • the matching degree calculation unit 904 is configured to calculate a matching degree of the user evaluation information and the merchant subscription information according to at least an approximation degree of the emotion index of the user evaluation information and the target emotion index.
  • it also includes:
  • a matching degree obtaining unit configured to acquire an initial matching degree between the user evaluation information and the merchant evaluation information
  • the matching degree calculation unit is specifically configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation and the initial matching degree.
  • the matching degree calculation unit is specifically configured to:
  • if the degree of approximation is greater than or equal to a first threshold, calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the initial matching degree;
  • if the degree of approximation is less than the first threshold, the matching degree of the user evaluation information and the merchant subscription information is 0.
  • the model obtaining unit is specifically configured to obtain a category corresponding to the user evaluation information, and obtain a trained statistical model corresponding to the category.
  • the model obtaining unit when acquiring the category corresponding to the user evaluation information, is specifically configured to:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
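  • Obtaining, from the scene category tree, a scene node that matches the user evaluation information, determining the one or more levels of parent scene nodes corresponding to the matched scene node, and using the one or more levels of parent scene nodes as the category corresponding to the user evaluation information.
  • the sentiment calculation unit is further configured to calculate a sentiment index of the merchant subscription information according to the statistical model, and use the sentiment index of the merchant subscription information as the target sentiment index.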
  • an embodiment of the present application provides an apparatus embodiment of a client.
  • the device of this embodiment includes:
  • the information obtaining unit 1001 is configured to acquire user evaluation information or merchant subscription information input by the user;
  • the sending unit 1002 is configured to send the user evaluation information or the merchant subscription information to the computing unit, where the calculating unit is configured to calculate a matching degree of the user evaluation information and the merchant subscription information.
  • an embodiment of the present application provides another apparatus embodiment of an information matching apparatus.
  • the device of this embodiment includes:
  • the information acquiring unit 1101 is configured to acquire first information and second information to be matched;
  • the category tree obtaining unit 1102 is configured to obtain a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node;
  • a branch obtaining unit 1103 configured to obtain a first branch and a second branch from the label category tree, wherein a label node of a lowest layer of the first branch matches a content of the first information, the second The label node of the lowest layer of the branch matches the content of the second information;
  • the matching degree calculation unit 1104 is configured to calculate a matching degree of the first information and the second information according to at least a matching degree of the first branch and the second branch in each layer.
  • the matching degree calculation unit is specifically configured to calculate a first matching degree at least according to the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree of the first information and the second information at least according to the first matching degree.
  • when calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight value of each layer.
  • it also includes:
  • a model acquisition unit configured to acquire a statistical model after training
  • An emotion calculation unit configured to calculate an emotional index of the first information according to the statistical model
  • An approximation calculation unit configured to calculate an approximation degree of the sentiment index of the first information and the target sentiment index
  • when calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation.
  • the sentiment calculation unit is further configured to calculate a sentiment index of the second information according to the statistical model, and the sentiment index of the second information is used as the target sentiment index.
  • the matching degree calculation unit is specifically used to:
  • if the degree of approximation is greater than or equal to a first threshold, the matching degree of the first information and the second information is calculated at least according to the matching degree of the first branch and the second branch at each layer;
  • if the degree of approximation is less than the first threshold, the matching degree of the first information and the second information is 0.
  • the model obtaining unit is configured to acquire a category corresponding to the first information, and obtain a trained statistical model corresponding to the category.
  • the model obtaining unit when acquiring the category corresponding to the first information, is specifically configured to:
  • Obtaining a scene category tree where the scene category tree includes at least two layers, each layer includes at least one scene node, and a parent scene node of each scene node is a parent category of the scene node;
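  • Obtaining, from the scene category tree, a scene node that matches the first information, determining the one or more levels of parent scene nodes corresponding to the matched scene node, and using the one or more levels of parent scene nodes as the category corresponding to the first information.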
  • the training feature of the trained statistical model includes a word segmentation result of the input information
  • the device further includes: a word segmentation unit, configured to perform segmentation on the first information to obtain a word segmentation result of the first information;
  • the emotion calculation unit is specifically configured to input the word segmentation result of the first information into the statistical model to obtain an emotion index of the first information.
  • the word segmentation result of the input information is a word segmentation result obtained by segmenting each two adjacent characters in the input information
  • the word segmentation unit is specifically configured to perform word segmentation on every two adjacent characters in the first information.
  • the training feature of the trained statistical model further includes an emotional feature of the context
  • the device further includes: an emotion extraction unit, configured to extract an emotional feature of a context of the first information;
  • the emotion calculation unit is specifically configured to input the word segmentation result of the first information and the emotional feature of the context of the first information into the statistical model to obtain an emotion index of the first information.
  • the emotional features of the context include any one or more of the following:
  • the sentiment index of the previous sentence, the topic similarity between the previous sentence and the current sentence, the overall sentiment distribution of the preceding text, and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between the at least one related sentence and the current sentence is greater than a second threshold.
  • the trained statistical model includes a trained first statistical model and a trained second statistical model, the training feature of the first statistical model includes a word segmentation result of the input information, and the training feature of the second statistical model includes emotional features of the context.
  • the trained statistical model is a maximum entropy model after training.
  • the method further includes: a word vector obtaining unit, configured to acquire a word vector of the first information and a word vector of the second information;
  • the matching degree calculation unit is further configured to calculate a matching degree of the word vector of the first information and the word vector of the second information as a second matching degree;
  • the matching degree calculation unit is specifically configured to calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
  • it further includes: a correction unit configured to acquire matching degrees between a plurality of label nodes in the label category tree, perform machine learning according to those matching degrees, and generate or correct the label category tree according to the result of the machine learning.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in electrical, mechanical, or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the computer readable storage medium includes a number of instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present application.
  • the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Abstract

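Embodiments of the present application provide an information matching method and related apparatus. The method includes: obtaining first information and second information to be matched; obtaining a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node; obtaining a first branch and a second branch from the label category tree, where the lowest-layer label node of the first branch matches the content of the first information and the lowest-layer label node of the second branch matches the content of the second information; and calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer. The matching degree calculated in this way reflects the relevance between pieces of information, which improves matching accuracy.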

Description

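Information matching method and related apparatus
This application claims priority to Chinese Patent Application No. 201610887444.0, filed on October 11, 2016 and entitled "Information Matching Method and Related Apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to an information matching method and related apparatus.
Background
Information matching is a commonly used computer technique for obtaining the matching degree between multiple pieces of information. It is widely used in Internet scenarios. For example, for the many pieces of evaluation information that buyers enter on e-commerce and similar websites, information matching is used to obtain the matching degree between each piece of evaluation information and the merchant subscription information, so that the evaluation information a merchant is interested in can be located quickly.
One commonly used approach is to segment the pieces of information to be matched into words, determine whether any segmentation results are identical, and calculate the matching degree between the pieces of information from the identical segmentation results.
This approach can only determine whether the pieces of information share identical segmentation results; it cannot reflect whether they are related. For example, a buyer enters the evaluation “服务不好” (the service is bad) while the merchant subscription information is “客服态度” (customer service attitude). Both describe service and are related to some extent, yet the matching degree calculated by the above approach is 0, so matching accuracy is low.
Summary
The technical problem solved by the present application is to provide an information matching method and related apparatus such that the calculated matching degree reflects the relevance between pieces of information, thereby improving matching accuracy.
To this end, the technical solutions of the present application are as follows: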
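The present application provides an information matching method, including:
obtaining merchant subscription information and user evaluation information to be matched;
obtaining a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node;
obtaining a first branch and a second branch from the label category tree, where the lowest-layer label node of the first branch matches the content of the user evaluation information and the lowest-layer label node of the second branch matches the content of the merchant subscription information;
calculating the matching degree of the merchant subscription information and the user evaluation information at least according to the matching degree of the first branch and the second branch at each layer.
Optionally, calculating the matching degree of the merchant subscription information and the user evaluation information at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating a first matching degree at least according to the matching degree of the first branch and the second branch at each layer;
calculating the matching degree of the merchant subscription information and the user evaluation information at least according to the first matching degree.
Optionally, calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight value of each layer.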
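Optionally, the method further includes:
obtaining a trained statistical model;
calculating a sentiment index of the user evaluation information according to the statistical model;
calculating the degree of approximation between the sentiment index of the user evaluation information and a target sentiment index;
where calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation.
Optionally, the method further includes:
calculating a sentiment index of the merchant subscription information according to the statistical model, the sentiment index of the merchant subscription information being used as the target sentiment index.
Optionally, calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation includes:
if the degree of approximation is greater than or equal to a first threshold, calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer;
if the degree of approximation is less than the first threshold, the matching degree of the user evaluation information and the merchant subscription information is 0.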
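Optionally, obtaining the trained statistical model includes:
obtaining the category corresponding to the user evaluation information;
obtaining the trained statistical model corresponding to the category.
Optionally, obtaining the category corresponding to the user evaluation information includes:
obtaining a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
obtaining, from the scene category tree, a scene node that matches the user evaluation information, determining the one or more levels of parent scene nodes corresponding to the matched scene node, and using the one or more levels of parent scene nodes as the category corresponding to the user evaluation information.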
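Optionally, the method further includes:
obtaining a word vector of the user evaluation information and a word vector of the merchant subscription information;
calculating the matching degree of the word vector of the user evaluation information and the word vector of the merchant subscription information as a second matching degree;
where calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
Optionally, the method further includes:
obtaining matching degrees between a plurality of label nodes in the label category tree;
performing machine learning according to the matching degrees between the plurality of label nodes, and generating or correcting the label category tree according to the result of the machine learning.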
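The present application further provides an information matching method, including:
obtaining merchant subscription information and user evaluation information to be matched;
obtaining a trained statistical model;
calculating a sentiment index of the user evaluation information according to the statistical model;
calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation between the sentiment index of the user evaluation information and a target sentiment index.
Optionally, the method further includes:
obtaining an initial matching degree between the user evaluation information and the merchant evaluation information;
where calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation between the sentiment index of the user evaluation information and the target sentiment index includes:
calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation and the initial matching degree.
Optionally, calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation and the initial matching degree includes:
if the degree of approximation is greater than or equal to a first threshold, calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the initial matching degree;
if the degree of approximation is less than the first threshold, the matching degree of the user evaluation information and the merchant subscription information is 0.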
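Optionally, obtaining the trained statistical model includes:
obtaining the category corresponding to the user evaluation information;
obtaining the trained statistical model corresponding to the category.
Optionally, obtaining the category corresponding to the user evaluation information includes:
obtaining a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
obtaining, from the scene category tree, a scene node that matches the user evaluation information, determining the one or more levels of parent scene nodes corresponding to the matched scene node, and using the one or more levels of parent scene nodes as the category corresponding to the user evaluation information.
Optionally, the method further includes:
calculating a sentiment index of the merchant subscription information according to the statistical model, and using the sentiment index of the merchant subscription information as the target sentiment index.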
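The present application further provides an information input method, including:
a client obtaining user evaluation information or merchant subscription information entered by a user;
the client sending the user evaluation information or the merchant subscription information to a computing unit, where the computing unit is configured to calculate the matching degree of the user evaluation information and the merchant subscription information.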
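The present application further provides an information matching method, including:
obtaining first information and second information to be matched;
obtaining a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node;
obtaining a first branch and a second branch from the label category tree, where the lowest-layer label node of the first branch matches the content of the first information and the lowest-layer label node of the second branch matches the content of the second information;
calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer.
Optionally, calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating a first matching degree at least according to the matching degree of the first branch and the second branch at each layer;
calculating the matching degree of the first information and the second information at least according to the first matching degree.
Optionally, calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight value of each layer.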
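Optionally, the method further includes:
obtaining a trained statistical model;
calculating a sentiment index of the first information according to the statistical model;
calculating the degree of approximation between the sentiment index of the first information and a target sentiment index;
where calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation.
Optionally, the method further includes:
calculating a sentiment index of the second information according to the statistical model, the sentiment index of the second information being used as the target sentiment index.
Optionally, calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation includes:
if the degree of approximation is greater than or equal to a first threshold, calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer;
if the degree of approximation is less than the first threshold, the matching degree of the first information and the second information is 0.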
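Optionally, obtaining the trained statistical model includes:
obtaining the category corresponding to the first information;
obtaining the trained statistical model corresponding to the category.
Optionally, obtaining the category corresponding to the first information includes:
obtaining a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
obtaining, from the scene category tree, a scene node that matches the first information, determining the one or more levels of parent scene nodes corresponding to the matched scene node, and using the one or more levels of parent scene nodes as the category corresponding to the first information.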
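The category lookup described above can be sketched as a walk up a scene category tree followed by a model lookup. This is only an illustration: the tree contents, the `SCENE_PARENT` and `MODELS` dictionaries, and the function names are hypothetical, and the "models" are placeholder strings rather than trained statistical models.

```python
from typing import Dict, List, Optional

# Hypothetical scene category tree stored as child -> parent (None marks the root).
SCENE_PARENT: Dict[str, Optional[str]] = {
    "goods": None,
    "electronics": "goods",
    "clothing": "goods",
    "mobile phone": "electronics",
    "down jacket": "clothing",
}

# Hypothetical registry of per-category models (placeholder strings only).
MODELS: Dict[str, str] = {
    "electronics": "model trained on electronics reviews",
    "clothing": "model trained on clothing reviews",
}


def parent_categories(scene_node: str, levels: int = 1) -> List[str]:
    """Return up to `levels` parent scene nodes, nearest parent first."""
    result: List[str] = []
    node = SCENE_PARENT[scene_node]
    while node is not None and len(result) < levels:
        result.append(node)
        node = SCENE_PARENT[node]
    return result


def select_model(scene_node: str) -> Optional[str]:
    """Pick the model of the nearest parent category that has one."""
    for category in parent_categories(scene_node, levels=len(SCENE_PARENT)):
        if category in MODELS:
            return MODELS[category]
    return None
```

Falling back to the nearest ancestor that has a model is a design choice of this sketch; the text only requires that one or more levels of parent scene nodes serve as the category.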
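Optionally, the training features of the trained statistical model include word segmentation results of input information;
the method further includes: performing word segmentation on the first information to obtain a word segmentation result of the first information;
calculating the sentiment index of the first information according to the statistical model includes: inputting the word segmentation result of the first information into the statistical model to obtain the sentiment index of the first information.
Optionally, the word segmentation result of the input information is the result obtained by segmenting every two adjacent characters of the input information;
performing word segmentation on the first information includes: segmenting every two adjacent characters of the first information.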
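One plausible reading of "segmenting every two adjacent characters" is overlapping character bigrams. The sketch below assumes that reading; the function name `bigram_segments` and the decision to drop whitespace are illustrative choices, not taken from the source.

```python
from typing import List


def bigram_segments(text: str) -> List[str]:
    """Split text into overlapping pairs of adjacent characters."""
    # Whitespace is dropped first (an assumption of this sketch).
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]
```

Under this reading, a four-character input yields three segments, and an input shorter than two characters yields none.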
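Optionally, the training features of the trained statistical model further include sentiment features of the context;
the method further includes: extracting sentiment features of the context of the first information;
inputting the word segmentation result of the first information into the statistical model to obtain the sentiment index of the first information includes: inputting the word segmentation result of the first information and the sentiment features of the context of the first information into the statistical model to obtain the sentiment index of the first information.
Optionally, the sentiment features of the context include any one or more of the following:
the sentiment index of the previous sentence, the topic similarity between the previous sentence and the current sentence, the overall sentiment distribution of the preceding text, and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between the at least one related sentence and the current sentence is greater than a second threshold.
Optionally, the trained statistical model includes a trained first statistical model and a trained second statistical model; the training features of the first statistical model include word segmentation results of input information, and the training features of the second statistical model include sentiment features of the context.
Optionally, the trained statistical model is a trained maximum entropy model.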
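Optionally, the method further includes:
obtaining a word vector of the first information and a word vector of the second information;
calculating the matching degree of the word vector of the first information and the word vector of the second information as a second matching degree;
where calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer includes:
calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
Optionally, the method further includes:
obtaining matching degrees between a plurality of label nodes in the label category tree;
performing machine learning according to the matching degrees between the plurality of label nodes, and generating or correcting the label category tree according to the result of the machine learning.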
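The present application further provides an information matching apparatus, including:
an information obtaining unit, configured to obtain merchant subscription information and user evaluation information to be matched;
a category tree obtaining unit, configured to obtain a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node;
a branch obtaining unit, configured to obtain a first branch and a second branch from the label category tree, where the lowest-layer label node of the first branch matches the content of the user evaluation information and the lowest-layer label node of the second branch matches the content of the merchant subscription information;
a matching degree calculation unit, configured to calculate the matching degree of the merchant subscription information and the user evaluation information at least according to the matching degree of the first branch and the second branch at each layer.
Optionally, the matching degree calculation unit is specifically configured to calculate a first matching degree at least according to the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree of the merchant subscription information and the user evaluation information at least according to the first matching degree.
Optionally, when calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight value of each layer.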
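Optionally, the apparatus further includes:
a model obtaining unit, configured to obtain a trained statistical model;
a sentiment calculation unit, configured to calculate a sentiment index of the user evaluation information according to the statistical model;
an approximation calculation unit, configured to calculate the degree of approximation between the sentiment index of the user evaluation information and a target sentiment index;
where the matching degree calculation unit is specifically configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation.
Optionally, the sentiment calculation unit is further configured to calculate a sentiment index of the merchant subscription information according to the statistical model, the sentiment index of the merchant subscription information being used as the target sentiment index.
Optionally, when calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation, the matching degree calculation unit is specifically configured to:
if the degree of approximation is greater than or equal to a first threshold, calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer;
if the degree of approximation is less than the first threshold, the matching degree of the user evaluation information and the merchant subscription information is 0.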
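Optionally, the model obtaining unit is specifically configured to obtain the category corresponding to the user evaluation information and obtain the trained statistical model corresponding to the category.
Optionally, when obtaining the category corresponding to the user evaluation information, the model obtaining unit is specifically configured to:
obtain a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
obtain, from the scene category tree, a scene node that matches the user evaluation information, determine the one or more levels of parent scene nodes corresponding to the matched scene node, and use the one or more levels of parent scene nodes as the category corresponding to the user evaluation information.
Optionally, the apparatus further includes: a word vector obtaining unit, configured to obtain a word vector of the user evaluation information and a word vector of the merchant subscription information;
the matching degree calculation unit is further configured to calculate the matching degree of the word vector of the user evaluation information and the word vector of the merchant subscription information as a second matching degree;
when calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
Optionally, the apparatus further includes:
a correction unit, configured to obtain matching degrees between a plurality of label nodes in the label category tree, perform machine learning according to the matching degrees between the plurality of label nodes, and generate or correct the label category tree according to the result of the machine learning.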
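The present application further provides an information matching apparatus, including:
an information obtaining unit, configured to obtain merchant subscription information and user evaluation information to be matched;
a model obtaining unit, configured to obtain a trained statistical model;
a sentiment calculation unit, configured to calculate a sentiment index of the user evaluation information according to the statistical model;
a matching degree calculation unit, configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation between the sentiment index of the user evaluation information and a target sentiment index.
Optionally, the apparatus further includes:
a matching degree obtaining unit, configured to obtain an initial matching degree between the user evaluation information and the merchant evaluation information;
when calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation between the sentiment index of the user evaluation information and the target sentiment index, the matching degree calculation unit is specifically configured to calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation and the initial matching degree.
Optionally, when calculating the matching degree of the user evaluation information and the merchant subscription information at least according to the degree of approximation and the initial matching degree, the matching degree calculation unit is specifically configured to:
if the degree of approximation is greater than or equal to a first threshold, calculate the matching degree of the user evaluation information and the merchant subscription information at least according to the initial matching degree;
if the degree of approximation is less than the first threshold, the matching degree of the user evaluation information and the merchant subscription information is 0.
Optionally, the model obtaining unit is specifically configured to obtain the category corresponding to the user evaluation information and obtain the trained statistical model corresponding to the category.
Optionally, when obtaining the category corresponding to the user evaluation information, the model obtaining unit is specifically configured to:
obtain a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
obtain, from the scene category tree, a scene node that matches the user evaluation information, determine the one or more levels of parent scene nodes corresponding to the matched scene node, and use the one or more levels of parent scene nodes as the category corresponding to the user evaluation information.
Optionally, the sentiment calculation unit is further configured to calculate a sentiment index of the merchant subscription information according to the statistical model and use the sentiment index of the merchant subscription information as the target sentiment index.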
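The present application further provides a client, including:
an information obtaining unit, configured to obtain user evaluation information or merchant subscription information entered by a user;
a sending unit, configured to send the user evaluation information or the merchant subscription information to a computing unit, where the computing unit is configured to calculate the matching degree of the user evaluation information and the merchant subscription information.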
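The present application further provides an information matching apparatus, including:
an information obtaining unit, configured to obtain first information and second information to be matched;
a category tree obtaining unit, configured to obtain a label category tree, where the label category tree includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node;
a branch obtaining unit, configured to obtain a first branch and a second branch from the label category tree, where the lowest-layer label node of the first branch matches the content of the first information and the lowest-layer label node of the second branch matches the content of the second information;
a matching degree calculation unit, configured to calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer.
Optionally, the matching degree calculation unit is specifically configured to calculate a first matching degree at least according to the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree of the first information and the second information at least according to the first matching degree.
Optionally, when calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight value of each layer.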
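Optionally, the apparatus further includes:
a model obtaining unit, configured to obtain a trained statistical model;
a sentiment calculation unit, configured to calculate a sentiment index of the first information according to the statistical model;
an approximation calculation unit, configured to calculate the degree of approximation between the sentiment index of the first information and a target sentiment index;
when calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation.
Optionally, the sentiment calculation unit is further configured to calculate a sentiment index of the second information according to the statistical model, the sentiment index of the second information being used as the target sentiment index.
Optionally, when calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the degree of approximation, the matching degree calculation unit is specifically configured to:
if the degree of approximation is greater than or equal to a first threshold, calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer;
if the degree of approximation is less than the first threshold, the matching degree of the first information and the second information is 0.
Optionally, the model obtaining unit is specifically configured to obtain the category corresponding to the first information and obtain the trained statistical model corresponding to the category.
Optionally, when obtaining the category corresponding to the first information, the model obtaining unit is specifically configured to:
obtain a scene category tree, where the scene category tree includes at least two layers, each layer includes at least one scene node, and the parent scene node of each scene node is the parent category of that scene node;
obtain, from the scene category tree, a scene node that matches the first information, determine the one or more levels of parent scene nodes corresponding to the matched scene node, and use the one or more levels of parent scene nodes as the category corresponding to the first information.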
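Optionally, the training features of the trained statistical model include word segmentation results of input information;
the apparatus further includes: a word segmentation unit, configured to perform word segmentation on the first information to obtain a word segmentation result of the first information;
the sentiment calculation unit is specifically configured to input the word segmentation result of the first information into the statistical model to obtain the sentiment index of the first information.
Optionally, the word segmentation result of the input information is the result obtained by segmenting every two adjacent characters of the input information;
when performing word segmentation on the first information, the word segmentation unit is specifically configured to segment every two adjacent characters of the first information.
Optionally, the training features of the trained statistical model further include sentiment features of the context;
the apparatus further includes: a sentiment extraction unit, configured to extract sentiment features of the context of the first information;
when inputting the word segmentation result of the first information into the statistical model to obtain the sentiment index of the first information, the sentiment calculation unit is specifically configured to input the word segmentation result of the first information and the sentiment features of the context of the first information into the statistical model to obtain the sentiment index of the first information.
Optionally, the sentiment features of the context include any one or more of the following:
the sentiment index of the previous sentence, the topic similarity between the previous sentence and the current sentence, the overall sentiment distribution of the preceding text, and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between the at least one related sentence and the current sentence is greater than a second threshold.
Optionally, the trained statistical model includes a trained first statistical model and a trained second statistical model; the training features of the first statistical model include word segmentation results of input information, and the training features of the second statistical model include sentiment features of the context.
Optionally, the trained statistical model is a trained maximum entropy model.
Optionally, the apparatus further includes: a word vector obtaining unit, configured to obtain a word vector of the first information and a word vector of the second information;
the matching degree calculation unit is further configured to calculate the matching degree of the word vector of the first information and the word vector of the second information as a second matching degree;
when calculating the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
Optionally, the apparatus further includes: a correction unit, configured to obtain matching degrees between a plurality of label nodes in the label category tree, perform machine learning according to the matching degrees between the plurality of label nodes, and generate or correct the label category tree according to the result of the machine learning.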
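As the above technical solutions show, when matching the first information and the second information, the embodiments of the present application no longer segment the two and match the results directly. Instead, a first branch corresponding to the first information and a second branch corresponding to the second information are obtained from the label category tree. The lowest-layer label node of the first branch matches the content of the first information, and the parent label node of each label node in the label category tree is that node's parent category. The first branch therefore includes not only the label node that matches the content of the first information but also that node's parent categories layer by layer. Likewise, the second branch includes not only the label node that matches the content of the second information but also that node's parent categories layer by layer. The matching degree of the first information and the second information calculated from the matching degree of the first branch and the second branch at each layer therefore reflects not only how well the two pieces of information match but also how well their layer-by-layer parent categories match. In other words, it reflects the relevance between the layer-by-layer parent categories of the first information and the second information, which improves matching accuracy.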
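Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic flowchart of a method embodiment provided by the present application;
Fig. 2 is a schematic diagram of a label category tree provided by the present application;
Fig. 3 is a schematic flowchart of another method embodiment provided by the present application;
Fig. 4 is a schematic diagram of a scene category tree provided by the present application;
Fig. 5 is a schematic flowchart of another method embodiment provided by the present application;
Fig. 6 is a schematic structural diagram of an apparatus embodiment provided by the present application;
Fig. 7 is a schematic structural diagram of another apparatus embodiment provided by the present application;
Fig. 8 is a schematic structural diagram of another apparatus embodiment provided by the present application;
Fig. 9 is a schematic structural diagram of another apparatus embodiment provided by the present application;
Fig. 10 is a schematic structural diagram of another apparatus embodiment provided by the present application;
Fig. 11 is a schematic structural diagram of another apparatus embodiment provided by the present application.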
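Detailed Description
Evaluation information refers to feedback entered by users on network platforms such as websites and apps (application programs). For example, after purchasing a product on an e-commerce website, a buyer can evaluate the product and the service process provided by the merchant, such as logistics and service. By entering merchant subscription information, a merchant can have the evaluation information it is interested in extracted and pushed to it. The process is as follows: buyers enter multiple pieces of evaluation information, the merchant enters merchant subscription information, the merchant subscription information and the evaluation information are each segmented into words, it is determined whether the two share identical segmentation results, and the matching degree between the pieces of information is calculated from the identical segmentation results.
This approach can only determine whether the evaluation information and the merchant subscription information share identical segmentation results; it cannot reflect whether the two are related, for example whether their parent categories are related. For example, a buyer enters the evaluation “服务不好” (the service is bad) while the merchant subscription information is “客服态度” (customer service attitude). The parent category of both is service, so they are related to some extent, yet the matching degree calculated by the above approach is 0. Matching accuracy is therefore low, and the merchant has to use an additional algorithm to obtain related evaluation information, which wastes system resources.
The embodiments of the present application provide an information matching method and related apparatus such that the calculated matching degree reflects the relevance between pieces of information, specifically the relevance between their layer-by-layer parent categories, thereby improving matching accuracy.
To help a person skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.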
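Referring to Fig. 1, an embodiment of the present application provides a method embodiment of the information matching method. The method of this embodiment includes:
S101: Obtain first information and second information to be matched.
The first information and/or the second information may be words, short sentences, or other information entered by a user. For example, the first information may be user evaluation information entered by a buyer, and the second information may be merchant subscription information entered by a merchant.
S102: Obtain a label category tree.
The label category tree in the embodiments of the present application includes at least two layers, each layer includes at least one label node, and the parent label node of each label node is the parent category of that label node.
For example, the label category tree shown in Fig. 2 includes three layers. The first layer includes one label node, “服务” (service), which is the root node of the tree; the second layer includes two label nodes, “售前” (pre-sales) and “售后” (after-sales); the third layer includes four label nodes, “客服态度” (customer service attitude), “响应速度” (response speed), “返现” (cashback), and “保修” (warranty). Categories become more specific as the layer number increases; that is, the parent label node of each label node is that node's parent category. For example, “售前” is the parent category of “客服态度”, and “服务” is the parent category of “售前”.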
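S103: Obtain a first branch and a second branch from the label category tree. The first branch and/or the second branch includes at least one label node.
The lowest-layer label node of the first branch matches the content of the first information. Because the parent label node of each label node in the label category tree is that node's parent category, if the first information does not match the root node, the first branch includes not only the label node that matches the content of the first information but also that node's parent categories layer by layer.
The first branch may be obtained as follows: match the first information against the nodes of the label category tree to obtain the matching label node, and take that node together with its parent nodes, layer by layer, as the first branch. Before matching against the label category tree, the first information may be segmented into words and the segmentation results matched against the tree.
For example, the first information is “服务不好” (the service is bad). Segmenting it yields “服务” and “不好”, which are matched against the nodes of the label category tree to obtain the matching label node “服务”. Because “服务” is the root node and has no parent node, “服务” alone is the first branch. As another example, the first information is “客服态度不好” (the customer service attitude is bad). The matching label node “客服态度” is obtained in a similar way, and “客服态度” together with its layer-by-layer parent nodes “售前” and “服务” forms the first branch.
Likewise, the lowest-layer label node of the second branch matches the content of the second information. If the second information does not match the root node, the second branch includes not only the label node that matches the content of the second information but also that node's parent categories layer by layer. The second branch is obtained in a similar way to the first branch: match the second information against the nodes of the label category tree to obtain the matching node, and take that node together with its parent nodes, layer by layer, as the second branch. Before matching against the label category tree, the second information may be segmented into words and the segmentation results matched against the tree.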
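The branch lookup in S103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tree of Fig. 2 is stored as a hypothetical child-to-parent dictionary, and plain substring matching stands in for word segmentation followed by node matching.

```python
from typing import Dict, List, Optional

# Label category tree of Fig. 2 as child -> parent (None marks the root).
# The dictionary layout is an assumption made for this sketch.
PARENT: Dict[str, Optional[str]] = {
    "服务": None,          # service (root)
    "售前": "服务",        # pre-sales
    "售后": "服务",        # after-sales
    "客服态度": "售前",    # customer service attitude
    "响应速度": "售前",    # response speed
    "返现": "售后",        # cashback
    "保修": "售后",        # warranty
}


def path_from_root(label: str) -> List[str]:
    """Return the label and all its parent categories, root first."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def get_branch(text: str) -> List[str]:
    """Return the branch whose lowest-layer node matches the text.

    Substring matching is a stand-in for segmentation plus matching;
    when several labels match, the deepest one is kept.
    """
    matches = [label for label in PARENT if label in text]
    if not matches:
        return []
    deepest = max(matches, key=lambda label: len(path_from_root(label)))
    return path_from_root(deepest)
```

With this sketch, `get_branch("服务不好")` gives the single-node branch `["服务"]`, and `get_branch("客服态度不好")` gives `["服务", "售前", "客服态度"]`, matching the two examples in the text.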
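S104: Calculate the matching degree of the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer.
Specifically, this step may include: calculating a first matching degree according to the matching degree of the first branch and the second branch at each layer; and calculating the matching degree of the first information and the second information at least according to the first matching degree. In the embodiments of the present application, the first matching degree may be used directly as the matching degree of the first information and the second information, or the matching degree may be calculated from the first matching degree combined with other parameters.
Because the first branch includes at least one layer of label nodes and the second branch includes at least one layer of label nodes, the label nodes of the two branches at each layer are matched against each other to obtain the matching degree at each layer, and the matching degree of the first information and the second information is calculated from these per-layer matching degrees.
For example, the first branch consists of “服务”, and the second branch consists of “服务” and “售前” in order. The matching degree at the first layer is 100% and at the second layer is 0, and the first matching degree is calculated from these two values. For example, taking 1/2 of the sum of the two layers' matching degrees as the matching degree of the first information and the second information gives 50%. As another example, the first branch consists of “服务”, “售前”, “客服态度” in order and the second branch consists of “服务”, “售前”, “响应速度” in order; taking 1/3 of the sum of the three layers' matching degrees as the matching degree of the first information and the second information gives 67%.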
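When calculating the first matching degree from the per-layer matching degrees, the weight value of each layer may also be taken into account. For example, the first matching degree Tagsim is: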
Figure PCTCN2017103858-appb-000001
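Here w_i is the weight value of layer i, and P_i is the matching degree of the first branch and the second branch at layer i; the function I equals 1 when P_i = 100% and 0 when P_i ≠ 100%. The weight values of the layers may all equal 1 or may increase layer by layer, and they may be set and/or adjusted by machine learning. Note that the above formula is only one optional way of calculating the first matching degree. A person skilled in the art may extend or modify it; for example, the function I may equal another value when P_i = 100%, or may equal 1 when some other condition is met, such as P_i exceeding a certain value. The embodiments of the present application do not limit this.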
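A layer-weighted first matching degree can be sketched as below. The formula itself appears only as a figure in the source, so the normalization used here (dividing the weighted sum of the indicator I by the sum of the weights) is an assumption, chosen because it reproduces the 1/2 and 1/3 examples above when all weights equal 1. The function name and signature are illustrative.

```python
from typing import List, Optional, Sequence


def first_matching_degree(
    first_branch: Sequence[str],
    second_branch: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted share of layers on which the two branches carry the same label."""
    n_layers = max(len(first_branch), len(second_branch))
    if n_layers == 0:
        return 0.0
    if weights is None:
        weights = [1.0] * n_layers  # all layers weighted equally
    if len(weights) < n_layers:
        raise ValueError("one weight per layer is required")

    matched = 0.0
    for i in range(n_layers):
        a = first_branch[i] if i < len(first_branch) else None
        b = second_branch[i] if i < len(second_branch) else None
        # Indicator I: 1 when the layer matches fully, else 0.
        # A layer missing from either branch counts as a mismatch.
        indicator = 1.0 if a is not None and a == b else 0.0
        matched += weights[i] * indicator
    return matched / sum(weights[:n_layers])
```

With equal weights, a one-node branch against a two-node branch that share the root yields 0.5, and two three-node branches that differ only at the lowest layer yield 2/3.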
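As the above technical solutions show, when matching the first information and the second information, the embodiments of the present application no longer segment the two and match the results directly. Instead, a first branch corresponding to the first information and a second branch corresponding to the second information are obtained from the label category tree. The first branch includes not only the label node that matches the content of the first information but also that node's parent categories layer by layer, and likewise for the second branch and the second information. The matching degree calculated from the matching degree of the first branch and the second branch at each layer therefore reflects not only how well the two pieces of information match but also how well their layer-by-layer parent categories match. In other words, it reflects the relevance between the layer-by-layer parent categories of the first information and the second information, which improves matching accuracy.
In effect, the embodiments of the present application attach at least one layer of category labels to the first information and the second information and calculate their matching degree from the matching degrees of the category labels at corresponding layers. The embodiments can therefore calculate the matching degree between pieces of information whose categories are related, for example between synonyms or between multiple pieces of information belonging to the same category.
For example, a buyer enters the evaluation “服务不好” while the merchant subscription information is “客服态度”. Both describe service and are related to some extent, yet matching the two directly gives a matching degree of 0, so matching accuracy is low. When the matching degree is calculated by the embodiments of the present application, the first branch consists of “服务” and the second branch consists of “服务” and “售前” in order; the matching degree at the first layer is 100% and at the second layer is 0, and the final matching degree may be 50%. The matching degree calculated in the embodiments thus reflects the relevance between the two, which improves matching accuracy.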
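Note that, besides user evaluation information and merchant subscription information, the first information and the second information may also be information in other application scenarios. For example, the first information is chat information entered by users in a WeChat group or DingTalk group, and the second information is specific subscription information, such as a subscription word or phrase entered by the group administrator. The embodiments of the present application do not limit this. A specific example follows.
For the WeChat group of a film interest group, the label category tree includes two layers. The first layer includes one label node, “电影” (film), and the second layer includes two label nodes, “喜剧” (comedy) and “动作剧” (action). Categories become more specific as the layer number increases; that is, the parent label node of each label node is that node's parent category. For example, “电影” is the parent category of “喜剧” and “动作剧”. If the subscription word entered by the group administrator is “电影” and the chat information entered by a user is “我喜欢看喜剧” (I like watching comedies), matching the two directly gives a matching degree of 0, so matching accuracy is low. When the matching degree is calculated by the embodiments of the present application, the first branch consists of “电影” and “喜剧” in order and the second branch consists of “电影”, and the final matching degree may be 50%, which improves matching accuracy.
Note that if the first information and/or the second information matches multiple branches in the label category tree, one branch may be selected from those matched by the first information and one from those matched by the second information, the matching degree calculated for each such pair of branches, and the highest calculated matching degree used as the matching degree of the first information and the second information.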
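The multiple-branch case can be sketched as a maximum over all pairs of candidate branches. The per-pair score below is the unweighted share of matching layers, used here only as a simple stand-in for the first matching degree; the function names are illustrative.

```python
from itertools import product
from typing import Iterable, Sequence


def branch_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of layers (out of the deeper branch) on which both labels agree."""
    n_layers = max(len(a), len(b))
    if n_layers == 0:
        return 0.0
    # zip stops at the shorter branch, so missing layers count as mismatches.
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / n_layers


def best_matching_degree(
    first_branches: Iterable[Sequence[str]],
    second_branches: Iterable[Sequence[str]],
) -> float:
    """Score every (first, second) branch pair and keep the highest score."""
    return max(
        (branch_similarity(a, b) for a, b in product(first_branches, second_branches)),
        default=0.0,
    )
```

Taking the maximum means one strongly related pair of branches is enough for a high matching degree, even if the other candidate branches are unrelated.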
背景技术中描述的信息匹配方式,由于只判断是否存在相同的分词结果,因此无法计算出同义词之间的匹配度,进一步导致匹配准确率较低。为了解决这一问题,还提出了一种基于word embedding(中文:词向量)技术的信息匹配方式,通过word2vec(一种处理文本的双层神经网络)等方法计算出信息的词向量,根据词向量之间的相似性计算匹配度。因此本申请实施例在计算第一信息和第二信息的匹配度时,还可以结合第一信息和第二信息的词向量之间的相似性。下面具体说明。
所述方法还可以包括:获取所述第一信息的词向量和所述第二信息的词向量;计算所述第一信息的词向量与所述第二信息的词向量的匹配度,作为第二匹配度;S104中至少根据所述第一匹配度,即所述第一树枝与所述第二树枝在每层分别对应的匹配度,和所述第二匹配度,计算所述第一信息和所述第二信息的匹配度。
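In a specific implementation, the first information may be segmented into words, the word vector of each word extracted, and the word vectors summed to obtain the word vector of the first information. The word vector of the second information may be obtained in a similar way. The matching degree between the two word vectors is then calculated, for example as their cosine similarity. The word vectors may be extracted with techniques such as word2vec.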
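The summing and cosine steps can be sketched in plain Python. The three-dimensional vectors in `TOY_VECTORS` are made-up values standing in for vectors that a trained model such as word2vec would supply, and the English tokens are illustrative only.

```python
import math

# Made-up 3-dimensional vectors; a real system would look these up in a
# trained embedding model.
TOY_VECTORS = {
    "service": [0.9, 0.1, 0.0],
    "bad": [0.1, 0.8, 0.2],
    "customer": [0.8, 0.2, 0.1],
    "attitude": [0.6, 0.3, 0.2],
}


def information_vector(words, vectors=TOY_VECTORS, dim=3):
    """Sum the word vectors of all words; unknown words contribute zeros."""
    total = [0.0] * dim
    for word in words:
        for k, value in enumerate(vectors.get(word, [0.0] * dim)):
            total[k] += value
    return total


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


first_vec = information_vector(["service", "bad"])
second_vec = information_vector(["customer", "service", "attitude"])
vec_sim = cosine_similarity(first_vec, second_vec)   # the second matching degree
print(vec_sim)
```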
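When the matching degree between the first information and the second information is calculated from the first matching degree and the second matching degree, the sum of the two may be taken as the final matching degree, and corresponding weights may also be set. For example, the matching degree sim between the first information and the second information may be: sim=λ1Vecsim+λ2Tagsim, where Tagsim is the first matching degree, Vecsim is the second matching degree, and λ1 and λ2 are the corresponding weights, which may be set and/or adjusted by means of machine learning.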
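The principle of word embedding is to learn from a large amount of information using machine learning so that each word is represented by a corresponding word vector. A word vector, however, actually represents the context in which a word occurs, and in some cases the matching degree calculated from word vectors is of low accuracy. In one case, some words occur in the same contexts but differ greatly in meaning, so in many cases the word vector does not accurately represent the word's semantics. For example, “好” (“good”) and “坏” (“bad”) have opposite meanings, yet the cosine similarity between their word vectors is high. In another case, the same word expresses different meanings in different settings. For example, “很薄” (“very thin”) is positive when describing a mobile phone but negative when describing a down jacket, yet the matching degree calculated from word vectors is the same in both cases. In addition, because it is difficult to establish what the individual values in a word vector mean, the word vector itself cannot be adjusted to solve these problems.
To solve the above problems, the embodiments of this application may further calculate a sentiment index of the information with a statistical model. The sentiment index indicates whether the information is positive, negative or neutral, and it is taken into account when the final matching degree is calculated.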
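Specifically, as shown in FIG. 3, the method of the embodiments of this application may further include:
S301: Obtain a trained statistical model.
The statistical model may be obtained by training on a large amount of training data, each item of which is labeled with a corresponding sentiment index. For example, the training data may be 200,000 sentences, each labeled with its sentiment index.
Optionally, the statistical model may be any mathematical model, such as a maximum entropy model. The inventors found through extensive experiments that with a maximum entropy model the calculated sentiment index fits the semantics more closely, which improves the accuracy of information matching.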
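S302: Calculate the sentiment index of the first information according to the statistical model.
Inputting the first information into the trained statistical model yields the sentiment index of the first information. The interval in which the sentiment index falls indicates whether the sentiment of the first information is positive, negative or neutral.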
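As a rough illustration of S302, the sketch below evaluates a maximum entropy (multinomial logistic) model whose weights are already given. The weight values, the feature strings, the definition of the index as P(positive) − P(negative) on a −1 to 1 scale, and the width of the neutral band are all assumptions made for this sketch; the text does not fix any of them.

```python
import math

LABELS = ("positive", "neutral", "negative")


def maxent_probabilities(features, weights):
    """P(label | features) proportional to exp(sum of weights[(label, feature)])."""
    scores = {y: sum(weights.get((y, f), 0.0) for f in features) for y in LABELS}
    top = max(scores.values())                      # subtract max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def sentiment_index(features, weights):
    """Assumed index in [-1, 1]: P(positive) - P(negative)."""
    p = maxent_probabilities(features, weights)
    return p["positive"] - p["negative"]


def polarity(index, neutral_band=0.2):
    """Map the interval the index falls in to a sentiment (band width is an assumption)."""
    if index > neutral_band:
        return "positive"
    if index < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical trained weights; a real model would learn them from labeled sentences.
toy_weights = {("negative", "不好"): 2.0, ("positive", "很好"): 2.0}
index = sentiment_index(["服务", "务不", "不好"], toy_weights)
print(index, polarity(index))
```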
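S303: Calculate the closeness between the sentiment index of the first information and a target sentiment index.
In the embodiments of this application, the target sentiment index may be a preset sentiment index or may be calculated from the second information. For example, the sentiment index of the second information is calculated according to the statistical model and used as the target sentiment index. The target sentiment index indicates whether the target sentiment is positive, negative or neutral.
The closeness may take any form, such as a difference or a ratio, or may be calculated from whether the sentiment indicated by the sentiment index of the first information and that indicated by the target sentiment index are the same. For example, if both indicate a negative sentiment, the closeness between the two is high.
In S104, the matching degree between the first information and the second information is calculated at least from the matching degree of the first branch and the second branch at each layer and from the closeness.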
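In this embodiment, the closeness between the sentiment index of the first information and the target sentiment index is also considered when the matching degree between the first information and the second information is calculated. The greater the closeness, that is, the closer the sentiment of the first information is to the target sentiment, the higher the calculated matching degree, and vice versa. This addresses the low matching accuracy that arises when contexts are the same but meanings differ greatly. For example, for “大” (“big”) and “小” (“small”), the sentiments differ greatly, so the calculated matching degree is low, which agrees with the semantics and improves matching accuracy.
In this embodiment, suppose the merchant is interested in the negative evaluations among the user evaluation information. The target sentiment index can then be preset to the sentiment index corresponding to negative sentiment. If the sentiment index of the user evaluation information is close to the target sentiment index, the final matching degree is high, and the negative evaluations the merchant cares about can be extracted in this way.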
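The matching degree may specifically be calculated as follows:
If the closeness is greater than or equal to a first threshold, the matching degree between the first information and the second information is calculated at least from the matching degree of the first branch and the second branch at each layer. For example, if the sentiment index of the first information and the target sentiment index both indicate a negative sentiment, then sim=Tagsim, where sim is the matching degree between the first information and the second information and Tagsim is the first matching degree.
If the closeness is less than the first threshold, the matching degree between the first information and the second information is 0. For example, if the sentiments indicated by the sentiment index of the first information and by the target sentiment index differ, then sim=0. In this case the matching degree may also be some other low value; the embodiments of this application place no limitation on this.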
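The threshold rule can be sketched as below. The text leaves the form of the closeness open, so this sketch assumes sentiment indexes on a −1 to 1 scale and defines closeness as 1 − |difference| / 2; the default threshold of 0.5 is likewise an arbitrary illustrative value.

```python
def closeness(sentiment_index, target_index):
    """Assumed closeness in [0, 1] for indexes that lie in [-1, 1]."""
    return 1.0 - abs(sentiment_index - target_index) / 2.0


def gated_matching_degree(tag_sim, sentiment_index, target_index, first_threshold=0.5):
    """Return Tagsim when the sentiment is close enough to the target, else 0."""
    if closeness(sentiment_index, target_index) >= first_threshold:
        return tag_sim
    return 0.0


# A merchant interested in negative reviews presets the target index to -1.
print(gated_matching_degree(0.5, sentiment_index=-0.8, target_index=-1.0))  # 0.5
print(gated_matching_degree(0.5, sentiment_index=0.9, target_index=-1.0))   # 0.0
```

Returning a small non-zero value instead of 0 in the second branch would implement the variant that the text also permits.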
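To handle the same word expressing different meanings in different settings, the embodiments of this application may also provide a separate statistical model for each of multiple categories, each model calculating the sentiment index of the first information under its category. The different statistical models are trained on training data corresponding to different scenario categories. For example, the same sentence may be labeled with different sentiment indexes under different scenario categories, so that the sentiment index calculated by each statistical model corresponds to its scenario category.
Specifically, obtaining the trained statistical model may include: obtaining the category corresponding to the first information, and obtaining the trained statistical model corresponding to that category. The category corresponding to the first information may be the category to which the object evaluated by the first information belongs. For example, if a buyer purchases a clothing item on an e-commerce website and enters user evaluation information to evaluate that item, the category corresponding to the user evaluation information is clothing.
The category corresponding to the first information may be obtained by means of a scenario category tree. Specifically, obtaining the category corresponding to the first information includes: obtaining a scenario category tree that includes at least two layers, each layer including at least one scenario node, the parent scenario node of each scenario node being that node's parent category; and obtaining from the scenario category tree the scenario node that matches the first information, determining the parent scenario node one or more levels above the matched scenario node, and taking that parent scenario node as the category corresponding to the first information. The parent scenario node one or more levels above may be the root scenario node, that is, the root scenario node is taken directly as the corresponding category.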
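For example, a buyer purchases a skirt on an e-commerce website and enters user evaluation information to evaluate it. The matching scenario node “skirt” is obtained from the scenario category tree, its root scenario node “clothing” is determined, the trained statistical model corresponding to clothing is obtained, and that model is used to calculate the sentiment index of the first information. Thus, when calculating the sentiment index of “很薄” (“very thin”), this embodiment selects the statistical model according to whether the scenario category of “很薄” is mobile phones or clothing, so that the sentiment index is calculated per scenario category, which improves the accuracy of information matching.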
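The category lookup and per-category model selection might look like the sketch below. The scenario tree contents, the node named “smartphone”, and the lookup tables standing in for trained per-category models are hypothetical; FIG. 4 is not reproduced in this text.

```python
# Hypothetical scenario category tree, stored as child -> parent.
SCENARIO_PARENT = {
    "clothing": None,
    "skirt": "clothing",
    "down jacket": "clothing",
    "mobile phones": None,
    "smartphone": "mobile phones",
}

# Toy stand-ins for trained per-category models: phrase -> sentiment index.
CATEGORY_MODELS = {
    "clothing": {"很薄": -0.6},
    "mobile phones": {"很薄": 0.7},
}


def root_category(node, parent=SCENARIO_PARENT):
    """Walk up the scenario tree until the root scenario node is reached."""
    while parent[node] is not None:
        node = parent[node]
    return node


def sentiment_for(text, evaluated_object):
    """Pick the model of the object's root category and score the text with it."""
    model = CATEGORY_MODELS[root_category(evaluated_object)]
    return model.get(text, 0.0)          # unknown text is treated as neutral here


print(root_category("skirt"))                 # clothing
print(sentiment_for("很薄", "down jacket"))   # negative for clothing
print(sentiment_for("很薄", "smartphone"))    # positive for mobile phones
```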
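Optionally, the training features of the statistical model in this embodiment include the word segmentation result of the input information.
The method further includes: segmenting the first information to obtain its segmentation result. Calculating the sentiment index of the first information according to the statistical model then includes: inputting the segmentation result of the first information into the statistical model to obtain the sentiment index of the first information.
The inventors' extensive experiments show that segmentation may be performed in bigram mode, that is, every two adjacent characters of the first information form one segment. For example, the segmentation result of “服务不好” (“poor service”) is “服务”, “务不” and “不好”. Segmenting in this way yields high information matching accuracy.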
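Bigram segmentation is short enough to show directly. The function name and the handling of inputs shorter than two characters are assumptions made for this sketch.

```python
def bigram_segments(text):
    """Split text into overlapping segments of two adjacent characters."""
    if len(text) < 2:
        # Assumption: a single character is kept as its own segment.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(bigram_segments("服务不好"))   # ['服务', '务不', '不好']
```

A string of n characters yields n − 1 segments, so no dictionary or trained segmenter is needed.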
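Besides the segmentation result, the training features of the statistical model may also include contextual sentiment features, so that the sentiment index is calculated from both the words themselves and the context. Specifically, the method further includes: extracting the contextual sentiment features of the first information. Inputting the segmentation result of the first information into the statistical model to obtain the sentiment index of the first information then includes: inputting the segmentation result of the first information together with the contextual sentiment features of the first information into the statistical model to obtain the sentiment index of the first information.
The contextual sentiment features include any one or more of the following:
the sentiment index of the previous sentence; the topic similarity between the previous sentence and the current sentence; the overall sentiment distribution of the preceding text; and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between each related sentence and the current sentence is greater than a second threshold. These are explained in turn. The sentiment index of the previous sentence indicates whether the sentiment of the previous sentence is positive, negative or neutral. The topic similarity between the previous sentence and the current sentence indicates whether the two describe the same or a similar topic. The overall sentiment distribution of the preceding text may be the numbers of sentences in the preceding text whose sentiment is positive, negative and neutral respectively. A related sentence is a sentence that describes the same or a similar topic as the current sentence, and the sentiment distribution of the at least one related sentence may be the numbers of positive, negative and neutral sentences among the sentences in the preceding text that describe the same or a similar topic.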
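The four contextual features listed above can be gathered as in this sketch. It assumes each preceding sentence has already been scored, and the record fields `index`, `polarity` and `topic_similarity`, the returned dictionary layout, and the default second threshold are hypothetical choices for illustration.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def context_features(preceding, second_threshold=0.5):
    """Build contextual sentiment features for the current sentence.

    `preceding` lists the earlier sentences, oldest first. Each entry is a dict
    with 'index' (sentiment index), 'polarity' (one of LABELS) and
    'topic_similarity' (similarity to the current sentence, 0..1).
    """
    overall = Counter(s["polarity"] for s in preceding)
    related = Counter(
        s["polarity"] for s in preceding if s["topic_similarity"] > second_threshold
    )
    previous = preceding[-1] if preceding else None
    return {
        "previous_index": previous["index"] if previous else 0.0,
        "previous_topic_similarity": previous["topic_similarity"] if previous else 0.0,
        "overall_distribution": {label: overall[label] for label in LABELS},
        "related_distribution": {label: related[label] for label in LABELS},
    }


history = [
    {"index": 0.6, "polarity": "positive", "topic_similarity": 0.1},
    {"index": -0.7, "polarity": "negative", "topic_similarity": 0.8},
    {"index": -0.4, "polarity": "negative", "topic_similarity": 0.9},
]
print(context_features(history))
```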
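The embodiments of this application may specifically use two statistical models to calculate the sentiment index of the first information. That is, the trained statistical model includes a trained first statistical model and a trained second statistical model; the training features of the first statistical model include the segmentation result of the input information, and the training features of the second statistical model include contextual sentiment features.
A specific embodiment of this application is described below, taking an e-commerce website as the example scenario.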
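Referring to FIG. 5, an embodiment of this application provides another method embodiment of the information matching method. The method of this embodiment includes:
S501: Obtain the user evaluation information entered by a buyer and the merchant subscription information entered by a merchant. The user evaluation information evaluates a skirt purchased by the buyer, that is, the evaluated object is a skirt.
For example, the user evaluation is “响应速度慢” (“slow response”) and the merchant subscription information is “客服态度” (“customer service attitude”).
S502: Obtain the tag category tree shown in FIG. 2. The tag category tree in the embodiments of this application may be modified, for example by adding nodes manually.
S503: Obtain the first branch and the second branch from the tag category tree. The lowest-layer tag node of the first branch matches the user evaluation information, and the branch consists of: service, pre-sale, response speed. The lowest-layer tag node of the second branch matches the merchant subscription information, and the branch consists of: service, pre-sale, customer service attitude.
S503: Calculate the first matching degree at least from the matching degree of the first branch and the second branch at each layer.
For example, the first matching degree is calculated as: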
Figure PCTCN2017103858-appb-000002
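Here, w_i is the weight of the i-th layer and P_i is the matching degree between the first branch and the second branch at the i-th layer; the function I equals 1 when P_i = 100% and equals 0 when P_i ≠ 100%.
S504: Obtain the word vector of the user evaluation information and the word vector of the merchant subscription information, and calculate the matching degree between the word vectors as the second matching degree.
S505: Obtain the scenario category tree shown in FIG. 4. The scenario category tree in the embodiments of this application may be modified, for example by adding nodes manually.
S506: Obtain from the scenario category tree the scenario node that matches the evaluated object, “skirt”, and determine its root scenario node, “clothing”.
S507: Obtain the trained maximum entropy model A and maximum entropy model B corresponding to clothing. The training features of maximum entropy model A include bigram segmentation results, and the training features of maximum entropy model B include contextual sentiment features.
S508: Segment the user evaluation information in bigram mode and input the segmentation result into maximum entropy model A to obtain the sentiment index of the user evaluation information.
S509: Extract the contextual sentiment features of the user evaluation information, and input these features together with the sentiment index obtained in S508 into maximum entropy model B to obtain a corrected sentiment index.
As shown in Table 1, the contextual sentiment features include the following:
the sentiment index of the previous sentence (whether it is positive, negative or neutral, and its strength); whether the previous sentence and the current sentence describe the same topic; the numbers of sentences in the preceding text whose sentiment is positive, negative and neutral respectively; and the numbers of positive, negative and neutral sentences among the sentences in the preceding text that describe the same topic.
Table 1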
Figure PCTCN2017103858-appb-000003
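S510: Calculate the matching degree between the user evaluation information and the merchant subscription information from the corrected sentiment index, the first matching degree and the second matching degree.
The target sentiment is negative. If the sentiment indicated by the corrected sentiment index obtained in S509 is not negative, the matching degree is 0.
If the sentiment indicated by the corrected sentiment index obtained in S509 is negative, the matching degree is: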
sim=λ1Vecsim+λ2Tagsim
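Tagsim is the first matching degree calculated in S503, Vecsim is the second matching degree calculated in S504, and λ1 and λ2 are the corresponding weights.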
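Step S510 can be condensed into a few lines. This sketch assumes a sentiment index on a −1 to 1 scale where values below −0.2 count as negative, and default weights of 0.5 each; both are illustrative choices rather than values given in the text.

```python
def final_matching_degree(corrected_index, tag_sim, vec_sim,
                          lambda1=0.5, lambda2=0.5, negative_below=-0.2):
    """Combine the S503 and S504 scores only when the review is negative.

    corrected_index: corrected sentiment index from S509 (assumed in [-1, 1]).
    tag_sim: first matching degree (Tagsim); vec_sim: second matching degree (Vecsim).
    """
    if corrected_index >= negative_below:      # target sentiment is negative
        return 0.0
    return lambda1 * vec_sim + lambda2 * tag_sim


# "slow response" versus "customer service attitude": two of three layers match.
print(final_matching_degree(-0.7, tag_sim=2 / 3, vec_sim=0.8))
print(final_matching_degree(0.4, tag_sim=2 / 3, vec_sim=0.8))   # 0.0
```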
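Referring to FIG. 6, an embodiment of this application provides a further embodiment of the information matching method. The method of this embodiment includes:
S601: Obtain the first information and the second information to be matched.
The first information and/or the second information may be words, short sentences or other information entered by a user. For example, the first information may be user evaluation information entered by a buyer, and the second information may be merchant subscription information entered by a merchant.
S602: Obtain a trained statistical model.
S603: Calculate the sentiment index of the first information according to the statistical model.
S604: Calculate the matching degree between the first information and the second information at least from the closeness between the sentiment index of the first information and a target sentiment index.
Optionally, the method further includes: obtaining an initial matching degree between the first information and the second information. Step S604 then includes: calculating the matching degree between the first information and the second information at least from the closeness and the initial matching degree.
The initial matching degree may be the first matching degree of the foregoing embodiments, that is, the matching degree of the first branch and the second branch at each layer.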
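Optionally, calculating the matching degree between the first information and the second information at least from the closeness and the initial matching degree includes:
if the closeness is greater than or equal to a first threshold, calculating the matching degree between the first information and the second information at least from the initial matching degree;
if the closeness is less than the first threshold, the matching degree between the first information and the second information is 0.
Optionally, obtaining the trained statistical model includes:
obtaining the category corresponding to the first information; and obtaining the trained statistical model corresponding to that category.
Optionally, obtaining the category corresponding to the first information includes:
obtaining a scenario category tree that includes at least two layers, each layer including at least one scenario node, the parent scenario node of each scenario node being that node's parent category;
obtaining from the scenario category tree the scenario node that matches the first information, determining the parent scenario node one or more levels above the matched scenario node, and taking that parent scenario node as the category corresponding to the first information.
Optionally, the method further includes:
calculating the sentiment index of the second information according to the statistical model, and using the sentiment index of the second information as the target sentiment index.
For further details of this embodiment, refer to the descriptions of the embodiments shown in FIGS. 1, 3 and 5, which are not repeated here.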
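Referring to FIG. 7, this application further provides an embodiment of an information input method. The method of this embodiment includes:
S701: A client obtains first information or second information.
S702: The client sends the first information or the second information to a computing unit, the computing unit being configured to calculate the matching degree between the first information and the second information.
The computing unit may use any embodiment of the above information matching method to calculate the matching degree between the first information and the second information. For further details of this embodiment, refer to the descriptions of the embodiments shown in FIGS. 1, 3 and 5, which are not repeated here.
Corresponding to the above method embodiments, this application further provides apparatus embodiments, which are described below.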
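Referring to FIG. 8, an embodiment of this application provides an apparatus embodiment of an information matching apparatus. The apparatus of this embodiment includes:
an information obtaining unit 801, configured to obtain merchant subscription information and user evaluation information to be matched;
a category tree obtaining unit 802, configured to obtain a tag category tree, the tag category tree including at least two layers, each layer including at least one tag node, the parent tag node of each tag node being that node's parent category;
a branch obtaining unit 803, configured to obtain a first branch and a second branch from the tag category tree, the lowest-layer tag node of the first branch matching the content of the user evaluation information and the lowest-layer tag node of the second branch matching the content of the merchant subscription information;
a matching degree calculation unit 804, configured to calculate the matching degree between the merchant subscription information and the user evaluation information at least from the matching degree of the first branch and the second branch at each layer.
Optionally, the matching degree calculation unit is specifically configured to calculate a first matching degree at least from the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree between the merchant subscription information and the user evaluation information at least from the first matching degree.
Optionally, when calculating the first matching degree at least from the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the first matching degree at least from the matching degree of the first branch and the second branch at each layer and the weight of each layer.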
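Optionally, the apparatus further includes:
a model obtaining unit, configured to obtain a trained statistical model;
a sentiment calculation unit, configured to calculate the sentiment index of the user evaluation information according to the statistical model;
a closeness calculation unit, configured to calculate the closeness between the sentiment index of the user evaluation information and a target sentiment index;
the matching degree calculation unit being specifically configured to calculate the matching degree between the user evaluation information and the merchant subscription information at least from the matching degree of the first branch and the second branch at each layer and from the closeness.
Optionally, the sentiment calculation unit is further configured to calculate the sentiment index of the merchant subscription information according to the statistical model, the sentiment index of the merchant subscription information serving as the target sentiment index.
Optionally, when calculating the matching degree between the user evaluation information and the merchant subscription information at least from the matching degree of the first branch and the second branch at each layer and from the closeness, the matching degree calculation unit is specifically configured as follows:
if the closeness is greater than or equal to a first threshold, to calculate the matching degree between the user evaluation information and the merchant subscription information at least from the matching degree of the first branch and the second branch at each layer;
if the closeness is less than the first threshold, the matching degree between the user evaluation information and the merchant subscription information is 0.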
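Optionally, the model obtaining unit is specifically configured to obtain the category corresponding to the user evaluation information and to obtain the trained statistical model corresponding to that category.
Optionally, when obtaining the category corresponding to the user evaluation information, the model obtaining unit is specifically configured to:
obtain a scenario category tree that includes at least two layers, each layer including at least one scenario node, the parent scenario node of each scenario node being that node's parent category;
obtain from the scenario category tree the scenario node that matches the user evaluation information, determine the parent scenario node one or more levels above the matched scenario node, and take that parent scenario node as the category corresponding to the user evaluation information.
Optionally, the apparatus further includes: a word vector obtaining unit, configured to obtain the word vector of the user evaluation information and the word vector of the merchant subscription information;
the matching degree calculation unit being further configured to calculate the matching degree between the word vector of the user evaluation information and the word vector of the merchant subscription information as a second matching degree;
when calculating the matching degree between the user evaluation information and the merchant subscription information at least from the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate that matching degree at least from the matching degree of the first branch and the second branch at each layer and from the second matching degree.
Optionally, the apparatus further includes:
a correction unit, configured to obtain the matching degrees between multiple tag nodes in the tag category tree, perform machine learning on those matching degrees, and generate or correct the tag category tree according to the result of the machine learning.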
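Referring to FIG. 9, an embodiment of this application provides another apparatus embodiment of the information matching apparatus. The apparatus of this embodiment includes:
an information obtaining unit 901, configured to obtain merchant subscription information and user evaluation information to be matched;
a model obtaining unit 902, configured to obtain a trained statistical model;
a sentiment calculation unit 903, configured to calculate the sentiment index of the user evaluation information according to the statistical model;
a matching degree calculation unit 904, configured to calculate the matching degree between the user evaluation information and the merchant subscription information at least from the closeness between the sentiment index of the user evaluation information and a target sentiment index.
Optionally, the apparatus further includes:
a matching degree obtaining unit, configured to obtain an initial matching degree between the user evaluation information and the merchant subscription information;
when calculating the matching degree between the user evaluation information and the merchant subscription information at least from the closeness between the sentiment index of the user evaluation information and the target sentiment index, the matching degree calculation unit is specifically configured to calculate that matching degree at least from the closeness and the initial matching degree.
Optionally, when calculating the matching degree between the user evaluation information and the merchant subscription information at least from the closeness and the initial matching degree, the matching degree calculation unit is specifically configured as follows:
if the closeness is greater than or equal to a first threshold, to calculate the matching degree between the user evaluation information and the merchant subscription information at least from the initial matching degree;
if the closeness is less than the first threshold, the matching degree between the user evaluation information and the merchant subscription information is 0.
Optionally, the model obtaining unit is specifically configured to obtain the category corresponding to the user evaluation information and to obtain the trained statistical model corresponding to that category.
Optionally, when obtaining the category corresponding to the user evaluation information, the model obtaining unit is specifically configured to:
obtain a scenario category tree that includes at least two layers, each layer including at least one scenario node, the parent scenario node of each scenario node being that node's parent category;
obtain from the scenario category tree the scenario node that matches the user evaluation information, determine the parent scenario node one or more levels above the matched scenario node, and take that parent scenario node as the category corresponding to the user evaluation information.
Optionally, the sentiment calculation unit is further configured to calculate the sentiment index of the merchant subscription information according to the statistical model and to use it as the target sentiment index.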
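Referring to FIG. 10, an embodiment of this application provides an apparatus embodiment of a client. The apparatus of this embodiment includes:
an information obtaining unit 1001, configured to obtain user evaluation information or merchant subscription information entered by a user;
a sending unit 1002, configured to send the user evaluation information or the merchant subscription information to a computing unit, the computing unit being configured to calculate the matching degree between the user evaluation information and the merchant subscription information.
Referring to FIG. 11, an embodiment of this application provides another apparatus embodiment of the information matching apparatus. The apparatus of this embodiment includes:
an information obtaining unit 1101, configured to obtain first information and second information to be matched;
a category tree obtaining unit 1102, configured to obtain a tag category tree, the tag category tree including at least two layers, each layer including at least one tag node, the parent tag node of each tag node being that node's parent category;
a branch obtaining unit 1103, configured to obtain a first branch and a second branch from the tag category tree, the lowest-layer tag node of the first branch matching the content of the first information and the lowest-layer tag node of the second branch matching the content of the second information;
a matching degree calculation unit 1104, configured to calculate the matching degree between the first information and the second information at least from the matching degree of the first branch and the second branch at each layer.
Optionally, the matching degree calculation unit is specifically configured to calculate a first matching degree at least from the matching degree of the first branch and the second branch at each layer, and to calculate the matching degree between the first information and the second information at least from the first matching degree.
Optionally, when calculating the first matching degree at least from the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate the first matching degree at least from the matching degree of the first branch and the second branch at each layer and the weight of each layer.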
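Optionally, the apparatus further includes:
a model obtaining unit, configured to obtain a trained statistical model;
a sentiment calculation unit, configured to calculate the sentiment index of the first information according to the statistical model;
a closeness calculation unit, configured to calculate the closeness between the sentiment index of the first information and a target sentiment index;
when calculating the matching degree between the first information and the second information at least from the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate that matching degree at least from the matching degree of the first branch and the second branch at each layer and from the closeness.
Optionally, the sentiment calculation unit is further configured to calculate the sentiment index of the second information according to the statistical model, the sentiment index of the second information serving as the target sentiment index.
Optionally, when calculating the matching degree between the first information and the second information at least from the matching degree of the first branch and the second branch at each layer and from the closeness, the matching degree calculation unit is specifically configured as follows:
if the closeness is greater than or equal to a first threshold, to calculate the matching degree between the first information and the second information at least from the matching degree of the first branch and the second branch at each layer;
if the closeness is less than the first threshold, the matching degree between the first information and the second information is 0.
Optionally, the model obtaining unit is specifically configured to obtain the category corresponding to the first information and to obtain the trained statistical model corresponding to that category.
Optionally, when obtaining the category corresponding to the first information, the model obtaining unit is specifically configured to:
obtain a scenario category tree that includes at least two layers, each layer including at least one scenario node, the parent scenario node of each scenario node being that node's parent category;
obtain from the scenario category tree the scenario node that matches the first information, determine the parent scenario node one or more levels above the matched scenario node, and take that parent scenario node as the category corresponding to the first information.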
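Optionally, the training features of the trained statistical model include the segmentation result of the input information;
the apparatus further includes: a segmentation unit, configured to segment the first information to obtain the segmentation result of the first information;
the sentiment calculation unit is specifically configured to input the segmentation result of the first information into the statistical model to obtain the sentiment index of the first information.
Optionally, the segmentation result of the input information is obtained by forming a segment from every two adjacent characters of the input information;
when segmenting the first information, the segmentation unit is specifically configured to form a segment from every two adjacent characters of the first information.
Optionally, the training features of the trained statistical model further include contextual sentiment features;
the apparatus further includes: a sentiment extraction unit, configured to extract the contextual sentiment features of the first information;
when inputting the segmentation result of the first information into the statistical model to obtain the sentiment index of the first information, the sentiment calculation unit is specifically configured to input the segmentation result of the first information together with the contextual sentiment features of the first information into the statistical model to obtain the sentiment index of the first information.
Optionally, the contextual sentiment features include any one or more of the following:
the sentiment index of the previous sentence; the topic similarity between the previous sentence and the current sentence; the overall sentiment distribution of the preceding text; and the sentiment distribution of at least one related sentence in the preceding text, where the topic similarity between each related sentence and the current sentence is greater than a second threshold.
Optionally, the trained statistical model includes a trained first statistical model and a trained second statistical model; the training features of the first statistical model include the segmentation result of the input information, and the training features of the second statistical model include contextual sentiment features.
Optionally, the trained statistical model is a trained maximum entropy model.
Optionally, the apparatus further includes: a word vector obtaining unit, configured to obtain the word vector of the first information and the word vector of the second information;
the matching degree calculation unit being further configured to calculate the matching degree between the word vector of the first information and the word vector of the second information as a second matching degree;
when calculating the matching degree between the first information and the second information at least from the matching degree of the first branch and the second branch at each layer, the matching degree calculation unit is specifically configured to calculate that matching degree at least from the matching degree of the first branch and the second branch at each layer and from the second matching degree.
Optionally, the apparatus further includes: a correction unit, configured to obtain the matching degrees between multiple tag nodes in the tag category tree, perform machine learning on those matching degrees, and generate or correct the tag category tree according to the result of the machine learning.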
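Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.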

Claims (37)

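  1. An information matching method, comprising:
    obtaining merchant subscription information and user evaluation information to be matched;
    obtaining a tag category tree, the tag category tree comprising at least two layers, each layer comprising at least one tag node, the parent tag node of each tag node being the parent category of that tag node;
    obtaining a first branch and a second branch from the tag category tree, the lowest-layer tag node of the first branch matching the content of the user evaluation information, and the lowest-layer tag node of the second branch matching the content of the merchant subscription information;
    calculating a matching degree between the merchant subscription information and the user evaluation information at least according to the matching degree of the first branch and the second branch at each layer.
  2. The method according to claim 1, wherein calculating the matching degree between the merchant subscription information and the user evaluation information at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating a first matching degree at least according to the matching degree of the first branch and the second branch at each layer;
    calculating the matching degree between the merchant subscription information and the user evaluation information at least according to the first matching degree.
  3. The method according to claim 2, wherein calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight of each layer.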
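  4. The method according to claim 1, wherein the method further comprises:
    obtaining a trained statistical model;
    calculating a sentiment index of the user evaluation information according to the statistical model;
    calculating a closeness between the sentiment index of the user evaluation information and a target sentiment index;
    and wherein calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the closeness.
  5. The method according to claim 4, wherein the method further comprises:
    calculating a sentiment index of the merchant subscription information according to the statistical model, the sentiment index of the merchant subscription information serving as the target sentiment index.
  6. The method according to claim 4, wherein calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the closeness comprises:
    if the closeness is greater than or equal to a first threshold, calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer;
    if the closeness is less than the first threshold, the matching degree between the user evaluation information and the merchant subscription information is 0.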
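  7. The method according to claim 4, wherein obtaining the trained statistical model comprises:
    obtaining a category corresponding to the user evaluation information;
    obtaining the trained statistical model corresponding to the category.
  8. The method according to claim 7, wherein obtaining the category corresponding to the user evaluation information comprises:
    obtaining a scenario category tree, the scenario category tree comprising at least two layers, each layer comprising at least one scenario node, the parent scenario node of each scenario node being the parent category of that scenario node;
    obtaining from the scenario category tree a scenario node matching the user evaluation information, determining the parent scenario node one or more levels above the matched scenario node, and taking that parent scenario node as the category corresponding to the user evaluation information.
  9. The method according to claim 1, wherein the method further comprises:
    obtaining a word vector of the user evaluation information and a word vector of the merchant subscription information;
    calculating a matching degree between the word vector of the user evaluation information and the word vector of the merchant subscription information as a second matching degree;
    and wherein calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
  10. The method according to claim 1, wherein the method further comprises:
    obtaining matching degrees between multiple tag nodes in the tag category tree;
    performing machine learning on the matching degrees between the multiple tag nodes, and generating or correcting the tag category tree according to the result of the machine learning.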
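  11. An information matching method, comprising:
    obtaining merchant subscription information and user evaluation information to be matched;
    obtaining a trained statistical model;
    calculating a sentiment index of the user evaluation information according to the statistical model;
    calculating a matching degree between the user evaluation information and the merchant subscription information at least according to a closeness between the sentiment index of the user evaluation information and a target sentiment index.
  12. The method according to claim 11, wherein the method further comprises:
    obtaining an initial matching degree between the user evaluation information and the merchant subscription information;
    and wherein calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the closeness between the sentiment index of the user evaluation information and the target sentiment index comprises:
    calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the closeness and the initial matching degree.
  13. The method according to claim 12, wherein calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the closeness and the initial matching degree comprises:
    if the closeness is greater than or equal to a first threshold, calculating the matching degree between the user evaluation information and the merchant subscription information at least according to the initial matching degree;
    if the closeness is less than the first threshold, the matching degree between the user evaluation information and the merchant subscription information is 0.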
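  14. The method according to claim 11, wherein obtaining the trained statistical model comprises:
    obtaining a category corresponding to the user evaluation information;
    obtaining the trained statistical model corresponding to the category.
  15. The method according to claim 14, wherein obtaining the category corresponding to the user evaluation information comprises:
    obtaining a scenario category tree, the scenario category tree comprising at least two layers, each layer comprising at least one scenario node, the parent scenario node of each scenario node being the parent category of that scenario node;
    obtaining from the scenario category tree a scenario node matching the user evaluation information, determining the parent scenario node one or more levels above the matched scenario node, and taking that parent scenario node as the category corresponding to the user evaluation information.
  16. The method according to claim 11, wherein the method further comprises:
    calculating a sentiment index of the merchant subscription information according to the statistical model, and using the sentiment index of the merchant subscription information as the target sentiment index.
  17. An information input method, comprising:
    obtaining, by a client, user evaluation information or merchant subscription information entered by a user;
    sending, by the client, the user evaluation information or the merchant subscription information to a computing unit, the computing unit being configured to calculate a matching degree between the user evaluation information and the merchant subscription information.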
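  18. An information matching method, comprising:
    obtaining first information and second information to be matched;
    obtaining a tag category tree, the tag category tree comprising at least two layers, each layer comprising at least one tag node, the parent tag node of each tag node being the parent category of that tag node;
    obtaining a first branch and a second branch from the tag category tree, the lowest-layer tag node of the first branch matching the content of the first information, and the lowest-layer tag node of the second branch matching the content of the second information;
    calculating a matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer.
  19. The method according to claim 18, wherein calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating a first matching degree at least according to the matching degree of the first branch and the second branch at each layer;
    calculating the matching degree between the first information and the second information at least according to the first matching degree.
  20. The method according to claim 19, wherein calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating the first matching degree at least according to the matching degree of the first branch and the second branch at each layer and the weight of each layer.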
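  21. The method according to claim 18, wherein the method further comprises:
    obtaining a trained statistical model;
    calculating a sentiment index of the first information according to the statistical model;
    calculating a closeness between the sentiment index of the first information and a target sentiment index;
    and wherein calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the closeness.
  22. The method according to claim 21, wherein the method further comprises:
    calculating a sentiment index of the second information according to the statistical model, the sentiment index of the second information serving as the target sentiment index.
  23. The method according to claim 21, wherein calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the closeness comprises:
    if the closeness is greater than or equal to a first threshold, calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer;
    if the closeness is less than the first threshold, the matching degree between the first information and the second information is 0.
  24. The method according to claim 21, wherein obtaining the trained statistical model comprises:
    obtaining a category corresponding to the first information;
    obtaining the trained statistical model corresponding to the category.
  25. The method according to claim 24, wherein obtaining the category corresponding to the first information comprises:
    obtaining a scenario category tree, the scenario category tree comprising at least two layers, each layer comprising at least one scenario node, the parent scenario node of each scenario node being the parent category of that scenario node;
    obtaining from the scenario category tree a scenario node matching the first information, determining the parent scenario node one or more levels above the matched scenario node, and taking that parent scenario node as the category corresponding to the first information.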
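  26. The method according to claim 21, wherein the training features of the trained statistical model comprise the segmentation result of input information;
    the method further comprises: segmenting the first information to obtain a segmentation result of the first information;
    and calculating the sentiment index of the first information according to the statistical model comprises: inputting the segmentation result of the first information into the statistical model to obtain the sentiment index of the first information.
  27. The method according to claim 26, wherein the segmentation result of the input information is obtained by forming a segment from every two adjacent characters of the input information;
    and segmenting the first information comprises: forming a segment from every two adjacent characters of the first information.
  28. The method according to claim 26, wherein the training features of the trained statistical model further comprise contextual sentiment features;
    the method further comprises: extracting contextual sentiment features of the first information;
    and inputting the segmentation result of the first information into the statistical model to obtain the sentiment index of the first information comprises: inputting the segmentation result of the first information together with the contextual sentiment features of the first information into the statistical model to obtain the sentiment index of the first information.
  29. The method according to claim 28, wherein the contextual sentiment features comprise any one or more of the following:
    the sentiment index of the previous sentence; the topic similarity between the previous sentence and the current sentence; the overall sentiment distribution of the preceding text; and the sentiment distribution of at least one related sentence in the preceding text, the topic similarity between the at least one related sentence and the current sentence being greater than a second threshold.
  30. The method according to claim 28, wherein the trained statistical model comprises a trained first statistical model and a trained second statistical model, the training features of the first statistical model comprising the segmentation result of input information and the training features of the second statistical model comprising contextual sentiment features.
  31. The method according to any one of claims 21 to 30, wherein the trained statistical model is a trained maximum entropy model.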
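  32. The method according to claim 18, wherein the method further comprises:
    obtaining a word vector of the first information and a word vector of the second information;
    calculating a matching degree between the word vector of the first information and the word vector of the second information as a second matching degree;
    and wherein calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer comprises:
    calculating the matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer and the second matching degree.
  33. The method according to claim 18, wherein the method further comprises:
    obtaining matching degrees between multiple tag nodes in the tag category tree;
    performing machine learning on the matching degrees between the multiple tag nodes, and generating or correcting the tag category tree according to the result of the machine learning.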
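  34. An information matching apparatus, comprising:
    an information obtaining unit, configured to obtain merchant subscription information and user evaluation information to be matched;
    a category tree obtaining unit, configured to obtain a tag category tree, the tag category tree comprising at least two layers, each layer comprising at least one tag node, the parent tag node of each tag node being the parent category of that tag node;
    a branch obtaining unit, configured to obtain a first branch and a second branch from the tag category tree, the lowest-layer tag node of the first branch matching the content of the user evaluation information, and the lowest-layer tag node of the second branch matching the content of the merchant subscription information;
    a matching degree calculation unit, configured to calculate a matching degree between the merchant subscription information and the user evaluation information at least according to the matching degree of the first branch and the second branch at each layer.
  35. An information matching apparatus, comprising:
    an information obtaining unit, configured to obtain merchant subscription information and user evaluation information to be matched;
    a model obtaining unit, configured to obtain a trained statistical model;
    a sentiment calculation unit, configured to calculate a sentiment index of the user evaluation information according to the statistical model;
    a matching degree calculation unit, configured to calculate a matching degree between the user evaluation information and the merchant subscription information at least according to a closeness between the sentiment index of the user evaluation information and a target sentiment index.
  36. A client, comprising:
    an information obtaining unit, configured to obtain user evaluation information or merchant subscription information entered by a user;
    a sending unit, configured to send the user evaluation information or the merchant subscription information to a computing unit, the computing unit being configured to calculate a matching degree between the user evaluation information and the merchant subscription information.
  37. An information matching apparatus, comprising:
    an information obtaining unit, configured to obtain first information and second information to be matched;
    a category tree obtaining unit, configured to obtain a tag category tree, the tag category tree comprising at least two layers, each layer comprising at least one tag node, the parent tag node of each tag node being the parent category of that tag node;
    a branch obtaining unit, configured to obtain a first branch and a second branch from the tag category tree, the lowest-layer tag node of the first branch matching the content of the first information, and the lowest-layer tag node of the second branch matching the content of the second information;
    a calculation unit, configured to calculate a matching degree between the first information and the second information at least according to the matching degree of the first branch and the second branch at each layer.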
PCT/CN2017/103858 2016-10-11 2017-09-28 一种信息匹配方法及相关装置 WO2018068648A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610887444.0A CN107918778B (zh) 2016-10-11 2016-10-11 一种信息匹配方法及相关装置
CN201610887444.0 2016-10-11

Publications (1)

Publication Number Publication Date
WO2018068648A1 true WO2018068648A1 (zh) 2018-04-19

Family

ID=61891935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/103858 WO2018068648A1 (zh) 2016-10-11 2017-09-28 一种信息匹配方法及相关装置

Country Status (3)

Country Link
CN (1) CN107918778B (zh)
TW (1) TW201814556A (zh)
WO (1) WO2018068648A1 (zh)

Also Published As

Publication number Publication date
TW201814556A (zh) 2018-04-16
CN107918778B (zh) 2022-03-15
CN107918778A (zh) 2018-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17859593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17859593

Country of ref document: EP

Kind code of ref document: A1