WO2018014759A1 - Method, device and system for presenting clustering data table - Google Patents

Method, device and system for presenting clustering data table

Info

Publication number
WO2018014759A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
similarity
business
business objects
objects
Prior art date
Application number
PCT/CN2017/092444
Other languages
French (fr)
Chinese (zh)
Inventor
叶舟
王瑜
张亚楠
苏飞
杨洋
杜楠楠
毛庆凯
Original Assignee
阿里巴巴集团控股有限公司
叶舟
王瑜
张亚楠
苏飞
杨洋
杜楠楠
毛庆凯
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 叶舟, 王瑜, 张亚楠, 苏飞, 杨洋, 杜楠楠, 毛庆凯
Publication of WO2018014759A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of information technology, and in particular, to a method for generating a cluster data table, a device for generating a cluster data table, a method for displaying a cluster data table, and a display device for a cluster data table.
  • e-commerce websites such as Taobao and Tmall have been able to bring products from all over the world online for consumers to purchase.
  • e-commerce websites have begun to actively recommend products to consumers to reduce the time for consumers to search and purchase goods. It is one of the important ways to display the recommended products to the consumer groups in the form of product lists.
  • the list of goods usually consists of three parts:
  • Product list: this list contains a series of similar products.
  • the list of clothing can be the same style of clothes, pants and shoes.
  • the list of household items can be a combination of curtains, wallpaper and carpet in the same color, and so on.
  • the title is a short text that can be used to describe the characteristics of the product list.
  • the title of the clothing list can be “small fresh spring”, “pink matching control” and so on.
  • the list description can be a short paragraph of easy-to-understand text that further elaborates on the title of the list. For example, for a list of goods titled “The rice bowl in your hand”, the description can be “porcelain, the healthiest bowls; different sizes and different patterns can add a lot to the table”, so that consumers can understand the products recommended in the list.
  • the merchandise list of the e-commerce website is mainly realized by relying on the manual operation of the website operator, and by obtaining the consumption data of the consumer and combining the public opinion statistics of the external website, the product to be recommended is determined through manual analysis. The recommended items are then combined into a list, and the title and description of the list are refined.
  • the above method requires a lot of labor costs, and the resulting list of goods also carries the strong subjective preferences of the operators and may be unable to meet the needs and preferences of most consumers.
  • embodiments of the present application are provided to provide a method for generating a cluster data table, a cluster data table generating device, and a clustering data, which overcome the above problems or at least partially solve the above problems.
  • a display system of a cluster data table including:
  • one or more processors;
  • one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules have the following functions:
  • a cluster data table is presented according to the request, the cluster data table includes a plurality of business object sets, the business object set having a plurality of associated business objects, and corresponding topic information.
  • the present application also discloses a method for displaying a cluster data table, which is characterized in that it comprises:
  • the cluster data table includes a plurality of business object sets, the business object set having a plurality of associated business objects, and corresponding topic information.
  • the multiple business object sets are generated by the following steps:
  • the attribute information of the plurality of business objects includes a name, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects;
  • the step of determining the degree of association between the plurality of business objects according to the attribute information of the plurality of business objects includes:
  • the degree of association between any two business objects is determined according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  • the step of determining the degree of association between any two business objects according to the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity includes:
  • the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity are weighted and summed to obtain the degree of association between any two business objects.
  • the step of classifying the multiple service objects according to the degree of association between the multiple service objects, and obtaining the multiple service object sets includes:
  • the business objects whose association degree is greater than the preset threshold are respectively combined to obtain a plurality of business object sets.
  • the topic information includes title information and description information of the service object set, and the topic information is generated by the following steps:
  • the step of determining, according to the attribute information, the header information of the service object set includes:
  • determining the title information of the business object set by using the target keyword and the first preset template.
  • the step of determining the description information of the service object set according to the header information includes:
  • the step of acquiring the comment information corresponding to the title information includes:
  • the comment information that matches the one or more word segmentation phrases is separately obtained.
  • the determining, according to the comment information, the description information of the service object set includes:
  • the request further includes user requirement information
  • the step of displaying the cluster data table according to the request includes:
  • the present application also discloses a method for generating a clustering data table, which is characterized in that it comprises:
  • the attribute information of the plurality of business objects includes a name, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects;
  • the step of determining the degree of association between the plurality of business objects according to the attribute information of the plurality of business objects includes:
  • the degree of association between any two business objects is determined according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  • the step of determining the degree of association between any two business objects according to the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity includes:
  • the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity are weighted and summed to obtain the degree of association between any two business objects.
  • the step of classifying the multiple service objects according to the degree of association between the multiple service objects, and obtaining the multiple service object sets includes:
  • the business objects whose association degree is greater than the preset threshold are respectively combined to obtain a plurality of business object sets.
  • the step of determining the topic information corresponding to the multiple service object sets according to the multiple associated service objects includes:
  • the step of determining, according to the attribute information, the header information of the service object set includes:
  • determining the title information of the business object set by using the target keyword and the first preset template.
  • the step of determining the description information of the service object set according to the header information includes:
  • the step of acquiring the comment information corresponding to the title information includes:
  • the comment information that matches the one or more word segmentation phrases is separately obtained.
  • the determining, according to the comment information, the description information of the service object set includes:
  • a display device for cluster data table which is characterized in that it comprises:
  • a receiving module configured to receive a presentation request of the cluster data table
  • a presentation module configured to present a cluster data table according to the request;
  • the cluster data table includes a plurality of business object sets, the business object set having a plurality of associated business objects, and corresponding topic information.
  • the plurality of business object sets are generated by calling the following modules:
  • a business object obtaining module configured to acquire a plurality of business objects, where the plurality of business objects respectively have corresponding attribute information
  • An association determining module configured to determine, according to attribute information of the multiple business objects, an association degree between the multiple service objects
  • a classification module configured to classify the plurality of business objects according to the degree of association between the plurality of business objects, to obtain a plurality of business object sets.
  • the attribute information of the multiple service objects includes a name, a price information, a consumer information, a brand information, a category information, and/or a picture information of the plurality of service objects.
  • the association degree determining module includes:
  • the similarity determination submodule is configured to respectively determine name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
  • the association determination submodule is configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  • the association determination submodule includes:
  • the association degree determining unit is configured to perform a weighted summation of the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity to obtain the degree of association between any two business objects.
  • the classification module includes:
  • the combination sub-module is configured to combine the business objects whose degree of association is greater than the preset threshold to obtain a plurality of business object sets.
  • the topic information includes title information and description information of the set of business objects, and the topic information is generated by calling the following module:
  • An attribute information obtaining module configured to acquire attribute information of multiple associated business objects in the business object set
  • a header information determining module configured to determine, according to the attribute information, header information of the service object set
  • a description information determining module configured to determine, according to the title information, description information of the service object set.
  • the title information determining module includes:
  • a keyword obtaining submodule configured to obtain keywords in the attribute information of the multiple associated business objects
  • a keyword sorting sub-module configured to sort the keywords to obtain a first preset number of target keywords
  • a header information determining submodule configured to determine, by using the target keyword and the first preset template, header information of the service object set.
  • the description information determining module includes:
  • a comment information obtaining submodule configured to obtain comment information corresponding to the title information
  • a description information determining submodule configured to determine, according to the comment information, description information of the business object set.
  • the comment information obtaining submodule includes:
  • a word segmentation unit configured to perform segmentation on the title information to obtain one or more word segmentation phrases
  • a comment information obtaining unit configured to respectively obtain the comment information that matches the one or more participle phrases.
  • the description information determining submodule includes:
  • a comment information sorting unit configured to sort the comment information to obtain a second preset number of target comment information
  • the description information determining unit is configured to determine description information of the business object set by using the target comment information and the second preset template.
  • the request further includes user requirement information, where the presentation module includes:
  • a target business object set obtaining submodule configured to acquire a plurality of target business object sets that match user demand information
  • the target business object presentation submodule is configured to display the plurality of target business object sets.
  • the present application further discloses a device for generating a cluster data table, which includes:
  • An obtaining module configured to acquire a plurality of business objects, where the plurality of business objects respectively have corresponding attribute information
  • an association determining module configured to determine the degree of association between the multiple business objects according to attribute information of the multiple business objects;
  • a classification module configured to classify the plurality of business objects according to the degree of association between the plurality of business objects, to obtain a plurality of business object sets, wherein the plurality of business object sets respectively have multiple associated business objects;
  • a topic information determining module configured to respectively determine topic information corresponding to the plurality of service object sets according to the plurality of associated business objects
  • a generating module configured to generate a cluster data table according to the plurality of business object sets and the corresponding topic information.
  • the attribute information of the multiple service objects includes a name, a price information, a consumer information, a brand information, a category information, and/or a picture information of the plurality of service objects.
  • the association degree determining module includes:
  • the similarity determination submodule is configured to respectively determine name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
  • the association determination submodule is configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  • the association determination submodule includes:
  • the association degree determining unit is configured to perform a weighted summation of the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity to obtain the degree of association between any two business objects.
  • the classification module includes:
  • the combination sub-module is configured to combine the business objects whose degree of association is greater than the preset threshold to obtain a plurality of business object sets.
  • the topic information determining module includes:
  • An attribute information obtaining submodule configured to acquire attribute information of multiple associated business objects in the business object set
  • a title information determining submodule configured to determine the title information of the business object set according to the attribute information
  • a description information determining submodule configured to determine, according to the header information, description information of the service object set.
  • the header information determining submodule includes:
  • a keyword obtaining unit configured to acquire keywords in attribute information of a plurality of associated business objects
  • a keyword sorting unit configured to sort the keywords to obtain a first preset number of target keywords
  • the title information determining unit is configured to determine the title information of the business object set by using the target keyword and the first preset template.
  • the description information determining submodule includes:
  • a comment information obtaining unit configured to obtain comment information corresponding to the title information
  • the description information determining unit is configured to determine description information of the business object set according to the comment information.
  • the comment information acquiring unit includes:
  • a word segmentation unit for segmenting the title information to obtain one or more word segmentation phrases
  • the comment information acquisition subunit is configured to respectively obtain the comment information that matches the one or more participle phrases.
  • the description information determining unit includes:
  • a comment information sorting subunit configured to sort the comment information to obtain a second preset number of target comment information
  • a description information determining subunit configured to determine description information of the business object set by using the target comment information and the second preset template.
  • the embodiments of the present application include the following advantages:
  • the embodiment of the present application may display a cluster data table including a plurality of business object sets according to the request, can quickly identify the user's needs, and presents the business objects that satisfy those needs, which reduces the time the user spends searching for business objects, saves the system resources consumed by such searches, and improves access efficiency.
  • FIG. 1 is a flow chart showing the steps of Embodiment 1 of a method for generating a cluster data table according to the present application;
  • FIG. 2 is a flow chart showing the steps of Embodiment 2 of a method for generating a cluster data table according to the present application;
  • FIG. 3 is a schematic block diagram of Embodiment 2 of a method for generating a cluster data table according to the present application;
  • FIG. 4 is a flow chart showing the steps of an embodiment of a method for displaying a cluster data table according to the present application
  • FIG. 5 is a diagram showing an example of a cluster data table of the present application.
  • FIG. 6 is a structural block diagram of an embodiment of a device for generating a cluster data table according to the present application
  • FIG. 7 is a structural block diagram of an embodiment of a presentation device of a cluster data table of the present application.
  • Referring to FIG. 1, a flow chart of the steps of Embodiment 1 of a method for generating a cluster data table according to the present application is shown, which may specifically include the following steps:
  • Step 101 Acquire a plurality of service objects, where the plurality of service objects respectively have corresponding attribute information;
  • the business object may be a commodity, or other types of objects, for example, news information, etc., and the type of the business object is not limited in the present application.
  • the attribute information of the corresponding business object may be different for different business objects.
  • the attribute information may be a name, a price, a consumer, a brand, a specific category, and/or a picture of the item.
  • the attribute information may be information such as the source, time, location, and the like of the news information.
  • a person skilled in the art can select appropriate attribute information according to the specific type of the business object, which is not specifically limited in this application.
  • different methods may also be adopted to acquire the attribute information of different business objects.
  • the attribute information of the product can be obtained from the product data stored on the platform such as the e-commerce website
  • the attribute information of the news information can be obtained from the information platform such as the information website.
  • Step 102 Determine, according to attribute information of the multiple service objects, a degree of association between the multiple service objects.
  • the degree of association between any two business objects may be calculated.
  • the degree of association may be a numerical description of how closely two business objects are related, obtained by analyzing attribute information of a plurality of different dimensions. The degree of association may reflect similarity or collocation between two business objects. For example, different types of shoes with similarities, such as leather shoes and sandals, can have a higher degree of association, and business objects that go together, such as clothes and pants, can also have a higher degree of association.
  • for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity may first be calculated separately, and the degree of association between the two business objects is then determined from these similarities.
  • the similarity for each kind of attribute information may be calculated by a different method, for example the cosine similarity formula or the Jaccard similarity; the specific similarity calculation method is not limited in this application.
  • after the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity of any two business objects are calculated, they may be weighted and summed to obtain the degree of association between the two business objects.
  • the weights of the similarities of the different information dimensions can be adjusted according to actual needs. For example, the weight of picture similarity can be increased for clothing goods, and the weight of name similarity can be increased for digital goods, so that the resulting degree of association better reflects the similarity or collocation between two different business objects.
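  • The weighted summation described above can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the helper name `association_degree`, the dimension keys, and the example weights are hypothetical; the text only states that the per-dimension similarities are weighted and summed and that the weights can be tuned per type of goods.

```python
from typing import Dict


def association_degree(similarities: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Weighted sum of per-dimension similarities (hypothetical helper).

    Dimensions without a weight are ignored, mirroring the "and/or"
    wording of the text: any subset of the six dimensions may be used.
    """
    return sum(weights[dim] * value
               for dim, value in similarities.items()
               if dim in weights)


# Illustrative weights only: picture similarity is emphasised for clothing.
clothing_weights = {"name": 0.1, "price": 0.1, "consumer": 0.2,
                    "brand": 0.1, "category": 0.1, "picture": 0.4}
sims = {"name": 0.5, "price": 0.8, "consumer": 0.9,
        "brand": 1.0, "category": 0.6, "picture": 0.7}
degree = association_degree(sims, clothing_weights)
```

  • Keeping the weights in a plain mapping makes it easy to hold one weight set per type of goods, for example a second mapping that emphasises name similarity for digital goods.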
  • Step 103 Classify the multiple service objects according to the degree of association between the multiple service objects, to obtain multiple service object sets, where the multiple service object sets respectively have multiple associated service objects;
  • the service objects whose association degree is greater than the preset threshold may be respectively combined to obtain a plurality of service object sets.
  • the multiple business objects may be classified by using a hierarchical clustering method to obtain a plurality of business object sets.
  • Hierarchical clustering decomposes a data set hierarchically according to a certain method until a certain condition is met. According to the classification principle, it can be divided into two methods: agglomerative and divisive. Taking the agglomerative method as an example, agglomerative hierarchical clustering is a bottom-up strategy: each object is first treated as its own cluster, and the clusters are then merged into larger and larger clusters until all objects are in one cluster or a certain termination condition is met.
  • Hierarchical clustering is a widely used classification algorithm and is not described in detail in this application.
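  • A bottom-up clustering of this kind, combined with the preset threshold mentioned earlier, might look like the sketch below. It is not taken from the application: the function name `agglomerative_clusters`, the use of single linkage (the highest pairwise degree between two clusters), and the sample degrees are assumptions made for illustration.

```python
from typing import Callable, Hashable, List, Set


def agglomerative_clusters(items: List[Hashable],
                           degree: Callable[[Hashable, Hashable], float],
                           threshold: float) -> List[Set[Hashable]]:
    """Start with one cluster per item and repeatedly merge the two
    clusters with the highest linkage; stop when no pair of clusters has
    a linkage above `threshold` (the termination condition)."""
    clusters = [{item} for item in items]
    while len(clusters) > 1:
        best, best_pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: strongest association across the two clusters.
                link = max(degree(a, b)
                           for a in clusters[i] for b in clusters[j])
                if link > best:
                    best, best_pair = link, (i, j)
        if best_pair is None:
            break
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical pairwise degrees of association; unlisted pairs count as 0.
pair_degree = {frozenset(("shirt", "pants")): 0.8,
               frozenset(("pants", "shoes")): 0.7,
               frozenset(("shirt", "shoes")): 0.3,
               frozenset(("bowl", "plate")): 0.9}


def lookup(a, b):
    return pair_degree.get(frozenset((a, b)), 0.0)


result = agglomerative_clusters(
    ["shirt", "pants", "shoes", "bowl", "plate"], lookup, threshold=0.5)
```

  • With these sample degrees the clothing items end up in one set and the tableware in another. The nested loops are quadratic per merge, which is acceptable for a sketch but would need a proper library implementation for a real catalogue.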
  • Step 104 Determine, according to the multiple associated service objects, topic information corresponding to the plurality of service object sets respectively;
  • the topic information may include title information and description information of the business object.
  • the title information of the set of business objects may be a phrase or a phrase that can reflect a common feature of all the business objects in the set, and the description information may be text information used to uniformly describe the business objects in the set, and It may be text information that further elaborates on the title information.
  • the step of determining the topic information corresponding to the plurality of service object sets according to the multiple associated service objects may specifically include the following sub-steps:
  • Sub-step 1041 acquiring attribute information of multiple associated business objects in the business object set
  • Sub-step 1042 determining, according to the attribute information, header information of the service object set
  • Sub-step 1043 determining, according to the header information, description information of the service object set.
  • attribute information of all business objects in the set may be obtained first, and text information describing the business objects, for example the product name or the product introduction text, is extracted from the attribute information. Keywords are then extracted from the text information and sorted to obtain the top k keywords, and the k keywords and a preset theme template may be used to determine the title information of the business object set.
  • the keywords may be sorted according to their number of occurrences or by other methods, which is not specifically limited in this application.
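  • The title generation just described (extract keywords, rank them by number of occurrences, keep the top k, fill a template) can be sketched as below. The function name `title_from_names`, the whitespace tokenisation, and the template string are assumptions for illustration; real Chinese product names would need a proper word segmenter.

```python
from collections import Counter
from typing import List


def title_from_names(names: List[str], k: int, template: str) -> str:
    """Pick the k most frequent words across the names and fill a template."""
    counts = Counter(word
                     for name in names
                     for word in name.lower().split())
    top_k = [word for word, _ in counts.most_common(k)]
    return template.format(keywords=" ".join(top_k))


# Hypothetical product names and template.
names = ["pink cotton dress", "pink linen skirt", "pink cotton blouse"]
title = title_from_names(names, k=2, template="Editor's picks: {keywords}")
```

  • Here "pink" occurs three times and "cotton" twice, so those two words are placed into the template.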
  • comment information matching the title may be found according to the title information, and the comment information with higher relevance to the title information is then further filtered out of the retrieved comment information, thereby obtaining the description information of the business object set.
  • the title information can be segmented into words, a semantic model can be used to expand the segmented words with synonyms, and the comment data can be matched against the resulting text, thereby recalling the comment information that matches the title information.
  • the comment information may be scored according to a certain rule, and the top-ranked comment information is then used with a preset text template to generate the description information.
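  • A rough sketch of this recall, score, and template flow is given below. It makes several simplifying assumptions that are not in the text: whitespace splitting stands in for word segmentation, synonym expansion by a semantic model is omitted, the score is simply the number of title words a comment contains, and the names `describe` and `template` are hypothetical.

```python
from typing import List


def describe(title: str, comments: List[str], n: int, template: str) -> str:
    """Recall comments sharing words with the title, keep the n best."""
    title_words = set(title.lower().split())  # stand-in for word segmentation
    scored = []
    for comment in comments:
        overlap = len(title_words & set(comment.lower().split()))
        if overlap > 0:  # recall: the comment matches at least one title word
            scored.append((overlap, comment))
    # Stable sort: comments with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [comment for _, comment in scored[:n]]
    return template.format(comments=" ".join(top))


comments = ["This pink dress is lovely.",
            "Fast delivery.",
            "The cotton feels soft and the pink is bright."]
description = describe("pink cotton", comments, n=1,
                       template="Shoppers say: {comments}")
```

  • The third comment mentions both title words and is therefore ranked first; the second comment shares no word with the title and is never recalled.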
  • Step 105 Generate a cluster data table according to the plurality of business object sets and corresponding topic information.
  • the business object set and the topic information thereof may be merged into one cluster data table to be presented to the user.
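  • One possible in-memory shape for such a cluster data table is sketched below. The class name `BusinessObjectSet`, its fields, and the helper `build_cluster_table` are hypothetical; the text only says that each business object set is merged with its title and description.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence, Tuple


@dataclass
class BusinessObjectSet:
    """One row of the cluster data table: a topic plus its associated objects."""
    title: str
    description: str
    objects: List[str] = field(default_factory=list)


def build_cluster_table(object_sets: Sequence[Iterable[str]],
                        topics: Sequence[Tuple[str, str]]
                        ) -> List[BusinessObjectSet]:
    """Pair each set of associated objects with its (title, description)."""
    return [BusinessObjectSet(title, description, list(objects))
            for objects, (title, description) in zip(object_sets, topics)]


table = build_cluster_table(
    [["rice bowl", "soup bowl"]],
    [("The rice bowl in your hand", "Bowls in different sizes and patterns.")])
```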
  • the attribute information of the plurality of business objects is obtained, thereby determining the degree of association between the plurality of business objects, and classifying the plurality of business objects according to the degree of association to obtain the plurality of business object sets. Then, the topic information of the business object set is extracted separately, and then the cluster data table is generated, which solves the problem that the cluster data table can only be generated by manual operation in the prior art, and the generation efficiency of the cluster data table is improved. It also makes the generated cluster data table more objective and more suitable for the needs and preferences of most users.
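  • Referring to FIG. 2, a flow chart of the steps of Embodiment 2 of a method for generating a cluster data table according to the present application is shown, which may specifically include the following steps: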
  • Step 201 Acquire a plurality of service objects, where the plurality of service objects respectively have corresponding attribute information;
  • the business object may be a commodity
  • the attribute information of the business object may be a name, a price, a consumer, a brand, a specific category, and/or a picture of the product.
  • FIG. 3 is a schematic block diagram of Embodiment 2 of a method for generating a cluster data table according to the present application.
  • the attribute information of the commodity may be obtained from the commodity data stored on the platform such as an e-commerce website.
  • Step 202 Determine name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects, respectively;
  • the following takes commodities as the business objects to describe how to determine the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two commodities.
  • the name similarity can reflect the similarity between the names of any two commodities.
  • the Jaccard similarity used in text mining can be applied for the calculation.
  • the basic idea is the ratio between the number of identical words in the two product names and the total number of words.
  • the title A is “small tomato custom women's new style” and the title B is “small apple custom women's Korean version”
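  • The Jaccard calculation on product names can be sketched as follows. Splitting on whitespace is an assumption of this sketch; the original Chinese titles would be segmented differently, so the value below applies only to the English rendering of the two titles. The function name `jaccard_similarity` is hypothetical.

```python
def jaccard_similarity(name_a: str, name_b: str) -> float:
    """Shared words divided by all distinct words across both names."""
    words_a = set(name_a.lower().split())
    words_b = set(name_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


sim = jaccard_similarity("small tomato custom women's new style",
                         "small apple custom women's Korean version")
```

  • Under this tokenisation the two titles share 3 words ("small", "custom", "women's") out of 9 distinct words, giving a similarity of 1/3.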
  • for price similarity, the quantiles of the transaction prices of the commodities under the same category can be calculated first, and the prices are then divided into different grades according to the quantiles, thereby obtaining the price similarity.
  • the transaction prices of the commodities can first be sorted from small to large, and the 10th to 90th percentiles can then be calculated, so that the entire price range is divided into 10 grades according to the order statistics and the transaction price of each commodity falls into one of the grades 1 to 10. If the price grade of commodity A is 5 and the price grade of commodity B is 8, the price similarity between commodity A and commodity B can then be calculated.
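  • The grading step can be sketched with the Python standard library as shown below. Only the assignment of grades is illustrated; how the two grades are turned into a similarity value is not given in the text, so it is deliberately left out. The function name `price_grades` and the sample prices are assumptions.

```python
import bisect
import statistics
from typing import List


def price_grades(prices: List[float]) -> List[int]:
    """Assign each transaction price a grade from 1 to 10, using the
    10th to 90th percentiles of the category as the nine cut points."""
    cuts = statistics.quantiles(prices, n=10)  # needs at least two prices
    return [bisect.bisect_left(cuts, price) + 1 for price in prices]


# Hypothetical transaction prices within a single category.
grades = price_grades([float(p) for p in range(1, 101)])
```

  • `statistics.quantiles` sorts the data itself, so the input does not need to be ordered beforehand.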
  • Consumer similarity can be calculated by a collaborative filtering algorithm.
  • the basic idea is to calculate it from consumers' preferences using the cosine similarity formula.
  • each consumer-commodity pair may first be scored according to the consumer's different behaviors toward the commodity, such as browsing, collecting, purchasing, and completing a transaction. For example, a completed transaction scores 4 points, a purchase 3 points, a collection 2 points, and a browse 1 point.
  • a consumer-commodity score sheet as shown in Table 1 below is thereby obtained.
  • Table 1:
                    Commodity A   Commodity B
      Consumer 1         3             4
      Consumer 2         2             1
      Consumer 3         3             2
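  • using the scores in Table 1, the cosine calculation can be sketched as follows. Each commodity is treated as a vector of consumer scores; the function name `cosine_similarity` is illustrative only.

```python
import math

# Columns of Table 1: one score per consumer (Consumer 1, 2, 3).
scores = {
    "Commodity A": [3, 2, 3],
    "Commodity B": [4, 1, 2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


print(cosine_similarity(scores["Commodity A"], scores["Commodity B"]))
```

  • for these two columns the dot product is 20 and the norms are √22 and √21, which gives a consumer similarity of about 0.93.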
  • Brand similarity can be obtained directly by comparing whether two commodities belong to the same brand. For example, if both the product A and the product B belong to the A brand, the brand similarity between the product A and the product B can be considered to be 1.
  • the category similarity can be calculated by the algorithm of association analysis.
  • the basic idea is to calculate, from consumers' orders, the probability that a category B commodity is purchased when a category A commodity is purchased. For example, if there are currently three orders, where order 1 contains categories A/B/C, order 2 contains categories B/C/E, and order 3 contains categories B/D/F, then the probability of purchasing a category C commodity while purchasing a category B commodity is 2/3, because category B appears in all three orders and both order 1 and order 2 contain categories B and C.
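  • the order example above can be reproduced with a short sketch. The function name `category_similarity` is hypothetical, and representing each order as a set of category labels is an assumption.

```python
def category_similarity(orders, cat_a, cat_b):
    """Share of orders containing cat_a that also contain cat_b."""
    with_a = [order for order in orders if cat_a in order]
    if not with_a:
        return 0.0
    return sum(1 for order in with_a if cat_b in order) / len(with_a)


orders = [
    {"A", "B", "C"},  # order 1
    {"B", "C", "E"},  # order 2
    {"B", "D", "F"},  # order 3
]
print(category_similarity(orders, "B", "C"))
```

  • note that this measure is not symmetric: in the same data, every order containing category C also contains category B, so the value in the other direction is 1.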
  • picture similarity can be calculated by converting each picture into a vector using SIFT/SURF or a deep neural network algorithm, and then applying the cosine similarity formula or another method.
  • the similarity of the image can reflect the similarity between the product styles.
  • SIFT (scale-invariant feature transform) and SURF (Speeded Up Robust Features) are feature extraction techniques that can be applied to object recognition and 3D reconstruction in computer vision; the SURF operator is an improvement on the SIFT operator. Specifically, after a product picture is obtained, the picture can be transformed into a vector such as [1, 1, 3, 4], and the cosine similarity formula is then used to calculate the picture similarity between two products.
  • Step 203 Determine, according to the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity, the degree of association between any two business objects, respectively;
  • the step of determining the degree of association between any two business objects according to the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity may specifically include the following sub-step:
  • Sub-step 2031 weighting and summing the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between any two business objects.
  • after the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity of any two business objects are calculated, these similarities may be weighted and summed to obtain the degree of association between the two business objects.
  • the weights of the similarities of different information dimensions can be adjusted according to actual needs. For example, for clothing goods, the weight of picture similarity can be increased, and for digital goods, the weight of name similarity can be increased, so that the final degree of association better reflects the similarity or collocation between two different business objects.
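  • the weighted summation can be sketched as follows. The weight values and the name `association_degree` are assumptions made for illustration; the text only says that weights are adjusted to actual needs, for example a higher picture weight for clothing.

```python
def association_degree(similarities, weights):
    """Weighted sum over whichever similarity dimensions are available."""
    return sum(weights.get(dim, 0.0) * value for dim, value in similarities.items())


# Hypothetical weights for clothing goods: picture similarity counts most.
clothing_weights = {
    "name": 0.15, "price": 0.10, "consumer": 0.15,
    "brand": 0.10, "category": 0.10, "picture": 0.40,
}
pair_similarities = {"name": 0.5, "price": 0.8, "picture": 0.9}
print(association_degree(pair_similarities, clothing_weights))
```

  • dimensions missing from `similarities` simply contribute nothing, which mirrors the "and/or" wording of the step.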
  • Step 204 Combine the business objects whose degree of association is greater than the preset threshold to obtain a plurality of service object sets.
  • the hierarchical clustering method can be used to perform clustering based on the degree of association, thereby dividing all acquired business objects into different categories, each of which is a business object set.
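  • one simple way to combine business objects whose degree of association exceeds the threshold is sketched below. It merges every pair above the threshold with a union-find structure, which corresponds to cutting a single-linkage hierarchy at that threshold. The choice of single linkage and the name `group_by_association` are assumptions; the text does not specify the linkage rule.

```python
def group_by_association(objects, association, threshold):
    """Merge objects linked by an association degree above the threshold."""
    parent = {obj: obj for obj in objects}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), degree in association.items():
        if degree > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for obj in objects:
        groups.setdefault(find(obj), []).append(obj)
    return list(groups.values())


objects = ["shirt", "tie", "phone", "case"]
association = {("shirt", "tie"): 0.9, ("phone", "case"): 0.8, ("shirt", "phone"): 0.1}
print(group_by_association(objects, association, threshold=0.5))
```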
  • Step 205 Acquire attribute information of multiple associated business objects in the business object set.
  • Step 206 Determine, according to the attribute information, header information of the service object set.
  • the title information of the business object set may be a phrase or a short sentence that can reflect a common feature of all business objects in the set.
  • the attribute information of all the business objects may be obtained first, and the text information used to describe the business objects, for example, the name of the product or the introduction text of the product, may then be extracted from the attribute information and used to determine the title information of the business object set.
  • the step of determining the title information of the service object set according to the attribute information may specifically include the following sub-steps:
  • Sub-step 2061 acquiring keywords in attribute information of multiple associated business objects
  • Sub-step 2062 sorting the keywords to obtain a first preset number of target keywords
  • Sub-step 2063 determining the title information of the business object set by using the target keyword and the first preset template.
  • the attribute information of the obtained commodities, such as the name or the introduction text, may first be segmented into words to obtain the corresponding keywords; an existing statistical algorithm is then used to sort the keywords and obtain the top k keywords, and the k keywords and a preset theme template are then used to determine the title information of the business object set. For example, after a preset number of keywords is obtained, the title information may be generated using a template such as "XX of XX" or "Teach you how to XXX".
  • the selection of a predetermined number of keywords may be determined according to actual needs, which is not specifically limited in this application. For example, select two or three keywords and then use the corresponding template to get the title information of the business object.
  • the existing statistical algorithm may be a TF-IDF (term frequency-inverse document frequency) algorithm, or a TextRank algorithm, which is not specifically limited in this application.
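  • sub-steps 2061 to 2063 can be sketched with a small TF-IDF ranking. Several points are assumptions: whitespace splitting stands in for word segmentation, term frequency is counted within the business object set while inverse document frequency is counted over a wider list of product names (`corpus_names`), and the function names are hypothetical. Only the "XX of XX" template comes from the text.

```python
import math
from collections import Counter


def top_keywords(set_names, corpus_names, k):
    """Rank words of one business object set by TF-IDF and keep the top k."""
    corpus_tokens = [set(name.lower().split()) for name in corpus_names]
    n_docs = len(corpus_tokens)
    term_counts = Counter(w for name in set_names for w in name.lower().split())
    total = sum(term_counts.values())

    def tfidf(term):
        doc_freq = sum(1 for tokens in corpus_tokens if term in tokens)
        # Smoothed IDF so that unseen terms do not divide by zero.
        return (term_counts[term] / total) * math.log((1 + n_docs) / (1 + doc_freq))

    # Ties are broken alphabetically to keep the ranking deterministic.
    return sorted(term_counts, key=lambda t: (-tfidf(t), t))[:k]


def make_title(keywords, template="{0} of {1}"):
    """Fill a preset template such as "XX of XX" with the target keywords."""
    return template.format(*keywords)


corpus = ["linen shirt men", "cotton shirt men", "linen dress women",
          "phone case", "phone charger", "leather shoes men"]
keywords = top_keywords(["linen shirt men", "cotton shirt men"], corpus, k=2)
print(keywords, make_title(keywords))
```

  • a TextRank ranking could be substituted for `top_keywords` without changing the template step.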
  • Step 207 Determine, according to the header information, description information of the service object set.
  • the step of determining the description information of the service object set according to the header information may specifically include the following sub-steps:
  • Sub-step 2071 obtaining review information corresponding to the title information
  • Sub-step 2072 determining, according to the comment information, description information of the business object set.
  • the comment information related to the title information may be further searched, and the description information of the service object set is determined according to the comment information.
  • the sub-step of obtaining the comment information corresponding to the title information may further include:
  • one or more word segmentation phrases may be obtained by segmenting the title information; a semantic model is then used to expand the one or more word segmentation phrases with synonyms, and text matching is performed on the comment data, thereby recalling the comment information that matches the title information.
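  • the recall of comment information can be sketched as follows. The `synonyms` dictionary is a hypothetical stand-in for the semantic model, whitespace splitting stands in for word segmentation, and substring matching is an assumed form of text matching.

```python
def recall_comments(title, comments, synonyms):
    """Return the comments that mention a title word or one of its synonyms."""
    terms = set()
    for word in title.lower().split():
        terms.add(word)
        # Assumption: a lookup table replaces the semantic model's expansion.
        terms.update(synonyms.get(word, ()))
    return [c for c in comments if any(term in c.lower() for term in terms)]


comments = ["This blouse fits well", "Fast delivery", "Nice shirt for work"]
synonyms = {"shirt": ["blouse"]}
print(recall_comments("shirt picks", comments, synonyms))
```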
  • the sub-step of determining the description information of the service object set according to the comment information may further include:
  • the comment information may be ranked using deep learning and manual labeling to obtain a preset number of top-ranked comments, and a preset text template is then used to generate the description information of the business object set, so that different business object sets correspond to different description information.
  • for example, for business object set 1, the description information may be: for the delicate things, it is always unstoppable; for business object set 2, the description information may be: teasing while chatting, which is a pleasant life; for business object set 3, the description information may be: the man wearing the shirt is absolutely the most handsome.
  • Step 208 Generate a cluster data table according to the plurality of service object sets and corresponding header information and description information.
  • the business object set and its title information and description information may be combined into one cluster data table.
  • the cluster data table is a list of commodities including a collection of different commodities and their titles and descriptions.
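  • the structure of such a cluster data table can be sketched with two small data classes. The class and field names are hypothetical; the text only requires that each business object set be stored together with its title information and description information.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessObjectSet:
    title: str
    description: str
    objects: List[str] = field(default_factory=list)


@dataclass
class ClusterDataTable:
    sets: List[BusinessObjectSet] = field(default_factory=list)


def build_cluster_data_table(object_sets, titles, descriptions):
    """Combine each set of objects with its title and description."""
    return ClusterDataTable([
        BusinessObjectSet(title, description, list(objs))
        for objs, title, description in zip(object_sets, titles, descriptions)
    ])


table = build_cluster_data_table(
    [["linen shirt", "cotton shirt"]],
    ["shirt of cotton"],
    ["the man wearing the shirt is absolutely the most handsome"],
)
print(table.sets[0].title, len(table.sets[0].objects))
```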
  • the product list, together with its title and description, can be obtained automatically by making effective use of the product data and the comment data, thereby greatly improving the efficiency of generating the product list.
  • referring to FIG. 4, a flow chart of the steps of an embodiment of a method for displaying a cluster data table of the present application is shown, which may specifically include the following steps:
  • Step 401 Receive a presentation request of a cluster data table.
  • Step 402 Present a cluster data table according to the request; the cluster data table includes a plurality of business object sets, the business object set has a plurality of associated business objects, and corresponding topic information.
  • after the presentation request for the cluster data table is received, the cluster data table may be generated according to the request and then presented to the user.
  • the present application does not limit the specific representation of the cluster data table.
  • the cluster data table may include multiple business object sets, and the business object set may include multiple associated business objects, and corresponding topic information.
  • referring to FIG. 5, which is an exemplary diagram of the cluster data table of the present application, the plurality of different product lists shown in FIG. 5 are different business object sets. Each product list may include different items and the subject information generated from those items, and the subject information includes a title of the product list and description information for the items.
  • the multiple service object sets may be generated by the following steps:
  • the attribute information of the plurality of business objects may include a name, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects;
  • the step of determining the degree of association between the plurality of service objects may include the following sub-steps:
  • Sub-step S121 respectively determining name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
  • Sub-step S122 determining the degree of association between any two business objects according to the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity.
  • the substeps can include:
  • the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity are weighted and summed to obtain the degree of association between any two business objects.
  • the multiple business objects may be classified by using a hierarchical clustering method to obtain a plurality of business object sets.
  • the topic information may be generated by the following steps:
  • the title information of the business object set may be a phrase or a short sentence that can reflect a common feature of all business objects in the set.
  • the step of determining the title information of the service object set according to the attribute information may include the following sub-steps:
  • Sub-step S222 sorting the keywords to obtain a first preset number of target keywords
  • the attribute information of the obtained commodities, such as the name or the introduction text, may first be segmented into words to obtain the corresponding keywords; an existing statistical algorithm is then used to sort the keywords and obtain the top k keywords, and the k keywords and a preset theme template are then used to determine the title information of the business object set. For example, after a preset number of keywords is obtained, the title information may be generated using a template such as "XX of XX" or "Teach you how to XXX".
  • the selection of a predetermined number of keywords may be determined according to actual needs, which is not specifically limited in this application. For example, select two or three keywords and then use the corresponding template to get the title information of the business object.
  • the existing statistical algorithm may be a TF-IDF (term frequency-inverse document frequency) algorithm, or a TextRank algorithm, which is not specifically limited in this application.
  • the description information may be text information used to uniformly describe a business object in the set, and may also be text information that further elaborates the title information.
  • the step of determining the description information of the service object set according to the header information may include the following sub-steps:
  • one or more word segmentation phrases may be obtained by segmenting the title information; a semantic model is then used to expand the one or more word segmentation phrases with synonyms, and text matching is performed on the comment data, thereby recalling the comment information that matches the title information.
  • the comment information may be ranked using deep learning and manual labeling to obtain a preset number of top-ranked comments, and a preset text template is then used to generate the description information of the business object set.
  • the step of presenting the cluster data table according to the request may specifically include the following sub-steps:
  • Sub-step 4021 acquiring a plurality of target service object sets that match user requirement information
  • Sub-step 4022 presenting the plurality of target business object sets.
  • the request for presenting the cluster data table may further include user requirement information, so after generating the cluster data table, a plurality of target business object sets matching the user requirement information may be acquired, and then the A plurality of target business object collections are presented to the user.
  • the user requirement information may be obtained according to the user's previous browsing or search records for business objects. For example, when the user browses or searches for a jacket, a clothing product list including jackets, pants, shoes, and the like may be generated for the user.
  • the user requirement information may also be obtained by other methods, which is not limited in this application.
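  • sub-steps 4021 and 4022 can be sketched as a simple keyword match. The dictionary layout of a business object set and the name `match_target_sets` are hypothetical, and matching on shared words is only one assumed way of comparing user requirement information with the sets.

```python
def match_target_sets(object_sets, user_keywords):
    """Keep the sets whose title or object names share a word with the user's keywords."""
    wanted = {word.lower() for word in user_keywords}
    matches = []
    for object_set in object_sets:
        words = set(object_set["title"].lower().split())
        for name in object_set["objects"]:
            words.update(name.lower().split())
        if wanted & words:
            matches.append(object_set)
    return matches


object_sets = [
    {"title": "Outfits for autumn", "objects": ["denim jacket", "slim pants", "canvas shoes"]},
    {"title": "Desk gadgets", "objects": ["usb hub"]},
]
# Keywords here would come from the user's browsing or search records.
for target in match_target_sets(object_sets, ["jacket"]):
    print(target["title"])
```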
  • the cluster data table including the plurality of service object sets may be presented according to the request, so that user requirements can be quickly identified and business objects that meet them displayed, which reduces the time the user spends searching for or finding business objects, saves the system resources consumed by such searching, and improves access efficiency.
  • referring to FIG. 6, a structural block diagram of an embodiment of a device for generating a clustering data table of the present application is shown, which may specifically include the following modules:
  • the obtaining module 601 is configured to acquire a plurality of service objects, where the plurality of service objects respectively have corresponding attribute information;
  • the association degree determining module 602 is configured to determine, according to the attribute information of the multiple service objects, the degree of association between the multiple service objects;
  • the categorization module 603 is configured to classify the plurality of service objects according to the degree of association between the plurality of service objects, to obtain a plurality of service object sets, where the plurality of service object sets respectively have multiple associated service objects;
  • the topic information determining module 604 is configured to separately determine topic information corresponding to the plurality of service object sets according to the plurality of associated service objects;
  • the generating module 605 is configured to generate a cluster data table according to the plurality of business object sets and the corresponding topic information.
  • the attribute information of the plurality of business objects may include a name, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects;
  • the determining module 602 may specifically include the following submodules:
  • the similarity determination submodule is configured to respectively determine name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
  • the association determination submodule is configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity, respectively.
  • the association determining sub-module may specifically include the following units:
  • the association degree determining unit is configured to weight and sum the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity to obtain the degree of association between any two business objects.
  • the classification module 603 may specifically include the following sub-modules:
  • the combination sub-module is configured to combine the business objects whose degree of association is greater than the preset threshold to obtain a plurality of business object sets.
  • the topic information determining module 604 may specifically include the following submodules:
  • an attribute information obtaining submodule configured to acquire attribute information of multiple associated business objects in the business object set
  • header information determining submodule configured to determine, according to the attribute information, header information of the service object set
  • a description information determining submodule configured to determine, according to the header information, description information of the service object set.
  • the header information determining submodule may specifically include the following units:
  • a keyword obtaining unit configured to acquire keywords in attribute information of a plurality of associated business objects
  • a keyword sorting unit configured to sort the keywords to obtain a first preset number of target keywords
  • the title information determining unit is configured to determine the title information of the business object set by using the target keyword and the first preset template.
  • the description information determining submodule may specifically include the following units:
  • a comment information obtaining unit configured to obtain comment information corresponding to the title information
  • the description information determining unit is configured to determine description information of the business object set according to the comment information.
  • the comment information acquiring unit may specifically include the following subunits:
  • a word segmentation unit for segmenting the title information to obtain one or more word segmentation phrases
  • the comment information acquisition subunit is configured to respectively obtain the comment information that matches the one or more word segmentation phrases.
  • the description information determining unit may specifically include the following subunits:
  • a comment information sorting subunit configured to sort the comment information to obtain a second preset number of target comment information
  • a description information determining subunit configured to determine description information of the business object set by using the target comment information and the second preset template.
  • referring to FIG. 7, a structural block diagram of an embodiment of a display device of a cluster data table of the present application is shown, which may specifically include the following modules:
  • the receiving module 701 is configured to receive a presentation request of the cluster data table.
  • the presentation module 702 is configured to present a cluster data table according to the request; the cluster data table may include a plurality of business object sets, the business object set having a plurality of associated business objects, and corresponding topic information.
  • the multiple service object sets may be generated by calling the following modules:
  • the business object obtaining module 703 is configured to acquire a plurality of business objects, where the plurality of business objects respectively have corresponding attribute information;
  • the association degree determining module 704 is configured to determine, according to attribute information of the multiple service objects, an association degree between the multiple service objects;
  • the classification module 705 is configured to classify the plurality of business objects according to the degree of association between the plurality of business objects, to obtain a plurality of business object sets.
  • the attribute information of the plurality of business objects may include a name, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects;
  • the determining module 704 may specifically include the following submodules:
  • the similarity determination submodule is configured to respectively determine name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
  • the association determination submodule is configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity, respectively.
  • the association determining sub-module may specifically include the following units:
  • the association degree determining unit is configured to weight and sum the name similarity, the price similarity, the consumer similarity, the brand similarity, the category similarity, and/or the picture similarity to obtain the degree of association between any two business objects.
  • the classification module 705 may specifically include the following sub-modules:
  • a combination sub-module configured to combine the business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets.
  • the topic information may include title information and description information of the service object set, and the topic information may be generated by calling the following module:
  • the attribute information obtaining module 706 is configured to acquire attribute information of multiple associated business objects in the business object set;
  • a header information determining module 707 configured to determine, according to the attribute information, header information of the service object set
  • the description information determining module 708 is configured to determine description information of the business object set according to the title information.
  • the title information determining module 707 may specifically include the following sub-modules:
  • a keyword acquisition sub-module configured to acquire keywords in attribute information of multiple associated business objects
  • a keyword sorting sub-module configured to sort the keywords to obtain a first preset number of target keywords
  • a header information determining submodule configured to determine, by using the target keyword and the first preset template, header information of the service object set.
  • the description information determining module 708 may specifically include the following sub-modules:
  • a comment information obtaining submodule configured to obtain comment information corresponding to the title information
  • a description information determining submodule configured to determine, according to the comment information, description information of the business object set.
  • the comment information obtaining submodule may specifically include the following units:
  • a word segmentation unit configured to perform segmentation on the title information to obtain one or more word segmentation phrases
  • a comment information obtaining unit configured to respectively obtain the comment information that matches the one or more word segmentation phrases.
  • the description information determining submodule may specifically include the following units:
  • a comment information sorting unit configured to sort the comment information to obtain a second preset number of target comment information
  • the description information determining unit is configured to determine description information of the business object set by using the target comment information and the second preset template.
  • the request may further include user requirement information
  • the presentation module 702 may specifically include the following sub-modules:
  • a target business object set obtaining submodule configured to acquire a plurality of target business object sets that match user demand information
  • the target business object presentation submodule is configured to display the plurality of target business object sets.
  • for the device embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for the relevant parts reference may be made to the description of the method embodiments.
  • the embodiment of the present application further discloses a presentation system of a cluster data table, and the system may include:
  • one or more processors;
  • One or more modules the one or more modules being stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules have the following functions:
  • the cluster data table including a plurality of business object sets
  • the business object set has a plurality of associated business objects, and corresponding topic information.
  • embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application.
  • it should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the method for generating a cluster data table, the device for generating a cluster data table, the method for displaying a cluster data table, the display device for a cluster data table, and the presentation system of a cluster data table provided by the present application have been described in detail above.
  • specific examples are used herein to explain the principles and implementation manners of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for those of ordinary skill in the art, the specific implementation and the scope of application may vary in accordance with the ideas of the present application.

Abstract

A method, device and system for presenting a clustering data table. The method comprises: receiving a presentation request for a clustering data table (401); and presenting the clustering data table according to the presentation request, the clustering data table comprising multiple service object sets, the service object sets having multiple associated service objects and corresponding topic information (402). The method enables user needs to be identified rapidly, and can present service objects satisfying the user needs, thus reducing the time for a user to search for or query a service object, reducing the waste of system resources caused by searching for or querying a service object, and increasing the accessing efficiency.

Description

Method, device and system for presenting a cluster data table
Technical field
The present application relates to the field of information technology, and in particular to a method for generating a cluster data table, a device for generating a cluster data table, a method for presenting a cluster data table, a device for presenting a cluster data table, and a system for presenting a cluster data table.
Background
Advances in technology have driven the development of e-commerce. Today, e-commerce websites such as Taobao and Tmall can bring products from all over the world together online for consumers to purchase. However, faced with such a wide range of goods, consumers may not know which products are worth buying. Some e-commerce websites have therefore begun to actively recommend products to consumers in order to reduce the time consumers spend searching for and selecting goods, and presenting the recommended products to consumer groups in the form of a product list is one important way of doing so. A product list usually consists of three parts:
(1) Product list: a series of products of the same kind. For example, a clothing list may be a matching set of clothes, trousers and shoes in the same style, and a home furnishing list may be a combination of curtains, wallpaper and carpets in the same color tone.
(2) List title: a short text that describes the distinguishing features of the product list. For example, the title of a clothing list may be "Fresh Spring" or "Pink Outfit Enthusiast".
(3) List description: a short passage of easy-to-understand text that elaborates on the list title. For example, for a product list titled "The Rice Bowl in Your Own Hands", the description may be "Porcelain bowls are the healthiest; different sizes and different patterns can all brighten up the dinner table", which helps consumers understand the products recommended in the list.
At present, the product lists of e-commerce websites are produced mainly through manual work by website operators: consumers' consumption data is obtained and combined with public opinion statistics from external websites, the products to be recommended are determined through manual analysis, the recommended products are then combined into a list, and a title and a description are distilled for the list. However, this approach incurs substantial labor costs, and the resulting product lists carry a strong imprint of the operators' subjective preferences, so they may fail to meet the needs and preferences of most consumers.
Summary of the invention
In view of the above problems, embodiments of the present application are proposed to provide a method for generating a cluster data table, a device for generating a cluster data table, a method for presenting a cluster data table, a device for presenting a cluster data table, and a system for presenting a cluster data table that overcome the above problems or at least partially solve them.
To solve the above problems, the present application discloses a system for presenting a cluster data table, comprising:
One or more processors;
A memory; and
One or more modules stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules have the following functions:
Receiving a presentation request for a cluster data table;
Presenting the cluster data table according to the request, the cluster data table comprising a plurality of business object sets, each business object set having a plurality of associated business objects and corresponding topic information.
To solve the above problems, the present application further discloses a method for presenting a cluster data table, comprising:
Receiving a presentation request for a cluster data table;
Presenting the cluster data table according to the request, the cluster data table comprising a plurality of business object sets, each business object set having a plurality of associated business objects and corresponding topic information.
Optionally, the plurality of business object sets are generated by the following steps:
Obtaining a plurality of business objects, each having corresponding attribute information;
Determining the degree of association between the plurality of business objects according to their attribute information;
Classifying the plurality of business objects according to the degree of association between them, to obtain a plurality of business object sets.
Optionally, the attribute information of the plurality of business objects includes the names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; the step of determining the degree of association between the plurality of business objects according to their attribute information includes:
Determining, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
Determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
Optionally, the step of determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity includes:
Computing a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity, to obtain the degree of association between any two business objects.
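The weighted summation described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the function name `association_degree`, the weight values, and the premise that every similarity is already normalized to the range 0 to 1 are all hypothetical. The text itself only states that the similarities are weighted and summed.

```python
from typing import Mapping

# Hypothetical weights; the source does not specify any values.
DEFAULT_WEIGHTS = {
    "name": 0.3,
    "price": 0.1,
    "consumer": 0.2,
    "brand": 0.1,
    "category": 0.2,
    "picture": 0.1,
}


def association_degree(similarities: Mapping[str, float],
                       weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of whichever per-attribute similarities are available."""
    # Attributes absent from `similarities` are simply skipped, which mirrors
    # the "and/or" wording of the text.
    return sum(weights[key] * value for key, value in similarities.items())
```

Because only the supplied similarities contribute, two objects compared on name and price alone receive a lower maximum score than two objects compared on all six attributes. Whether the weights should be renormalized in that case is a design choice the text leaves open.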
Optionally, the step of classifying the plurality of business objects according to the degree of association between them to obtain a plurality of business object sets includes:
Combining business objects whose degree of association is greater than a preset threshold, to obtain a plurality of business object sets.
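One way to read the threshold-based combination above is as a connected-components grouping: any two objects whose degree of association exceeds the threshold end up in the same set. The sketch below takes that reading as an assumption. The function name `group_by_threshold`, the pair-keyed dictionary of degrees, and the decision to drop objects that are linked to nothing are hypothetical choices, not details from the application.

```python
from typing import Dict, Hashable, List, Tuple


def group_by_threshold(objects: List[Hashable],
                       degrees: Dict[Tuple[Hashable, Hashable], float],
                       threshold: float) -> List[List[Hashable]]:
    """Group objects linked by any pair whose degree exceeds the threshold."""
    parent = {obj: obj for obj in objects}

    def find(obj):
        # Follow parent links to the representative, compressing the path.
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]
            obj = parent[obj]
        return obj

    for (a, b), degree in degrees.items():
        if degree > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[Hashable, List[Hashable]] = {}
    for obj in objects:
        groups.setdefault(find(obj), []).append(obj)
    # Assumption: a set needs several associated objects, so singletons are dropped.
    return [members for members in groups.values() if len(members) > 1]
```

Under this reading the grouping is transitive: if A is linked to B and B to C, all three share a set even when A and C are not directly linked. A stricter variant would require every pair inside a set to exceed the threshold.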
Optionally, the topic information includes title information and description information of the business object set, and the topic information is generated by the following steps:
Obtaining attribute information of the plurality of associated business objects in the business object set;
Determining the title information of the business object set according to the attribute information;
Determining the description information of the business object set according to the title information.
Optionally, the step of determining the title information of the business object set according to the attribute information includes:
Obtaining keywords from the attribute information of the plurality of associated business objects;
Sorting the keywords to obtain a first preset number of target keywords;
Determining the title information of the business object set using the target keywords and a first preset template.
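The title steps above (extract keywords, sort them, keep a preset number, fill a template) can be sketched as follows. The sorting criterion is not given at this point in the text, so ranking by frequency is an assumption, as are the function name `build_title`, the whitespace-based keyword extraction, and the template string. Chinese text would need a real word segmenter instead of `str.split`.

```python
from collections import Counter
from typing import Iterable, List


def build_title(attribute_texts: Iterable[str],
                top_n: int = 2,
                template: str = "{} picks") -> str:
    """Rank keywords by frequency and fill a title template with the top ones."""
    # Hypothetical keyword extraction: lowercase whitespace-separated tokens.
    counts = Counter(word.lower()
                     for text in attribute_texts
                     for word in text.split())
    # most_common sorts by descending count; ties keep first-seen order.
    keywords: List[str] = [word for word, _ in counts.most_common(top_n)]
    return template.format(" ".join(keywords))
```

For example, `build_title(["pink cotton dress", "pink linen skirt", "cotton pink shirt"])` ranks "pink" first and "cotton" second.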
Optionally, the step of determining the description information of the business object set according to the title information includes:
Obtaining comment information corresponding to the title information;
Determining the description information of the business object set according to the comment information.
Optionally, the step of obtaining comment information corresponding to the title information includes:
Performing word segmentation on the title information to obtain one or more segmented phrases;
Obtaining comment information that matches each of the one or more segmented phrases.
Optionally, the step of determining the description information of the business object set according to the comment information includes:
Sorting the comment information to obtain a second preset number of pieces of target comment information;
Determining the description information of the business object set using the target comment information and a second preset template.
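The description steps above (segment the title, fetch matching comments, sort them, keep a preset number, fill a template) can be sketched in the same spirit. Everything concrete here is an assumption made for illustration: the function name `build_description`, whitespace splitting as a stand-in for word segmentation, substring matching, ranking by the number of matched phrases, and the template text.

```python
from typing import List


def build_description(title: str,
                      comments: List[str],
                      top_n: int = 2,
                      template: str = "Shoppers say: {}") -> str:
    """Pick the comments that best match the title and fill a template."""
    phrases = title.lower().split()  # stand-in for real word segmentation

    def score(comment: str) -> int:
        # Hypothetical ranking: number of title phrases found in the comment.
        text = comment.lower()
        return sum(phrase in text for phrase in phrases)

    matched = [c for c in comments if score(c) > 0]
    # sorted() is stable, so equally scored comments keep their input order.
    ranked = sorted(matched, key=score, reverse=True)
    return template.format(" ".join(ranked[:top_n]))
```

Comments that match no phrase of the title are discarded before ranking, so an unrelated review never reaches the description even when fewer than `top_n` comments match.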
Optionally, the request further includes user demand information, and the step of presenting the cluster data table according to the request includes:
Obtaining a plurality of target business object sets that match the user demand information;
Presenting the plurality of target business object sets.
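The matching step above can be illustrated with a small filter over the table. The text does not say how user demand information is represented or matched, so this sketch assumes it is a list of keywords compared against each set's title and description. The names `BusinessObjectSet` and `select_sets` and the fields of the data class are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BusinessObjectSet:
    # Hypothetical shape of one entry of the cluster data table.
    title: str
    description: str
    objects: List[str]


def select_sets(demand_keywords: List[str],
                table: List[BusinessObjectSet]) -> List[BusinessObjectSet]:
    """Return the sets whose title or description mentions any demand keyword."""
    wanted = [kw.lower() for kw in demand_keywords]
    return [s for s in table
            if any(kw in (s.title + " " + s.description).lower() for kw in wanted)]
```

The selected sets keep the order they have in the table; ranking them by how well they match the demand would be a natural extension, but the text does not describe one here.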
To solve the above problems, the present application further discloses a method for generating a cluster data table, comprising:
Obtaining a plurality of business objects, each having corresponding attribute information;
Determining the degree of association between the plurality of business objects according to their attribute information;
Classifying the plurality of business objects according to the degree of association between them, to obtain a plurality of business object sets, each of which has a plurality of associated business objects;
Determining, according to the plurality of associated business objects, the topic information corresponding to each of the plurality of business object sets;
Generating a cluster data table from the plurality of business object sets and the corresponding topic information.
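The five steps of the generation method can be strung together in a compact end-to-end sketch. It makes strong simplifying assumptions that are not in the text: each business object is reduced to its name, the degree of association is a Jaccard overlap of name tokens, the title is built from the tokens shared by all members, and the description is omitted for brevity. The names `jaccard` and `generate_table` and the threshold value are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two names (a stand-in measure)."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def generate_table(names: List[str], threshold: float = 0.3) -> List[Dict]:
    """Run the generation steps on a list of object names."""
    # Steps 1-2: the name is the only attribute; score every pair of objects.
    linked = [(a, b) for a, b in combinations(names, 2)
              if jaccard(a, b) > threshold]
    # Step 3: merge linked objects into sets.
    sets: List[set] = []
    for a, b in linked:
        hit = [s for s in sets if a in s or b in s]
        merged = {a, b}.union(*hit)
        sets = [s for s in sets if s not in hit] + [merged]
    # Steps 4-5: derive a title per set and assemble the table rows.
    table = []
    for members in sets:
        common = set.intersection(*(set(m.lower().split()) for m in members))
        table.append({"title": " ".join(sorted(common)),
                      "objects": sorted(members)})
    return table
```

With transitive merging, the members of a set may share no common token, in which case this sketch produces an empty title. A keyword-frequency title would avoid that.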
Optionally, the attribute information of the plurality of business objects includes the names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; the step of determining the degree of association between the plurality of business objects according to their attribute information includes:
Determining, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
Determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
Optionally, the step of determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity includes:
Computing a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity, to obtain the degree of association between any two business objects.
Optionally, the step of classifying the plurality of business objects according to the degree of association between them to obtain a plurality of business object sets includes:
Combining business objects whose degree of association is greater than a preset threshold, to obtain a plurality of business object sets.
Optionally, the step of determining, according to the plurality of associated business objects, the topic information corresponding to each of the plurality of business object sets includes:
Obtaining attribute information of the plurality of associated business objects in the business object set;
Determining the title information of the business object set according to the attribute information;
Determining the description information of the business object set according to the title information.
Optionally, the step of determining the title information of the business object set according to the attribute information includes:
Obtaining keywords from the attribute information of the plurality of associated business objects;
Sorting the keywords to obtain a first preset number of target keywords;
Determining the title information of the business object set using the target keywords and a first preset template.
Optionally, the step of determining the description information of the business object set according to the title information includes:
Obtaining comment information corresponding to the title information;
Determining the description information of the business object set according to the comment information.
Optionally, the step of obtaining comment information corresponding to the title information includes:
Performing word segmentation on the title information to obtain one or more segmented phrases;
Obtaining comment information that matches each of the one or more segmented phrases.
Optionally, the step of determining the description information of the business object set according to the comment information includes:
Sorting the comment information to obtain a second preset number of pieces of target comment information;
Determining the description information of the business object set using the target comment information and a second preset template.
To solve the above problems, the present application further discloses a device for presenting a cluster data table, comprising:
A receiving module, configured to receive a presentation request for a cluster data table;
A presentation module, configured to present the cluster data table according to the request, the cluster data table comprising a plurality of business object sets, each business object set having a plurality of associated business objects and corresponding topic information.
Optionally, the plurality of business object sets are generated by calling the following modules:
A business object obtaining module, configured to obtain a plurality of business objects, each having corresponding attribute information;
An association degree determining module, configured to determine the degree of association between the plurality of business objects according to their attribute information;
A classification module, configured to classify the plurality of business objects according to the degree of association between them, to obtain a plurality of business object sets.
Optionally, the attribute information of the plurality of business objects includes the names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; the association degree determining module includes:
A similarity determining submodule, configured to determine, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
An association degree determining submodule, configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
Optionally, the association degree determining submodule includes:
An association degree determining unit, configured to compute a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity, to obtain the degree of association between any two business objects.
Optionally, the classification module includes:
A combining submodule, configured to combine business objects whose degree of association is greater than a preset threshold, to obtain a plurality of business object sets.
Optionally, the topic information includes title information and description information of the business object set, and the topic information is generated by calling the following modules:
An attribute information obtaining module, configured to obtain attribute information of the plurality of associated business objects in the business object set;
A title information determining module, configured to determine the title information of the business object set according to the attribute information;
A description information determining module, configured to determine the description information of the business object set according to the title information.
Optionally, the title information determining module includes:
A keyword obtaining submodule, configured to obtain keywords from the attribute information of the plurality of associated business objects;
A keyword sorting submodule, configured to sort the keywords to obtain a first preset number of target keywords;
A title information determining submodule, configured to determine the title information of the business object set using the target keywords and a first preset template.
Optionally, the description information determining module includes:
A comment information obtaining submodule, configured to obtain comment information corresponding to the title information;
A description information determining submodule, configured to determine the description information of the business object set according to the comment information.
Optionally, the comment information obtaining submodule includes:
A word segmentation unit, configured to perform word segmentation on the title information to obtain one or more segmented phrases;
A comment information obtaining unit, configured to obtain comment information that matches each of the one or more segmented phrases.
Optionally, the description information determining submodule includes:
A comment information sorting unit, configured to sort the comment information to obtain a second preset number of pieces of target comment information;
A description information determining unit, configured to determine the description information of the business object set using the target comment information and a second preset template.
Optionally, the request further includes user demand information, and the presentation module includes:
A target business object set obtaining submodule, configured to obtain a plurality of target business object sets that match the user demand information;
A target business object presentation submodule, configured to present the plurality of target business object sets.
To solve the above problems, the present application further discloses a device for generating a cluster data table, comprising:
An obtaining module, configured to obtain a plurality of business objects, each having corresponding attribute information;
An association degree determining module, configured to determine the degree of association between the plurality of business objects according to their attribute information;
A classification module, configured to classify the plurality of business objects according to the degree of association between them, to obtain a plurality of business object sets, each of which has a plurality of associated business objects;
A topic information determining module, configured to determine, according to the plurality of associated business objects, the topic information corresponding to each of the plurality of business object sets;
A generating module, configured to generate a cluster data table from the plurality of business object sets and the corresponding topic information.
Optionally, the attribute information of the plurality of business objects includes the names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; the association degree determining module includes:
A similarity determining submodule, configured to determine, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
An association degree determining submodule, configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
Optionally, the association degree determining submodule includes:
An association degree determining unit, configured to compute a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity, to obtain the degree of association between any two business objects.
Optionally, the classification module includes:
A combining submodule, configured to combine business objects whose degree of association is greater than a preset threshold, to obtain a plurality of business object sets.
Optionally, the topic information determining module includes:
An attribute information obtaining submodule, configured to obtain attribute information of the plurality of associated business objects in the business object set;
A title information determining submodule, configured to determine the title information of the business object set according to the attribute information;
A description information determining submodule, configured to determine the description information of the business object set according to the title information.
Optionally, the title information determining submodule includes:
A keyword obtaining unit, configured to obtain keywords from the attribute information of the plurality of associated business objects;
A keyword sorting unit, configured to sort the keywords to obtain a first preset number of target keywords;
A title information determining unit, configured to determine the title information of the business object set using the target keywords and a first preset template.
Optionally, the description information determining submodule includes:
A comment information obtaining unit, configured to obtain comment information corresponding to the title information;
A description information determining unit, configured to determine the description information of the business object set according to the comment information.
Optionally, the comment information obtaining unit includes:
A word segmentation subunit, configured to perform word segmentation on the title information to obtain one or more segmented phrases;
A comment information obtaining subunit, configured to obtain comment information that matches each of the one or more segmented phrases.
Optionally, the description information determining unit includes:
A comment information sorting subunit, configured to sort the comment information to obtain a second preset number of pieces of target comment information;
A description information determining subunit, configured to determine the description information of the business object set using the target comment information and a second preset template.
Compared with the background art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, after a presentation request for a cluster data table is received, a cluster data table comprising a plurality of business object sets can be presented according to the request. User needs can thus be identified quickly and business objects that meet those needs can be presented, which reduces the time users spend searching for or looking up business objects, saves the system resources consumed by such searching, and improves access efficiency.
Brief description of the drawings
FIG. 1 is a flowchart of the steps of Embodiment 1 of a method for generating a cluster data table according to the present application;
FIG. 2 is a flowchart of the steps of Embodiment 2 of a method for generating a cluster data table according to the present application;
FIG. 3 is a schematic block diagram of Embodiment 2 of a method for generating a cluster data table according to the present application;
FIG. 4 is a flowchart of the steps of an embodiment of a method for presenting a cluster data table according to the present application;
FIG. 5 is an example diagram of a cluster data table of the present application;
FIG. 6 is a structural block diagram of an embodiment of a device for generating a cluster data table according to the present application;
FIG. 7 is a structural block diagram of an embodiment of a device for presenting a cluster data table according to the present application.
Detailed Description
To make the above objects, features, and advantages of the present application more apparent and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, a flowchart of the steps of Embodiment 1 of a method for generating a cluster data table according to the present application is shown. The method may specifically include the following steps:
Step 101: acquire a plurality of business objects, each of which has corresponding attribute information;
In the embodiments of the present application, a business object may be a commodity or another type of object, such as a news item. The present application does not limit the type of business object.
Note that the attribute information may differ between types of business object. For example, when the business object is a commodity, the attribute information may include the commodity's name, price, consumers, brand, specific category, and/or pictures. When the business object is a news item, the attribute information may include the source, time, and location of the news item. Those skilled in the art may select suitable attribute information according to the specific kind of business object, which is not specifically limited in the present application.
In a specific implementation, the attribute information of different business objects may also be acquired in different ways. For example, the attribute information of commodities may be obtained from commodity data already stored on platforms such as e-commerce websites, while the attribute information of news items may be obtained from information platforms such as news websites.
Step 102: determine the degree of association between the plurality of business objects according to the attribute information of the plurality of business objects;
In the embodiments of the present application, after the attribute information of the plurality of business objects is obtained, the degree of association between any two business objects may be calculated. The degree of association may be a numerical description of how closely two business objects are related, obtained by analyzing attribute information of several different dimensions. It may reflect either the similarity of the two business objects or a matching relationship between them. For example, different types of shoes that are similar to each other, such as leather shoes and sandals, may have a high degree of association, and business objects that go together, such as tops and trousers, may also have a high degree of association.
In the embodiments of the present application, after the name, price information, consumer information, brand information, category information, and/or picture information of a commodity are obtained, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects may first be calculated separately, and the degree of association between the two business objects may then be determined from these similarities. Different calculation methods may be used for the similarities of different attribute information, for example the cosine similarity formula or the Jaccard similarity. The present application does not limit the specific way in which a similarity is calculated.
In a specific implementation, once the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity of any two business objects have been calculated, a weighted sum of these similarities may be taken to obtain the degree of association between the two business objects. The weights of the similarities in the different information dimensions may be adjusted as needed. For example, the weight of picture similarity may be increased for clothing, while the weight of name similarity may be increased for digital products, so that the resulting degree of association better reflects the similarity or matchability of two different business objects.
Step 103: classify the plurality of business objects according to the degrees of association between them to obtain a plurality of business object sets, each of which has a plurality of associated business objects;
In the embodiments of the present application, after the degree of association between any two business objects is calculated, business objects whose degree of association is greater than a preset threshold may be grouped together to obtain a plurality of business object sets. In a specific implementation, hierarchical clustering may be used to classify the plurality of business objects into a plurality of business object sets. Hierarchical clustering decomposes a data set hierarchically according to some method until a certain condition is satisfied. Depending on the principle used, it is divided into agglomerative and divisive methods. Taking agglomeration as an example, agglomerative hierarchical clustering is a bottom-up strategy: each object is first treated as its own cluster, and these atomic clusters are then merged into larger and larger clusters until all objects are in one cluster or a termination condition is satisfied. Hierarchical clustering is a widely used classification algorithm and is not described further in the present application.
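The agglomerative strategy described above can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the average-linkage rule, the function names, and the sample degrees of association are all assumptions made for illustration. Merging stops when no pair of clusters has an association above the preset threshold.

```python
from itertools import combinations


def average_linkage(cluster_a, cluster_b, assoc):
    """Mean pairwise degree of association between two clusters (assumed linkage rule)."""
    total = sum(assoc[frozenset((a, b))] for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(objects, assoc, threshold):
    """Bottom-up clustering: start from singletons and repeatedly merge the
    most strongly associated pair of clusters while it exceeds the threshold."""
    clusters = [[obj] for obj in objects]
    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            score = average_linkage(clusters[i], clusters[j], assoc)
            if score > best_score:
                best_pair, best_score = (i, j), score
        if best_pair is None:  # termination condition: nothing above the threshold
            break
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pairwise degrees of association between four commodities.
assoc = {
    frozenset(("leather shoes", "sandals")): 0.9,
    frozenset(("shirt", "trousers")): 0.8,
    frozenset(("leather shoes", "shirt")): 0.2,
    frozenset(("leather shoes", "trousers")): 0.3,
    frozenset(("sandals", "shirt")): 0.1,
    frozenset(("sandals", "trousers")): 0.2,
}
objects = ["leather shoes", "sandals", "shirt", "trousers"]
clusters = agglomerate(objects, assoc, threshold=0.5)
print(clusters)  # [['leather shoes', 'sandals'], ['shirt', 'trousers']]
```

With these sample values the shoes form one business object set and the clothing forms another, because the association across the two groups stays below the threshold of 0.5.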
Step 104: determine, according to the plurality of associated business objects, the topic information corresponding to each of the plurality of business object sets;
In the embodiments of the present application, the topic information may include title information and description information of the business object set. The title information of a business object set may be a phrase or short sentence that reflects a feature common to all business objects in the set. The description information may be text that describes the business objects in the set as a whole, or text that further elaborates on the title information.
In a preferred embodiment of the present application, the step of determining, according to the plurality of associated business objects, the topic information corresponding to each of the plurality of business object sets may specifically include the following sub-steps:
Sub-step 1041: acquire the attribute information of the plurality of associated business objects in the business object set;
Sub-step 1042: determine the title information of the business object set according to the attribute information;
Sub-step 1043: determine the description information of the business object set according to the title information.
In a specific implementation, the attribute information of all the business objects may first be obtained, and text information describing each business object, such as the commodity name or the commodity introduction, may be extracted from it. Keywords are then extracted from the text information and ranked to obtain the top k keywords, and the title information of the business object set is determined from these k keywords and a preset topic template. The keywords may be ranked by number of occurrences or in other ways, which is not specifically limited in the present application.
After the title information of the business object set is determined, comment information matching the title may be looked up according to the title information, and the comments most relevant to the title information may then be selected from the results to obtain the description information of the business object set.
In a specific implementation, the title information may be segmented into words, a semantic model may be used to expand the resulting segmented phrases with near-synonyms, and text matching may be performed against the comment data, thereby recalling comment information that matches the title information. After the comment information is obtained, it may be scored and ranked according to certain rules, and the description information may then be generated from the top-ranked comments using a preset text template.
Step 105: generate a cluster data table according to the plurality of business object sets and the corresponding topic information.
In the embodiments of the present application, after the business object sets and their corresponding topic information are obtained, the business object sets and their topic information may be merged into one cluster data table for presentation to the user.
In the embodiments of the present application, the attribute information of a plurality of business objects is acquired to determine the degrees of association between them, the business objects are classified according to the degrees of association to obtain a plurality of business object sets, and the topic information of each business object set is then extracted to generate a cluster data table. This solves the problem in the prior art that a cluster data table can only be generated manually, improves the efficiency of generating cluster data tables, and makes the generated cluster data table more objective and better matched to the needs and preferences of most users.
Referring to FIG. 2, a flowchart of the steps of Embodiment 2 of a method for generating a cluster data table according to the present application is shown. The method may specifically include the following steps:
Step 201: acquire a plurality of business objects, each of which has corresponding attribute information;
In the embodiments of the present application, the business object may be a commodity, and the attribute information of the business object may include the commodity's name, price, consumers, brand, specific category, and/or pictures.
FIG. 3 is a schematic block diagram of Embodiment 2 of a method for generating a cluster data table according to the present application. In a specific implementation, the attribute information of commodities may be obtained from commodity data already stored on platforms such as e-commerce websites.
Step 202: determine the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
For ease of understanding, commodities are used below as the example business object to explain how the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two commodities is determined.
Name similarity reflects how similar the names of any two commodities are. Specifically, it may be calculated using the Jaccard similarity from text mining, whose basic idea is the ratio of the number of words the two commodity names share to the total number of distinct words across both names.
For example, suppose title A is "Little Tomato custom women's wear new style" and title B is "Little Apple custom women's wear Korean style". To calculate the Jaccard similarity, the titles are first segmented into words, and the sizes of the intersection and the union of the two word sets are then calculated. The intersection of title A and title B consists of "custom" and "women's wear", so its size is 2. Likewise, the union of title A and title B has a size of 6. The ratio 2/6 = 0.33 is the name similarity.
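The title example above can be reproduced with a short Python sketch. The function name and the pre-segmented English tokens are illustrative assumptions; in practice the titles would be segmented by a word segmenter, which is outside the scope of this sketch.

```python
def jaccard_similarity(tokens_a, tokens_b):
    """Size of the intersection of two token sets divided by the size of their union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:  # two empty titles: treat the similarity as 0
        return 0.0
    return len(set_a & set_b) / len(union)


# Titles A and B from the example, already segmented into words
# (English renderings of the original tokens, assumed for illustration).
title_a = ["Little Tomato", "custom", "women's wear", "new style"]
title_b = ["Little Apple", "custom", "women's wear", "Korean style"]

name_similarity = jaccard_similarity(title_a, title_b)
print(round(name_similarity, 2))  # 0.33
```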
When determining price similarity, the quantiles of the transaction prices of commodities in the same category may first be calculated, and the price range may then be divided into tiers according to those quantiles to obtain the price similarity.
Specifically, the transaction prices of the commodities may first be sorted in ascending order and the 10th to 90th percentiles calculated, so that the whole price range is divided by these order statistics into 10 tiers and the transaction price of each commodity falls into one of tiers 1 to 10. If the price tier of commodity A is 5 and the price tier of commodity B is 8, the price similarity between commodity A and commodity B is calculated as (8-5)/10 = 0.3.
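A minimal Python sketch of the tiering described above follows. The function names and the sample prices are assumptions for illustration, and how a price that falls exactly on a cut point is assigned is a choice of this sketch rather than something the text specifies. Note that, as defined in the text, a smaller value means the two prices are closer.

```python
import bisect
import statistics


def price_tier_cuts(prices):
    """Nine cut points (10th to 90th percentiles) of the prices in one category."""
    return statistics.quantiles(sorted(prices), n=10)


def price_tier(price, cuts):
    """Map a transaction price to a tier from 1 to 10 using the nine cut points."""
    return bisect.bisect_left(cuts, price) + 1


def price_similarity(tier_a, tier_b, tiers=10):
    """Tier gap as given in the text: |tier_a - tier_b| / 10."""
    return abs(tier_a - tier_b) / tiers


# Hypothetical transaction prices for one category.
prices = list(range(1, 101))
cuts = price_tier_cuts(prices)

print(price_tier(1, cuts), price_tier(100, cuts))  # lowest and highest tiers
print(price_similarity(5, 8))  # 0.3, the example from the text
```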
Consumer similarity may be calculated using a collaborative filtering algorithm, whose basic idea is to apply the cosine similarity formula to consumer preferences. For example, each consumer-commodity pair may first be scored according to the consumer's behavior toward the commodity, such as browsing, adding to favorites, adding to cart, and purchasing. If a purchase scores 4 points, adding to cart 3 points, adding to favorites 2 points, and browsing 1 point, the consumer-commodity score table shown in Table 1 below is obtained.
Table 1:
              Commodity A    Commodity B
Consumer 1    3              4
Consumer 2    2              1
Consumer 3    3              2
Then, using the cosine similarity formula, the consumer similarity between commodity A and commodity B is: (3*4+2*1+3*2)/(SQRT(3^2+2^2+3^2)*SQRT(4^2+1^2+2^2)) = 0.93.
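The calculation on Table 1 can be checked with a short Python sketch. The function name is an assumption; the two rating vectors are the columns of Table 1.

```python
import math


def cosine_similarity(u, v):
    """Dot product of two vectors divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:  # no ratings at all for one commodity
        return 0.0
    return dot / (norm_u * norm_v)


# Scores given by consumers 1 to 3 to commodity A and commodity B (Table 1).
ratings_a = [3, 2, 3]
ratings_b = [4, 1, 2]

consumer_similarity = cosine_similarity(ratings_a, ratings_b)
print(round(consumer_similarity, 2))  # 0.93
```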
Brand similarity may be obtained directly by comparing whether two commodities belong to the same brand. For example, if commodity A and commodity B both belong to brand X, the brand similarity between commodity A and commodity B may be taken to be 1.
Category similarity may be calculated using an association analysis algorithm, whose basic idea is to count, over consumers' orders, the probability that a commodity of category B is purchased when a commodity of category A is purchased. For example, suppose there are currently three orders: order 1 contains categories A/B/C, order 2 contains categories B/C/E, and order 3 contains categories B/D/F. The probability that a category C commodity is purchased together with a category B commodity is then 2/3, because category B appears in all three orders and orders 1 and 2 also contain category C.
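The order example above can be sketched in Python as a conditional frequency over orders. The function name is an assumption, and this simple count stands in for whatever association analysis algorithm is actually used.

```python
def co_purchase_probability(orders, category_x, category_y):
    """Among orders containing category_x, the fraction that also contain category_y."""
    with_x = [order for order in orders if category_x in order]
    if not with_x:  # category_x never purchased
        return 0.0
    return sum(1 for order in with_x if category_y in order) / len(with_x)


# The three orders from the example, each given as a set of categories.
orders = [{"A", "B", "C"}, {"B", "C", "E"}, {"B", "D", "F"}]

category_similarity = co_purchase_probability(orders, "B", "C")
print(round(category_similarity, 2))  # 0.67
```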
Picture similarity may be calculated by converting pictures into vectors using SIFT/SURF or a deep neural network algorithm and then applying the cosine similarity formula or another method. Picture similarity reflects how similar commodity styles are. SIFT (scale-invariant feature transform) is a descriptor used in the field of image processing; it is scale-invariant and can detect key points in an image. SURF (Speeded Up Robust Features) is an accelerated robust feature; the SURF operator is an improvement on the SIFT operator and can be applied to object recognition and 3D reconstruction in computer vision. Specifically, after a commodity picture is obtained, it may be transformed numerically into a vector such as [1, 1, 3, 4], and the cosine similarity formula may then be used to calculate the picture similarity between two commodities.
The above describes how the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity of commodities may be calculated. Those skilled in the art may also calculate the similarities in other ways, which is not specifically limited in the present application.
Step 203: determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity;
In a preferred embodiment of the present application, the step of determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity may specifically include the following sub-step:
Sub-step 2031: take a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between any two business objects.
In a specific implementation, once the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity of any two business objects have been calculated, a weighted sum of these similarities may be taken to obtain the degree of association between the two business objects. The weights of the similarities in the different information dimensions may be adjusted as needed. For example, the weight of picture similarity may be increased for clothing, while the weight of name similarity may be increased for digital products, so that the resulting degree of association better reflects the similarity or matchability of two different business objects.
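The weighted sum of sub-step 2031 can be sketched as follows. The dimension names, the similarity values, and the weights are all hypothetical; the text only states that the weights may be adjusted as needed, for example by giving picture similarity more weight for clothing.

```python
def association_degree(similarities, weights):
    """Weighted sum over whichever similarity dimensions are available."""
    return sum(weights[dim] * value for dim, value in similarities.items())


# Hypothetical similarities between two clothing commodities.
similarities = {
    "name": 0.4,
    "price": 0.7,
    "consumer": 0.9,
    "brand": 1.0,
    "category": 0.6,
    "picture": 0.8,
}

# Hypothetical weights for clothing, with picture similarity weighted most heavily.
clothing_weights = {
    "name": 0.1,
    "price": 0.1,
    "consumer": 0.2,
    "brand": 0.1,
    "category": 0.1,
    "picture": 0.4,
}

degree = association_degree(similarities, clothing_weights)
print(round(degree, 2))  # 0.77
```

Because the function sums only the dimensions present in `similarities`, it also covers the "and/or" case where some similarities are not computed.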
Step 204: group together business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets;
In a specific implementation, a hierarchical clustering method may be used to cluster based on the degree of association, so that all the acquired business objects are divided into different classes, each of which is a business object set.
Step 205: acquire the attribute information of the plurality of associated business objects in the business object set;
Step 206: determine the title information of the business object set according to the attribute information;
Typically, the title information of a business object set may be a phrase or short sentence that reflects a feature common to all business objects in the set. In a specific implementation, the attribute information of all the business objects may first be obtained, text information describing each business object, such as the commodity name or the commodity introduction, may be extracted from the attribute information, and the title information of the business object set may then be determined according to the text information.
In a preferred embodiment of the present application, the step of determining the title information of the business object set according to the attribute information may specifically include the following sub-steps:
Sub-step 2061: acquire keywords from the attribute information of the plurality of associated business objects;
Sub-step 2062: rank the keywords to obtain a first preset number of target keywords;
Sub-step 2063: determine the title information of the business object set using the target keywords and a first preset template.
In a specific implementation, attribute information such as the obtained commodity names and/or introduction text may first be segmented into words to obtain the corresponding keywords. An existing statistical algorithm is then used to rank the keywords and obtain the top k keywords, and the title information of the business object set is determined from these k keywords and a preset topic template. For example, after a preset number of keywords is obtained, a template such as "XX of XX" or "Teach you how to XXX" may be used to generate the title information. The preset number of keywords may be chosen as needed and is not specifically limited in the present application; for example, two or three keywords may be selected and the corresponding template used to obtain the title information of the business object set.
The existing statistical algorithm may be TF-IDF (term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining), TextRank, or the like, which is not specifically limited in the present application.
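Sub-steps 2061 to 2063 can be sketched in Python as follows. For simplicity the sketch ranks keywords by raw frequency, which is a stand-in for TF-IDF or TextRank and not either algorithm itself. The function names, the pre-segmented commodity names, and the template string are assumptions for illustration.

```python
from collections import Counter


def top_keywords(tokenized_texts, k):
    """Rank keywords by how often they occur and keep the top k."""
    counts = Counter(token for text in tokenized_texts for token in text)
    return [word for word, _ in counts.most_common(k)]


def make_title(keywords, template):
    """Fill a preset title template with the selected keywords."""
    return template.format(*keywords)


# Hypothetical commodity names in one business object set, already segmented.
tokenized_names = [
    ["linen", "shirt", "men"],
    ["linen", "shirt", "summer"],
    ["shirt", "casual"],
]

keywords = top_keywords(tokenized_names, k=2)
title = make_title(keywords, template="{1} {0} picks")
print(keywords)  # ['shirt', 'linen']
print(title)     # linen shirt picks
```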
Step 207: determine the description information of the business object set according to the title information;
In a preferred embodiment of the present application, the step of determining the description information of the business object set according to the title information may specifically include the following sub-steps:
Sub-step 2071: acquire comment information corresponding to the title information;
Sub-step 2072: determine the description information of the business object set according to the comment information.
In a specific implementation, after the title information of the business object set is determined, comment information related to the title information may be looked up, and the description information of the business object set may then be determined according to the comment information.
The sub-step of acquiring comment information corresponding to the title information may further include:
segmenting the title information into words to obtain one or more segmented phrases;
acquiring comment information that matches the one or more segmented phrases.
In a specific implementation, the title information may be segmented into words to obtain one or more segmented phrases, a semantic model may then be used to expand the one or more segmented phrases with near-synonyms, and text matching may be performed against the comment data, thereby recalling comment information that matches the title information.
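The recall step can be sketched in Python as follows. A plain synonym dictionary stands in for the semantic model and substring matching stands in for the text matching, so both are simplifying assumptions, as are the function names and the sample comments.

```python
def expand_with_synonyms(phrases, synonyms):
    """Add near-synonyms of each segmented phrase (a dict stands in for the semantic model)."""
    expanded = set(phrases)
    for phrase in phrases:
        expanded.update(synonyms.get(phrase, ()))
    return expanded


def recall_comments(title_phrases, comments, synonyms):
    """Keep the comments that contain at least one phrase or near-synonym."""
    terms = expand_with_synonyms(title_phrases, synonyms)
    return [comment for comment in comments if any(term in comment for term in terms)]


# Hypothetical segmented title, synonym table, and comment data.
title_phrases = ["exquisite", "gift"]
synonyms = {"exquisite": ["delicate", "refined"]}
comments = [
    "Such a delicate little box",
    "Arrived late",
    "Perfect gift for my mother",
]

matched = recall_comments(title_phrases, comments, synonyms)
print(matched)  # ['Such a delicate little box', 'Perfect gift for my mother']
```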
The sub-step of determining the description information of the business object set according to the comment information may further include:
ranking the comment information to obtain a second preset number of target comments;
determining the description information of the business object set using the target comments and a second preset template.
In a specific implementation, after the comment information is obtained, it may be scored and ranked using deep learning and manual annotation, and the description information of the business object set may then be generated from a preset number of top-ranked comments using a preset text template, so that different business object sets correspond to different description information.
For example, the description information of business object set 1 may be "I can never resist exquisite things"; that of business object set 2 may be "Chatting over a cup of tea, what a pleasant life"; and that of business object set 3 may be "A man in a shirt is definitely the most handsome".
Step 208: generate a cluster data table according to the plurality of business object sets and the corresponding title information and description information.
In the embodiments of the present application, after the business object sets and their title information and description information are obtained, the business object sets and their title information and description information may be merged into one cluster data table. For commodities, the cluster data table is a commodity list that includes different sets of commodities together with their titles and descriptions.
In the embodiments of the present application, the commodity list is generated using graph clustering and information extraction algorithms, so that commodity data and comment data can be used effectively to automatically obtain a list containing the commodities, a title, and a description, which greatly improves the efficiency of generating commodity lists.
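One possible in-memory shape for the merged result is sketched below in Python. The class name, field names, and sample values are hypothetical; the text does not prescribe any particular data structure for the cluster data table.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessObjectSet:
    """One row of the cluster data table: a set of objects plus its topic information."""
    title: str
    description: str
    objects: list = field(default_factory=list)


def build_cluster_data_table(object_sets, titles, descriptions):
    """Merge each business object set with its title and description."""
    return [
        BusinessObjectSet(title=t, description=d, objects=list(objs))
        for objs, t, d in zip(object_sets, titles, descriptions)
    ]


# Hypothetical outputs of the clustering, title, and description steps.
object_sets = [["teapot", "tea cup"], ["white shirt", "striped shirt"]]
titles = ["Tea time picks", "Shirt picks"]
descriptions = [
    "Chatting over a cup of tea, what a pleasant life",
    "A man in a shirt is definitely the most handsome",
]

table = build_cluster_data_table(object_sets, titles, descriptions)
for row in table:
    print(row.title, "|", row.description, "|", ", ".join(row.objects))
```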
Referring to FIG. 4, a flowchart of the steps of an embodiment of a method for presenting a cluster data table according to the present application is shown. The method may specifically include the following steps:
Step 401: receive a presentation request for a cluster data table;
Step 402: present the cluster data table according to the request, where the cluster data table includes a plurality of business object sets, and each business object set has a plurality of associated business objects and corresponding topic information.
In the embodiments of the present application, after a presentation request for a cluster data table is received, the cluster data table may be generated according to the request and then presented to the user.
The present application does not limit the specific form in which the cluster data table is presented. The cluster data table may include a plurality of business object sets, and each business object set may include a plurality of associated business objects and corresponding topic information. FIG. 5 is an example diagram of a cluster data table of the present application. The different commodity lists shown in FIG. 5 are different business object sets. Each commodity list may include different commodities and topic information generated from those commodities, where the topic information includes the title of the commodity list and description information for the commodities.
In the embodiment of the present application, the plurality of business object sets may be generated by the following steps:
S11: acquire a plurality of business objects, each of which has corresponding attribute information;
S12: determine the degree of association between the plurality of business objects according to their attribute information.
In the embodiment of the present application, the attribute information of the plurality of business objects may include their names, price information, consumer information, brand information, category information, and/or picture information. The step of determining the degree of association between the plurality of business objects according to their attribute information may specifically include the following sub-steps:
Sub-step S121: determine, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
Sub-step S122: determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
Further, the sub-step of determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity may include:
computing a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between the two business objects.
Since sub-steps S121-S122 are similar to steps 202-203 in Embodiment 2, reference may be made to that embodiment, and the details are not repeated here.
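The weighted summation in sub-step S122 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the weight values and the function name `association_degree` are assumptions, and a similarity dimension that is not supplied simply contributes nothing, which is one possible reading of the "and/or" wording.

```python
from typing import Dict

# Hypothetical weights; the application does not specify any values.
WEIGHTS: Dict[str, float] = {
    "name": 0.30,
    "price": 0.10,
    "consumer": 0.20,
    "brand": 0.15,
    "category": 0.15,
    "picture": 0.10,
}


def association_degree(similarities: Dict[str, float],
                       weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum over the similarity dimensions that were supplied."""
    return sum(weights[key] * value
               for key, value in similarities.items()
               if key in weights)


# Two objects with identical names and moderately similar prices.
degree = association_degree({"name": 1.0, "price": 0.5})
```

With the assumed weights, the example evaluates to 0.3 × 1.0 + 0.1 × 0.5 = 0.35.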
S13: classify the plurality of business objects according to the degree of association between them to obtain a plurality of business object sets.
In a specific implementation, hierarchical clustering may be used to classify the plurality of business objects into a plurality of business object sets.
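One way to picture step S13 is single-linkage agglomerative clustering cut at a threshold, which yields the same groups as taking the connected components of the graph whose edges join objects with a degree of association above the threshold. The sketch below makes that assumption explicitly; the application names hierarchical clustering but not the linkage criterion, and `cluster_by_threshold` and the sample matrix are hypothetical.

```python
from typing import Callable, Dict, List


def cluster_by_threshold(n: int,
                         degree: Callable[[int, int], float],
                         threshold: float) -> List[List[int]]:
    """Group object indices 0..n-1; single linkage is an assumed choice."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if degree(i, j) > threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative pairwise degrees of association for three objects.
D = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
sets = cluster_by_threshold(3, lambda i, j: D[i][j], threshold=0.5)
```

Here objects 0 and 1 end up in one set and object 2 in another. Other linkage criteria (complete or average linkage, for instance) would produce tighter sets at the same threshold.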
In the embodiment of the present application, the topic information may be generated by the following steps:
S21: acquire the attribute information of the plurality of associated business objects in the business object set;
S22: determine the title information of the business object set according to the attribute information.
Generally, the title information of a business object set may be a phrase or short sentence that reflects a feature common to all business objects in the set. Specifically, the step of determining the title information of the business object set according to the attribute information may include the following sub-steps:
Sub-step S221: acquire keywords from the attribute information of the plurality of associated business objects;
Sub-step S222: sort the keywords to obtain a first preset number of target keywords;
Sub-step S223: determine the title information of the business object set by using the target keywords and a first preset template.
In a specific implementation, attribute information such as the names and/or introductory text of the products may first be segmented into words to obtain keywords. An existing statistical algorithm is then used to rank the keywords and obtain the top k, and the k keywords are combined with a preset topic template to determine the title information of the business object set. For example, after the preset number of keywords is obtained, a template such as "XX of XX" or "Teach you how to XXX" may be used to generate the title information. The number of keywords may be chosen according to actual needs and is not specifically limited in the present application; for example, two or three keywords may be selected and then filled into the corresponding template.
The existing statistical algorithm may be the TF-IDF (term frequency-inverse document frequency) algorithm, a weighting technique commonly used in information retrieval and data mining, or the TextRank algorithm, among others; the present application does not specifically limit this.
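Sub-steps S221 to S223 can be sketched with a small TF-IDF ranking followed by template filling. The sketch below is illustrative only: word segmentation is assumed to have been done already (the inputs are token lists), the smoothed IDF formula is one common variant rather than anything the application prescribes, and the function names, the template, and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import List


def top_keywords(set_docs: List[List[str]],
                 corpus_docs: List[List[str]],
                 k: int) -> List[str]:
    """Rank tokens of one business object set by TF (within the set) x IDF (over a corpus)."""
    tf = Counter(tok for doc in set_docs for tok in doc)
    n = len(corpus_docs)

    def idf(tok: str) -> float:
        df = sum(1 for doc in corpus_docs if tok in doc)
        return math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF (assumed variant)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(tf, key=lambda t: (-tf[t] * idf(t), t))[:k]


def make_title(keywords: List[str], template: str = "{1} picks for {0}") -> str:
    # The template is a hypothetical stand-in for the "first preset template".
    return template.format(*keywords)


set_docs = [["winter", "wool", "coat"],
            ["winter", "knit", "scarf"],
            ["winter", "wool", "gloves"]]
corpus_docs = set_docs + [["summer", "dress"],
                          ["winter", "boots"],
                          ["running", "shoes"]]

keywords = top_keywords(set_docs, corpus_docs, k=2)
title = make_title(keywords)
```

Computing term frequency over the set and document frequency over a wider corpus favours words that are frequent in this set but not ubiquitous across all products; TextRank could be substituted for the ranking step without changing the template stage.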
S23: determine the description information of the business object set according to the title information.
Generally, the description information may be text that describes the business objects in the set as a whole, or text that further elaborates on the title information. Specifically, the step of determining the description information of the business object set according to the title information may include the following sub-steps:
S231: acquire comment information corresponding to the title information;
In a specific implementation, the title information may be segmented to obtain one or more phrases; a semantic model is then used to expand the phrases with synonyms, and text matching is performed against the comment data, so as to recall the comment information that matches the title information.
S232: determine the description information of the business object set according to the comment information.
In a specific implementation, after the comment information is obtained, deep learning combined with manual annotation may be used to score and rank the comment information. A preset number of top-ranked comments is then combined with a preset text template to generate the description information of the business object set, so that different business object sets correspond to different description information.
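The recall-and-rank flow of S231 and S232 can be sketched as below. Several parts are deliberate simplifications and should be read as assumptions: the synonym table stands in for the semantic model, a plain callable stands in for the deep-learning scorer trained with manual annotations, matching is simple substring search, and all names, the template, and the sample comments are hypothetical.

```python
from typing import Callable, Dict, List, Set


def recall_comments(title_terms: List[str],
                    synonyms: Dict[str, Set[str]],
                    comments: List[str]) -> List[str]:
    """S231: expand title terms with synonyms, then keep comments that match any term."""
    expanded = set(title_terms)
    for term in title_terms:
        expanded |= synonyms.get(term, set())
    return [c for c in comments
            if any(term in c.lower() for term in expanded)]


def describe(comments: List[str],
             score: Callable[[str], float],
             n: int,
             template: str = "Buyers say: {}") -> str:
    """S232: keep the n best-scored comments and fill a preset text template."""
    best = sorted(comments, key=score, reverse=True)[:n]
    return template.format(" ".join(best))


comments = ["Very cozy and soft.", "Arrived late.", "Keeps me warm all day."]
matched = recall_comments(["warm"], {"warm": {"cozy"}}, comments)

# len() is only a placeholder for a learned quality score.
description = describe(matched, score=len, n=1)
```

The comment "Very cozy and soft." is recalled only because of the synonym expansion, which is the role the semantic model plays in the text.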
In a preferred embodiment of the present application, the step of presenting the cluster data table according to the request may specifically include the following sub-steps:
Sub-step 4021: acquire a plurality of target business object sets that match user demand information;
Sub-step 4022: present the plurality of target business object sets.
In a specific implementation, the request to present the cluster data table may further include user demand information, so that after the cluster data table is generated, a plurality of target business object sets matching the user demand information can be acquired and then presented to the user.
The user demand information may be derived from the user's earlier browsing or search records for business objects. For example, after the user has browsed or searched for a coat, a clothing product list including coats, trousers, shoes, and other products may be generated for the user. The user demand information may of course also be obtained in other ways, which the present application does not limit.
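Sub-step 4021 can be illustrated with a simple keyword-overlap match between the user demand information and each business object set. This is only a sketch under assumptions: the application does not say how matching is performed, so the overlap score, the `match_sets` name, the `limit` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def match_sets(demand_terms: Set[str],
               set_keywords: Dict[str, Set[str]],
               limit: int = 3) -> List[str]:
    """Return titles of the sets sharing the most terms with the user's demand."""
    hits = [(len(demand_terms & keywords), title)
            for title, keywords in set_keywords.items()
            if demand_terms & keywords]
    # Most overlapping terms first; ties broken by title for determinism.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in hits[:limit]]


# Demand terms assumed to come from the user's browsing or search records.
demand = {"coat", "clothing"}
candidate_sets = {
    "Winter outfit": {"coat", "pants", "shoes", "clothing"},
    "Kitchen basics": {"pan", "knife"},
    "Rain gear": {"coat", "umbrella"},
}
targets = match_sets(demand, candidate_sets)
```

A user who browsed a coat would thus be shown the clothing-related sets first, while unrelated sets are left out.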
In the embodiment of the present application, after the presentation request for the cluster data table is received, a cluster data table including a plurality of business object sets can be presented according to the request. User needs can thus be identified quickly and business objects that meet those needs can be presented, which reduces the time users spend searching for business objects, saves the system resources consumed by such searches, and improves access efficiency.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Referring to FIG. 6, a structural block diagram of an embodiment of a device for generating a cluster data table of the present application is shown, which may specifically include the following modules:
an acquisition module 601, configured to acquire a plurality of business objects, each of which has corresponding attribute information;
an association degree determining module 602, configured to determine the degree of association between the plurality of business objects according to their attribute information;
a classification module 603, configured to classify the plurality of business objects according to the degree of association between them to obtain a plurality of business object sets, each of which has a plurality of associated business objects;
a topic information determining module 604, configured to determine, according to the plurality of associated business objects, the topic information corresponding to each of the plurality of business object sets;
a generating module 605, configured to generate the cluster data table according to the plurality of business object sets and the corresponding topic information.
In the embodiment of the present application, the attribute information of the plurality of business objects may include their names, price information, consumer information, brand information, category information, and/or picture information; the association degree determining module 602 may specifically include the following sub-modules:
a similarity determining sub-module, configured to determine, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
an association degree determining sub-module, configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
In the embodiment of the present application, the association degree determining sub-module may specifically include the following unit:
an association degree determining unit, configured to compute a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between any two business objects.
In the embodiment of the present application, the classification module 603 may specifically include the following sub-module:
a combining sub-module, configured to combine business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets.
In the embodiment of the present application, the topic information determining module 604 may specifically include the following sub-modules:
an attribute information acquiring sub-module, configured to acquire the attribute information of the plurality of associated business objects in the business object set;
a title information determining sub-module, configured to determine the title information of the business object set according to the attribute information;
a description information determining sub-module, configured to determine the description information of the business object set according to the title information.
In the embodiment of the present application, the title information determining sub-module may specifically include the following units:
a keyword acquiring unit, configured to acquire keywords from the attribute information of the plurality of associated business objects;
a keyword sorting unit, configured to sort the keywords to obtain a first preset number of target keywords;
a title information determining unit, configured to determine the title information of the business object set by using the target keywords and a first preset template.
In the embodiment of the present application, the description information determining sub-module may specifically include the following units:
a comment information acquiring unit, configured to acquire comment information corresponding to the title information;
a description information determining unit, configured to determine the description information of the business object set according to the comment information.
In the embodiment of the present application, the comment information acquiring unit may specifically include the following sub-units:
a word segmentation sub-unit, configured to segment the title information to obtain one or more phrases;
a comment information acquiring sub-unit, configured to acquire the comment information that matches each of the one or more phrases.
In the embodiment of the present application, the description information determining unit may specifically include the following sub-units:
a comment information sorting sub-unit, configured to sort the comment information to obtain a second preset number of target comments;
a description information determining sub-unit, configured to determine the description information of the business object set by using the target comments and a second preset template.
Referring to FIG. 7, a structural block diagram of an embodiment of a device for presenting a cluster data table of the present application is shown, which may specifically include the following modules:
a receiving module 701, configured to receive a presentation request for a cluster data table;
a presentation module 702, configured to present the cluster data table according to the request; the cluster data table may include a plurality of business object sets, and each business object set has a plurality of associated business objects and corresponding topic information.
In the embodiment of the present application, the plurality of business object sets may be generated by invoking the following modules:
a business object acquiring module 703, configured to acquire a plurality of business objects, each of which has corresponding attribute information;
an association degree determining module 704, configured to determine the degree of association between the plurality of business objects according to their attribute information;
a classification module 705, configured to classify the plurality of business objects according to the degree of association between them to obtain a plurality of business object sets.
In the embodiment of the present application, the attribute information of the plurality of business objects may include their names, price information, consumer information, brand information, category information, and/or picture information; the association degree determining module 704 may specifically include the following sub-modules:
a similarity determining sub-module, configured to determine, for any two business objects, the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between them;
an association degree determining sub-module, configured to determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
In the embodiment of the present application, the association degree determining sub-module may specifically include the following unit:
an association degree determining unit, configured to compute a weighted sum of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between any two business objects.
In the embodiment of the present application, the classification module 705 may specifically include the following sub-module:
a combining sub-module, configured to combine business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets.
In the embodiment of the present application, the topic information may include the title information and description information of the business object set, and the topic information may be generated by invoking the following modules:
an attribute information acquiring module 706, configured to acquire the attribute information of the plurality of associated business objects in the business object set;
a title information determining module 707, configured to determine the title information of the business object set according to the attribute information;
a description information determining module 708, configured to determine the description information of the business object set according to the title information.
In the embodiment of the present application, the title information determining module 707 may specifically include the following sub-modules:
a keyword acquiring sub-module, configured to acquire keywords from the attribute information of the plurality of associated business objects;
a keyword sorting sub-module, configured to sort the keywords to obtain a first preset number of target keywords;
a title information determining sub-module, configured to determine the title information of the business object set by using the target keywords and a first preset template.
In the embodiment of the present application, the description information determining module 708 may specifically include the following sub-modules:
a comment information acquiring sub-module, configured to acquire comment information corresponding to the title information;
a description information determining sub-module, configured to determine the description information of the business object set according to the comment information.
In the embodiment of the present application, the comment information acquiring sub-module may specifically include the following units:
a word segmentation unit, configured to segment the title information to obtain one or more phrases;
a comment information acquiring unit, configured to acquire the comment information that matches each of the one or more phrases.
In the embodiment of the present application, the description information determining sub-module may specifically include the following units:
a comment information sorting unit, configured to sort the comment information to obtain a second preset number of target comments;
a description information determining unit, configured to determine the description information of the business object set by using the target comments and a second preset template.
In the embodiment of the present application, the request may further include user demand information, and the presentation module 702 may specifically include the following sub-modules:
a target business object set acquiring sub-module, configured to acquire a plurality of target business object sets that match the user demand information;
a target business object presentation sub-module, configured to present the plurality of target business object sets.
Since the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments.
The embodiment of the present application further discloses a system for presenting a cluster data table, and the system may include:
one or more processors;
a memory; and
one or more modules, which are stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules have the following functions:
receiving a presentation request for a cluster data table;
presenting the cluster data table according to the request, wherein the cluster data table includes a plurality of business object sets, and each business object set has a plurality of associated business objects and corresponding topic information.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a device, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。While a preferred embodiment of the embodiments of the present application has been described, those skilled in the art can make further changes and modifications to the embodiments once they are aware of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including all the modifications and the modifications
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。 Finally, it should also be noted that in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply these entities. There is any such actual relationship or order between operations. Furthermore, the terms "comprises" or "comprising" or "comprising" or any other variations are intended to encompass a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a plurality of elements includes not only those elements but also Other elements that are included, or include elements inherent to such a process, method, article, or terminal device. An element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element, without further limitation.
The method for generating a cluster data table, the device for generating a cluster data table, the method for presenting a cluster data table, the device for presenting a cluster data table, and the system for presenting a cluster data table provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
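As an informal illustration of the generation method summarized above, the following Python sketch computes a degree of association between business objects as a weighted sum of per-attribute similarities and then groups objects whose degree of association exceeds a preset threshold. It is a minimal sketch under stated assumptions, not the applicant's implementation: the field names (`name`, `price`, `brand`, `category`), the weights, the Jaccard and relative-difference similarity measures, and the choice to treat "combining" as taking connected components are all hypothetical.

```python
from itertools import combinations

# Hypothetical weights; the text only says the similarities are weighted and summed.
WEIGHTS = {"name": 0.4, "price": 0.2, "brand": 0.2, "category": 0.2}


def jaccard(a, b):
    """Jaccard overlap of two token collections (an assumed name-similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def similarities(x, y):
    """Per-attribute similarities between two business objects, each in [0, 1]."""
    price_hi = max(x["price"], y["price"])
    return {
        "name": jaccard(x["name"].lower().split(), y["name"].lower().split()),
        "price": 1.0 - abs(x["price"] - y["price"]) / price_hi if price_hi else 1.0,
        "brand": 1.0 if x["brand"] == y["brand"] else 0.0,
        "category": 1.0 if x["category"] == y["category"] else 0.0,
    }


def degree_of_association(x, y, weights=WEIGHTS):
    """Weighted sum of the per-attribute similarities."""
    sims = similarities(x, y)
    return sum(weights[key] * sims[key] for key in weights)


def group_objects(objects, threshold):
    """Group objects linked by a degree of association above the threshold.

    Assumption: groups are the connected components of the 'above threshold'
    relation, tracked with a small union-find structure.
    """
    parent = list(range(len(objects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        if degree_of_association(objects[i], objects[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, obj in enumerate(objects):
        groups.setdefault(find(i), []).append(obj)
    return list(groups.values())
```

Calling `group_objects(objects, threshold)` returns a list of business object sets. Connected components are only one reading of "combining" objects above the threshold; a stricter variant could require every pair inside a set to exceed it.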

Claims (30)

  1. A system for presenting a cluster data table, characterized in that the system comprises:
    one or more processors;
    a memory; and
    one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules have the following functions:
    receiving a presentation request for a cluster data table;
    presenting the cluster data table according to the request, the cluster data table comprising a plurality of business object sets, the business object set having a plurality of associated business objects and corresponding topic information.
  2. A method for presenting a cluster data table, characterized by comprising:
    receiving a presentation request for a cluster data table;
    presenting the cluster data table according to the request, the cluster data table comprising a plurality of business object sets, the business object set having a plurality of associated business objects and corresponding topic information.
  3. The method according to claim 2, characterized in that the plurality of business object sets are generated by the following steps:
    acquiring a plurality of business objects, the plurality of business objects each having corresponding attribute information;
    determining a degree of association between the plurality of business objects according to the attribute information of the plurality of business objects;
    classifying the plurality of business objects according to the degree of association between the plurality of business objects to obtain a plurality of business object sets.
  4. The method according to claim 3, characterized in that the attribute information of the plurality of business objects comprises names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; and the step of determining the degree of association between the plurality of business objects according to the attribute information of the plurality of business objects comprises:
    respectively determining a name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
    respectively determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  5. The method according to claim 4, characterized in that the step of respectively determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity comprises:
    performing a weighted summation of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between any two business objects.
  6. The method according to any one of claims 3-5, characterized in that the step of classifying the plurality of business objects according to the degree of association between the plurality of business objects to obtain a plurality of business object sets comprises:
    respectively combining business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets.
  7. The method according to claim 2, characterized in that the topic information comprises title information and description information of the business object set, and the topic information is generated by the following steps:
    acquiring attribute information of the plurality of associated business objects in the business object set;
    determining the title information of the business object set according to the attribute information;
    determining the description information of the business object set according to the title information.
  8. The method according to claim 7, characterized in that the step of determining the title information of the business object set according to the attribute information comprises:
    acquiring keywords in the attribute information of the plurality of associated business objects;
    sorting the keywords to obtain a first preset number of target keywords;
    determining the title information of the business object set by using the target keywords and a first preset template.
  9. The method according to claim 7, characterized in that the step of determining the description information of the business object set according to the title information comprises:
    acquiring comment information corresponding to the title information;
    determining the description information of the business object set according to the comment information.
  10. The method according to claim 9, characterized in that the step of acquiring the comment information corresponding to the title information comprises:
    performing word segmentation on the title information to obtain one or more segmented phrases;
    respectively acquiring comment information matching the one or more segmented phrases.
  11. The method according to claim 9, characterized in that the step of determining the description information of the business object set according to the comment information comprises:
    sorting the comment information to obtain a second preset number of pieces of target comment information;
    determining the description information of the business object set by using the target comment information and a second preset template.
  12. The method according to claim 2, characterized in that the request further comprises user requirement information, and the step of presenting the cluster data table according to the request comprises:
    acquiring a plurality of target business object sets matching the user requirement information;
    presenting the plurality of target business object sets.
  13. A method for generating a cluster data table, characterized by comprising:
    acquiring a plurality of business objects, the plurality of business objects each having corresponding attribute information;
    determining a degree of association between the plurality of business objects according to the attribute information of the plurality of business objects;
    classifying the plurality of business objects according to the degree of association between the plurality of business objects to obtain a plurality of business object sets, the plurality of business object sets each having a plurality of associated business objects;
    respectively determining topic information corresponding to the plurality of business object sets according to the plurality of associated business objects;
    generating a cluster data table according to the plurality of business object sets and the corresponding topic information.
  14. The method according to claim 13, characterized in that the attribute information of the plurality of business objects comprises names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; and the step of determining the degree of association between the plurality of business objects according to the attribute information of the plurality of business objects comprises:
    respectively determining a name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
    respectively determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  15. The method according to claim 14, characterized in that the step of respectively determining the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity comprises:
    performing a weighted summation of the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity to obtain the degree of association between any two business objects.
  16. The method according to any one of claims 13-15, characterized in that the step of classifying the plurality of business objects according to the degree of association between the plurality of business objects to obtain a plurality of business object sets comprises:
    respectively combining business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets.
  17. The method according to claim 16, characterized in that the step of respectively determining the topic information corresponding to the plurality of business object sets according to the plurality of associated business objects comprises:
    acquiring attribute information of the plurality of associated business objects in the business object set;
    determining title information of the business object set according to the attribute information;
    determining description information of the business object set according to the title information.
  18. The method according to claim 17, characterized in that the step of determining the title information of the business object set according to the attribute information comprises:
    acquiring keywords in the attribute information of the plurality of associated business objects;
    sorting the keywords to obtain a first preset number of target keywords;
    determining the title information of the business object set by using the target keywords and a first preset template.
  19. The method according to claim 17, characterized in that the step of determining the description information of the business object set according to the title information comprises:
    acquiring comment information corresponding to the title information;
    determining the description information of the business object set according to the comment information.
  20. The method according to claim 19, characterized in that the step of acquiring the comment information corresponding to the title information comprises:
    performing word segmentation on the title information to obtain one or more segmented phrases;
    respectively acquiring comment information matching the one or more segmented phrases.
  21. The method according to claim 19, characterized in that the step of determining the description information of the business object set according to the comment information comprises:
    sorting the comment information to obtain a second preset number of pieces of target comment information;
    determining the description information of the business object set by using the target comment information and a second preset template.
  22. A device for presenting a cluster data table, characterized by comprising:
    a receiving module, configured to receive a presentation request for a cluster data table;
    a presentation module, configured to present the cluster data table according to the request, the cluster data table comprising a plurality of business object sets, the business object set having a plurality of associated business objects and corresponding topic information.
  23. The device according to claim 22, characterized in that the plurality of business object sets are generated by invoking the following modules:
    a business object acquisition module, configured to acquire a plurality of business objects, the plurality of business objects each having corresponding attribute information;
    a degree-of-association determination module, configured to determine a degree of association between the plurality of business objects according to the attribute information of the plurality of business objects;
    a classification module, configured to classify the plurality of business objects according to the degree of association between the plurality of business objects to obtain a plurality of business object sets.
  24. The device according to claim 23, characterized in that the attribute information of the plurality of business objects comprises names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; and the degree-of-association determination module comprises:
    a similarity determination submodule, configured to respectively determine a name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
    a degree-of-association determination submodule, configured to respectively determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  25. The device according to claim 23 or 24, characterized in that the classification module comprises:
    a combination submodule, configured to respectively combine business objects whose degree of association is greater than a preset threshold to obtain a plurality of business object sets.
  26. The device according to claim 22, characterized in that the topic information comprises title information and description information of the business object set, and the topic information is generated by invoking the following modules:
    an attribute information acquisition module, configured to acquire attribute information of the plurality of associated business objects in the business object set;
    a title information determination module, configured to determine the title information of the business object set according to the attribute information;
    a description information determination module, configured to determine the description information of the business object set according to the title information.
  27. The device according to claim 22, characterized in that the request further comprises user requirement information, and the presentation module comprises:
    a target business object set acquisition submodule, configured to acquire a plurality of target business object sets matching the user requirement information;
    a target business object presentation submodule, configured to present the plurality of target business object sets.
  28. A device for generating a cluster data table, characterized by comprising:
    an acquisition module, configured to acquire a plurality of business objects, the plurality of business objects each having corresponding attribute information;
    a degree-of-association determination module, configured to determine a degree of association between the plurality of business objects according to the attribute information of the plurality of business objects;
    a classification module, configured to classify the plurality of business objects according to the degree of association between the plurality of business objects to obtain a plurality of business object sets, the plurality of business object sets each having a plurality of associated business objects;
    a topic information determination module, configured to respectively determine topic information corresponding to the plurality of business object sets according to the plurality of associated business objects;
    a generation module, configured to generate a cluster data table according to the plurality of business object sets and the corresponding topic information.
  29. The device according to claim 28, characterized in that the attribute information of the plurality of business objects comprises names, price information, consumer information, brand information, category information, and/or picture information of the plurality of business objects; and the degree-of-association determination module comprises:
    a similarity determination submodule, configured to respectively determine a name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity between any two business objects;
    a degree-of-association determination submodule, configured to respectively determine the degree of association between any two business objects according to the name similarity, price similarity, consumer similarity, brand similarity, category similarity, and/or picture similarity.
  30. The device according to claim 28 or 29, characterized in that the topic information determination module comprises:
    an attribute information acquisition submodule, configured to acquire attribute information of the plurality of associated business objects in the business object set;
    a title information determination submodule, configured to determine title information of the business object set according to the attribute information;
    a description information determination submodule, configured to determine description information of the business object set according to the title information.
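The topic information steps recited in the claims (keywords ranked and filled into a first preset template to form title information; comments matched against the segmented title, ranked, and filled into a second preset template to form description information) can be illustrated with the Python sketch below. It is a hedged sketch only: both templates, the frequency-based keyword ranking, the overlap-based comment ranking, and the use of whitespace splitting in place of a real word segmenter are assumptions introduced for illustration and do not come from the application.

```python
from collections import Counter

# Hypothetical preset templates; the claims do not specify their wording.
TITLE_TEMPLATE = "{keywords} collection"
DESCRIPTION_TEMPLATE = "Shoppers say: {comments}"


def build_title(objects, top_n=2):
    """Rank keywords from object names by frequency and fill the title template."""
    counts = Counter(
        word for obj in objects for word in obj["name"].lower().split()
    )
    # Most frequent first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    keywords = [word for word, _ in ranked[:top_n]]
    return TITLE_TEMPLATE.format(keywords=" ".join(keywords))


def build_description(title, comments, top_n=2):
    """Pick the comments that best match the segmented title and fill the template.

    Assumption: whitespace splitting stands in for word segmentation, and a
    comment's rank is the number of title phrases it contains.
    """
    phrases = set(title.lower().split())
    scored = []
    for text in comments:
        overlap = len(phrases & set(text.lower().split()))
        if overlap:  # keep only comments matching at least one phrase
            scored.append((overlap, text))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    chosen = [text for _, text in scored[:top_n]]
    return DESCRIPTION_TEMPLATE.format(comments=" / ".join(chosen))
```

Here `top_n` plays the role of the first and second preset numbers. For Chinese product names and comments, the whitespace split would need to be replaced by a proper word segmenter.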
PCT/CN2017/092444 2016-07-18 2017-07-11 Method, device and system for presenting clustering data table WO2018014759A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610565869.XA CN107632984A (en) 2016-07-18 2016-07-18 A kind of cluster data table shows methods, devices and systems
CN201610565869.X 2016-07-18

Publications (1)

Publication Number Publication Date
WO2018014759A1 true WO2018014759A1 (en) 2018-01-25

Family

ID=60991905

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/092444 WO2018014759A1 (en) 2016-07-18 2017-07-11 Method, device and system for presenting clustering data table

Country Status (3)

Country Link
CN (1) CN107632984A (en)
TW (1) TW201816684A (en)
WO (1) WO2018014759A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921918A (en) * 2018-07-24 2018-11-30 Oppo广东移动通信有限公司 Video creation method and relevant apparatus
CN109558593A (en) * 2018-11-30 2019-04-02 北京字节跳动网络技术有限公司 Method and apparatus for handling text
CN110852094A (en) * 2018-08-01 2020-02-28 北京京东尚科信息技术有限公司 Method, apparatus and computer-readable storage medium for retrieving a target
CN110929002A (en) * 2018-09-03 2020-03-27 广州神马移动信息科技有限公司 Similar article duplicate removal method, device, terminal and computer readable storage medium
CN111291019A (en) * 2018-12-07 2020-06-16 中国移动通信集团陕西有限公司 Similarity discrimination method and device for data model
CN111782916A (en) * 2020-08-20 2020-10-16 支付宝(杭州)信息技术有限公司 Method and device for generating service information report
CN112527965A (en) * 2020-12-18 2021-03-19 国家电网有限公司客户服务中心 Automatic question answering implementation method and device based on combination of professional library and chatting library
CN113722370A (en) * 2021-08-30 2021-11-30 康键信息技术(深圳)有限公司 Data management method, device, equipment and medium based on index analysis
CN114219589A (en) * 2022-02-21 2022-03-22 浙江口碑网络技术有限公司 Virtual entity object generation and page display method and device and electronic equipment
CN116090789A (en) * 2023-03-03 2023-05-09 麦高(广东)数字科技有限公司 Lean manufacturing production management system and method based on data analysis
CN118014514A (en) * 2024-01-17 2024-05-10 南京泛泰数字科技研究院有限公司 Service management system and method based on electronic fence

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647981A (en) * 2018-05-17 2018-10-12 阿里巴巴集团控股有限公司 A kind of target object incidence relation determines method and apparatus
CN109800215B (en) * 2018-12-26 2020-11-24 北京明略软件系统有限公司 Bidding processing method and device, computer storage medium and terminal
CN110232138B (en) * 2019-05-20 2022-05-20 中国银行股份有限公司 Service guiding method, device and storage medium
CN111522606B (en) * 2020-04-26 2023-08-04 广东优特云科技有限公司 Data processing method, device, equipment and storage medium
CN111291059A (en) * 2020-05-12 2020-06-16 北京东方通科技股份有限公司 Data processing method based on memory data grid
CN113807630B (en) * 2020-12-23 2024-03-05 京东科技控股股份有限公司 Method, device, equipment and storage medium for acquiring requirements of robot service platform
CN113256420B (en) * 2021-05-27 2024-03-01 中国航空结算有限责任公司 Enterprise user identification method, device, equipment and medium in transaction
CN115019078B (en) * 2022-08-09 2023-01-24 阿里巴巴(中国)有限公司 Vehicle image processing method, computing device and storage medium
CN117933206B (en) * 2024-03-14 2024-06-25 武汉数澜科技有限公司 Service data processing method, device, equipment, storage medium and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375823A (en) * 2010-08-13 2012-03-14 腾讯科技(深圳)有限公司 Searching result gathering display method and system
CN103246685A (en) * 2012-02-14 2013-08-14 株式会社理光 Method and equipment for normalizing attributes of object instance into features
CN103678335A (en) * 2012-09-05 2014-03-26 阿里巴巴集团控股有限公司 Method and device for identifying commodity with labels and method for commodity navigation
CN103902674A (en) * 2014-03-19 2014-07-02 百度在线网络技术(北京)有限公司 Method and device for collecting evaluation data of specific subject

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365902B (en) * 2012-03-31 2017-06-20 北大方正集团有限公司 The appraisal procedure and device of internet news

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921918A (en) * 2018-07-24 2018-11-30 Oppo广东移动通信有限公司 Video creation method and relevant apparatus
CN108921918B (en) * 2018-07-24 2023-05-30 Oppo广东移动通信有限公司 Video creation method and related device
CN110852094A (en) * 2018-08-01 2020-02-28 北京京东尚科信息技术有限公司 Method, apparatus and computer-readable storage medium for retrieving a target
CN110852094B (en) * 2018-08-01 2023-11-03 北京京东尚科信息技术有限公司 Method, apparatus and computer readable storage medium for searching target
CN110929002A (en) * 2018-09-03 2020-03-27 广州神马移动信息科技有限公司 Similar article duplicate removal method, device, terminal and computer readable storage medium
CN109558593A (en) * 2018-11-30 2019-04-02 北京字节跳动网络技术有限公司 Method and apparatus for handling text
CN111291019B (en) * 2018-12-07 2023-09-29 中国移动通信集团陕西有限公司 Similarity discrimination method and device for data model
CN111291019A (en) * 2018-12-07 2020-06-16 中国移动通信集团陕西有限公司 Similarity discrimination method and device for data model
CN111782916A (en) * 2020-08-20 2020-10-16 支付宝(杭州)信息技术有限公司 Method and device for generating service information report
CN111782916B (en) * 2020-08-20 2024-03-22 支付宝(杭州)信息技术有限公司 Method and device for generating business information report
CN112527965A (en) * 2020-12-18 2021-03-19 国家电网有限公司客户服务中心 Automatic question answering implementation method and device based on combination of professional library and chatting library
CN113722370A (en) * 2021-08-30 2021-11-30 康键信息技术(深圳)有限公司 Data management method, device, equipment and medium based on index analysis
CN114219589A (en) * 2022-02-21 2022-03-22 浙江口碑网络技术有限公司 Virtual entity object generation and page display method and device and electronic equipment
CN114219589B (en) * 2022-02-21 2023-02-10 浙江口碑网络技术有限公司 Virtual entity object generation and page display method and device and electronic equipment
CN116090789B (en) * 2023-03-03 2023-08-29 麦高(广东)数字科技有限公司 Lean manufacturing production management system and method based on data analysis
CN116090789A (en) * 2023-03-03 2023-05-09 麦高(广东)数字科技有限公司 Lean manufacturing production management system and method based on data analysis
CN118014514A (en) * 2024-01-17 2024-05-10 南京泛泰数字科技研究院有限公司 Service management system and method based on electronic fence

Also Published As

Publication number Publication date
TW201816684A (en) 2018-05-01
CN107632984A (en) 2018-01-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17830393

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17830393

Country of ref document: EP

Kind code of ref document: A1