CN113362089A - Attribute feature extraction method and device - Google Patents

Publication number
CN113362089A
Authority
CN
China
Prior art keywords
article
attribute
item
type
chi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010136606.3A
Other languages
Chinese (zh)
Inventor
王蕾
肇斌
张旭
王晶
马博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202010136606.3A priority Critical patent/CN113362089A/en
Publication of CN113362089A publication Critical patent/CN113362089A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy

Abstract

The invention discloses an attribute feature extraction method and device, and relates to the technical field of computers. One embodiment of the method comprises: determining an item category in response to an input item category range, determining the item set corresponding to the item category, acquiring the attributes of each item, and counting the number of items in the item set corresponding to each attribute; acquiring the expected frequency corresponding to a single attribute and the item category, and calculating the chi-square value of that attribute for the item category from the item count and the corresponding expected frequency; and sorting the chi-square values to extract a predetermined number of attributes as the basic features of the item category, and then adjusting the item category tree structure based on the basic features. In this embodiment, the basic features of a single item category are extracted from its attribute set using a chi-square test, and the item categories on each branch of the tree structure are then screened and pruned by depth-first or breadth-first search, yielding a simplified item category tree structure.

Description

Attribute feature extraction method and device
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for extracting attribute features.
Background
Predicting competitive trends for a market segment is the basis of item design decisions and an important problem in market demand forecasting. Its main purpose is to predict the demand of a specific user group in the market, so as to identify the features of the items that are currently most competitive and to avoid destructive competition (such as price wars) caused by items being too similar. Mining item features quickly and accurately therefore provides a reference for item design and sales, and is a key step for an enterprise to expand its market and strengthen its competitiveness.
However, traditional schemes for predicting market competition trends rely largely on manual work such as questionnaires: the amount of data depends on the size of the team, and the reliability of the data depends on the team's expertise.
In the process of implementing the invention, the inventors found that the prior art has at least the following problems:
Existing prediction covers a limited number of items, is costly, is slow to deliver results, and depends on the expertise of the team. When item categories change dynamically, the manual prediction may have to be redone, so no real-time analysis system can be formed to respond through rapid iteration, and the goal of responding quickly to the market cannot be achieved.
Disclosure of Invention
In view of this, embodiments of the present invention provide an attribute feature extraction method and apparatus, which can at least solve the problem in the prior art that the basic features of the categories cannot be adjusted in real time.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an attribute feature extraction method including:
determining the item types in response to the input of the item type range, determining an item set corresponding to the item types, acquiring the attributes of all items, and counting the quantity of the items corresponding to all the attributes in the item set;
acquiring expected frequency corresponding to the single attribute and the article type, and calculating a chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
and sequencing chi-square values to extract a predetermined number of attributes as basic features of the article types, and further adjusting the article type tree structure based on the basic features.
Optionally, before the determining the item type in response to the input of the item type range, the method further includes:
determining the category class of each article in each category grade according to the physical attribute of each article in the article information database, and constructing a category range table;
the determining the item type in response to the input of the item type range comprises:
in response to an input of an item class range, at least one item class corresponding to the item class range is determined using the class range table.
Optionally, before the determining, by using the item class range table, at least one item class corresponding to the item class range, the method further includes:
and acquiring a field corresponding to the item type range from a preset query field table, so as to extract at least one item type corresponding to the field from the item type range table.
Optionally, before the obtaining the attribute of each article, the method further includes:
constructing an article inherent attribute table according to the physical attributes of each article in the article information database;
the obtaining of the attribute of each article further comprises:
and according to the identification of each article in the article set, acquiring the attribute from the inherent attribute table of the article to obtain the attribute of each article.
Optionally, the obtaining the expected frequency corresponding to the single attribute and the article type further includes:
counting the first quantity of the articles in the article set, and acquiring the total quantity of the articles in an article information database and the second quantity of the articles corresponding to the single attribute;
inputting the first quantity, the second quantity, the item quantity and the total quantity into an expected frequency calculation formula to obtain the expected frequency corresponding to the single attribute and the item type.
Optionally, the adjusting the item tree structure based on the basic features includes:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
in the article type tree structure, acquiring a sub-type of the article type based on a depth-first search mode, and judging whether the score of the sub-type is greater than that of the article type; if so, retaining the sub-type; if it is smaller, rejecting the sub-type.
Optionally, the adjusting the item tree structure based on the basic features includes:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
and in the article type tree structure, acquiring all the sub-types of the article types based on a breadth-first search mode so as to eliminate the sub-types with the scores smaller than the scores of the article types.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an attribute feature extraction device including:
the determining module is used for determining the article types in response to the input of the article type range, determining an article set corresponding to the article types, acquiring the attribute of each article, and counting the number of the articles corresponding to each attribute in the article set;
the calculation module is used for acquiring the expected frequency corresponding to the single attribute and the article type, and calculating the chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
and the adjusting module is used for sequencing the chi-square values, extracting attributes with a preset number as the basic features of the article types, and further adjusting the tree structure of the article types based on the basic features.
Optionally, the determining module is further configured to:
determining the category class of each article in each category grade according to the physical attribute of each article in the article information database, and constructing a category range table;
in response to an input of an item class range, at least one item class corresponding to the item class range is determined using the class range table.
Optionally, the determining module is further configured to: and acquiring a field corresponding to the item type range from a preset query field table, so as to extract at least one item type corresponding to the field from the item type range table.
Optionally, the determining module is further configured to:
constructing an article inherent attribute table according to the physical attributes of each article in the article information database;
and according to the identification of each article in the article set, acquiring the attribute from the inherent attribute table of the article to obtain the attribute of each article.
Optionally, the calculating module is further configured to:
counting the first quantity of the articles in the article set, and acquiring the total quantity of the articles in an article information database and the second quantity of the articles corresponding to the single attribute;
inputting the first quantity, the second quantity, the item quantity and the total quantity into an expected frequency calculation formula to obtain the expected frequency corresponding to the single attribute and the item type.
Optionally, the adjusting module is configured to:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
in the article type tree structure, acquiring a sub-type of the article type based on a depth-first search mode, and judging whether the score of the sub-type is greater than that of the article type; if so, retaining the sub-type; if it is smaller, rejecting the sub-type.
Optionally, the adjusting module is configured to:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
and in the article type tree structure, acquiring all the sub-types of the article types based on a breadth-first search mode so as to eliminate the sub-types with the scores smaller than the scores of the article types.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided an attribute feature extraction electronic device.
The electronic device of the embodiment of the invention comprises: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement any of the above-described attribute feature extraction methods.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing any of the above-described attribute feature extraction methods when executed by a processor.
According to the scheme provided by the invention, one embodiment of the invention has the following advantages or beneficial effects: for a single item category, basic features are extracted from its attribute set using a chi-square test, and the item categories on each branch are then screened and pruned based on the basic features using depth-first or breadth-first search, yielding a simplified item category tree structure.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a main flow diagram of an attribute feature extraction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram illustrating an alternative attribute feature extraction method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram illustrating an alternative attribute feature extraction method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an article class tree structure according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram illustrating an alternative attribute feature extraction method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an attribute feature extraction process according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another attribute feature extraction process according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the main modules of an attribute feature extraction apparatus according to an embodiment of the present invention;
FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 10 is a schematic block diagram of a computer system suitable for use with a mobile device or server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, a main flowchart of an attribute feature extraction method provided in an embodiment of the present invention is shown, including the following steps:
S101: determining the item types in response to the input of the item type range, determining an item set corresponding to the item types, acquiring the attributes of all items, and counting the quantity of the items corresponding to all the attributes in the item set;
S102: acquiring expected frequency corresponding to the single attribute and the article type, and calculating a chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
S103: sorting the chi-square values to extract a predetermined number of attributes as basic features of the article types, and further adjusting the article type tree structure based on the basic features.
In the above embodiment, for step S101, the invention provides a requirement input interface that a user (e.g., an item design department) can fill in as needed. The input serves as the query condition for invoking the "intelligent item category adaptive feature mining apparatus" and as the content of the service request. The invention is described mainly in terms of item attributes under item categories, so the main input condition is the "query item category range", as shown in Table 1:
[Table 1]
Query conditions for the item category range:
1) which item categories are targeted;
2) which item brands are targeted;
3) the range of items.
After the query conditions are input, the following attribute feature extraction operation of the article category is finished by calling the intelligent article category self-adaptive feature mining device.
The intelligent article type self-adaptive feature mining device determines article types such as large household appliances according to an article type range input by a user, and extracts articles corresponding to the article types from an article information database to form an article set.
It should be noted that, in the present invention, a category range table is constructed in advance according to the category to which each article belongs in each category level (for example, a primary category, a secondary category, and a tertiary category), and the subsequent article category query may be directly performed according to the table.
Besides the class range table, the invention also constructs an article attribute table in advance for outputting data used for characterizing the article to form a data table used by a subsequent module. The data structure takes the article as a dimension, and the data content is the physical attribute thereof, such as size, color, weight and the like. See table 2 for an indication:
[Table 2]
The attributes of all items in the item set are collected to obtain the attribute set for the item category, and the number of items corresponding to each attribute in the attribute set is then counted. Taking Table 2 as an example, 2 items have the color "white" and 1 item has the color "silver". Counting by attribute, 5 items have a color attribute, 5 have a weight attribute, 3 have a size attribute, 3 have a motor type attribute, 2 have a washing ratio attribute, and 1 has a battery capacity attribute.
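The counting step can be sketched as follows. This is a minimal illustration that assumes the item inherent attribute table is available as a mapping from item identifier to an attribute dictionary. The identifiers, attribute values, and function names are invented for illustration and are not the contents of Table 2.

```python
from collections import Counter

# Hypothetical inherent-attribute table: item id -> {attribute name: value}.
# All ids and values below are made up for illustration only.
item_attributes = {
    "sku1": {"color": "white", "weight": "60kg", "size": "600x550x850"},
    "sku2": {"color": "white", "weight": "72kg", "motor type": "inverter"},
    "sku3": {"color": "silver", "weight": "68kg", "size": "595x600x850"},
}


def count_items_per_attribute(table):
    """Count how many items possess each attribute name."""
    counts = Counter()
    for attrs in table.values():
        counts.update(attrs.keys())
    return counts


def count_items_per_value(table, attribute):
    """Count how many items carry each value of one attribute."""
    counts = Counter()
    for attrs in table.values():
        if attribute in attrs:
            counts[attrs[attribute]] += 1
    return counts


attribute_counts = count_items_per_attribute(item_attributes)
color_counts = count_items_per_value(item_attributes, "color")
```

With this made-up data, `attribute_counts` gives the per-attribute item counts and `color_counts` gives the per-value counts for the color attribute.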
For step S102, feature mining for an item category may use a frequency-based feature selection method, or a method based on MI (expected mutual information), χ² (chi-squared test), or the like. The invention mainly uses the chi-square test, as follows:
1) for a given attribute under a given item category, counting the number of SKUs (Stock Keeping Units) under that item category that have the attribute;
2) calculating the chi-square value of the attribute for the item category as follows:
$$\chi^2(D, t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} \frac{(N_{e_t e_c} - E_{e_t e_c})^2}{E_{e_t e_c}}$$
where D is the item set under the item category, t is the item attribute, and c is the item category; $e_t$ is 1 if an item in D has the attribute t and 0 otherwise; $e_c$ is 1 if an item in D belongs to the item category c and 0 otherwise; $N_{e_t e_c}$ is the observed frequency, i.e., the number of items for the given $e_t$ and $e_c$; and $E_{e_t e_c}$ is the expected frequency (or theoretical value) for the given $e_t$ and $e_c$, i.e., the expected number of items corresponding to a specific attribute under a given item category, which can be set manually or calculated in a certain way.
In addition, the value of $\chi^2(D, t, c)$ represents the degree of deviation between the observed frequency and the expected frequency. The closer the observed frequency is to the expected frequency, the smaller the difference between the two and the smaller the value of $\chi^2(D, t, c)$; conversely, the larger the difference, the larger the value. $\chi^2(D, t, c)$ is therefore a measure of the distance between the observed and expected frequencies, and also an indicator of whether the hypothesis holds.
Taking the item category "washing machine" and the attribute "color" as an example, calculating the expected frequency $E_{e_t e_c}$ requires the following four known parameters:
1) the total number of items in the item library, i.e., the total number of items over all values of $e_t$ and $e_c$, denoted N;
2) the number of items in the item set under the item category "washing machine", i.e., the first number;
3) the number of items in the item library that have the color attribute, i.e., the second number;
4) the number of items under the item category "washing machine" that have the color attribute.
From these four parameters, the following values required for the calculation can be obtained:
1) the number of items under the item category "washing machine" that have the color attribute, denoted $N_{11}$;
2) the number of items under the item category "washing machine" that do not have the color attribute, denoted $N_{01} = \text{first number} - N_{11}$;
3) the number of items in the item library that have the color attribute but are not washing machines, denoted $N_{10} = \text{second number} - N_{11}$;
4) the number of items in the item library that neither have the color attribute nor are washing machines, denoted $N_{00} = N - \text{first number} - \text{second number} + N_{11}$.
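The derivation of the four cell counts can be sketched as follows. This is a minimal sketch; the function name, parameter names, and example numbers are hypothetical. The count $N_{00}$ follows from the requirement that the four cells sum to N, which gives N minus the first number minus the second number plus $N_{11}$.

```python
def contingency_counts(total, in_category, with_attribute, both):
    """Derive the 2x2 cell counts (n11, n01, n10, n00).

    The first index is e_t (item has the attribute) and the second is e_c
    (item belongs to the category).
    total          -- N, all items in the item library
    in_category    -- the "first number": items in the category
    with_attribute -- the "second number": items that have the attribute
    both           -- items in the category that have the attribute
    """
    n11 = both
    n01 = in_category - both      # in the category, without the attribute
    n10 = with_attribute - both   # with the attribute, outside the category
    n00 = total - in_category - with_attribute + both  # neither
    counts = (n11, n01, n10, n00)
    if min(counts) < 0:
        raise ValueError("inconsistent input counts")
    return counts


# Made-up numbers: 1000 items in total, 50 washing machines,
# 400 items with a color attribute, 45 washing machines with a color attribute.
n11, n01, n10, n00 = contingency_counts(1000, 50, 400, 45)
```

The four returned counts always add up to the total, which is a convenient sanity check on the inputs.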
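Under the condition that the item attribute t and the item category c are independent, the expected frequency is calculated from the expected probabilities, for example for $e_t = 1$ and $e_c = 1$:
$$E_{11} = N \times P(t) \times P(c) = N \times \frac{N_{11} + N_{10}}{N} \times \frac{N_{11} + N_{01}}{N}$$
(This is only an example; in practice the calculation can cover all four values above.)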
Similarly, when the chi-squared value of the article attribute t for the article class c is calculated, the calculation needs to be performed based on four values under the dimensions of the attribute and the article class in a manner similar to the calculation of the expected probability, and the specific calculation process is not repeated here.
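The chi-square calculation over the four cells can be sketched as follows. This is a minimal sketch that assumes the expected frequency of each cell is computed under independence of the attribute and the category, as N times the two marginal probabilities; the function and variable names are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square value of an attribute for a category from a 2x2 table.

    The first index is e_t (item has the attribute) and the second is e_c
    (item belongs to the category).
    """
    n = n11 + n10 + n01 + n00
    observed = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    attribute_totals = {1: n11 + n10, 0: n01 + n00}  # marginal counts by e_t
    category_totals = {1: n11 + n01, 0: n10 + n00}   # marginal counts by e_c
    if 0 in attribute_totals.values() or 0 in category_totals.values():
        raise ValueError("chi-square is undefined when a marginal count is zero")
    value = 0.0
    for (e_t, e_c), count in observed.items():
        # Expected frequency under independence: N * P(e_t) * P(e_c).
        expected = attribute_totals[e_t] * category_totals[e_c] / n
        value += (count - expected) ** 2 / expected
    return value


independent = chi_square(10, 10, 10, 10)  # observed equals expected in every cell
dependent = chi_square(20, 0, 0, 20)      # attribute occurs only inside the category
```

When every observed count equals its expected count the value is zero, and it grows as the attribute becomes more strongly associated with the category.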
In step S103, after the chi-square values of all attributes under each item category are obtained, the k attributes with the highest chi-square values are selected as the basic features of the item category, for example by sorting the chi-square values in descending order and extracting the first k attributes.
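The selection of the top k attributes can be sketched as follows. This is a minimal sketch; the function name and the chi-square values are made up for illustration.

```python
def select_basic_features(chi_square_by_attribute, k):
    """Return the k attribute names with the largest chi-square values."""
    ranked = sorted(
        chi_square_by_attribute.items(),
        key=lambda pair: pair[1],
        reverse=True,  # descending order of chi-square value
    )
    return [name for name, _ in ranked[:k]]


# Made-up chi-square values for one item category.
chi_values = {"color": 3.2, "washing ratio": 41.7, "motor type": 18.4, "weight": 0.9}
basic_features = select_basic_features(chi_values, k=2)
```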
The basic characteristics of a certain article category include not only general attributes (such as price, size, weight and the like) of most articles under the article category, but also characteristic attributes (such as a washing machine with a cleaning ratio attribute, a refrigerator with a door opening mode attribute, and a mobile phone with a memory size attribute) which can be distinguished from other article categories.
The attributes obtained for a single item category can be used to mine the optimal features of that category, in which case only the basic features are taken as input. They can also be used to simplify the nodes of the item category tree structure and distill a representative item category tree.
The method provided by the above embodiment defines the problem of how to generate the basic features of the item classes as the feature selection problem of how to select a part of subsets from the item attributes appearing in the item set, thereby achieving the purposes of simplifying features, removing noise and improving feature mining efficiency.
Referring to fig. 2, a schematic flow chart of an optional attribute feature extraction method according to an embodiment of the present invention is shown, including the following steps:
S201: determining the category class of each article in each category grade according to the physical attribute of each article in the article information database, and constructing a category range table;
S202: responding to the input of an article type range, acquiring a field corresponding to the article type range from a preset query field table, and extracting the article type corresponding to the field from the article type range table;
S203: determining an article set corresponding to the article type, acquiring the attribute of each article, and counting the number of the articles corresponding to each attribute in the article set;
S204: acquiring expected frequency corresponding to the single attribute and the article type, and calculating a chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
S205: sorting the chi-square values to extract a predetermined number of attributes as basic features of the article types, and further adjusting the article type tree structure based on the basic features.
In the above embodiment, the descriptions of steps S101 to S103 shown in fig. 1 can be referred to for steps S201 and S203 to S205, and are not repeated herein.
In the above embodiment, in step S202, the "intelligent article type adaptive feature mining apparatus" disassembles the query condition input by the user, and finds the relevant field of the corresponding article type range from the storage system, so as to facilitate later article calibration.
The invention pre-constructs a query field table (also called an expansion table) construction unit, which parses the query input conditions to determine which fields and contents of the data tables in the storage system need to be queried, and produces a data table for use by subsequent modules.
The data in the storage system is varied, so the text of a query condition needs to be mapped to field names in the data tables. For example, "which item categories" maps directly to the "item category" field, whereas "the top three brands by sales volume within a unit time period" must be parsed into at least the three fields "time", "sales volume", and "brand". Ideally, every field in the query is contained in the storage system. See, for example, Table 3 below:
[Table 3]
In actual practice, some fields in the query condition may not exist in the storage system. For this situation, selection buttons are provided on one or more input items of the query condition interface, and a query field is defined in advance for each selectable option, so that every field can be queried.
Furthermore, when a query condition yields too few fields, the query fields can be expanded according to predefined content that satisfies the query condition, forming a query field expansion table. As mentioned above, parsing "the top three brands by sales volume within a unit time period" requires at least the three fields "time", "sales volume", and "brand"; this parsing process is referred to as "expansion". The results obtained are shown in Table 4:
Figure BDA0002397543280000102
Figure BDA0002397543280000111
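The mapping from query text to storage fields can be sketched as follows. This is a minimal sketch that assumes the query field table is a simple phrase-to-field lookup; the table contents, field names, and matching by substring are all hypothetical and stand in for whatever parsing the construction unit actually performs.

```python
# Hypothetical query field table: phrase found in a query condition ->
# field names in the storage system. The contents are made up for illustration.
QUERY_FIELD_TABLE = {
    "item categories": ["item_category"],
    "brand": ["brand"],
    "sales volume": ["sales_volume"],
    "unit time period": ["time"],
}


def parse_query_fields(query_text, field_table=QUERY_FIELD_TABLE):
    """Expand a query condition into the storage fields it refers to."""
    text = query_text.lower()
    fields = []
    for phrase, names in field_table.items():
        if phrase in text:  # naive substring match, an assumption of this sketch
            for name in names:
                if name not in fields:
                    fields.append(name)
    return fields


expanded = parse_query_fields(
    "the top three brands by sales volume within a unit time period"
)
direct = parse_query_fields("which item categories")
```

A phrase that matches nothing in the lookup yields an empty list, which corresponds to the case where a queried field does not exist in the storage system.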
The item category range may include primary, secondary, and tertiary categories, etc., as shown in Table 6:
[Table 6]
In the above embodiment, when the item categories are too broad and the item range is too large, the item categories may first be subdivided before the basic feature analysis is performed, so that the basic features of each category can be mined accurately; items can then be searched on the basis of the fields.
Referring to fig. 3, a schematic flow chart of another optional attribute feature extraction method according to the embodiment of the present invention is shown, including the following steps:
S301: determining the item types in response to the input of the item type range, determining an item set corresponding to the item types, acquiring the attributes of all items, and counting the quantity of the items corresponding to all the attributes in the item set;
S302: acquiring expected frequency corresponding to the single attribute and the article type, and calculating a chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
S303: sorting the chi-square values to extract a predetermined number of attributes as basic features of the article types;
S304: determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
S305: in the article class tree structure, acquiring a sub-class of the article class based on a depth-first search mode, and judging whether the score of the sub-class is greater than that of the article class; if so, retaining the sub-class; if it is smaller, rejecting the sub-class.
In the above embodiment, the descriptions of steps S101 to S103 shown in fig. 1 can be referred to for steps S301 to S303, and are not repeated herein.
In the above embodiment, in step S304, after the chi-square values of the basic features of the item class are obtained, the sum of the k chi-square values is computed, and the chi-square average value is taken as the score of item class j, specifically:
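$$\mathrm{score}_j = \frac{1}{k}\sum_{i=1}^{k}\chi^2_{i,j}$$
where $\chi^2_{i,j}$ is the chi-square value of the i-th basic-feature attribute for item class j, and k is the number of attributes in the basic features.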
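Steps S303 and S304 can be illustrated with a minimal Python sketch. It assumes the chi-square values of one article class are already available as a mapping from attribute name to value; the function name `class_score` and the sample attribute names are hypothetical and do not come from the patent.

```python
def class_score(chi_square_by_attribute, k):
    """Pick the k attributes with the largest chi-square values (S303)
    and return them with their chi-square average as the score (S304)."""
    ranked = sorted(chi_square_by_attribute.items(),
                    key=lambda pair: pair[1], reverse=True)
    top = ranked[:k]
    if not top:
        raise ValueError("at least one attribute is required")
    # Score = sum of the selected chi-square values divided by their count.
    score = sum(value for _, value in top) / len(top)
    return [name for name, _ in top], score


# Hypothetical chi-square values for one article class.
features, score = class_score({"drum": 12.0, "capacity": 6.0, "color": 1.5}, k=2)
```

If fewer than k attributes exist, the sketch averages over the attributes that are available; the patent text does not say how that case is handled.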
For step S305: for a binary tree, depth-first search (Depth First Search) traverses the nodes of the tree along its depth, searching each branch as deeply as possible. The root node is visited first, then the left subtree is traversed, and then the right subtree. Using the first-in, last-out property of a stack, the right subtree is pushed first and the left subtree second, so that the left subtree sits at the top of the stack and is guaranteed to be traversed before the right subtree.
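The stack-based traversal described above can be sketched as follows. This is only an illustration of the general technique for a binary tree; the `Node` class and the function name `preorder` are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(root):
    """Depth-first (root, left, right) traversal using an explicit stack."""
    order = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        order.append(node.name)
        # Push right first so that left ends up on top of the stack.
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return order


tree = Node("A", Node("B", Node("D"), Node("E")), Node("C"))
visited = preorder(tree)
```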
The invention constructs an article class tree structure in advance. The purpose of establishing the tree structure is as follows: when the range of articles is too large, the category labels are refined by classifying the articles within the range. The specific method comprises the following steps:
1) extracting the article class range table of the article dimension output by the article information unit; if fields related to article class classification exist, they can be used directly to establish the article class tree structure;
2) if not, the article classes can be subdivided step by step from article performance data, using unsupervised and semi-supervised learning algorithms that output articles and article features, so that an article class division tree is formed, together with a series of article classes on its leaf nodes and their corresponding basic feature sets.
Combining the condition of querying the range of the article types, finally obtaining a multi-level article type dimension table based on the article type classification tree, as shown in table 7:
Article ID | Category label details
Article 1 | Household appliance-big household appliance-dryer-drum
Article 2 | Household appliance-big household appliance-washing machine-drum washing and drying machine
Article 3 | Household appliance-big household appliance-washing machine-impeller type
Article 4 | Beauty skin care-perfume makeup-air cushion BB/BB cream-air cushion BB
Article 5 | Mobile phone communication-mobile phone
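As an illustration of how category labels like those in table 7 could be turned into a class tree, the following sketch builds a nested dictionary from hyphen-separated label paths. The function name `build_class_tree` and the assumption that level names never contain the separator are introduced for this example only.

```python
def build_class_tree(labels, sep="-"):
    """Build a nested dict {class name: {sub-class name: {...}}} from
    category labels whose levels are joined by `sep`."""
    tree = {}
    for label in labels:
        node = tree
        for level in label.split(sep):
            # Create the level on first sight, then descend into it.
            node = node.setdefault(level.strip(), {})
    return tree


labels = [
    "Household appliance-big household appliance-dryer-drum",
    "Household appliance-big household appliance-washing machine-impeller type",
    "Mobile phone communication-mobile phone",
]
tree = build_class_tree(labels)
```

A nested dictionary keeps the sketch short; a real system would probably attach item sets and scores to each node as well.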
For example, as shown in fig. 4, the electrical appliance tree structure is divided from the root node into two categories, household appliances and non-household appliances. Household appliances are further divided into large appliances and small appliances; small appliances include electric cookers, wall breaking machines and the like, and large appliances include air conditioners, refrigerators, washing machines and the like. The division continues in this way until no sub-category remains.
Taking a household appliance as an example, when the selected article type range is the household appliance, the root node of the article type tree structure is the household appliance, and the child nodes are small household appliances and big household appliances:
The article class tree structure is then optimized: starting from the root node (household appliance), the basic features under each article class are mined according to steps S301 to S304 and the corresponding scores are calculated. In the depth-first search mode, the leftmost tree node in the tree structure is processed first; the scores of the washing machine and of the drum washing and drying machine (assuming that the drum washing and drying machine has no sub-class) are calculated in the same way.
First, the score of the drum washing and drying machine is compared with the score of the washing machine. If it is smaller, the drum washing and drying machine class is removed; otherwise it is retained. The search then moves to the roller type, which is the second sub-class from the left under the washing machine, and the comparison process is repeated.
In the method provided by the above embodiment, after the basic features under each article class are mined, the score of an article class in the article class tree structure is compared with the score of its leftmost sub-class using a depth-first search, so that the sub-classes are pruned and the optimized article class tree is finally obtained.
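The depth-first pruning described above can be sketched as follows. This is one possible reading, not the patent's definitive procedure: it assumes every class already carries a score from steps S301 to S304, that a removed sub-class is dropped together with its subtree, and that a sub-class whose score equals its parent's score is kept. `ClassNode` and `prune_depth_first` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassNode:
    name: str
    score: float
    children: List["ClassNode"] = field(default_factory=list)


def prune_depth_first(node):
    """Visit sub-classes left to right, descending fully into each one
    before comparing its score with the score of its parent."""
    kept = []
    for child in node.children:
        prune_depth_first(child)          # go as deep as possible first
        if child.score >= node.score:     # smaller score: sub-class removed
            kept.append(child)
    node.children = kept
    return node


washer = ClassNode("washing machine", 5.0, [
    ClassNode("drum washing and drying machine", 6.0),
    ClassNode("impeller type", 3.0),
])
prune_depth_first(washer)
```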
Referring to fig. 5, a schematic flow chart of yet another optional attribute feature extraction method according to the embodiment of the present invention is shown, including the following steps:
S501: determining the item types in response to the input of the item type range, determining an item set corresponding to the item types, acquiring the attributes of all items, and counting the quantity of the items corresponding to each attribute in the item set;
S502: acquiring the expected frequency corresponding to a single attribute and the article type, and calculating a chi-square value of the single attribute for the article type according to the article quantity and the corresponding expected frequency;
S503: sorting the chi-square values to extract a predetermined number of attributes as basic features of the article types;
S504: determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
S505: in the article type tree structure, acquiring all the sub-types of the article types based on a breadth-first search mode, so as to remove the sub-types whose scores are smaller than the scores of the article types.
In the above embodiment, steps S501 to S503 can refer to the descriptions of steps S101 to S103 shown in fig. 1, and step S504 can refer to the description of step S304 shown in fig. 3, which are not repeated herein.
In the above embodiment, in step S505, in addition to the depth-first search method, the present invention may also adjust the article class tree structure by using a breadth-first search method (Breadth First Search). Breadth-first search, also known as width-first or lateral-first search, traverses the tree level by level along its width, starting from the root node.
Again taking the household appliance tree shown in fig. 4 as an example: starting from the household appliance node, the scores of all its child nodes (large household appliances and small household appliances) are calculated first; then the scores of all child nodes of the large household appliances (including washing machine, dryer, air conditioner and refrigerator) are calculated; then the scores of the child nodes of the washing machine and of the dryer are calculated. Finally, the score of each node is compared with the scores of its child nodes to judge whether the child nodes need to be removed.
In the method provided by the embodiment, after the basic features of each article class are mined, the score of each article class in the article class tree structure is compared with the scores of all of its sub-classes using a breadth-first search, so as to judge whether the sub-classes are removed, thereby simplifying and optimizing the article class tree structure.
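A breadth-first variant of the pruning can be sketched in the same spirit. The sketch assumes precomputed scores, drops a removed sub-class together with its subtree, and keeps sub-classes whose score equals the parent's score; `ClassNode`, `prune_breadth_first` and the sample scores are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassNode:
    name: str
    score: float
    children: List["ClassNode"] = field(default_factory=list)


def prune_breadth_first(root):
    """Process the tree level by level; at each class, keep only the
    sub-classes whose score is not smaller than the class's own score."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        node.children = [c for c in node.children if c.score >= node.score]
        queue.extend(node.children)   # surviving sub-classes form the next level
    return root


big = ClassNode("big household appliance", 3.0, [
    ClassNode("washing machine", 4.0),
    ClassNode("dryer", 1.0),
])
root = ClassNode("household appliance", 2.0,
                 [big, ClassNode("small household appliance", 1.0)])
prune_breadth_first(root)
```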
Referring to fig. 6, a schematic diagram of an attribute feature extraction process according to an embodiment of the present invention is shown, including:
1) the requirement module, in which the article design department proposes requirements, including specifying the market segment range, defining the standard for measuring market competition trends, and defining the article class range; the invention is described mainly with respect to the article class range;
2) the original data extraction processing module, comprising a query field expansion table construction unit, an article attribute table construction unit and an article class tree structure construction unit;
(i) query field expansion table construction unit: disassembles the query condition input by a user and determines the fields corresponding to the query condition according to the query field (expansion) table; here this mainly refers to the article class field;
(ii) article attribute table construction unit: constructs the article attribute table according to the attributes of each article in the article information database;
(iii) article class tree structure construction unit: constructs the tree according to the affiliation relationships among the article classes, for example household appliances comprising large appliances and small appliances;
3) the article class basic feature extraction module, used for mining the k attributes of a given article class with the highest-ranked chi-square values as its basic features;
4) the intelligent article class adaptive partitioning module, comprising a depth-first search unit, an article class partitioning quality judging unit and a tree structure optimization unit;
(i) depth-first search unit: starting from the root node of the tree structure, determines the leftmost tree node, then the leftmost child node of that node, and so on until no child node exists;
(ii) article class partitioning quality judging unit: compares the score of an article class with the scores of its child nodes, so that only child nodes whose scores are greater than or equal to the score of the article class are finally retained;
(iii) tree structure optimization unit: optimizes the article class tree structure.
Referring to fig. 7, another schematic diagram of an attribute feature extraction process according to an embodiment of the present invention is shown, including:
1) the requirement module, in which the article design department proposes requirements, including specifying the market segment range, defining the standard for measuring market competition trends, and defining the article class range;
2) the original data extraction processing module, comprising a query field expansion table construction unit, an article attribute table construction unit and an article class tree structure construction unit;
(i) query field (expansion) table construction unit: disassembles the query condition input by a user and determines the fields corresponding to the query condition according to the query field expansion table; here this mainly refers to the article class field;
(ii) article attribute table construction unit: constructs the article attribute table according to the attributes of each article in the article information database;
(iii) article class tree structure construction unit: constructs the tree according to the affiliation relationships among the article classes, for example household appliances comprising large appliances and small appliances;
3) the article class basic feature extraction module, used for mining the k attributes of a given article class with the highest-ranked chi-square values as its basic features;
4) the intelligent article class adaptive partitioning module, comprising a breadth-first search unit and an article class partitioning quality judging unit;
(i) breadth-first search unit: starting from the root node, determines the child nodes according to the hierarchical relationship, extracts all child nodes in each level, and so on until no child node exists;
(ii) article class partitioning quality judging unit: compares the score of an article class with the scores of its child nodes, so that only child nodes whose scores are greater than or equal to the score of the article class are finally retained;
(iii) tree structure optimization unit: optimizes the article class tree structure.
Referring to fig. 8, a schematic diagram of main modules of an attribute feature extraction apparatus 800 according to an embodiment of the present invention is shown, including:
a determining module 801, configured to determine an item type in response to an input of an item type range, determine an item set corresponding to the item type, obtain an attribute of each item, and count the number of items corresponding to each attribute in the item set;
a calculating module 802, configured to obtain expected frequencies corresponding to the single attributes and the article types, and calculate chi-square values of the single attributes for the article types according to the quantity of the articles and the corresponding expected frequencies;
an adjusting module 803, configured to sort the chi-square values to extract a predetermined number of attributes as basic features of the article types, and further adjust an article type tree structure based on the basic features.
In the device for implementing the present invention, the determining module 801 is further configured to:
determining the category class of each article in each category grade according to the physical attribute of each article in the article information database, and constructing a category range table;
in response to an input of an item class range, at least one item class corresponding to the item class range is determined using the class range table.
In the device for implementing the present invention, the determining module 801 is further configured to: acquire a field corresponding to the item type range from a preset query field table, so as to extract at least one item type corresponding to the field from the item type range table.
In the device for implementing the present invention, the determining module 801 is further configured to:
constructing an article inherent attribute table according to the physical attributes of each article in the article information database;
and according to the identification of each article in the article set, acquiring the attribute from the inherent attribute table of the article to obtain the attribute of each article.
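The attribute lookup and counting performed by the determining module can be illustrated with a short sketch. It assumes the article inherent attribute table is available as a mapping from article identifier to a collection of attribute names; the function name and the sample data are hypothetical.

```python
from collections import Counter


def count_items_per_attribute(item_ids, attribute_table):
    """For the articles in one item set, count how many articles
    carry each attribute (each article counts at most once per attribute)."""
    counts = Counter()
    for item_id in item_ids:
        # Articles missing from the table simply contribute nothing.
        counts.update(set(attribute_table.get(item_id, ())))
    return counts


# Hypothetical inherent attribute table keyed by article identifier.
attribute_table = {
    "article 1": ["drum", "white"],
    "article 2": ["drum", "silver"],
    "article 3": ["impeller", "white"],
}
counts = count_items_per_attribute(["article 1", "article 2"], attribute_table)
```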
In the device for implementing the present invention, the calculating module 802 is further configured to:
counting a first quantity of the articles in the article set, and acquiring the total quantity of the articles in the article information database and a second quantity of the articles corresponding to the single attribute;
inputting the first quantity, the second quantity, the item quantity and the total quantity into an expected frequency calculation formula to obtain the expected frequency corresponding to the single attribute and the item type.
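The expected frequency formula itself is not reproduced in this part of the text. As a hedged illustration, the sketch below uses the standard Pearson chi-square statistic on a 2x2 contingency table (in the class or not, has the attribute or not), where each expected frequency is row total times column total divided by the grand total. Whether the patent uses exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected cell count under independence of class and attribute."""
    return row_total * column_total / grand_total


def chi_square(observed, class_size, attribute_total, grand_total):
    """Pearson chi-square of one attribute for one article class.

    observed        -- articles in the class that have the attribute
    class_size      -- first quantity: articles in the class's item set
    attribute_total -- second quantity: articles in the database with the attribute
    grand_total     -- total quantity of articles in the database
    """
    rows = (class_size, grand_total - class_size)
    cols = (attribute_total, grand_total - attribute_total)
    cells = (
        (observed, class_size - observed),
        (attribute_total - observed,
         grand_total - class_size - attribute_total + observed),
    )
    value = 0.0
    for r in range(2):
        for c in range(2):
            expected = expected_frequency(rows[r], cols[c], grand_total)
            if expected > 0:  # skip degenerate cells to avoid division by zero
                value += (cells[r][c] - expected) ** 2 / expected
    return value


# Hypothetical counts: 100 articles, 20 in the class, 30 with the attribute, 15 in both.
value = chi_square(observed=15, class_size=20, attribute_total=30, grand_total=100)
```

With these hypothetical counts the expected frequency for the class-and-attribute cell is 20 × 30 / 100 = 6, well below the observed 15, so the attribute is strongly associated with the class.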
In the apparatus for implementing the present invention, the adjusting module 803 is configured to:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
in the article type tree structure, acquiring a sub-type of the article type based on a depth-first search mode, and judging whether the score of the sub-type is greater than that of the article type; if so, retaining the sub-type; if it is smaller, removing the sub-type.
In the apparatus for implementing the present invention, the adjusting module 803 is configured to:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
in the article type tree structure, acquiring all the sub-types of the article types based on a breadth-first search mode, so as to remove the sub-types whose scores are smaller than the scores of the article types.
In addition, since the specific implementation of the device in the embodiment of the present invention has been described in detail in the method above, it is not repeated here.
FIG. 9 illustrates an exemplary system architecture 900 to which embodiments of the invention may be applied.
As shown in fig. 9, the system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905 (by way of example only). The network 904 is the medium used to provide communication links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 901, 902, 903 to interact with a server 905 over a network 904 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 901, 902, 903.
The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 905 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 901, 902, 903.
It should be noted that the method provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the apparatus is generally disposed in the server 905.
It should be understood that the number of terminal devices, networks, and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 10, a block diagram of a computer system 1000 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU)1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 1001.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a determining module, a calculating module, and an adjusting module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the adjusting module may also be described as a "tree structure adjusting module".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:
determining the item types in response to the input of the item type range, determining an item set corresponding to the item types, acquiring the attributes of all items, and counting the quantity of the items corresponding to all the attributes in the item set;
acquiring expected frequency corresponding to the single attribute and the article type, and calculating a chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
sorting the chi-square values to extract a predetermined number of attributes as basic features of the article types, and further adjusting the article type tree structure based on the basic features.
According to the technical scheme of the embodiment of the invention, for a single article type, basic features are extracted from the attribute set by using a chi-square test, and based on the basic features, the article types of each branch are screened and reduced by using a depth-first search mode or a breadth-first search mode, so that a simplified article type tree structure is obtained.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for extracting attribute features is characterized by comprising the following steps:
determining the item types in response to the input of the item type range, determining an item set corresponding to the item types, acquiring the attributes of all items, and counting the quantity of the items corresponding to all the attributes in the item set;
acquiring expected frequency corresponding to the single attribute and the article type, and calculating a chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
and sorting the chi-square values to extract a predetermined number of attributes as basic features of the article types, and further adjusting the article type tree structure based on the basic features.
2. The method of claim 1, further comprising, prior to said determining the item types in response to the input of the item type range:
determining the category class of each article in each category grade according to the physical attribute of each article in the article information database, and constructing a category range table;
the determining the item type in response to the input of the item type range comprises:
in response to an input of an item class range, at least one item class corresponding to the item class range is determined using the class range table.
3. The method of claim 2, further comprising, prior to said determining at least one item class corresponding to said item class range using said class range table:
and acquiring a field corresponding to the item type range from a preset query field table, so as to extract at least one item type corresponding to the field from the item type range table.
4. The method of claim 1, further comprising, prior to said obtaining attributes of each item:
constructing an article inherent attribute table according to the physical attributes of each article in the article information database;
the obtaining of the attribute of each article further comprises:
and according to the identification of each article in the article set, acquiring the attribute from the inherent attribute table of the article to obtain the attribute of each article.
5. The method of claim 1, wherein said acquiring the expected frequency corresponding to the single attribute and the article type further comprises:
counting a first quantity of the articles in the article set, and acquiring the total quantity of the articles in an article information database and a second quantity of the articles corresponding to the single attribute;
inputting the first quantity, the second quantity, the item quantity and the total quantity into an expected frequency calculation formula to obtain the expected frequency corresponding to the single attribute and the item type.
6. The method of claim 1, wherein said adjusting the item class tree structure based on said base characteristics comprises:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
in the article type tree structure, acquiring a sub-type of the article type based on a depth-first search mode, and judging whether the score of the sub-type is greater than that of the article type; if so, retaining the sub-type; if it is smaller, removing the sub-type.
7. The method of claim 1, wherein said adjusting the item class tree structure based on said base characteristics comprises:
determining the sum of the chi-square values of the attributes in the basic features, dividing the sum of the chi-square values by the number of the attributes to obtain a chi-square average value, and taking the chi-square average value as the score of the article class;
and in the article type tree structure, acquiring all the sub-types of the article types based on a breadth-first search mode so as to eliminate the sub-types with the scores smaller than the scores of the article types.
8. An attribute feature extraction device, characterized by comprising:
the determining module is used for determining the article types in response to the input of the article type range, determining an article set corresponding to the article types, acquiring the attribute of each article, and counting the number of the articles corresponding to each attribute in the article set;
the calculation module is used for acquiring the expected frequency corresponding to the single attribute and the article type, and calculating the chi-square value of the single attribute to the article type according to the article quantity and the corresponding expected frequency;
and the adjusting module is used for sequencing the chi-square values, extracting attributes with a preset number as the basic features of the article types, and further adjusting the tree structure of the article types based on the basic features.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
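The steps recited in claims 7 and 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the independence-based formula for the expected frequency, the single-cell chi-square term, and the example data are all assumptions made for illustration; the claims above do not fix them.

```python
from collections import Counter, deque


def chi_square_by_attribute(category_items, all_items):
    """Chi-square term of each attribute with respect to one article type.

    Both arguments are lists of attribute sets, one set per article, and
    category_items is assumed to be a subset of all_items.
    ASSUMPTION: expected frequency = (articles in the type)
    * (articles carrying the attribute overall) / (all articles).
    """
    observed = Counter(a for attrs in category_items for a in attrs)
    overall = Counter(a for attrs in all_items for a in attrs)
    n_cat, n_all = len(category_items), len(all_items)
    chi = {}
    for attr, obs in observed.items():
        expected = n_cat * overall[attr] / n_all
        chi[attr] = (obs - expected) ** 2 / expected
    return chi


def basic_features(chi, top_k):
    """Sort attributes by chi-square value (descending) and keep top_k."""
    ranked = sorted(chi.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_k])


def category_score(features):
    """Chi-square average: sum of the values divided by the attribute count."""
    return sum(features.values()) / len(features) if features else 0.0


def prune_subtypes(root, children, scores):
    """Breadth-first walk over every sub-type below root.

    Returns the sub-types whose score is not smaller than the root score;
    all other sub-types are treated as eliminated.
    """
    kept = []
    queue = deque(children.get(root, []))
    while queue:
        node = queue.popleft()
        if scores[node] >= scores[root]:
            kept.append(node)
        queue.extend(children.get(node, []))
    return kept


# Hypothetical example data: four articles, two of them in the article type.
all_items = [{"red", "cotton"}, {"red"}, {"blue", "cotton"}, {"blue"}]
category_items = all_items[:2]

chi = chi_square_by_attribute(category_items, all_items)
features = basic_features(chi, top_k=1)
print(features, category_score(features))

# Hypothetical article type tree with precomputed scores.
children = {"clothing": ["shirts", "socks"], "shirts": ["tshirts"]}
scores = {"clothing": 1.0, "shirts": 1.5, "socks": 0.4, "tshirts": 1.0}
print(prune_subtypes("clothing", children, scores))
```

In this sketch a sub-type whose score equals the score of the article type is kept, because claim 7 only eliminates sub-types with strictly smaller scores. The walk also visits the descendants of an eliminated sub-type, which is one possible reading of "acquiring all the sub-types".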
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010136606.3A CN113362089A (en) 2020-03-02 2020-03-02 Attribute feature extraction method and device

Publications (1)

Publication Number Publication Date
CN113362089A (en) 2021-09-07

Family

ID=77523229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010136606.3A Pending CN113362089A (en) 2020-03-02 2020-03-02 Attribute feature extraction method and device

Country Status (1)

Country Link
CN (1) CN113362089A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765839A (en) * 2015-04-16 2015-07-08 湘潭大学 Data classifying method based on correlation coefficients between attributes
CN108932335A (en) * 2018-07-10 2018-12-04 北京京东尚科信息技术有限公司 A kind of method and apparatus generating official documents and correspondence
CN108932648A (en) * 2017-07-24 2018-12-04 上海宏原信息科技有限公司 A kind of method and apparatus for predicting its model of item property data and training
US10410224B1 (en) * 2014-03-27 2019-09-10 Amazon Technologies, Inc. Determining item feature information from user content
CN110348771A (en) * 2018-04-02 2019-10-18 北京京东尚科信息技术有限公司 The method and apparatus that a kind of pair of order carries out group list
CN110580649A (en) * 2018-06-08 2019-12-17 北京京东尚科信息技术有限公司 Method and device for determining potential value of commodity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
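Guo Miaomiao; Fan Mengfei; Zeng Zhiguo; Wen Meilin; Kang Rui: "Effectiveness evaluation of reliability simulation tests based on expert information fusion", Electronic Science and Technology (电子科学技术), no. 03, pages 314 - 323 *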

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination