CN113609112A - E-commerce commodity attribute data standardization processing method and system - Google Patents

E-commerce commodity attribute data standardization processing method and system

Publication number
CN113609112A
Authority
CN
China
Prior art keywords
attribute value
attribute
standard
original
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110879155.7A
Other languages
Chinese (zh)
Inventor
哈达
付飞
张勇
林斌
杨守斌
郑枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhidemai Technology Co ltd
Original Assignee
Beijing Zhidemai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhidemai Technology Co ltd
Priority to CN202110879155.7A
Publication of CN113609112A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G06Q30/0629 Directed, with specific intent or strategy for generating comparisons

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a system for standardizing e-commerce commodity attribute data. The method comprises the following steps: parsing the attribute data of a commodity and splitting it into an original attribute name and an original attribute value; mapping the original attribute name to a standard attribute name; and mapping the original attribute value to a standard attribute value. By standardizing commodity attribute data that differs from one e-commerce platform to another, the technical solution provided by the invention builds a unified, standardized set of e-commerce commodity attribute data, so that consumers can conveniently compare commodity attributes across e-commerce platforms, which improves their shopping experience.

Description

E-commerce commodity attribute data standardization processing method and system
Technical Field
The invention relates to software engineering technology, and in particular to the field of server-side development.
Background
With the rapid development and prosperity of e-commerce, online shopping has become an indispensable part of everyday life. Unlike traditional offline shopping, an online shopper cannot handle the physical commodity directly and can learn its characteristics only through information carriers such as pictures, text, and videos. Among the commodity description information on a page, the commodity attribute data aggregates the commodity's features and is a key reference for a consumer's purchase decision.
Guided by the idea of "comparing goods from three shops" (shopping around), consumers want to compare commodities on different e-commerce platforms in order to make the best choice. Because the commodity attribute data of different e-commerce platforms differs greatly in description style, data units, value domains, and so on, consumers cannot intuitively compare and analyze commodities across platforms. Resolving these differences between platforms and constructing a unified, standardized set of e-commerce commodity attribute data has therefore become a challenge.
Because attribute data is not uniform across e-commerce platforms, most industry applications of attribute data are limited to a single mall, that is, consumers can compare commodities only within one e-commerce platform. As for cross-mall applications of attribute data, most existing technical solutions build recommendation systems on top of attributes, and there is little research on methods for normalizing the data. For example, the multi-platform commodity attribute matching processing method and system of patent application No. CN202110004538.X uses user data and multi-mall attributes to match target users precisely; consumers passively receive the commodities recommended by the system but cannot actively compare commodities across e-commerce platforms.
Because existing cross-platform attribute data is not uniform and lacks a data standard, no single standard can be used to compare commodities across multiple e-commerce platforms. Consumers must spend a great deal of effort studying the commodity attribute conventions of different platforms, which makes for a poor shopping experience; enterprise operators must spend substantial labor on removing duplicate links to the same commodity across multiple e-commerce platforms, which creates a bottleneck in operating efficiency.
Disclosure of Invention
The invention provides a method and a system for standardizing e-commerce commodity attribute data, which can standardize the attribute data of different e-commerce platforms and resolve the problems found in data from different platforms, such as differing description styles, differing data units, differing value ranges, missing values, and erroneous values. A cross-platform attribute data mapping system is constructed to standardize the attribute data of multiple e-commerce platforms.
According to a first aspect of the embodiments of the present invention, there is provided a method for processing attribute data of an e-commerce commodity in a standardized manner, including:
analyzing the attribute data of the commodity, and splitting the attribute data into an original attribute name and an original attribute value;
mapping the original attribute name into a standard attribute name;
and mapping the original attribute value to a standard attribute value.
Further, before mapping the original attribute value to a standard attribute value, the method further includes:
dividing the attribute data into category data groups;
performing deduplication on the attribute names and the attribute values within each category data group;
and, based on the deduplicated attribute names and attribute values, determining and storing the standard attribute names and the correspondence between the standard attribute names and the attribute values by using the Delphi method.
Further, mapping the original attribute value to a standard attribute value specifically includes:
intercepting the original attribute value according to configured regular expression or character string interception logic;
matching the intercepted original attribute values with each standard attribute value according to the configured regular expression, and replacing the intercepted original attribute values with the matched standard attribute values;
processing the standard attribute value obtained after replacement according to the configured string interception rule to obtain a formatted standard attribute value;
and according to the data type of the original attribute value, using the logic of the preset matching rule of the original attribute value to process the formatted standard attribute value, and establishing the final mapping relation between the original attribute value and the standard attribute value.
Further, the matching rules include: numerical equivalence judgment, character string equality judgment, and range interval judgment;
processing the formatted standard attribute value by using the logic of the pre-configured matching rule of the original attribute value according to the data type of the original attribute value specifically includes:
performing a numerical equivalence judgment against the standard attribute values using the numeric part of the original attribute value, to obtain the standard attribute value whose numeric part equals that of the original attribute value; or
performing a character string equality judgment against the standard attribute values using the character string in the original attribute value, and matching the identical standard attribute value; or
performing a range interval judgment on the original attribute value, and matching the standard attribute value whose interval contains it.
According to a second aspect of the embodiments of the present invention, there is provided an e-commerce product attribute data normalization processing system, including:
the attribute data source analysis module is used for analyzing the attribute data of the commodity and splitting the attribute data into an original attribute name and an original attribute value;
the attribute name mapping configuration module is used for mapping the original attribute name into a standard attribute name;
and the attribute value mapping algorithm module is used for mapping the original attribute value into a standard attribute value.
Further, the system further comprises:
the standard attribute configuration module, which is specifically configured to: before the attribute value mapping algorithm module maps the original attribute value to a standard attribute value, divide the attribute data into category data groups; perform deduplication on the attribute names and the attribute values within each category data group; and, based on the deduplicated attribute names and attribute values, determine and store the standard attribute names and the correspondence between the standard attribute names and the attribute values by using the Delphi method.
Further, the attribute value mapping algorithm module specifically comprises:
the regular or character string intercepting unit is used for intercepting the original attribute value according to configured regular expression or character string intercepting logic;
the character string replacing unit is used for matching the intercepted original attribute values with the standard attribute values according to the configured regular expressions and replacing the intercepted original attribute values with the matched standard attribute values;
the formatting unit is used for processing the standard attribute value obtained after the replacement according to the configured string interception rule to obtain a formatted standard attribute value;
and the matching unit is used for processing the formatted standard attribute value by using the logic of the pre-configured matching rule of the original attribute value according to the data type of the original attribute value and establishing the final mapping relation between the original attribute value and the standard attribute value.
Further, the matching rules include: numerical equivalence judgment, character string equality judgment, and range interval judgment;
the matching unit is specifically configured to:
perform a numerical equivalence judgment against the standard attribute values using the numeric part of the original attribute value, to obtain the standard attribute value whose numeric part equals that of the original attribute value; or
perform a character string equality judgment against the standard attribute values using the character string in the original attribute value, and match the identical standard attribute value; or
perform a range interval judgment on the original attribute value, and match the standard attribute value whose interval contains it.
According to a third aspect of the embodiments of the present invention, there is provided a terminal device, including:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
According to a fourth aspect of embodiments of the present invention, there is provided a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method as described above.
According to the technical scheme provided by the invention, a set of uniform and standard E-commerce commodity attribute data can be constructed by standardizing the commodity attribute data with the difference of each E-commerce platform, so that the consumer can conveniently compare the commodity attributes of the cross-E-commerce platform, and the shopping experience of the consumer is effectively improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 is a flowchart illustrating a method for standardizing e-commerce commodity attribute data according to an exemplary embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for standardizing e-commerce commodity attribute data according to another exemplary embodiment of the present invention;
FIG. 3 is a block diagram illustrating the structure of an e-commerce commodity attribute data standardization processing system according to an exemplary embodiment of the present invention;
FIG. 4 is a schematic diagram of the standard attribute list page;
FIG. 5 is a schematic diagram of the attribute name relationship list page;
FIG. 6 is a schematic diagram of the attribute rule configuration list page;
FIG. 7 is a schematic diagram of the map mapping list page;
FIG. 8 is a schematic diagram of the b2c attribute list page;
FIG. 9 is a schematic diagram of the attribute mapping rule configuration page;
FIG. 10 is a diagram of the incremental data flow design.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that, although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The technical solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a method for processing attribute data of an e-commerce commodity according to an exemplary embodiment of the present invention.
Referring to fig. 1, the method includes:
11. Analyzing the attribute data of the commodity, and splitting the attribute data into an original attribute name and an original attribute value;
Specifically, in this step, the commodity attribute data obtained from the major e-commerce platforms is parsed, split into key-value pairs of attribute name and attribute value, and stored.
12. Mapping the original attribute name into a standard attribute name;
Specifically, in this step, a mapping relationship between the original attribute name and the standard attribute name may be established along two dimensions: the mall and the commodity classification standard.
13. Mapping the original attribute value to a standard attribute value.
Specifically, in this step, a mapping rule algorithm for original attribute values is configured. Depending on the data type of the attribute value, mapping is performed by regular expression, numerical matching, interval matching, and the like, so that the algorithm maps the original attribute value to a standard attribute value.
Using the correspondences generated by the mapping algorithm, a script batch-refreshes the original attribute data stored in the database that has not yet been mapped, thereby normalizing the attribute data. When incremental attribute data appears, steps 12 and 13 are iterated to keep maintaining the established mapping relationships.
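Steps 11 to 13 and the batch refresh can be illustrated with a minimal sketch. It is not the implementation of the invention: the in-memory dictionaries NAME_MAP and VALUE_MAP, the mall and category labels, and the function names are all hypothetical stand-ins for the database tables and scripts described in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the mapping relations kept in the database.
# (mall, category, original attribute name) -> standard attribute name
NAME_MAP: Dict[Tuple[str, str, str], str] = {
    ("mall_a", "phone", "memory size"): "memory capacity",
    ("mall_b", "phone", "memory"): "memory capacity",
}
# (standard attribute name, original attribute value) -> standard attribute value
VALUE_MAP: Dict[Tuple[str, str], str] = {
    ("memory capacity", "8G"): "8GB",
    ("memory capacity", "8 GB"): "8GB",
}


def split_attributes(raw: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 11: split attribute data into (original name, original value) pairs."""
    return [(name.strip(), value.strip()) for name, value in raw.items()]


def standardize(mall: str, category: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Steps 12 and 13: map names, then values; unmapped entries are skipped."""
    result: Dict[str, str] = {}
    for name, value in split_attributes(raw):
        std_name = NAME_MAP.get((mall, category, name))
        if std_name is None:
            continue  # name not yet bound; left for a later iteration of step 12
        std_value = VALUE_MAP.get((std_name, value))
        if std_value is not None:
            result[std_name] = std_value
    return result


def batch_refresh(records: List[Tuple[str, str, Dict[str, str]]]) -> List[Dict[str, str]]:
    """Batch-apply the current mappings to stored (mall, category, attributes) records."""
    return [standardize(mall, category, raw) for mall, category, raw in records]
```

Under these assumptions, records from two malls that describe the same property with different names and spellings end up with the same standard name and value; entries that have no mapping yet are simply left out until an operator configures them.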
According to the technical scheme provided by the embodiment of the invention, a set of uniform and standard commodity attribute data of the E-commerce can be constructed by standardizing the commodity attribute data with the difference of each E-commerce platform, so that the commodity attribute comparison of the cross-E-commerce platform is convenient for consumers, and the shopping experience of the consumers is effectively improved.
Alternatively, as an embodiment of the present invention, as shown in fig. 2, the method includes:
21. Analyzing the attribute data of the commodity, and splitting the attribute data into an original attribute name and an original attribute value;
22. Dividing the attribute data into category data groups, screening and sorting out, for each category data group, the correspondence between the standard attribute names and the attribute values, and storing the correspondence;
Specifically, when the category data groups are divided, the attribute data of large-scale e-commerce platforms, such as Jingdong (JD.com) and Tianmao (Tmall), can be used as a reference.
The screening and sorting process in step 22 may specifically be: first, performing deduplication on the attribute names and the attribute values within each category data group; then, based on the deduplicated attribute names and attribute values, determining the standard attribute names and the correspondence between the standard attribute names and the attribute values by using the Delphi method.
23. Mapping the original attribute name into a standard attribute name;
24. Mapping the original attribute value to a standard attribute value.
Optionally, in this embodiment, step 13 specifically includes:
131. intercepting the original attribute value according to configured regular expression or character string interception logic;
132. matching the intercepted original attribute values with each standard attribute value according to the configured regular expression, and replacing the intercepted original attribute values with the matched standard attribute values;
133. processing the standard attribute value obtained after replacement according to the configured string interception rule to obtain a formatted standard attribute value;
134. and according to the data type of the original attribute value, using the logic of the preset matching rule of the original attribute value to process the formatted standard attribute value, and establishing the final mapping relation between the original attribute value and the standard attribute value.
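Steps 131 to 133 can be sketched as a small configurable rule, shown below as an illustration only. The ValueRule structure, its field names, and the use of character stripping as the formatting step are assumptions made for this sketch; the text only states that interception, replacement, and formatting are configured.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ValueRule:
    """Hypothetical rule record; field names are illustrative, not from the source."""
    intercept_pattern: str               # step 131: regex whose first group is kept
    replacements: List[Tuple[str, str]]  # step 132: (regex, standard attribute value)
    strip_chars: str = " "               # step 133: characters trimmed when formatting


def apply_rule(raw_value: str, rule: ValueRule) -> Optional[str]:
    # Step 131: intercept (extract) the relevant fragment of the original value.
    found = re.search(rule.intercept_pattern, raw_value, flags=re.IGNORECASE)
    if found is None:
        return None
    fragment = found.group(1)
    # Step 132: replace the fragment with the first standard value whose regex matches.
    for pattern, standard in rule.replacements:
        if re.fullmatch(pattern, fragment, flags=re.IGNORECASE):
            fragment = standard
            break
    # Step 133: format the result (here simply trimming configured characters).
    return fragment.strip(rule.strip_chars)


memory_rule = ValueRule(
    intercept_pattern=r"(\d+\s*GB?)",
    replacements=[(r"8\s*GB?", "8GB"), (r"16\s*GB?", "16GB")],
)
```

With this rule, apply_rule("Memory: 8 G (LPDDR5)", memory_rule) and apply_rule("16gb RAM", memory_rule) yield "8GB" and "16GB" respectively, and a value without a recognizable fragment yields None. The result of these three steps is then handed to the matching of step 134.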
Optionally, in this embodiment, the matching rules include: numerical equivalence judgment, character string equality judgment, and range interval judgment;
Step 134 specifically includes:
performing a numerical equivalence judgment against the standard attribute values using the numeric part of the original attribute value, to obtain the standard attribute value whose numeric part equals that of the original attribute value; or
performing a character string equality judgment against the standard attribute values using the character string in the original attribute value, and matching the identical standard attribute value; or
performing a range interval judgment on the original attribute value, and matching the standard attribute value whose interval contains it.
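The three matching rules of step 134 can be sketched as follows. This is an illustrative reading, not the patented algorithm: the function names are hypothetical, the first number found in a value is taken as its numeric part, and standard interval values are assumed to be written as "low-high" followed by an optional unit.

```python
import re
from typing import List, Optional

_NUMBER = r"\d+(?:\.\d+)?"


def _first_number(text: str) -> Optional[float]:
    """Return the first number in the text, or None if there is none."""
    found = re.search(_NUMBER, text)
    return float(found.group()) if found else None


def match_numeric(original: str, standards: List[str]) -> Optional[str]:
    """Numerical equivalence: the numeric parts of both values must be equal."""
    target = _first_number(original)
    if target is None:
        return None
    for std in standards:
        if _first_number(std) == target:
            return std
    return None


def match_string(original: str, standards: List[str]) -> Optional[str]:
    """String equality: the trimmed original must equal a standard value exactly."""
    cleaned = original.strip()
    return cleaned if cleaned in standards else None


def match_range(original: str, standards: List[str]) -> Optional[str]:
    """Range interval: pick the standard value whose 'low-high' interval contains the number."""
    value = _first_number(original)
    if value is None:
        return None
    for std in standards:
        bounds = re.match(rf"\s*({_NUMBER})\s*-\s*({_NUMBER})", std)
        if bounds and float(bounds.group(1)) <= value <= float(bounds.group(2)):
            return std
    return None
```

Which of the three functions applies would be chosen per attribute according to the data type of its values, for example numeric matching for capacities, string matching for colors, and interval matching for screen-size bands.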
Fig. 3 is a block diagram illustrating a structure of an e-commerce merchandise attribute data normalization processing system according to an exemplary embodiment of the present invention.
Referring to fig. 3, the system includes:
the attribute data source analysis module is used for analyzing the attribute data of the commodity and splitting the attribute data into an original attribute name and an original attribute value;
the attribute name mapping configuration module is used for mapping the original attribute name into a standard attribute name;
and the attribute value mapping algorithm module is used for mapping the original attribute value into a standard attribute value.
Optionally, in this embodiment, the system further includes:
the standard attribute configuration module, which is specifically configured to: before the attribute value mapping algorithm module maps the original attribute value to a standard attribute value, divide the attribute data into category data groups; perform deduplication on the attribute names and the attribute values within each category data group; and, based on the deduplicated attribute names and attribute values, determine and store the standard attribute names and the correspondence between the standard attribute names and the attribute values by using the Delphi method.
Optionally, in this embodiment, the attribute value mapping algorithm module specifically comprises:
the regular or character string intercepting unit is used for intercepting the original attribute value according to configured regular expression or character string intercepting logic;
the character string replacing unit is used for matching the intercepted original attribute values with the standard attribute values according to the configured regular expressions and replacing the intercepted original attribute values with the matched standard attribute values;
the formatting unit is used for processing the standard attribute value obtained after the replacement according to the configured string interception rule to obtain a formatted standard attribute value;
and the matching unit is used for processing the formatted standard attribute value by using the logic of the pre-configured matching rule of the original attribute value according to the data type of the original attribute value and establishing the final mapping relation between the original attribute value and the standard attribute value.
Optionally, in this embodiment, the matching rules include: numerical equivalence judgment, character string equality judgment, and range interval judgment;
the matching unit is specifically configured to:
perform a numerical equivalence judgment against the standard attribute values using the numeric part of the original attribute value, to obtain the standard attribute value whose numeric part equals that of the original attribute value; or
perform a character string equality judgment against the standard attribute values using the character string in the original attribute value, and match the identical standard attribute value; or
perform a range interval judgment on the original attribute value, and match the standard attribute value whose interval contains it.
With regard to the system in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The following describes each module related to the technical solution of the present invention in a specific implementation order according to a specific embodiment. The function of the data table involved is as follows:
[Table reproduced as images in the original publication: functions of the data tables involved]
First, the attribute back-end web service module
The attribute back-end web service module provides a web service through which attribute mapping operators interact with the attribute database. It provides the following pages: the standard attribute list page, the attribute name relationship list page, the attribute rule configuration list page, the map mapping list page, and the b2c attribute list page, which are introduced in turn below:
A. Develop the standard attribute list page (as shown in FIG. 4): it provides add, delete, modify, and configure functions for the relationship between standard attribute names and standard attribute values, and provides a query report of that relationship.
B. Develop the attribute name relationship list page (as shown in FIG. 5): it provides a query report of the mapping relationship between original attribute names and standard attribute names along the dimensions of classification, mapping state, and mall; it provides add, delete, modify, and configure functions for the mapping between original attribute names and standard attribute names; and it provides a configuration function for the judgment algorithm applied to attribute values, which includes character string equality judgment, numerical equivalence judgment, and range interval judgment.
C. Develop the attribute rule configuration list page (as shown in FIG. 6): it provides a configuration function for the character string matching algorithm used in the attribute value mapping relationship, in which three logic components can be configured: string interception logic, replacement logic, and formatting logic.
D. Develop the map mapping list page (as shown in FIG. 7): it provides add, delete, modify, and configure functions for the mapping relationship between original attribute values and standard attribute values.
E. Develop the b2c attribute list page (as shown in FIG. 8): it provides a query report of the commodity attribute data after attribute data normalization has been completed.
Second, the attribute data source analysis module
A. The attribute name and attribute value data stored in JSON format in the spec_detail field is acquired from the attribute data schedule_spec_info, parsed into key-value pairs of attribute name and attribute value, and stored in attrmap_b2c_spec.
B. The mall information field mall_id and the commodity classification standard dimension field map_cat3_id are obtained from the attribute data schedule_spec_info and stored in attrmap_b2c_spec.
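The parsing performed by this module can be sketched as follows. The field names spec_detail, mall_id, and map_cat3_id come from the description above; the assumption that spec_detail holds a flat JSON object, and the output keys orig_key and orig_value, are hypothetical choices made for illustration.

```python
import json
from typing import Any, Dict, List


def parse_spec_row(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one source row into one key-value record per attribute.

    Assumes row["spec_detail"] is a JSON object such as
    {"memory size": "8G"}; the real layout is not given in the text.
    """
    specs = json.loads(row["spec_detail"])
    return [
        {
            "mall_id": row["mall_id"],          # mall dimension
            "map_cat3_id": row["map_cat3_id"],  # commodity classification dimension
            "orig_key": str(name).strip(),      # original attribute name (hypothetical column)
            "orig_value": str(value).strip(),   # original attribute value (hypothetical column)
        }
        for name, value in specs.items()
    ]


source_row = {
    "mall_id": 1,
    "map_cat3_id": 101,
    "spec_detail": '{"memory size": "8G", "color": " black "}',
}
records = parse_spec_row(source_row)
```

Each resulting record carries both dimensions needed later for attribute name mapping, so it could be written to a table such as attrmap_b2c_spec one row per attribute.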
Third, standard attribute configuration module
A. The attribute data of the two platforms, JD.com (Jingdong) and Tmall (Tianmao), stored in attrmap_b2c_spec is divided into category data groups according to the map_cat3_id field.
B. The attribute names and attribute values in each category data group are deduplicated.
C. The category data groups are distributed to a group of experts with operations experience to evaluate the standard attributes, and the standard attribute names and the correspondence between standard attribute names and attribute values are determined using the Delphi method.
D. The relationship is saved through the standard attribute list page and persisted to the standard attribute table attrmap_base_spec.
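Steps A and B of this module (grouping by category and deduplication) might look like the sketch below. The row layout and the names orig_key and orig_value are illustrative assumptions; the expert review in step C is a manual process and is not modelled.

```python
from collections import defaultdict


def group_and_dedupe(rows):
    """Group rows by map_cat3_id and keep the distinct values of each attribute name."""
    groups = defaultdict(lambda: defaultdict(set))
    for row in rows:
        groups[row["map_cat3_id"]][row["orig_key"]].add(row["orig_value"])
    # Sorted lists give the expert group a stable sheet to review.
    return {
        category: {name: sorted(values) for name, values in names.items()}
        for category, names in groups.items()
    }


rows = [
    {"map_cat3_id": 1001, "orig_key": "memory size", "orig_value": "2G"},
    {"map_cat3_id": 1001, "orig_key": "memory size", "orig_value": "2G"},
    {"map_cat3_id": 1001, "orig_key": "memory", "orig_value": "4G"},
    {"map_cat3_id": 2002, "orig_key": "thickness", "orig_value": "19mm"},
]
category_groups = group_and_dedupe(rows)
print(category_groups)
```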
Fourth, attribute name mapping configuration module
A. The original attribute name data stored in attrmap_b2c_spec is acquired and compared with the attribute names of the same category in the standard attribute table attrmap_base_spec, and a mapping relationship is established between attribute names that have the same meaning.
The attribute name mapping here can be performed manually, that is, the relationship is confirmed by human identification. For example, "memory capacity" and "memory size" are two attribute names with the same meaning. If "memory capacity" is defined as the standard in the standard attribute name table, then the differently named attribute names with the same meaning on the various e-commerce platforms, such as "memory size" and "memory", need to be bound to the standard attribute name "memory capacity".
B. The data relationship from A is saved, through the attribute name relationship list page, to the mapping relationship table of original attribute names and standard attribute names, attrmap_base_orig_key_relationship.
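Once the bindings have been confirmed, looking up a standard attribute name reduces to a keyed lookup. The sketch below assumes the relationship table can be held as a dictionary keyed by category and original name; that layout, and the category id 1001, are illustrative only.

```python
# (map_cat3_id, original attribute name) -> standard attribute name.
# Hypothetical in-memory form of attrmap_base_orig_key_relationship.
NAME_RELATION = {
    (1001, "memory size"): "memory capacity",
    (1001, "memory"): "memory capacity",
    (1001, "memory capacity"): "memory capacity",
}


def to_standard_name(map_cat3_id, orig_key):
    """Return the bound standard name, or None when no binding exists yet."""
    return NAME_RELATION.get((map_cat3_id, orig_key.strip()))


print(to_standard_name(1001, "memory size"))
print(to_standard_name(1001, "colour"))
```

Returning None for an unbound name keeps unmapped attributes visible, so they can be surfaced on the list page under the "mapping state" dimension.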
Fifth, attribute value mapping algorithm module
A. The attribute rule configuration list page (as shown in FIG. 9) is configured; it is divided into three components: 1. regular expression or string interception; 2. string replacement; 3. formatting.
B. The original attribute value X is intercepted according to the logic in the "regular expression or string interception" component to obtain the intercepted original attribute value X1.
For example: the original attribute value of the resolution attribute is "FHD+ 2400×1080", and interception is performed using the regular expression "\ d + ([ X | \ X | × ] | \ s [ X | \ | X | × ] \ s) \ d +" to obtain "2400×1080".
C. X1 is replaced by applying the hierarchical regular expressions in the "string replacement" component in sequence, and the replaced original attribute value X2 is output.
For example: the "2400×1080" obtained above is tested in turn against the regular expressions for the four standard attribute values "QHD+ and above", "high definition HD+", "full high definition screen (1920×1080)", and "ultra high definition screen (2K/2.5K/3K/4K)"; when an expression matches, the value is replaced with the corresponding standard attribute value. The regular expressions are, in turn: "(1 [2] [0-7] [0-9] |1[0-1] [0-9] [0-9] |9[6-9] [0-9]) (\\ W +)? ([ X | \\ X | × ] | \ s [ X | \\ X | × ] \ s) (\ d +), "(1 [9] [0-1] [0-9] |1[2-9] [0-9] [0-9] |12[8-9] [0-9]) (\\ W +)? ([ X | \\ X | × ] | \ s [ X | \\ X | × ] | s) (\ d +), "(204 [0-7] |20[0-3] [0-9] |19[2-9] [0-9]) ([ X | X | × ] | s [ X | X | × ]. s) (\\ d +)", "([ X-9 ] [0-9] [0-9] [0-9] |20[5-9] [0-9] |204[8-9]) ([ X | \\\\\ X | X | X | 9] [0-9] | s [ X | 5-9] |204[8-9] (" K \\\\\\\\\/K +). Finally, "2400×1080" is replaced with "ultra high definition screen (2K/2.5K/3K/4K)".
D. X2 is processed according to the string interception rule configured in the formatting component to obtain the final formatted attribute value X3.
For example: the replaced value "ultra high definition screen (2K/2.5K/3K/4K)" is reformatted; a regular expression intercepts it and replaces it with "ultra high definition screen".
E. According to the matching rule configured in the attribute name relationship list page and the data type of the attribute value, X3 is processed using numerical equality judgment, character string equality judgment, or range interval judgment, and the final mapping relationship between the original attribute value and the standard attribute value is established.
The three judgment logics described above are exemplified below: 1. Numerical equality judgment: for the original attribute value "2G" of the memory capacity attribute, the numeric part 2 is compared for equality against the standard attribute values, the equality 2 = 2 is found, and the value is mapped to the standard attribute value "2GB". 2. Character string equality judgment: the original attribute value "Qualcomm Snapdragon" is compared for exact string equality and is finally matched to the standard attribute value "Qualcomm Snapdragon". 3. Range interval judgment: the original attribute value "19mm" of the thickness attribute is compared against the upper and lower bounds of each interval and is finally matched to the standard value "18.1 mm-20.0 mm".
F. The mapping relationship established in E is persisted to the mapping relationship table attrmap_map_spec by a script.
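Steps B through E can be illustrated end to end with the sketch below. The regular expressions here are simplified patterns written for this illustration, not the configured expressions quoted above; only two of the four replacement rules are modelled, and all function names are hypothetical.

```python
import re

# Assumed interception pattern: two integers joined by x, X, * or ×.
INTERCEPT = re.compile(r"\d+\s?[xX*×]\s?\d+")

# Assumed, simplified replacement rules, tried in order (first match wins).
REPLACE_RULES = [
    (re.compile(r"^(19[2-9]\d|20[0-3]\d|204[0-7])\s?[xX*×]\s?\d+$"),
     "full high definition screen (1920×1080)"),
    (re.compile(r"^(204[89]|20[5-9]\d|2[1-9]\d\d|[3-9]\d{3})\s?[xX*×]\s?\d+$"),
     "ultra high definition screen (2K/2.5K/3K/4K)"),
]

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def intercept(x):
    """Step B: keep only the part of X matched by the interception pattern."""
    m = INTERCEPT.search(x)
    return m.group(0) if m else x


def replace(x1):
    """Step C: return the standard value of the first rule that matches X1."""
    for pattern, standard in REPLACE_RULES:
        if pattern.match(x1):
            return standard
    return x1


def fmt(x2):
    """Step D: drop a trailing parenthesised note from X2."""
    return re.sub(r"\s*\(.*\)$", "", x2)


def numeric_equal(x3, standards):
    """Step E, rule 1: match on the first number found, e.g. 2G -> 2GB."""
    n = NUMBER.search(x3)
    for s in standards:
        m = NUMBER.search(s)
        if n and m and float(n.group(0)) == float(m.group(0)):
            return s
    return None


def string_equal(x3, standards):
    """Step E, rule 2: match only on an identical string."""
    return x3 if x3 in standards else None


def in_range(x3, standards):
    """Step E, rule 3: match the interval whose two bounds contain the number."""
    n = NUMBER.search(x3)
    for s in standards:
        bounds = NUMBER.findall(s)
        if n and len(bounds) == 2 and float(bounds[0]) <= float(n.group(0)) <= float(bounds[1]):
            return s
    return None


x3 = fmt(replace(intercept("FHD+ 2400×1080")))
print(x3)
print(numeric_equal("2G", ["2GB", "4GB"]))
print(in_range("19mm", ["16.1mm-18.0mm", "18.1mm-20.0mm"]))
```

Keeping the replacement rules as an ordered list mirrors the "in sequence" wording of step C: the order in which rules are configured decides which standard value wins when ranges overlap.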
Sixth, full attribute data mapping module
A. A Python mapping script is called which, according to the mapping relationships established in the mapping relationship table attrmap_map_spec, maps the original attribute names and original attribute values stored in attrmap_b2c_spec, thereby normalizing the attribute data.
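A mapping pass of this kind could be sketched as below. The dictionary forms of the two mapping tables, and the field names orig_key, orig_value, std_key, and std_value, are assumptions made for illustration rather than the embodiment's schema.

```python
def normalize(rows, name_relation, value_map):
    """Attach the standard name and value to each parsed row; None marks a gap."""
    result = []
    for row in rows:
        std_key = name_relation.get((row["map_cat3_id"], row["orig_key"]))
        # A value can only be mapped once its attribute name has been mapped.
        std_value = value_map.get((std_key, row["orig_value"])) if std_key else None
        result.append({**row, "std_key": std_key, "std_value": std_value})
    return result


name_relation = {(1001, "memory size"): "memory capacity"}
value_map = {("memory capacity", "2G"): "2GB"}
rows = [
    {"map_cat3_id": 1001, "orig_key": "memory size", "orig_value": "2G"},
    {"map_cat3_id": 1001, "orig_key": "colour", "orig_value": "red"},
]
normalized = normalize(rows, name_relation, value_map)
for item in normalized:
    print(item)
```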
Seventh, incremental attribute data mapping module
The updating process for incremental attribute data is shown in FIG. 10 and includes the following steps:
A. A redis queue is initialized as a container for data change events.
B. An attribute rule update data interface is encapsulated. The front end requests this interface whenever an add, modify, or delete operation is performed on the attribute name relationship list page or the attribute rule configuration list page; the interface generates an event according to the data and the type of the operation and writes it into the redis queue.
C. A timed task is created that writes yesterday's incremental original attribute data from the attribute data scheduling table schedule_spec_info into the redis queue, generating events.
D. The events in the redis queue are consumed and their type is judged; events are divided into attribute data events and configuration data events. If the event is an attribute data event, the program calls the attribute data source analysis module (the second module) to update the attrmap_b2c_spec table and resets the corresponding mapping relationship in the attrmap_map_spec table. If the event is a configuration data event, the corresponding mapping relationships in the tables attrmap_map_spec and attrmap_b2c_spec are reset.
E. A timed task is set to run once every 10 minutes and re-establishes the mapping relationships reset in D using the Python mapping script of the sixth module.
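The event flow in steps B through E can be sketched as follows. To keep the example self-contained, an in-process collections.deque stands in for the redis queue, plain dictionaries stand in for the two tables, and the event fields and key format are hypothetical.

```python
from collections import deque

events = deque()  # stand-in for the redis queue of step A


def publish(event_type, key, value=None):
    """Steps B/C: write an attribute-data or configuration-data event."""
    events.append({"type": event_type, "key": key, "value": value})


def consume(b2c_spec, map_spec):
    """Step D: drain the queue and reset the mappings each event touches."""
    while events:
        event = events.popleft()
        if event["type"] == "attribute_data":
            b2c_spec[event["key"]] = event["value"]  # refresh the parsed source row
        map_spec.pop(event["key"], None)  # both event types reset the mapping


def rebuild(b2c_spec, map_spec, mapper):
    """Step E: re-establish every mapping that is currently reset."""
    for key, value in b2c_spec.items():
        if key not in map_spec:
            map_spec[key] = mapper(value)


b2c_spec = {"sku1/memory": "2G"}
map_spec = {"sku1/memory": "2GB"}
publish("attribute_data", "sku1/memory", "4G")
consume(b2c_spec, map_spec)
rebuild(b2c_spec, map_spec, lambda v: v + "B")  # toy mapper for the demo
print(map_spec)
```

Separating the reset (on consumption) from the rebuild (on the timed task) means a burst of configuration changes invalidates mappings immediately but pays the remapping cost only once per round.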
After the commodity attribute data was standardized based on the technical scheme of the embodiment of the invention, an A/B test was carried out against a control group. For internal enterprise operations, operator efficiency rose from 50 working units/day to 200 working units/day, an improvement of 200%. The time consumers spent comparing the specification parameters of two commodities fell by 80% on average, effectively improving the shopping experience.
The method according to the invention may also be implemented as a computing device comprising a memory and a processor.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include various types of storage units, such as system memory, read-only memory (ROM), and a persistent storage device. The ROM may store static data or instructions required by the processor or other modules of the computer. The persistent storage device may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is used as the persistent storage device. In other embodiments, the persistent storage device may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write storage device or a volatile read-write storage device, such as dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be employed. In some embodiments, the memory may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), or a magnetic floppy disk. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted wirelessly or by wire.
The memory has stored thereon executable code which, when executed by the processor, causes the processor to perform some or all of the methods described above.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out some or all of the steps of the above-described method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform part or all of the various steps of the above-described method according to the invention.
The aspects of the invention have been described in detail hereinabove with reference to the drawings. In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. Those skilled in the art should also appreciate that the acts and modules referred to in the specification are not necessarily required by the invention. In addition, it can be understood that the steps in the method according to the embodiment of the present invention may be sequentially adjusted, combined, and deleted according to actual needs, and the modules in the device according to the embodiment of the present invention may be combined, divided, and deleted according to actual needs.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A standardized processing method for E-commerce commodity attribute data is characterized by comprising the following steps:
analyzing the attribute data of the commodity, and splitting the attribute data into an original attribute name and an original attribute value;
mapping the original attribute name into a standard attribute name;
and mapping the original attribute value to a standard attribute value.
2. The method of claim 1, further comprising, prior to said mapping the original attribute value to a standard attribute value:
dividing the attribute data into category data groups;
carrying out deduplication processing on the attribute names and the attribute values in each category data group;
and determining and storing, by using a Delphi method based on the attribute names and the attribute values after the deduplication processing, the standard attribute name and the correspondence between the standard attribute name and the attribute value.
3. The method according to claim 1 or 2, wherein mapping the original attribute value to a standard attribute value specifically comprises:
intercepting the original attribute value according to configured regular expression or character string interception logic;
matching the intercepted original attribute values with each standard attribute value according to the configured regular expression, and replacing the intercepted original attribute values with the matched standard attribute values;
processing the standard attribute value obtained after replacement according to the configured string interception rule to obtain a formatted standard attribute value;
and according to the data type of the original attribute value, using the logic of the preset matching rule of the original attribute value to process the formatted standard attribute value, and establishing the final mapping relation between the original attribute value and the standard attribute value.
4. The method of claim 3, wherein the matching rule comprises: numerical equality judgment, character string equality judgment, and range interval judgment;
the processing the formatted standard attribute value by using a logic of a pre-configured matching rule of the original attribute value according to the data type of the original attribute value specifically includes:
performing numerical equality judgment against the standard attribute values according to the numeric part of the original attribute value, to obtain the standard attribute value whose numeric part is equal to the numeric part of the original attribute value; or
performing character string equality judgment against the standard attribute values according to the character string in the original attribute value, and matching the original attribute value to the equal standard attribute value; or
performing range interval judgment according to the original attribute value, and matching the original attribute value to the standard attribute value whose interval contains it.
5. An e-commerce commodity attribute data normalization processing system, comprising:
the attribute data source analysis module is used for analyzing the attribute data of the commodity and splitting the attribute data into an original attribute name and an original attribute value;
the attribute name mapping configuration module is used for mapping the original attribute name into a standard attribute name;
and the attribute value mapping algorithm module is used for mapping the original attribute value into a standard attribute value.
6. The system of claim 5, further comprising:
the standard attribute configuration module is specifically configured to: before the attribute value mapping algorithm module maps the original attribute value into a standard attribute value, divide the attribute data into category data groups; carry out deduplication processing on the attribute names and the attribute values in each category data group; and determine and store, by using the Delphi method based on the deduplicated attribute names and attribute values, the standard attribute name and the correspondence between the standard attribute name and the attribute value.
7. The system according to claim 5 or 6, wherein the attribute value mapping algorithm module specifically comprises:
the regular or character string intercepting unit is used for intercepting the original attribute value according to configured regular expression or character string intercepting logic;
the character string replacing unit is used for matching the intercepted original attribute values with the standard attribute values according to the configured regular expressions and replacing the intercepted original attribute values with the matched standard attribute values;
the formatting unit is used for processing the standard attribute value obtained after the replacement according to the configured string interception rule to obtain a formatted standard attribute value;
and the matching unit is used for processing the formatted standard attribute value by using the logic of the pre-configured matching rule of the original attribute value according to the data type of the original attribute value and establishing the final mapping relation between the original attribute value and the standard attribute value.
8. The system of claim 7, wherein the matching rule comprises: numerical equality judgment, character string equality judgment, and range interval judgment;
the matching unit is specifically configured to:
perform numerical equality judgment against the standard attribute values according to the numeric part of the original attribute value, to obtain the standard attribute value whose numeric part is equal to the numeric part of the original attribute value; or
perform character string equality judgment against the standard attribute values according to the character string in the original attribute value, and match the original attribute value to the equal standard attribute value; or
perform range interval judgment according to the original attribute value, and match the original attribute value to the standard attribute value whose interval contains it.
9. A terminal device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-4.
10. A non-transitory machine-readable storage medium having executable code stored thereon, wherein the executable code, when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-4.
CN202110879155.7A 2021-08-02 2021-08-02 E-commerce commodity attribute data standardization processing method and system Pending CN113609112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110879155.7A CN113609112A (en) 2021-08-02 2021-08-02 E-commerce commodity attribute data standardization processing method and system

Publications (1)

Publication Number Publication Date
CN113609112A true CN113609112A (en) 2021-11-05

Family

ID=78338988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110879155.7A Pending CN113609112A (en) 2021-08-02 2021-08-02 E-commerce commodity attribute data standardization processing method and system

Country Status (1)

Country Link
CN (1) CN113609112A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063211A (en) * 2022-08-16 2022-09-16 华能能源交通产业控股有限公司 Method and device for acquiring commodity attribute data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063211A (en) * 2022-08-16 2022-09-16 华能能源交通产业控股有限公司 Method and device for acquiring commodity attribute data
CN115063211B (en) * 2022-08-16 2022-11-11 华能能源交通产业控股有限公司 Method and device for acquiring commodity attribute data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination