WO2023247730A1

WO2023247730A1 - System and method of optimizing digital catalogs for online marketplaces

Info

Publication number: WO2023247730A1
Application number: PCT/EP2023/067042
Authority: WO
Inventors: Lorenzo GUGLIELMI; Giovanni Guardalben
Original assignee: Kipcast S.R.L.
Priority date: 2022-06-23
Filing date: 2023-06-22
Publication date: 2023-12-28

Abstract

A system and a method for generating a refined digital catalog for publishing on a target digital marketplace are provided. An initial digital catalog is received from a catalog source. The initial digital catalog includes a plurality of products offered by a merchant and product information associated therewith. One or more product entities and product attribute entities associated with each of the products are extracted based on the product information. One or more standardized intermediary product categories, according to a standardized intermediary taxonomy, are assigned to the products based on the extracted product entities and product attribute entities. Furthermore, target categories within a target taxonomy used by the target digital marketplace are identified for assignment to each product based on a mapping between the standardized intermediary taxonomy and the target taxonomy. A refined catalog is generated by assigning the identified target categories to each of the products.

Description

SYSTEM AND METHOD OF OPTIMIZING DIGITAL CATALOGS FOR ONLINE MARKETPLACES

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from Italian Patent Application No. 102022000013309 filed on June 23, 2022, and U.S. Provisional Patent Application No. 63/420,200 filed on October 28, 2022, the contents of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

[0002] The present disclosure generally relates to digital catalogs used in e-commerce, and more particularly, relates to a system and method for optimizing digital catalogs for target digital marketplaces.

BACKGROUND

[0003] E-commerce has gained immense popularity over the past couple of decades, and more and more consumers now turn to digital marketplaces instead of visiting brick-and-mortar stores. These digital marketplaces not only provide convenience for their customers and merchants, but also make it easier for merchants to sell their products and/or services across countries. Unsurprisingly, every merchant wants to increase their visibility on these digital marketplace platforms to improve their sales. To do so, merchants typically build digital catalogs providing various details of all the products and/or services they offer and publish these digital catalogs on various digital marketplaces. When a consumer searches for a product and/or service online, the relevance of the search results largely depends on how well the products are categorized within the catalog. Thus, with effective categorization, the merchant’s products and/or services are more likely to be found in response to consumer searches.

[0004] To achieve effective categorization, most merchants rely on manually categorizing their products to match the product categorization used by the target marketplace on which they wish to publish their catalog. Further, although tools have been developed to categorize products and services in a digital catalog, most are restricted to a single target marketplace, which may be expensive and undesirable if the merchant wishes to publish on multiple marketplaces. Such conventional product categorization systems also fail when the merchant wishes to expand their product sales to other countries with different languages.

SUMMARY

[0005] In one aspect, a method for generating a refined digital catalog for publishing on a target digital marketplace is provided. The method is performed by a product feed management system. The method includes receiving an initial digital catalog from a catalog source, the initial digital catalog including a plurality of products offered by a merchant and product information associated with each of the plurality of products. The method further includes extracting one or more product entities and one or more product attribute entities associated with each of the plurality of products based on the associated product information. The method includes assigning one or more standardized intermediary product categories, according to a standardized intermediary taxonomy, to each of the plurality of products based on each of the extracted one or more product entities and the product attribute entities. The method further includes identifying one or more target categories, within a target taxonomy used by the target digital marketplace, to be assigned to each of the plurality of products based on a mapping between the standardized intermediary taxonomy and the target taxonomy. Furthermore, the method includes generating the refined catalog by assigning the identified one or more target categories to each of the plurality of products.
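The five method steps of paragraph [0005] can be pictured as a simple pipeline. The following is a minimal, non-authoritative sketch under stated assumptions: keyword lookup stands in for entity extraction, the matched product entity is used directly as the intermediary category, and all identifiers (ENTITY_KEYWORDS, INTERMEDIARY_TO_TARGET, the category labels, and the function names) are hypothetical illustrations rather than the disclosed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    sku: str
    title: str
    description: str = ""
    target_categories: list[str] = field(default_factory=list)


# Hypothetical lookup tables standing in for the knowledgebase.
ENTITY_KEYWORDS = {"shoulder bag": "E_SHOULDER_BAG", "wallet": "E_WALLET"}
ATTRIBUTE_KEYWORDS = {"leather": "A_LEATHER", "red": "A_RED"}

# Hypothetical mapping from intermediary categories to one target taxonomy.
INTERMEDIARY_TO_TARGET = {
    "E_SHOULDER_BAG": ["Accessories > Bags > Shoulder Bags"],
    "E_WALLET": ["Accessories > Wallets"],
}


def extract_entities(product):
    """Step 2: naive keyword matching over the product information."""
    text = f"{product.title} {product.description}".lower()
    entities = [eid for kw, eid in ENTITY_KEYWORDS.items() if kw in text]
    attributes = [aid for kw, aid in ATTRIBUTE_KEYWORDS.items() if kw in text]
    return entities, attributes


def assign_intermediary(entities, attributes):
    """Step 3: in this sketch the intermediary category is the matched
    product entity; attributes are extracted but not used here."""
    return list(entities)


def identify_targets(intermediary, mapping):
    """Step 4: follow the intermediary-to-target mapping, without duplicates."""
    targets = []
    for category in intermediary:
        for target in mapping.get(category, []):
            if target not in targets:
                targets.append(target)
    return targets


def refine_catalog(catalog, mapping):
    """Steps 1 and 5: take the initial catalog, return a refined copy."""
    refined = []
    for p in catalog:
        entities, attributes = extract_entities(p)
        intermediary = assign_intermediary(entities, attributes)
        targets = identify_targets(intermediary, mapping)
        refined.append(Product(p.sku, p.title, p.description, targets))
    return refined
```

Because the target taxonomy enters only through the `mapping` argument, publishing to another marketplace in this sketch would require a different mapping table rather than a different extraction step.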

[0006] In another aspect, a system for generating a refined digital catalog for publishing on a target marketplace is provided. The system includes an input/output unit, a memory unit, and a product feed management system processor operatively coupled to the input/output unit and the memory unit. The input/output unit receives one or more inputs from, and provides output to, one or more user devices, one or more catalog sources, and the target marketplace. The product feed management system processor includes an entity mining unit, a categorization unit, and a catalog enrichment unit. The entity mining unit is configured to receive an initial digital catalog from one or more catalog sources, the initial digital catalog including a plurality of products offered by a merchant and product information associated with each of the plurality of products. The entity mining unit is further configured to extract one or more product entities and one or more product attribute entities associated with each of the plurality of products based on the associated product information. Furthermore, the entity mining unit is configured to assign one or more standardized intermediary product categories, according to a standardized intermediary taxonomy, to each of the plurality of products based on each of the extracted one or more product entities and the product attribute entities. The categorization unit is configured to identify one or more target categories, within a target taxonomy used by the target digital marketplace, to be assigned to each of the plurality of products based on a mapping between the standardized intermediary taxonomy and the target taxonomy. The catalog enrichment unit is configured to generate the refined catalog by assigning the identified one or more target categories to each of the plurality of products.

[0007] In yet another aspect, a computer readable medium comprising computer executable instructions for generating a refined digital catalog for publishing on a target digital marketplace is provided. The computer executable instructions are executed by a processor and cause the processor to receive an initial digital catalog from a catalog source, the initial digital catalog including a plurality of products offered by a merchant and product information associated with each of the plurality of products. The processor extracts one or more product entities and one or more product attribute entities associated with each of the plurality of products based on the associated product information. The processor assigns one or more standardized intermediary product categories, according to a standardized intermediary taxonomy, to each of the plurality of products based on each of the extracted one or more product entities and the product attribute entities. Further, the processor identifies one or more target categories, within a target taxonomy used by the target digital marketplace, to be assigned to each of the plurality of products based on a mapping between the standardized intermediary taxonomy and the target taxonomy. Furthermore, the processor generates the refined catalog by assigning the identified one or more target categories to each of the plurality of products.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] Embodiments will now be described with reference to the appended drawings wherein:

[0009] FIG. 1 illustrates a computing environment including an example product feed management system.

[0010] FIG. 2 illustrates an example product feed management system.

[0011] FIG. 3 illustrates an example of a knowledgebase entry.

[0012] FIG. 4 illustrates an example organizational scheme for organizing product entities with an internal knowledgebase.

[0013] FIG. 5 illustrates an example product feed management system processor of the product feed management system.

[0014] FIG. 6 illustrates an example schematic representing various stages of generating a refined digital catalog.

[0015] FIG. 7 illustrates an example of a mapping table including a mapping between one or more knowledgebase entities and target categories.

[0016] FIG. 8 illustrates an example graphical user interface displayed by the product feed management system.

[0017] FIG. 9 illustrates an example method for optimizing digital catalogs.

DETAILED DESCRIPTION

[0018] At the outset, it will be appreciated that like drawing numbers on different drawings and/or views identify identical, or functionally similar, structural elements of the described system. The following detailed description refers to the accompanying drawings that illustrate exemplary embodiments of the present invention. However, the scope of the present invention is not limited to these embodiments but is instead defined by the appended claims. Thus, embodiments beyond those shown in the accompanying drawings, such as modified versions of the illustrated embodiments, may nevertheless be encompassed by the present invention.

[0019] The terms categories, classifications, and taxonomy are used interchangeably in the disclosure of this application.

[0020] References in the specification to “one embodiment,” “an embodiment,” “an exemplary embodiment,” or the like, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0021] A system and method for optimizing and enriching digital catalogs to be published on a number of digital marketplaces is provided. Merchants publish one or more digital catalogs presenting the products they wish to sell on a number of digital marketplaces, such as, but not limited to, Google® Shopping, Yahoo!® Shopping, Amazon®, eBay®, Facebook® Marketplace, and the like. Digital catalogs allow users to search for and obtain information about the various products offered for sale by the merchants. The digital marketplaces aggregate these products from multiple merchants and offer them for sale via their platforms. When a user searches for a particular product on a digital marketplace platform, they are presented with a number of search results that match their query. Generally, every digital marketplace organizes the products offered on its platform according to categories or taxonomies to enable effective searching. This means that when a user searches for a particular product using keywords, the relevance of the returned results depends on how accurately the products have been categorized in the digital catalogs.

[0022] Typically, merchants manually categorize their products to suit the categories or taxonomies defined by a target marketplace platform. For example, if a merchant A wants to publish their digital catalog on Google®, they manually categorize their products according to the taxonomy used by Google®. This is highly time-consuming and expensive. Even conventional product categorization systems are not always accurate in categorizing products according to the various target taxonomies used by marketplaces. Moreover, the merchant needs to categorize their products multiple times to suit different marketplaces to ensure that their products can be found effectively on all the marketplace platforms. This problem is further exacerbated by the fact that target marketplaces keep updating their taxonomies and categories as new products are added to their platforms. To keep up, merchants must continuously update their catalogs so that the products remain appropriately categorized under the updated target taxonomies. Additionally, when the products are intended to be marketed globally or in multiple countries, where the language may differ for every digital marketplace, categorization of products becomes even more difficult and expensive. Further, in addition to appropriately categorizing their products, merchants also need suitable titles and descriptions for their products in the digital catalogs to ensure that their products are returned as relevant results to a user search query.

[0023] To this end, a system and method for optimizing and generating refined digital product catalogs are provided. FIG. 1 illustrates an example environment 100 for implementing an example product feed management system 102, in accordance with the embodiments of the present disclosure. The product feed management system 102, hereinafter referred to as the system 102, is configured to analyze and optimize product feeds and enhance digital product catalogs before submitting or publishing the product catalogs and feeds to one or more digital or online marketplaces 104.

[0024] In addition to the system 102 and the digital marketplaces 104, the environment 100 also includes one or more catalog sources 106, one or more user devices 108, and a database 110, each communicating with one another and the system 102 via a network 112. Examples of the network 112 may include, but are not limited to, a wide area network (WAN) (e.g., a transmission control protocol/internet protocol (TCP/IP) based network), a cellular network, or a local area network (LAN) employing any of a variety of communications protocols as are well known in the art.

[0025] As illustrated, the digital marketplaces 104 may include one or more digital marketplaces, such as the marketplaces 104-1, 104-2, ... 104-N (collectively referred to as the digital marketplaces 104). The digital marketplaces 104 are online platforms that may be accessed via one or more digital platforms, such as web portals, mobile applications, and so on, and that facilitate the selling and buying of one or more products and/or services by their users. Examples of the digital marketplaces 104 may include, but are not limited to, Google® Shopping, Yahoo!® Shopping, Amazon®, eBay®, and Facebook® Marketplace, which allow merchants to advertise and offer their respective products and/or services for sale and allow end users or consumers to purchase the products and/or services offered for sale. Among other goods and services, items such as collectibles, books, apparel, accessories, jewelry, appliances, computers, tickets, sporting goods, furniture, equipment, vehicles, and vacation packages may be listed, bought, and/or sold on online marketplace websites.

[0026] The one or more catalog sources 106 may include catalog sources 106-1, 106-2, ... 106-N, collectively referred to as the catalog sources 106, that are configured to provide one or more digital catalogs, such as catalogs C1, C2, ... CN, to the product feed management system 102. For the sake of simplicity and for the purposes of explanation, each catalog source 106 is shown to provide one catalog to the system 102; however, it will be appreciated that every catalog source 106 may be configured to provide multiple catalogs to the system 102 for processing, optimizing, and enhancing, without deviating from the scope of the claimed subject matter. It may also be appreciated that each catalog C may include multiple documents, ranging from a few thousand to a few hundred thousand documents. In an exemplary implementation, the catalog sources 106 may be associated with one or more merchants that intend to sell their products and/or services, and the digital catalogs may include information associated with each of the products and/or services offered for sale by the merchant. For instance, a first merchant may provide a first catalog C1 via the first catalog source 106-1 and a second merchant may provide a second catalog C2 via the second catalog source 106-2, and so on, to the system 102 for optimization and enhancement before submission to the one or more digital marketplaces 104.

[0027] In an example, the catalog sources 106 may be embodied as one or more network devices, such as, but not limited to, a personal computer, desktop computer, tablet, smartphone, or any other computing device capable of communicating with and transmitting digital catalogs to the product feed management system 102 via the network 112. It will be appreciated by those of ordinary skill in the art that the catalog sources 106 alternatively may function within a remote server, cloud computing device, or any other remote computing mechanism. In some embodiments, each catalog source 106 may include a plurality of electrical and electronic components providing power, operational control, communication, and the like. For example, each catalog source 106 may include, among other things, its own transceiver, display device, network interface, processor, and a memory (not shown) that cooperate to enable operations of the corresponding catalog source 106. Such components of a catalog source 106 are well known and hence not described herein in greater detail for the sake of brevity of the disclosure.

[0028] Each of the catalog sources 106 may include appropriate interface(s), such as a touch screen display, keyboard, or any other input-output device, to facilitate providing inputs to and receiving output from the system 102. In some alternative embodiments, the catalog sources 106 may be embodied as a document repository/database having a number of digital catalogs stored therein and may be configured to provide such catalogs to the product feed management system 102 for further processing. In some yet other implementations, the one or more catalog sources 106 may include a combination of network devices and document repositories/databases working collaboratively to provide one or more catalogs to the system 102. Further, the catalog sources 106 may be configured to provide digital catalogs in structured and/or unstructured formats, such as, but not limited to, the JavaScript Object Notation (JSON) data format.
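As an illustration of receiving a catalog in JSON format, the sketch below parses a catalog payload into product records using Python's standard `json` module. The payload schema (a top-level `products` array whose entries carry `id`, `title`, `description`, and `attributes` keys) and the names `CatalogItem` and `parse_catalog` are assumptions made for this example only; the disclosure does not specify a schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    item_id: str
    title: str
    description: str
    attributes: dict = field(default_factory=dict)


def parse_catalog(raw: str) -> list[CatalogItem]:
    """Parse a JSON catalog under an assumed schema:
    {"products": [{"id": ..., "title": ..., "description": ..., "attributes": {...}}]}
    Missing optional fields default to empty values; "id" is required."""
    data = json.loads(raw)
    items = []
    for entry in data.get("products", []):
        items.append(
            CatalogItem(
                item_id=str(entry["id"]),
                title=entry.get("title", ""),
                description=entry.get("description", ""),
                attributes=dict(entry.get("attributes", {})),
            )
        )
    return items
```

Normalizing every source into one record type at intake, as sketched here, lets later stages treat catalogs from different sources uniformly.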

[0029] Each of the one or more user devices 108, such as 108-1, 108-2, ... 108-N (collectively referred to as the user devices 108), operates as an interface for a corresponding user interacting with the product feed management system 102. In one example, each user device 108 may be embodied as a personal computer, desktop computer, tablet, smartphone, or any other computing device capable of communicating with and transmitting documents to the system 102. Each of the user devices 108 may include appropriate interface(s), such as a touch screen display, keyboard, or any other input-output device, to facilitate providing inputs to and receiving output from the system 102. Each user may utilize the respective user device 108 to provide one or more user inputs, such as, but not limited to, categorization inputs, training inputs, and validation inputs, and to receive one or more outputs, such as, but not limited to, categorization outputs and product feed optimization outputs, from the product feed management system 102. In some embodiments, the one or more user devices 108 may include an application, a web portal, or any other suitable interface through which the user may communicate with the system 102. In some embodiments, each user device 108 may include a plurality of electrical and electronic components providing power, operational control, communication, and the like within the user device 108. For example, each user device 108 may include, among other things, its own transceiver, display device, network interface, processor, and a memory (not shown) that cooperate to enable operations of the corresponding user device 108.

[0030] The database 110 may be configured to store one or more of the digital catalogs received from the catalog sources 106, optimized and enhanced catalogs, product mappings, and any other data received and/or generated by the product feed management system 102. The database 110 may be queried by the product feed management system 102 to retrieve information corresponding to, or in response to, one or more search queries received from the user devices 108. In an example embodiment, the database 110 may further include a knowledgebase 114 containing one or more of a categorization of products (e.g., a standardized intermediary taxonomy), a predefined mapping of product categories or of the standardized intermediary taxonomy to each of the one or more target taxonomies, and any other information required by the product feed management system 102 for optimizing product feeds and enhancing product catalogs. It may be appreciated that the knowledgebase 114, although shown integral to the database 110, may alternatively be implemented separately from the database 110, without deviating from the scope of the claimed subject matter. The knowledgebase 114 will be described in further detail with reference to subsequent figures and the upcoming description sections of the present disclosure.

[0031] In an example implementation, the database 110 and the knowledgebase 114 may be implemented as NoSQL (not only Structured Query Language) databases. In other implementations, the database 110 and the knowledgebase 114 may be internal and/or external databases and may be implemented using relational databases, such as, but not limited to, Sybase, Oracle, CodeBase, and Microsoft® SQL Server, or other types of databases, such as a flat file database, an entity-relationship database, an object-oriented database, a record-based database, or any other type of database known presently or that may be developed in the future. It will be appreciated that the database 110 and the knowledgebase 114 may include any of volatile memory elements (e.g., random access memory (RAM)), nonvolatile memory elements (e.g., ROM), and combinations thereof. Moreover, the database 110 and the knowledgebase 114 may incorporate electronic, magnetic, optical, and/or other types of storage media.

[0032] Referring now to FIG. 2, the example product feed management system 102 includes an input/output (I/O) unit 202, a memory unit 204, a communication interface 206, and a product feed management system processor 208. It will be appreciated by those of ordinary skill in the art that FIG. 2 depicts the product feed management system 102 in a simplified manner and that a practical embodiment may include additional components and suitably configured logic to support known or conventional operating features that are not described in greater detail herein.

[0033] According to various implementations of the present disclosure, the product feed management system 102 may be implemented as a server, a personal computer, desktop computer, tablet, smartphone, or any other computing device known in the art or developed in the future. Further, although the entire product feed management system 102 is shown and described as being implemented within a single computing device, it may be contemplated that one or more components of the system 102 may alternatively be implemented in a distributed computing environment, without deviating from the scope of the claimed subject matter. It will further be appreciated that the system 102 alternatively may function within a remote server, cloud computing device, or any other remote computing mechanism now known or developed in the future. For example, the system 102, in some embodiments, may be a cloud environment incorporating the operations of the I/O unit 202, the memory unit 204, the communication interface 206, the product feed management system processor 208 (hereinafter referred to as the system processor 208), and various other operating modules to provide the functionalities described in this disclosure.

[0034] The components of the system 102, including the input/output unit 202, the memory unit 204, the communication interface 206, and the system processor 208, may communicate with one another via a local interface 210. The local interface 210 may be, for example, but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface 210 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, among many others, to enable communications. Further, the local interface 210 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.

[0035] The I/O unit 202 may be used to receive one or more inputs from and/or to provide one or more system outputs to one or more devices or components. For example, the I/O unit 202 may be configured to receive one or more inputs from the users of the system 102, such as the catalog sources 106 and the user devices 108, as will be described later herein, and provide output to the one or more users, such as those of the digital marketplaces 104 and the user devices 108 interacting with the system 102. System input may be received by the I/O unit 202 via, for example, a keyboard, touchpad, a mouse, and the like, associated with the devices 106, 108. System output may be provided by the I/O unit 202 via, for example, a display device, speakers, printer (not shown) and the like, associated with one or more of the devices 104, 106, 108.

[0036] The memory unit 204 may include any of volatile memory elements (e.g., random access memory (RAM)), nonvolatile memory elements (e.g., ROM), and combinations thereof. Further, the memory unit 204 may incorporate electronic, magnetic, optical, and/or other types of storage media. It may be contemplated that the memory unit 204 may have a distributed architecture, where various components are situated remotely from one another and are accessed by the system 102 and its components, such as the system processor 208. The memory unit 204 may include one or more software programs, each of which includes an ordered listing of computer executable instructions for implementing logical functions. The software in the memory unit 204 may include a suitable operating system and one or more programs for execution by the components of the system 102, such as the system processor 208. The operating system may be configured to control the execution of the programs and provide scheduling, input-output control, file and data management, memory management, communication control, and related services. The programs may be configured to implement the various processes, algorithms, methods, techniques, etc. described herein.

[0037] The communication interface 206 may be configured to enable the system 102 to communicate on a network, such as the network 112, a wireless access network (WAN), a radio frequency (RF) network, and the like. The communication interface 206 may include, for example, an Ethernet card or adapter or a wireless local area network (WLAN) card or adapter. Additionally, or alternatively, the communication interface 206 may include a radio frequency interface for wide area communications such as Long-Term Evolution (LTE) networks, or any other networks known now or developed in the future. The communication interface 206 may include address, control, and/or data connections to enable appropriate communications on the network 112.

[0038] The product feed management system processor 208 is a hardware device for executing software instructions, such as the software instructions stored in the memory unit 204. The system processor 208 may be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the system processor 208, a semiconductor-based microprocessor, or generally any device for executing software instructions. When the system 102 is in operation, the system processor 208 may be configured to execute computer readable software instructions stored within the memory unit 204 to perform the operations of the system 102 pursuant to the software instructions.

[0039] The detailed working of the product feed management system 102 and how the system 102 operates to optimize the product feeds and enhance the catalogs will now be described in greater detail with reference to FIGS. 3 through 9.

Building Knowledgebase 114

[0040] As explained previously, to optimize product feeds, the system 102 needs to accurately categorize the products within the catalog to suit the categories or taxonomies of the target marketplaces 104. In order to accurately categorize every product in every digital catalog, it may be desirable to have a machine learning based Target Classification Model (hereinafter referred to as the TCM model) that can classify or categorize every product according to the taxonomy of the target marketplaces 104. Generally, such a TCM model may be able to classify the products based on the associated text description, images, and/or other product details provided for the products in the catalog. However, developing such a TCM model is challenging. For example, training such a TCM model requires a huge training corpus of labelled data, including examples for every possible target category of every target marketplace, which may be tedious and challenging to assemble. Further, such a TCM model can only be built for a specific target marketplace, which means that different target marketplaces require different TCM models, each trained for the respective target marketplace, which again is not always feasible, thereby exacerbating the problem. Moreover, the raw outputs generated by such TCM models are not interpretable and cannot be validated by a user across the whole catalog.

[0041] To this end, the product feed management system 102 includes the knowledgebase 114, which is built to: i) provide a library or an intermediary taxonomy of product categories (that are internally standardized, hereinafter referred to as the standardized intermediary taxonomy) to facilitate categorization of the products first at an intermediate level and then according to the various target marketplaces 104, as will be described in the following sections; and ii) provide data that may be used for generating training data for machine learning models, such as a target categorization model, in accordance with the present disclosure. The knowledgebase 114 is an internal database or library including a collection of product entities and product attribute entities, where each entity is associated with a unique identifier. For example, the knowledgebase 114 may include various knowledgebase entries for every product entity and/or every product attribute entity, and may also be configured to link one or more of these knowledgebase entries with others so as to build a meaningful collection of information, e.g., in the form of the standardized intermediary taxonomy, that can be used for training machine learning models. Examples of product entities may include, but are not limited to, bags, shoulder bags, wallets, shoes, and the like, whereas examples of product attribute entities may include, but are not limited to, color, fabric, and/or any other attributes that can be used in describing one or more product entities.

[0042] Every entry within the knowledgebase 114 may include various information and descriptions associated with the respective product entity and/or product attribute entity. For example, the knowledgebase 114 may include a number of knowledgebase entries, each corresponding to a product entity and including various information, such as, but not limited to, a unique identifier, a title or label, a textual description, images and/or links to images, and other details associated with or defining the respective product entity. Similarly, the knowledgebase 114 may include a number of knowledgebase entries, each corresponding to a product attribute entity (e.g., a color or a fabric) and including various information, such as a unique identifier, a title or label, a description, aliases, images or links to images, and so on, that define the respective product attribute entity. In an example embodiment, the knowledgebase 114 is configured to facilitate an intermediary mapping of the products in a digital catalog received from a merchant to one or more product entities (i.e., a standardized format) identified within the knowledgebase 114. Additionally, the knowledgebase 114 facilitates mapping these intermediary mappings to the target categories used by the various target marketplaces 104.

[0043] In one implementation, the standardized intermediary taxonomy of the knowledgebase 114 may define directional relationships between one or more entities (or nodes) and may be organized in the form of a directed graph having a hierarchical structure, where each node represents an entity and each directed relationship (i.e., an edge) represents a link between two nodes. A directed graph may represent a formalized structure of the types, properties, and interrelationships of various entities and/or concepts within a domain. In one example, the directed graph may be in the form of a tree structure; however, other representations of directed graphs, such as, but not limited to, edge lists, linked lists, or any other type of machine-readable structure, may also be contemplated to build the knowledgebase 114 in other implementations of the present disclosure. A node may correspond to, for example, a product entity, a product attribute entity, a keyword, a term, and the like, corresponding to a domain of products, that can be placed within the hierarchy and that has a relationship with another node in at least one tree structure. Further, a leaf node in the tree corresponds to a terminal node that does not have any child nodes within the tree. Typically, the knowledgebase 114 may include a number of nodes representing categories and sub-categories and may define a number of hierarchical relationships among them. In various implementations, the knowledgebase 114 may include a number of tree structures, each representing an individual domain of product categories, wherein each tree structure may or may not be linked to another tree structure of the knowledgebase 114.
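
The directed-graph organization described above can be illustrated with a minimal sketch. It assumes an in-memory adjacency representation and uses a hypothetical `KnowledgeGraph` class; it is not presented as the disclosed implementation. Edges point from a child entity to its parent, so a leaf is a node with no children and a root is a node with no parents.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical minimal directed graph of knowledgebase entities.

    Nodes are entity IDs; each directed edge points from a child entity to
    its parent, encoding a "subclass of" / "instance of" relationship.
    """

    def __init__(self) -> None:
        self.parents = defaultdict(set)   # child -> set of parents
        self.children = defaultdict(set)  # parent -> set of children

    def add_edge(self, child: str, parent: str) -> None:
        # Directed: the child is a subclass/instance of the parent, not vice versa.
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def is_leaf(self, node: str) -> bool:
        # A leaf (terminal) node has no child nodes.
        return not self.children.get(node)

    def roots(self, node: str) -> set:
        # Walk up every parent edge; nodes without parents are root categories.
        # A node linked into two trees therefore reports two roots.
        stack, seen, found = [node], set(), set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            ups = self.parents.get(current)
            if ups:
                stack.extend(ups)
            else:
                found.add(current)
        return found
```

Because a node may have more than one parent, the same structure can hold several linked trees; `roots()` then returns every root category reachable from that node.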

[0044] FIG. 3 illustrates an exemplary knowledgebase entry 302. As illustrated, the knowledgebase entry 302 is defined by a unique KB id (e.g., ID ABC123) and corresponds to the entity “messenger bag”. The knowledgebase entry 302 includes a label of the entity (e.g., “messenger bag”), a description of the entity, a category associated with the entity, other categories linked to the entity (e.g., the relationship “is a subclass of” the parent category “Bags” indicates a directional link from the category “messenger bag” to the category “Bags”), aliases for the entity, aliases of the entity in other languages, and so on. It will be appreciated that the knowledgebase entry 302 is merely an example and is shown as a simplified entry for the purposes of explanation; in practical implementations, the knowledgebase entry 302 may include additional fields and information associated with the corresponding entity. The knowledgebase 114 may include multiple knowledgebase entries such as the one shown in FIG. 3, where each entry is identified by its respective unique identifier and is linked to other knowledgebase entries in a hierarchical structure. Further, the knowledgebase 114 also facilitates product categorization in different languages, since the knowledgebase entries corresponding to the same entity in different languages are all linked to the same unique KB id.
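
A knowledgebase entry such as entry 302 could be represented as a small record, as in the following sketch. The `KBEntry` fields, the parent ID `KB_BAGS`, and the aliases shown are hypothetical; the point illustrated is that labels and aliases in every language resolve to the same unique KB id.

```python
from dataclasses import dataclass, field


@dataclass
class KBEntry:
    """Simplified knowledgebase entry modelled loosely on FIG. 3 (field names are illustrative)."""
    kb_id: str
    label: str
    description: str = ""
    subclass_of: list = field(default_factory=list)  # IDs of parent entries
    aliases: dict = field(default_factory=dict)      # language code -> list of aliases


def build_alias_index(entries):
    """Map every label and alias, in any language, to the entry's unique KB id."""
    index = {}
    for entry in entries:
        index[entry.label.lower()] = entry.kb_id
        for names in entry.aliases.values():
            for name in names:
                index[name.lower()] = entry.kb_id
    return index


messenger = KBEntry(
    kb_id="ABC123",
    label="messenger bag",
    subclass_of=["KB_BAGS"],  # hypothetical ID of the "Bags" entry
    aliases={"en": ["courier bag"], "it": ["borsa da postino"]},
)
index = build_alias_index([messenger])
```

Looking up either the English label or the Italian alias in `index` yields `"ABC123"`, which is what allows catalogs in different languages to be categorized against the same entry.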

[0045] FIG. 4 illustrates an example organizing structure 402 for organizing various entities into categories and sub-categories in a directed graph within the knowledgebase 114 according to various embodiments of the present disclosure. The illustrated structure 402 organizes categories for two domains or root categories, for example, “Arts & Entertainment” and “Apparels & Accessories”, in hierarchical tree structures, hereinafter referred to as sub-graphs 404 and 406. It will be appreciated that the structure 402 is shown organizing only two categorization trees, i.e., the sub-graphs, for the sake of simplicity of the disclosure, and that a practical implementation of the structure 402 may include any number of sub-graphs, including multiple root node categories and their corresponding sub-categories organized in a similar manner.

[0046] In an embodiment, the structure 402 includes sub-graphs 404, 406 for two root categories, where the relationships between the nodes correspond to “subclass of” and “instance of” relationships. However, it will be appreciated that the structure 402 may include other additional directed relationships as well, without deviating from the scope of the claimed subject matter. For example, “sports” and “music” are subclasses of the root category “Arts & Entertainment”. Similarly, “handbags” and “backpacks” are instances of the category “Bags & Wallets”, which is a subclass of the root category “Apparels & Accessories”. It will be appreciated that the directional arrows represent directional relationships, which indicate a particular direction of the relationship between two nodes. This means that the reverse direction of the relationship may not be applicable. Therefore, in the illustrated example, the arrows indicate that the “sports” and “music” categories are child nodes or sub-categories (or, in other words, represent the “subclass of” relationship) of the parent category “Arts & Entertainment”, and that the reverse relationship may not be applicable. However, in some other implementations of the present disclosure, the nodes may have bidirectional relationships, and, in such cases, both directed relationships are identified in the knowledgebase 114. In some examples, one or more nodes may be linked with two categorization trees; for example, the nodes “running shoes” and “tennis shoes”, which are instances of the sub-category “shoes” in the root category “Arts & Entertainment”, may also be connected as instances of the sub-category “Athletic shoes” in the tree corresponding to the root category “Apparels & Accessories”. As will be appreciated, in many instances, a product within a catalog may need to be categorized in more than one category and root category, and the knowledgebase 114 is built to accommodate such scenarios.
Moreover, a single category may be associated with multiple hierarchical trees, each of which may be useful for a different context in the ecommerce space, and the knowledgebase 114 is also built to accommodate such scenarios. Further, in various implementations, the knowledgebase 114 may also include various graphs for organizing other data and information, such as product attribute entities, that may be useful for categorization of the catalogs. For example, similar to the graphs shown in FIG. 4 for product entities, separate graphs for organizing various colors, fabrics, materials, and so on, may also be stored in the knowledgebase 114, wherein every graph may include hierarchical relationships between its respective nodes. All these graphs may be used by the system to accurately mine entities from the descriptions provided in digital catalogs. In a further implementation, one or more nodes within the graphs may be enriched with additional information, for example, statistics about a typical price associated with the product entity, a gender associated with the product entity, pre-computed average text/image vector embeddings for the nodes, and so on. This additional information may be gathered over time from actual product data and attached to the knowledgebase entries as the machine learning models are iteratively trained.
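
The node enrichment mentioned above, such as a typical price and a pre-computed average vector embedding per node, might be aggregated from already-categorized products roughly as follows. The input format (tuples of KB id, price, and embedding) and the function name `enrich_nodes` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def enrich_nodes(labeled_products):
    """Aggregate per-node statistics from products already linked to KB entities.

    labeled_products: iterable of (kb_id, price, embedding) tuples, where every
    embedding for a given node has the same length (assumed, not checked).
    """
    prices = defaultdict(list)
    vectors = defaultdict(list)
    for kb_id, price, embedding in labeled_products:
        prices[kb_id].append(price)
        vectors[kb_id].append(embedding)

    stats = {}
    for kb_id, vecs in vectors.items():
        # Element-wise mean of the product vectors gives the node's average embedding.
        avg_embedding = [sum(column) / len(vecs) for column in zip(*vecs)]
        stats[kb_id] = {"avg_price": mean(prices[kb_id]), "avg_embedding": avg_embedding}
    return stats
```

Re-running the aggregation as more catalogs are categorized keeps the attached statistics in step with the growing pool of labeled products.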

[0047] Further, in an example implementation, the knowledgebase 114 and the information associated with the various product entities may be built based on information obtained from external and/or independent knowledge resources, such as, but not limited to, Wikidata, WordNet, Wikipedia, Wiktionary, and the like. For example, information regarding the product entities, their aliases, their aliases in other languages, and so on, may be obtained from external resources to build the knowledgebase 114. In some instances, only initial data may be imported from these external resources, and the knowledgebase 114 may be refined and made more detailed over time by receiving feedback and additional information provided by the users as more and more catalogs are categorized and optimized. The knowledgebase 114 may also be updated to include new entities resulting from the addition of new products.

Product Feed Management System 102

[0048] According to various embodiments of the present disclosure, the product feed management system 102 is configured to optimize and enhance the digital catalogs received from the catalog sources 106 by appropriately categorizing the products within the catalogs to suit the taxonomies or categories used by the target marketplaces 104. In a further embodiment, the product feed management system 102 also enhances catalogs by optimizing the advertisement texts provided by the catalogs. To this end, referring now to FIG. 5, the product feed management system 102 includes an entity mining unit 502, a categorization unit 504, a validation unit 506, an advertisement optimization unit 508, and a catalog enrichment unit 510. In some embodiments, these components may be implemented within the product feed management system processor 208; however, in certain other implementations, one or more of the entity mining unit 502, the categorization unit 504, the validation unit 506, the advertisement optimization unit 508, and the catalog enrichment unit 510 may be implemented in a distributed computing environment with some being implemented remotely, such as in a remote computing device and/or a cloud environment.

[0049] FIG. 6 illustrates an exemplary functional schematic diagram 600 depicting various stages of generating the refined digital catalog from the catalogs received from the merchants. In an embodiment, the product feed management system 102 is configured to generate a standardized intermediary product categorization 602 for the received catalogs C. The intermediary product categorization 602 assigns a knowledgebase entity ID to every product received in the catalog C, and represents a first stage or intermediary standardized categorization of products. Further, the system 102 is configured to generate a refined product categorization 604 from the intermediary product categorization 602 to suit specific categorizations used by a desired target marketplace 104.
For example, the same original catalog C may be categorized to generate respective refined product categorizations 604, each suited to an individual target marketplace 104. The refined product categorization 604 maps each knowledgebase entity ID to a corresponding target category ID according to the taxonomy used by the desired target marketplace 104. The system 102 is further configured to apply the obtained refined product categorization 604 to the catalog C to obtain a refined catalog RC 606, which is then provided to the respective desired target marketplace 104. Since the refined catalog RC 606 is prepared and published to suit the desired target marketplace 104, the resulting product feeds are optimized for that target marketplace 104.
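
The two stages of FIG. 6 can be sketched as two lookups: each product's KB entity IDs from the intermediary categorization 602 are translated through a per-marketplace mapping to produce the refined categorization 604, which is then applied to the catalog to yield the refined catalog RC 606. The product schema (`sku`), the function name, and all IDs below are hypothetical.

```python
def refine_catalog(catalog, intermediary, target_mapping):
    """Apply a marketplace-specific mapping to an intermediary categorization.

    catalog: list of product dicts, each with a hypothetical "sku" key.
    intermediary: sku -> list of KB entity IDs (stage 1, categorization 602).
    target_mapping: KB entity ID -> target category ID for one marketplace (604).
    Returns a new list of products with a "target_categories" field (RC 606);
    the input catalog is not modified.
    """
    refined = []
    for product in catalog:
        kb_ids = intermediary.get(product["sku"], [])
        targets = sorted({target_mapping[k] for k in kb_ids if k in target_mapping})
        refined.append({**product, "target_categories": targets})
    return refined
```

Running the same catalog and intermediary categorization against a different `target_mapping` produces a refined catalog for a different target marketplace without repeating the entity mining stage.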

[0050] To this end, referring back to FIG. 5, the entity mining unit 502 is configured to process a received catalog C to extract or mine one or more entities present within the catalog C and link the mined entities to one or more knowledgebase entities having a unique identifier associated with them within the knowledgebase 114. In an exemplary embodiment, the entity mining unit 502 may include one or more machine learning based models that are trained to mine entities within a received catalog.

[0051] In some examples, the received catalog C may include original, merchant-specified product types, whereas in other examples, the received catalog C may not include any product types, and in yet other examples, the catalog C may include product types for some, but not all, of the products included therein. These product types may represent a merchant-specific categorization of the products within the catalog. For example, a received catalog may list a “Handbag” under the category “Summer Collection Women Bags”. In other instances, the catalog may only include a description, images, and other information about the product, without a defined product type or category. In many cases, the product categorization provided by the merchant may not map directly to the product categories used by the target marketplaces 104. To this end, the system 102 includes the entity mining unit 502, which is configured to perform the mining on the catalogs in both scenarios to extract one or more product entities and product attribute entities to be linked to the entities within the knowledgebase 114, which are subsequently converted according to the categorization used by the target marketplaces 104. In addition to identifying the presence of product entities and product attribute entities, the entity mining unit 502 may also be configured to identify the absence of one or more product attributes and/or product attribute entities (for example, “pants with no pockets”) from the product information, so as to accurately identify the relevant product entities and product attribute entities to be linked to the knowledgebase entities. Therefore, in the above example, the entity mining unit 502 may link only those pants that do not have pockets to the corresponding knowledgebase entities.
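
Detecting the absence of an attribute, as in “pants with no pockets”, can be approximated with a simple negation pattern. The following is a deliberately naive, rule-based sketch, assuming a fixed vocabulary of attribute terms and only the cue words “no”, “without”, and “non”; a trained model would be expected to handle far more phrasings.

```python
import re

# Hypothetical negation cue: "no", "without" or "non" followed by one word.
NEGATION = re.compile(r"\b(?:no|without|non)\s+([a-z]+)", re.IGNORECASE)


def split_attributes(text, attribute_terms):
    """Return (present, absent) attribute terms found in the product text."""
    # Terms directly preceded by a negation cue are recorded as absent.
    absent = {m.group(1).lower() for m in NEGATION.finditer(text)} & attribute_terms
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Terms mentioned without a negation cue are recorded as present.
    present = (words & attribute_terms) - absent
    return present, absent
```

For “Slim pants with no pockets, cotton”, the term “pockets” is reported as absent and “cotton” as present.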

[0052] In an embodiment of the present disclosure, the entity mining unit 502 may utilize pre-trained machine learning models for natural language processing to extract, from the catalog C, one or more product entities, product attribute entities, and the directed relationships that can be linked or mapped to one or more product entities within the knowledgebase 114. In one implementation, the entity mining unit 502 may be configured to utilize one or more pre-trained, off-the-shelf natural language processing models 512, such as, but not limited to, bidirectional transformers (e.g., Bidirectional Encoder Representations from Transformers (BERT)), that may already be trained to perform natural language processing to identify the product entities and product entity attributes. In some other implementations, the entity mining unit 502 may utilize an embeddings transformers language model for creating vectors or embeddings representing the product entities and product attribute entities within the catalog. The embeddings transformers language model may produce a distributed representation of words, phrases, etc., by using n-grams, skip-grams, GloVe, word2vec, fastText, or any other vectorization technique. For example, the embeddings transformers language model may be pre-trained on a training text corpus, from which the model learns a representation for the entities (words, phrases, etc.) contained in that corpus.
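
As a lightweight stand-in for the learned embeddings described above, the sketch below vectorizes text as character trigram counts (similar in spirit to the subword n-grams used by fastText) and links a phrase to the KB entity whose label is most similar by cosine similarity. The function names and the `kb_labels` dictionary are hypothetical, and no pre-trained model is involved.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Sparse vector of character n-gram counts, padded with spaces at word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_entity(phrase, kb_labels):
    """Return the KB id whose label vector is closest to the phrase vector."""
    vec = char_ngrams(phrase)
    return max(kb_labels, key=lambda kb_id: cosine(vec, char_ngrams(kb_labels[kb_id])))
```

Subword vectors of this kind tolerate plural forms and small spelling variations (e.g., “Shoulder bags” still links to “shoulder bag”), which is one reason n-gram based representations suit noisy catalog text.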

[0053] In various embodiments of the present disclosure, the language processing model 512 may be configured to continually learn so as to enhance the accuracy of identifying product entities and product entity attributes within catalogs. For example, in some implementations, the model 512 may be configured to learn from user feedback on the validation of generated outputs, or from manual identifications of product entities and product entity attributes (in case the model is unable to identify product entities, a user may manually assign entities to products within a catalog via a graphical user interface (not shown)). In some additional or alternative implementations, the model 512 may utilize web scraping techniques to continually learn and enhance the model’s product entity prediction abilities. For instance, the model 512 may continuously search for updates and/or additions to the information available on the Internet, on merchant websites, and/or on target marketplaces 104 by using web scraping techniques, to keep enhancing its ability to predict product entities and product attribute entities.

[0054] In an implementation of the present disclosure, the language processing model 512 may be configured to process the text associated with each of the products and perform text and/or pattern matching against the knowledgebase entities to identify the unique IDs within the knowledgebase 114 that can be mapped to the product. For example, the language processing model 512 may be configured to select among candidate matches based on one or more of: the position of the mined entity in the text; the number of occurrences of the entity in the text (preferring entities with multiple occurrences); the length of the match (preferring longer matches); and the specificity of the entity within the knowledgebase 114 (preferring more specific entities within the knowledgebase graphs). This mapping yields an intermediary categorization of the product in terms of standardized knowledgebase entities within the knowledgebase 114. Further, the entity mining unit 502 may additionally utilize a document-level root classifier model 514 (hereinafter referred to as the RCM model 514) that may be trained to predict, for every product, the knowledgebase root entity corresponding to the knowledgebase entities mined from the catalog C by the language processing model 512.
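
The match-selection cues listed above can be combined into a sort key, as in the sketch below. The priority order (occurrences, then match length, then graph depth as a proxy for specificity, then earlier position) and the candidate format are assumptions; the disclosure does not specify how the cues are weighted.

```python
import re


def score_match(term, text, depth):
    """Score one candidate KB term against the product text, or None if absent."""
    lowered = text.lower()
    # Whole-word matches only, so "bag" does not match inside "handbag".
    matches = list(re.finditer(rf"\b{re.escape(term.lower())}\b", lowered))
    if not matches:
        return None
    position = matches[0].start() / len(lowered)  # 0.0 means the start of the text
    # Hypothetical priority: more occurrences, longer match, deeper node, earlier position.
    return (len(matches), len(term), depth, -position)


def best_entities(text, candidates):
    """candidates: kb_id -> (surface term, depth in KB graph). Returns matched IDs best-first."""
    scored = {}
    for kb_id, (term, depth) in candidates.items():
        score = score_match(term, text, depth)
        if score is not None:
            scored[kb_id] = score
    return sorted(scored, key=scored.get, reverse=True)
```

With the example description in [0055], this ranking places “shoulder bag” first but still keeps “wallet” as a candidate, which is the ambiguity that the RCM model 514 is described as resolving.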

[0055] For example, consider the following product description of a “Bag” in a catalog:

“Shoulder bag with long straps and zipped pockets for wallet”

[0056] In the above example, the language processing model 512 may be configured to use pattern matching to identify two different entities, “shoulder bag” and “wallet”, in the knowledgebase 114. However, this result from the language processing model 512 may lead to an incorrect categorization of the product as a wallet. Therefore, the RCM model 514 may be configured to also take into account the rest of the text description of the product, to identify that the correct root entity is “Bag”. That is, the RCM model 514 reads the other words of the description (e.g., long straps, zipped) and determines that these words do not typically appear in wallet products. The entity mining unit 502 further determines that, since “wallet” descends from a different root entity in the knowledgebase 114, the correct categorization of the product is “shoulder bag” and not “wallet”. Accordingly, the product is assigned the unique KB id corresponding to the “shoulder bag” knowledgebase entity in the knowledgebase 114. In order to further narrow down the entities to be mined, the entity mining unit 502 may be configured to receive a set of user-defined root categories and to mine only for entities that are related to these user-defined root categories. For example, a user can specify, via the GUI, that only entities under the KB root entities “Pants” and “Shirts” be mined; in such cases, the entity mining unit 502 may only look for descendants of these root entities within the text of the received catalog C, and eliminate other, unrelated entities, such as “Sunglasses”, from consideration. Additionally, in some embodiments, the entity mining unit 502 may be configured to further narrow down the entities by pruning the resultant graphs, for example based on user-selected input categories received via the GUI, to select only the entities and sub-graphs relevant to the user-defined categories and exclude one or more subsets of entities that are irrelevant.
For instance, in the above example, the entity mining unit 502 may be configured to further refine the resultant graph by eliminating the sub-graphs for “Shorts” and/or “Skirts” from consideration.
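
The root-based disambiguation and the user-defined root filter described above might be combined as follows. The sketch takes the root predicted by the RCM model 514 as an input rather than implementing the classifier; `root_of`, the entity IDs, and the function name are hypothetical.

```python
def resolve_by_root(candidates, root_of, predicted_root, allowed_roots=None):
    """Keep candidate KB entities that descend from the predicted root entity.

    candidates: KB entity IDs proposed by pattern matching, best first.
    root_of: KB entity ID -> ID of the root entity it descends from.
    predicted_root: root entity predicted for the whole product (the RCM role).
    allowed_roots: optional set of user-defined root entities; entities under
    any other root are discarded before the root check.
    """
    kept = []
    for kb_id in candidates:
        root = root_of.get(kb_id)
        if allowed_roots is not None and root not in allowed_roots:
            continue
        if root == predicted_root:
            kept.append(kb_id)
    return kept
```

Applying the user-defined filter first means that entities such as “Sunglasses” never reach the root check when the user has restricted mining to “Pants” and “Shirts”.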

[0057] It will be appreciated that, similarly, the entity mining unit 502 may be configured to assign multiple unique KB ids to a single product, depending on how many entities the language processing model 512 and the RCM model 514 are able to identify accurately. In some implementations, the RCM model 514 may be trained on initial training data including accurately mined root entities for products; however, in some other implementations, the RCM model 514 may also be trained with the mined entities removed from the training text, so that it learns to rely only on the other words present in the training data. Further, it will be appreciated that the above-mentioned method of using the RCM model is merely an example way of improving the selection of knowledgebase entities representing the product within the catalog C, and that any other method or machine learning models may be utilized to achieve similar results, without deviating from the scope of the claimed subject matter.

[0058] Further, in an embodiment, the entity mining unit 502 may be configured to cooperate with the validation unit 506 to provide the generated intermediary categorization to a user via a Graphical User Interface (GUI) displayed on the respective user device 108. In some examples, the generated output may be validated by the user via the GUI, such that the RCM model 514 and the language processing model 512 iteratively learn from the user feedback to increase the accuracy of product categorization over time. Further, the GUI allows the user to correct any incorrect mappings determined by the entity mining unit 502, which in turn prevents the system 102 from incorrectly categorizing the products and compromising the quality of the product feeds submitted to the target marketplaces 104. Over time, the models 512, 514 may be trained to accurately identify the relevant knowledgebase entities within the catalog without any human intervention.

[0059] Further, the entity mining unit 502 may be configured to analyze one or more missed products in the catalog C that could not be mapped to any knowledgebase entity of the knowledgebase 114. For example, missed products may correspond to products with no mined entities, products whose correct entity is not present in the knowledgebase 114, or products that the models 512, 514 map to wrong entities in the knowledgebase 114. To ensure that the catalog is accurately categorized, it is important that every product is categorized, and categorized correctly. Thus, the entity mining unit 502 identifies the one or more missed products and attempts to determine possible product entities matching each such product.

[0060] In some implementations, the entity mining unit 502 may display one or more of the missed products on the GUI of the user device 108 and prompt the user to manually enter product entities for the missed products. The user-provided entities may then be included in the knowledgebase 114. In some additional or alternative implementations, the language processing model 512 may be configured to suggest new entities and receive feedback on the suggested entities from the user via the GUI. In such implementations, for example, the language processing model 512 may be configured to compare a term-frequency ranking of n-grams in the missed products with the term-frequency ranking of n-grams in the labeled products, and to predict a shortlist of possible entities for the missed products based on such comparison. For example, a candidate entity may only be shortlisted if it appears at least a predefined minimum number of times.
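
One possible reading of the n-gram comparison above is sketched below: n-grams are counted in the missed and labeled products, ranked by how much more frequent they are among missed products, and kept only if they reach a minimum count. The scoring rule, the bigram default, and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Whitespace-tokenized word n-grams, lower-cased."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shortlist_candidates(missed_texts, labeled_texts, n=2, min_count=2):
    """Suggest n-grams that are frequent among missed products but rarer among labeled ones."""
    missed = Counter(g for t in missed_texts for g in ngrams(t, n))
    labeled = Counter(g for t in labeled_texts for g in ngrams(t, n))
    # Rank by excess frequency in missed products, then by raw frequency.
    ranked = sorted(missed, key=lambda g: (missed[g] - labeled[g], missed[g]), reverse=True)
    # Shortlist only n-grams meeting the minimum count and over-represented in missed products.
    return [g for g in ranked if missed[g] >= min_count and missed[g] > labeled[g]]
```

The shortlisted n-grams would then be shown to the user via the GUI for validation rather than being added to the knowledgebase automatically.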

[0061] The shortlisted entities may be presented to the user via the GUI to receive the user’s feedback, and the validated entities are assigned to the missed product and also added to the knowledgebase 114, thereby enriching the knowledgebase 114 over time. Further, the user may either manually refer to external resources, or the entity mining unit 502 may obtain information about the new entities from external resources, for example, to determine whether the new entity is associated with an existing knowledgebase entity or is completely new. For example, if the new entity is determined to be an alias of an existing knowledgebase entity, then the user may add the new entity as an alias to the corresponding knowledgebase entry (such as the one shown in FIG. 3) associated with the knowledgebase entity. However, if the new entity is determined not to be connected to any of the existing knowledgebase entities, then the user may add the new entity as a custom entity to the knowledgebase 114. Additionally, in order to assist the user and/or the entity mining unit 502 in accurately placing the new entity in the knowledgebase 114, the RCM model 514 may be configured to also suggest one or more root categories that may be associated with the new entity, in a similar manner as described previously. However, in some additional or alternative implementations, the entity mining unit 502 may utilize additional models, such as a named entity recognition model, topic modelling/clustering models, keyword extraction algorithms, and the like, that may be trained on labeled training data to suggest new entities for missed products and accordingly add them to the knowledgebase 114 in a similar manner.
Furthermore, the newly added entities may be periodically reviewed, either by using machine learning models or by the users to spot common entities that can be added to the knowledgebase 114, so that they are available for the other product catalogs that may be processed by the system 102 in future.

[0062] Further, the product feed management system 102 includes the categorization unit 504 that is configured to determine the target categories corresponding to the target marketplaces 104 and generate the refined product categorization 604 for the products in the catalog C. At this stage, the standardized intermediary product categorization 602 has been achieved for the products in the catalog C, in that, every product has been assigned the unique KB id(s) corresponding to the product entity within the knowledgebase 114. In an example embodiment, the categorization unit 504 may utilize a machine learning based target classification model (hereinafter referred to as the TCM model) that is configured to apply a predefined mapping of knowledgebase entities (or the standardized intermediary taxonomy in the knowledgebase 114) to the target marketplace taxonomy on the obtained catalog to obtain the refined product categorization 604.

[0063] As will be appreciated, similar to the structure 402 shown in FIG. 4 for organizing entities in the knowledgebase 114, every target marketplace 104 also has its own taxonomy or organizational scheme for organizing the products and/or services that are provided on their marketing platform. In some implementations, a predefined mapping between knowledgebase entities and different target taxonomies may be stored in the database 110 and the categorization unit 504 may use these mappings to predict the target categories applicable for the mined product entities in the catalog C. In one example, the mapping may initially be defined for broad or generic categories, such as “Bags”, “shoes”, etc., and may be propagated from the corresponding generic category in the knowledgebase 114 to all its sub-categories, which means that the entire sub-graph corresponding to the generic category (e.g., the sub-categories descending from the category “bags”) in the knowledgebase 114 may be mapped to the same generic target category in the target marketplace 104. Such mappings may be enriched with inputs and information obtained over time. In other examples, the mappings may be more detailed to include direct mappings between sub-categories in the knowledgebase 114 and the target categories.
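
The propagation of a generic mapping to all descendants, with more detailed mappings taking precedence, can be expressed as a walk up the hierarchy to the nearest mapped ancestor. This sketch assumes a single-parent tree and hypothetical entity and target IDs.

```python
def resolve_target(kb_id, parent_of, mapping):
    """Return the target category for kb_id, inheriting from the nearest mapped ancestor.

    parent_of: KB entity -> parent entity (a single-parent tree is assumed here).
    mapping: KB entity -> target category; a detailed entry on a sub-category
    overrides the generic entry of its ancestors because it is found first.
    """
    node, seen = kb_id, set()
    while node is not None and node not in seen:
        if node in mapping:
            return mapping[node]
        seen.add(node)  # guards against accidental cycles in the parent links
        node = parent_of.get(node)
    return None  # no mapped ancestor: the entity is not covered yet
```

A single generic entry for “bags” thus covers every bag sub-category, while a direct entry added later for one sub-category changes only that branch.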

[0064] In some yet other implementations, the knowledgebase 114 may include a mapping of only a subset of the graph to the target categories. For example, when some sub-categories do not reflect the expected meaning of the generic category, the mapping propagated to the descendants may be inaccurate. Therefore, in such cases, the graphs may be pruned before mapping by selecting only a desired entity or sub-graph and removing all the descendants that are non-critical for the purpose of mapping to the target categories. Pruning the graphs in this way may enhance the mapping accuracy. For example, the categorization unit 504 may be configured to iteratively parse through the entire graph, determine the relevance of every entity, and, if the relevance of a particular entity is not defined, remove that entity from the graph as being non-critical for categorization. In some examples, all entities descending from a given root entity may be removed if the root entity is determined to be non-critical. However, in some other examples, only those entities that have been individually identified as being non-critical for categorization may be removed from the graph.
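
The two pruning variants described above, removing a non-critical entity together with its sub-graph or removing only that entity, are shown in the sketch below. How criticality is decided is left to a caller-supplied predicate, which is an assumption; the disclosure only states that entities whose relevance is not defined may be removed.

```python
def prune(children, root, is_critical, drop_descendants=True):
    """Return the entities kept under root after removing non-critical ones.

    children: entity -> list of child entities.
    is_critical: predicate deciding whether an entity matters for mapping.
    drop_descendants: if True, a non-critical entity takes its whole sub-graph
    with it; if False, only that entity is removed and its children are still examined.
    """
    kept, seen, stack = set(), set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        critical = is_critical(node)
        if critical:
            kept.add(node)
        if critical or not drop_descendants:
            stack.extend(children.get(node, ()))
    return kept
```

The `drop_descendants` flag corresponds to the choice between removing everything below a non-critical entity and removing only individually identified entities.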

[0065] The database 110 may store various mapping tables corresponding to different target marketplaces 104. One such exemplary mapping table 700 is shown in FIG. 7. The mapping table 700 includes a mapping of a few categories in the knowledgebase 114 to the categories in the target marketplace 104. More specifically, the mapping table 700 indicates the mapping between the knowledgebase entity and the corresponding knowledgebase entity ID (shown in column 702) to the target entity and the corresponding target entity ID (shown in column 704) used by the target marketplace 104. As explained previously, the mapping table 700 may be different for different target marketplaces 104.

[0066] As will be appreciated, the taxonomies used by target marketplaces 104 may be available on the Internet and may be obtained by querying, web crawling, or web scraping techniques. Once this information is available, a mapping of the knowledgebase entities to the categories in the target taxonomies can be populated either manually or by using machine-learning models. In some implementations, the categorization unit 504 may facilitate building such mapping tables 700. To this end, the categorization unit 504 may include and utilize a mapping unit 516 to build the mapping tables 700 for various target marketplaces 104. The mapping unit 516 may be configured to receive initial data from the user via the GUI, including a small set of user-specified generic product entities (e.g., bags, shoes, clothes) from the knowledgebase 114 and their corresponding mapped target categories. The mapping unit 516 may build a very small, generic mapping table based on this initial data. Next, the mapping unit 516 may be configured to identify all the entities (i.e., one or more sub-graphs) within the knowledgebase 114 that descend from these user-specified entities (e.g., heels, athletic shoes, shoulder bags, backpacks, dresses, trousers, etc., as shown in the exemplary structure 402 in FIG. 4). The mapping unit 516 then attempts to map these sub-graphs to target categories using the initial mapping table. The mapping unit 516 further determines a coverage percentage indicating how much of the identified sub-graph(s) can be mapped to target categories using the mapping table built so far. When the coverage percentage is lower than a threshold, the mapping unit 516 may prompt the user, via the GUI, to provide additional mappings for some more generic entities to increase the coverage. Over time, the mapping tables may become more robust and detailed.
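
The coverage check performed by the mapping unit 516 could be computed as the fraction of identified entities that the table built so far can map, treating an entity as covered when it or one of its ancestors has an entry. The threshold value of 0.9 and the helper names are hypothetical.

```python
def is_covered(entity, parent_of, mapping):
    """True if the entity or any ancestor has an entry in the mapping table."""
    node, seen = entity, set()
    while node is not None and node not in seen:
        if node in mapping:
            return True
        seen.add(node)
        node = parent_of.get(node)
    return False


def coverage(entities, parent_of, mapping):
    """Fraction of the identified entities that the current table can map."""
    if not entities:
        return 0.0
    return sum(is_covered(e, parent_of, mapping) for e in entities) / len(entities)


def needs_more_mappings(entities, parent_of, mapping, threshold=0.9):
    """Hypothetical trigger for prompting the user to add generic mappings."""
    return coverage(entities, parent_of, mapping) < threshold
```

With mappings only for “shoes” and “bags”, four of the six example entities from FIG. 4 are covered; adding one generic entry for “clothes” raises the coverage to 100% and the prompt is no longer needed.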

[0067] However, in some other implementations, the user may only select one or more target categories from the GUI, and the mapping unit 516 may be configured to identify the relevant knowledgebase entities that can be mapped to the user-selected target categories. Such a configuration may be used, for example, once the complete set of mappings for the selected target marketplace is defined and new catalogs need to be categorized using these mappings. To this end, the categorization unit 504 may be configured to receive a set of user-defined target root categories, for example, corresponding to a particular target marketplace, and to search only for the KB entities that are related to these user-defined target categories. For example, the categorization unit 504 may be configured to utilize a predefined broad set of mappings (built and stored in the knowledgebase 114) for the selected target marketplace and trace them inversely to identify the knowledgebase entities relevant to the received catalog that can be mapped to the target categories. For instance, the following predefined mapping of knowledgebase entities to target categories may be stored in the knowledgebase 114 for a particular target marketplace, e.g., Google® Shopping:

- Pants (KB) -> Apparel > Clothing > Trousers (Google®)

- Shirts (KB) -> Apparel > Clothing > T-Shirts (Google®)

- Shoes (KB) -> Apparel > Clothing (Google®)

- Sunglasses (KB) -> Apparel > Accessories > Sunglasses (Google®)

- ...N

[0068] Therefore, in the above example, when the user selects the target category “Apparel > Clothing (Google®)” to configure the system, the categorization unit 504 may be configured to backtrack and identify [Pants (KB), Shirts (KB), Shoes (KB)] as the relevant KB entities that can be mapped to that target category for a received catalog that needs to be categorized for Google® Shopping. In this way, the search can be narrowed down to only those KB entities that are relevant to the user-selected target categories, which may eliminate potentially inaccurate predictions by the model(s).
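The inverse lookup in this example amounts to matching category paths segment by segment: a KB entity is relevant if its mapped target path equals the selected category or lies beneath it. The following is a minimal Python sketch under that assumption, reusing the four example mappings; the function names are hypothetical.

```python
# Example KB-entity -> Google Shopping path mapping, copied from the list above.
KB_TO_TARGET = {
    "Pants": "Apparel > Clothing > Trousers",
    "Shirts": "Apparel > Clothing > T-Shirts",
    "Shoes": "Apparel > Clothing",
    "Sunglasses": "Apparel > Accessories > Sunglasses",
}


def split_path(path):
    """Turn 'A > B > C' into ['A', 'B', 'C']."""
    return [part.strip() for part in path.split(">")]


def relevant_kb_entities(selected_target, mapping):
    """Trace the mapping backwards: keep KB entities mapped to the selected category or below it."""
    selected = split_path(selected_target)
    return [
        entity
        for entity, target in mapping.items()
        if split_path(target)[: len(selected)] == selected
    ]


print(relevant_kb_entities("Apparel > Clothing", KB_TO_TARGET))
# -> ['Pants', 'Shirts', 'Shoes']
```

Comparing whole path segments rather than raw string prefixes keeps, for example, a hypothetical “Apparel > Clothing Accessories” category from being treated as a child of “Apparel > Clothing”.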

[0069] The mapping tables 700 may also be accessible to the user via the GUI displayed on their respective user devices 108. The user may periodically add, delete, edit, or review these mapping tables to ensure accuracy. Moreover, the mapping tables 700 are periodically updated according to the changing taxonomies of the target marketplaces 104. For example, the target marketplaces 104 may be continually scanned, by querying, web crawling, or web scraping techniques, to identify any updates or changes to the target taxonomies, and the mapping tables 700 are updated accordingly to ensure that the knowledgebase entity IDs are accurately mapped according to the updated taxonomies of the target marketplaces 104. Furthermore, the user may also edit the mapping tables 700 in response to changes made to the target taxonomies, and all such edits may be validated by the user before being updated in the database 110. In some other implementations, the mapping tables 700 may be updated by leveraging the language processing model 512, which has been trained to accurately mine product entities, to update existing mappings or add new mappings between the knowledgebase entities and the target entities.
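One way to act on a detected taxonomy update is to compare the previous and current scans of the target taxonomy and flag the affected mappings for user review. The sketch below assumes each scan yields a dictionary of target category IDs to category paths; the IDs, paths, and the `stale_mappings` name are hypothetical.

```python
def stale_mappings(mapping, old_taxonomy, new_taxonomy):
    """
    mapping:      KB entity ID -> target category ID
    old_taxonomy: target category ID -> category path at the previous scan
    new_taxonomy: target category ID -> category path at the latest scan
    Returns (removed, renamed): mappings whose target ID disappeared, and mappings
    whose target ID still exists but whose path changed. Both need user review.
    """
    removed, renamed = {}, {}
    for kb_id, target_id in mapping.items():
        if target_id not in new_taxonomy:
            removed[kb_id] = target_id
        elif old_taxonomy.get(target_id) != new_taxonomy[target_id]:
            renamed[kb_id] = (old_taxonomy.get(target_id), new_taxonomy[target_id])
    return removed, renamed


# Hypothetical IDs and paths, for illustration only.
mapping = {"KB-001": "T-10", "KB-002": "T-20", "KB-003": "T-30"}
old = {"T-10": "Apparel > Clothing > Trousers",
       "T-20": "Apparel > Clothing > T-Shirts",
       "T-30": "Apparel > Accessories > Sunglasses"}
new = {"T-10": "Apparel > Clothing > Pants",
       "T-20": "Apparel > Clothing > T-Shirts"}
removed, renamed = stale_mappings(mapping, old, new)
print(removed)
print(renamed)
```

Here KB-003 points at a category that no longer exists and KB-001 points at a category whose path was renamed; both would be surfaced in the GUI for validation rather than changed silently.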

[0070] Further, the categorization unit 504 is configured to obtain a target category labels dataset TD that may be applicable to the catalog C and the products included therein. The target category labels dataset TD may include all the target categories and their corresponding target entity IDs that may be mapped to the knowledgebase entities and the corresponding knowledgebase entity IDs included in the intermediary product categorization 602 of the catalog C. Once the target category labels dataset TD is obtained, the categorization unit 504 may be configured to use this dataset to deduce one or more mapping rules among the original product types (provided in the original merchant catalog), the mined product entities (i.e., the standardized intermediary product categorization 602 obtained from the knowledgebase 114), and the target categories used by the desired target marketplaces 104. The deduced mapping rules help the machine learning based target classification model eventually learn to automatically determine the target entities that can be mapped to the product types in the original catalogs. Examples of the mapping rules may include, but are not limited to:

Rule 1: (original product type) -> (target category)

Rule 2: (original product type, mined entity) -> (target category)

Rule 3: (mined entity) -> (target category)
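Deducing the three rule forms from the dataset TD can be approximated by counting, for each left-hand side, how often each target category occurs and proposing the most frequent one. This is a minimal sketch, assuming every record already carries its original product type, one mined entity, and a candidate target category; the record layout and sample values are hypothetical.

```python
from collections import Counter, defaultdict

# Left-hand sides of the three rule forms.
RULE_FORMS = {
    "rule1": ("product_type",),
    "rule2": ("product_type", "mined_entity"),
    "rule3": ("mined_entity",),
}


def deduce_rules(records, lhs_fields):
    """For each left-hand side, propose its most frequent target and the share of records supporting it."""
    counts = defaultdict(Counter)
    for rec in records:
        lhs = tuple(rec[field] for field in lhs_fields)
        counts[lhs][rec["target"]] += 1
    rules = {}
    for lhs, counter in counts.items():
        target, n = counter.most_common(1)[0]
        rules[lhs] = (target, n / sum(counter.values()))
    return rules


# Hypothetical labelled records.
records = [
    {"product_type": "Women > Dress", "mined_entity": "dress", "target": "Clothing > Dress"},
    {"product_type": "Women > Dress", "mined_entity": "dress", "target": "Clothing > Dress"},
    {"product_type": "Women > Bag", "mined_entity": "handbag", "target": "Apparel > Handbag"},
    {"product_type": "Women > Bag", "mined_entity": "handbag", "target": "Apparel > Handbag"},
    {"product_type": "Women > Bag", "mined_entity": "backpack", "target": "Apparel > Backpack"},
]
rule1 = deduce_rules(records, RULE_FORMS["rule1"])
rule2 = deduce_rules(records, RULE_FORMS["rule2"])
rule3 = deduce_rules(records, RULE_FORMS["rule3"])
print(rule1)
print(rule2)
```

In this toy data, Rule 1 for “Women > Bag” has only two-thirds support because bags are not homogeneous, whereas Rule 2 separates handbags from backpacks with full support, which is the situation Rule 2 is meant for.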

[0071] The above example rules may be presented to the user for validation, and once approved, they are used by the categorization unit 504 to process the received catalog and predict the target categories. For example, the categorization unit 504 may first be configured to determine one or more product types in the original catalog C that can be directly mapped to a target category (Rule 1). Rule 2 is then applied to determine whether a combination of an original product type and one or more mined product entities identifies any target categories that can be mapped. Rule 2 may facilitate determining target categories for non-homogeneous product types and also allows the user to spot keywords or entities incorrectly mined by the entity mining unit 502. For instance, if a mined entity does not match the product type, the mined entity may be determined to be incorrect and may be discarded or corrected. Rule 3 may be applicable when only the mined entities are available to determine target categories for mapping.

[0072] For validation, the categorization unit 504 may present each of these rules, and the target categories predicted based on these rules, together with a confidence score to the user via the GUI. The confidence score may be determined based on one or more of: i) a raw count of how many elements in the catalog are mapped to the target category; ii) the hierarchy structure of the taxonomy at the target marketplace 104; and iii) a probability score associated with the prediction output generated by the target classification model of the categorization unit 504. The confidence score may indicate how confident the categorization unit 504 is about the accuracy of the predicted mapping.

[0073] In an exemplary implementation where Rule 1 is applied, the confidence score is presented as a percentage value indicating how much the original product type overlaps with a target category. For instance, given an original product type defined as “Summer Collection Women Bags”, the categorization unit 504 applies Rule 1 to predict the target categories. Where almost all of the product entities under this original product type are mapped to a single target entity also named “Bags”, the categorization unit 504 may obtain a confidence score of, for example, 95% and display it on the GUI. However, where these product entities are mapped to the target category “Bags” but also to the sub-categories “Handbags” and “Backpacks”, the confidence score for the target category “Bags” will be higher than the confidence scores for the sub-categories “Handbags” and “Backpacks”. Similarly, where only mined entities are used to predict the target categories (e.g., under Rule 3), the confidence score may be presented as a raw count of how many mined entities are mapped to each target category.
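The hierarchy-aware behaviour in the “Bags” example, where a parent category scores at least as high as its sub-categories, follows if each product is counted towards its mapped category and towards all of that category's ancestors. A minimal sketch under that assumption, with hypothetical category paths; the disclosure does not fix this exact formula.

```python
from collections import Counter


def hierarchical_confidence(mapped_paths):
    """
    mapped_paths: the target category path assigned to each product of one original
    product type, e.g. "Bags > Handbags". Each product counts towards its category
    and every ancestor, so a parent never scores below its sub-categories.
    Returns {category path: share of products between 0 and 1}.
    """
    counts = Counter()
    for path in mapped_paths:
        parts = [p.strip() for p in path.split(">")]
        for depth in range(1, len(parts) + 1):
            counts[" > ".join(parts[:depth])] += 1
    total = len(mapped_paths)
    return {cat: n / total for cat, n in counts.items()} if total else {}


scores = hierarchical_confidence(
    ["Bags", "Bags > Handbags", "Bags > Handbags", "Bags > Backpacks"]
)
for category, share in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {share:.0%}")
# -> Bags: 100%
# -> Bags > Handbags: 50%
# -> Bags > Backpacks: 25%
```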

[0074] These confidence scores are displayed in sorted order (e.g., highest on top) on the GUI, allowing the user to validate the mapping rules that have high confidence scores and to manually correct, edit, or reject those that have low confidence scores. The validation and feedback provided by the user may be used to retrain the target classification model of the categorization unit 504 to improve the accuracy of the mappings over time. FIG. 8 illustrates an exemplary graphical user interface (GUI) 800 displayed on the user device 108 to provide the mapping rule predictions along with their associated confidence scores. It will be appreciated that the GUI 800 shown in FIG. 8 is merely exemplary and simplified for clarity of the disclosure; in practical implementations, the GUI may include additional details and may display many more prediction results in any manner that achieves similar results.

[0075] As shown, for Rule 1, the target category “Clothing > Dress” is predicted with a 95% confidence score for the original product type “Women > Dress”, whereas for the original product type “Women > Bag”, the target category “Apparel > Handbag” is predicted with a score of only 60%. Based on the displayed confidence scores and/or counts, the user may approve or reject the predicted mapping rules. For example, the user may approve the mapping rules for individual prediction results having a confidence score greater than a threshold, and may reject or edit those having confidence scores lower than a threshold value. In some implementations, the threshold values may be predefined or may be defined in real time to allow the users to evaluate how accurate the predictions are.
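The sorting and threshold-based approval can be sketched as below, using the two Rule 1 predictions from FIG. 8. The `Prediction` structure and the threshold of 80 are assumptions for illustration; the disclosure leaves the threshold predefined or user-defined.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    rule: str
    original_type: str
    target: str
    confidence: float  # percentage, as shown in the GUI


def triage(predictions, threshold=80.0):
    """Sort highest confidence first, then split into approve vs. review queues."""
    ranked = sorted(predictions, key=lambda p: p.confidence, reverse=True)
    approve = [p for p in ranked if p.confidence > threshold]
    review = [p for p in ranked if p.confidence <= threshold]
    return approve, review


approve, review = triage([
    Prediction("Rule 1", "Women > Bag", "Apparel > Handbag", 60.0),
    Prediction("Rule 1", "Women > Dress", "Clothing > Dress", 95.0),
])
for p in approve + review:
    print(f"{p.rule}: {p.original_type} -> {p.target} ({p.confidence:.0f}%)")
```

Only the queue split is automated here; in the described system the user still confirms the approved rules and edits or rejects the ones left for review.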

[0076] Once the one or more mapping rules are approved, for example, by the user via the GUI, the entire catalog C, having the intermediary product categorization, is processed using the approved mapping rules to obtain the refined product categorization 604 for every product in the catalog C. In an implementation, if the original product type appears in an approved Rule 1 mapping, that rule is applied to obtain the target categories corresponding to the product type. However, if no direct mapping between the original product type and the target categories is applicable, the categorization unit 504 determines whether the mined entities and the original product type can be used in combination to predict the target categories. If so, Rule 2 is applied; otherwise, the categorization unit 504 attempts to apply Rule 3 to predict the target categories based on the mined entities only. In some implementations, the categorization unit 504 may also use the raw TCM model outputs as a fallback to predict the target categories if none of the rules can predict target categories for one or more products in the catalog.
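The fallback order described above (Rule 1, then Rule 2, then Rule 3, then the raw model output) can be expressed as a simple cascade. This sketch assumes approved rules are held in plain dictionaries and that the model fallback is any callable; all names and sample rules are hypothetical.

```python
def predict_target(product, rule1, rule2, rule3, model_fallback=None):
    """
    product: dict with 'product_type' (str) and 'mined_entities' (list of str).
    rule1: {product_type: target}
    rule2: {(product_type, mined_entity): target}
    rule3: {mined_entity: target}
    model_fallback: optional callable returning a raw model prediction.
    Returns (target, source), where source names the step that produced the target.
    """
    ptype = product["product_type"]
    if ptype in rule1:
        return rule1[ptype], "rule1"
    for entity in product["mined_entities"]:
        if (ptype, entity) in rule2:
            return rule2[(ptype, entity)], "rule2"
    for entity in product["mined_entities"]:
        if entity in rule3:
            return rule3[entity], "rule3"
    if model_fallback is not None:
        return model_fallback(product), "model"
    return None, "unresolved"


# Hypothetical approved rules.
RULE1 = {"Women > Dress": "Clothing > Dress"}
RULE2 = {("Women > Bag", "backpack"): "Apparel > Backpack"}
RULE3 = {"sunglasses": "Apparel > Accessories > Sunglasses"}

print(predict_target({"product_type": "Women > Bag", "mined_entities": ["backpack"]},
                     RULE1, RULE2, RULE3))
# -> ('Apparel > Backpack', 'rule2')
```

Returning the name of the step that fired makes it easy to show, next to each refined category, which rule or fallback produced it.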

[0077] In some embodiments, as more and more products contained in the received catalog C are categorized according to a particular target marketplace, the categorizations and mappings achieved may be further used as a labeled dataset to build and/or train a specialized TCM model (built only on data coming from that particular catalog) that may be able to predict the target categories directly from the products (such as from their titles, descriptions, and so on) within the received catalog C, that is, without the entity mining and the intermediary categorization of the products. The specialized TCM model may therefore include fewer target categories (those specific to the received catalog C and the target marketplace) and is specialized to recognize patterns present in the received catalog C. In some implementations, separate specialized TCM models may be built for categorizing every received catalog according to every desired target marketplace. That is, one specialized TCM model may categorize the received catalog C according to a first target marketplace, such as Google® Shopping, another specialized TCM model may categorize the received catalog C according to a second target marketplace, such as Facebook® Marketplace, and so on. Next, the categorization unit 504 may apply the same method as described above, i.e., mining KB entities, applying the mappings, and computing a mapping rule table using the predicted outputs from the specialized TCM model. The categorization unit 504 may then present these mapping rules with a higher or more precise confidence score for the user to approve or reject via the GUI, in a manner similar to that described previously.
In some implementations, the categorization unit 504 may be configured to utilize the predictions and confidence scores from both the specialized TCM model and the first TCM model (which relies on the entity mining and intermediary categorization described above) to generate a confidence score that measures the agreement between the categorizations output by the two models, thereby producing more accurate and efficient output over time.
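An agreement-based confidence score of the kind described above can be sketched as a majority vote over the models' top predictions. This is one possible measure, assumed here for illustration and not specified in the disclosure: it reports the share of models voting for the winning category together with the mean probability those models assigned to it, and it works equally for two models or for a larger ensemble.

```python
from collections import Counter


def ensemble_agreement(model_outputs):
    """
    model_outputs: one (predicted_target, probability) pair per model.
    Returns (label, agreement, mean_probability): the majority label, the share of
    models that predicted it, and the mean probability those models gave it.
    On a tie, Counter.most_common keeps the label that was seen first.
    """
    if not model_outputs:
        raise ValueError("need at least one model output")
    votes = Counter(label for label, _ in model_outputs)
    label, n = votes.most_common(1)[0]
    probs = [p for lbl, p in model_outputs if lbl == label]
    return label, n / len(model_outputs), sum(probs) / len(probs)


# Entity-mining TCM model and specialized TCM model agree on one product.
print(ensemble_agreement([("Clothing > Dress", 0.9), ("Clothing > Dress", 0.8)]))
# The two models disagree on another product.
print(ensemble_agreement([("Apparel > Handbag", 0.7), ("Apparel > Backpack", 0.6)]))
```

A mapping rule could then inherit, for example, the average agreement over the products it covers, so that rules on which the models disagree are pushed towards manual review.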

[0078] In yet another embodiment, the categorization unit 504 may further be configured to utilize predictions generated by a third model, which may be a generic TCM model that has been trained to categorize products into target categories. Such a generic TCM model may be a machine learning model trained to categorize all products in various catalogs (i.e., the model is not specifically trained on any one catalog) according to target marketplaces. In an implementation, the categorization unit 504 may utilize separate generic TCM models for separate target marketplaces, such as Google® Shopping, Facebook® Marketplace, and so on, to categorize various catalogs. Further, the accuracy of these generic TCM models may be enhanced by allowing the user to provide a set of target categories (as described above) and restricting the prediction output of the generic TCM model to them. Therefore, although the generic TCM model may predict any target category (because it is trained on numerous different catalogs), having the user select a few target categories restricts its predictions to only the user-defined categories, so that any incorrect mappings (i.e., mappings other than to the categories defined by the user) are filtered out. The categorization unit 504 may be configured to obtain the mapping rules based on the predictions of this third model in a manner similar to that described above. In such implementations, the categorization unit 504 may be configured to utilize an “ensemble” of all three models, i.e., the first TCM model (including the steps of entity mining and intermediary categorization), the specialized TCM model, and the generic TCM model, to generate the confidence scores for the mapping rules, where each generated confidence score measures the agreement between the outputs of the three models.
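Restricting a generic model to user-selected root categories can be sketched as filtering its score dictionary to the categories at or below those roots. Renormalizing the surviving scores is an extra assumption made here so that they still sum to one; the category names and scores are hypothetical.

```python
def split_path(path):
    """Turn 'A > B > C' into ['A', 'B', 'C']."""
    return [part.strip() for part in path.split(">")]


def restrict_to_allowed(scores, allowed_roots):
    """
    scores: {target category path: probability} from a generic model.
    allowed_roots: user-selected target categories.
    Keeps categories equal to or below an allowed root and renormalizes them.
    """
    roots = [split_path(r) for r in allowed_roots]
    kept = {
        cat: p
        for cat, p in scores.items()
        if any(split_path(cat)[: len(r)] == r for r in roots)
    }
    total = sum(kept.values())
    return {cat: p / total for cat, p in kept.items()} if total else {}


generic_scores = {
    "Apparel > Clothing > Trousers": 0.6,
    "Home > Textiles > Curtains": 0.3,
    "Apparel > Clothing > T-Shirts": 0.1,
}
restricted = restrict_to_allowed(generic_scores, ["Apparel > Clothing"])
print(restricted)
```

The curtains prediction is discarded because it falls outside the user's “Apparel > Clothing” selection; the remaining scores can then feed into rule deduction or an agreement score as usual.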

[0079] Once the target categories are predicted with a high confidence score for all the products in the catalog C, the catalog enrichment unit 510 may be configured to assign the unique target IDs associated with the determined target categories to the respective products in the catalog C. As explained above, all the predicted results may be validated by the user before being applied to the catalog C to prevent any errors in categorization. Moreover, as the user provides feedback, the models may be iteratively retrained on this feedback to enhance their prediction abilities.

[0080] As the products within the catalog C are categorized according to the target categories used by the target marketplaces 104, the overall product feeds published at the target marketplaces are consequently enhanced.

[0081] In an embodiment of the present disclosure, the product feed management system 102 further includes an advertisement optimization unit 508 to further improve the search ranking of the products within the catalog. Once the products are accurately categorized according to the target categories, the advertisement optimization unit 508 may be configured to enhance the product details within the catalog C. To this end, in one embodiment, the advertisement optimization unit 508 may be configured to communicate with one or more auxiliary data sources 518 (shown in FIG. 5) to receive information about frequently searched or most searched keywords, for example, in any domain or in any given target country. The auxiliary data sources 518 may be any external or third-party database or knowledgebase having keyword search volume statistics that provide insights about the most searched keywords and/or key strings. This data may be obtained for a particular domain, such as the ecommerce domain, and/or for specific target countries, such as the United States of America (USA), Canada, the United Kingdom (UK), and so on. In a further embodiment, the advertisement optimization unit 508 may also be configured to receive data about the most searched keywords in any particular language, such as French, Spanish, Chinese, Japanese, and so on. In some implementations, the search volumes may be retrieved both for raw entity names, such as “shoulder bag”, and for entity names combined with brand names, such as “Prada shoulder bag”, to determine the search volumes relevant to the merchant and to the specific catalog C being optimized. In yet other implementations, the advertisement optimization unit 508 may also analyze competitor products and campaigns to identify which keywords and/or products are frequently searched in a desired target marketplace.

[0082] Once the product entities are mined from the catalog C, the advertisement optimization unit 508 may be configured to extract from the knowledgebase 114 all the aliases/synonyms identified for each of the mined entities. Further, the advertisement optimization unit 508 may be configured to compare the search volume of the actual mined entities with the data received from the auxiliary data sources 518 to determine whether the actual mined entities are among the highly searched keywords. If the unit 508 determines that the actual mined entities are not among the highly searched keywords, it checks whether any aliases or synonyms of the mined entity appear among the highly searched keywords. Based on this analysis, the advertisement optimization unit 508 may be configured to revise the product title and/or the product description of the product by adding to them the alternative aliases or synonyms determined to be highly searched. In some further implementations, the advertisement optimization unit 508 may instead replace the actual mined entity with the determined alternative aliases or synonyms to achieve a more optimized catalog.

[0083] For example, if the actual mined entity in the product title is “messenger bag” and the advertisement optimization unit 508 determines that “courier bag” and/or “cross-body bag” are more widely searched than the term “messenger bag”, the unit 508 may optimize the product title and the product description to also include the terms “courier bag” and “cross-body bag” to improve the product search ranking. In other examples, the advertisement optimization unit 508 may be configured to similarly optimize product titles and descriptions for different languages by obtaining search volumes for the corresponding language and target country.
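The alias-based title optimization in the “messenger bag” example can be sketched as follows. This is a minimal illustration, assuming the auxiliary data source yields a dictionary of lower-case keyword to search volume; the volume figures are invented for the example and are not real statistics, and appending aliases after a “ | ” separator is an arbitrary formatting choice.

```python
import re


def optimize_title(title, mined_entity, aliases, search_volume, replace=False, top_k=2):
    """
    Add (or, with replace=True, substitute) aliases that are searched more often
    than the mined entity. `search_volume` maps lower-case keywords to volumes.
    """
    base = search_volume.get(mined_entity.lower(), 0)
    # Keep only aliases that out-search the mined entity, most searched first.
    better = sorted(
        (a for a in aliases if search_volume.get(a.lower(), 0) > base),
        key=lambda a: search_volume.get(a.lower(), 0),
        reverse=True,
    )[:top_k]
    if not better:
        return title
    if replace:
        # Swap the mined entity for the most searched alias (case-insensitive).
        return re.sub(re.escape(mined_entity), lambda _m: better[0], title, flags=re.IGNORECASE)
    return f"{title} | {' / '.join(better)}"


# Illustrative volumes only.
volumes = {"messenger bag": 1000, "courier bag": 1500, "cross-body bag": 4000}
aliases = ["courier bag", "cross-body bag", "satchel"]

print(optimize_title("Leather Messenger Bag", "messenger bag", aliases, volumes))
# -> Leather Messenger Bag | cross-body bag / courier bag
print(optimize_title("Leather Messenger Bag", "messenger bag", aliases, volumes, replace=True))
# -> Leather cross-body bag
```

Under this sketch, the `replace` flag corresponds to the variant in which the mined entity is replaced rather than supplemented; in the described system either result would still go to the user for validation.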

[0084] Further, in some implementations, the advertisement optimization unit 508 is configured to coordinate with the validation unit 506 to provide the search volume statistics and the optimized product titles and product descriptions to the user for validation via the GUI. In such implementations, the user may be prompted to select the best keywords to add to the product titles and descriptions. In some other implementations, the advertisement optimization unit 508 may utilize a machine learning model to predict popular keywords suitable for adding to the title and/or description and display them with a confidence score on the GUI so that the user can approve or reject the predicted results. In this way, the machine learning model may learn from user feedback to automatically predict the best keywords and to optimize the product titles and descriptions for the target marketplaces with increasing accuracy.

[0085] Further, once the product titles and descriptions are optimized by the advertisement optimization unit 508 and all the products are appropriately categorized according to the target marketplace 104 by the categorization unit 504, the catalog enrichment unit 510 may be configured to generate the refined catalog RC by applying the refined product categorization 604 and the optimized product details to the original catalog C. As explained previously, the product feed management system 102 may be configured to generate refined catalogs in a similar manner for different target marketplaces, and even to suit different countries and languages. Finally, the generated refined catalog RC is transmitted to the desired digital marketplace for publishing.

[0086] Referring now to FIG. 9, an example method 900 for refining a digital catalog is provided. The method begins at step 902, where the entity mining unit 502 receives a digital catalog C, and at step 904 it uses machine learning based models to extract one or more product entities from the catalog. Further, at step 906, the entity mining unit 502 is configured to link each of the product entities to one or more standardized intermediary product categories, e.g., the product entities within the knowledgebase 114, thereby obtaining a standardized intermediary product categorization 602 for the catalog C. In an embodiment, the entity mining unit 502 may utilize machine learning models, such as the language processing model 512 and the RCM model 514, to extract one or more product entities, product attribute entities, and the directed relationships from the catalog C and link or map them to one or more product entities within the knowledgebase 114. Further, the entity mining unit 502 also analyzes missed products within the catalog to identify any new intermediary product categories that may be added to the knowledgebase 114.

[0087] Once all the products are categorized with the standardized intermediary product categories, at step 908, the categorization unit 504 determines one or more target categories, corresponding to the desired target marketplace 104, to be mapped to each of the assigned standardized intermediary product categories, thereby obtaining a refined product categorization for each of the extracted product entities. In some implementations, predefined mappings between knowledgebase entities (or the standardized intermediary taxonomy) and the different target taxonomies may be stored in the database 110, and the categorization unit 504 may use these mappings to predict the target categories applicable to the mined product entities in the catalog C. A target category labels dataset TD that may be applicable to the catalog C and the products included therein may then be obtained. For example, the target category labels dataset TD may include all the target categories and their corresponding target entity IDs that can be mapped to the intermediary product categories (i.e., the knowledgebase entities) and the corresponding knowledgebase entity IDs included in the intermediary product categorization 602 of the catalog C. Further, once the dataset TD is obtained, one or more mapping rules for predicting the target categories may be deduced, for example, by the categorization unit 504.

These mapping rules are presented as mapping configurations to an operator, such as a support technician of the system 102, for validation. For example, the categorization unit 504 may apply these rules to predict target categories for every product type and assign a confidence score to each predicted result. The operator may validate, via the GUI, the rules whose predicted target categories have high confidence scores. Once the one or more mapping rules are approved, the entire catalog C, having the intermediary product categorization, is processed using the approved mapping rules to obtain the refined product categorization 604 for every product in the catalog C.

[0088] Further, at step 910, the advertisement optimization unit 508 may determine one or more additional and/or replacement keywords for optimizing the product title and product description of each product within the received catalog C. For example, the advertisement optimization unit 508 may be configured to communicate with one or more external auxiliary data sources 518 to receive information about frequently searched or most searched keywords, for example, in any domain, in any given target country, or in any target language. The advertisement optimization unit 508 compares the search volume of the actual mined entities within the catalog C with the data received from the auxiliary data sources 518 and revises the product title and/or the product description of the product to replace the actual mined entity with alternative aliases or synonyms that may be more popular or more highly searched.

[0089] At step 912, a refined catalog is generated based on the determined target categories for each of the extracted product entities and the optimized product title and product description. For example, once the product titles and descriptions are optimized by the advertisement optimization unit 508 and all the products are appropriately categorized according to the target marketplace 104 by the categorization unit 504, the catalog enrichment unit 510 may generate the refined catalog RC by applying the refined product categorization 604 and the optimized product details to the original catalog C. Finally, the generated refined catalog RC is transmitted to the desired digital marketplace for publishing.

[0090] For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the examples described herein. Also, the description is not to be considered as limiting the scope of the examples described herein.

[0091] It will be appreciated that the examples and corresponding diagrams used herein are for illustrative purposes only. Different configurations and terminology can be used without departing from the principles expressed herein. For instance, components and modules can be added, deleted, modified, or arranged with differing connections without departing from these principles.

[0092] The steps or operations in the flow charts and diagrams described herein are merely examples. There may be many variations to these steps or operations without departing from the principles discussed above. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified. Although the above principles have been described with reference to certain specific examples, various modifications thereof will be apparent to those skilled in the art.

Claims


1. A method for generating a refined digital catalog for publishing on a target digital marketplace, the method comprising: receiving, by a product feed management system, an initial digital catalog from a catalog source, the initial digital catalog including a plurality of products offered by a merchant and product information associated with each of the plurality of products; extracting, by the product feed management system, one or more product entities and one or more product attribute entities associated with each of the plurality of products based on the associated product information; assigning, by the product feed management system, one or more standardized intermediary product categories, according to a standardized intermediary taxonomy, to each of the plurality of products based on each of the extracted one or more product entities and the product attribute entities; identifying, by the product feed management system, one or more target categories, within a target taxonomy used by the target digital marketplace, to be assigned to each of the plurality of products based on a mapping between the standardized intermediary taxonomy and the target taxonomy; and generating, by the product feed management system, the refined catalog by assigning the identified one or more target categories to each of the plurality of products.

2. The method of claim 1, wherein assigning one or more standardized intermediary product categories further comprises identifying, by the product feed management system, an absence of one or more product entities or product attribute entities within the product information associated with each of the plurality of products.

3. The method of claim 1, wherein extracting the one or more product entities and product attribute entities further comprises: displaying, by the product feed management system via a graphical user interface (GUI), one or more of non-mined or incorrectly mined product entities and product attribute entities for each of the plurality of products; receiving, by the product feed management system, from a user via the GUI, one or more of a user feedback or additional product entities and product attribute entities for each of the displayed non-mined or incorrectly mined product entities or product attribute entities; and updating, by the product feed management system, the standardized intermediary taxonomy to incorporate each of the user feedback and the one or more additional product entities and product attribute entities received from the user.

4. The method of claim 1, wherein each standardized intermediary product category within the standardized intermediary taxonomy includes a unique identifier associated therewith, and wherein assigning the one or more standardized intermediary product categories to each of the plurality of products further includes assigning, by the product feed management system, the unique identifiers corresponding to each of the assigned standardized intermediary product categories to the products.

5. The method of claim 1, wherein each of the one or more standardized intermediary product categories within the standardized intermediary taxonomy includes one or more of a description of the category, aliases, category information in a plurality of languages, and a directed relationship with at least one other category within the standardized intermediary taxonomy.

6. The method of claim 1, wherein assigning the one or more standardized intermediary product categories further comprises: displaying, by the product feed management system on a graphical user interface, each of the assigned one or more standardized intermediary product categories; receiving, by the product feed management system from a user via the graphical user interface, a validation feedback for each of the one or more standardized intermediary product categories; and updating, by the product feed management system, each of the one or more standardized intermediary product categories based on the received user validation feedback.

7. The method of claim 1, wherein the mapping between the standardized intermediary taxonomy and the target taxonomy is predefined and stored in a database.

8. The method of claim 1, further comprising: receiving, by the product feed management system, at least one target taxonomy from the target marketplace; generating, by the product feed management system, a mapping table for mapping each of the intermediary standardized product categories within the standardized intermediary taxonomy to one or more target product categories within the target taxonomy; and updating, by the product feed management system, the mapping table in response to a detected update in the target taxonomy.

9. The method of claim 1, further comprising determining, by the product feed management system, one or more mapping rules for identifying the target categories to be mapped to each of the plurality of products, and wherein the target categories are assigned to the products based on a confidence score indicating a degree of match of the respective product with the one or more target categories based on each of the one or more mapping rules.

10. The method of claim 1 further comprising: determining, by the product feed management system, one or more additional keywords to be added to the product information for one or more of the products within the received catalog; and generating the refined digital catalog by incorporating each of the one or more determined additional keywords within the product information associated with the respective one or more of the products.
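A minimal sketch of the keyword enrichment in claim 10, assuming that candidate keywords are drawn from the aliases of the product's assigned intermediary categories; the claim itself does not state where the additional keywords come from. An alias is added only when it is absent from the existing title, description, and keywords. `enrich_keywords`, the `max_new` cap, and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def enrich_keywords(
    product: Dict[str, Any],
    category_aliases: Dict[str, List[str]],
    max_new: int = 5,
) -> Dict[str, Any]:
    """Return a copy of `product` with category aliases added as extra keywords.

    An alias is added only if it does not already occur (as a plain substring)
    in the title/description and is not already a keyword.
    """
    text = f"{product.get('title', '')} {product.get('description', '')}".lower()
    existing = {k.lower() for k in product.get("keywords", [])}
    candidates: List[str] = []
    for category_id in product.get("categories", []):
        for alias in category_aliases.get(category_id, []):
            alias = alias.lower()
            if alias not in text and alias not in existing and alias not in candidates:
                candidates.append(alias)
    enriched = dict(product)  # shallow copy; the input product is not modified
    enriched["keywords"] = list(product.get("keywords", [])) + candidates[:max_new]
    return enriched


product = {
    "title": "Trail running shoe",
    "description": "Lightweight sneakers",
    "keywords": ["outdoor"],
    "categories": ["CAT-RUNNING"],
}
aliases = {"CAT-RUNNING": ["trainers", "sneakers", "jogging shoes"]}
print(enrich_keywords(product, aliases)["keywords"])  # ['outdoor', 'trainers', 'jogging shoes']
```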

11. A system for generating a refined digital catalog for publishing on a target digital marketplace, the system comprising: an input/output unit for receiving one or more inputs from and providing output to one or more user devices, one or more catalog sources, and the target digital marketplace; a memory unit; and a product feed management system processor operatively coupled to the input/output unit and the memory unit, the product feed management system processor including: an entity mining unit configured to: receive an initial digital catalog from the one or more catalog sources, the initial digital catalog including a plurality of products offered by a merchant and product information associated with each of the plurality of products; extract one or more product entities and one or more product attribute entities associated with each of the plurality of products based on the associated product information; assign one or more standardized intermediary product categories, according to a standardized intermediary taxonomy, to each of the plurality of products based on each of the extracted one or more product entities and the product attribute entities; a categorization unit configured to: identify one or more target categories, within a target taxonomy used by the target digital marketplace, to be assigned to each of the plurality of products based on a mapping between the standardized intermediary taxonomy and the target taxonomy; and a catalog enrichment unit configured to: generate the refined digital catalog by assigning the identified one or more target categories to each of the plurality of products.
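The division of work among the units of claim 11 can be illustrated with a toy Python pipeline. The sketch assumes a single-token vocabulary for entity mining and a precomputed mapping dictionary for categorization. The class names mirror the claim language, but their methods (`extract`, `assign`, `identify`, `refine`), the vocabulary format, the `generate_refined_catalog` function, and all identifiers are hypothetical and far simpler than a production entity miner.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Product = Dict[str, Any]


@dataclass
class EntityMiningUnit:
    # Hypothetical vocabulary: term -> (entity kind "product"/"attribute", category id or None)
    vocabulary: Dict[str, Tuple[str, Optional[str]]]

    def extract(self, product_info: str) -> Dict[str, List[str]]:
        found: Dict[str, List[str]] = {"product": [], "attribute": []}
        for token in re.findall(r"[a-z0-9]+", product_info.lower()):
            if token in self.vocabulary:
                found[self.vocabulary[token][0]].append(token)
        return found

    def assign(self, entities: Dict[str, List[str]]) -> List[str]:
        """Intermediary categories implied by product and attribute entities."""
        categories: List[str] = []
        for term in entities["product"] + entities["attribute"]:
            category = self.vocabulary[term][1]
            if category and category not in categories:
                categories.append(category)
        return categories


@dataclass
class CategorizationUnit:
    mapping: Dict[str, List[str]]  # intermediary category id -> target category ids

    def identify(self, categories: List[str]) -> List[str]:
        targets: List[str] = []
        for category in categories:
            for target in self.mapping.get(category, []):
                if target not in targets:
                    targets.append(target)
        return targets


class CatalogEnrichmentUnit:
    def refine(self, product: Product, targets: List[str]) -> Product:
        return {**product, "target_categories": targets}


def generate_refined_catalog(catalog, miner, categorizer, enricher) -> List[Product]:
    refined = []
    for product in catalog:
        info = f"{product.get('title', '')} {product.get('description', '')}"
        categories = miner.assign(miner.extract(info))
        targets = categorizer.identify(categories)
        refined.append(enricher.refine({**product, "intermediary_categories": categories}, targets))
    return refined


miner = EntityMiningUnit({
    "shoe": ("product", "CAT-SHOES"),
    "running": ("attribute", "CAT-RUNNING"),
    "leather": ("attribute", None),
})
categorizer = CategorizationUnit({"CAT-SHOES": ["1000"], "CAT-RUNNING": ["1001"]})
catalog = [{"sku": "SKU-1", "title": "Running shoe", "description": "Leather upper"}]
refined = generate_refined_catalog(catalog, miner, categorizer, CatalogEnrichmentUnit())
print(refined[0]["target_categories"])  # ['1000', '1001']
```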

12. The system of claim 11, wherein the entity mining unit is further configured to identify an absence of one or more product entities or product attribute entities within the product information associated with each of the plurality of products to assign the one or more standardized intermediary product categories.
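Claim 12 does not specify how an absence is used. As one hypothetical interpretation, each candidate category could list attributes it requires, with a candidate rejected when any of them is missing and a broader fallback category assigned when no candidate qualifies. The function `assign_with_absence`, the required-attribute lists, and the fallback identifier below are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def assign_with_absence(
    found_attributes: Dict[str, str],
    candidates: Dict[str, List[str]],
    fallback: str,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep candidate categories whose required attributes are all present.

    Returns the assigned categories (or [fallback] if none qualify) and, per
    rejected candidate, the attributes found to be absent.
    """
    assigned: List[str] = []
    absent: Dict[str, List[str]] = {}
    for category_id, required in candidates.items():
        missing = [attr for attr in required if not found_attributes.get(attr)]
        if missing:
            absent[category_id] = missing
        else:
            assigned.append(category_id)
    return (assigned or [fallback]), absent


found = {"size": "42", "color": "red", "sport": "running"}
candidates = {"CAT-KIDS-SHOES": ["size", "age_group"], "CAT-RUNNING": ["size", "sport"]}
print(assign_with_absence(found, candidates, fallback="CAT-SHOES"))
# (['CAT-RUNNING'], {'CAT-KIDS-SHOES': ['age_group']})
```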

13. The system of claim 11, wherein for extracting the one or more product entities and product attribute entities, the entity mining unit is further configured to: display, via a graphical user interface (GUI), one or more of non-mined or incorrectly mined product entities and product attribute entities for each of the plurality of products; receive, from a user via the GUI, one or more of a user feedback or additional product entities and product attribute entities for each of the displayed non-mined or incorrectly mined product entities or product attribute entities; and update the standardized intermediary taxonomy to incorporate each of the user feedback and the one or more additional product entities and product attribute entities received from the user.
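The feedback loop of claim 13 might be sketched as follows, assuming the taxonomy's recognizable terms are held in a simple term vocabulary. `non_mined_terms` lists tokens the vocabulary does not recognize (the items a GUI would display), and `incorporate_feedback` merges user-supplied entries back into the vocabulary. The tokenizer, stopword list, vocabulary format, and function names are hypothetical; the GUI itself is not shown.

```python
import re
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical format: term -> (entity kind, category id or None)
Vocabulary = Dict[str, Tuple[str, Optional[str]]]
STOPWORDS: FrozenSet[str] = frozenset({"a", "and", "for", "the", "with"})  # illustrative


def non_mined_terms(product_info: str, vocabulary: Vocabulary) -> List[str]:
    """Terms in the product information that the vocabulary does not recognize."""
    tokens = re.findall(r"[a-z0-9]+", product_info.lower())
    return [t for t in dict.fromkeys(tokens) if t not in vocabulary and t not in STOPWORDS]


def incorporate_feedback(vocabulary: Vocabulary, feedback: Vocabulary) -> Vocabulary:
    """Return a new vocabulary extended (or corrected) with user-supplied entities."""
    updated = dict(vocabulary)
    updated.update(feedback)
    return updated


vocabulary = {"shoe": ("product", "CAT-SHOES")}
info = "Suede shoe with Vibram sole"
print(non_mined_terms(info, vocabulary))  # ['suede', 'vibram', 'sole']

# Entries a reviewer might enter via the GUI (hypothetical)
user_feedback = {"suede": ("attribute", None), "sole": ("attribute", None)}
vocabulary = incorporate_feedback(vocabulary, user_feedback)
print(non_mined_terms(info, vocabulary))  # ['vibram']
```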

14. The system of claim 11, wherein each standardized intermediary product category within the standardized intermediary taxonomy includes a unique identifier associated therewith, and wherein the entity mining unit is configured to assign the unique identifiers corresponding to each of the assigned standardized intermediary product categories to the products.

15. The system of claim 11, wherein each of the one or more standardized intermediary product categories within the standardized intermediary taxonomy includes one or more of a description of the category, aliases, category information in a plurality of languages, and a directed relationship with at least one other category within the standardized intermediary taxonomy.

16. The system of claim 11, wherein the entity mining unit is further configured to: display, via a graphical user interface (GUI), each of the assigned one or more standardized intermediary product categories; receive, from a user via the GUI, a validation feedback for each of the one or more standardized intermediary product categories; and update each of the one or more standardized intermediary product categories based on the received user validation feedback.

17. The system of claim 11, wherein the mapping between the standardized intermediary taxonomy and the target taxonomy is predefined and stored in a database.

18. The system of claim 11, wherein the categorization unit is further configured to: receive at least one target taxonomy from the target digital marketplace; generate a mapping table for mapping each of the standardized intermediary product categories within the standardized intermediary taxonomy to one or more target product categories within the target taxonomy; and update the mapping table in response to a detected update in the target taxonomy.

19. The system of claim 11, wherein the categorization unit is further configured to determine one or more mapping rules for identifying the target categories to be mapped to each of the plurality of products, and wherein the target categories are assigned to the products based on a confidence score indicating a degree of match of the respective product with the one or more target categories based on each of the one or more mapping rules.

20. The system of claim 11, wherein the catalog enrichment unit is further configured to: determine one or more additional keywords to be added to the product information for one or more of the products within the received catalog; and generate the refined digital catalog by incorporating each of the one or more determined additional keywords within the product information associated with the respective one or more of the products.

21. A computer readable medium comprising computer executable instructions for generating a refined digital catalog for publishing on a target digital marketplace, the computer executable instructions when executed by a processor cause the processor to: receive an initial digital catalog from a catalog source, the initial digital catalog including a plurality of products offered by a merchant and product information associated with each of the plurality of products; extract one or more product entities and one or more product attribute entities associated with each of the plurality of products based on the associated product information; assign one or more standardized intermediary product categories, according to a standardized intermediary taxonomy, to each of the plurality of products based on each of the extracted one or more product entities and the product attribute entities; identify one or more target categories, within a target taxonomy used by the target digital marketplace, to be assigned to each of the plurality of products based on a mapping between the standardized intermediary taxonomy and the target taxonomy; and generate the refined digital catalog by assigning the identified one or more target categories to each of the plurality of products.