CN114036246A

CN114036246A - Commodity map vectorization method and device, electronic equipment and storage medium

Info

Publication number: CN114036246A
Application number: CN202111480886.0A
Authority: CN
Inventors: 杨艳丽; 毛建新; 范亚国; 陈志刚; 高源; 张烨
Original assignee: Guoneng Beijing Business Network Co ltd
Current assignee: Guoneng Beijing Business Network Co ltd
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-02-11

Abstract

The disclosure provides a commodity map vectorization method, a commodity map vectorization device, electronic equipment and a storage medium, and relates to the technical field of knowledge graphs. The method comprises: constructing a commodity map that comprises entity relations in at least the following dimensions: a user dimension, a commodity dimension and a category dimension; and learning vector representations of the entity relations in the commodity map from commodity data and user interaction data collected from an e-commerce platform to obtain a commodity map vectorization representation model. The method and device yield a commodity map vectorization representation model suited to the commodity domain, on the basis of which tasks such as commodity recommendation, commodity search and map completion in the commodity domain can be carried out.

Description

Commodity map vectorization method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of knowledge graph technologies, and in particular, to a method and an apparatus for vectorizing a commodity graph, an electronic device, and a storage medium.

Background

A knowledge graph is a large-scale semantic-network knowledge base. It uses a symbolic knowledge representation, describes specific knowledge with triples, and represents and stores that knowledge as a directed graph; its advantages include rich semantics, a friendly structure and ease of understanding. Because it expresses human prior knowledge well, the knowledge graph has in recent years been widely and successfully applied in natural language processing, question-answering systems, recommendation systems and other fields. At present, however, the triples in knowledge graphs are simply defined: relations between entities are mainly inheritance or composition relations, usually only the text modality is available, and the semantic relations are simple.

Knowledge Graph Embedding (KGE) learns vector representations of the entities and relations in a knowledge base. Knowledge graph embedding models in the related art are mostly used for graph completion and, to a lesser extent, for relation extraction and intelligent question answering; because commodity graph data are noisy and sparse, they are rarely applied to recommendation, search and similar tasks in the commodity domain.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure provides a commodity map vectorization method, apparatus, electronic device and storage medium, which overcome, at least to some extent, the technical problem that the knowledge graph embedding models provided in the related art are not suitable for commodity map data.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to one aspect of the present disclosure, there is provided a commodity map vectorization method, including: constructing a commodity map that comprises entity relations in at least the following dimensions: a user dimension, a commodity dimension and a category dimension; and learning vector representations of the entity relations in the commodity map from commodity data and user interaction data collected from an e-commerce platform to obtain a commodity map vectorization representation model.

In some embodiments, the entity relations of the user dimension include co-purchase and co-browse relations; the entity relations of the commodity dimension include similar-commodity and commodity-description relations; and the entity relations of the category dimension include the hierarchical relations among the categories to which a commodity belongs.

In some embodiments, learning the vector representations of the entity relations in the commodity map from the commodity data and user interaction data collected from the e-commerce platform to obtain the commodity map vectorization representation model includes: determining vector representations of similar commodities from the collected commodity data; and using a self-attention mechanism to extract, from the collected commodity data and user interaction data, vector representations of the co-purchase, co-browse and commodity-description relations and of the hierarchical relations among the categories to which commodities belong.

In some embodiments, determining the vector representations of similar commodities from the collected commodity data includes: extracting the title and/or description of each commodity from the collected commodity data; segmenting the title and/or description of each commodity into words to obtain segmentation results; training a word2vec model on the segmentation results to obtain a word2vec model file for the commodity data set; using that model file to generate a vector representation of each commodity from its title; calculating the similarity between commodities from their vector representations; and determining similar commodities according to that similarity and generating vector representations of the similar commodities.

In some embodiments, the commodity map vectorization method provided by the present disclosure further includes: using labeled data of a task to be executed as model output data and optimizing the parameters of the commodity map vectorization representation model, so as to obtain a commodity map vectorization representation model suited to the task to be executed.

In some embodiments, the task to be executed is any one of the following: a commodity knowledge graph completion task, a commodity search ranking task or a commodity recommendation task.

In some embodiments, the commodity map vectorization method provided by the present disclosure further includes: collecting commodity data in multiple modalities, such as text, pictures or video.

According to another aspect of the present disclosure, there is also provided a commodity map vectorization apparatus, including: a commodity map building module configured to build a commodity map that comprises entity relations in at least the following dimensions: a user dimension, a commodity dimension and a category dimension; and a commodity map vectorization module configured to learn vector representations of the entity relations in the commodity map from commodity data and user interaction data collected from an e-commerce platform to obtain a commodity map vectorization representation model.

In some embodiments, the commodity map vectorization module is further configured to: determine vector representations of similar commodities from the collected commodity data; and use a self-attention mechanism to extract, from the collected commodity data and user interaction data, vector representations of the co-purchase, co-browse and commodity-description relations and of the hierarchical relations among the categories to which commodities belong.

In some embodiments, the commodity map vectorization module is further configured to: extract the title and/or description of each commodity from the collected commodity data; segment the title and/or description of each commodity into words to obtain segmentation results; train a word2vec model on the segmentation results to obtain a word2vec model file for the commodity data set; use that model file to generate a vector representation of each commodity from its title; calculate the similarity between commodities from their vector representations; and determine similar commodities according to that similarity and generate vector representations of the similar commodities.

In some embodiments, the commodity map vectorization apparatus provided in the embodiments of the present disclosure further includes: a model optimization module configured to use labeled data of a task to be executed as model output data and optimize the parameters of the commodity map vectorization representation model, so as to obtain a commodity map vectorization representation model suited to the task to be executed.

In some embodiments, the commodity map vectorization apparatus provided by the present disclosure further includes: a data acquisition module configured to collect commodity data in multiple modalities, such as text, pictures or video.

According to another aspect of the present disclosure, there is also provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute any one of the above commodity map vectorization methods via execution of the executable instructions.

According to another aspect of the present disclosure, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for vectorizing a commodity map according to any one of the above.

The commodity map vectorization method, apparatus, electronic device and storage medium provided by the embodiments of the present disclosure construct a commodity map containing entity relations in the user, commodity and category dimensions, and learn vector representations of those relations from commodity data and user interaction data collected from an e-commerce platform, thereby obtaining a commodity map vectorization representation model suited to the commodity domain.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

FIG. 1 shows a flow chart of a commodity map vectorization method in an embodiment of the present disclosure;

FIG. 2 illustrates a flow chart of vectorizing a commodity map in an embodiment of the present disclosure;

FIG. 3 illustrates a flow chart for determining vector representations of similar commodities in an embodiment of the present disclosure;

FIG. 4 illustrates a schematic diagram of a commodity map in an embodiment of the present disclosure;

FIG. 5 illustrates a flow chart for modeling commodity similarity relations in an embodiment of the present disclosure;

FIG. 6 is a schematic diagram illustrating a specific implementation of a commodity map vectorization method in an embodiment of the present disclosure;

FIG. 7 is a flow chart illustrating the construction of a self-attention model in an embodiment of the present disclosure;

FIG. 8 is a schematic diagram illustrating a commodity map vectorization apparatus in an embodiment of the present disclosure;

FIG. 9 shows a block diagram of an electronic device in an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

Freebase: an open-domain knowledge base containing more than 40 million entities, more than 10,000 attribute relations and more than 2.4 billion fact triples;

DBpedia: an open-domain knowledge base containing more than 4 million entities, 48,293 attribute relations and 10 million fact triples;

YAGO: an open-domain knowledge base containing 980 million entities, more than 100 attribute relations and more than 1 million fact triples;

NLP: Natural Language Processing;

word2vec: a simple neural-network text embedding model;

skip-gram: the word2vec network architecture that predicts the surrounding context words from the current word;

KGE: Knowledge Graph Embedding, a technique for generating vectors from a knowledge graph;

BERT: a language model pre-trained on large-scale text corpora;

GCN: Graph Convolutional Network;

MAP@10: mean average precision over the top 10 results;

HIT@10: the proportion of queries for which a correct result appears among the top 10 results;

NDCG@10: normalized discounted cumulative gain over the top 10 results, which gives higher scores when highly relevant results are ranked higher;

Pair-wise: a ranking approach that applies a hinge loss with margin m to the score difference between a correct answer and a wrong answer, trains the ranking model so that the correct answer outscores the wrong one by at least m, and then ranks the final results by score;

Point-wise: a ranking approach that treats ranking as a binary classification problem, classifies every candidate independently and ranks the results by their classification scores;

KBQA: knowledge graph-based question-answering techniques;

Transformer: an encoder-decoder model built on the attention mechanism; it is widely applied to many NLP tasks, including machine translation, with strong results;

Softmax: a function that normalizes a vector of scores into a probability distribution while preserving their relative order.
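The pair-wise and point-wise entries above can be illustrated with a minimal Python sketch. It is an assumption, not the implementation of this disclosure: the point-wise classifier is taken to be logistic (sigmoid plus binary cross-entropy), the default margin m = 1.0 is arbitrary, and all function names are hypothetical.

```python
import math


def pairwise_hinge_loss(score_pos: float, score_neg: float, margin: float = 1.0) -> float:
    """Pair-wise: zero once the correct answer outscores the wrong one by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))


def pointwise_logistic_loss(score: float, label: int) -> float:
    """Point-wise: binary cross-entropy of one candidate; label is 1 (correct) or 0 (wrong)."""
    p = 1.0 / (1.0 + math.exp(-score))  # sigmoid turns the raw score into a probability
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def rank_by_score(scores: dict[str, float]) -> list[str]:
    """Final step shared by both approaches: order candidates by descending score."""
    return sorted(scores, key=scores.get, reverse=True)
```

The pair-wise loss only looks at score differences, so it directly optimizes the relative order; the point-wise loss scores each candidate in isolation and leaves the ordering to the final sort.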

The present exemplary embodiment will be described in detail below with reference to the drawings and examples.

Most existing graph embedding techniques rely on public knowledge graph datasets such as Freebase, DBpedia and YAGO, and are widely applied to NLP tasks such as information extraction, semantic analysis and intelligent question answering. In common knowledge graph embedding techniques, h and t generally denote the head and tail entities of a triple and r denotes the relation. These techniques can be summarized as follows:

In TransE, the relation in a triple is learned with a distance function:

$$d_r(h,t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$$

where $\lVert\cdot\rVert$ denotes the L1 or L2 norm.

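In TransH, relation-specific entity embeddings are further considered: h and t are projected onto the hyperplane of relation r,

$$\mathbf{h}_\perp = \mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r, \qquad \mathbf{t}_\perp = \mathbf{t} - \mathbf{w}_r^{\top}\mathbf{t}\,\mathbf{w}_r$$

where $\mathbf{w}_r$ is the unit normal vector of the relation hyperplane, so that $\mathbf{h}_\perp$ and $\mathbf{t}_\perp$ are the projections of h and t onto that hyperplane. The distance of a triple can then be defined as:

$$d_r(h,t) = \lVert \mathbf{h}_\perp + \mathbf{r} - \mathbf{t}_\perp \rVert_2^2$$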

TransR follows a similar idea to TransH, but instead of a hyperplane it projects entities into a relation-specific linear subspace. TransD adds further constraints to this relation-specific subspace projection to achieve higher efficiency.
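A minimal Python sketch of the TransE and TransH distances given above, using plain lists as vectors. The function names are hypothetical, and `w_r` is assumed to be already normalized to unit length, as TransH requires.

```python
import math

Vector = list[float]


def norm(v: Vector, p: int = 2) -> float:
    """L1 (p=1) or L2 (p=2) norm of a vector."""
    if p == 1:
        return sum(abs(x) for x in v)
    return math.sqrt(sum(x * x for x in v))


def transe_distance(h: Vector, r: Vector, t: Vector, p: int = 1) -> float:
    """TransE: d_r(h, t) = ||h + r - t|| under the L1 or L2 norm."""
    return norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)], p)


def project(e: Vector, w_r: Vector) -> Vector:
    """Project entity e onto the hyperplane with unit normal w_r: e - (w_r . e) w_r."""
    dot = sum(ei * wi for ei, wi in zip(e, w_r))
    return [ei - dot * wi for ei, wi in zip(e, w_r)]


def transh_distance(h: Vector, r: Vector, t: Vector, w_r: Vector) -> float:
    """TransH: d_r(h, t) = ||h_perp + r - t_perp||_2^2 on the relation hyperplane."""
    h_perp, t_perp = project(h, w_r), project(t, w_r)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))
```

In both models a smaller distance means a more plausible triple; training lowers d_r for observed triples and raises it for corrupted ones.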

Distributed representations based on word2vec: word2vec (W2V) learns distributed representations of words with the skip-gram model, whose objective S is the sum of log-probabilities over the text:

$$S = \sum_{i} \sum_{j \in \mathrm{Context}(i,c)} \log p(e_i \mid e_j)$$

where Context(i, c) denotes the set of neighbors of entity i within a window of size c, and each probability is computed with the softmax function.
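The objective S can be checked numerically with a small sketch. Assumptions: entities are integer ids, each entity has a separate input vector (used when it is the conditioning entity e_j) and output vector (used when it is the predicted entity e_i), and the softmax runs over the full vocabulary; production word2vec implementations approximate this with hierarchical softmax or negative sampling.

```python
import math


def context(i: int, c: int, n: int) -> list[int]:
    """Context(i, c): positions within distance c of position i, excluding i itself."""
    return [j for j in range(max(0, i - c), min(n, i + c + 1)) if j != i]


def log_prob(target: int, given: int, in_vecs, out_vecs) -> float:
    """log p(target | given) with a full softmax over every entity in the vocabulary."""
    scores = [sum(a * b for a, b in zip(in_vecs[given], out)) for out in out_vecs]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z


def skipgram_objective(seq: list[int], c: int, in_vecs, out_vecs) -> float:
    """S = sum_i sum_{j in Context(i, c)} log p(e_i | e_j) over one entity sequence."""
    n = len(seq)
    return sum(
        log_prob(seq[i], seq[j], in_vecs, out_vecs)
        for i in range(n)
        for j in context(i, c, n)
    )
```

Training maximizes S with respect to both vector tables; the input vectors are what is usually kept as the embeddings.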

The self-attention mechanism rests on one assumption: only part of the information in the input sequence is relevant to the output. Self-attention is widely used in image recognition and machine translation, and Transformer-based models built on it have recently achieved the best results in a range of evaluations.

Triples in general-purpose knowledge graph data sources are defined rather simply: most relations are of the isA or hasA type, there are few modalities (usually text only), and the semantic relations are relatively poor. Existing knowledge graph embedding techniques therefore perform poorly on tasks such as recommendation, search and entity alignment in the commodity domain, mainly because they do not handle the noise and data sparsity of commodity data and commodity knowledge graphs well.

To address the noise and data sparsity of commodity-domain data, the present disclosure builds on conventional graph embedding techniques, incorporates the data characteristics and data organization of the commodity domain, and provides a commodity map vectorization model suited to tasks such as commodity recommendation, commodity search and commodity map completion.

First, the embodiment of the present disclosure provides a commodity map vectorization method, which may be executed by any electronic device with computing processing capability.

Fig. 1 shows a flowchart of a commodity map vectorization method in an embodiment of the present disclosure, and as shown in fig. 1, the commodity map vectorization method provided in the embodiment of the present disclosure includes the following steps:

S102: construct a commodity map that comprises entity relations in at least the following dimensions: a user dimension, a commodity dimension and a category dimension.

The commodity map in the embodiments of the present disclosure is a knowledge graph of the commodity domain and can be applied to tasks such as commodity recommendation and commodity search.

In some embodiments, the entity relations of the user dimension include, but are not limited to, co-purchase and co-browse; the entity relations of the commodity dimension include, but are not limited to, similar commodities and commodity descriptions; and the entity relations of the category dimension include, but are not limited to, the hierarchical relations among the categories to which a commodity belongs. It should be noted that co-purchase and co-browse in the embodiments of the present disclosure may link users who purchased or browsed the same commodity, or users who purchased or browsed similar commodities.
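One way to derive the user-user co-purchase (or co-browse) relations from interaction logs is sketched below. It is a hypothetical illustration: the input formats, the optional similarity map used for the "similar commodities" variant, and the function names are assumptions, and the pairwise loop only suits small data (an inverted commodity-to-users index would scale better).

```python
from itertools import combinations
from typing import Optional


def expand(items: set[str], similar: dict[str, set[str]]) -> set[str]:
    """Add the commodities listed as similar to any commodity in `items`."""
    out = set(items)
    for item in items:
        out |= similar.get(item, set())
    return out


def co_interaction_pairs(
    interactions: dict[str, set[str]],
    similar: Optional[dict[str, set[str]]] = None,
) -> set[tuple[str, str]]:
    """Link two users when they bought (or browsed) the same or a similar commodity.

    interactions: user -> commodities the user purchased (or browsed).
    similar: optional commodity -> similar commodities; omit it to link only
    users who interacted with the very same commodity.
    """
    similar = similar or {}
    pairs = set()
    for u, v in combinations(sorted(interactions), 2):
        items_u, items_v = interactions[u], interactions[v]
        if expand(items_u, similar) & items_v or expand(items_v, similar) & items_u:
            pairs.add((u, v))
    return pairs
```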

The commodity map constructed in the embodiments of the present disclosure provides knowledge input and a way of computing knowledge superposition for commodity knowledge graph embedding, provides a reference standard for checking the downstream knowledge completion task, and provides validation data for the vector representations of similar commodities.

S104: learn vector representations of the entity relations in the commodity map from commodity data and user interaction data collected from the e-commerce platform to obtain a commodity map vectorization representation model.

Specifically, each user's purchase data, browsing data, commodity substitution records, search data and click data, together with commodity descriptions, category hierarchy labels and other data, can be collected from the e-commerce platform; learning the constructed commodity map from these data yields a commodity map embedding model that is relatively general and accurate for the commodity domain. Moreover, because the commodity data collected from the e-commerce platform span multiple modalities, the commodity map embedding model obtained in this way can still work normally when the commodity data of one or more modalities are unavailable.

In some embodiments, the commodity map provided in the embodiments of the present disclosure, and the commodity map embedding model learned from it, are well suited to commodity domains such as procurement.

It should be noted that the task to be executed may be, but is not limited to, any one of the following: a commodity knowledge graph completion task (i.e., predicting unknown triples from the existing triples in the knowledge graph), a commodity search ranking task or a commodity recommendation task.

Different downstream tasks use different evaluation metrics. The knowledge completion task is evaluated mainly on link prediction, that is, on whether the predicted entity is ranked in the correct position, so Top-10 accuracy (HIT@10) is used as the metric. For the search ranking task, the main question is whether the ranking computed from the generated vectors matches the search intent, so Top-10 recall and mean average precision (MAP@10) are calculated. For the recommendation task, the main consideration is whether the ordering of the recommended commodities is reasonable, so HIT@10 and NDCG@10 are used.
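The evaluation metrics named above can be computed per query as follows. This sketch assumes binary relevance for HIT@k, recall@k and AP@k, linear graded gains with a log2(rank + 1) discount for NDCG@k, and normalizes AP@k by min(|relevant|, k); other conventions exist, and the function names are hypothetical. HIT@10 and MAP@10 are then the means of the per-query values over all queries.

```python
import math


def hit_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """1.0 if any relevant item is in the top k, else 0.0."""
    return 1.0 if any(x in relevant for x in ranked[:k]) else 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Share of the relevant items that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def average_precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """AP@k; MAP@k is the mean of this value over all queries."""
    hits, total = 0, 0.0
    for rank, x in enumerate(ranked[:k], start=1):
        if x in relevant:
            hits += 1
            total += hits / rank  # precision at this cut-off
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int = 10) -> float:
    """NDCG@k with linear gains and a log2(rank + 1) position discount."""
    dcg = sum(gains.get(x, 0.0) / math.log2(rank + 1) for rank, x in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```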

In some embodiments, the commodity map vectorization method provided by the present disclosure further includes: collecting commodity data in multiple modalities, such as text, pictures or video. Because the commodity map vectorization representation model is built from multimodal commodity data, it can still work normally when the commodity data of one or more modalities are invalid.

In some embodiments, as shown in fig. 2, the method for vectorizing a commodity map provided in the embodiments of the present disclosure may vectorize the commodity map by the following steps:

S202: determine vector representations of similar commodities from the collected commodity data;

S204: use a self-attention mechanism to extract, from the collected commodity data and user interaction data, vector representations of the co-purchase, co-browse and commodity-description relations and of the hierarchical relations among the categories to which commodities belong.

It should be noted that the commodity similarity relation is the most widely used entity link in the commodity map and serves downstream tasks such as search and recommendation. When the commodity map embedding model is constructed, the similarity relation influences the other relations to some extent. Therefore, in the embodiments of the present disclosure, the vector representations of similar commodities are determined first from the collected commodity data; the self-attention mechanism is then used to extract, from the collected commodity data and user interaction data, vector representations of the co-purchase, co-browse and commodity-description relations and of the hierarchical relations among the categories to which commodities belong; and the commodity map vectorization representation model is finally obtained.

In some embodiments, vectors generated with BERT may be used as input to the similar-commodity task, and a GCN may also be used to output vectorized representations of the commodity map.

According to the assumption behind distributed language embedding models, words with similar meanings have similar representations, and in word2vec output they appear as vectors that lie close together. By the same reasoning, commodities with a similarity relation should also have vector representations that are close to each other. Thus, in some embodiments, as shown in FIG. 3, the commodity map vectorization method provided in the embodiments of the present disclosure may determine the vector representations of similar commodities through the following steps:

S302: extract the title and/or description of each commodity from the collected commodity data;

S304: segment the title and/or description of each commodity into words to obtain segmentation results;

S306: train a word2vec model on the segmentation results to obtain a word2vec model file for the commodity data set;

S308: use the word2vec model file to generate a vector representation of each commodity from its title;

S310: calculate the similarity between commodities from their vector representations, determine similar commodities according to that similarity, and generate vector representations of the similar commodities.
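Steps S302 to S310 can be sketched as below, assuming the titles have already been segmented (for example with jieba) and that a dictionary of word vectors stands in for the trained word2vec model file. Averaging token vectors into a title vector is an assumption, since the disclosure does not say how word vectors are pooled; all names are hypothetical.

```python
import math
from typing import Optional

Vector = list[float]


def title_vector(tokens: list[str], word_vectors: dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def most_similar(titles: dict[str, list[str]], word_vectors: dict[str, Vector], top_n: int = 5):
    """For each SKU, return up to top_n (other SKU, cosine) pairs, most similar first."""
    vecs = {sku: title_vector(toks, word_vectors) for sku, toks in titles.items()}
    vecs = {sku: v for sku, v in vecs.items() if v is not None}
    result = {}
    for sku, v in vecs.items():
        scored = [(other, cosine(v, w)) for other, w in vecs.items() if other != sku]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[sku] = scored[:top_n]
    return result
```

A gensim `KeyedVectors` object (the `wv` attribute of a trained `Word2Vec` model) supports the same `in` and indexing lookups, so it could likely be passed in place of the dictionary.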

FIG. 4 shows a schematic diagram of a commodity map in an embodiment of the present disclosure. As shown in FIG. 4, the commodity map includes 3 dimensions and 11 relations.

The 3 dimensions are as follows:

1) user dimension: user name;

2) commodity dimension: commodity SKU, brand, commodity attributes/key attributes;

3) category dimension: primary, secondary and tertiary categories.

It should be noted that, in principle, the categories to which commodities belong may have more levels in the embodiments of the present disclosure; the present disclosure does not limit this, and three levels are used here as an example.

The 11 relationships are as follows:

1) primary category-includes-secondary category, where the primary category is the top-level commodity category and the secondary category is a sub-category of the primary category;

2) secondary category-includes-tertiary category, where the tertiary category is a sub-category of the secondary category and is also the final (leaf-level) commodity category;

3) primary category-includes-tertiary category;

4) commodity-includes-brand;

5) SKU-attribute X-attribute value;

6) category-includes-category attribute;

7) commodity-similar to-commodity;

8) user-purchases-commodity;

9) user-browses-commodity;

10) user-purchases similar commodities as-user;

11) user-browses similar commodities as-user.
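A hypothetical sketch of how one commodity record could be turned into triples for relations 1 to 6. The `Triple` type, the relation label `includes`, and attaching attribute names to the tertiary category for relation 6 are illustrative assumptions, not the schema of this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def commodity_triples(sku: str, brand: str, categories: tuple[str, str, str],
                      attributes: dict[str, str]) -> list[Triple]:
    """Triples for relations 1-6 of one commodity; categories = (primary, secondary, tertiary)."""
    primary, secondary, tertiary = categories
    triples = [
        Triple(primary, "includes", secondary),   # 1) primary-includes-secondary
        Triple(secondary, "includes", tertiary),  # 2) secondary-includes-tertiary
        Triple(primary, "includes", tertiary),    # 3) primary-includes-tertiary
        Triple(sku, "includes", brand),           # 4) commodity-includes-brand
    ]
    for name, value in attributes.items():
        triples.append(Triple(sku, name, value))            # 5) SKU-attribute X-value
        triples.append(Triple(tertiary, "includes", name))  # 6) category-includes-attribute
    return triples
```

Relations 7 to 11 would come from the similarity computation and the user purchase and browsing logs rather than from a single commodity record.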

Fig. 5 is a flowchart illustrating modeling of a commodity similarity relationship in an embodiment of the present disclosure, and as shown in fig. 5, the method specifically includes:

S502: data processing.

Collect commodity and user data sets and design the commodity map ontology according to the commodity map structure shown in FIG. 4; then process the structured, semi-structured, unstructured and noisy data in these data sets so that the processed data are readable by the model, and generate the triples used to construct the map. Processing operations include, but are not limited to:

- escape conversion of JSON data;

- unstructured data segmentation/sentence splitting: split paragraphs into sentences and phrases at existing characters such as punctuation marks, carriage returns and spaces;

- English/punctuation processing: identify English words separately and remove punctuation marks;

- special-character cleaning: remove special characters from the text or convert them into a machine-recognizable form.
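The four processing operations can be sketched with Python's standard `json` and `re` modules. The character classes below (which punctuation counts as a clause boundary, what an English word looks like, what counts as a special character) are assumptions chosen for illustration, as are the function names.

```python
import json
import re

# Assumed clause boundaries: CJK and ASCII punctuation plus any whitespace (incl. \r\n).
CLAUSE_BOUNDARY = re.compile(r"[。！？；，、,.!?;:：\s]+")
# Assumed shape of an English word: a Latin letter followed by letters, digits or hyphens.
ENGLISH_WORD = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")
# Anything that is not a word character (str patterns treat CJK ideographs as word characters).
SPECIAL_CHARS = re.compile(r"\W+")


def decode_json_field(raw: str, field: str) -> str:
    """Escape conversion: json.loads turns \\uXXXX, \\n, etc. into real characters."""
    return json.loads(raw).get(field, "")


def split_clauses(text: str) -> list[str]:
    """Split paragraphs into sentences/phrases at punctuation, carriage returns and spaces."""
    return [part for part in CLAUSE_BOUNDARY.split(text) if part]


def english_words(clause: str) -> list[str]:
    """Identify English words separately so they can be kept as single tokens."""
    return ENGLISH_WORD.findall(clause)


def clean_special_chars(clause: str) -> str:
    """Remove remaining special characters such as symbols and stray punctuation."""
    return SPECIAL_CHARS.sub("", clause)
```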

S504: map construction.

S506: model construction, which may be implemented through S5061, S5062, S5063, S5064 and S5065.

In S5061, a word segmentation tool such as jieba, HanLP, ICTCLAS or THULAC is used to segment the commodity titles and/or commodity descriptions, a custom dictionary is added with the help of manual annotation, and a validation data set is constructed from users' commodity substitution data together with manual checking and proofreading.

In S5062, the segmentation results from S5061 are used to further train the word2vec model with a word2vec.train method: if the data volume is sufficient, the model is retrained from scratch; if not, the data set output by S5061 is added to a public data set for training. The result is a word2vec model file for the commodity data set.

In S5063, model parameters such as the training architecture (skip-gram or CBOW), the vector dimension and the number of iterations are tuned according to the validation results.

In S5064, the commodity titles in the commodity data set are fed into the constructed model to output commodity vectors; the cosine similarity between each commodity vector and every other commodity vector is calculated, the results are normalized to the interval [0, 1], and they are ranked by score.

For vectors u and v, the cosine similarity is calculated as:

$$\cos(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert_2 \, \lVert \mathbf{v} \rVert_2}$$

In S5065, the model output is validated to obtain validation results.

S508: metric evaluation.

The accuracy of the Top-5 results is computed and the model is optimized until its accuracy no longer changes; the generated commodity vectors and the word2vec model file for the commodity data set are then output, so that the model file can be used to generate vectorized representations of the similar-commodity relation.
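Steps S5064 and S508 can be sketched as follows. The disclosure does not say how cosine scores are normalized to [0, 1] or exactly how Top-5 accuracy is defined, so this sketch assumes the linear map (cos + 1) / 2 and counts a commodity as correct when at least one validated similar commodity (for example from substitution records) appears in its top five; `evaluate` in the stopping loop is a hypothetical callback that retrains and scores the model.

```python
from typing import Callable


def normalize_cosine(score: float) -> float:
    """Map a cosine similarity from [-1, 1] onto [0, 1] (assumed linear rescaling)."""
    return (score + 1.0) / 2.0


def top_k_accuracy(rankings: dict[str, list[str]], validation: dict[str, set[str]], k: int = 5) -> float:
    """Share of validated SKUs whose top-k list contains at least one confirmed similar SKU."""
    evaluated = [sku for sku in validation if sku in rankings]
    if not evaluated:
        return 0.0
    hits = sum(1 for sku in evaluated if set(rankings[sku][:k]) & validation[sku])
    return hits / len(evaluated)


def optimize_until_stable(evaluate: Callable[[int], float], max_rounds: int = 20, tol: float = 1e-4):
    """Repeat tuning rounds until Top-5 accuracy changes by less than tol; returns (round, accuracy)."""
    previous = None
    for round_no in range(max_rounds):
        accuracy = evaluate(round_no)
        if previous is not None and abs(accuracy - previous) < tol:
            return round_no, accuracy
        previous = accuracy
    return max_rounds - 1, previous
```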

FIG. 6 is a schematic diagram illustrating a specific implementation of the commodity map vectorization method in an embodiment of the present disclosure. As shown in FIG. 6, the embodiment takes commodity information and user interaction information as raw data and Z_input as the core to build map vectors suited to a commodity knowledge graph for enterprise procurement. Z_input denotes the vectorized representation of the commodity input (written Z^I below); as the core of the commodity map embedding method, it supplies the input vectors of the embedding layer and the self-attention layer to each downstream task.

As shown in fig. 6, in the embodiment of the present disclosure, knowledge completion, search ordering, commodity similarity, and commodity recommendation of the commodity map are selected as downstream tasks of the commodity map embedded representation, and a vector construction effect is detected. After the downstream task is set, the output of the trainer is taken as the labeled data of the downstream task, and the Z is^IAnd fine adjustment is carried out, and a vector construction result aiming at a specific task can be obtained.

In order to extract the co-purchase, co-browse, and product description relationships from the customer order, browse, and search behaviors, respectively, in the embodiments of the present disclosure, a specific self-attention model is used. Here, Z ^ O is used to represent the output vector of the commodity. Likewise, the sequence length 1 will vary from task to task.

① Self-attention embedding layer: an ordered sequence of entities is used as input. To model position information, the self-attention mechanism encodes position k with a vector $P_k \in \mathbb{R}^d$. The entity sequence is truncated to a maximum length l and denoted $e = (e_1, \ldots, e_l)$. The embedding layer computes the sum of the entity vector $Z^I$ or $Z^O$ and the corresponding position encoding P. The output is:

$$E^{I}=\begin{bmatrix} Z^{I}_{e_1}+P_1\\ \vdots \\ Z^{I}_{e_l}+P_l\end{bmatrix},\qquad E^{O}=\begin{bmatrix} Z^{O}_{e_1}+P_1\\ \vdots \\ Z^{O}_{e_l}+P_l\end{bmatrix}$$

where $Z^O$ is the output vector for a commodity's co-purchase or co-browse relationship and $Z^I$ is the input commodity vector.
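The embedding layer can be illustrated with the following sketch. It assumes entity vectors are stored in a dict and position encodings in a list where index 0 holds $P_1$. The text does not specify which end of an over-long sequence is dropped, so keeping the most recent l entities is an assumption. `embed_sequence` is a hypothetical name.

```python
def embed_sequence(entity_ids, entity_vectors, position_vectors, max_len):
    """Embedding layer: row k of the output is Z_{e_k} + P_k.

    entity_ids:       ordered entity sequence e = (e_1, ..., e_n)
    entity_vectors:   dict mapping entity id to its vector (Z^I or Z^O)
    position_vectors: list of position encodings; index 0 holds P_1
    max_len:          maximum sequence length l
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    # Truncate to l entities; keeping the most recent ones is an assumption.
    seq = entity_ids[-max_len:]
    return [
        [z + p for z, p in zip(entity_vectors[e], position_vectors[k])]
        for k, e in enumerate(seq)
    ]
```

Calling this once with the $Z^I$ table and once with the $Z^O$ table, sharing the same position encodings, gives the rows of $E^I$ and $E^O$.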

② Self-attention layer: scaled dot-product attention is used as the basis and is defined as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$

where Q is the matrix of all queries, K is the matrix of all keys and $K^T$ its transpose, V holds the values associated with the keys, and d is the vector dimensionality. Each row of Q, K, and V corresponds to one entity. The dot-product attention layer therefore outputs a weighted sum of the entity vectors in V, where the weights reflect the pairwise query-key relationships within the entity sequence.

Under the distributed representation setting of the embodiments of the present disclosure, the query Q can naturally be taken as $E^I$ and the key K as $E^O$. Both also carry the text and position information of the input entity pairs. Likewise, $E^I$ is used as the value V. The output of the attention layer is therefore defined as $H = \mathrm{Attention}(E^I, E^O, E^I)$. To show how the attention layer assigns weights to the entities, the i-th row of H can be decomposed as:

$$H_i=\sum_{j=1}^{l} w_{ij}\left(Z^{I}_{e_j}+P_j\right)$$

where the weight is

$$w_{ij}=\frac{\exp\left(\left(Z^{I}_{e_i}+P_i\right)\left(Z^{O}_{e_j}+P_j\right)^{T}/\sqrt{d}\right)}{\sum_{k=1}^{l}\exp\left(\left(Z^{I}_{e_i}+P_i\right)\left(Z^{O}_{e_k}+P_k\right)^{T}/\sqrt{d}\right)}$$

This weight captures the relationship between the text and position information of element $e_i$ of $E^I$ and element $e_j$ of $E^O$.

However, the inner product computed directly between $E^I$ and $E^O$ cannot capture interactions between different latent dimensions, which limits the representational capacity of the attention layer. Therefore, a two-layer point-wise feed-forward network is applied to the entity inputs and outputs before they enter the attention layer.

③ Prediction layer: when modeling user purchase and browsing data, the commodities purchased or browsed earlier in the input can be used directly to predict the next commodity to be purchased or browsed. Similar to word2vec, the output is $p(e_{l+1} \mid e_1, \ldots, e_l)$, where e can be a word token or a commodity.
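The following sketch shows a two-layer point-wise feed-forward network. "Point-wise" means the same weights are applied to each row (entity) separately, so the network can mix latent dimensions within an entity vector. The ReLU activation between the two layers and the weight shapes are assumptions borrowed from common Transformer practice; the text does not give them.

```python
def matvec(x, W):
    """Row vector x (length d_in) times matrix W (d_in rows, d_out columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def pointwise_ffn(E, W1, b1, W2, b2):
    """Apply FFN(x) = ReLU(x W1 + b1) W2 + b2 to every row x of E independently.

    The same W1, b1, W2, b2 are shared by all positions; ReLU is an assumption.
    """
    out = []
    for x in E:
        hidden = [max(0.0, h + b) for h, b in zip(matvec(x, W1), b1)]
        out.append([o + b for o, b in zip(matvec(hidden, W2), b2)])
    return out
```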

④ Scoring function: the output of the prediction layer can be used directly to calculate the score of a commodity, as follows:

where PD is the data set of commodity search, purchase, browsing, and description information.

The specific implementation process of the commodity map vectorization method provided by the embodiment of the disclosure comprises the following steps:

1) Data processing:

Co-purchase: users with common purchasing behavior are placed in the same group. The grouping rule is that the users purchased more than two identical commodities in the past 6 months, and the SKU ID or commodity keywords serve as group labels.

Co-browse: users with common browsing behavior are placed in the same group. The grouping rule is that the users browsed more than three identical commodities in the past 6 months, and the SKU ID or commodity keywords serve as group labels.
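The grouping rules above can be read as a pairwise test on users' item sets, as in the sketch below. It assumes the item sets are already restricted to the 6-month window, and it treats "more than two/three" as strictly greater than the threshold. Grouping users pairwise, rather than into larger clusters, is also an assumption. `related_user_pairs` is a hypothetical helper.

```python
from itertools import combinations


def related_user_pairs(user_items, min_shared):
    """Find user pairs that share more than `min_shared` items.

    user_items: dict mapping user id to the set of SKU IDs purchased
                (co-purchase) or browsed (co-browse), already limited to
                the past 6 months
    min_shared: 2 for co-purchase, 3 for co-browse, per the rules above
    Returns a dict mapping (user_a, user_b) to the sorted shared SKU IDs,
    which can serve as the group label.
    """
    pairs = {}
    for a, b in combinations(sorted(user_items), 2):
        shared = user_items[a] & user_items[b]
        if len(shared) > min_shared:  # "more than" read as strictly greater
            pairs[(a, b)] = sorted(shared)
    return pairs
```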

Commodity description: the description of a commodity contains information in multiple modalities, including but not limited to commodity videos, main commodity images, basic descriptions, commodity details, and commodity attributes (basic attributes, specifications and packaging, sales attributes, special attributes, etc.).

Videos are processed mainly by identifying key frames and then converting the key-frame content into text through image recognition.

Main commodity images are also processed through image recognition. The task is first cast as an image segmentation problem: the boundary of the main region of the picture is identified, and the articles or main elements in that region are classified and converted into text information.

Commodity details are mostly presented as images. OCR is used to recognize the text in them, errors are corrected by aligning against word segmentation rules, and meaningless words and symbols are removed.

Commodity attributes can be word-segmented directly by attribute name and attribute value.

Commodity category level: the level-1, level-2, and level-3 category information of a commodity is treated as one of its attributes and processed in the same way as above.
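The cleanup applied to attribute text, and to OCR output from commodity details, can be sketched as below. `str.split` stands in for a real word segmenter, which Chinese text would require. The stop-word list is a placeholder. Both are assumptions, as is the hypothetical helper name `attribute_tokens`. Category levels are passed as ordinary attributes, matching the rule above.

```python
import re

# Placeholder stop-word list (assumption); a curated list would be used in practice.
STOP_WORDS = {"the", "of", "and"}
# Tokens made only of punctuation or symbols, e.g. "/" or "--".
SYMBOL_ONLY = re.compile(r"^\W+$")


def attribute_tokens(attributes, segment=str.split):
    """Segment attribute names and values, then drop stop words and symbols.

    attributes: list of (name, value) pairs; category levels 1-3 can be
                included as ordinary pairs, e.g. ("category_l1", "Office")
    segment:    word segmenter; str.split is a whitespace stand-in
    """
    tokens = []
    for name, value in attributes:
        for tok in segment(name) + segment(value):
            tok = tok.strip().lower()
            if tok and tok not in STOP_WORDS and not SYMBOL_ONLY.match(tok):
                tokens.append(tok)
    return tokens
```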

2) Constructing the self-attention layer, as shown in Fig. 7, includes the following:

First, construct the input vectors. The vector dimension is uniformly set to 128. Each commodity has a different number of modalities and attributes, so its vectors are summed and then divided by the number n of summed vectors to obtain an average vector, namely:

$$Z^{I}=\frac{1}{n}\sum_{i=1}^{n} v_i$$

where $v_i$ is the 128-dimensional vector of the i-th modality or attribute of the commodity.
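The following is a sketch of the averaging step for the 128-dimensional input vectors. It assumes each modality or attribute has already been encoded as a list of floats. `average_vectors` is a hypothetical name.

```python
DIM = 128  # vector dimension fixed in the text


def average_vectors(vectors, dim=DIM):
    """Element-wise mean of the n modality/attribute vectors of one commodity."""
    if not vectors:
        raise ValueError("at least one vector is required")
    for v in vectors:
        if len(v) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(v)}")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```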

Construct Q, K, and V:

The Attention(Q, K, V) function is converted into $H = \mathrm{Attention}(E^I, E^O, E^I)$, which fixes the corresponding values of Q, K, and V: $Q = E^I$, $K = E^O$, $V = E^I$.

Calculate the score of each vector:

First stage: the similarity F(Q, K) between the query Q and each corresponding key K is calculated as a dot product, and the result is recorded as s[].

Second stage: the result s[] of F(Q, K) is normalized with a Softmax function, which also emphasizes the most relevant entries. The output is recorded as a[].

Weighted summation to obtain the attention value:

The attention value is the sum of the Values, each weighted by its corresponding a[].
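The two-stage score calculation and the weighted sum above together form scaled dot-product attention. The pure-Python sketch below keeps the text's names `s` and `a`. The 1/√d scaling follows the scaled dot-product definition given for the self-attention layer. This is an illustration only, not the patent's implementation.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors.

    For each query row q:
      stage 1: s[j] = (q . K_j) / sqrt(d)
      stage 2: a = softmax(s)
      output  = sum_j a[j] * V_j
    """
    d = len(Q[0])
    output = []
    for q in Q:
        s = [dot(q, k) / math.sqrt(d) for k in K]
        a = softmax(s)
        output.append([sum(a[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return output
```

Following step 3), the co-purchase and co-browse vectors would come from calls such as `attention(E_I, E_O_copurchase, E_I)` and `attention(E_I, E_O_cobrowse, E_I)`, with a different key matrix for each relationship. These variable names are hypothetical.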

3) Generating vectors for the four types of data with the attention mechanism:

Q, K, and V in the attention are assigned by relationship type. The co-purchase relationship and the co-browse relationship are each assigned a different K value, i.e., $E^O$. The attention output is then calculated as described above, and the attention vector is taken as the output.

4) Setting the evaluation metrics and the knowledge graph:

A knowledge graph is constructed according to the commodity ontology shown in Fig. 4, and its effect on each downstream task is evaluated with that task's metrics. The knowledge graph is mainly used to verify the accuracy of procurement knowledge completion on tasks such as co-purchase and co-browse.

5) Applying the vectors to the evaluation metrics and the graph, verifying their effect, and outputting the optimal model:

In the knowledge completion task, node and relation prediction tasks are constructed. Based on the distance $d = \lVert h + r - t \rVert$, the vectors generated in step 3) are used to calculate the distance or similarity of candidate nodes, and the prediction accuracy is evaluated;

in the commodity search task, the cosine distance between the vectors generated in step 3) and the input vector or other result vectors is calculated to improve recall, and the Top-10 accuracy is evaluated (results are sorted in descending order of cosine similarity, the top 10 are taken, and the number of correct results among them is judged manually);

in the commodity recommendation task, the vectors generated in step 3) are used in a two-tower model or a collaborative filtering model to improve recall, and the HIT@10 metric is evaluated (the top 10 results returned by the task are taken, and the number of correct results among them is judged manually).
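The evaluation arithmetic for the tasks above can be sketched as follows. The text does not specify the norm in $d = \lVert h + r - t \rVert$; the L2 norm used here is an assumption, since TransE-style models also use L1. The manual correctness judgments are represented by a `relevant` set, which is likewise an assumption. All function names are hypothetical.

```python
import math


def transe_distance(h, r, t):
    """d = ||h + r - t||, using the L2 norm (assumption); smaller is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(h, r, candidates):
    """Return candidate tail ids ordered from smallest to largest distance."""
    return sorted(candidates, key=lambda cid: transe_distance(h, r, candidates[cid]))


def correct_in_top_k(ranked, relevant, k):
    """Number of the top-k results judged correct (stands in for manual review)."""
    return sum(1 for item in ranked[:k] if item in relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are correct, e.g. Top-10 accuracy."""
    return correct_in_top_k(ranked, relevant, k) / k
```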

6) The parameters in steps 2) and 3) are optimized according to the metric evaluation results until the task metrics stabilize. The output of step 3) is taken as the final output, and the steps are packaged into an end-to-end model file as the engineering implementation of the embodiments of the present disclosure.

In summary, the commodity map vectorization method provided in the embodiments of the present disclosure can achieve, among others, the following technical effects. First, the learning objective is modeled as a discrete event sequence learning problem, similar to a neural machine translation model, which avoids the problem of data noise in the commodity map. Second, multimodal data are constructed, which alleviates data sparsity. Third, a commodity knowledge graph is constructed and embeddings based on it are learned, which improves applications such as commodity entity completion in the graph. Fourth, the vector representations embedded in the commodity map enrich search ranking results, improving the use of the commodity map in commodity search. Fifth, the commodity map embedding vectors add dimensions for recalling commodities in recommendation, improving the use of the commodity map in commodity recommendation.

Based on the same inventive concept, the embodiment of the present disclosure further provides a commodity map vectorization device, as in the following embodiments. Because the principle of the embodiment of the apparatus for solving the problem is similar to that of the embodiment of the method, the embodiment of the apparatus can be implemented by referring to the implementation of the embodiment of the method, and repeated details are not described again.

Fig. 8 shows a schematic diagram of a commodity map vectorization apparatus in an embodiment of the present disclosure, as shown in fig. 8, the apparatus includes: a commodity map construction module 81 and a commodity map vectorization module 82.

The commodity map building module 81 is configured to build a commodity map, where the commodity map at least includes an entity relationship of the following dimensions: user dimensions, commodity dimensions, and category dimensions; and the commodity map vectorization module 82 is used for learning the vector representation of the entity relationship in the commodity map according to the commodity data and the user interaction data acquired from the e-commerce platform to obtain a commodity map vectorization representation model.

In some embodiments, the entity relationship of the user dimension includes: co-purchasing and co-browsing; the entity relationship of the commodity dimension comprises: similar goods and goods descriptions; entity relationships for the category dimensions include: the hierarchical relationship of the categories to which the commodity belongs.

In some embodiments, the commodity map vectorization module 82 is further configured to: determine vector representations of similar commodities according to the collected commodity data; and use a self-attention mechanism to extract, from the collected commodity data and user interaction data, vector representations of co-purchase, co-browse, commodity description, and the hierarchical relationship of the category to which each commodity belongs.

In some embodiments, the commodity map vectorization module 82 is further configured to: extract the commodity title and/or commodity description information of each commodity from the collected commodity data; perform word segmentation on the commodity title and/or commodity description information of each commodity to obtain word segmentation results; train on the word segmentation results with the word2vec model training method to obtain a word2vec model file for the commodity data set; use the word2vec model file for the commodity data set to generate a vector representation of each commodity from its title; calculate the similarity between commodities from their vector representations; and determine similar commodities according to that similarity and generate vector representations of the similar commodities.

In some embodiments, as shown in Fig. 8, the commodity map vectorization device provided in the embodiments of the present disclosure further includes a model optimization module 83. This module is configured to take the labeled data of a task to be executed as model output data and to optimize the parameters of the commodity map vectorization representation model, so as to obtain a commodity map vectorization representation model suited to the task to be executed.

As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."

An electronic device 900 according to this embodiment of the disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.

As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one memory unit 920, and a bus 930 that couples various system components including the memory unit 920 and the processing unit 910.

The storage unit stores program code that is executable by the processing unit 910 to cause the processing unit 910 to perform steps according to the various exemplary embodiments of the present disclosure described in the "exemplary method" section above. For example, the processing unit 910 may perform the following steps of the above method embodiments: constructing a commodity map that includes entity relationships of at least the following dimensions: a user dimension, a commodity dimension, and a category dimension; and learning vector representations of the entity relationships in the commodity map from commodity data and user interaction data collected from an e-commerce platform to obtain a commodity map vectorization representation model.

The storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM) 9201 and/or a cache memory unit 9202, and may further include a read-only memory unit (ROM) 9203.

Storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 900 may also communicate with one or more external devices 940 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided. It may be a readable signal medium or a readable storage medium, and it stores a program product capable of implementing the above-described method of the present disclosure. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to the various exemplary embodiments of the disclosure described in the "exemplary methods" section above, when the program product is run on the terminal device.

More specific examples of the computer-readable storage medium in the present disclosure may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In the present disclosure, a computer-readable signal medium may include a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

In particular implementations, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.

Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.


Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A commodity map vectorization method is characterized by comprising the following steps:

constructing a commodity map, wherein the commodity map comprises entity relations of at least the following dimensions: a user dimension, a commodity dimension, and a category dimension;

and learning the vector representation of the entity relationship in the commodity map according to commodity data and user interaction data acquired from the E-commerce platform to obtain a commodity map vectorization representation model.

2. The commodity map vectorization method according to claim 1, wherein the entity relationship of the user dimension comprises: co-purchasing and co-browsing; the entity relationship of the commodity dimension comprises: similar goods and goods descriptions; the entity relationship of the category dimension comprises: the hierarchical relationship of the categories to which the commodity belongs.

3. The commodity map vectorization method according to claim 1, wherein learning a vector representation of entity relationships in the commodity map according to commodity data and user interaction data collected from an e-commerce platform to obtain a commodity map vectorization representation model comprises:

determining vector representation of similar commodities according to the collected commodity data;

and extracting, from the collected commodity data and the user interaction data by using a self-attention mechanism, vector representations of the co-purchase, co-browse, and commodity description relationships and of the hierarchical relationship of the category to which the commodity belongs.

4. The commodity map vectorization method according to claim 2, wherein determining vector representations of similar commodities according to the collected commodity data comprises:

extracting commodity titles and/or commodity description information of each commodity from the collected commodity data;

performing word segmentation on the commodity title and/or the commodity description information of each commodity to obtain word segmentation results;

training the word segmentation result by using a word2vec model training method to obtain a model file of the word2vec model on the commodity data set;

generating vector representation of each commodity according to the commodity title of each commodity by using a model file of the word2vec model on the commodity data set;

calculating the similarity between the commodities according to the vector representation of the commodities;

and determining similar commodities according to the similarity among the commodities, and generating vector representation of the similar commodities.

5. The commodity map vectorization method according to claim 1, wherein the method further comprises:

and taking the labeled data of a task to be executed as model output data, and performing parameter optimization on the commodity map vectorization representation model to obtain a commodity map vectorization representation model suited to the task to be executed.

6. The commodity map vectorization method according to claim 5, wherein the task to be executed is any one of: a commodity knowledge graph completion task, a commodity search ranking task, or a commodity recommendation task.

7. The commodity map vectorization method according to any one of claims 1 to 6, wherein the method further comprises:

collecting commodity data in the following modalities: text, images, or video.

8. A commodity map vectorization device is characterized by comprising:

the commodity map building module is used for building a commodity map, wherein the commodity map comprises entity relations of at least the following dimensions: a user dimension, a commodity dimension, and a category dimension;

and the commodity map vectorization module is used for learning the vector representation of the entity relationship in the commodity map according to the commodity data and the user interaction data acquired from the E-commerce platform to obtain a commodity map vectorization representation model.

9. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute the commodity map vectorization method of any one of claims 1 to 7 via execution of the executable instructions.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the commodity map vectorization method according to any one of claims 1 to 7.