CN115545808A - Data alignment method, device and equipment for E-commerce commodities - Google Patents


Publication number
CN115545808A
Authority
CN
China
Prior art keywords
commodity
vector
standard
commodities
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211533043.7A
Other languages
Chinese (zh)
Inventor
牟昊
周俊贤
何宇轩
徐亚波
李旭日
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Datastory Information Technology Co ltd
Original Assignee
Guangzhou Datastory Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Datastory Information Technology Co ltd
Priority to CN202211533043.7A
Publication of CN115545808A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0605 Supply or demand aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data alignment method, a device and equipment for E-commerce commodities. When data alignment operation is carried out, firstly, data preprocessing is carried out on commodity information to obtain a first standard vector, and first matching operation is carried out according to the first standard vector and a second standard vector of each standard commodity in a vector library to obtain candidate commodities; and then carrying out second matching operation on the first splicing vector and the second splicing vector of each candidate commodity to obtain a target commodity, and aligning the data of the commodity to be aligned according to the target commodity. By adopting the embodiment of the invention, the commodity information utilized when the commodity information is matched with the vector library is relatively comprehensive, and the accuracy of the E-commerce commodity in data alignment can be effectively improved through two matching operations.

Description

Data alignment method, device and equipment for E-commerce commodities
Technical Field
The invention relates to the technical field of data processing, in particular to a data alignment method, device and equipment for E-commerce commodities.
Background
With the development of the internet and electronic commerce, more and more commodities are sold on e-commerce platforms, which brings great convenience to people's lives. Because different merchants publish the same commodity in their own personalized ways, commodity listings are non-standardized and poorly structured. A commodity alignment technology is therefore needed to align multiple listings that display different content but sell the same commodity, so that valuable analyses such as sales statistics and customer review analysis can be completed.
Existing e-commerce commodity alignment is mainly completed through title alignment: a commodity is represented as a vector derived from its title, using representation learning approaches such as bag-of-words models and BERT text representations, and commodity alignment is then completed by vector retrieval. In other words, the prior art mainly matches the text similarity between a commodity title and a pre-constructed standard database, and completes the alignment when the similarity meets a condition. However, matching on the title alone cannot use all of the information about a commodity, which results in low alignment accuracy.
Disclosure of Invention
The embodiment of the invention aims to provide a data alignment method, a device and equipment for E-commerce commodities, which can effectively improve the accuracy of the E-commerce commodities in data alignment.
In order to achieve the above object, an embodiment of the present invention provides a data alignment method for an e-commerce product, including:
acquiring commodity information of commodities to be aligned from a commodity detail page of an e-commerce platform; the commodity information comprises commodity pictures, commodity titles and commodity parameters;
carrying out data preprocessing on the commodity information to obtain a plurality of first standard vectors;
respectively carrying out first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library so as to obtain a plurality of candidate commodities from the vector library;
splicing first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector;
performing second matching operation on the first splicing vector and a second splicing vector of each candidate commodity to acquire a target commodity from the candidate commodities;
and aligning the data of the commodity to be aligned according to the target commodity.
As an improvement of the above scheme, the performing data preprocessing on the commodity information includes:
converting the commodity information into a corresponding first initial vector by using a preset data conversion model; and dividing the first initial vector by its own norm (modulus) to obtain the first standard vector.
As an improvement of the above solution, the converting the commodity information into a corresponding first initial vector by using a preset data conversion model includes:
converting the commodity picture into a corresponding first initial picture vector by using a ResNet model;
extracting character information in the commodity picture by using an OCR (optical character recognition) model, and converting the character information into a corresponding first initial character vector by using a BERT model;
converting the commodity title into a corresponding first initial title vector by using a BERT model;
and splicing the data in the commodity parameters, and converting the spliced commodity parameters into corresponding first initial parameter vectors by using a BERT model.
As an improvement of the above solution, the performing a first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library respectively includes:
respectively carrying out dot product calculation on the plurality of first standard vectors and a second standard vector of each standard commodity in a vector library constructed in advance to obtain a plurality of dot product values;
obtaining weight values corresponding to the plurality of first standard vectors;
obtaining the similarity value of the commodity to be aligned and the standard commodity by weighting the dot product value according to the weight value;
and when the similarity values of the to-be-aligned commodities and all the standard commodities are calculated, acquiring candidate commodities from all the standard commodities according to the similarity values and a preset candidate rule.
As an improvement of the above scheme, the candidate rule includes:
sequencing all similarity values according to the numerical sequence from large to small;
acquiring front K similarity values, and taking the standard commodities corresponding to the front K similarity values as the candidate commodities; wherein K is an integer and is more than or equal to 2.
As an improvement of the above solution, the performing a second matching operation on the first splicing vector and a second splicing vector of each candidate commodity includes:
splicing the first splicing vector and each second splicing vector to obtain a plurality of high-dimensional vectors;
inputting the high-dimensional vectors into a preset similarity calculation model so that the similarity calculation model outputs corresponding similarity scores;
obtaining a maximum value in the similarity scores;
and when the maximum value is larger than a preset similarity threshold value, taking the candidate commodity corresponding to the maximum value as the target commodity.
As an improvement of the above scheme, the construction method of the vector library includes:
acquiring commodity information of a commodity to be processed from a commodity detail page of an official platform;
carrying out data preprocessing on the commodity information of the commodity to be processed to obtain a plurality of second standard vectors;
splicing second standard vectors belonging to the same commodity to be processed to obtain a second spliced vector;
and constructing the vector library according to the second standard vector and the second splicing vector.
As an improvement of the above, the method further comprises:
and when the commodity to be aligned in the commodity detail page comprises at least two pictures, acquiring a first picture displayed as the commodity picture.
In order to achieve the above object, an embodiment of the present invention further provides a data alignment apparatus for an e-commerce product, including:
the commodity information acquisition module is used for acquiring the commodity information of the commodity to be aligned from the commodity detail page of the e-commerce platform; the commodity information comprises commodity pictures, commodity titles and commodity parameters;
the data preprocessing module is used for preprocessing the commodity information to obtain a plurality of first standard vectors;
the first matching module is used for performing first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library respectively so as to obtain a plurality of candidate commodities from the vector library;
the vector splicing module is used for splicing the first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector;
the second matching module is used for performing second matching operation on the first splicing vector and a second splicing vector of each candidate commodity so as to acquire a target commodity from the candidate commodities;
and the alignment module is used for aligning the data of the commodities to be aligned according to the target commodity.
In order to achieve the above object, an embodiment of the present invention further provides data alignment equipment for an e-commerce product, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the data alignment method for the e-commerce product according to any one of the above embodiments.
Compared with the prior art, the data alignment method, the device and the equipment for the E-commerce commodities, disclosed by the invention, can be used for acquiring the commodity picture, the commodity title and the commodity parameters of the commodity to be aligned from the commodity detail page of the E-commerce platform, and can accurately reflect the information of the commodity due to the comprehensive acquired commodity information data. Firstly, carrying out data preprocessing on the commodity information to obtain a plurality of first standard vectors, and carrying out first matching operation according to the first standard vectors and a second standard vector of each standard commodity in a vector library to obtain a plurality of candidate commodities; then splicing the first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector; and finally, performing second matching operation on the first splicing vector and the second splicing vector of each candidate commodity to obtain a target commodity, and aligning the data of the commodity to be aligned according to the target commodity. By adopting the embodiment of the invention, the commodity information utilized when the commodity information is matched with the vector library is relatively comprehensive, and the accuracy of the E-commerce commodity in data alignment can be effectively improved through two matching operations.
Drawings
Fig. 1 is a flowchart of a data alignment method for an e-commerce product according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a similarity calculation model provided in an embodiment of the present invention;
Fig. 3 is a block diagram of a data alignment apparatus for an e-commerce product according to an embodiment of the present invention;
Fig. 4 is a block diagram of data alignment equipment for an e-commerce product according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a data alignment method for an e-commerce product according to an embodiment of the present invention, where the data alignment method for the e-commerce product according to the embodiment of the present invention may be implemented by a server, and the data alignment method for the e-commerce product includes:
S1, acquiring commodity information of a commodity to be aligned from a commodity detail page of an e-commerce platform; the commodity information comprises commodity pictures, commodity titles and commodity parameters;
S2, carrying out data preprocessing on the commodity information to obtain a plurality of first standard vectors;
S3, respectively carrying out first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library to obtain a plurality of candidate commodities from the vector library;
S4, splicing the first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector;
S5, performing second matching operation on the first splicing vector and the second splicing vector of each candidate commodity to acquire a target commodity from the candidate commodities;
and S6, aligning the data of the commodities to be aligned according to the target commodities.
Illustratively, when a merchant lists a commodity on an e-commerce platform, the commodity title, the picture format and the parameters are often not set in a standard way, which makes it very difficult to analyze e-commerce data at the product level.
In the embodiment of the invention, a multimodal technique is used: the commodity picture, commodity title and commodity parameters of each standard commodity are passed through models to obtain corresponding vectors, and the vectors are stored as a vector library. When a new commodity arrives, its commodity picture, commodity title and commodity parameters are likewise passed through the models to obtain vectors, which are then matched against the standard commodities in the vector library by multimodal coarse ranking followed by fine ranking, so that the commodity is matched to a standard commodity. Compared with a method that uses only title text, this method balances accuracy and efficiency by using multimodal information and a two-stage coarse-then-fine ranking scheme.
Specifically, in step S1, when the product to be aligned in the product detail page includes at least two pictures, a first picture displayed is acquired as the product picture.
Specifically, in step S2, the data preprocessing of the commodity information includes steps S21 to S22:
S21, converting the commodity information into a corresponding first initial vector by using a preset data conversion model;
and S22, dividing the first initial vector by its own norm (modulus) to obtain the first standard vector.
Specifically, in step S21, the first initial vector includes a first initial picture vector, a first initial character vector, a first initial title vector, and a first initial parameter vector. The commodity picture is converted into a corresponding first initial picture vector by using a ResNet model; character information in the commodity picture is extracted by using an OCR (optical character recognition) model and converted into a corresponding first initial character vector by using a BERT model; the commodity title is converted into a corresponding first initial title vector by using a BERT model; and the data in the commodity parameters are spliced, and the spliced commodity parameters are converted into a corresponding first initial parameter vector by using a BERT model. Illustratively, each first initial vector is preferably a 768-dimensional vector, where the 768 dimensions are set by the model output layer; the dimensionality may also be reduced to 300 dimensions, 200 dimensions, and the like, and may be set by the user, which is not specifically limited herein.
Specifically, in step S22, each first initial vector is divided by its own norm to obtain a first standard vector of length 1. Correspondingly, the first standard vector comprises a first standard picture vector, a first standard character vector, a first standard title vector and a first standard parameter vector.
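The normalization in step S22 can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation; the function name `normalize` and the zero-vector handling are assumptions, since the source does not say what happens when the norm is zero.

```python
import math

def normalize(vector):
    """Divide a vector by its Euclidean norm so that the result has length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Assumption: the source does not describe the zero-vector case.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

# A 2-dimensional stand-in for a 768-dimensional initial vector.
unit = normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Once every vector has unit length, the dot product used in the first matching operation behaves as a cosine similarity.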
Specifically, in step S3, performing the first matching operation on the plurality of first standard vectors and the second standard vector of each standard commodity in the pre-constructed vector library includes steps S31 to S34:
S31, performing dot product calculation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library respectively to obtain a plurality of dot product values;
S32, obtaining weight values corresponding to the first standard vectors;
S33, obtaining a similarity value of the commodity to be aligned and the standard commodity through weighting the dot product value according to the weight value;
and S34, when the similarity values of the to-be-aligned commodities and all the standard commodities are calculated, acquiring candidate commodities from all the standard commodities according to the similarity values and preset candidate rules.
Illustratively, the dot product of two vectors measures their similarity: because the standard vectors have unit length, the dot product equals the cosine similarity, and the higher the dot product, the closer the two vectors are in the vector space, i.e. the more similar they are. For example, suppose that between the commodity to be aligned and standard commodity 1 the dot product value of the picture vectors is 0.32, that of the picture character vectors is 0.85, that of the title vectors is 0.40, and that of the commodity parameter vectors is 0.25. The similarity between the commodity to be aligned and standard commodity 1 is then obtained by weighting as w1 × 0.32 + w2 × 0.85 + w3 × 0.40 + w4 × 0.25, where w1, w2, w3 and w4 are all manually defined weight values. It should be noted that, for a specific way of calculating the dot product value between two vectors, reference may be made to the prior art, and details are not described herein again.
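The weighted similarity of steps S31 to S33 can be worked through with the dot product values from the example above. The source gives no values for w1 to w4, so the equal weights of 0.25 below are purely hypothetical, as are the field names.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def weighted_similarity(dot_values, weights):
    """Weighted sum of the per-field dot product values (step S33)."""
    return sum(weights[name] * dot_values[name] for name in dot_values)

# Dot product values taken from the example in the text.
dot_values = {"picture": 0.32, "picture_text": 0.85, "title": 0.40, "parameters": 0.25}
# Hypothetical weights: the source only says they are manually defined.
weights = {"picture": 0.25, "picture_text": 0.25, "title": 0.25, "parameters": 0.25}

similarity = weighted_similarity(dot_values, weights)
print(round(similarity, 4))  # 0.455
```

In practice each entry of `dot_values` would come from `dot(first_standard_vector, second_standard_vector)` for the corresponding field.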
Illustratively, when the similarity values of the to-be-aligned commodities and all the standard commodities are calculated, all the similarity values are sorted according to the numerical sequence from large to small; acquiring front K similarity values, and taking the standard commodities corresponding to the front K similarity values as the candidate commodities; wherein K is an integer and is more than or equal to 2.
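The candidate rule can be sketched as a sort followed by a top-K cut. The commodity identifiers and similarity values below are made up for illustration.

```python
def select_candidates(similarities, k):
    """Return the ids of the K standard commodities with the largest similarity."""
    if k < 2:
        # The text requires K to be an integer greater than or equal to 2.
        raise ValueError("K must be at least 2")
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [commodity_id for commodity_id, _ in ranked[:k]]

# Hypothetical similarity values for three standard commodities.
similarities = {"standard_1": 0.455, "standard_2": 0.71, "standard_3": 0.12}
candidates = select_candidates(similarities, k=2)
print(candidates)  # ['standard_2', 'standard_1']
```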
Specifically, in step S4, the first standard picture vector, the first standard character vector, the first standard title vector, and the first standard parameter vector of the commodity to be aligned are spliced to obtain a first splicing vector of 768 × 4 = 3072 dimensions.
Further, before the first matching operation is performed, the vector library is searched for standard commodities of the same brand and the same commodity type as the commodity to be aligned (for example, skin care products of the same xx brand), and the first matching operation is then performed only between the commodity to be aligned and the second standard vectors of those standard commodities. Screening the standard commodities in the vector library by the brand and commodity type of the commodity to be aligned filters out useless data and improves matching efficiency and accuracy.
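Splicing here amounts to concatenating the four standard vectors in a fixed order. The sketch below assumes the order picture, picture text, title, parameters; the source lists the vectors in that order but does not state that the order is mandatory. Whatever order is chosen has to be the same for the commodity to be aligned and for the standard commodities.

```python
# Assumed field order; it must match the order used for the vector library.
FIELDS = ("picture", "picture_text", "title", "parameters")

def splice(vectors_by_field):
    """Concatenate the per-field standard vectors into one splicing vector."""
    spliced = []
    for name in FIELDS:
        spliced.extend(vectors_by_field[name])
    return spliced

# Placeholder 768-dimensional vectors, one per field.
vectors = {name: [0.0] * 768 for name in FIELDS}
first_splicing_vector = splice(vectors)
print(len(first_splicing_vector))  # 3072
```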
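The screening step can be sketched as a simple filter over library entries. The dictionary keys `brand` and `type` and the sample entries are hypothetical; the source does not describe how brand and commodity type are stored.

```python
def prefilter(library, brand, commodity_type):
    """Keep only standard commodities with the same brand and commodity type."""
    return [
        entry for entry in library
        if entry["brand"] == brand and entry["type"] == commodity_type
    ]

# Hypothetical library entries; vectors are omitted for brevity.
library = [
    {"id": "standard_1", "brand": "AA", "type": "instant noodles"},
    {"id": "standard_2", "brand": "BB", "type": "mobile phone"},
    {"id": "standard_3", "brand": "AA", "type": "snacks"},
]
subset = prefilter(library, brand="AA", commodity_type="instant noodles")
print([entry["id"] for entry in subset])  # ['standard_1']
```

Only the entries in `subset` would then take part in the first matching operation.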
Specifically, in step S5, the second matching operation on the first splicing vector and the second splicing vector of each candidate commodity includes steps S51 to S54:
S51, splicing the first splicing vector and each second splicing vector to obtain a plurality of high-dimensional vectors;
S52, inputting the high-dimensional vectors into a preset similarity calculation model so that the similarity calculation model outputs corresponding similarity scores;
S53, obtaining the maximum value in the similarity scores;
and S54, when the maximum value is larger than a preset similarity threshold value, taking the candidate commodity corresponding to the maximum value as the target commodity.
It should be noted that the vector library may not store the second splicing vectors of all the standard commodities; in that case, after the first splicing vector is obtained, the second standard vectors of each candidate commodity are spliced to obtain a second splicing vector of 768 × 4 = 3072 dimensions. Alternatively, the second splicing vectors of all the standard commodities are stored in the vector library, in which case the second splicing vectors corresponding to the candidate commodities are obtained directly from the library and no subsequent splicing is needed.
Illustratively, the first splicing vector of the commodity to be aligned is spliced with the second splicing vector of each candidate commodity, finally giving K 6144-dimensional vectors; each of the K high-dimensional vectors is passed through the fully connected layers and activation functions of a multi-layer neural network, finally giving a similarity score between 0 and 1. Specifically, the structure of the similarity calculation model is shown in Fig. 2. The input layer is a 6144-dimensional vector. The input is multiplied by the weight matrix W1 (with dimensions [6144, 1024]) and passed through the activation function ReLU to obtain a 1024-dimensional vector as the output of intermediate layer 1. That 1024-dimensional vector is multiplied by the weight matrix W2 (with dimensions [1024, 512]) and passed through ReLU to obtain a 512-dimensional vector as the output of intermediate layer 2. Finally, the 512-dimensional vector is multiplied by the weight matrix W3 (with dimensions [512, 1]) and passed through the activation function Sigmoid to obtain a 1-dimensional value in the range 0 to 1 as the final output of the output layer.
Illustratively, the maximum value of the similarity scores is obtained, and when the maximum value is greater than a preset similarity threshold, the candidate commodity corresponding to the maximum value is taken as the target commodity. If the maximum value is less than or equal to the similarity threshold, the match is empty: the commodity to be aligned cannot be attached to any standard commodity and therefore contributes no effective information. Because statistics such as sales volume are compiled with the standard commodity as the unit, a commodity with an empty match is discarded without alignment.
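The forward pass of the similarity calculation model can be sketched in plain Python as below. This is an illustration under stated assumptions: the weights are random rather than trained, no bias terms are used because the source mentions only weight matrices, and the layer sizes are scaled down from (6144, 1024, 512, 1) to (12, 8, 4, 1) so the pure-Python loops stay fast. All function names are hypothetical.

```python
import math
import random

def linear(vector, weights):
    """Multiply a vector by a weight matrix of shape [len(vector), out_dim]."""
    out_dim = len(weights[0])
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector)))
        for j in range(out_dim)
    ]

def relu(vector):
    return [max(0.0, x) for x in vector]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def similarity_score(pair_vector, w1, w2, w3):
    """Two ReLU layers followed by a sigmoid output, as described for Fig. 2."""
    hidden1 = relu(linear(pair_vector, w1))   # intermediate layer 1
    hidden2 = relu(linear(hidden1, w2))       # intermediate layer 2
    return sigmoid(linear(hidden2, w3)[0])    # scalar score between 0 and 1

def random_matrix(rows, cols, rng):
    # Untrained placeholder weights, for shape checking only.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dims = (12, 8, 4, 1)  # scaled-down stand-in for (6144, 1024, 512, 1)
w1 = random_matrix(dims[0], dims[1], rng)
w2 = random_matrix(dims[1], dims[2], rng)
w3 = random_matrix(dims[2], dims[3], rng)

# Stand-in for one spliced pair (first splicing vector + second splicing vector).
pair_vector = [rng.uniform(-1.0, 1.0) for _ in range(dims[0])]
score = similarity_score(pair_vector, w1, w2, w3)
```

With trained weights, the same function would be called once per candidate commodity to produce the K similarity scores.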
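The selection in steps S53 and S54 can be sketched as follows. The threshold of 0.8 and the scores are made-up values; the source only says the threshold is preset. `None` stands for an empty match.

```python
def pick_target(scores, threshold):
    """Return the id of the best-scoring candidate, or None for an empty match."""
    if not scores:
        return None
    best_id = max(scores, key=scores.get)
    # The score must be strictly greater than the threshold.
    return best_id if scores[best_id] > threshold else None

# Hypothetical similarity scores for K = 3 candidates.
scores = {"standard_2": 0.91, "standard_1": 0.40, "standard_5": 0.17}
target = pick_target(scores, threshold=0.8)
print(target)  # standard_2
```

A commodity for which `pick_target` returns `None` would be discarded rather than aligned.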
Specifically, in step S6, after the target commodity is obtained, the data information of the commodity to be aligned is replaced with the data information of the target commodity. Some commodities possibly lack marketing information or brand information in the commodity detail page, and by adopting the embodiment of the invention, the data alignment of the e-commerce commodities can be completed, and the data integrity of the e-commerce commodities is improved.
Further, the vector library is constructed in advance, and the construction method of the vector library comprises the following steps S101 to S103:
S101, acquiring commodity information of a commodity to be processed from a commodity detail page of an official platform;
S102, carrying out data preprocessing on the commodity information of the commodity to be processed to obtain a plurality of second standard vectors;
S103, constructing the vector library according to the second standard vector and the commodity information.
Specifically, the official platform is a flagship store or a merchant official website in an e-commerce platform, the data preprocessing mode is the same as the step S2, the commodity information is converted into a corresponding second initial vector by using a preset data conversion model, and the second initial vector comprises a second initial picture vector, a second initial character vector, a second initial title vector and a second initial parameter vector; converting the commodity picture into a corresponding second initial picture vector by using a ResNets model; extracting character information in the commodity picture by using an OCR (optical character recognition) model, and converting the character information into a corresponding second initial character vector by using a bert model; converting the commodity title into a corresponding second initial title vector by using a bert model; and splicing the data in the commodity parameters, and converting the spliced commodity parameters into corresponding second initial parameter vectors by using a bert model. Illustratively, the second initial vector is preferably a 768-dimensional vector, the 768 dimensions are set by the model output layer, and the dimensions may also be reduced to 300 dimensions, 200 dimensions, and the like, and the dimensions may be set by the user, which is not specifically limited herein. And dividing the second initial vector by the modulus length of the second initial vector to obtain the second standard vector with the unit length of 1. The corresponding second standard vector comprises a second standard picture vector, a second standard word vector, a second standard header vector and a second standard parameter vector.
Specifically, a vector library is established in which each row corresponds to one standard commodity and the columns comprise the picture, the picture OCR text, the title, the spliced commodity parameters and the vector representation of each. The purpose of the vector library is to allow a commodity to be aligned to be mapped to a standard commodity in the library. Because the vectors are stored in advance, they do not have to be recomputed each time a new commodity is matched, which improves matching efficiency. Table 1 is an example of a vector library.
Table 1. Vector library example
Standard commodity 1
Commodity main picture: [picture]
Main picture vector: [0.2323, -0.1123, 0.2323, …, 0.6786]
Main picture OCR text: xx braised beef noodles, this is the taste!
Main picture OCR vector: [0.2233, -0.8343, 0.1355, …, 0.3684]
Commodity title: xx classic braised beef noodles, 100g*24 bags, boxed, instant noodles
Commodity title vector: [0.1236, 0.5644, -0.2357, …, 0.2357]
Commodity parameters: Shelf life: 180 days; Brand: AA; Item number: Suda Noodle Shop A can; Place of origin: China; Province: Zhejiang; City: Hangzhou; Packaging: bagged
Commodity parameter vector: [0.2311, 0.6533, 0.7429, …, 0.7855]
Standard commodity 2
Commodity main picture: [picture]
Main picture vector: [0.1643, -0.7543, 0.6324, …, 0.2456]
Main picture OCR text: XX ultra-light feel | 5.4mm unfolded, ultra-thin
Main picture OCR vector: [0.2588, -0.2389, 0.0765, …, 0.4378]
Commodity title: [24 interest-free installments, up to 1080 off] XX folding-screen mobile phone, official, ultra-light and thin, ultra-flat, ultra-reliable, HarmonyOS, brand-new x2, large screen, flagship store
Commodity title vector: [0.7554, 0.7543, -0.1332, …, 0.2368]
Commodity parameters: Product name: BB; Model: CC; Body color: brocade white; Storage capacity: 8GB+256GB reservation edition; Network mode: dual SIM, dual standby; CPU model: Snapdragon 888 4G
Commodity parameter vector: [0.4682, 0.8904, 0.4823, …, 0.2356]
Further, when the vector library is one in which the second spliced vectors of all the standard commodities are stored, after the plurality of second standard vectors are obtained, the method further comprises:
splicing the second standard vectors belonging to the same commodity to be processed to obtain a second spliced vector; in this case, the constructing the vector library according to the second standard vectors and the commodity information comprises:
constructing the vector library according to the second standard vectors, the second spliced vector and the commodity information.
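The library construction in steps S101 to S103, including the optional spliced vector, can be sketched as below. This is only an illustration under stated assumptions: the names `LibraryRow`, `build_vector_library` and `FIELDS` are hypothetical, the library is kept as an in-memory dictionary (the source does not specify a storage format), and 2-dimensional toy vectors replace real model outputs.

```python
import math
from dataclasses import dataclass

# Hypothetical field names for the four kinds of vector kept per commodity.
FIELDS = ("picture", "ocr", "title", "params")


def normalize(vec):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


@dataclass
class LibraryRow:
    commodity_id: str
    info: dict      # raw commodity information (picture, OCR text, title, parameters)
    standard: dict  # field name -> second standard vector (unit length)
    spliced: list   # second spliced vector: the four standard vectors concatenated


def build_vector_library(commodities):
    """commodities: iterable of (commodity_id, info, initial_vectors_by_field)."""
    library = {}
    for commodity_id, info, initial in commodities:
        standard = {name: normalize(initial[name]) for name in FIELDS}
        spliced = [x for name in FIELDS for x in standard[name]]
        library[commodity_id] = LibraryRow(commodity_id, info, standard, spliced)
    return library


library = build_vector_library([
    ("standard-1",
     {"title": "toy title 1"},
     {"picture": [3.0, 4.0], "ocr": [1.0, 0.0],
      "title": [0.0, 2.0], "params": [1.0, 1.0]}),
])
row = library["standard-1"]
print(len(row.spliced))
```

Storing both the per-field vectors and the spliced vector means that neither matching stage has to recompute anything for the standard commodities at query time.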
Compared with the prior art, the data alignment method for e-commerce commodities disclosed in the embodiment of the invention acquires the commodity picture, the commodity title and the commodity parameters of the commodity to be aligned from the commodity detail page of the e-commerce platform; because the acquired commodity information is comprehensive, it accurately reflects the commodity. First, data preprocessing is performed on the commodity information to obtain a plurality of first standard vectors, and a first matching operation is performed between the first standard vectors and the second standard vectors of each standard commodity in the vector library to obtain a plurality of candidate commodities. Then, the first standard vectors belonging to the same commodity to be aligned are spliced to obtain a first spliced vector. Finally, a second matching operation is performed on the first spliced vector and the second spliced vector of each candidate commodity to obtain a target commodity, and the data of the commodity to be aligned are aligned according to the target commodity. With the embodiment of the invention, the commodity information used for matching against the vector library is relatively comprehensive, and the two matching operations effectively improve the accuracy of data alignment for e-commerce commodities.
Referring to fig. 3, fig. 3 is a block diagram of a data alignment device 100 for e-commerce commodities according to an embodiment of the present invention. The data alignment device 100 for e-commerce commodities includes:
the commodity information acquisition module 11 is configured to acquire commodity information of a commodity to be aligned from a commodity detail page of the e-commerce platform; the commodity information comprises commodity pictures, commodity titles and commodity parameters;
the data preprocessing module 12 is configured to perform data preprocessing on the commodity information to obtain a plurality of first standard vectors;
the first matching module 13 is configured to perform a first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library respectively, so as to obtain a plurality of candidate commodities from the vector library;
the vector splicing module 14 is configured to splice first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector;
the second matching module 15 is configured to perform a second matching operation on the first spliced vector and a second spliced vector of each candidate commodity, so as to obtain a target commodity from the candidate commodities;
and the alignment module 16 is configured to align the data of the to-be-aligned commodity according to the target commodity.
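The way the six modules hand data to one another can be sketched as a simple pipeline. This is a hypothetical arrangement for illustration only: the class name `AlignmentDevice`, the callable signatures and the stub lambdas are all assumptions, and returning `None` when no target commodity is found is a choice made here, since the source only describes the case in which a target is obtained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AlignmentDevice:
    acquire: Callable[[str], dict]                       # module 11
    preprocess: Callable[[dict], dict]                   # module 12
    first_match: Callable[[dict], list]                  # module 13
    splice: Callable[[dict], list]                       # module 14
    second_match: Callable[[list, list], Optional[Any]]  # module 15
    align: Callable[[dict, Any], dict]                   # module 16

    def run(self, detail_page):
        info = self.acquire(detail_page)
        vectors = self.preprocess(info)
        candidates = self.first_match(vectors)
        spliced = self.splice(vectors)
        target = self.second_match(spliced, candidates)
        if target is None:
            # Assumption: nothing is aligned when no target commodity is found.
            return None
        return self.align(info, target)


# Stub modules so that the data flow can be exercised end to end.
device = AlignmentDevice(
    acquire=lambda page: {"title": "toy title"},
    preprocess=lambda info: {"title": [1.0, 0.0]},
    first_match=lambda vectors: ["std-1", "std-2"],
    splice=lambda vectors: [x for v in vectors.values() for x in v],
    second_match=lambda spliced, candidates: candidates[0],
    align=lambda info, target: {**info, "standard_commodity": target},
)
result = device.run("detail-page")
print(result)
```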
Specifically, when the commodity to be aligned has at least two pictures in the commodity detail page, the commodity information acquisition module 11 acquires the first displayed picture as the commodity picture.
Specifically, the data preprocessing module 12 is configured to perform:
converting the commodity information into a corresponding first initial vector by using a preset data conversion model;
and dividing the first initial vector by the modulus of the first initial vector to obtain the first standard vector.
Wherein, the converting the commodity information into a corresponding first initial vector by using a preset data conversion model comprises: converting the commodity picture into a corresponding first initial picture vector by using a ResNet model; extracting character information in the commodity picture by using an OCR (optical character recognition) model, and converting the character information into a corresponding first initial character vector by using a BERT model; converting the commodity title into a corresponding first initial title vector by using a BERT model; and splicing the data in the commodity parameters, and converting the spliced commodity parameters into a corresponding first initial parameter vector by using a BERT model.
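The preprocessing performed by module 12 can be sketched as a function that receives its encoders as arguments. This is a sketch under assumptions: `encode_image`, `run_ocr` and `encode_text` are hypothetical stand-ins for the ResNet, OCR and BERT models named above (no real model is called), and the `"key: value; ..."` format used to splice the parameters is also an assumption, because the source does not specify the separator.

```python
import math


def splice_parameters(params):
    """Join parameter name/value pairs into one string for text encoding."""
    return "; ".join(f"{key}: {value}" for key, value in params.items())


def normalize(vec):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def preprocess(info, encode_image, run_ocr, encode_text):
    """Return the four first standard vectors for one commodity."""
    ocr_text = run_ocr(info["picture"])
    initial = {
        "picture": encode_image(info["picture"]),
        "ocr": encode_text(ocr_text),
        "title": encode_text(info["title"]),
        "params": encode_text(splice_parameters(info["params"])),
    }
    return {name: normalize(vec) for name, vec in initial.items()}


# Toy encoders; a real system would call the picture, OCR and text models.
vectors = preprocess(
    {"picture": "main.jpg", "title": "toy title",
     "params": {"Brand": "AA", "Shelf life": "180 days"}},
    encode_image=lambda picture: [1.0, 2.0],
    run_ocr=lambda picture: "toy text",
    encode_text=lambda text: [float(len(text)), 1.0],
)
print(sorted(vectors))
```

Passing the encoders in as parameters keeps the sketch runnable without any model weights and makes it straightforward to swap in real models later.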
Specifically, the first matching module 13 is configured to perform:
respectively carrying out dot product calculation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library to obtain a plurality of dot product values;
obtaining weight values corresponding to the plurality of first standard vectors;
weighting the dot product values according to the weight values to obtain the similarity value between the commodity to be aligned and the standard commodity;
and when the similarity values between the commodity to be aligned and all the standard commodities have been calculated, acquiring candidate commodities from all the standard commodities according to the similarity values and a preset candidate rule.
The candidate rule includes:
sorting all similarity values in descending order;
acquiring the top K similarity values, and taking the standard commodities corresponding to the top K similarity values as the candidate commodities; wherein K is an integer greater than or equal to 2.
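The first matching operation and the candidate rule can be sketched as follows. The function names, the field names and the weight values are illustrative assumptions; the source states only that each dot product is weighted and that the K standard commodities with the largest similarity values are kept, with K at least 2.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def similarity(query, standard, weights):
    """Weighted sum of the per-field dot products between two commodities."""
    return sum(w * dot(query[name], standard[name]) for name, w in weights.items())


def top_k_candidates(query, library, weights, k=2):
    """Return the k (similarity, commodity_id) pairs with the largest similarity."""
    if k < 2:
        raise ValueError("the source requires K to be an integer >= 2")
    scored = [(similarity(query, vecs, weights), cid) for cid, vecs in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending order
    return scored[:k]


# Toy unit vectors with two fields; the weights are assumed values.
weights = {"title": 0.6, "picture": 0.4}
query = {"title": [1.0, 0.0], "picture": [0.0, 1.0]}
library = {
    "std-1": {"title": [1.0, 0.0], "picture": [0.0, 1.0]},
    "std-2": {"title": [0.0, 1.0], "picture": [0.0, 1.0]},
    "std-3": {"title": [0.6, 0.8], "picture": [1.0, 0.0]},
}
candidates = top_k_candidates(query, library, weights, k=2)
print([cid for _, cid in candidates])
```

A full sort is used here for clarity; for a large library, a partial selection such as `heapq.nlargest` or an approximate nearest-neighbour index would be a natural substitute, although the source does not mention either.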
Specifically, the second matching module 15 is configured to perform the second matching operation on the first spliced vector and the second spliced vector of each candidate commodity by:
splicing the first spliced vector with each second spliced vector to obtain a plurality of high-dimensional vectors;
inputting the high-dimensional vectors into a preset similarity calculation model, so that the similarity calculation model outputs corresponding similarity scores;
obtaining the maximum value among the similarity scores;
and when the maximum value is larger than a preset similarity threshold value, taking the candidate commodity corresponding to the maximum value as the target commodity.
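The second matching operation can be sketched as below. The source does not describe the internals of the preset similarity calculation model, so `toy_model` is a purely hypothetical placeholder that scores a high-dimensional vector by the dot product of its two halves; the threshold values and the `(None, score)` return for the no-match case are likewise assumptions.

```python
import math


def second_match(first_spliced, candidates, score_model, threshold):
    """candidates: commodity_id -> second spliced vector.

    Returns (target_id, best_score); target_id is None when the best score
    does not exceed the threshold.
    """
    best_id, best_score = None, -math.inf
    for commodity_id, second_spliced in candidates.items():
        # Splice the two spliced vectors into one high-dimensional vector.
        high_dim = list(first_spliced) + list(second_spliced)
        score = score_model(high_dim)
        if score > best_score:
            best_id, best_score = commodity_id, score
    if best_score > threshold:
        return best_id, best_score
    return None, best_score


def toy_model(high_dim):
    """Placeholder scorer: dot product of the two halves of the input."""
    half = len(high_dim) // 2
    return sum(a * b for a, b in zip(high_dim[:half], high_dim[half:]))


first = [1.0, 0.0]
candidates = {"std-1": [0.8, 0.6], "std-2": [0.0, 1.0]}
target, score = second_match(first, candidates, toy_model, threshold=0.5)
print(target, score)
```

In practice the scorer would presumably be a trained model that takes the concatenated pair as input; only the surrounding max-and-threshold logic is taken from the text.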
Specifically, the construction method of the vector library comprises the following steps:
acquiring commodity information of a commodity to be processed from a commodity detail page of an official platform;
carrying out data preprocessing on the commodity information of the commodity to be processed to obtain a plurality of second standard vectors;
and constructing the vector library according to the second standard vector and the commodity information.
It should be noted that, for the working process of each module in the data alignment device 100 for e-commerce commodities in the embodiment of the present invention, reference may be made to the working process of the data alignment method for e-commerce commodities described in the above embodiments, and details are not repeated herein.
Compared with the prior art, the data alignment device 100 for e-commerce commodities disclosed in the embodiment of the invention acquires the commodity picture, the commodity title and the commodity parameters of the commodity to be aligned from the commodity detail page of the e-commerce platform; because the acquired commodity information is comprehensive, it accurately reflects the commodity. First, data preprocessing is performed on the commodity information to obtain a plurality of first standard vectors, and a first matching operation is performed between the first standard vectors and the second standard vectors of each standard commodity in the vector library to obtain a plurality of candidate commodities. Then, the first standard vectors belonging to the same commodity to be aligned are spliced to obtain a first spliced vector. Finally, a second matching operation is performed on the first spliced vector and the second spliced vector of each candidate commodity to obtain a target commodity, and the data of the commodity to be aligned are aligned according to the target commodity. With the embodiment of the invention, the commodity information used for matching against the vector library is relatively comprehensive, and the two matching operations effectively improve the accuracy of data alignment for e-commerce commodities.
Referring to fig. 4, fig. 4 is a block diagram of data alignment equipment 200 for e-commerce commodities according to an embodiment of the present invention. The data alignment equipment 200 for e-commerce commodities includes a processor 21, a memory 22, and a computer program stored in the memory 22 and operable on the processor 21. When the processor 21 executes the computer program, the steps of the above embodiments of the data alignment method for e-commerce commodities, such as steps S1 to S6, are implemented. Alternatively, the processor 21 implements the functions of the modules/units in the above device embodiments when executing the computer program.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program in the data alignment equipment 200 for e-commerce commodities.
The data alignment equipment 200 for e-commerce commodities may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The data alignment equipment 200 for e-commerce commodities may include, but is not limited to, the processor 21 and the memory 22. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the data alignment equipment 200 for e-commerce commodities and does not constitute a limitation on it; the equipment may include more or fewer components than shown, may combine some components, or may use different components. For example, the data alignment equipment 200 for e-commerce commodities may also include input-output devices, network access devices, buses, and the like.
The processor 21 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 21 is the control center of the data alignment equipment 200 for e-commerce commodities and connects the various parts of the entire equipment through various interfaces and lines.
The memory 22 may be used for storing the computer programs and/or modules, and the processor 21 implements various functions of the data alignment equipment 200 for e-commerce commodities by running or executing the computer programs and/or modules stored in the memory 22 and calling the data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
If the modules/units integrated in the data alignment equipment 200 for e-commerce commodities are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by the processor 21, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
Compared with the prior art, the data alignment equipment 200 for e-commerce commodities disclosed in the embodiment of the invention acquires the commodity picture, the commodity title and the commodity parameters of the commodity to be aligned from the commodity detail page of the e-commerce platform; because the acquired commodity information is comprehensive, it accurately reflects the commodity. First, data preprocessing is performed on the commodity information to obtain a plurality of first standard vectors, and a first matching operation is performed between the first standard vectors and the second standard vectors of each standard commodity in the vector library to obtain a plurality of candidate commodities. Then, the first standard vectors belonging to the same commodity to be aligned are spliced to obtain a first spliced vector. Finally, a second matching operation is performed on the first spliced vector and the second spliced vector of each candidate commodity to obtain a target commodity, and the data of the commodity to be aligned are aligned according to the target commodity. With the embodiment of the invention, the commodity information used for matching against the vector library is relatively comprehensive, and the two matching operations effectively improve the accuracy of data alignment for e-commerce commodities.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A data alignment method for E-commerce commodities is characterized by comprising the following steps:
acquiring commodity information of commodities to be aligned from a commodity detail page of an e-commerce platform; the commodity information comprises commodity pictures, commodity titles and commodity parameters;
carrying out data preprocessing on the commodity information to obtain a plurality of first standard vectors;
respectively carrying out first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library so as to obtain a plurality of candidate commodities from the vector library;
splicing first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector;
performing second matching operation on the first splicing vector and a second splicing vector of each candidate commodity to acquire a target commodity from the candidate commodities;
and aligning the data of the commodities to be aligned according to the target commodity.
2. The data alignment method for E-commerce commodities of claim 1, wherein the data preprocessing of the commodity information comprises:
converting the commodity information into a corresponding first initial vector by using a preset data conversion model;
and dividing the first initial vector by the modular length of the first initial vector to obtain the first standard vector.
3. The data alignment method for e-commerce commodities as claimed in claim 2, wherein said converting said commodity information into a corresponding first initial vector using a preset data conversion model includes:
converting the commodity picture into a corresponding first initial picture vector by using a ResNet model;
extracting character information in the commodity picture by using an OCR (optical character recognition) model, and converting the character information into a corresponding first initial character vector by using a BERT model;
converting the commodity title into a corresponding first initial title vector by using a BERT model;
and splicing the data in the commodity parameters, and converting the spliced commodity parameters into a corresponding first initial parameter vector by using a BERT model.
4. The data alignment method for E-commerce commodities according to claim 1, wherein the performing a first matching operation on a plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library respectively comprises:
respectively carrying out dot product calculation on the plurality of first standard vectors and a second standard vector of each standard commodity in a vector library constructed in advance to obtain a plurality of dot product values;
obtaining weight values corresponding to the plurality of first standard vectors;
obtaining the similarity value of the commodity to be aligned and the standard commodity by weighting the dot product value according to the weight value;
and when the similarity values of the to-be-aligned commodities and all the standard commodities are calculated, acquiring candidate commodities from all the standard commodities according to the similarity values and a preset candidate rule.
5. The data alignment method for e-commerce commodities of claim 4, wherein the candidate rule comprises:
sorting all similarity values in descending order;
acquiring the top K similarity values, and taking the standard commodities corresponding to the top K similarity values as the candidate commodities; wherein K is an integer greater than or equal to 2.
6. The data alignment method for e-commerce commodities according to claim 1, wherein said performing a second matching operation on said first spliced vector and a second spliced vector of each candidate commodity comprises:
splicing the first splicing vector and each second splicing vector to obtain a plurality of high-dimensional vectors;
inputting the high-dimensional vectors into a preset similarity calculation model so that the similarity calculation model outputs corresponding similarity scores;
obtaining a maximum value of the similarity scores;
and when the maximum value is larger than a preset similarity threshold value, taking the candidate commodity corresponding to the maximum value as the target commodity.
7. The data alignment method for e-commerce commodities of claim 1, wherein the vector library construction method comprises:
acquiring commodity information of a commodity to be processed from a commodity detail page of an official platform;
carrying out data preprocessing on the commodity information of the commodities to be processed to obtain a plurality of second standard vectors;
and constructing the vector library according to the second standard vector and the commodity information.
8. The method for aligning data of an e-commerce commodity according to claim 1, further comprising:
and when the to-be-aligned commodity in the commodity detail page comprises at least two pictures, acquiring a first displayed picture as the commodity picture.
9. A data alignment device for an e-commerce product, comprising:
the commodity information acquisition module is used for acquiring the commodity information of the commodity to be aligned from the commodity detail page of the e-commerce platform; the commodity information comprises commodity pictures, commodity titles and commodity parameters;
the data preprocessing module is used for preprocessing the commodity information to obtain a plurality of first standard vectors;
the first matching module is used for performing first matching operation on the plurality of first standard vectors and a second standard vector of each standard commodity in a pre-constructed vector library respectively so as to obtain a plurality of candidate commodities from the vector library;
the vector splicing module is used for splicing the first standard vectors belonging to the same commodity to be aligned to obtain a first spliced vector;
the second matching module is used for performing second matching operation on the first splicing vector and the second splicing vector of each candidate commodity so as to acquire a target commodity from the candidate commodities;
and the alignment module is used for aligning the data of the commodity to be aligned according to the target commodity.
10. A data alignment apparatus for an e-commerce item, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the data alignment method for the e-commerce item as recited in any one of claims 1 to 8 when executing the computer program.
CN202211533043.7A 2022-12-02 2022-12-02 Data alignment method, device and equipment for E-commerce commodities Pending CN115545808A (en)


Publications (1)

Publication Number Publication Date
CN115545808A true CN115545808A (en) 2022-12-30




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221230