CN110738538A

CN110738538A - Method and device for identifying similar articles

Info

Publication number: CN110738538A
Application number: CN201810791952.8A
Authority: CN
Inventors: 余帅兵; 王泉泉
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2018-07-18
Filing date: 2018-07-18
Publication date: 2020-01-31

Abstract

The invention discloses a method and a device for identifying similar articles, and relates to the technical field of computers. A specific implementation of the method comprises: processing data of no less than one dimension and determining a similar article set corresponding to the data of each dimension; and performing an operation on the similar article sets corresponding to the data of each dimension to obtain a similar article set.

Description

Method and device for identifying similar articles

Technical Field

The invention relates to the technical field of computers, and in particular to a method and a device for identifying similar articles.

Background

Identifying similar commodities, brands and merchants through effective means, so as to better carry out benchmarking and competitive analysis, has become a focus of attention for brands and merchants.

Whether they are professional data companies (such as Nielsen), industry consulting companies (such as GfK), or online websites, media, data monitoring organizations and the like, parties that collect data and produce market and industry competition analysis reports generally acquire the relevant information about online brands and merchants by means of web crawling, data collection and public opinion monitoring technology, manual labeling, recognition based on experience, and the like.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:

At present, the process by which online brands and merchants identify similar commodities consumes a large amount of manpower, time and capital. The process relies on manual matching, the known information about similar commodities is relatively limited, and it cannot be flexibly adjusted and monitored as the online data is updated daily.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a method and an apparatus for identifying similar items, in which data of multiple dimensions related to an item are processed to determine the similar item sets corresponding to the data of the different dimensions, and an operation is then performed on the determined similar item sets to obtain a similar item set, so that similar items can be identified automatically, accurately, flexibly, comprehensively and in a timely manner from online data of multiple dimensions.

To achieve the above objects, according to one aspect of the embodiments of the present invention, a method of identifying similar items is provided.

A method for identifying similar items comprises: processing data of no less than one dimension and determining a similar item set corresponding to the data of each dimension; and performing an operation on the similar item sets corresponding to the data of each dimension to obtain a similar item set.

Optionally, the data of no less than one dimension comprises user dimension data, and the step of processing the data of no less than one dimension and determining the similar item set corresponding to the data of each dimension comprises: obtaining transition probabilities of a user among different items according to the user dimension data; generating a transition probability matrix according to the transition probabilities; and obtaining the similar item set corresponding to the user dimension data according to the transition probability matrix.

Optionally, the step of obtaining the transition probabilities of the user among different items according to the user dimension data comprises: obtaining behavior data of the user within a predetermined time period, the behavior data comprising first behavior records and second behavior records; obtaining, according to the first behavior records, the number of times S that the user performed the first behavior on item B; obtaining, from the second behavior records corresponding to the first behaviors performed by the user on item B, the number of times T that the user transitioned from item A to item B, wherein item A is the last item other than item B on which the user performed the second behavior before performing the first behavior on item B; and taking T/S as the transition probability of the user from item A to item B.

Optionally, the data of no less than one dimension comprises item dimension data, and the step of processing the data of no less than one dimension and determining the similar item set corresponding to the data of each dimension comprises: comparing data information of no less than one dimension of the items included in the item dimension data according to a predetermined dimension order to obtain the similar item set corresponding to the item dimension data.

Optionally, the data of no less than one dimension comprises sales dimension data, and the step of processing the data of no less than one dimension and determining the similar item set corresponding to the data of each dimension comprises: calculating a comprehensive sales index of each item from the sales volume and the sales amount of the item within a predetermined time period included in the sales dimension data; and taking the items whose comprehensive sales index differs from that of item C by an amount within a predetermined range as the similar items of item C, so as to determine the similar item set corresponding to the sales dimension data.

Optionally, the step of performing an operation on the similar item sets corresponding to the data of each dimension to obtain a similar item set comprises: performing an intersection operation on the similar item sets corresponding to the data of each dimension to obtain the similar item set.

According to another aspect of the embodiments of the present invention, an apparatus for identifying similar items is provided.

An apparatus for identifying similar items comprises: a set determining module for processing data of no less than one dimension and determining a similar item set corresponding to the data of each dimension; and a set operation module for performing an operation on the similar item sets corresponding to the data of each dimension to obtain a similar item set.

Optionally, the data of no less than one dimension includes user dimension data, and the set determining module is further configured to: obtain transition probabilities of the user among different items according to the user dimension data; generate a transition probability matrix according to the transition probabilities; and obtain the similar item set corresponding to the user dimension data according to the transition probability matrix.

Optionally, the set determining module is further configured to: obtain behavior data of the user within a predetermined time period, the behavior data including first behavior records and second behavior records; obtain, according to the first behavior records, the number of times S that the user performed the first behavior on item B; obtain, from the second behavior records corresponding to the first behaviors performed by the user on item B, the number of times T that the user transitioned from item A to item B, where item A is the last item other than item B on which the user performed the second behavior before performing the first behavior on item B; and take T/S as the transition probability of the user from item A to item B.

Optionally, the data of no less than one dimension includes item dimension data, and the set determining module is further configured to compare data information of no less than one dimension of the items included in the item dimension data according to a predetermined dimension order to obtain the similar item set corresponding to the item dimension data.

Optionally, the data of no less than one dimension includes sales dimension data, and the set determining module is further configured to: calculate a comprehensive sales index of each item from the sales volume and the sales amount of the item within a predetermined time period included in the sales dimension data; and take the items whose comprehensive sales index differs from that of item C by an amount within a predetermined range as the similar items of item C, so as to determine the similar item set corresponding to the sales dimension data.

Optionally, the set operation module is further configured to perform an intersection operation on the similar item sets corresponding to the data of each dimension to obtain the similar item set.

According to yet another aspect of the embodiments of the present invention, an electronic device for identifying similar items is provided.

An electronic device for identifying similar items comprises: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method for identifying similar items provided by the embodiments of the invention.

According to a further aspect of the embodiments of the present invention, a computer-readable medium is provided.

A computer-readable medium has stored thereon a computer program that, when executed by a processor, implements the method of identifying similar items provided by the embodiments of the present invention.

The embodiments of the invention have the following advantages or beneficial effects. Data of multiple dimensions related to articles are processed to determine the similar article sets corresponding to the data of the different dimensions, and an operation is then performed on the determined similar article sets to obtain a similar article set. In this way a set of similarity identification models is constructed that can automatically adjust its calculations in real time as online data changes, so that similar articles, brands and merchants can be identified automatically, accurately, flexibly, comprehensively and in a timely manner from online data of multiple dimensions. This greatly reduces the time and effort spent on benchmarking work, enables fine-grained benchmarking from different dimensions, and provides data support for developing articles and for setting improvement targets and strategies.

Further effects of the above non-conventional alternatives are described below in conjunction with the detailed description.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main steps of a method of identifying similar items according to an embodiment of the invention;

FIG. 2 is a schematic diagram of the main modules of an apparatus for identifying similar items according to an embodiment of the present invention;

FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 4 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

By relying only on analysis reports provided by data consulting companies or on manual professional experience, online brands and merchants can neither meet daily business requirements nor keep up with the pace of market change.

To solve these problems, the invention provides a method for identifying similar articles. By constructing a set of similarity identification models that automatically adjust their calculations in real time as online data changes, similar articles, brands and merchants can be identified accurately, flexibly, comprehensively and in a timely manner from online data.

Fig. 1 is a schematic view of the main steps of a method of identifying similar items according to an embodiment of the present invention. As shown in fig. 1, the method for identifying similar items according to the embodiment of the present invention mainly includes the following steps S101 and S102.

Step S101: processing data of no less than one dimension, and determining a similar item set corresponding to the data of each dimension.

In the embodiment of the invention, in order to more comprehensively and accurately identify similar articles, the identification of the similar articles is comprehensively carried out from multiple angles, data of multiple dimensions related to the articles are collected, and the data of the multiple dimensions are analyzed and processed. Specifically, in the embodiment of the present invention, the data of 3 dimensions, which is the user dimension data, the item dimension data, and the sales dimension data related to the item, are taken as an example to describe how to identify similar items. When data of each dimension is processed and analyzed, corresponding similarity judgment logics are also different, for example, the corresponding similarity judgment logics are respectively as follows:

(1) Similarity judgment logic of user dimension data: judging the similarity of the articles based on the transition probabilities of the user between different articles;

(2) Similarity judgment logic of item dimension data: judging the similarity of the articles based on the degree of coincidence of the data information of the articles;

(3) Similarity judgment logic of sales dimension data: judging the similarity of the articles based on the sales volume and sales amount of the articles, and the like.

The following describes how embodiments of the present invention determine a similar item set corresponding to each dimension data by processing the data of each dimension. In the embodiment of the invention, the method for identifying similar articles is described by taking the example of identifying the similarity of the commodities of the E-commerce platform.

(1) User dimension data

Behavior data of a user is obtained, such as the user's actual ordering records when shopping through the e-commerce platform, the browsing access path records on commodity detail pages, and the like. The obtained behavior data is analyzed to obtain the transition probabilities of the user among different commodities, and the similarity of the commodities is judged based on these transition probabilities. Specifically, a transition probability matrix may be formed from the transition probabilities, the transition probability matrix may then be used as a user selection alternative model, and the similarity of the commodities may be determined based on the user selection alternative model.

Each element of the transition probability matrix is non-negative, and the elements of each row sum to 1. Each element is expressed as a probability, and states transition into one another under certain conditions. The matrix satisfies:

1) $0 \le P(i, j) \le 1$, where $P(i, j)$ is the probability value in the i-th row and j-th column;

2) $\sum_{j} P(i, j) = 1$, i.e. the transition probabilities in each row of the matrix sum to 1.
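The two conditions above can be checked mechanically. The following is a minimal sketch, assuming the matrix is given as a list of rows of floats; the function name `is_transition_matrix`, the tolerance `tol` and the sample matrices are illustrative and do not come from the patent.

```python
from typing import Sequence


def is_transition_matrix(rows: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Return True if every entry lies in [0, 1] and every row sums to 1."""
    for row in rows:
        # Condition 1: each P(i, j) is a probability.
        if any(p < 0.0 or p > 1.0 for p in row):
            return False
        # Condition 2: the probabilities in each row sum to 1 (within tol).
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


# Hypothetical sample data, for illustration only.
valid = [[0.7, 0.2, 0.1], [0.0, 1.0, 0.0], [0.25, 0.25, 0.5]]
invalid = [[0.7, 0.2, 0.2], [0.5, 0.5, 0.0]]  # first row sums to 1.1
```

A tolerance is used for the row sum because floating-point addition rarely yields exactly 1.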

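The elements constituting the transition probability matrix are the transition probabilities. That is, a transition probability matrix can be formed from the transition probabilities of the user between different commodities. For example, in a transition probability matrix R, the individual elements ($P_{11}$, $P_{12}$, ..., $P_{mn}$, etc.) are the transition probabilities of the user between different commodities:

$$R = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{m1} & P_{m2} & \cdots & P_{mn} \end{pmatrix}$$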

Therefore, in the embodiment of the application, by means of the transition probability matrix, and within a specific selection set of known users and known commodities, the users' 1st to Nth consumption behaviors (ordering behaviors), browsing behaviors and the like can be learned and recognized according to rules, and the transition probabilities of the users among different commodities can be calculated.

A specific selection set of known users and known commodities can be obtained in two ways. In the first, with no input, the set is obtained by performing cluster analysis on massive user orders. For example, for a user's specific choice among multiple commodities under one category, the user order data of all commodities of the category can be extracted from an order model, and the commodities related to the users' orders and browsing behaviors (within a specified time range) are further processed to obtain a specific selection set for machine learning. In the second, with input, a specific commodity range is directly defined as the specific selection set according to the users' usage scenario, and machine learning is performed on that basis. Here, a category refers to a group of specific commodities or services that consumers consider related and mutually substitutable.

After a specific selection set of known users and known commodities is obtained, the data in the specific selection set can be analyzed. By performing rule-based machine learning on the corresponding behavior data, such as the consumption behaviors (i.e., ordering behaviors) and browsing behaviors of the users in the specific selection set, the transition probabilities of the users among different commodities can be obtained, and a transition probability matrix is formed from these transition probabilities. According to one embodiment of the invention, the transition probabilities of the users among different commodities can be obtained as follows. First, behavior data of the user within a predetermined time period is obtained, the behavior data comprising first behavior records and second behavior records. Then, the number of times S that the user performed the first behavior on commodity B is obtained according to the first behavior records. Next, the number of times T that the user transitioned from commodity A to commodity B is obtained from the second behavior records corresponding to the first behaviors performed on commodity B, where commodity A is the last commodity other than commodity B on which the user performed the second behavior before performing the first behavior on commodity B. Finally, T/S is taken as the transition probability of the user from commodity A to commodity B. For example, the first behavior may be a purchase (ordering) behavior and the second behavior a browsing behavior, in which case a transition "commodity A → commodity B" is recorded when commodity A is the last other commodity the user browsed before purchasing commodity B.
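The T/S calculation can be sketched as follows. This is a minimal illustration, assuming the first behavior is ordering and the second behavior is browsing, and assuming each order record already carries the list of items browsed before it; the `OrderEvent` layout, the function name and the sample events are hypothetical. Counting a direct order (no other item browsed) as a transition from B to itself is also an assumption, chosen to mirror the "item 1 to item 1" case discussed with Table 1.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (ordered item B, items browsed before the order, oldest first).
OrderEvent = Tuple[str, List[str]]


def transition_probabilities(events: List[OrderEvent]) -> Dict[str, Dict[str, float]]:
    """Return probs[B][A] = T / S for every ordered item B and source item A."""
    order_counts: Counter = Counter()                            # S for each item B
    transfer_counts: Dict[str, Counter] = defaultdict(Counter)   # T for each (B, A)
    for item_b, browsed in events:
        order_counts[item_b] += 1
        # Item A is the last browsed item that is not B. If there is none,
        # fall back to B itself (assumption: a direct order counts as B -> B).
        item_a = next((x for x in reversed(browsed) if x != item_b), item_b)
        transfer_counts[item_b][item_a] += 1
    return {
        b: {a: t / order_counts[b] for a, t in sources.items()}
        for b, sources in transfer_counts.items()
    }


# Hypothetical sample data, for illustration only.
events = [
    ("item1", ["item3", "item1"]),
    ("item1", ["item1"]),
    ("item1", []),
    ("item1", ["item2", "item3"]),
    ("item2", ["item1", "item2"]),
]
probs = transition_probabilities(events)
```

With these events, item 1 is ordered four times (S = 4): twice after item 3 and twice directly, so both sources receive a probability of 0.5.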

Similarly, the transition probabilities of the user between different items can be obtained by performing machine learning on the behavior data of the user within a predetermined time period and then applying a voting mechanism (for example, the transition probability of the user from item A to item B can be obtained based on existing voting algorithms such as the majority voting algorithm or the Boyer-Moore majority vote algorithm).
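For reference, the Boyer-Moore majority vote algorithm named above can be sketched as below. The patent does not specify how the voting result is turned into a transition probability, so this block only shows the voting step itself: finding the source item, if any, that accounts for more than half of the observed transitions into a target item. The function name and inputs are illustrative.

```python
from typing import Hashable, Iterable, Optional


def majority_item(sources: Iterable[Hashable]) -> Optional[Hashable]:
    """Boyer-Moore majority vote: return the element that occurs in more
    than half of the inputs, or None if no such element exists."""
    items = list(sources)
    candidate, count = None, 0
    # First pass: pair off differing elements; a true majority survives.
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # Second pass: verify that the surviving candidate really is a majority.
    if items and items.count(candidate) * 2 > len(items):
        return candidate
    return None
```

The second pass is needed because the first pass always leaves some candidate, even when no element holds a strict majority.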

Specifically, when obtaining the transition probabilities of users among different commodities, take the behavior data of one user over a period of time as an example. The behavior data of the user can be obtained according to the user's login identifier. Suppose the user browses several commodities under a specific category and finally selects one commodity to order. The sum of the transition probabilities of all commodities browsed by the user is then regarded as 1, and the traffic distribution and theoretical purchase probability of the different commodities in this user's behavior data can be calculated. Then, by selecting N users (N is an integer greater than 1) under the category and performing cluster analysis on the behavior data of each user, the preferences that the N users commonly exhibit when transitioning among different commodities over M shopping behaviors (M is an integer greater than 1) can be found, and the probabilities are calculated according to these transition preferences.

Table 1 shows the transition probabilities of a user between different items, obtained from the user's behavior data over a predetermined period of time.

TABLE 1

In Table 1, the first row and the first column indicate the identifiers of the items browsed or ordered by the user, and the data in the remaining rows and columns are the transition probabilities of the user between different items. Taking the data in the second row of Table 1 as an example, it gives the transition probabilities of the user from item 1 itself and from other items i (i = 1, 2, ..., 12) to item 1. Assuming that, among 100 ordering actions for item 1, the user browsed item 1 directly and ordered it in 69 of them, the probability of the user transitioning from item 1 to item 1 is 69%. Assuming that in 14 of the ordering actions for item 1 the user moved from item 3 to item 1 and then ordered, the probability of the user transitioning from item 3 to item 1 is 14%. From the data in the second row, the predetermined number (e.g., 4) of items with the highest similarity to item 1 are item 3, item 4, item 5 and item 2, so the similar item set of item 1 is {item 3, item 4, item 5, item 2}. The similar item sets of the other items can be obtained in the same way, which yields the similar item set corresponding to the user dimension data (hereinafter referred to as set 1).
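Selecting the predetermined number of most similar items from one row can be sketched as follows. Only the 69% and 14% values come from the text; the contents of Table 1 are not reproduced in this document, so the remaining probabilities in `row_item1` are hypothetical and chosen only so that the row sums to 1 and the ranking matches the example. The function name is illustrative.

```python
from typing import Dict, List


def top_k_similar(target: str, incoming: Dict[str, float], k: int) -> List[str]:
    """Return the k source items with the highest transition probability
    into `target`, excluding `target` itself."""
    others = [(item, p) for item, p in incoming.items() if item != target]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in others[:k]]


# Row for item 1: 0.69 and 0.14 are from the text, the rest are hypothetical.
row_item1 = {
    "item1": 0.69, "item3": 0.14, "item4": 0.06, "item5": 0.05,
    "item2": 0.03, "item6": 0.02, "item7": 0.01,
}
similar_to_item1 = top_k_similar("item1", row_item1, k=4)
```

The self-transition (item 1 to item 1) is excluded because an item is not reported as similar to itself.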

According to the characteristics of the transition probability matrix, and taking into account the user's ordering and browsing behaviors as well as factors such as the course of the user's browsing behavior, machine learning and rule training are performed on the user's behavior data to obtain the transition probabilities. The transition probability matrix is formed from the transition probabilities and is then used as a user selection alternative model, based on which the similarity of commodities is judged.

In addition, before user behavior data from the real online environment is analyzed as sample data, it can be preprocessed to remove abnormal fake-order (order brushing) data, data for shopping paths that deviate markedly from the norm, and the like, so that real, valid and sound user behavior data is obtained.

After the set 1 is obtained, the quantity of the articles corresponding to each brand can be counted according to the brands corresponding to the articles in the set 1, so as to obtain similar brands.

(2) Item dimension data

For example, for the item "milk", data information of multiple dimensions over the last month can be selected as the item dimension data, and the multi-dimensional data corresponding to "milk" items with different item identifiers (which uniquely identify an item, such as a commodity barcode) are then compared to obtain the similar item set corresponding to the item dimension data (hereinafter referred to as set 2).

Specifically, taking commodities of an e-commerce platform as an example, when judging whether two commodities are similar, the comparison proceeds in the order of commodity name, description, keywords, and attributes and attribute values. If the commodity names are judged to be consistent, the two commodities are considered similar and the subsequent dimensions need not be judged. Otherwise, if the names are not consistent, the descriptions are compared; if they are consistent, the two commodities are considered similar and no other dimension is judged. If the descriptions are not consistent, the keywords are compared. A condition for keyword consistency is preset, for example that the two commodities have at least three keywords in common (the number of keywords can be tuned according to the algorithm); if the keywords are judged to be consistent, the two commodities are considered similar. Otherwise, the attributes and attribute values are compared. A consistency condition can likewise be preset, for example that at least a certain number of attributes and attribute values are identical; if it is met, the two commodities are considered similar, and otherwise they are considered not similar.

When judging whether the data information of multiple dimensions of the items is consistent, the data information can be treated as text: the text is segmented into words, a feature vector is computed for each word, and the similarity between the feature vectors is calculated. Words that meet a similarity threshold are then treated as similar words when judging whether the data information of multiple dimensions of the items is consistent.
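The ordered comparison can be sketched as a short cascade that stops at the first dimension found consistent. This is only an illustration under stated assumptions: "consistent" is taken to mean exact equality for names and descriptions, the `Item` layout is hypothetical, the keyword threshold of three follows the example in the text, and the attribute threshold `min_attributes` is an assumed value because the text does not give a recoverable number.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Item:
    name: str
    description: str
    keywords: Set[str] = field(default_factory=set)
    attributes: Dict[str, str] = field(default_factory=dict)


def is_similar(a: Item, b: Item, min_keywords: int = 3, min_attributes: int = 2) -> bool:
    """Compare in the order name, description, keywords, attributes and
    attribute values; return True as soon as one dimension is consistent."""
    if a.name == b.name:
        return True
    if a.description == b.description:
        return True
    if len(a.keywords & b.keywords) >= min_keywords:
        return True
    # Count attributes whose name and value both match (threshold is assumed).
    shared = sum(1 for key, value in a.attributes.items() if b.attributes.get(key) == value)
    return shared >= min_attributes


# Hypothetical sample items, for illustration only.
milk_a = Item("Whole milk 1L", "Fresh whole milk",
              {"milk", "whole", "1L", "fresh"}, {"fat": "3.5%", "volume": "1L"})
milk_b = Item("Full-cream milk 1 litre", "Pasteurised full-cream milk",
              {"milk", "whole", "1L"}, {"fat": "3.5%", "volume": "1L"})
juice = Item("Orange juice 1L", "Pressed orange juice",
             {"juice", "orange", "1L"}, {"volume": "1L"})
```

Here `milk_a` and `milk_b` differ in name and description but share three keywords, so the cascade stops at the keyword step; `milk_a` and `juice` fail every step.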
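The feature-vector comparison can be illustrated with cosine similarity, one common choice of similarity measure; the text does not say which measure or which word vectors are used. In the sketch below the texts are assumed to be already segmented into words, and `toy_vectors`, the threshold of 0.9 and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_word_pairs(
    words_a: List[str],
    words_b: List[str],
    vectors: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Return the word pairs whose feature vectors meet the similarity threshold.
    Words without a known vector are skipped."""
    pairs = []
    for wa in words_a:
        for wb in words_b:
            if wa in vectors and wb in vectors:
                if cosine(vectors[wa], vectors[wb]) >= threshold:
                    pairs.append((wa, wb))
    return pairs


# Hypothetical two-dimensional word vectors, for illustration only.
toy_vectors = {"milk": [1.0, 0.0], "dairy": [0.9, 0.1], "juice": [0.0, 1.0]}
pairs = similar_word_pairs(["fresh", "milk"], ["dairy", "juice"], toy_vectors, threshold=0.9)
```

In practice the vectors would come from a trained word embedding model rather than a hand-written dictionary.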

According to one embodiment of the present invention, after determining from the item dimension data whether two items are similar, the two similar items can be saved as a similar item pair. Then, according to the brands of the items, the number of similar item pairs corresponding to each brand is counted, from which the similarity between different brands is obtained. When determining brand similarity from the number of similar item pairs, specific determination rules can be set according to what the user cares about in practice. For example, if the user is concerned with the number of similar items, the number of similar item pairs between different brands can be counted directly; alternatively, the number of similar item pairs between different brands can be divided by the number of similar item pairs corresponding to the brand to obtain a proportion, from which similarity is determined. Table 2 shows an example of the number of similar item pairs counted between brands 1, 2 and 3.

TABLE 2

Brand	1	2	3
1	80	24	45
2	24	100	4
3	45	4	70
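Both ways of reading Table 2 can be sketched as follows: ranking brands by the raw number of similar item pairs, or dividing by a per-brand count to obtain a proportion. Using the diagonal entry as the count "corresponding to the brand" is an assumption made for this sketch; the text does not say which count serves as the denominator. The function names are illustrative.

```python
from typing import Dict, List

brands: List[int] = [1, 2, 3]
# Similar item pair counts from Table 2 (row brand versus column brand).
pair_counts: List[List[int]] = [
    [80, 24, 45],
    [24, 100, 4],
    [45, 4, 70],
]


def most_similar_brand(brand: int) -> int:
    """Return the other brand sharing the most similar item pairs with `brand`."""
    i = brands.index(brand)
    candidates = [(pair_counts[i][j], brands[j]) for j in range(len(brands)) if j != i]
    return max(candidates)[1]


def similarity_ratios(brand: int) -> Dict[int, float]:
    """Divide each cross-brand count by the brand's own (diagonal) count.
    Using the diagonal as the denominator is an assumption."""
    i = brands.index(brand)
    own = pair_counts[i][i]
    return {brands[j]: pair_counts[i][j] / own for j in range(len(brands)) if j != i}
```

Under either reading, brand 3 is the closest to brand 1 (45 pairs, ratio 45/80), while brand 2 and brand 3 share very few pairs.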

(3) Sales dimension data

For the e-commerce platform, under the same category, the sales volume and the sales amount of different items within a predetermined time period differ; the difference may be large or only slight.

Specifically, under the same category, whether two items are similar can be judged by performing statistical calculation on the sales dimension data (sales volume and sales amount) within a certain time range (for example, 1 month or 3 months) to obtain a comprehensive sales index for each item. Since the sales volume and the sales amount are not of the same order of magnitude, they can first be converted into indices and combined with their weights: the sales volume and the sales amount are mapped into the range 0 to 1, which makes the data processing more convenient and faster. The weights of the sales volume and the sales amount can be set as needed, for example 50% each. For the index conversion, the sales volume and the sales amount can be mapped into the range 0 to 1 by methods such as min-max normalization (also called dispersion standardization).

Finally, whether two items are similar is judged by comparing their comprehensive sales indices: the closer the indices, the more similar the two items. A difference range for the comprehensive sales index can be preset, and for an item C, the items whose index difference from item C falls within the preset range are the similar items of item C. In this way a corresponding similar item set (hereinafter referred to as set 3) can be obtained from the sales dimension data.
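The comprehensive sales index can be sketched as a weighted sum of min-max normalized sales volume and sales amount. The 50%/50% weights follow the example in the text; the sample sales figures, the difference range `max_diff` of 0.1 and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def min_max(values: List[float]) -> List[float]:
    """Map values into [0, 1] by min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: no spread to normalize
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(
    sales: Dict[str, Tuple[float, float]], w_volume: float = 0.5, w_amount: float = 0.5
) -> Dict[str, float]:
    """sales maps item -> (sales volume, sales amount) within one category and period."""
    items = list(sales)
    volumes = min_max([sales[i][0] for i in items])
    amounts = min_max([sales[i][1] for i in items])
    return {i: w_volume * v + w_amount * a for i, v, a in zip(items, volumes, amounts)}


def similar_by_sales(target: str, index: Dict[str, float], max_diff: float) -> Set[str]:
    """Items whose composite index differs from the target's by at most max_diff."""
    return {i for i, s in index.items() if i != target and abs(s - index[target]) <= max_diff}


# Hypothetical sales figures, for illustration only.
sales = {
    "itemC": (500, 10000.0),
    "itemD": (520, 11000.0),
    "itemE": (100, 2000.0),
    "itemF": (900, 20000.0),
}
index = composite_index(sales)
set_3_for_c = similar_by_sales("itemC", index, max_diff=0.1)
```

With these figures, item C and item D have indices of roughly 0.47 and 0.51, so item D is the only item within the assumed difference range of item C.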

Similarly, after set 3 is determined, similar brands may also be derived based on the brands corresponding to the items in set 3.

Similar item sets corresponding to each dimension data can be obtained according to the step S101, which are set 1, set 2 and set 3 respectively.

Step S102: performing an operation on the similar item sets corresponding to the data of each dimension to obtain a similar item set.

After the similar item sets corresponding to the data of each dimension are obtained, the similar item set can be obtained by performing an intersection operation on them, namely set 1 ∩ set 2 ∩ set 3. In addition, the operation performed on the similar item sets is not limited to intersection; other set operations, such as union, can be used according to the functional requirements.
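The set operation of step S102 maps directly onto built-in set operations. The following is a minimal sketch; the contents of the three sets and the `combine` helper are hypothetical.

```python
from typing import Iterable, Set


def combine(sets: Iterable[Set[str]], mode: str = "intersection") -> Set[str]:
    """Combine per-dimension similar item sets by intersection or union."""
    pool = [set(s) for s in sets]
    if not pool:
        return set()
    if mode == "intersection":
        return set.intersection(*pool)
    if mode == "union":
        return set.union(*pool)
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical per-dimension results for one item, for illustration only.
set_1 = {"item2", "item3", "item4", "item5"}   # user dimension
set_2 = {"item3", "item4", "item6"}            # item dimension
set_3 = {"item3", "item4", "item5", "item7"}   # sales dimension

similar_items = combine([set_1, set_2, set_3])            # set 1 ∩ set 2 ∩ set 3
all_candidates = combine([set_1, set_2, set_3], "union")
```

Intersection keeps only items that every dimension agrees on, which favors precision; union keeps any item flagged by at least one dimension.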

According to the embodiment of the invention, when operating on the similar item sets corresponding to the data of each dimension to obtain the similar item set of an item, different time ranges (such as day, month or year) can be selected for dynamic query and calculation, and the resulting similar item set may change with the selected time range. For example, in units of months, suppose the similar item set of item 1 obtained by processing the data of a certain month is {item 2, item 3, item 4}; the similar item set of item 1 obtained from the data of the following month may change to {item 3, item 4, item 5}. Therefore, in order to reflect such dynamic changes and keep the computed similar item set as accurate and as close to the actual situation as possible, the invention supports dynamic query and calculation.

In addition, in practical application of the technical solution of the embodiment of the invention, data of a single dimension can be selected for analysis to identify similar items simply, or data of multiple dimensions can be selected for analysis to identify similar items comprehensively; this is set by the user according to the functional requirements.

According to the steps S101 and S102, similar item sets corresponding to different dimensional data can be determined by processing the data of multiple dimensions related to the items, and then the determined similar item sets are calculated to obtain the similar item sets, so that the similar items can be accurately, flexibly and comprehensively automatically identified in time according to the online data of the multiple dimensions.

Fig. 2 is a schematic diagram of the main modules of an apparatus for identifying similar items according to an embodiment of the present invention. As shown in fig. 2, the apparatus 200 for identifying similar items according to the embodiment of the present invention mainly includes a set determining module 201 and a set operation module 202.

The set determining module 201 is configured to process data of not less than one dimension and determine a similar item set corresponding to each dimension data;

the set operation module 202 is configured to perform an operation on the similar item sets corresponding to each dimension data to obtain a similar item set.

According to one embodiment of the invention, the data of not less than one dimension may include user dimension data, and the set determining module 201 may be further configured to:

obtaining the transition probability of the user among different articles according to the user dimension data, generating a transition probability matrix according to the transition probability, and then obtaining a similar article set corresponding to the user dimension data according to the transition probability matrix.

According to one embodiment of the invention, the set determining module 201 may be further configured to:

acquiring behavior data of users in a preset time period, wherein the behavior data comprises a first behavior record and a second behavior record;

acquiring, according to the first behavior record, the number of times S that users performed the first behavior on item B;

acquiring the number of transfers T of users from item A to item B from the second behavior record corresponding to the first behavior of the users on item B, wherein item A is the last item other than item B on which a user performed the second behavior before performing the first behavior on item B;

and taking T/S as the transition probability of users from item A to item B.
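
The T/S calculation above can be illustrated with a short Python sketch. It rests on assumptions that this passage does not state: the first behavior is taken to be placing an order and the second behavior to be viewing an item, and the event tuple layout `(user, timestamp, behavior, item)` and the function name `transition_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Return {(A, B): T / S} for every observed transfer from A to B.

    events: iterable of (user, timestamp, behavior, item) tuples, where
    behavior is "order" (assumed first behavior) or "view" (assumed
    second behavior).
    """
    first_behavior_counts = Counter()   # S, keyed by item B
    transfer_counts = Counter()         # T, keyed by (item A, item B)

    by_user = defaultdict(list)
    for user, ts, behavior, item in events:
        by_user[user].append((ts, behavior, item))

    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[0])  # chronological order
        viewed = []                           # items viewed so far
        for _, behavior, item in user_events:
            if behavior == "view":
                viewed.append(item)
            elif behavior == "order":
                first_behavior_counts[item] += 1
                # Item A: the last viewed item that is not item B.
                for previous in reversed(viewed):
                    if previous != item:
                        transfer_counts[(previous, item)] += 1
                        break

    return {
        (a, b): t / first_behavior_counts[b]
        for (a, b), t in transfer_counts.items()
    }


# Illustrative data: four orders of B, two preceded by a view of A.
sample = [
    ("u1", 1, "view", "A"), ("u1", 2, "view", "B"), ("u1", 3, "order", "B"),
    ("u2", 1, "view", "C"), ("u2", 2, "view", "B"), ("u2", 3, "order", "B"),
    ("u3", 1, "view", "A"), ("u3", 2, "order", "B"),
    ("u4", 1, "order", "B"),
]
probabilities = transition_probabilities(sample)
print(probabilities[("A", "B")])  # 0.5
```

The resulting dictionary can be read as a sparse form of the transition probability matrix, with item A as the row index and item B as the column index.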

According to another embodiment of the invention, the data of not less than one dimension may include item dimension data, and the set determining module 201 may be further configured to:

comparing the data information of not less than one dimension of the items included in the item dimension data according to a preset dimension sequence to obtain a similar item set corresponding to the item dimension data.
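
One possible reading of this comparison is sketched below in Python. The sketch assumes that "comparing" means keeping only the items whose value equals the target item's value, one dimension at a time in the preset order; the attribute names (`category`, `brand`, `price_band`) and the function name `similar_by_item_dimensions` are hypothetical and do not come from the text.

```python
def similar_by_item_dimensions(target, catalog, dimension_order):
    """Filter the catalog one dimension at a time, in the preset order.

    target: identifier of the item whose similar items are wanted.
    catalog: dict mapping item identifier -> dict of attribute values.
    dimension_order: list of attribute names, most important first.
    """
    target_attrs = catalog[target]
    candidates = [item for item in catalog if item != target]
    for dim in dimension_order:
        wanted = target_attrs[dim]
        # Keep only candidates that agree with the target on this dimension.
        candidates = [
            item for item in candidates if catalog[item].get(dim) == wanted
        ]
        if not candidates:
            break  # nothing left to compare on later dimensions
    return set(candidates)


# Illustrative data only.
catalog = {
    "item 1": {"category": "phone", "brand": "X", "price_band": "mid"},
    "item 2": {"category": "phone", "brand": "X", "price_band": "mid"},
    "item 3": {"category": "phone", "brand": "Y", "price_band": "mid"},
    "item 4": {"category": "laptop", "brand": "X", "price_band": "mid"},
}
order = ["category", "brand", "price_band"]
print(similar_by_item_dimensions("item 1", catalog, order))  # {'item 2'}
```

Using fewer dimensions in `dimension_order` gives a coarser match; with only `category`, items 2 and 3 would both be returned for item 1.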

According to still another embodiment of the invention, the data of not less than one dimension may include sales dimension data, and the set determining module 201 may be further configured to:

calculating a comprehensive sales index of each item from the sales volume and the sales amount of the item in a preset time period, and taking each item whose comprehensive sales index differs from that of item C by a value within a preset range as a similar item of item C, so as to determine a similar item set corresponding to the sales dimension data.
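
A minimal Python sketch of this step is given below. This passage does not say how the comprehensive sales index is computed, so the weighted sum used here, the weights, and the names `composite_sales_index` and `similar_by_sales` are all assumptions made purely for illustration.

```python
def composite_sales_index(volume, amount, w_volume=0.5, w_amount=0.5):
    """Assumed index: a weighted sum of sales volume and sales amount."""
    return w_volume * volume + w_amount * amount


def similar_by_sales(target, sales, tolerance):
    """Return items whose index is within `tolerance` of the target's index.

    sales: dict mapping item identifier -> (sales volume, sales amount)
    for the preset time period.
    tolerance: the preset range, expressed as a maximum absolute difference.
    """
    target_index = composite_sales_index(*sales[target])
    return {
        item
        for item, (volume, amount) in sales.items()
        if item != target
        and abs(composite_sales_index(volume, amount) - target_index) <= tolerance
    }


# Illustrative data only: (units sold, revenue) per item.
sales = {
    "item C": (100, 2000.0),
    "item D": (110, 2100.0),
    "item E": (500, 9000.0),
}
print(similar_by_sales("item C", sales, tolerance=100))  # {'item D'}
```

In practice the volume and amount would likely need to be normalized to comparable scales before weighting; that choice is left open here.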

According to the technical solution of the embodiment of the present invention, the set operation module 202 may be further configured to:

performing an intersection operation on the similar item sets corresponding to each dimension data to obtain a similar item set.

According to the technical scheme of the embodiment of the invention, data of multiple dimensions related to the items are processed to determine the similar item sets corresponding to the different dimension data, and the determined sets are then operated on to obtain the similar item set. This constructs a set of similarity identification models that can adjust and recalculate automatically in real time as online data changes, so that similar items, brands, and merchants can be identified automatically, accurately, flexibly, comprehensively, and in a timely manner from online data of multiple dimensions. The time and effort spent on benchmarking work are thereby greatly reduced, fine-grained benchmarking from different dimensions becomes possible, and data support is provided for developing items and for setting improvement targets and strategies.

Fig. 3 illustrates an exemplary system architecture 300 to which the method of identifying similar items or the apparatus for identifying similar items of embodiments of the present invention may be applied.

As shown in fig. 3, the system architecture 300 may include terminal devices 301, 302, 303, a network 304, and a server 305. The network 304 serves as a medium for providing communication links between the terminal devices 301, 302, 303 and the server 305. The network 304 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 301, 302, 303 to interact with the server 305 via the network 304 to receive or send messages and the like. The terminal devices 301, 302, 303 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).

The terminal devices 301, 302, 303 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

The server 305 may be a server providing various services, such as a background management server (for example only) that supports shopping websites browsed by users using the terminal devices 301, 302, 303. The background management server may analyze and otherwise process received data, such as a product information query request, and feed back a processing result (for example, target push information or product information) to the terminal device.

It should be noted that the method for identifying similar items according to the embodiment of the present invention is generally executed by the server 305, and accordingly, the apparatus for identifying similar items is generally disposed in the server 305.

It should be understood that the number of terminal devices, networks, and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 4, there is shown a schematic block diagram of a computer system 400 suitable for implementing a terminal device or server according to an embodiment of the invention. The terminal device or server shown in fig. 4 is only one example and should not be taken as limiting the function or scope of use of embodiments of the invention.

As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the system 400. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as necessary, so that a computer program read from it can be installed into the storage section 408 as necessary.

For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method illustrated by the flowchart.

More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

The described units or modules may also be provided in a processor, which may be described, for example, as a processor comprising a set determining module and a set operation module. The names of these units or modules do not, in some cases, limit the units or modules themselves; for example, the set determining module may also be described as a module for processing data of not less than one dimension and determining a similar item set corresponding to each dimension data.

In another aspect, the present invention further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being installed in the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: process data of not less than one dimension and determine a similar item set corresponding to each dimension data; and perform an operation on the similar item sets corresponding to each dimension data to obtain a similar item set.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of identifying similar items, comprising:

processing data of not less than one dimension, and determining a similar item set corresponding to each dimension data;

and performing an operation on the similar item sets corresponding to each dimension data to obtain a similar item set.
2. The method of claim 1, wherein the data of not less than one dimension includes user dimension data, and the step of processing data of not less than one dimension and determining a similar item set corresponding to each dimension data comprises:

obtaining the transition probability of users among different items according to the user dimension data, generating a transition probability matrix according to the transition probability, and then obtaining a similar item set corresponding to the user dimension data according to the transition probability matrix.
3. The method of claim 2, wherein the step of obtaining the transition probability of users among different items according to the user dimension data comprises:

acquiring behavior data of users in a preset time period, wherein the behavior data comprises a first behavior record and a second behavior record;

acquiring, according to the first behavior record, the number of times S that users performed the first behavior on item B;

acquiring the number of transfers T of users from item A to item B from the second behavior record corresponding to the first behavior of the users on item B, wherein item A is the last item other than item B on which a user performed the second behavior before performing the first behavior on item B;

and taking T/S as the transition probability of users from item A to item B.
4. The method according to claim 1 or 2, wherein the data of not less than one dimension includes item dimension data, and the step of processing data of not less than one dimension and determining a similar item set corresponding to each dimension data comprises:

comparing the data information of not less than one dimension of the items included in the item dimension data according to a preset dimension sequence to obtain a similar item set corresponding to the item dimension data.
5. The method of claim 1, wherein the data of not less than one dimension includes sales dimension data, and the step of processing data of not less than one dimension and determining a similar item set corresponding to each dimension data comprises:

calculating a comprehensive sales index of each item from the sales volume and the sales amount of the item in the preset time period included in the sales dimension data, and taking each item whose comprehensive sales index differs from that of item C by a value within a preset range as a similar item of item C, so as to determine a similar item set corresponding to the sales dimension data.
6. The method according to claim 1, wherein the step of performing an operation on the similar item sets corresponding to each dimension data to obtain a similar item set comprises:

performing an intersection operation on the similar item sets corresponding to each dimension data to obtain a similar item set.
7. An apparatus for identifying similar items, comprising:

a set determining module, configured to process data of not less than one dimension and determine a similar item set corresponding to each dimension data;

and a set operation module, configured to perform an operation on the similar item sets corresponding to each dimension data to obtain a similar item set.
8. The apparatus of claim 7, wherein the data of not less than one dimension includes user dimension data, and the set determining module is further configured to:

obtaining the transition probability of users among different items according to the user dimension data, generating a transition probability matrix according to the transition probability, and then obtaining a similar item set corresponding to the user dimension data according to the transition probability matrix.
9. The apparatus of claim 8, wherein the set determining module is further configured to:

acquiring behavior data of users in a preset time period, wherein the behavior data comprises a first behavior record and a second behavior record;

acquiring, according to the first behavior record, the number of times S that users performed the first behavior on item B;

acquiring the number of transfers T of users from item A to item B from the second behavior record corresponding to the first behavior of the users on item B, wherein item A is the last item other than item B on which a user performed the second behavior before performing the first behavior on item B;

and taking T/S as the transition probability of users from item A to item B.
10. The apparatus according to claim 7 or 8, wherein the data of not less than one dimension includes item dimension data, and the set determining module is further configured to:

comparing the data information of not less than one dimension of the items included in the item dimension data according to a preset dimension sequence to obtain a similar item set corresponding to the item dimension data.
11. The apparatus of claim 7, wherein the data of not less than one dimension includes sales dimension data, and the set determining module is further configured to:

calculating a comprehensive sales index of each item from the sales volume and the sales amount of the item in the preset time period included in the sales dimension data, and taking each item whose comprehensive sales index differs from that of item C by a value within a preset range as a similar item of item C, so as to determine a similar item set corresponding to the sales dimension data.
12. The apparatus of claim 7, wherein the set operation module is further configured to:

performing an intersection operation on the similar item sets corresponding to each dimension data to obtain a similar item set.
13. An electronic device for identifying similar items, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium having stored thereon a computer program, characterized in that the program, when executed by a processor, carries out the method according to any one of claims 1-6.