CN105320778A - Commodity labeling method suitable for electronic commerce Chinese website - Google Patents

Publication number
CN105320778A
Authority
CN
China
Prior art keywords
label
commodity
labels
word segmentation
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510828440.0A
Other languages
Chinese (zh)
Other versions
CN105320778B (en)
Inventor
沈华楠
赵亮亮
姜平
何学勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focus Technology Co Ltd
Priority to CN201510828440.0A
Publication of CN105320778A
Application granted
Publication of CN105320778B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

A commodity labeling method suitable for Chinese e-commerce websites comprises a step of building a word segmentation lexicon, a step of collecting labels, and a step of marking commodities with the labels. In the lexicon building step, based on frequency statistics of all commodity keywords across the different commodity descriptions on the website, the commodity keywords with a frequency greater than three are retained, and from these the keywords of no more than five Chinese characters are selected as lexicon data. In the label collection step, based on the built lexicon, all commodity names on the website are segmented with a reverse maximum matching word segmentation algorithm; the last word produced by segmenting each commodity name is selected as the commodity label of that commodity, and all the labels together form a label data set. In the marking step, the relations between commodity attributes and the labels are found by using a text mining algorithm.

Description

Commodity labeling method suitable for Chinese e-commerce websites
Technical field
The invention belongs to the field of computer Internet technology, and in particular relates to a commodity labeling method suitable for Chinese e-commerce websites.
Background art
On a Chinese e-commerce website, a user who searches for commodities by keyword is normally searching the basic information of the commodities directly. Most of that information, however, is filled in and maintained by the merchants themselves. Even though merchants are expected to follow the website's commodity rules, two kinds of problems remain unavoidable. The first is commodity information cheating: to raise the exposure and frequency of their commodities in search results, make their listings more noticeable, and let buyers find them more often, merchants abuse brand names or use keywords unrelated to the commodity in its description, so that buyers cannot accurately find the commodities they need. The second is incomplete commodity information: merchants omit key information when describing a commodity, such as the title, pictures, or description, and when this information is missing the website cannot return more relevant search results.
To deal with cheating, e-commerce websites usually set rules and demote (down-weight) the commodities that violate them. Such rules are flawed to some degree: strict rules may demote commodities that are not cheating, while loose rules may make the anti-cheating effect too weak. To deal with incomplete information, and in order to recall as many relevant products as possible, websites often sacrifice retrieval quality and widen the range of commodity information that is searched. That is, they match against multiple commodity information fields, sometimes even including fields such as "commodity description" that are large in volume but poor in quality. This approach recalls more commodities, but the recalled commodities do not satisfy users, which in turn causes a heavy loss of traffic.
Summary of the invention
In view of the shortcomings of the prior art, the object of the invention is to provide a commodity labeling method suitable for Chinese e-commerce websites which, by comprehensively analyzing commodity names and commodity attributes, supplies labels related to each commodity in order to complete the commodity information on the website. These labels take part in retrieval as an important search field during commodity search, so that more relevant commodities are recalled while the accuracy of commodity retrieval is also improved.
The technical solution of the invention is as follows. A commodity labeling method suitable for Chinese e-commerce websites is characterized in that it comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels;
The method for building the word segmentation lexicon is based on frequency statistics of each commodity keyword across the different commodity descriptions on the website. Commodity keywords with a frequency greater than 3 are retained, and from these the keywords of no more than 5 Chinese characters are selected as lexicon data. When a long commodity keyword contains multiple short keywords, the long keyword is not stored in the lexicon;
A commodity keyword is a word that a merchant freely adds through the website's back-end system; it is the merchant's description of a key feature of the commodity;
In particular, because commodity keywords on a Chinese e-commerce website are usually added by the sellers, choosing the short, concise, and frequently occurring keywords for the lexicon ensures segmentation accuracy as far as possible;
The label collection method segments all commodity names on the website with the reverse maximum matching word segmentation algorithm, using the lexicon that has been built. After segmentation, following a feature of Chinese grammar, namely that in an "adjective + noun" phrase the noun comes last, the last word produced by segmenting a commodity name is chosen as the commodity label of that commodity. Finally, all of these labels form the label data set;
A commodity name is a short text description of a commodity added by the merchant;
The method for marking commodities with labels uses a text mining algorithm to find the relation between commodity attributes and labels. The prerequisite for using a text mining algorithm is that both the commodity attributes and the labels have representative content that reflects their relation and can serve as a basis for judgment. Commodity attributes present the features of a commodity from many angles; if labels also have features of their own, the similarity relation between commodity attributes and labels can be determined by comparing how similar the two are in their features.
Further, the method for marking commodities with labels comprises the following steps:
Step 1: acquiring label features
The feature information belonging to each label is determined on the basis of the label set. If a label appears in the name of a commodity, the label and the commodity are assumed to be correlated.
Following this idea, the commodity names containing a specific label word are first selected, the commodity feature information of those commodities is then looked up by commodity name, and all of that commodity feature information is collected as the feature information of the label. In particular, the commodity feature information comes from the commodity attribute values;
Step 2: judging the similarity relation between commodities and labels
Based on all label features of a given label, the weight of each label feature is analyzed and the representativeness of each label feature among the features of all labels is assessed, specifically:
Step 2-1: analyze how each label feature is distributed over the label set. If a label feature is concentrated in one label, it is assumed to be strongly representative; if it is spread over multiple labels, it is assumed to be weakly representative;
Step 2-2: with reference to the TF*IDF weighting method, strongly representative label features are weighted up, the weight being the frequency with which the label feature occurs in the label multiplied by the initial weight; weakly representative label features are weighted down, the weight being the initial weight divided by the frequency with which the feature occurs across different labels. The weight Boost_p of label feature p in label t is given by the following formula:
$$\mathrm{Boost}_p = \frac{\mathrm{count}(p,t)}{\mathrm{size}(t)} \times \log\frac{N}{\mathrm{tags}(p,t)}$$
Here count(p, t) is the number of times label feature p occurs in label t, size(t) is the number of label features that label t contains, N is the total number of labels in the label set, and tags(p, t) is the number of labels that contain label feature p.
Step 2-3: abstract the feature information set of the label and the feature information set of the commodity into two multidimensional space vectors and, using the cosine similarity of space vectors, judge the correlation between the commodity and the label by computing the similarity between the two vectors;
Step 3: determining the related labels of a commodity
Because the degree of correlation between a commodity and a label may be high or low, a correlation coefficient alone is not enough to assign the label to the commodity directly. A reasonable threshold must be set, and the labels whose similarity between the two space vectors, that is, whose correlation coefficient with the commodity, is above the threshold are selected as the labels of the commodity. The threshold lies between 0 and 1. It can be set to a strict or a loose value according to data quality requirements: the stricter the commodity search is meant to be, the closer the threshold is to 1. Alternatively, the mean of all correlation coefficient values can be taken as the threshold;
In particular, to choose labels more accurately, the number of labels per commodity can optionally be limited, and the most relevant labels within that limit are chosen as the commodity's labels.
Commodity attribute values represent certain features of a commodity; if labels also have features of their own, then mining the relation between the two sets of features reveals the relation between commodities and labels.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention builds the word segmentation lexicon from commodity keywords, so that commodity descriptions are segmented on the basis of the key features of existing commodities on the website. This ensures segmentation accuracy and helps to pin down the commodity name within a commodity description accurately;
(2) By identifying labels and determining their features, and by comparing label features with commodity features for similarity, the invention identifies similar names for a commodity name and marks commodities with richer labels, which completes the commodity information and helps improve both recall and precision during search;
(3) By finding entity labels and related labels for the commodities on an e-commerce website, the invention makes commodity labels more objective and also improves the reliability of commodity information;
Brief description of the drawings
Fig. 1 is a structural diagram of the commodity labeling method suitable for Chinese e-commerce websites in an embodiment of the invention;
Fig. 2 is a flow chart of the method for marking commodities with labels in an embodiment of the invention;
Fig. 3 is a flow chart of the method for judging the similarity relation between commodities and labels in an embodiment of the invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The invention comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels. The lexicon building method is used to segment the commodity names on the Chinese e-commerce website. The label collection method is used to find, from the commodity name, the corresponding label of every commodity on the website. The method for marking commodities with labels is used to find the labels correlated with every commodity on the website. A commodity name is the short text description that a merchant user of the website writes for their own commodity.
Taking the Chinese website of Made-in-China.com as an example, a commodity labeling method suitable for Chinese e-commerce websites comprises the method for building the word segmentation lexicon, the method for collecting labels, and the method for marking commodities with labels, as shown in Fig. 1;
The method for building the word segmentation lexicon is based on frequency statistics of each commodity keyword across the different commodity descriptions on the website. Keywords with a frequency greater than 3 are retained, and from these the keywords of no more than 5 Chinese characters are selected as lexicon data. When a long keyword contains multiple short keywords, the long keyword is not stored in the lexicon. For example, "electric bicycle" contains the two short words "electric" and "bicycle", so "electric bicycle" is not entered into the lexicon.
In particular, because commodity keywords on a Chinese e-commerce website are usually added by the sellers, choosing the short, concise, and frequently occurring keywords for the lexicon ensures segmentation accuracy as far as possible;
Consider the following 15 commodities and the commodity keywords that their merchants added:
According to the statistics, the commodity keywords with a frequency greater than or equal to 3 are entered into the word segmentation lexicon, as shown in the table below:
Keyword                  Frequency
Screen printer           5
Screen-printing machine  4
Fully automatic          7
Silk screen              4
Printing machine         6
Lathe                    4
Multicolor               3
CNC                      3
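The lexicon construction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `build_lexicon`, the input shape (one keyword list per commodity), and the rule that a keyword is dropped when at least two other retained keywords occur inside it are all chosen for illustration. The default `min_freq=3` follows the worked example (frequency greater than or equal to 3); the stricter "greater than 3" wording used elsewhere corresponds to `min_freq=4`. The Chinese sample words are hypothetical test data.

```python
from collections import Counter


def build_lexicon(keyword_lists, min_freq=3, max_len=5):
    """Build a segmentation lexicon from seller-supplied keywords.

    keyword_lists: one list of keywords per commodity (assumed input shape).
    """
    # Count the number of commodities in which each keyword appears.
    freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    # Keep keywords that are frequent enough and short enough.
    candidates = {kw for kw, n in freq.items()
                  if n >= min_freq and len(kw) <= max_len}
    lexicon = set()
    for kw in candidates:
        # Assumed rule: a keyword that contains two or more other
        # candidates (e.g. "electric bicycle") is not stored.
        contained = [c for c in candidates if c != kw and c in kw]
        if len(contained) < 2:
            lexicon.add(kw)
    return lexicon


# Hypothetical sample data: 电动自行车 (electric bicycle) contains
# 电动 (electric) and 自行车 (bicycle), so it is left out.
sample = ([["电动自行车", "电动", "自行车"]] * 3
          + [["丝印机"]] * 3
          + [["数控"]] * 2)
print(sorted(build_lexicon(sample)))
```

Keeping the frequency and length limits as parameters makes it easy to compare the two threshold readings on real keyword data.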
The label collection method segments all commodity names on the website with the reverse maximum matching word segmentation algorithm, using the lexicon that has been built. Following the feature of Chinese grammar noted above, the last word produced by segmenting a commodity name is chosen as the commodity label of that commodity. Finally, all commodity labels form the label data set;
Continuing the example, the segmentation results of the 15 commodities and the resulting commodity labels are as follows:
The lexicon-based reverse maximum matching word segmentation algorithm repeatedly scans the sentence to be segmented from back to front. The maximum length of each scanned phrase is the length of the longest word in the lexicon. When the scanned phrase is in the lexicon, the scan position becomes a cut point, and the next scan continues forward from that cut point. If no lexicon word is found at any scan length from the maximum down to the minimum, the scan position moves forward by one character, that position becomes a new cut point, and scanning continues. A concrete example follows:
Take the commodity name "Lichao fully automatic membrane switch screen printer" and segment it with the lexicon built above:
Step one: determine that the longest words in the lexicon, such as "screen printer" and "fully automatic", are 3 Chinese characters long, so the scan length decreases from a maximum of 3 to a minimum of 2;
Step two: scan the sentence to be segmented from back to front. The first three characters scanned form "screen printer"; this three-character word is in the lexicon, so the position just before "screen printer" becomes a cut point and the sentence becomes "Lichao fully automatic membrane switch / screen printer";
Step three: continue scanning from the last cut point. The three characters scanned (the last character of "membrane" plus "switch") do not form a word in the lexicon, so the scan length is reduced by 1 and the scan is repeated; the two characters scanned ("switch") are still not in the lexicon. The scan position is therefore moved forward by one character to create a new cut point, and the sentence becomes "Lichao fully automatic membrane on / off / screen printer", where "on" and "off" gloss the two characters of "switch";
Step four: keep scanning and cutting as in steps two and three until the beginning of the sentence is reached. The sentence becomes "Li / chao / fully automatic / thin / film / on / off / screen printer", and the procedure stops;
After these four steps, the lexicon-based segmentation of the given sentence is obtained.
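The four steps above can be sketched in Python as follows. This is a minimal sketch of reverse maximum matching with a minimum scan length of 2, as in the example; the function names `reverse_max_match` and `commodity_label` are illustrative, and the Chinese strings are assumed back-translations of the example name and lexicon words, since the original characters do not appear in this text.

```python
def reverse_max_match(text, lexicon):
    """Segment `text` from back to front using the words in `lexicon`."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    end = len(text)  # current cut point, moving toward the start
    while end > 0:
        # Try scan lengths from the longest lexicon word down to 2.
        for size in range(min(max_len, end), 1, -1):
            piece = text[end - size:end]
            if piece in lexicon:
                tokens.append(piece)
                end -= size
                break
        else:
            # No lexicon word ends here: cut off a single character.
            tokens.append(text[end - 1:end])
            end -= 1
    tokens.reverse()
    return tokens


def commodity_label(name, lexicon):
    """Use the last segmented word as the commodity label."""
    return reverse_max_match(name, lexicon)[-1]


# Assumed back-translations of the lexicon table and the example name.
lexicon = {"丝印机", "网印机", "全自动", "丝网", "印刷机", "车床", "多色", "数控"}
name = "力超全自动薄膜开关丝印机"
print("/".join(reverse_max_match(name, lexicon)))
print(commodity_label(name, lexicon))
```

Under these assumptions the segmentation matches the result of step four, and the last token, the word for "screen printer", becomes the commodity label.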
The method for marking commodities with labels uses a text mining algorithm to find the relation between commodities and labels. The prerequisite for using a text mining algorithm is that both the commodities and the labels have representative content that reflects their relation and can serve as a basis for judgment. Commodity attributes present the features of a commodity from many angles; if labels also have features of their own, the similarity relation between commodities and labels can be determined by comparing how similar the two are in their features.
As shown in Fig. 2, the method for marking commodities with labels comprises the following steps:
Step 101: acquiring label features
The feature information belonging to each label is determined on the basis of the label set. If a label appears in the name of a commodity, the label and the commodity are assumed to be correlated.
Following this idea, the commodity names containing a specific label word are first selected, the commodity feature information of those commodities is then looked up by commodity name, and all of that commodity feature information is collected as the feature information of the label. In particular, the commodity feature information comes from the commodity attribute values;
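Step 101 can be sketched in Python as follows. This is a minimal sketch: the function name `collect_label_features`, the representation of a commodity as a (name, attribute dictionary) pair, and the `attribute_value` feature key are assumptions made for illustration, and the sample commodities are hypothetical.

```python
from collections import Counter, defaultdict


def collect_label_features(commodities, labels):
    """Collect, for each label, the attribute values of the commodities
    whose name contains that label.

    commodities: list of (name, {attribute: value}) pairs (assumed shape).
    Returns {label: Counter({"attribute_value": occurrences})}.
    """
    features = defaultdict(Counter)
    for name, attributes in commodities:
        for label in labels:
            # A label that appears in the name is assumed to be correlated.
            if label in name:
                for attribute, value in attributes.items():
                    features[label][f"{attribute}_{value}"] += 1
    return features


# Hypothetical sample commodities.
commodities = [
    ("automatic screen printer",
     {"operation mode": "fully automatic", "print color": "multicolor"}),
    ("flat screen printer", {"operation mode": "fully automatic"}),
    ("CNC lathe", {"control mode": "CNC"}),
]
features = collect_label_features(commodities, ["screen printer", "lathe"])
print(features["screen printer"])
```

The per-label counts produced here are the kind of input that the weighting in step 102 operates on.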
Continuing the example, the 15 commodities and their commodity attributes are first listed. Accordingly, the label features of the label "screen printer" include: operation mode_fully automatic, printing surface_flat, print color_multicolor; the label features of the label "coating machine" include: print color_multicolor, operation mode_fully automatic, printing surface_flat; the label features of the label "screen-printing machine" include: print color_multicolor, brand_Guanda, operation mode_fully automatic, printing surface_flat. Further details are given in the following table:
Step 102: judging the similarity relation between commodities and labels
Based on all label features of a specific label, the weight of each label feature is analyzed and the representativeness of each label feature among all label features is assessed, specifically:
Step 102-1: analyze how each label feature is distributed over the label set. If a label feature is concentrated in a single label, it is assumed to be strongly representative; if it is spread over multiple labels, it is assumed to be weakly representative;
For ease of understanding, the labels "screen printer", "screen-printing machine", and "lathe" are chosen and the occurrence frequencies of their label features are counted, as in the following table:
Step 102-2: with reference to the TF*IDF weighting method, strongly representative label features are weighted up, the weight being the frequency with which the label feature occurs in the label multiplied by the initial weight (the initial weight is chosen as needed); weakly representative label features are weighted down, the weight being the initial weight divided by the frequency with which the feature occurs across different labels. The weight Boost_p of label feature p in label t is given by the following formula:
$$\mathrm{Boost}_p = \frac{\mathrm{count}(p,t)}{\mathrm{size}(t)} \times \log\frac{N}{\mathrm{tags}(p,t)}$$
Here count(p, t) is the number of times label feature p occurs in label t, size(t) is the number of label features that label t contains, N is the total number of labels in the label set, and tags(p, t) is the number of labels that contain label feature p.
The weights of the feature attributes of the labels "screen printer", "screen-printing machine", and "lathe" are as follows:
Boost[screen printer](operation mode_fully automatic) = (3/7) × log(3/2) = 0.075
Boost[screen printer](printing surface_flat) = (2/7) × log(3/2) = 0.050
Boost[screen printer](print color_multicolor) = (2/7) × log(3/2) = 0.050
Boost[screen-printing machine](print color_multicolor) = (3/11) × log(3/2) = 0.048
Boost[screen-printing machine](operation mode_fully automatic) = (3/11) × log(3/2) = 0.048
Boost[screen-printing machine](printing surface_flat) = (3/11) × log(3/2) = 0.048
Boost[screen-printing machine](brand_Guanda) = (2/11) × log(3/2) = 0.032
Boost[lathe](installation form_floor-standing) = (3/16) × log(3/1) = 0.089
Boost[lathe](precision_precise) = (4/16) × log(3/1) = 0.119
Boost[lathe](layout form_horizontal) = (2/16) × log(3/1) = 0.060
Boost[lathe](degree of automation_automatic) = (3/16) × log(3/1) = 0.089
Boost[lathe](tool post quantity_double tool post CNC lathe) = (2/16) × log(3/1) = 0.060
Boost[lathe](control mode_CNC) = (2/16) × log(3/1) = 0.060
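The weights listed above can be reproduced with a short Python sketch. The function name `boost` and its argument names are illustrative. The base-10 logarithm is an inference from the worked values (for example, (3/7) × log10(3/2) ≈ 0.075); the text itself does not state the base.

```python
import math


def boost(count_pt, size_t, n_labels, tags_p):
    """Weight of label feature p in label t.

    count_pt: occurrences of feature p in label t
    size_t:   number of label features that label t contains
    n_labels: total number of labels in the label set (N)
    tags_p:   number of labels that contain feature p
    """
    # Base-10 logarithm is assumed because it matches the listed values.
    return (count_pt / size_t) * math.log10(n_labels / tags_p)


# Values taken from the worked example (N = 3 labels).
print(round(boost(3, 7, 3, 2), 3))   # screen printer, operation mode
print(round(boost(4, 16, 3, 1), 3))  # lathe, precision
```

A feature that occurs in every label gets a weight of zero because log(N/N) = 0, which matches the idea that widely shared features are weakly representative.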
Step 102-3: abstract the feature information set of the label and the feature information set of the commodity into two multidimensional space vectors, using the feature weights as the vector components, and, using the cosine similarity of space vectors, judge the correlation between the commodity and the label by computing the similarity between the two vectors;
According to the similarity formula:
Cos(Lichao fully automatic membrane switch screen printer, label(lathe)) = 0.0%
Cos(two-color automatic screen-printing machine, label(lathe)) = 0.0%
Cos(screen plate coating machine, label(lathe)) = 0.0%
Cos(Tailifu CNC press, label(screen-printing machine)) = 0.0%
Cos(Tailifu CNC press, label(screen printer)) = 0.0%
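The cosine comparison of step 102-3 can be sketched in Python as follows. This is a minimal sketch using sparse vectors stored as dictionaries; the function name `cosine_similarity` is illustrative, and giving each commodity attribute a weight of 1.0 is an assumption, since the text does not spell out how the commodity vector is weighted. The sample vectors are hypothetical apart from the two lathe weights taken from the list above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(weight * b[feature] for feature, weight in a.items()
              if feature in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


# Label vector built from two of the "lathe" weights listed above.
lathe = {"precision_precise": 0.119, "control mode_CNC": 0.060}
# Hypothetical commodity vector with an assumed weight of 1.0 per attribute.
screen_printer_item = {"operation mode_fully automatic": 1.0,
                       "print color_multicolor": 1.0}
print(cosine_similarity(screen_printer_item, lathe))
```

Vectors with no feature in common have a dot product of zero, which is why the commodity and label pairs listed above come out at 0.0%.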
Step 103: determining the related labels of a commodity
Because the degree of correlation between a commodity and a label may be high or low, a correlation coefficient alone is not enough to assign the label to the commodity directly. A reasonable threshold must be set, and the labels whose correlation coefficient is above the threshold are selected as the labels of the commodity. The threshold can be set to a strict or a loose value according to data quality requirements, or the mean of all correlation coefficient values can be taken as the threshold;
In particular, to choose labels more accurately, the number of labels per commodity can optionally be limited, and the most relevant labels within that limit are chosen as the commodity's labels;
Commodity attribute values represent certain features of a commodity; if labels also have features of their own, then mining the relation between the two sets of features reveals the relation between commodities and labels;
Following the steps above, the similarity between each commodity and each label is obtained. To ensure the quality of the related labels, the threshold is set to 90%: when the similarity between a commodity and a label exceeds 90%, the label is considered usable as a label of that commodity. Accordingly, the label "lathe" is applied to "Tailifu CNC press", and the labels "screen printer" and "screen-printing machine" are applied to the commodity "screen plate coating machine". In this case, the commodity "screen plate coating machine" can be recalled when a user searches for "screen printer" or "screen-printing machine". In this way, relevant labels can be applied to more commodities, which makes commodity information more complete and safeguards search recall.
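The selection in step 103 can be sketched in Python as follows. This is a minimal sketch: the function names `assign_labels` and `mean_threshold`, the strict "greater than" comparison, and the similarity values in the sample are assumptions for illustration; the text gives only the 90% threshold and the resulting labels.

```python
def assign_labels(similarities, threshold=0.9, max_labels=None):
    """Pick the labels of one commodity.

    similarities: {label: similarity in [0, 1]} for a single commodity.
    max_labels:   optional cap on the number of labels kept.
    """
    # Keep labels above the threshold, most similar first.
    kept = [(label, s) for label, s in similarities.items() if s > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    if max_labels is not None:
        kept = kept[:max_labels]
    return [label for label, _ in kept]


def mean_threshold(similarities):
    """Alternative threshold: the mean of all similarity values."""
    return sum(similarities.values()) / len(similarities)


# Hypothetical similarities for the commodity "screen plate coating machine".
sims = {"screen printer": 0.95, "screen-printing machine": 0.92, "lathe": 0.0}
print(assign_labels(sims))
print(assign_labels(sims, max_labels=1))
```

Sorting before truncating means that when a label cap is set, the most relevant labels are the ones retained.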
Those of ordinary skill in the art should understand that the foregoing is only a specific embodiment of the invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (4)

1. A commodity labeling method suitable for Chinese e-commerce websites, characterized in that it comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels;
The method for building the word segmentation lexicon is based on frequency statistics of each commodity keyword across the different commodity descriptions on the website. Commodity keywords with a frequency greater than 3 are retained, and from these the keywords of no more than 5 Chinese characters are selected as lexicon data. When a long commodity keyword contains multiple short keywords, the long keyword is not stored in the lexicon;
A commodity keyword is a word that a merchant freely adds through the website's back-end system; it is the merchant's description of a key feature of the commodity;
In particular, because commodity keywords on a Chinese e-commerce website are usually added by the sellers, choosing the short, concise, and frequently occurring keywords for the lexicon ensures segmentation accuracy as far as possible;
The label collection method segments all commodity names on the website with the reverse maximum matching word segmentation algorithm, using the lexicon that has been built. After segmentation, following a feature of Chinese grammar, namely that in an "adjective + noun" phrase the noun comes last, the last word produced by segmenting a commodity name is chosen as the commodity label of that commodity. Finally, all of these labels form the label data set;
A commodity name is a short text description of a commodity added by the merchant;
The method for marking commodities with labels uses a text mining algorithm to find the relation between commodity attributes and labels. The prerequisite for using a text mining algorithm is that both the commodity attributes and the labels have representative content that reflects their relation and can serve as a basis for judgment. Commodity attributes present the features of a commodity from many angles; if labels also have features of their own, the similarity relation between commodity attributes and labels can be determined by comparing how similar the two are in their features.
2. method according to claim 1, is characterized in that, the step that the method for labeled marker commodity specifically comprises has:
Step 1: the acquisition of label characteristics
The characteristic information being subordinate to each label is determined on the basis of tag set.If the label of certain commodity appears in the trade name of certain commodity, then give tacit consent to this label and this commodity exist correlationship.
According to above-mentioned thinking, first filter out the trade name comprising a certain specific label word, then find the product features information data of these commodity according to trade name, count the characteristic information data of all product features information datas as this label; Especially, product features information data comes from information attribute value;
Step 2: judge the similarity relation between commodity and label
Step 3: determine the corresponding labels of a commodity
Because the degree of correlation between commodities and labels varies from high to low, the correlation coefficient between a label and a commodity is not by itself sufficient for assigning the label to the commodity directly. A reasonable threshold needs to be set, and among the similarities between the two space vectors, that is, between commodity and label, the labels whose correlation coefficient is above the threshold are selected as the labels of the commodity; the threshold ranges from 0 to 1. The threshold can be set to a strict or loose value according to data quality requirements: the stricter the commodity search process is meant to be, the closer the threshold is to 1. Alternatively, the mean of all correlation coefficient values can be taken as the threshold;
In particular, to choose the labels of a commodity more accurately, the number of labels per commodity can optionally be limited, and the most relevant labels within that limit are selected as the commodity labels.
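The selection rule of Step 3 can be sketched as below. This is a minimal illustration, assuming the similarity scores for one commodity are already available as a dictionary; the function name, the parameter names, and the strict "greater than" comparison are assumptions rather than details given in the claims.

```python
def select_labels(scores, threshold=None, max_labels=None):
    """Pick labels for one commodity from a {label: similarity} mapping."""
    if not scores:
        return []
    if threshold is None:
        # Use the mean of all coefficient values, as the text permits.
        threshold = sum(scores.values()) / len(scores)
    # Assumption: "above the threshold" is read as strictly greater than.
    kept = [(label, s) for label, s in scores.items() if s > threshold]
    # Most relevant labels first, so that a limit keeps the best ones.
    kept.sort(key=lambda item: item[1], reverse=True)
    if max_labels is not None:
        kept = kept[:max_labels]
    return [label for label, _ in kept]
```

With the mean as threshold, a commodity whose candidate labels all score the same keeps none of them under this strict comparison, which is one reason a fixed threshold may be preferable in practice.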
3. The method according to claim 2, characterized in that, in step 2, based on all the label features of a given label, the weight of each label feature is analyzed and the representativeness of each label feature among the features of all labels is assessed, specifically comprising:
Step 2-1: analyze the distribution of each label feature over the label set: if a label feature is concentrated in one label, it is assumed by default to be strongly representative; if a label feature is distributed over multiple labels, it is assumed by default to be weakly representative;
Step 2-2: with reference to the TF*IDF weight calculation method, label features that are strongly representative are weighted up, the weight being the frequency with which the label feature occurs in the label multiplied by the initial weight; label features that are weakly representative are weighted down, the weight being the initial weight divided by the frequency with which the label feature occurs in different labels. The weight Boost_p of a label feature p in a label t can be computed by reference to the following formula:
$$\mathrm{Boost}_p = \left(\frac{\mathrm{count}(p,t)}{\mathrm{size}(t)}\right) \times \log\left(\frac{N}{\mathrm{tags}(p,t)}\right)$$
Here, count(p, t) is the number of times label feature p occurs in label t, size(t) is the number of label features that label t contains, N is the total number of labels in the label set, and tags(p, t) is the number of labels that contain label feature p.
Step 2-3: abstract the feature information set of the label and the feature information set of the commodity into a multidimensional space vector each, and, using the principle of space vector cosine similarity, judge the correlation between commodity and label by calculating the similarity between the two space vectors;
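Steps 2-2 and 2-3 can be illustrated together with a short sketch. It assumes that each label's features are held as a list of strings with repeats, that size(t) means the total number of feature occurrences in the label, and that a commodity is represented as a vector with value 1.0 for each of its attribute values; all names and sample data are hypothetical.

```python
import math
from collections import Counter


def boost_weights(label_features):
    """Map {label: [feature, ...]} to {label: {feature: Boost_p}} per step 2-2."""
    n_labels = len(label_features)
    # tags(p, t): number of labels whose feature list contains p.
    labels_with = Counter()
    for feats in label_features.values():
        labels_with.update(set(feats))
    weights = {}
    for label, feats in label_features.items():
        counts = Counter(feats)
        # Assumption: size(t) is the total number of feature occurrences in t.
        size = len(feats)
        weights[label] = {
            p: (c / size) * math.log(n_labels / labels_with[p])
            for p, c in counts.items()
        }
    return weights


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {dimension: value}."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical label features and one commodity, for illustration only.
LABEL_FEATURES = {
    "cup": ["steel", "steel", "500ml", "red"],
    "dress": ["cotton", "red", "long"],
}
WEIGHTS = boost_weights(LABEL_FEATURES)
COMMODITY = {"steel": 1.0, "500ml": 1.0}
SIM_CUP = cosine_similarity(COMMODITY, WEIGHTS["cup"])
SIM_DRESS = cosine_similarity(COMMODITY, WEIGHTS["dress"])
```

In this toy data the feature "red" occurs in both labels, so its logarithm term is log(2/2) = 0 and it contributes nothing to either vector, which matches the intent of step 2-1 that widely distributed features are weakly representative.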
4. The method according to claim 3, characterized in that, in the process of judging the relation between label and commodity, the commodity names containing a given label are first filtered out, the commodity feature information of those commodities is then found according to their names, and the aggregate of all that commodity feature information is taken as the feature information of the label; the commodity feature information comes from commodity attribute values.
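The feature collection described in claim 4 can be sketched as follows. This is an illustration only: the record layout with "name" and "attributes" keys, the function name, and the use of plain substring matching to decide that a name contains a label are assumptions.

```python
from collections import defaultdict


def collect_label_features(labels, commodities):
    """Gather, per label, the attribute values of commodities whose name contains it.

    commodities: list of {"name": str, "attributes": [str, ...]} (hypothetical layout).
    """
    features = defaultdict(list)
    for item in commodities:
        for label in labels:
            # Assumption: "contains" is tested by substring match on the name.
            if label in item["name"]:
                features[label].extend(item["attributes"])
    return dict(features)
```

Repeated attribute values are kept on purpose, so that a later weighting step can count how often each feature occurs within a label.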
CN201510828440.0A 2015-11-25 2015-11-25 A method of suitable for e-commerce Chinese website Commercial goods labels Active CN105320778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510828440.0A CN105320778B (en) 2015-11-25 2015-11-25 A method of suitable for e-commerce Chinese website Commercial goods labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510828440.0A CN105320778B (en) 2015-11-25 2015-11-25 A method of suitable for e-commerce Chinese website Commercial goods labels

Publications (2)

Publication Number Publication Date
CN105320778A true CN105320778A (en) 2016-02-10
CN105320778B CN105320778B (en) 2019-04-02

Family

ID=55248164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510828440.0A Active CN105320778B (en) 2015-11-25 2015-11-25 A method of suitable for e-commerce Chinese website Commercial goods labels

Country Status (1)

Country Link
CN (1) CN105320778B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727500A (en) * 2010-01-15 2010-06-09 清华大学 Text classification method of Chinese web page based on steam clustering
CN103605815A (en) * 2013-12-11 2014-02-26 焦点科技股份有限公司 Automatic commodity information classifying and recommending method applicable to B2B (Business to Business) e-commerce platform
US20140067815A1 (en) * 2012-09-05 2014-03-06 Alibaba Group Holding Limited Labeling Product Identifiers and Navigating Products
CN105069086A (en) * 2015-07-31 2015-11-18 焦点科技股份有限公司 Method and system for optimizing electronic commerce commodity searching

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784029A (en) * 2016-08-31 2018-03-09 阿里巴巴集团控股有限公司 Generation prompting keyword, the method for establishing index relative, server and client side
CN107784029B (en) * 2016-08-31 2022-02-08 阿里巴巴集团控股有限公司 Method, server and client for generating prompt keywords and establishing index relationship
CN106844554A (en) * 2016-12-30 2017-06-13 全民互联科技(天津)有限公司 A kind of contract classification automatic identifying method and system
CN107729422A (en) * 2017-09-27 2018-02-23 广州市万表科技股份有限公司 A kind of personality method of testing and system based on commodity identification
CN107909097B (en) * 2017-11-08 2021-07-30 创新先进技术有限公司 Method and device for updating samples in sample library
CN107909097A (en) * 2017-11-08 2018-04-13 阿里巴巴集团控股有限公司 The update method and device of sample in sample storehouse
CN108388555A (en) * 2018-02-01 2018-08-10 口碑(上海)信息技术有限公司 Commodity De-weight method based on category of employment and device
CN111126110B (en) * 2018-10-31 2024-01-05 杭州海康威视数字技术股份有限公司 Commodity information identification method, settlement method, device and unmanned retail system
CN111126110A (en) * 2018-10-31 2020-05-08 杭州海康威视数字技术股份有限公司 Commodity information identification method, settlement method and device and unmanned retail system
CN110334306A (en) * 2019-06-21 2019-10-15 无线生活(北京)信息技术有限公司 Label processing method and device
CN111782847A (en) * 2019-07-31 2020-10-16 北京京东尚科信息技术有限公司 Image processing method, apparatus and computer-readable storage medium
CN110909532B (en) * 2019-10-31 2021-06-11 银联智惠信息服务(上海)有限公司 User name matching method and device, computer equipment and storage medium
CN110909532A (en) * 2019-10-31 2020-03-24 银联智惠信息服务(上海)有限公司 User name matching method and device, computer equipment and storage medium
CN112768080A (en) * 2021-01-25 2021-05-07 武汉大学 Medical keyword bank establishing method and system based on medical big data
CN113779243A (en) * 2021-08-16 2021-12-10 深圳市世强元件网络有限公司 Automatic commodity classification method and device and computer equipment
CN114817672A (en) * 2022-06-07 2022-07-29 舟谱数据技术南京有限公司 Processing system and processing method for realizing normalization by using keywords in trade names
CN114817672B (en) * 2022-06-07 2022-09-20 舟谱数据技术南京有限公司 Processing system and processing method for realizing normalization by using keywords in trade names

Also Published As

Publication number Publication date
CN105320778B (en) 2019-04-02

Similar Documents

Publication Publication Date Title
CN105320778A (en) Commodity labeling method suitable for electronic commerce Chinese website
CN103729359B (en) A kind of method and system recommending search word
US8190556B2 (en) Intellegent data search engine
CN103049435B (en) Text fine granularity sentiment analysis method and device
EP2192500B1 (en) System and method for providing robust topic identification in social indexes
CN108763321B (en) Related entity recommendation method based on large-scale related entity network
JP4637969B1 (en) Properly understand the intent of web pages and user preferences, and recommend the best information in real time
JP5721818B2 (en) Use of model information group in search
CN106156204A (en) The extracting method of text label and device
KR20110040147A (en) Apparatus for question answering based on answer trustworthiness and method thereof
CN105868255A (en) Query recommendation method and apparatus
CN105824833A (en) Keyword recommendation method and system based on user behavior feedback
CN103294778A (en) Method and system for pushing messages
CN104077417A (en) Figure tag recommendation method and system in social network
US20200272674A1 (en) Method and apparatus for recommending entity, electronic device and computer readable medium
CN115203309B (en) Method and device for structuring bid-winning data of webpage
DE102012221251A1 (en) Semantic and contextual search of knowledge stores
CN105528411A (en) Full-text retrieval device and method for interactive electronic technical manual of shipping equipment
CN111444304A (en) Search ranking method and device
CN105468649A (en) Method and apparatus for determining matching of to-be-displayed object
CN109710725A (en) A kind of Chinese table column label restoration methods and system based on text classification
Aria et al. Package ‘bibliometrix’
CN106372232B (en) Information mining method and device based on artificial intelligence
CN108153728A (en) A kind of keyword determines method and device
CN113553491A (en) Industrial big data search optimization method based on inverted index

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant