CN105320778A - Commodity labeling method suitable for electronic commerce Chinese website - Google Patents
- Publication number
- CN105320778A (application CN201510828440.0A)
- Authority
- CN
- China
- Prior art keywords
- label
- commodity
- labels
- word segmentation
- keyword
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
A commodity labeling method for Chinese-language e-commerce websites comprises three steps: building a word segmentation lexicon, collecting labels, and marking commodities with labels. In the lexicon-building step, the frequency with which each commodity keyword appears across different commodity descriptions on the website is counted; keywords with a frequency greater than three are retained, and of these, keywords of five Chinese characters or fewer are selected as lexicon data. In the label-collection step, all commodity names on the website are segmented with a reverse maximum matching algorithm based on the built lexicon; the last word produced by segmenting each commodity name is taken as that commodity's label, and all such labels together form the label data set. In the marking step, a text mining algorithm is used to find the relations between commodity attributes and labels.
Description
Technical field
The invention belongs to the field of computer and Internet technology, and in particular relates to a commodity labeling method suitable for Chinese-language e-commerce websites.
Background art
On a Chinese-language e-commerce website, a user who searches for commodities by keyword is usually searching the commodities' basic information directly. Most of that information, however, is entered and maintained by the merchants themselves. Even when merchants follow the website's commodity rules, two kinds of problems cannot be avoided. The first is cheating in commodity information: to raise the exposure and frequency of their commodities in search results, make their listings more noticeable, and have buyers find more of their listings, merchants abuse brand names or use keywords unrelated to the commodity when describing it, so buyers cannot accurately find the commodities they need. The second is incomplete commodity information: merchants omit key information when describing commodities, including important items such as the commodity title, pictures, and description, and when users search, this missing information prevents the website from returning more relevant results.
To counter cheating, e-commerce websites usually set rules and down-rank commodities that violate them. Such rules are imperfect, though: strict rules may down-rank commodities that are not cheating, while loose rules may make the anti-cheating effect too weak. To deal with incomplete merchant-supplied information and recall as many related products as possible, websites sacrifice retrieval quality and widen the range of fields searched, matching against multiple commodity information fields and sometimes even fields such as "commodity description", which are large in volume but poor in quality. This recalls more commodities, but the recalled commodities do not satisfy users, which in turn causes heavy traffic loss.
Summary of the invention
In view of the shortcomings of the prior art, the object of the invention is to provide a commodity labeling method suitable for Chinese-language e-commerce websites. By comprehensively analyzing commodity names and commodity attributes, the method marks each commodity with the labels related to it, thereby completing the commodity information on the website. These labels take part in retrieval as an important search field during commodity search, so that more related commodities are recalled while the accuracy of commodity retrieval also improves.
The technical solution of the invention is as follows: a commodity labeling method suitable for Chinese-language e-commerce websites, characterized in that it comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels;
The method for building the word segmentation lexicon is based on frequency statistics of each commodity keyword across different commodity descriptions on the website: commodity keywords with a frequency greater than 3 are retained, and of these, keywords of 5 characters or fewer are selected as lexicon data; when a long commodity keyword is made up of several shorter keywords, the long keyword is not added to the lexicon;
A commodity keyword is a word that a merchant adds freely through the website's back-end system; it is the merchant's description of a key feature of the commodity;
In particular, because commodity keywords on a Chinese-language e-commerce website are usually added by the sellers, selecting the short, concise, high-frequency words among them for the lexicon ensures segmentation accuracy as far as possible;
The label collection method segments all commodity names on the website with a reverse maximum matching algorithm, based on the lexicon already built. After segmentation, a feature of Chinese grammar is applied, namely that in an "adjective + noun" expression the noun comes last, and the last word produced by segmenting each commodity name is taken as that commodity's label; finally, all of these labels form the label data set;
A commodity name is a short piece of text, added by the merchant, that describes the commodity;
The method for marking commodities with labels uses a text mining algorithm to find the relation between commodity attributes and labels. In particular, the text mining algorithm presupposes that both commodity attributes and labels have representative content that reflects the relation between them and can serve as the basis for judgment. Commodity attributes show a commodity's features from many angles; if labels also have features of their own, the similarity relation between commodity attributes and labels can be determined by comparing the two sets of features.
Further, the method for marking commodities with labels comprises the following steps:
Step 1: Obtaining label features
The feature information belonging to each label is determined on the basis of the label set. If a label appears in the name of a commodity, the label and that commodity are assumed to be related.
Following this idea, the commodity names that contain a given label word are first selected, the feature information of those commodities is then looked up by commodity name, and all of that feature information is tallied as the feature information of the label; in particular, commodity feature information comes from commodity attribute values;
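Step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the record layout (`name`, `attributes`), the function name `collect_label_features`, and the sample commodities are all hypothetical.

```python
from collections import Counter


def collect_label_features(label, commodities):
    """Tally the attribute values of every commodity whose name contains `label`.

    `commodities` is assumed to be a list of dicts with a "name" string and an
    "attributes" dict (hypothetical layout, chosen for illustration only).
    """
    features = Counter()
    for item in commodities:
        if label in item["name"]:
            # One feature per attribute, written as "attribute_value".
            features.update(f"{k}_{v}" for k, v in item["attributes"].items())
    return features


# Hypothetical sample data, not taken from the patent.
sample = [
    {"name": "automatic screen printer A",
     "attributes": {"operating mode": "fully automatic", "printing surface": "flat"}},
    {"name": "two-color screen printer B",
     "attributes": {"operating mode": "fully automatic"}},
    {"name": "CNC lathe C", "attributes": {"control mode": "CNC"}},
]
screen_printer_features = collect_label_features("screen printer", sample)
```

The counts kept in the `Counter` are what the later weighting step needs: how often each feature occurs within a label.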
Step 2: Judging the similarity relation between commodities and labels
Based on all the label features of a given label, the weight of each label feature is analyzed to assess how representative it is among the features of all labels. Specifically:
Step 2-1: Analyze how each label feature is distributed over the label set: if a label feature is concentrated in one label, it is assumed to be strongly representative; if it is spread over several labels, it is assumed to be weakly representative;
Step 2-2: Following the TF*IDF weighting method, strongly representative label features are weighted up, the weight being the frequency with which the feature occurs in the label multiplied by an initial weight; weakly representative label features are weighted down, the weight being the initial weight divided by the number of different labels in which the feature occurs. The weight Boost_p of label feature p in label t can be computed with the following formula:
$$\mathrm{Boost}_p = \frac{\mathrm{count}(p,t)}{\mathrm{size}(t)} \times \log\frac{N}{\mathrm{tags}(p,t)}$$
where count(p, t) is the number of times label feature p occurs in label t, size(t) is the number of label features that label t contains, N is the total number of labels in the label set, and tags(p, t) is the number of labels that contain label feature p.
Step 2-3: Abstract the feature information set of the label and the feature information set of the commodity each into a multidimensional space vector, and apply the cosine similarity of space vectors: the similarity between the two vectors is computed to judge the correlation between the commodity and the label;
Step 3: Determining the labels of a commodity
Because the correlation between commodities and labels varies in strength, a correlation coefficient alone is not enough to assign a label to a commodity directly. A reasonable threshold must be set, and the labels whose correlation coefficient with the commodity (the similarity between the two space vectors) is above the threshold are selected as the commodity's labels; the threshold lies between 0 and 1. The threshold may be set strictly or loosely according to data quality requirements: the stricter the commodity search should be, the closer the threshold is to 1. Alternatively, the mean of all correlation coefficients may be used as the threshold;
In particular, to choose a commodity's labels more accurately, the number of labels per commodity may optionally be limited, and the most relevant labels within that limit are selected as the commodity's labels.
Commodity attribute values represent certain features of a commodity; if labels also have features of their own, the relation between commodities and labels can be found by mining the relation between the two sets of features.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention builds the word segmentation lexicon from commodity keywords, so that commodity descriptions are segmented on the basis of the key features of commodities already on the website; this ensures segmentation accuracy and helps pinpoint the commodity name within a commodity description;
(2) By identifying and determining label features and comparing them with commodity features for similarity, the invention finds names similar to the commodity name, marks commodities with richer labels, and completes commodity information, which helps raise search recall and accuracy;
(3) By finding entity labels and corresponding labels for commodities on the e-commerce website, the invention makes commodity labels more objective while also improving the reliability of commodity information;
Brief description of the drawings
Fig. 1 is a structural diagram of the commodity labeling method for Chinese-language e-commerce websites in an embodiment of the invention;
Fig. 2 is a flowchart of the method for marking commodities with labels in an embodiment of the invention;
Fig. 3 is a flowchart of the method for judging the similarity relation between commodities and labels in an embodiment of the invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The invention comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels. The lexicon-building method supports word segmentation of the commodity names on the website; the label collection method finds, from the commodity name, the label corresponding to each commodity on the website; the marking method finds, for every commodity on the website, the labels related to it. A commodity name is the short text description that a merchant user of the website writes for their own commodity.
Taking the Chinese-language site of Made-in-China as an example, a commodity labeling method suitable for Chinese-language e-commerce websites comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels, as shown in Fig. 1;
The method for building the word segmentation lexicon is based on frequency statistics of each commodity keyword across different commodity descriptions on the website: keywords with a frequency greater than 3 are retained, and of these, keywords of 5 characters or fewer are selected as lexicon data. When a long keyword is made up of several shorter keywords, the long keyword is not added to the lexicon. For example, "electric bicycle" is made up of the two shorter words "electric" and "bicycle", so "electric bicycle" is not entered in the lexicon.
In particular, because commodity keywords on a Chinese-language e-commerce website are usually added by the sellers, selecting the short, concise, high-frequency words among them for the lexicon ensures segmentation accuracy as far as possible;
Consider the following 15 commodities and the commodity keywords that merchants added for them:
After counting, the commodity keywords with a frequency greater than or equal to 3 are entered in the word segmentation lexicon, as shown in the table below:
Keyword | Frequency statistics |
Screen printer | 5 |
Screen-printing machine | 4 |
Automatically | 7 |
Silk screen | 4 |
Printing machine | 6 |
Lathe | 4 |
Polychrome | 3 |
Numerical control | 3 |
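The lexicon-building rules can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. The function names and the `min_freq` and `max_chars` parameters are hypothetical. "Made up of several shorter keywords" is read here as "can be written as a concatenation of two or more other candidate keywords", which is one plausible interpretation. The frequency cutoff is a parameter because the claims say "greater than 3" (the default `min_freq=4`) while the table above keeps keywords with a frequency of 3 (`min_freq=3`). The Chinese strings are a back-translation of the "electric bicycle" example and are also an assumption.

```python
from collections import Counter


def splits_into_shorter(word, vocab):
    """Return True if `word` is a concatenation of two or more other words in `vocab`."""
    others = vocab - {word}
    # reachable[i] is True when word[:i] can be tiled exactly by words from `others`.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            reachable[start] and word[start:end] in others for start in range(end)
        )
    return reachable[-1]


def build_lexicon(keyword_lists, min_freq=4, max_chars=5):
    """Build a segmentation lexicon from per-commodity keyword lists."""
    # Count each keyword once per commodity it is attached to.
    freq = Counter(kw for keywords in keyword_lists for kw in set(keywords))
    candidates = {
        kw for kw, n in freq.items() if n >= min_freq and len(kw) <= max_chars
    }
    # Drop long keywords that are fully made up of shorter candidates.
    return {kw for kw in candidates if not splits_into_shorter(kw, candidates)}


# Four commodities share the same keywords; one rare keyword appears once.
keyword_lists = [["电动", "自行车", "电动自行车"]] * 4 + [["折叠"]]
lexicon = build_lexicon(keyword_lists)  # {"电动", "自行车"}
```

Here "电动自行车" (electric bicycle) is frequent and short enough, but it is dropped because it splits into "电动" and "自行车".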
The label collection method segments all commodity names on the website with a reverse maximum matching algorithm, based on the lexicon already built. Following the feature of Chinese grammar noted above, the last word produced by segmenting a commodity name is taken as that commodity's label; finally, all commodity labels form the label data set;
Following the example above, the segmentation results of the 15 commodities and the resulting commodity labels are as follows:
The lexicon-based reverse maximum matching algorithm scans the sentence to be segmented repeatedly from back to front. The maximum window length of each scan is the length of the longest word in the lexicon. When the scanned window is a word in the lexicon, its start position becomes a cut point, and the next scan continues forward from that cut point. If no lexicon word is found after the window length has been reduced from the maximum to the minimum, the scan position moves forward by one character, that position becomes a new cut point, and scanning continues. A concrete example follows:
For the commodity name "power surpasses Full-automatic film switch screen printer", segmentation with the lexicon built above proceeds as follows:
Step 1: Confirm that the longest words in the lexicon, such as "screen printer" and "automatically", are 3 characters long; the scan length therefore starts at 3 and decreases, with a minimum scan length of 2;
Step 2: Scan the sentence from back to front. The first three characters scanned are "screen printer", which is a word in the lexicon, so the position in front of "screen printer" becomes a cut point and the sentence becomes "power surpasses Full-automatic film switch/screen printer";
Step 3: Continue scanning from the previous cut point. The first three characters scanned are "membrane switch", which is not a word in the lexicon, so the scan length is reduced by 1 and the scan is repeated. The two characters scanned, "switch", are still not in the lexicon, so the position moves forward by one character to set a new cut point, and the sentence becomes "power surpasses Full-automatic film ON/OFF/screen printer";
Step 4: Continue scanning and cutting as in steps 2 and 3 until the start of the sentence is reached; the sentence becomes "power/super/full-automatic/thin/film/ON/OFF/screen printer", and the procedure stops;
These four steps yield the lexicon-based segmentation of the given sentence.
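The four steps can be sketched in Python as below. This is a minimal sketch, not the patent's implementation. The function name `reverse_max_match` is hypothetical, and the Chinese strings are a back-translation of the machine-translated example (the commodity name is assumed to be 力超全自动薄膜开关丝印机, and the lexicon is an assumed subset of the table above), so they should be read as illustrative rather than as the patent's original data.

```python
def reverse_max_match(text, lexicon):
    """Segment `text` from right to left, preferring the longest lexicon word.

    `lexicon` must be a non-empty set of words.
    """
    max_len = max(map(len, lexicon))
    tokens = []
    end = len(text)
    while end > 0:
        # Try windows from the longest lexicon length down to 2 characters.
        for size in range(min(max_len, end), 1, -1):
            if text[end - size:end] in lexicon:
                break
        else:
            size = 1  # no lexicon word ends here: cut off a single character
        tokens.append(text[end - size:end])
        end -= size
    return tokens[::-1]


# Assumed back-translation of part of the lexicon table.
lexicon = {"丝印机", "全自动", "丝网", "印刷机", "车床", "多色", "数控"}
tokens = reverse_max_match("力超全自动薄膜开关丝印机", lexicon)
# tokens == ["力", "超", "全自动", "薄", "膜", "开", "关", "丝印机"]
label = tokens[-1]  # the last word is taken as the commodity label: "丝印机"
```

The result reproduces the cut sequence of the worked example, and taking `tokens[-1]` implements the rule that the final word of the segmented name becomes the label.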
The method for marking commodities with labels uses a text mining algorithm to find the relation between commodities and labels. In particular, the text mining algorithm presupposes that both commodities and labels have representative content that reflects the relation between them and can serve as the basis for judgment. Commodity attributes show a commodity's features from many angles; if labels also have features of their own, the similarity relation between commodities and labels can be determined by comparing the two sets of features.
As shown in Fig. 2, the method for marking commodities with labels comprises the following steps:
Step 101: Obtaining label features
The feature information belonging to each label is determined on the basis of the label set. If a label appears in the name of a commodity, the label and that commodity are assumed to be related.
Following this idea, the commodity names that contain a given label word are first selected, the feature information of those commodities is then looked up by commodity name, and all of that feature information is tallied as the feature information of the label; in particular, commodity feature information comes from commodity attribute values;
Following the example above, the 15 commodities and their commodity attributes are first compiled. Accordingly, the label features of the label "screen printer" include: mode of operation_full-automatic, printing surface_plane, print color_polychrome; the label features of the label "coating machine" include: print color_polychrome, mode of operation_full-automatic, printing surface_plane; the label features of the label "screen-printing machine" include: print color_polychrome, brand_hat reaches, mode of operation_full-automatic, printing surface_plane; further details are given in the following table:
Step 102: Judging the similarity relation between commodities and labels
Based on all the label features of a given label, the weight of each label feature is analyzed to assess how representative it is among all label features. Specifically:
Step 102-1: Analyze how each label feature is distributed over the label set: if a label feature is concentrated in a single label, it is assumed to be strongly representative; if it is spread over several labels, it is assumed to be weakly representative;
For ease of understanding, the labels "screen printer", "screen-printing machine", and "lathe" are chosen and the frequency of occurrence of their label features is counted, as in the following table:
Step 102-2: Following the TF*IDF weighting method, strongly representative label features are weighted up, the weight being the frequency with which the feature occurs in the label multiplied by an initial weight (the initial weight is set as needed); weakly representative label features are weighted down, the weight being the initial weight divided by the number of different labels in which the feature occurs. The weight Boost_p of label feature p in label t can be computed with the following formula:
$$\mathrm{Boost}_p = \frac{\mathrm{count}(p,t)}{\mathrm{size}(t)} \times \log\frac{N}{\mathrm{tags}(p,t)}$$
where count(p, t) is the number of times label feature p occurs in label t, size(t) is the number of label features that label t contains, N is the total number of labels in the label set, and tags(p, t) is the number of labels that contain label feature p.
The weights of the feature attributes of the labels "screen printer", "screen-printing machine", and "lathe" are as follows:
Boost[screen printer](mode of operation_full-automatic) = (3/7) * log(3/2) = 0.075
Boost[screen printer](printing surface_plane) = (2/7) * log(3/2) = 0.050
Boost[screen printer](print color_polychrome) = (2/7) * log(3/2) = 0.050
Boost[screen-printing machine](print color_polychrome) = (3/11) * log(3/2) = 0.048
Boost[screen-printing machine](mode of operation_full-automatic) = (3/11) * log(3/2) = 0.048
Boost[screen-printing machine](printing surface_plane) = (3/11) * log(3/2) = 0.048
Boost[screen-printing machine](brand_hat reaches) = (2/11) * log(3/2) = 0.032
Boost[lathe](installation form_console mode) = (3/16) * log(3/1) = 0.089
Boost[lathe](precision_precision) = (4/16) * log(3/1) = 0.119
Boost[lathe](distribution form_horizontal) = (2/16) * log(3/1) = 0.060
Boost[lathe](automaticity_automatically) = (3/16) * log(3/1) = 0.089
Boost[lathe](knife rest quantity_double tool rest numerically controlled lathe) = (2/16) * log(3/1) = 0.060
Boost[lathe](control mode_numerical control) = (2/16) * log(3/1) = 0.060
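The worked values above all have the form (count / size) * log(N / tags), and the numbers only come out as listed when the logarithm is base 10. A minimal sketch assuming that form is shown below; the function name `boost` and its parameter names are hypothetical.

```python
import math


def boost(count_pt, size_t, n_labels, labels_with_p):
    """TF*IDF-style weight of feature p in label t.

    count_pt      -- times feature p occurs in label t
    size_t        -- total feature occurrences in label t
    n_labels      -- number of labels under comparison (N)
    labels_with_p -- number of labels that contain feature p
    """
    tf = count_pt / size_t
    idf = math.log10(n_labels / labels_with_p)  # base 10, inferred from the worked values
    return tf * idf


# Reproduces two of the values listed above.
w_screen_printer = round(boost(3, 7, 3, 2), 3)   # 0.075
w_lathe_precision = round(boost(4, 16, 3, 1), 3)  # 0.119
```

A feature that occurs in every label gets `log10(1) = 0`, so it contributes nothing to the similarity computed in the next step.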
Step 102-3: Abstract the feature information set of the label and the feature information set of the commodity each into a multidimensional space vector, using the weight of each feature as the vector component, and apply the cosine similarity of space vectors: the similarity between the two vectors is computed to judge the correlation between the commodity and the label;
According to the cosine similarity formula for two vectors A and B:
$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
Cos (power surpasses Full-automatic film switch screen printer, label (lathe))=0.0%
Cos (double-colored automatic screen-printing machine, label (lathe))=0.0%
Cos (half tone coating machine, label (lathe))=0.0%
Cos (platform encourages good fortune numerical control press, label (screen-printing machine))=0.0%
Cos (platform encourages good fortune numerical control press, label (screen printer))=0.0%
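The similarity computation can be sketched with sparse vectors stored as dictionaries that map a feature name to its weight. This is a minimal sketch, not the patent's implementation: the function name is hypothetical, the feature names are illustrative, and how a commodity's own feature weights are assigned is left to the caller because the text does not spell it out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Illustrative vectors with no feature in common.
lathe_label = {"precision_precision": 0.119, "control mode_numerical control": 0.060}
printer_commodity = {"mode of operation_full-automatic": 0.075,
                     "printing surface_plane": 0.050}
similarity = cosine_similarity(printer_commodity, lathe_label)  # 0.0
```

Vectors that share no feature have a zero dot product, which is consistent with the 0.0% results listed above for commodities and labels from unrelated categories.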
Step 103: Determining the labels of a commodity
Because the correlation between commodities and labels varies in strength, a correlation coefficient alone is not enough to assign a label to a commodity directly. A reasonable threshold must be set, and the labels whose correlation coefficient is above the threshold are selected as the commodity's labels. The threshold may be set strictly or loosely according to data quality requirements, or the mean of all correlation coefficients may be used as the threshold;
In particular, to choose a commodity's labels more accurately, the number of labels per commodity may optionally be limited, and the most relevant labels within that limit are selected as the commodity's labels;
Commodity attribute values represent certain features of a commodity; if labels also have features of their own, the relation between commodities and labels can be found by mining the relation between the two sets of features;
The steps above give the similarity between each commodity and each label. To ensure the quality of the assigned labels, the threshold is set to 90%: when the similarity between a commodity and a label exceeds 90%, the label is considered usable as one of that commodity's labels. Accordingly, the commodity "platform encourages good fortune numerical control press" is given the label "lathe", and the commodity "half tone coating machine" is given the labels "screen printer" and "screen-printing machine". The commodity "half tone coating machine" can then be recalled when a user searches for "screen printer" or "screen-printing machine". In this way, related labels can be attached to more commodities, which makes commodity information more complete and safeguards search recall.
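The selection rule (threshold, optional mean-based threshold, optional cap on the number of labels) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, parameter names, and the similarity values in the example are hypothetical, and "above the threshold" is implemented as a strict comparison.

```python
def select_labels(similarities, threshold=None, max_labels=None):
    """Pick the labels for one commodity from {label: similarity} scores.

    threshold  -- keep labels scoring strictly above it; defaults to the mean score
    max_labels -- optional cap on how many labels are kept (most similar first)
    """
    if not similarities:
        return []
    if threshold is None:
        threshold = sum(similarities.values()) / len(similarities)
    kept = [(label, score) for label, score in similarities.items() if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if max_labels is not None:
        kept = kept[:max_labels]
    return [label for label, _ in kept]


# Hypothetical scores for one commodity.
scores = {"screen printer": 0.95, "screen-printing machine": 0.92, "lathe": 0.0}
strict = select_labels(scores, threshold=0.9)                 # both printing labels
capped = select_labels(scores, threshold=0.9, max_labels=1)   # only the best label
```

With the mean as threshold, a commodity whose scores are all equal receives no label under the strict comparison; using `>=` instead would be an equally defensible reading.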
Those of ordinary skill in the art should understand that the foregoing describes only specific embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (4)
1. A commodity labeling method suitable for Chinese-language e-commerce websites, characterized in that it comprises a method for building a word segmentation lexicon, a method for collecting labels, and a method for marking commodities with labels;
The method for building the word segmentation lexicon is based on frequency statistics of each commodity keyword across different commodity descriptions on the website: commodity keywords with a frequency greater than 3 are retained, and of these, keywords of 5 characters or fewer are selected as lexicon data; when a long commodity keyword is made up of several shorter keywords, the long keyword is not added to the lexicon;
A commodity keyword is a word that a merchant adds freely through the website's back-end system; it is the merchant's description of a key feature of the commodity;
In particular, because commodity keywords on a Chinese-language e-commerce website are usually added by the sellers, selecting the short, concise, high-frequency words among them for the lexicon ensures segmentation accuracy as far as possible;
The label collection method segments all commodity names on the website with a reverse maximum matching algorithm, based on the lexicon already built. After segmentation, a feature of Chinese grammar is applied, namely that in an "adjective + noun" expression the noun comes last, and the last word produced by segmenting each commodity name is taken as that commodity's label; finally, all of these labels form the label data set;
A commodity name is a short piece of text, added by the merchant, that describes the commodity;
The method for marking commodities with labels uses a text mining algorithm to find the relation between commodity attributes and labels; the text mining algorithm presupposes that both commodity attributes and labels have representative content that reflects the relation between them and can serve as the basis for judgment. Commodity attributes show a commodity's features from many angles; if labels also have features of their own, the similarity relation between commodity attributes and labels can be determined by comparing the two sets of features.
2. The method according to claim 1, characterized in that the method for marking commodities with labels comprises the following steps:
Step 1: Obtaining label features
The feature information belonging to each label is determined on the basis of the label set. If a label appears in the name of a commodity, the label and that commodity are assumed to be related.
Following this idea, the commodity names that contain a given label word are first selected, the feature information of those commodities is then looked up by commodity name, and all of that feature information is tallied as the feature information of the label; in particular, commodity feature information comes from commodity attribute values;
Step 2: Judging the similarity relation between commodities and labels
Step 3: Determining the labels of a commodity
Because the correlation between commodities and labels varies in strength, a correlation coefficient alone is not enough to assign a label to a commodity directly. A reasonable threshold must be set, and the labels whose correlation coefficient with the commodity (the similarity between the two space vectors) is above the threshold are selected as the commodity's labels; the threshold lies between 0 and 1. The threshold may be set strictly or loosely according to data quality requirements: the stricter the commodity search should be, the closer the threshold is to 1. Alternatively, the mean of all correlation coefficients may be used as the threshold;
In particular, to choose a commodity's labels more accurately, the number of labels per commodity may optionally be limited, and the most relevant labels within that limit are selected as the commodity's labels.
3. The method according to claim 2, characterized in that, in step 2, based on all the label features of a given label, the weight of each label feature is analyzed and the representativeness of each label feature among the features of all labels is assessed, specifically comprising:
Step 2-1: Analyze the distribution of each label feature over the label set: if a label feature is concentrated in a single label, it is considered strongly representative by default; if a label feature is distributed across multiple labels, it is considered weakly representative;
Step 2-2: With reference to the TF*IDF weight calculation method, label features with strong representativeness are weighted up, the weight being the frequency with which the label feature occurs in the label multiplied by the initial weight; label features with weak representativeness are weighted down, the weight being the initial weight divided by the frequency with which the label feature occurs in different labels. The weight Boost_p of a label feature in a label can be computed with reference to the following formula:
Where count(p, t) is the number of times label feature p occurs in label t, size(t) is the number of label features that label t contains, N is the total number of labels in the label set, and tags(p, t) is the number of labels that contain label feature p.
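The formula itself is not reproduced in this text. One form that would be consistent with the four symbol definitions and with standard TF*IDF weighting is Boost_p = (count(p, t) / size(t)) * log(N / tags(p, t)); this is an assumption, not the patent's confirmed formula. The sketch below implements that assumed form. The function name `boost`, the data layout, and the reading of size(t) as the number of distinct features of the label are likewise hypothetical.

```python
import math

def boost(p, t, label_features):
    """Assumed TF*IDF-style weight of label feature `p` within label `t`.

    label_features: dict mapping each label to a dict of feature -> count.
    """
    feats = label_features[t]
    if not feats:
        return 0.0
    # Term-frequency part: count(p, t) / size(t), with size(t) taken as the
    # number of distinct features of the label (an assumption).
    tf = feats.get(p, 0) / len(feats)
    # tags(p, t): number of labels whose feature set contains p.
    containing = sum(1 for f in label_features.values() if p in f)
    if containing == 0:
        return 0.0
    # Inverse-frequency part: log(N / tags(p, t)).
    idf = math.log(len(label_features) / containing)
    return tf * idf

# Hypothetical label feature data.
label_features = {
    "cup": {"500ml": 2, "steel": 1},
    "bottle": {"500ml": 1},
    "keyboard": {"USB": 1},
}
```

Under this assumed form, "steel" (found only in "cup") receives a higher weight in "cup" than "500ml" (shared with "bottle"), even though "500ml" occurs more often, which matches the representativeness idea of step 2-1.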
Step 2-3: Abstract the feature information set of the label and the feature information set of the commodity each into a multidimensional space vector, and, using the cosine similarity of space vectors, calculate the similarity between the two vectors to judge the correlation between the commodity and the label;
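The cosine similarity between a label vector and a commodity vector can be computed as in the following sketch. Representing each vector as a sparse dict of feature -> weight is an assumption made for illustration, as are the function name and the sample vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as feature -> weight dicts."""
    # Only features present in both vectors contribute to the dot product.
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty or all-zero vector has no direction; treat as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical vectors: weighted label features and a commodity's features.
label_vec = {"steel": 0.55, "500ml": 0.41}
commodity_vec = {"steel": 1.0, "500ml": 1.0}
unrelated_vec = {"USB": 1.0}
```

For non-negative weights the result lies between 0 and 1, which is why it can be compared directly against a threshold in that range when deciding whether a label applies to a commodity.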
4. The method according to claim 3, characterized in that, in the process of judging the relationship between labels and commodities, the commodity names containing a given label are first filtered out, the feature information data of those commodities is then found according to their names, and all of that commodity feature information data is collected as the feature information data of the label; the commodity feature information data comes from the information attribute values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510828440.0A CN105320778B (en) | 2015-11-25 | 2015-11-25 | A commodity labeling method suitable for Chinese e-commerce websites |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105320778A (en) | 2016-02-10 |
CN105320778B (en) | 2019-04-02 |
Family ID: 55248164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510828440.0A Active CN105320778B (en) | A commodity labeling method suitable for Chinese e-commerce websites | 2015-11-25 | 2015-11-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105320778B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727500A (en) * | 2010-01-15 | 2010-06-09 | 清华大学 | Text classification method of Chinese web page based on steam clustering |
US20140067815A1 (en) * | 2012-09-05 | 2014-03-06 | Alibaba Group Holding Limited | Labeling Product Identifiers and Navigating Products |
CN103605815A (en) * | 2013-12-11 | 2014-02-26 | 焦点科技股份有限公司 | Automatic commodity information classifying and recommending method applicable to B2B (Business to Business) e-commerce platform |
CN105069086A (en) * | 2015-07-31 | 2015-11-18 | 焦点科技股份有限公司 | Method and system for optimizing electronic commerce commodity searching |
Also Published As
Publication number | Publication date |
---|---|
CN105320778B (en) | 2019-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||