CN107203548A - Attribute acquisition method and device

Info

Publication number
CN107203548A
Authority
CN
China
Prior art keywords
attribute
target
word
candidate
destination object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610154037.9A
Other languages
Chinese (zh)
Inventor
陈强
吴夙慧
郭立超
李传福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610154037.9A
Priority to TW106104935A (TW201734901A)
Priority to PCT/CN2017/075829 (WO2017157198A1)
Publication of CN107203548A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an attribute acquisition method and device. From unstructured text on an original platform that describes a target object, target words matching the preset attributes of a target platform are extracted, and the attributes of the target object on the target platform are then determined from those target words. On an e-commerce platform, the attributes of a commodity can thus be extracted from unstructured text such as the commodity's title and detail description on the original platform. This solves the prior-art technical problem that unstructured text cannot be processed to obtain the attributes, on the target platform, of a commodity from the original platform.

Description

Attribute acquisition method and device
Technical field
The present invention relates to information technology, and in particular to an attribute acquisition method and device.
Background
On an e-commerce platform, a commodity library can be maintained for published commodities. According to the category of each commodity in the library, attribute items that describe the commodity, such as brand, material, color, style and price range, are determined, which facilitates statistics and filtering by users. When an original platform such as Yintai Commerce needs to access a target platform such as Taobao and publish commodities there, the attributes used to describe commodities on the original platform, including both attribute items and attribute values, often differ from those on the target platform. For example, on the Yintai Commerce platform, commodities under the dress category are described by brand, color, material and time to market, whereas on the Taobao platform they are described by brand, color classification, style and price range. Therefore, before a commodity is published on the Taobao platform, the attribute value of each attribute item used to describe the Yintai commodity on the Taobao platform must be determined; that is, the attributes of the commodity on the target platform must be obtained.
In the prior art, the attributes of a commodity on the original platform can be clustered according to the attributes of the target platform, so as to obtain the attributes of the commodity on the target platform. However, this approach can only process the attributes already assigned to the commodity on the original platform; it cannot process unstructured text such as the commodity's title or detail description on the original platform.
Summary of the invention
The present invention provides an attribute acquisition method and device for obtaining the attributes of a commodity by processing unstructured text, such as the commodity's title or detail description on the original platform.
To achieve the above purpose, embodiments of the invention adopt the following technical solutions:
In a first aspect, an attribute acquisition method is provided, including:
extracting, from unstructured text describing a target object, target words that match preset attributes;
determining the attributes of the target object according to the target words.
In a second aspect, an attribute acquisition device is provided, including:
an extraction module, configured to extract, from unstructured text describing a target object, target words that match preset attributes;
a determining module, configured to determine the attributes of the target object according to the target words.
In the attribute acquisition method and device provided by the embodiments of the present invention, target words matching the preset attributes of a target platform are extracted from unstructured text on an original platform that describes a target object, and the attributes of the target object on the target platform are then determined from the target words. On an e-commerce platform, the attributes of a commodity can thus be extracted from unstructured text such as the commodity's title and detail description. This solves the prior-art technical problem that unstructured text cannot be processed to obtain the attributes, on the target platform, of a commodity from the original platform.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 is a schematic flowchart of the attribute acquisition method provided by embodiment one;
Fig. 2 is a schematic diagram of an application scenario of the attribute acquisition method;
Fig. 3 is a schematic flowchart of the attribute acquisition method provided by embodiment two of the present invention;
Fig. 4 is a schematic structural diagram of the attribute acquisition device provided by embodiment three of the present invention;
Fig. 5 is a schematic structural diagram of the attribute acquisition device provided by embodiment four of the present invention.
Detailed description
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
The attribute acquisition method and device provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment one
Fig. 1 is a schematic flowchart of the attribute acquisition method provided by embodiment one. The method of this embodiment can be used on an e-commerce platform, in which case the object referred to in this embodiment is a commodity: before a commodity on the original platform is delivered to the target platform, the method obtains the attributes of that commodity on the target platform. As shown in Fig. 1, the method includes:
Step 101: from the unstructured text describing the target object, extract target words that match preset attributes.
Here, a preset attribute includes a preset attribute item and a preset attribute value. For a given preset attribute item, the corresponding preset attribute values may consist of one or more vocabulary items. Optionally, after the correspondence between preset attribute items and preset attribute values has been set, a correspondence to multiple preset attribute sub-values can also be set for each preset attribute value, where a preset attribute sub-value is semantically similar to its preset attribute value.
For example, for the preset attribute item of clothing style, vocabulary describing different clothing styles can be set as preset attribute values. Further, for each clothing-style term, multiple semantically similar terms can be set as preset attribute sub-values. Specifically, "ethnic" can be set as a preset attribute value, with terms that describe particular ethnic groups, such as Miao, Han and Tibetan, as its sub-values; likewise, "college" can be set as a preset attribute value, with terms that describe the college style in more detail, such as campus, artsy and fresh-and-simple, as its sub-values.
Note that matching here covers not only exact matches but also partial matches.
Specifically, each word in the unstructured text is matched against the vocabulary corresponding to the preset attributes. If at least one matching vocabulary item exists, the word is considered to match the preset attribute and is determined to be a target word. Before matching, unstructured text such as the title and detail description of the target object on the original platform is obtained and preprocessed. Preprocessing mainly includes word segmentation, full-width to half-width conversion, case unification, text normalization, accurate recognition of brand words, and handling of single characters. The preset attributes of the category to which the target object belongs are then queried on the target platform. Using a similarity algorithm, string matching is performed between the unstructured text and the preset attributes to obtain the matching words as target words, together with the matching degree between each target word and the preset attributes.
String matching finds vocabulary in the unstructured text that is similar to the preset attributes. The similarity algorithms that can be used here include edit distance, cosine similarity, Euclidean distance, Jaccard similarity, a bigram (2-gram) language model, longest common subsequence, and longest common substring.
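The preprocessing and string-matching step can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens, uses Unicode NFKC normalization for the full-width to half-width conversion, and picks Jaccard similarity over character bigrams as just one of the similarity measures listed above. The function names and the `min_score` cut-off are hypothetical and do not come from the patent.

```python
import unicodedata


def normalize(text):
    """Convert full-width characters to half-width (NFKC) and unify case."""
    return unicodedata.normalize("NFKC", text).lower()


def bigrams(s):
    """Set of character bigrams; falls back to the string itself if too short."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a, b):
    """Jaccard similarity between the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def match_tokens(tokens, preset_values, min_score=0.5):
    """Return (token, best preset value, matching degree) for matching tokens.

    min_score is an illustrative cut-off, not a value given in the source.
    """
    hits = []
    for tok in map(normalize, tokens):
        best = max(preset_values, key=lambda v: jaccard(tok, normalize(v)))
        score = jaccard(tok, normalize(best))
        if score >= min_score:
            hits.append((tok, best, score))
    return hits


# Tokens from a hypothetical commodity title, matched against preset values.
print(match_tokens(["ＣＯＴＴＯＮ", "dress", "2016"], ["Cotton", "Silk"]))
```

Because partial matches are allowed, a token such as "cottons" would still score well against "cotton" under this measure; swapping in edit distance or longest common subsequence only requires replacing `jaccard`.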
This step is not limited to the string matching described above; target words can also be extracted from the unstructured text in other ways, such as semantic matching.
Note that the category mentioned above is the category to which an object belongs. Its granularity can be set by the user: commodities can be divided coarsely into clothing, shoes and hats, and electronic products, or subdivided further, for example clothing into shirts, dresses and trousers. The finer the category granularity, the more accurate the obtained attributes, but the more preset attributes need to be maintained. The granularity can be chosen with reference to how much the preset attributes of two different categories differ: categories should be divided so that the preset attributes of any two categories differ to some degree, so that a preset attribute set of moderate size can be maintained while the accuracy of the obtained attributes is ensured.
Step 102: determine the attributes of the target object according to the target words.
As one possible implementation, the attributes of the target object are determined from the target words according to the matching degree between the target words and the preset attributes.
The target words can be matched against the preset attribute values and/or preset attribute sub-values of the preset attributes, so that the attributes of the target object are determined from the target words according to their matching degree with the preset attributes. Specifically, two similarity thresholds are set in advance, a first threshold and a second threshold, where the first threshold is greater than the second. A target word whose matching degree is above the first threshold is determined to be an attribute of the target object on the target platform. A target word whose matching degree is above the second threshold but below the first is treated as a candidate attribute; semantic discrimination is used to decide whether the candidate attribute is an attribute on the target platform, and the attributes of the target object on the target platform are determined from the candidate attributes according to the discrimination result.
In general, the matching degree takes a value between 0 and 1. Comparing the matching degree obtained in the previous step with the first and second thresholds gives three cases:
In the first case, a target word whose matching degree is greater than the first threshold is considered very likely to be an attribute of the target object.
In the second case, a target word whose matching degree is less than the first threshold but greater than the second is considered possibly an attribute of the target object. Such target words are treated as candidate attributes and need further judgment; in this embodiment, semantic discrimination is used for that purpose.
In the third case, a target word whose matching degree is less than the second threshold is considered very unlikely to be an attribute of the target object and is discarded directly.
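The three cases above amount to a simple triage over scored target words, which the following minimal sketch illustrates. The threshold values and the function name `triage` are assumptions for illustration; the source does not give concrete thresholds or say how a score exactly equal to a threshold is handled, so such scores fall into the lower bucket here.

```python
def triage(scored_words, first_threshold=0.8, second_threshold=0.5):
    """Split (word, matching degree) pairs into attributes and candidates.

    Words above first_threshold are accepted as attributes, words between
    the two thresholds become candidate attributes, and the rest are dropped.
    The default thresholds are illustrative only.
    """
    if not 0 < second_threshold < first_threshold < 1:
        raise ValueError("expected 0 < second_threshold < first_threshold < 1")
    attributes, candidates = [], []
    for word, score in scored_words:
        if score > first_threshold:
            attributes.append(word)
        elif score > second_threshold:
            candidates.append(word)
        # Anything else is considered very unlikely to be an attribute.
    return attributes, candidates


attributes, candidates = triage([("cotton", 0.95), ("cottony", 0.6), ("2016", 0.1)])
print(attributes, candidates)
```

Candidates returned by this sketch would then go through the semantic discrimination described in the text rather than being accepted directly.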
As can be seen, target words matching the preset attributes of the target platform are extracted from unstructured text on the original platform that describes the target object, and the attributes of the target object on the target platform are then determined from the target words according to their matching degree with the preset attributes. This scheme makes it possible to extract the attributes of a commodity from unstructured text such as the commodity's title and detail description, and therefore solves the prior-art technical problem that unstructured text cannot be processed to obtain the attributes, on the target platform, of a commodity from the original platform.
As another possible implementation, the attributes of the target object can be obtained by analyzing the semantics of the target words. For example, a target word extracted from the text of a commodity's detail page might be "Miao traditional clothing". Analyzing its semantics shows that "Miao traditional clothing" describes an ethnic style, so ethnic style can be taken as an attribute of the commodity. The semantic analysis here can be based on several kinds of semantic relation, such as similar semantics and generalizing semantics. Similar semantics means that the attribute and the target word have similar meanings; generalizing semantics means that the attribute and the target word stand in a hypernym-hyponym relation.
Because preset attribute values and preset attribute sub-values are semantically related, the preset attribute sub-value matched by a target word can be used to look up the corresponding preset attribute value. That preset attribute value is taken as the attribute value of the commodity, and the preset attribute item corresponding to that value is taken as the attribute item of the commodity.
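The lookup from a matched sub-value back to its attribute value and attribute item can be sketched with a plain dictionary index. The `PRESET` table below reuses the clothing-style example from the text, but its layout and the helper names are hypothetical.

```python
# Hypothetical preset attributes: attribute item -> attribute value -> sub-values.
PRESET = {
    "style": {
        "ethnic": ["Miao", "Han", "Tibetan"],
        "college": ["campus", "artsy"],
    }
}


def build_index(preset):
    """Map every attribute value and sub-value to its (item, value) pair."""
    index = {}
    for item, values in preset.items():
        for value, subvalues in values.items():
            index[value] = (item, value)
            for sub in subvalues:
                index[sub] = (item, value)
    return index


def resolve(target_word, index):
    """Return (attribute item, attribute value) for a target word, or None."""
    return index.get(target_word)


index = build_index(PRESET)
print(resolve("Miao", index))
```

This sketch uses exact lookup only; in practice the target word would first be matched to a sub-value by the similarity step, and the matched sub-value would be the key.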
Note that in practice other ways of analyzing the semantics of the target words can also be used to obtain the attributes of the target object, for example a data-mining classifier trained on the semantics of vocabulary.
With the attribute acquisition method above, the attributes of a commodity on the target platform can be obtained from the commodity's description page on the original platform. Fig. 2 is a schematic diagram of an application scenario of the method. As shown in Fig. 2, the left part is a commodity page on the original platform, containing the commodity title and commodity details. Target words are extracted from the title and details, and the commodity attribute list shown on the right is obtained from the extracted target words; this list can be used for filtering commodities. Each commodity attribute comprises an attribute item and an attribute value: the first column lists the attribute items and the second column the attribute values.
Embodiment two
This embodiment is specific to an e-commerce scenario and describes in detail how, when an original platform accesses a target platform, the attributes on the target platform of a commodity from the original platform are obtained. Fig. 3 is a schematic flowchart of the attribute acquisition method provided by embodiment two of the present invention. As shown in Fig. 3, the method includes:
Step 201: based on the unstructured text that describes the target commodity on the original platform, predict the category to which the target commodity belongs on the target platform.
Specifically, a classification model can first be built in advance; for example, the classification model can be a Naive Bayes classifier. Keywords that users search for and the click data that follows each search are collected. From the categories of the commodities clicked after each search, the category corresponding to each keyword is determined, yielding a keyword-to-category correspondence. The keywords are then segmented into terms, and the terms replace the keywords in the keyword-to-category correspondence, yielding a term-to-category correspondence. This term-to-category correspondence is used as the training set to train the classification model, which completes the construction of the model.
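To make the training procedure concrete, here is a minimal sketch of a multinomial Naive Bayes classifier trained on term-to-category pairs. It is a toy, dependency-free illustration with add-one smoothing, not the patent's implementation; the class name, the smoothing choice and the training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategoryModel:
    """Toy multinomial Naive Bayes over segmented search terms."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # category -> term -> count
        self.category_counts = Counter()         # category -> training examples
        self.vocabulary = set()

    def fit(self, pairs):
        """pairs: iterable of (terms, category) from the term-category mapping."""
        for terms, category in pairs:
            self.category_counts[category] += 1
            self.term_counts[category].update(terms)
            self.vocabulary.update(terms)
        return self

    def predict(self, terms):
        """Return the category with the highest log posterior for the terms."""
        total = sum(self.category_counts.values())

        def log_posterior(category):
            counts = self.term_counts[category]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.category_counts[category] / total)
            for term in terms:
                # Add-one smoothing so unseen terms do not zero out the score.
                score += math.log((counts[term] + 1) / denom)
            return score

        return max(self.category_counts, key=log_posterior)


# Hypothetical training set derived from search keywords and clicked categories.
training = [
    (["long", "dress"], "dress"),
    (["floral", "dress"], "dress"),
    (["running", "shoes"], "shoes"),
]
model = NaiveBayesCategoryModel().fit(training)
print(model.predict(["floral", "summer", "dress"]))
```

The same `predict` call would later be applied to the terms retained from a commodity title on the original platform.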
Then, based on the unstructured text of the target object, data mining is performed with the trained classification model to obtain the category to which the target object belongs on the target platform. The unstructured text can be the title and/or the detail-page description.
For example, when a third-party platform such as Yintai, acting as the original platform, needs to access the target platform Taobao, the title of the target commodity on the third-party platform can be segmented into terms, and the terms can then be part-of-speech tagged to obtain the part-of-speech information of each term. A word-discarding algorithm then drops terms according to this part-of-speech information, so that noise words in the title are discarded and only product words, modifiers, brand words, season and time words, promotion words and the like are retained. The retained terms are input into the trained classification model to obtain the category of the target commodity on the Taobao platform.
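The word-discarding step can be pictured as a filter over part-of-speech-tagged terms. The sketch below assumes segmentation and tagging have already been done by some upstream tool; the tag names in `KEPT_TAGS` are hypothetical labels chosen to mirror the word types the text says are retained.

```python
# Hypothetical tag set mirroring the retained word types named in the text.
KEPT_TAGS = {"product", "modifier", "brand", "season", "promotion"}


def discard_noise(tagged_terms, kept=KEPT_TAGS):
    """Keep only terms whose part-of-speech tag is in the retained set."""
    return [term for term, tag in tagged_terms if tag in kept]


# A hypothetical tagged title: (term, tag) pairs.
title = [
    ("BrandX", "brand"),
    ("new", "noise"),
    ("summer", "season"),
    ("floral", "modifier"),
    ("dress", "product"),
    ("!!", "noise"),
]
print(discard_noise(title))
```

The retained terms would then be passed to the trained classification model to predict the category.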
Because categories are often divided differently on different platforms, prediction is used to obtain the accurate category of the target commodity on the target platform. Target words can then be matched against the preset attributes of that category, which increases the likelihood that the obtained target words are attributes of the target commodity.
Step 202: extract, from the unstructured text, target words that match the preset attributes of the predicted category.
Specifically, similarity is computed on the preprocessed unstructured text to obtain the target words that match the preset attributes, together with their matching degrees. For ease of description, the matching degree is denoted sim1. The matching degree describes how similar a target word is to a preset attribute.
A preset attribute has two parts, an attribute item and an attribute value. If a target word is similar to an attribute value of a preset attribute, the target word is said to match that preset attribute, and the target word can be combined with the attribute item of the matched attribute to form an attribute pair, denoted PV.
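As a small illustration of forming PV pairs, the sketch below combines each target word with the attribute item of the preset value it matched and carries sim1 along. The `PV` type, the `value_to_item` mapping and the sample data are hypothetical names and values introduced only for this example.

```python
from typing import NamedTuple


class PV(NamedTuple):
    """An attribute pair: attribute item plus the target word as its value."""
    attribute_item: str
    attribute_value: str
    sim1: float


def to_pv_pairs(matches, value_to_item):
    """Build PV pairs from (target_word, matched_preset_value, sim1) triples.

    value_to_item maps each preset attribute value to its attribute item.
    """
    return [
        PV(value_to_item[matched_value], word, sim1)
        for word, matched_value, sim1 in matches
    ]


pairs = to_pv_pairs([("cottons", "cotton", 0.83)], {"cotton": "material"})
print(pairs)
```

Keeping sim1 inside the pair makes it straightforward to compare each pair against the thresholds in the next step.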
Step 203: according to the matching degree of the target words, determine from the target words the attributes and the candidate attributes of the target object on the target platform.
For example, a target word whose similarity sim1 is greater than a preset threshold a is taken as an attribute of the target object on the target platform; a target word whose similarity is less than a but greater than a preset threshold b is taken as a candidate attribute, where 0 < b < a < 1.
Step 204: match the target words determined to be attributes against the commodities of the target platform stored in a database, and extract the attributes of the matched candidate commodities.
Specifically, the database includes a product library and a commodity library. Compared with the commodity library, the product library lacks the merchant field; the remaining data can be identical. That is, each record in the product library corresponds to one product, while each record in the commodity library corresponds to one product offered by one merchant.
First, the product library is queried to obtain the candidate commodities in the product library that match all of the target words determined to be attributes.
Then, the commodity library is queried to obtain the candidate commodities in the commodity library that match all of the target words determined to be attributes.
The attributes of all candidate commodities obtained by the two queries are taken as attributes of the target commodity, and the confidence of each attribute is then calculated.
Step 205: calculate the confidence of each attribute of the candidate commodities.
Here, confidence indicates how accurately an attribute describes the target commodity on the target platform.
If the target words determined to be attributes include the brand and model, and the candidate commodity is unique, the confidence of every attribute of that candidate commodity can be set directly to 100%; substituting into the confidence formula below gives the same result. The confidence formula is:
Confidence = (number of occurrences of the attribute among the attributes of the candidate commodities / total number of candidate commodities) × 100%
For example:
The attribute pairs formed by the target words are P1V1 and P2V2.
If three matching candidate commodities exist in the commodity library, with the following PV pairs:
P1V1, P2V2, P3V3, P6V6
P1V1, P2V2, P7V7
P1V1, P2V2, P8V8
then P1V1, P2V2, P3V3, P7V7 and P8V8 are output as attributes of the target commodity.
According to the confidence formula, the confidences of P1V1, P2V2, P3V3, P7V7 and P8V8 are 100%, 100%, 33.3%, 33.3% and 33.3%, respectively.
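The worked example can be reproduced with a few lines of code. This is a minimal sketch of the confidence formula, counting each attribute at most once per candidate commodity; the function name is an assumption. Note that applying the formula to the three candidates exactly as listed also yields a 33.3% confidence for P6V6, which the output list in the text does not mention.

```python
from collections import Counter


def attribute_confidence(candidates):
    """Confidence (%) of each PV pair: share of candidates that contain it."""
    total = len(candidates)
    counts = Counter(pv for attrs in candidates for pv in set(attrs))
    return {pv: 100.0 * n / total for pv, n in counts.items()}


# The three candidate commodities from the example in the text.
candidates = [
    ["P1V1", "P2V2", "P3V3", "P6V6"],
    ["P1V1", "P2V2", "P7V7"],
    ["P1V1", "P2V2", "P8V8"],
]
confidence = attribute_confidence(candidates)
for pv in sorted(confidence):
    print(pv, round(confidence[pv], 1))
```

With a single candidate commodity, every attribute it carries gets 100%, which matches the shortcut the text describes for a unique brand-and-model match.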
Step 206: for the target words determined to be candidate attributes, use semantic discrimination to determine the confidence that each candidate attribute is an attribute on the target platform.
First, semantic discrimination is performed based on the relationship between characters. Each preset attribute value of the target platform is split into individual characters in advance to serve as training text, and a model is trained with the word2vec algorithm. The target word determined to be a candidate attribute is input into the trained discrimination model to obtain character vectors; the character vectors are summed to obtain a word vector, and the cosine value of the word vector is used as the confidence sim2 that the candidate attribute is an attribute on the target platform.
Second, semantic discrimination is performed based on the context of the target word in the unstructured text. The titles or detail pages of the commodities on the target platform are used as the corpus in advance and segmented into words; the segmentation result serves as training text, and a model is trained with the word2vec algorithm. The target word determined to be a candidate attribute is input into the trained discrimination model to obtain a word vector, and the cosine value of the word vector is used as the confidence sim3 that the candidate attribute is an attribute on the target platform.
Finally, the confidence S that the candidate attribute is an attribute on the target platform is determined from the similarities sim2 and sim3 obtained by the two kinds of semantic discrimination, for example as a weighted sum or weighted average of sim2 and sim3.
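The vector arithmetic behind sim2, sim3 and S can be sketched without a trained model. The code below uses hand-written toy vectors in place of word2vec output. The text does not state what the word vector is compared against when taking the cosine; comparing it with the vector of a preset attribute value is an assumption made here for illustration, as are the function names and the equal weighting.

```python
import math


def add_vectors(vectors):
    """Sum character vectors component-wise to obtain a word vector."""
    return [sum(component) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_confidence(sim2, sim3, weight=0.5):
    """Weighted average of the two semantic confidences (weight is illustrative)."""
    return weight * sim2 + (1 - weight) * sim3


# Toy character vectors standing in for the output of a character-level model.
char_vectors = [[1.0, 0.0], [0.0, 1.0]]
word_vector = add_vectors(char_vectors)

# Assumption: compare against the vector of a preset attribute value.
preset_value_vector = [1.0, 0.0]
sim2 = cosine(word_vector, preset_value_vector)
sim3 = 0.6  # placeholder for the context-based similarity
print(combined_confidence(sim2, sim3))
```

In a real pipeline the toy vectors would come from models trained on the character-split attribute values and on the segmented titles or detail pages, respectively.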
As one possible implementation, after the confidence S has been calculated, the frequency with which each candidate attribute occurs among the attributes of the candidate commodities from the preceding steps can be counted and used to correct the calculated confidence, giving a corrected confidence S.
Step 207: pool the target words determined to be attributes, the candidate attributes, and the attributes of the candidate commodities, and determine the attributes of the target commodity from the pooled result according to confidence.
The confidence threshold can be set according to the accuracy required of the obtained attributes. If higher accuracy is required, the confidence threshold can be raised accordingly; if lower accuracy is acceptable, a lower confidence threshold can be set. The target words in the pooled result whose confidence exceeds the confidence threshold are selected as the attributes of the target commodity.
Embodiment three
Fig. 4 is a schematic structural diagram of the attribute acquisition device provided by embodiment three of the present invention. As shown in Fig. 4, the device includes an extraction module 31 and a determining module 32.
The extraction module 31 is configured to extract, from unstructured text describing a target object, target words that match preset attributes.
Specifically, the extraction module 31 is configured to perform string matching between the unstructured text and the preset attributes using a similarity algorithm, obtaining the matching target words and their corresponding matching degrees.
The determining module 32 is configured to determine the attributes of the target object according to the target words.
Specifically, the determining module 32 is configured to determine the attributes of the target object from the target words according to the matching degree between the target words and the preset attributes.
Alternatively, the determining module 32 is configured to obtain the attributes of the target object by analyzing the semantics of the target words.
In this embodiment, target words matching the preset attributes of the target platform are extracted from unstructured text on the original platform that describes the target object, and the attributes of the target object on the target platform are then determined from the target words. This scheme makes it possible to extract the attributes of a commodity from unstructured text such as the commodity's title and detail description, and therefore solves the prior-art technical problem that unstructured text cannot be processed to obtain the attributes, on the target platform, of a commodity from the original platform.
Example IV
Fig. 5 is the structural representation for the attribute acquisition device that the embodiment of the present invention four is provided, and on the basis of the attribute acquisition device that Fig. 4 is provided, determining module 32 further comprises:First determining unit 321 and the second determining unit 322.
First determining unit 321, the target word for being higher than first threshold for matching degree, is defined as attribute of the destination object in target platform.
Second determining unit 322, for higher than Second Threshold but being used as candidate attribute less than the target word of the first threshold for matching degree, use semantic discriminant approach whether to determine the candidate attribute for the attribute in the target platform, attribute of the destination object in target platform is determined from the candidate attribute according to differentiation result.
Further, the second determining unit 322, can include:First differentiates that subelement 3221 and second differentiates at least one in subelement 3222.As a kind of signal of possible implementation, the second determining unit 322 includes the first differentiation subelement 3221 and the second differentiation subelement 3222 in Fig. 4.
Wherein, first differentiates subelement 3221, for based on the relation in the candidate attribute between word and word, carrying out semantic differentiation, obtains confidence level of the candidate attribute for the attribute in the target platform.
Specifically, first differentiates subelement 3221 specifically for by semantic discrimination model between the word of each character input training in advance in the candidate attribute, obtaining word vector;Semantic discrimination model, is that each character in the attribute of the target platform is trained into acquisition as training text between the word;Word vector is added up, the first term vector is obtained;It regard the cosine value of first term vector as confidence level of the candidate attribute for the attribute in the target platform.
The second discrimination subunit 3222 is configured to perform semantic discrimination based on the context of the candidate attribute in the unstructured text, so as to obtain a confidence that the candidate attribute is an attribute in the target platform.
Specifically, the second discrimination subunit 3222 is configured to input each word in the unstructured text into a pre-trained inter-word semantic discrimination model to obtain a second word vector, where the inter-word semantic discrimination model is trained using each word in the unstructured text of the target platform as training text; and to take the cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
Further, the second determining unit 322 may also include an attribute determination subunit 3223.
The attribute determination subunit 3223 is configured to determine, according to the confidence, the attribute of the target object in the target platform from the candidate attributes.
Further, the determining module 32 also includes a matching unit 323.
The matching unit 323 is configured to match the target words whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in a database, so as to obtain the matched candidate objects; to calculate, according to the frequency with which the attribute of each candidate object occurs among the attributes of all candidate objects, the probability that the attribute of a candidate object is an attribute of the target object in the target platform; and to determine, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
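The frequency-based probability computed by the matching unit can be sketched as a relative-frequency count over the attributes of all candidate objects. The function names, the representation of an attribute as a (name, value) pair, and the `min_probability` cut-off are assumptions made for illustration; the text only says that the attribute is determined according to the calculated probability.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def attribute_probabilities(candidate_objects: Iterable[Iterable[Hashable]]) -> Dict[Hashable, float]:
    """Relative frequency of each attribute among the attributes of all candidate objects."""
    counts = Counter(attr for attrs in candidate_objects for attr in attrs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {attr: n / total for attr, n in counts.items()}


def select_attributes(probabilities: Dict[Hashable, float], min_probability: float = 0.5) -> List[Hashable]:
    """Keep attributes whose probability reaches the cut-off (cut-off value is an assumption)."""
    return [attr for attr, p in probabilities.items() if p >= min_probability]
```

For instance, if three candidate objects carry the attributes color=red and material=cotton, color=red, and color=blue respectively, color=red accounts for two of the four attribute occurrences and gets probability 0.5.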
Further, the attribute acquisition device provided by this embodiment also includes a category prediction module 33 and a preset attribute determining module 34.
The category prediction module 33 is configured to predict, according to the unstructured text, the category to which the target object belongs in the target platform.
The preset attribute determining module 34 is configured to take the attributes under that category in the target platform as the preset attributes.
The category prediction module 33 includes a mining unit 331 and a modeling unit 332.
The mining unit 331 is configured to perform data mining on the unstructured text of the target object using a trained classification model, so as to obtain the category to which the target object belongs in the target platform.
The modeling unit 332 is configured to obtain the keywords searched by users and the categories of the objects selected from the search results; to perform word segmentation on the keywords to obtain search terms; to generate a training set from the search terms and the categories of the selected objects; and to train the classification model using the training set.
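The training flow of the modeling unit (search keyword, word segmentation, training set, model training) can be sketched as below. The text does not name a classification algorithm, so a small multinomial naive Bayes model with add-one smoothing is swapped in purely for illustration. Whitespace splitting stands in for word segmentation, which for Chinese text would require a real segmenter. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Optional, Tuple


def build_training_set(search_log: List[Tuple[str, str]]) -> List[Tuple[List[str], str]]:
    """Turn (search keyword, category of the selected object) pairs into (terms, category) samples."""
    # Whitespace splitting is a placeholder for word segmentation (assumption).
    return [(keyword.lower().split(), category) for keyword, category in search_log]


class NaiveBayesCategoryModel:
    """Multinomial naive Bayes with add-one smoothing (illustrative choice, not from the source)."""

    def __init__(self) -> None:
        self.category_counts: Counter = Counter()
        self.term_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, training_set: List[Tuple[List[str], str]]) -> "NaiveBayesCategoryModel":
        for terms, category in training_set:
            self.category_counts[category] += 1
            self.term_counts[category].update(terms)
            self.vocabulary.update(terms)
        return self

    def predict(self, terms: List[str]) -> Optional[str]:
        total_samples = sum(self.category_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, sample_count in self.category_counts.items():
            # Log prior plus smoothed log likelihood of each term.
            score = math.log(sample_count / total_samples)
            term_total = sum(self.term_counts[category].values())
            for term in terms:
                score += math.log((self.term_counts[category][term] + 1) / (term_total + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

Once trained on the search log, the same model could be applied to the segmented unstructured text of a target object to predict its category, which is the role the mining unit plays.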
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (26)

1. An attribute acquisition method, characterized by comprising:
extracting, from unstructured text describing a target object, a target word that matches a preset attribute; and
determining an attribute of the target object according to the target word.
2. The attribute acquisition method according to claim 1, characterized in that extracting, from the unstructured text describing the target object, the target word that matches the preset attribute comprises:
performing string matching between the unstructured text and the preset attribute using a similarity algorithm, to obtain the matched target word and the corresponding matching degree.
3. The attribute acquisition method according to claim 1, characterized in that determining the attribute of the target object according to the target word comprises:
determining the attribute of the target object from among the target words according to the matching degree between the target word and the preset attribute.
4. The attribute acquisition method according to claim 1, characterized in that determining the attribute of the target object according to the target word comprises:
performing analysis based on the semantics of the target word to obtain the attribute of the target object.
5. The attribute acquisition method according to claim 3, characterized in that determining the attribute of the target object from among the target words according to the matching degree between the target word and the preset attribute comprises:
determining a target word whose matching degree is higher than a first threshold as an attribute of the target object in a target platform; and
taking a target word whose matching degree is higher than a second threshold but lower than the first threshold as a candidate attribute, determining by semantic discrimination whether the candidate attribute is an attribute in the target platform, and determining the attribute of the target object in the target platform from the candidate attributes according to the discrimination result.
6. The attribute acquisition method according to claim 5, characterized in that determining by semantic discrimination whether the candidate attribute is an attribute in the target platform comprises:
performing semantic discrimination based on the relationship between the characters in the candidate attribute, to obtain a confidence that the candidate attribute is an attribute in the target platform;
and/or performing semantic discrimination based on the context of the candidate attribute in the unstructured text, to obtain a confidence that the candidate attribute is an attribute in the target platform.
7. The attribute acquisition method according to claim 6, characterized in that performing semantic discrimination based on the relationship between the characters in the candidate attribute comprises:
inputting each character of the candidate attribute into a pre-trained inter-character semantic discrimination model to obtain character vectors, the inter-character semantic discrimination model being trained using each character in the attributes of the target platform as training text;
summing the character vectors to obtain a first word vector; and
taking the cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.
8. The attribute acquisition method according to claim 6, characterized in that performing semantic discrimination based on the context of the candidate attribute in the unstructured text comprises:
inputting each word in the unstructured text into a pre-trained inter-word semantic discrimination model to obtain a second word vector, the inter-word semantic discrimination model being trained using each word in the unstructured text of the target platform as training text; and
taking the cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
9. The attribute acquisition method according to claim 6, characterized in that determining the attribute of the target object in the target platform from the candidate attributes according to the discrimination result comprises:
determining, according to the confidence, the attribute of the target object in the target platform from the candidate attributes.
10. The attribute acquisition method according to claim 5, characterized in that, after determining a target word whose matching degree is higher than the first threshold as an attribute of the target object in the target platform, the method further comprises:
matching the target word whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in a database, to obtain the matched candidate objects;
calculating, according to the frequency with which the attribute of each candidate object occurs among the attributes of all candidate objects, the probability that the attribute of a candidate object is an attribute of the target object in the target platform; and
determining, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
11. The attribute acquisition method according to any one of claims 1-10, characterized in that, before extracting, from the unstructured text describing the target object, the target word that matches the preset attribute, the method further comprises:
predicting, according to the unstructured text, the category to which the target object belongs in the target platform; and
taking the attributes under that category in the target platform as the preset attribute.
12. The attribute acquisition method according to claim 11, characterized in that predicting, according to the unstructured text, the category to which the target object belongs in the target platform comprises:
performing data mining on the unstructured text of the target object using a trained classification model, to obtain the category to which the target object belongs in the target platform.
13. The attribute acquisition method according to claim 12, characterized in that, before performing data mining using the trained classification model, the method further comprises:
obtaining keywords searched by users and the categories of the objects selected from the search results;
performing word segmentation on the keywords to obtain search terms;
generating a training set from the search terms and the categories of the selected objects; and
training the classification model using the training set.
14. An attribute acquisition device, characterized by comprising:
an extraction module, configured to extract, from unstructured text describing a target object, a target word that matches a preset attribute; and
a determining module, configured to determine an attribute of the target object according to the target word.
15. The attribute acquisition device according to claim 14, characterized in that
the extraction module is specifically configured to perform string matching between the unstructured text and the preset attribute using a similarity algorithm, to obtain the matched target word and the corresponding matching degree.
16. The attribute acquisition device according to claim 14, characterized in that
the determining module is specifically configured to determine the attribute of the target object from among the target words according to the matching degree between the target word and the preset attribute.
17. The attribute acquisition device according to claim 14, characterized in that
the determining module is specifically configured to perform analysis based on the semantics of the target word to obtain the attribute of the target object.
18. The attribute acquisition device according to claim 16, characterized in that the determining module comprises:
a first determining unit, configured to determine a target word whose matching degree is higher than a first threshold as an attribute of the target object in a target platform; and
a second determining unit, configured to take a target word whose matching degree is higher than a second threshold but lower than the first threshold as a candidate attribute, to determine by semantic discrimination whether the candidate attribute is an attribute in the target platform, and to determine the attribute of the target object in the target platform from the candidate attributes according to the discrimination result.
19. The attribute acquisition device according to claim 18, characterized in that the second determining unit comprises:
a first discrimination subunit, configured to perform semantic discrimination based on the relationship between the characters in the candidate attribute, to obtain a confidence that the candidate attribute is an attribute in the target platform;
and/or a second discrimination subunit, configured to perform semantic discrimination based on the context of the candidate attribute in the unstructured text, to obtain a confidence that the candidate attribute is an attribute in the target platform.
20. The attribute acquisition device according to claim 19, characterized in that
the first discrimination subunit is specifically configured to input each character of the candidate attribute into a pre-trained inter-character semantic discrimination model to obtain character vectors, the inter-character semantic discrimination model being trained using each character in the attributes of the target platform as training text; to sum the character vectors to obtain a first word vector; and to take the cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.
21. The attribute acquisition device according to claim 19, characterized in that
the second discrimination subunit is specifically configured to input each word in the unstructured text into a pre-trained inter-word semantic discrimination model to obtain a second word vector, the inter-word semantic discrimination model being trained using each word in the unstructured text of the target platform as training text; and to take the cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
22. The attribute acquisition device according to claim 19, characterized in that the second determining unit further comprises:
an attribute determination subunit, configured to determine, according to the confidence, the attribute of the target object in the target platform from the candidate attributes.
23. The attribute acquisition device according to claim 18, characterized in that the determining module further comprises:
a matching unit, configured to match the target word whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in a database, to obtain the matched candidate objects; to calculate, according to the frequency with which the attribute of each candidate object occurs among the attributes of all candidate objects, the probability that the attribute of a candidate object is an attribute of the target object in the target platform; and to determine, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
24. The attribute acquisition device according to any one of claims 14-23, characterized in that the device further comprises:
a category prediction module, configured to predict, according to the unstructured text, the category to which the target object belongs in the target platform; and
a preset attribute determining module, configured to take the attributes under that category in the target platform as the preset attribute.
25. The attribute acquisition device according to claim 24, characterized in that the category prediction module comprises:
a mining unit, configured to perform data mining on the unstructured text of the target object using a trained classification model, to obtain the category to which the target object belongs in the target platform.
26. The attribute acquisition device according to claim 25, characterized in that the category prediction module further comprises:
a modeling unit, configured to obtain keywords searched by users and the categories of the objects selected from the search results; to perform word segmentation on the keywords to obtain search terms; to generate a training set from the search terms and the categories of the selected objects; and to train the classification model using the training set.
CN201610154037.9A 2016-03-17 2016-03-17 Attribute acquisition methods and device Pending CN107203548A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610154037.9A CN107203548A (en) 2016-03-17 2016-03-17 Attribute acquisition methods and device
TW106104935A TW201734901A (en) 2016-03-17 2017-02-15 Attribute acquisition method and device
PCT/CN2017/075829 WO2017157198A1 (en) 2016-03-17 2017-03-07 Attribute acquisition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610154037.9A CN107203548A (en) 2016-03-17 2016-03-17 Attribute acquisition methods and device

Publications (1)

Publication Number Publication Date
CN107203548A true CN107203548A (en) 2017-09-26

Family

ID=59850988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610154037.9A Pending CN107203548A (en) 2016-03-17 2016-03-17 Attribute acquisition methods and device

Country Status (3)

Country Link
CN (1) CN107203548A (en)
TW (1) TW201734901A (en)
WO (1) WO2017157198A1 (en)

Also Published As

Publication number Publication date
WO2017157198A1 (en) 2017-09-21
TW201734901A (en) 2017-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination