CN107203548A - Attribute acquisition methods and device - Google Patents
- Publication number
- CN107203548A, CN201610154037.9A
- Authority
- CN
- China
- Prior art keywords
- attribute
- target
- word
- candidate
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides an attribute acquisition method and device. A target word that matches a preset attribute of a target platform is extracted from unstructured text that describes a target object on an original platform, and the attribute of the target object on the target platform is then determined according to the target word. For an e-commerce platform, commodity attributes can be extracted from unstructured text such as the commodity title and detail description on the original platform. This solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of commodities from the original platform.
Description
Technical field
The present invention relates to information technology, and more particularly to an attribute acquisition method and device.
Background technology
On an e-commerce platform, a commodity library can be maintained for published commodities. According to the category of each commodity in the library, the attribute items that describe the commodity, such as brand, material, color, style, and price range, are determined to facilitate statistics and user filtering. When an original platform such as Yintai needs to access a target platform such as Taobao, a difficulty arises when publishing commodities on the target platform: the attributes used to describe commodities on the original platform, including attribute items and attribute values, often differ from those of the target platform. For example, the Yintai platform describes commodities under the dress category by brand, color, material, and time to market, whereas the Taobao platform uses brand, color classification, style, and price range. Therefore, before a commodity is published on the Taobao platform, the attribute value of each attribute item used to describe the Yintai commodity on the Taobao platform must be determined; that is, the attributes of the commodity on the target platform must be obtained.
In the prior art, the attributes of commodities on the original platform can be clustered according to the attributes of the target platform to obtain the attributes of the commodities on the target platform. However, this approach can only process the attributes a commodity already has on the original platform; it cannot process unstructured text such as the title or detail description of the commodity on the original platform.
Summary of the invention
The present invention provides an attribute acquisition method and device for obtaining the attributes of a commodity by processing unstructured text, such as the title or detail description of the commodity on the original platform.
To achieve the above purpose, embodiments of the present invention adopt the following technical solutions:
In a first aspect, an attribute acquisition method is provided, including:
extracting, from unstructured text used to describe a target object, a target word that matches a preset attribute;
determining the attribute of the target object according to the target word.
In a second aspect, an attribute acquisition device is provided, including:
an extraction module, configured to extract, from unstructured text used to describe a target object, a target word that matches a preset attribute;
a determining module, configured to determine the attribute of the target object according to the target word.
In the attribute acquisition method and device provided by the embodiments of the present invention, a target word that matches a preset attribute of the target platform is extracted from unstructured text that describes the target object on the original platform, and the attribute of the target object on the target platform is then determined according to the target word. For an e-commerce platform, commodity attributes can be extracted from unstructured text such as the commodity title and detail description. This solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of commodities from the original platform.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 is a schematic flowchart of the attribute acquisition method provided by embodiment one;
Fig. 2 is a schematic diagram of an application scenario of the attribute acquisition method;
Fig. 3 is a schematic flowchart of the attribute acquisition method provided by embodiment two of the present invention;
Fig. 4 is a schematic structural diagram of the attribute acquisition device provided by embodiment three of the present invention;
Fig. 5 is a schematic structural diagram of the attribute acquisition device provided by embodiment four of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
The attribute acquisition method and device provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment one
Fig. 1 is a schematic flowchart of the attribute acquisition method provided by embodiment one. The method provided by this embodiment can be used on an e-commerce platform; that is, the object mentioned in this embodiment can be a commodity. The method can be used to obtain the attributes of a commodity on the target platform before the commodity on the original platform is delivered to the target platform. As shown in Fig. 1, the method includes:
Step 101: extract, from unstructured text used to describe the target object, a target word that matches a preset attribute.
The preset attribute includes a preset attribute item and a preset attribute value. For a given preset attribute item, the corresponding preset attribute values can consist of one or more words. Optionally, after the correspondence between preset attribute items and preset attribute values is set, a correspondence to multiple preset attribute sub-values can also be set for each preset attribute value, where a preset attribute sub-value has semantics similar to those of its preset attribute value.
For example, for the preset attribute item of clothing style, words that describe different clothing styles can be set as preset attribute values. Further, for each clothing-style word, multiple words with similar semantics can be set as preset attribute sub-values. Specifically, "ethnic" can be set as a preset attribute value, and words that describe specific ethnic groups, such as Miao, Han, and Tibetan, can be set as its preset attribute sub-values. As another example, "college" can be set as a preset attribute value, and words that describe specific college styles, such as campus, artistic, and fresh, can be set as its preset attribute sub-values.
It should be noted that matching here refers not only to exact matching but also to partial matching.
Specifically, each word in the unstructured text is matched against the words corresponding to the preset attribute. If at least one of those words matches, the word is considered to match the preset attribute and is determined to be a target word. Before matching, unstructured text such as the title and detail description of the target object on the original platform can be obtained and preprocessed. Preprocessing mainly includes word segmentation, full-width to half-width conversion, case unification, text normalization, exact recognition of brand words, and handling of single characters. The preset attributes under the category to which the target object belongs are then queried on the target platform. Using a similarity algorithm, string matching is performed between the unstructured text and the preset attributes to obtain the matching words as target words, together with the matching degree between each target word and the preset attribute. String matching finds words in the unstructured text that are similar to the preset attributes. The similarity algorithms used here can include edit distance, cosine similarity, Euclidean distance, Jaccard similarity distance (Jaccard is a similarity measure), the bigram (2-gram) language model, longest common subsequence, longest common contiguous substring, and so on.
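The preprocessing and string-matching step above can be illustrated with a minimal sketch. It covers only full-width to half-width conversion, case unification, and two of the listed similarity measures (edit distance and Jaccard similarity over character bigrams). The function names are hypothetical, and mapping edit distance to a score in [0, 1] by dividing by the longer string length is an assumption; the patent does not specify how the matching degree is normalized.

```python
import unicodedata


def normalize(text: str) -> str:
    """Convert full-width characters to half-width (via NFKC) and unify case."""
    return unicodedata.normalize("NFKC", text).lower().strip()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def bigrams(s: str) -> set:
    """Character 2-grams; strings shorter than 2 characters are kept whole."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the bigram intersection divided by the size of the union."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb)
```

Either score can serve as the matching degree between a word from the text and a preset attribute value; which measure works best is left open by the source.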
This step is not limited to the string matching described above; other approaches, such as semantic matching, can also be used to extract target words from the unstructured text.
It should be noted that the category mentioned above is the category to which an object belongs. The granularity of categories can be set by the user. For example, objects can be coarsely divided into clothing, shoes and hats, and electronic products, and can be subdivided further; clothing, for instance, can be divided into finer-grained categories such as shirts, dresses, and trousers. The finer the category granularity, the more accurate the obtained attributes, but the more preset attributes need to be maintained. The granularity can be chosen with reference to how much the preset attributes of two different categories differ: categories should be divided so that the preset attributes of any two categories differ to some extent, so that a preset attribute set of moderate size is maintained while the accuracy of the obtained attributes is ensured.
Step 102: determine the attribute of the target object according to the target word.
As one possible implementation, the attribute of the target object is determined from the target words according to the matching degree between each target word and the preset attribute.
The target word can be matched with the preset attribute values and/or preset attribute sub-values of the preset attribute, so that the attribute of the target object is determined from the target words according to the matching degree between each target word and the preset attribute. Specifically, two similarity thresholds are set in advance, a first threshold and a second threshold, where the first threshold is greater than the second threshold. A target word whose matching degree is higher than the first threshold is determined to be an attribute of the target object on the target platform. A target word whose matching degree is higher than the second threshold but lower than the first threshold is taken as a candidate attribute; semantic discrimination is used to determine whether the candidate attribute is an attribute on the target platform, and the attribute of the target object on the target platform is determined from the candidate attributes according to the discrimination result.
In general, the matching degree takes a value between 0 and 1. Comparing the matching degree obtained in the previous step with the first threshold and the second threshold gives three cases:
In the first case, a target word whose matching degree is greater than the first threshold is considered to have a high probability of being an attribute of the target object.
In the second case, a target word whose matching degree is less than the first threshold but greater than the second threshold is considered possibly an attribute of the target object. Such target words can be taken as candidate attributes and require further judgment; in this embodiment, semantic discrimination is used for that judgment.
In the third case, a target word whose matching degree is less than the second threshold is considered to have a very low probability of being an attribute of the target object and is discarded directly.
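The three cases can be sketched as a simple triage function. The threshold values 0.8 and 0.5 and the function name are illustrative assumptions, not values from the patent, and the source does not say how a matching degree exactly equal to a threshold is handled; this sketch uses strict comparisons.

```python
def triage(matches, first_threshold=0.8, second_threshold=0.5):
    """Split (target_word, matching_degree) pairs into confirmed and candidate lists.

    Words above the first threshold are confirmed attributes, words between
    the two thresholds are candidates for semantic discrimination, and the
    rest are discarded.
    """
    if not 0 < second_threshold < first_threshold < 1:
        raise ValueError("expected 0 < second_threshold < first_threshold < 1")
    confirmed, candidates = [], []
    for word, degree in matches:
        if degree > first_threshold:
            confirmed.append(word)
        elif degree > second_threshold:
            candidates.append(word)
    return confirmed, candidates


confirmed, candidates = triage([("cotton", 0.95), ("cottony", 0.6), ("blue", 0.2)])
```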
It can be seen that, in this scheme, a target word that matches a preset attribute of the target platform is extracted from the unstructured text that describes the target object on the original platform, and the attribute of the target object on the target platform is then determined from the target words according to the matching degree between each target word and the preset attribute. Commodity attributes can thus be extracted from unstructured text such as the commodity title and detail description, which solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of commodities from the original platform.
As another possible implementation, the attribute of the target object can be obtained by analyzing the semantics of the target word. For example, the target word extracted from the text describing a commodity on its detail page may be "Miao traditional clothing". Analyzing the semantics of the target word determines that "Miao traditional clothing" describes an ethnic style, so ethnic style can be taken as an attribute of the commodity. The semantic analysis here can be based on multiple semantic relations, such as similar semantics and generalizing semantics. Specifically, similar semantics means that the attribute and the target word may have similar meanings; generalizing semantics means that the attribute and the target word may be in a hypernym-hyponym relation.
Because the preset attribute values and preset attribute sub-values described above are semantically related, the preset attribute value corresponding to the preset attribute sub-value matched by the target word can be looked up. That preset attribute value is taken as the attribute value of the commodity, and the preset attribute item corresponding to that preset attribute value is taken as the attribute item of the commodity.
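The lookup from a matched sub-value back to its attribute value and attribute item can be sketched with a nested dictionary and a reverse index. The data layout and the names `PRESET`, `build_index`, and `resolve` are assumptions for illustration; the entries reuse the clothing-style example from embodiment one.

```python
# Hypothetical layout: attribute item -> attribute value -> list of sub-values.
PRESET = {
    "style": {
        "ethnic": ["miao", "han", "tibetan"],
        "college": ["campus", "artistic", "fresh"],
    }
}


def build_index(preset):
    """Map every attribute value and sub-value to its (item, value) pair."""
    index = {}
    for item, values in preset.items():
        for value, sub_values in values.items():
            index[value] = (item, value)
            for sub_value in sub_values:
                index[sub_value] = (item, value)
    return index


def resolve(target_word, index):
    """Return (attribute item, attribute value) for a target word, or None."""
    return index.get(target_word.lower())


INDEX = build_index(PRESET)
```

With this index, a target word such as "Miao" resolves to the attribute item "style" with the attribute value "ethnic", while an unknown word resolves to `None`.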
It should be noted that, in practice, other ways of analyzing the semantics of the target word can also be used to obtain the attribute of the target object, for example a data mining classifier trained on the semantics of words.
With the attribute acquisition method described above, the attributes of a commodity on the target platform can be obtained from the commodity's description page on the original platform. Fig. 2 is a schematic diagram of an application scenario of the attribute acquisition method. As shown in Fig. 2, the left side is a commodity page on the original platform, which contains the commodity title and commodity details. Target words are extracted from the commodity title and commodity details, and the commodity attribute list shown on the right is obtained from the extracted target words. The commodity attribute list can be used for commodity selection. The commodity attributes include attribute items and attribute values: the first column lists the attribute items of the commodity, and the second column lists the attribute values.
Embodiment two
This embodiment addresses the e-commerce application scenario and describes in detail how to obtain the attributes, on the target platform, of a commodity from the original platform when the original platform accesses the target platform. Fig. 3 is a schematic flowchart of the attribute acquisition method provided by embodiment two of the present invention. As shown in Fig. 3, the method includes:
Step 201: based on the unstructured text used to describe the target commodity on the original platform, predict the category to which the target commodity belongs on the target platform.
Specifically, a classification model can first be built in advance; for example, the classification model can be a naive Bayes classifier. Users' search keywords and the click data following each search are collected. According to the category of the commodity clicked after each search, the category corresponding to each keyword is determined, giving a correspondence between keywords and categories. The keywords are then segmented into terms, and the terms replace the keywords in the keyword-category correspondence, giving a correspondence between terms and categories. With the term-category correspondence as the training set, the classification model is trained, which completes the construction of the classification model.
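A minimal sketch of this training procedure follows, assuming a multinomial naive Bayes model with add-one smoothing and whitespace tokenization in place of a real word segmenter. The click log, the class name, and the smoothing choice are all illustrative assumptions; the patent only states that a naive Bayes classifier can be used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategoryModel:
    """Multinomial naive Bayes over terms, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.category_counts = Counter()
        self.term_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, samples):
        """samples: iterable of (list_of_terms, category) pairs."""
        for terms, category in samples:
            self.category_counts[category] += 1
            self.term_counts[category].update(terms)
            self.vocabulary.update(terms)
        return self

    def predict(self, terms):
        """Return the category with the highest log posterior."""
        total = sum(self.category_counts.values())
        best, best_score = None, -math.inf
        for category, count in self.category_counts.items():
            denom = sum(self.term_counts[category].values()) + len(self.vocabulary)
            score = math.log(count / total)
            for term in terms:
                score += math.log((self.term_counts[category][term] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical click log: (search keyword, category of the clicked commodity).
click_log = [
    ("red summer dress", "dress"),
    ("floral dress", "dress"),
    ("running shoes", "shoes"),
    ("leather shoes men", "shoes"),
]
# Segment each keyword into terms; whitespace splitting stands in for segmentation.
training_set = [(keyword.split(), category) for keyword, category in click_log]
model = NaiveBayesCategoryModel().fit(training_set)
```

Calling `model.predict("summer floral dress".split())` then yields a category prediction for a new title.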
Then, based on the unstructured text of the target object, data mining is performed with the trained classification model to obtain the category to which the target object belongs on the target platform. The unstructured text can be the title and/or the detail page description.
For example, when a third-party platform such as Yintai, acting as the original platform, needs to access the target platform Taobao, the title of the target commodity on the third-party platform can be segmented to obtain the terms of the title, and part-of-speech tagging is then performed on those terms to obtain the part-of-speech information of each term. Using a word-dropping algorithm, terms are dropped according to the part-of-speech information, so that noise words in the title of the target commodity are discarded and only product words, modifiers, brand words, season and time words, promotion words, and the like are retained. The retained terms are input into the trained classification model to obtain the category of the target commodity on the Taobao platform.
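The word-dropping step can be sketched as a filter over terms that have already been tagged. The tag names below are hypothetical labels chosen to mirror the retained word classes listed above; the patent does not name a tagger or a tag set.

```python
# Hypothetical tag set mirroring the word classes the text says are retained.
KEEP_TAGS = {"product", "modifier", "brand", "season_time", "promotion"}


def drop_words(tagged_terms, keep_tags=KEEP_TAGS):
    """Keep only terms whose tag is in keep_tags; everything else is noise."""
    return [term for term, tag in tagged_terms if tag in keep_tags]


tagged_title = [
    ("2016", "season_time"),
    ("hot", "noise"),
    ("nike", "brand"),
    ("!!", "noise"),
    ("dress", "product"),
]
retained = drop_words(tagged_title)
```

The retained terms would then be passed to the category model in place of the raw title.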
Because categories are often divided differently on different platforms, prediction can be used to obtain the exact category to which the target commodity belongs on the target platform. Target words can then be obtained by matching against the preset attributes of that category, which improves the probability that the obtained target words include attributes of the target commodity.
Step 202: extract from the unstructured text the target words that match the preset attributes under the predicted category.
Specifically, similarity calculation is performed on the preprocessed unstructured text to obtain the target words that match the preset attributes, along with their matching degrees. For ease of description, the matching degree is denoted sim1. The matching degree describes how similar a target word is to a preset attribute.
A preset attribute includes two parts, an attribute item and an attribute value. If a target word is similar to an attribute value in a preset attribute, the target word is said to match the preset attribute. The target word can be combined with the attribute item of the matched attribute to form an attribute pair, denoted PV.
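Forming PV pairs with a matching degree can be sketched as follows. The similarity used here is `difflib.SequenceMatcher.ratio()` from the Python standard library, which is a stand-in and not one of the algorithms the patent lists; the function name, the dictionary layout, and the 0.6 cutoff are likewise assumptions.

```python
from difflib import SequenceMatcher


def extract_pv_pairs(tokens, preset_attributes, min_degree=0.6):
    """For each token, keep the best-matching (item, value) above min_degree.

    preset_attributes maps an attribute item to its list of attribute values.
    """
    pairs = []
    for token in tokens:
        best = None
        for item, values in preset_attributes.items():
            for value in values:
                degree = SequenceMatcher(None, token, value).ratio()
                if degree >= min_degree and (best is None or degree > best[2]):
                    best = (item, value, degree)
        if best is not None:
            pairs.append({"target_word": token, "item": best[0],
                          "value": best[1], "sim1": best[2]})
    return pairs


preset = {"color": ["red", "blue"], "material": ["cotton", "silk"]}
pairs = extract_pv_pairs(["red", "cottons", "dress"], preset)
```

Each returned record carries the target word, the matched attribute item and value, and sim1, so later steps can compare sim1 against the thresholds.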
Step 203: according to the matching degree of each target word, determine from the target words the attributes and the candidate attributes of the target object on the target platform.
For example, a target word whose similarity sim1 is greater than a preset threshold a is taken as an attribute of the target object on the target platform; a target word whose similarity is less than the preset threshold a and greater than a preset threshold b is taken as a candidate attribute, where 0 < b < a < 1.
Step 204: for the target words determined to be attributes, match them against the commodities of the target platform stored in a database, and extract the attributes of the matched candidate commodities.
Specifically, the database includes a product library and a commodity library. Compared with the commodity library, the product library does not include the merchant field; the remaining data can be the same. That is, each record in the product library corresponds to one product, and each record in the commodity library corresponds to one product offered by one merchant.
First, the product library is queried to obtain the candidate commodities in the product library that match all of the target words determined to be attributes.
Then, the commodity library is queried to obtain the candidate commodities in the commodity library that match all of the target words determined to be attributes.
The attributes of all candidate commodities obtained from the two queries are taken as attributes of the target commodity, and the confidence of each attribute is then calculated.
Step 205: calculate the confidence of each attribute of the candidate commodities.
The confidence indicates how accurately the attribute describes the target commodity on the target platform.
If the target words determined to be attributes include a brand and a model, and the candidate commodity is unique, the confidence of each attribute of the candidate commodity can be set directly to 100%. The confidence formula below can also be applied, and the result is the same. The confidence formula is as follows:
Confidence = (number of occurrences of the attribute among the attributes of the candidate commodities / total number of candidate commodities) × 100%
For example:
The attribute pairs formed by the target words are: P1V1 and P2V2
If there are 3 matching candidate commodities in the commodity library, the PV pairs of the candidate commodities are, respectively:
P1V1, P2V2, P3V3, P6V6
P1V1, P2V2, P7V7
P1V1, P2V2, P8V8
P1V1, P2V2, P3V3, P7V7, and P8V8 are then output as attributes of the target commodity.
Then, according to the confidence formula, the confidences of P1V1, P2V2, P3V3, P7V7, and P8V8 are calculated as 100%, 100%, 33.3%, 33.3%, and 33.3%, respectively.
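The candidate query and the confidence formula can be sketched together. The library below is a plain list of attribute-pair lists standing in for the product and commodity libraries, with one extra non-matching record added for illustration; the function names are hypothetical. Confidence is returned as a fraction in [0, 1] rather than a percentage.

```python
from collections import Counter


def find_candidates(library, confirmed_pairs):
    """Return records whose attribute pairs include every confirmed pair."""
    required = set(confirmed_pairs)
    return [attrs for attrs in library if required <= set(attrs)]


def attribute_confidence(candidates):
    """Confidence = candidates containing the pair / total number of candidates."""
    counts = Counter(pv for attrs in candidates for pv in set(attrs))
    total = len(candidates)
    return {pv: counts[pv] / total for pv in counts}


library = [
    ["P1V1", "P2V2", "P3V3", "P6V6"],
    ["P1V1", "P2V2", "P7V7"],
    ["P1V1", "P2V2", "P8V8"],
    ["P1V1", "P9V9"],  # illustrative record that lacks P2V2, so it is not a candidate
]
candidates = find_candidates(library, ["P1V1", "P2V2"])
confidence = attribute_confidence(candidates)
```

Pairs shared by every candidate come out at 1.0, and a pair that appears in only one of the three candidates comes out at one third.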
Step 206: for the target words determined to be candidate attributes, use semantic discrimination to determine the confidence that each candidate attribute is an attribute on the target platform.
First, semantic discrimination is performed based on the relations between characters. Each preset attribute value on the target platform is split into characters in advance and used as training text, and a model is trained with the word2vec algorithm. The target word determined to be a candidate attribute is input into the trained discrimination model to obtain character vectors; the character vectors are summed to obtain a word vector, and the cosine value of the word vector is used as the confidence sim2 that the candidate attribute is an attribute on the target platform.
Second, semantic discrimination is performed based on the context of the target word in the unstructured text. The titles or detail pages of the commodities on the target platform are used as a corpus in advance and segmented into words, and the segmentation results are used as training text to train a model with the word2vec algorithm. The target word determined to be a candidate attribute is input into the trained discrimination model to obtain a word vector, and the cosine value of the word vector is used as the confidence sim3 that the candidate attribute is an attribute on the target platform.
Finally, the confidence S that the candidate attribute is an attribute on the target platform is determined from the similarities sim2 and sim3 obtained by the two semantic discrimination approaches, for example by computing a weighted sum or weighted average of sim2 and sim3.
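The vector arithmetic in this step can be sketched without a trained model. The character vectors below are toy values standing in for word2vec output, and the sketch assumes that the cosine is taken between the candidate's summed vector and the summed vector of a preset attribute value; the source does not state what the cosine is measured against. The equal weighting in `combine_confidence` is also an assumption.

```python
import math


def sum_vectors(vectors):
    """Element-wise sum of equal-length vectors; an empty input gives []."""
    return [sum(components) for components in zip(*vectors)]


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def word_vector(word, char_vectors):
    """Sum the vectors of the characters in word, skipping unknown characters."""
    return sum_vectors([char_vectors[c] for c in word if c in char_vectors])


def combine_confidence(sim2, sim3, weight=0.5):
    """Weighted sum of the two semantic scores; the weight is illustrative."""
    return weight * sim2 + (1 - weight) * sim3


# Toy character vectors standing in for a trained character-level model.
CHAR_VECTORS = {"r": [1.0, 0.0], "e": [0.0, 1.0], "d": [1.0, 1.0]}
sim2 = cosine(word_vector("red", CHAR_VECTORS), word_vector("der", CHAR_VECTORS))
```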
As one possible implementation, for the calculated confidence S, the frequency with which each candidate attribute appears among the attributes of the candidate commodities from the previous steps can be counted and used to correct the calculated confidence, giving a corrected confidence S.
Step 207: collect the target words determined to be attributes and candidate attributes, together with the attributes of the candidate commodities, and determine the attributes of the target commodity from the collected results according to confidence.
The confidence threshold can be determined according to the accuracy required of the obtained attributes. The higher the required accuracy, the higher the confidence threshold can be set; if the required accuracy is lower, a lower confidence threshold can be set. The target words in the collected results whose confidence exceeds the confidence threshold are selected as attributes of the target commodity.
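The collection and selection in step 207 can be sketched as a merge of confidence tables followed by a threshold filter. How an attribute that appears in more than one source should be scored is not specified in the source; keeping the highest confidence is an assumption made here, as are the function names and sample values.

```python
def summarize(*sources):
    """Merge attribute -> confidence tables, keeping the highest confidence."""
    merged = {}
    for source in sources:
        for attribute, confidence in source.items():
            merged[attribute] = max(confidence, merged.get(attribute, 0.0))
    return merged


def select_attributes(merged, confidence_threshold):
    """Keep attributes whose confidence exceeds the threshold."""
    return {a: c for a, c in merged.items() if c > confidence_threshold}


confirmed = {"P1V1": 1.0, "P2V2": 1.0}          # attributes from string matching
semantic = {"P4V4": 0.72, "P5V5": 0.41}         # candidate attributes scored by S
from_candidates = {"P1V1": 1.0, "P3V3": 1 / 3}  # attributes of candidate commodities
final = select_attributes(summarize(confirmed, semantic, from_candidates), 0.5)
```

Raising the threshold trades recall for accuracy, which matches the guidance in the text about choosing the threshold from the required accuracy.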
Embodiment three
Fig. 4 is a schematic structural diagram of the attribute acquisition device provided by embodiment three of the present invention. As shown in Fig. 4, the device includes an extraction module 31 and a determining module 32.
The extraction module 31 is configured to extract, from unstructured text used to describe the target object, a target word that matches a preset attribute.
Specifically, the extraction module 31 is configured to use a similarity algorithm to perform string matching between the unstructured text and the preset attribute, obtaining the matching target words and their corresponding matching degrees.
The determining module 32 is configured to determine the attribute of the target object according to the target word.
Specifically, the determining module 32 is configured to determine the attribute of the target object from the target words according to the matching degree between each target word and the preset attribute.
Alternatively, the determining module 32 is configured to analyze the semantics of the target word to obtain the attribute of the target object.
In this embodiment, a target word that matches a preset attribute of the target platform is extracted from the unstructured text that describes the target object on the original platform, and the attribute of the target object on the target platform is then determined according to the target word. Commodity attributes can thus be extracted from unstructured text such as the commodity title and detail description, which solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of commodities from the original platform.
Embodiment four
Fig. 5 is a schematic structural diagram of the attribute acquisition device provided by embodiment four of the present invention. On the basis of the attribute acquisition device shown in Fig. 4, the determining module 32 further includes a first determining unit 321 and a second determining unit 322.
The first determining unit 321 is configured to determine a target word whose matching degree is higher than the first threshold to be an attribute of the target object on the target platform.
The second determining unit 322 is configured to take a target word whose matching degree is higher than the second threshold but lower than the first threshold as a candidate attribute, use semantic discrimination to determine whether the candidate attribute is an attribute on the target platform, and determine the attribute of the target object on the target platform from the candidate attributes according to the discrimination result.
Further, the second determining unit 322 can include at least one of a first discrimination subunit 3221 and a second discrimination subunit 3222. As an illustration of one possible implementation, the second determining unit 322 in Fig. 5 includes both the first discrimination subunit 3221 and the second discrimination subunit 3222.
Wherein, first differentiates subelement 3221, for based on the relation in the candidate attribute between word and word, carrying out semantic differentiation, obtains confidence level of the candidate attribute for the attribute in the target platform.
Specifically, first differentiates subelement 3221 specifically for by semantic discrimination model between the word of each character input training in advance in the candidate attribute, obtaining word vector;Semantic discrimination model, is that each character in the attribute of the target platform is trained into acquisition as training text between the word;Word vector is added up, the first term vector is obtained;It regard the cosine value of first term vector as confidence level of the candidate attribute for the attribute in the target platform.
Second differentiates subelement 3222, for the context relation based on the candidate attribute in the non-structured text, carries out semantic differentiation, obtains confidence level of the candidate attribute for the attribute in the target platform.
Specifically, second differentiates subelement 3222, semantic discrimination model, obtains the second term vector between the word specifically for each word in the non-structured text to be inputted to training in advance;Semantic discrimination model between institute's predicate, is that as training text each word in non-structured text in the target platform is trained into acquisition;It regard the cosine value of second term vector as confidence level of the candidate attribute for the attribute in the target platform.
Further, the second determining unit 322 may also include an attribute determination subunit 3223.
The attribute determination subunit 3223 is configured to determine, according to the confidence, the attribute of the target object in the target platform from the candidate attributes.
Further, the determining module 32 also includes a matching unit 323.
The matching unit 323 is configured to match the target words whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in a database, to obtain the matched candidate objects; to calculate, according to the frequency with which the attributes of each candidate object occur among the attributes of all candidate objects, the probability that an attribute of a candidate object is an attribute of the target object in the target platform; and to determine, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
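By way of illustration only, the frequency-based probability used by the matching unit might be sketched as follows. Representing each candidate object as a list of attribute strings, the function names, and the cutoff `min_probability` are assumptions of this sketch; the text above does not fix how the final selection is made from the probabilities.

```python
from collections import Counter
from typing import Dict, List


def attribute_probabilities(candidate_objects: List[List[str]]) -> Dict[str, float]:
    """Relative frequency of each attribute among the attributes of all candidates."""
    counts = Counter(attr for attrs in candidate_objects for attr in attrs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {attr: n / total for attr, n in counts.items()}


def select_attributes(
    probabilities: Dict[str, float],
    min_probability: float = 0.3,  # hypothetical cutoff
) -> List[str]:
    """Keep the attributes whose probability reaches the cutoff."""
    return [attr for attr, p in probabilities.items() if p >= min_probability]
```

An attribute shared by most matched candidate objects thus receives a high probability of also belonging to the target object.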
Further, the attribute acquisition device provided by this embodiment also includes a category prediction module 33 and a preset attribute determining module 34.
The category prediction module 33 is configured to predict, according to the unstructured text, the category to which the target object belongs in the target platform.
The preset attribute determining module 34 is configured to take the attributes under that category in the target platform as the preset attributes.
The category prediction module 33 includes a mining unit 331 and a modeling unit 332.
The mining unit 331 is configured to perform data mining on the unstructured text of the target object using a trained classification model, to obtain the category to which the target object belongs in the target platform.
The modeling unit 332 is configured to obtain user search keywords and the categories of the objects selected from the search results; to perform word segmentation on the keywords to obtain search terms; to generate a training set from the search terms and the categories of the selected objects; and to train the classification model using the training set.
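By way of illustration only, the training flow of the modeling unit might be sketched as follows. The text above does not name a word segmentation tool or a type of classification model, so this sketch uses whitespace splitting as a stand-in for word segmentation and a simple per-category term count as a stand-in for the classification model. All function names and the sample log entries are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_training_set(
    click_log: List[Tuple[str, str]],  # (search keyword, category of selected object)
) -> List[Tuple[List[str], str]]:
    """Pair the search terms of each query with the category the user selected."""
    return [(segment(keyword), category) for keyword, category in click_log]


def train(training_set: List[Tuple[List[str], str]]) -> Dict[str, Counter]:
    """Count how often each search term co-occurs with each category."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for terms, category in training_set:
        model[category].update(terms)
    return dict(model)


def predict(model: Dict[str, Counter], text: str) -> str:
    """Return the category whose term counts best cover the text (model must be non-empty)."""
    terms = segment(text)
    scores = {cat: sum(counts[t] for t in terms) for cat, counts in model.items()}
    return max(scores, key=scores.get)


# Hypothetical search log entries used only to exercise the sketch.
log = [
    ("red cotton dress", "Dresses"),
    ("summer dress", "Dresses"),
    ("running shoes", "Shoes"),
]
model = train(build_training_set(log))
```

With this toy log, calling `predict(model, "cotton dress for summer")` selects the category whose search terms overlap most with the title text.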
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (26)
1. An attribute acquisition method, characterized by comprising:
extracting, from unstructured text describing a target object, a target word that matches a preset attribute;
determining an attribute of the target object according to the target word.
2. The attribute acquisition method according to claim 1, characterized in that extracting, from the unstructured text describing the target object, the target word that matches the preset attribute comprises:
performing string matching between the unstructured text and the preset attribute using a similarity algorithm, to obtain the matched target word and its corresponding matching degree.
3. The attribute acquisition method according to claim 1, characterized in that determining the attribute of the target object according to the target word comprises:
determining the attribute of the target object from the target word according to the matching degree between the target word and the preset attribute.
4. The attribute acquisition method according to claim 1, characterized in that determining the attribute of the target object according to the target word comprises:
performing analysis based on the semantics of the target word to obtain the attribute of the target object.
5. The attribute acquisition method according to claim 3, characterized in that determining the attribute of the target object from the target word according to the matching degree between the target word and the preset attribute comprises:
determining a target word whose matching degree is higher than a first threshold as an attribute of the target object in a target platform;
taking a target word whose matching degree is higher than a second threshold but lower than the first threshold as a candidate attribute, using a semantic discrimination approach to determine whether the candidate attribute is an attribute in the target platform, and determining the attribute of the target object in the target platform from the candidate attribute according to the discrimination result.
6. The attribute acquisition method according to claim 5, characterized in that using the semantic discrimination approach to determine whether the candidate attribute is an attribute in the target platform comprises:
performing semantic discrimination based on the relations between the characters of the candidate attribute, to obtain a confidence that the candidate attribute is an attribute in the target platform;
and/or performing semantic discrimination based on the context of the candidate attribute in the unstructured text, to obtain a confidence that the candidate attribute is an attribute in the target platform.
7. The attribute acquisition method according to claim 6, characterized in that performing semantic discrimination based on the relations between the characters of the candidate attribute comprises:
inputting each character of the candidate attribute into a pre-trained inter-character semantic discrimination model to obtain character vectors, the inter-character semantic discrimination model being obtained by training with each character in the attributes of the target platform as training text;
summing the character vectors to obtain a first word vector;
using the cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.
8. The attribute acquisition method according to claim 6, characterized in that performing semantic discrimination based on the context of the candidate attribute in the unstructured text comprises:
inputting each word of the unstructured text into a pre-trained inter-word semantic discrimination model to obtain a second word vector, the inter-word semantic discrimination model being obtained by training with each word of the unstructured text in the target platform as training text;
using the cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
9. The attribute acquisition method according to claim 6, characterized in that determining the attribute of the target object in the target platform from the candidate attribute according to the discrimination result comprises:
determining, according to the confidence, the attribute of the target object in the target platform from the candidate attribute.
10. The attribute acquisition method according to claim 5, characterized in that, after determining the target word whose matching degree is higher than the first threshold as an attribute of the target object in the target platform, the method further comprises:
matching the target word whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in a database, to obtain the matched candidate objects;
calculating, according to the frequency with which the attributes of each candidate object occur among the attributes of all candidate objects, the probability that an attribute of a candidate object is an attribute of the target object in the target platform;
determining, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
11. The attribute acquisition method according to any one of claims 1-10, characterized in that, before extracting, from the unstructured text describing the target object, the target word that matches the preset attribute, the method further comprises:
predicting, according to the unstructured text, the category to which the target object belongs in the target platform;
taking the attributes under that category in the target platform as the preset attribute.
12. The attribute acquisition method according to claim 11, characterized in that predicting, according to the unstructured text, the category to which the target object belongs in the target platform comprises:
performing data mining on the unstructured text of the target object using a trained classification model, to obtain the category to which the target object belongs in the target platform.
13. The attribute acquisition method according to claim 12, characterized in that, before performing data mining using the trained classification model, the method further comprises:
obtaining user search keywords and the categories of the objects selected from the search results;
performing word segmentation on the keywords to obtain search terms;
generating a training set from the search terms and the categories of the selected objects;
training the classification model using the training set.
14. An attribute acquisition device, characterized by comprising:
an extraction module, configured to extract, from unstructured text describing a target object, a target word that matches a preset attribute;
a determining module, configured to determine an attribute of the target object according to the target word.
15. The attribute acquisition device according to claim 14, characterized in that
the extraction module is specifically configured to perform string matching between the unstructured text and the preset attribute using a similarity algorithm, to obtain the matched target word and its corresponding matching degree.
16. The attribute acquisition device according to claim 14, characterized in that
the determining module is specifically configured to determine the attribute of the target object from the target word according to the matching degree between the target word and the preset attribute.
17. The attribute acquisition device according to claim 14, characterized in that
the determining module is specifically configured to perform analysis based on the semantics of the target word to obtain the attribute of the target object.
18. The attribute acquisition device according to claim 16, characterized in that the determining module comprises:
a first determining unit, configured to determine a target word whose matching degree is higher than a first threshold as an attribute of the target object in a target platform;
a second determining unit, configured to take a target word whose matching degree is higher than a second threshold but lower than the first threshold as a candidate attribute, to use a semantic discrimination approach to determine whether the candidate attribute is an attribute in the target platform, and to determine the attribute of the target object in the target platform from the candidate attribute according to the discrimination result.
19. The attribute acquisition device according to claim 18, characterized in that the second determining unit comprises:
a first discrimination subunit, configured to perform semantic discrimination based on the relations between the characters of the candidate attribute, to obtain a confidence that the candidate attribute is an attribute in the target platform;
and/or a second discrimination subunit, configured to perform semantic discrimination based on the context of the candidate attribute in the unstructured text, to obtain a confidence that the candidate attribute is an attribute in the target platform.
20. The attribute acquisition device according to claim 19, characterized in that
the first discrimination subunit is specifically configured to input each character of the candidate attribute into a pre-trained inter-character semantic discrimination model to obtain character vectors, the inter-character semantic discrimination model being obtained by training with each character in the attributes of the target platform as training text; to sum the character vectors to obtain a first word vector; and to use the cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.
21. The attribute acquisition device according to claim 19, characterized in that
the second discrimination subunit is specifically configured to input each word of the unstructured text into a pre-trained inter-word semantic discrimination model to obtain a second word vector, the inter-word semantic discrimination model being obtained by training with each word of the unstructured text in the target platform as training text; and to use the cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
22. The attribute acquisition device according to claim 19, characterized in that the second determining unit further comprises:
an attribute determination subunit, configured to determine, according to the confidence, the attribute of the target object in the target platform from the candidate attribute.
23. The attribute acquisition device according to claim 18, characterized in that the determining module further comprises:
a matching unit, configured to match the target word whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in a database, to obtain the matched candidate objects; to calculate, according to the frequency with which the attributes of each candidate object occur among the attributes of all candidate objects, the probability that an attribute of a candidate object is an attribute of the target object in the target platform; and to determine, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
24. The attribute acquisition device according to any one of claims 14-23, characterized in that the device further comprises:
a category prediction module, configured to predict, according to the unstructured text, the category to which the target object belongs in the target platform;
a preset attribute determining module, configured to take the attributes under that category in the target platform as the preset attribute.
25. The attribute acquisition device according to claim 24, characterized in that the category prediction module comprises:
a mining unit, configured to perform data mining on the unstructured text of the target object using a trained classification model, to obtain the category to which the target object belongs in the target platform.
26. The attribute acquisition device according to claim 25, characterized in that the category prediction module further comprises:
a modeling unit, configured to obtain user search keywords and the categories of the objects selected from the search results; to perform word segmentation on the keywords to obtain search terms; to generate a training set from the search terms and the categories of the selected objects; and to train the classification model using the training set.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610154037.9A CN107203548A (en) | 2016-03-17 | 2016-03-17 | Attribute acquisition methods and device |
TW106104935A TW201734901A (en) | 2016-03-17 | 2017-02-15 | Attribute acquisition method and device |
PCT/CN2017/075829 WO2017157198A1 (en) | 2016-03-17 | 2017-03-07 | Attribute acquisition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610154037.9A CN107203548A (en) | 2016-03-17 | 2016-03-17 | Attribute acquisition methods and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107203548A (en) | 2017-09-26 |
Family
ID=59850988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610154037.9A Pending CN107203548A (en) | 2016-03-17 | 2016-03-17 | Attribute acquisition methods and device |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN107203548A (en) |
TW (1) | TW201734901A (en) |
WO (1) | WO2017157198A1 (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197180A (en) * | 2017-12-25 | 2018-06-22 | 中山大学 | A kind of method of the editable image of clothing retrieval of clothes attribute |
CN109101595A (en) * | 2018-07-27 | 2018-12-28 | 郑州云海信息技术有限公司 | A kind of information query method, device, equipment and computer readable storage medium |
CN109711951A (en) * | 2019-01-18 | 2019-05-03 | 中合金网(北京)电子商务有限公司 | Commodity automation collection and moving method |
CN110175322A (en) * | 2019-05-22 | 2019-08-27 | 北京神州泰岳软件股份有限公司 | A kind of structural method and device of document |
CN110223095A (en) * | 2018-03-02 | 2019-09-10 | 阿里巴巴集团控股有限公司 | Determine the method, apparatus, equipment and storage medium of item property |
CN110334185A (en) * | 2019-07-05 | 2019-10-15 | 政采云有限公司 | The treating method and apparatus of data in a kind of platform |
CN110807095A (en) * | 2018-08-01 | 2020-02-18 | 北京京东尚科信息技术有限公司 | Article matching method and device |
CN111797622A (en) * | 2019-06-20 | 2020-10-20 | 北京沃东天骏信息技术有限公司 | Method and apparatus for generating attribute information |
CN111860575A (en) * | 2020-06-05 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | Method and device for processing article attribute information, electronic equipment and storage medium |
CN112800978A (en) * | 2021-01-29 | 2021-05-14 | 北京金山云网络技术有限公司 | Attribute recognition method, and training method and device for part attribute extraction network |
CN113256379A (en) * | 2021-05-24 | 2021-08-13 | 北京小米移动软件有限公司 | Method for correlating shopping demands for commodities |
CN113609279A (en) * | 2021-08-05 | 2021-11-05 | 湖南特能博世科技有限公司 | Material model extraction method and device and computer equipment |
CN113724055A (en) * | 2021-09-14 | 2021-11-30 | 京东科技信息技术有限公司 | Commodity attribute mining method and device |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807083A (en) * | 2018-08-02 | 2020-02-18 | 北京京东尚科信息技术有限公司 | Keyword evaluation method and device |
CN110874408B (en) * | 2018-08-29 | 2023-05-26 | 阿里巴巴集团控股有限公司 | Model training method, text recognition device and computing equipment |
CN110955822B (en) * | 2018-09-25 | 2024-02-06 | 北京京东尚科信息技术有限公司 | Commodity searching method and device |
CN111444334B (en) * | 2019-01-16 | 2023-04-25 | 阿里巴巴集团控股有限公司 | Data processing method, text recognition device and computer equipment |
CN111444335B (en) * | 2019-01-17 | 2023-04-07 | 阿里巴巴集团控股有限公司 | Method and device for extracting central word |
CN110263123B (en) * | 2019-06-05 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Method and device for predicting organization name abbreviation and computer equipment |
CN110827063A (en) * | 2019-10-18 | 2020-02-21 | 用友网络科技股份有限公司 | Multi-strategy fused commodity recommendation method, device, terminal and storage medium |
US20210304275A1 (en) * | 2020-03-31 | 2021-09-30 | Coupang Corp. | Computer-implemented systems and methods for electronicaly determining a real-time product registration |
CN112183035B (en) * | 2020-11-06 | 2023-11-21 | 上海恒生聚源数据服务有限公司 | Text labeling method, device, equipment and readable storage medium |
CN112507702B (en) * | 2020-12-03 | 2023-08-22 | 北京百度网讯科技有限公司 | Text information extraction method and device, electronic equipment and storage medium |
CN113627509B (en) * | 2021-08-04 | 2024-05-10 | 口碑(上海)信息技术有限公司 | Data classification method, device, computer equipment and computer readable storage medium |
CN113722496B (en) * | 2021-11-02 | 2022-03-08 | 北京世纪好未来教育科技有限公司 | Triple extraction method and device, readable storage medium and electronic equipment |
CN114201973B (en) * | 2022-02-15 | 2022-06-07 | 深圳博士创新技术转移有限公司 | Resource pool object data mining method and system based on artificial intelligence |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073729A (en) * | 2011-01-14 | 2011-05-25 | 百度在线网络技术(北京)有限公司 | Relationship knowledge sharing platform and implementation method thereof |
CN103324761A (en) * | 2013-07-11 | 2013-09-25 | 广州市尊网商通资讯科技有限公司 | Product database forming method based on Internet data and system |
CN103473317A (en) * | 2013-09-12 | 2013-12-25 | 百度在线网络技术(北京)有限公司 | Method and equipment for extracting keywords |
CN103605815A (en) * | 2013-12-11 | 2014-02-26 | 焦点科技股份有限公司 | Automatic commodity information classifying and recommending method applicable to B2B (Business to Business) e-commerce platform |
CN104850554A (en) * | 2014-02-14 | 2015-08-19 | 北京搜狗科技发展有限公司 | Searching method and system |
CN105005917A (en) * | 2015-07-07 | 2015-10-28 | 上海晶赞科技发展有限公司 | Universal method for correlating single items of different e-commerce websites |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5257172B2 (en) * | 2009-03-16 | 2013-08-07 | 富士通株式会社 | SEARCH METHOD, SEARCH PROGRAM, AND SEARCH DEVICE |
CN102375823B (en) * | 2010-08-13 | 2014-11-05 | 腾讯科技(深圳)有限公司 | Searching result gathering display method and system |
CN103309886B (en) * | 2012-03-13 | 2017-05-10 | 阿里巴巴集团控股有限公司 | Trading-platform-based structural information searching method and device |
CN104504138A (en) * | 2014-12-31 | 2015-04-08 | 广州索答信息科技有限公司 | Human-based information fusion method and device |
- 2016-03-17: CN application CN201610154037.9A, published as CN107203548A (status: active, pending)
- 2017-02-15: TW application TW106104935A, published as TW201734901A (status: unknown)
- 2017-03-07: PCT application PCT/CN2017/075829, published as WO2017157198A1 (status: active, application filing)
Non-Patent Citations (2)
Title |
---|
严灿勋: "《英汉军事语料句子对齐研究》", 30 June 2015, 国防工业出版社 * |
曾道建等: "面向非结构化文本的开放式实体属性抽取", 《江西师范大学学报》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197180A (en) * | 2017-12-25 | 2018-06-22 | 中山大学 | A kind of method of the editable image of clothing retrieval of clothes attribute |
CN110223095A (en) * | 2018-03-02 | 2019-09-10 | 阿里巴巴集团控股有限公司 | Determine the method, apparatus, equipment and storage medium of item property |
CN109101595B (en) * | 2018-07-27 | 2022-07-08 | 郑州云海信息技术有限公司 | Information query method, device, equipment and computer readable storage medium |
CN109101595A (en) * | 2018-07-27 | 2018-12-28 | 郑州云海信息技术有限公司 | A kind of information query method, device, equipment and computer readable storage medium |
CN110807095A (en) * | 2018-08-01 | 2020-02-18 | 北京京东尚科信息技术有限公司 | Article matching method and device |
CN109711951A (en) * | 2019-01-18 | 2019-05-03 | 中合金网(北京)电子商务有限公司 | Commodity automation collection and moving method |
CN110175322A (en) * | 2019-05-22 | 2019-08-27 | 北京神州泰岳软件股份有限公司 | A kind of structural method and device of document |
CN111797622A (en) * | 2019-06-20 | 2020-10-20 | 北京沃东天骏信息技术有限公司 | Method and apparatus for generating attribute information |
CN111797622B (en) * | 2019-06-20 | 2024-04-09 | 北京沃东天骏信息技术有限公司 | Method and device for generating attribute information |
CN110334185A (en) * | 2019-07-05 | 2019-10-15 | 政采云有限公司 | The treating method and apparatus of data in a kind of platform |
CN111860575A (en) * | 2020-06-05 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | Method and device for processing article attribute information, electronic equipment and storage medium |
CN112800978A (en) * | 2021-01-29 | 2021-05-14 | 北京金山云网络技术有限公司 | Attribute recognition method, and training method and device for part attribute extraction network |
CN113256379A (en) * | 2021-05-24 | 2021-08-13 | 北京小米移动软件有限公司 | Method for correlating shopping demands for commodities |
CN113609279A (en) * | 2021-08-05 | 2021-11-05 | 湖南特能博世科技有限公司 | Material model extraction method and device and computer equipment |
CN113609279B (en) * | 2021-08-05 | 2023-12-08 | 湖南特能博世科技有限公司 | Material model extraction method and device and computer equipment |
CN113724055A (en) * | 2021-09-14 | 2021-11-30 | 京东科技信息技术有限公司 | Commodity attribute mining method and device |
CN113724055B (en) * | 2021-09-14 | 2024-04-09 | 京东科技信息技术有限公司 | Commodity attribute mining method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2017157198A1 (en) | 2017-09-21 |
TW201734901A (en) | 2017-10-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107203548A (en) | Attribute acquisition methods and device | |
US11341170B2 (en) | Automated extraction, inference and normalization of structured attributes for product data | |
KR101778679B1 (en) | Method and system for classifying data consisting of multiple attribues represented by sequences of text words or symbols using deep learning | |
More | Attribute extraction from product titles in ecommerce | |
Carrara et al. | LSTM-based real-time action detection and prediction in human motion streams | |
EP2812883B1 (en) | System and method for semantically annotating images | |
JP5424001B2 (en) | LEARNING DATA GENERATION DEVICE, REQUESTED EXTRACTION EXTRACTION SYSTEM, LEARNING DATA GENERATION METHOD, AND PROGRAM | |
US11373424B1 (en) | Document analysis architecture | |
US11379665B1 (en) | Document analysis architecture | |
WO2018090468A1 (en) | Method and device for searching for video program | |
US20220284392A1 (en) | Automated extraction, inference and normalization of structured attributes for product data | |
Zarchi et al. | A semantic model for general purpose content-based image retrieval systems | |
EP4165487A1 (en) | Document analysis architecture | |
CN116738988A (en) | Text detection method, computer device, and storage medium | |
CN114416998A (en) | Text label identification method and device, electronic equipment and storage medium | |
CN113903042A (en) | Trademark identification method and device, computer equipment and storage medium | |
CN111061939B (en) | Scientific research academic news keyword matching recommendation method based on deep learning | |
Shi et al. | Random pairwise shapelets forest | |
Huang et al. | Keyword spotting in unconstrained handwritten Chinese documents using contextual word model | |
US11776291B1 (en) | Document analysis architecture | |
CN110298228A (en) | A kind of multi-Target Image search method | |
Zheng et al. | A hybrid architecture based on CNN for image semantic annotation | |
Waykar et al. | Intent aware optimization for content based lecture video retrieval using Grey Wolf optimizer | |
Chen et al. | Pseudo-label diversity exploitation for few-shot object detection | |
US11893065B2 (en) | Document analysis architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||