WO2017157198A1

WO2017157198A1 - Attribute acquisition method and device

Info

Publication number: WO2017157198A1
Application number: PCT/CN2017/075829
Authority: WO
Inventors: 陈强; 吴夙慧; 郭立超; 李传福
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2016-03-17
Filing date: 2017-03-07
Publication date: 2017-09-21
Also published as: TW201734901A; CN107203548A

Abstract

An attribute acquisition method and a device, the method comprising: extracting a target word matching a preset attribute of a target platform from an unstructured text for describing a target object in an original platform (101); and then determining the attribute of the target object in the target platform according to the target word (102). For e-commerce platforms, the attribute of goods can be extracted from such unstructured texts as the product title and detailed description in the original platform, thereby solving the technical problem that in the prior art, the unstructured text cannot be processed to obtain the attribute of goods in the original platform on the target platform.

Description

Attribute acquisition method and device

The present application claims priority to a Chinese patent application filed on March 17, 2016.

Technical field

The present invention relates to information technology, and in particular, to an attribute acquisition method and apparatus.

Background art

On an e-commerce platform, a product library can be maintained for published products. Attribute items such as material, color, style, and price range are determined according to the category of a product and are used to describe it, which facilitates statistics and filtering by users. When an original platform, such as Intime Commercial, needs to access a target platform, such as Taobao, and release products there, the attributes used to describe a product on the original platform, including attribute items and attribute values, often differ from those of the target platform. For example, on the Intime Commercial platform, products in the dress category are described by brand, color, material, and time to market, whereas on the Taobao platform they are described by brand, color classification, style, and price range. Therefore, before a product is released on the Taobao platform, the attribute value of each attribute item used to describe the Intime Commercial product on the Taobao platform must be determined; that is, the attributes of the product on the target platform must be obtained.

In the prior art, the attributes of a product on the original platform can be clustered according to the attributes of the target platform to obtain the attributes of the product on the target platform. However, this method can only process the attributes already recorded for the product on the original platform; it cannot process unstructured text such as the title or detailed description of the product on the original platform.

Summary of the invention

The present invention provides an attribute acquisition method and apparatus for obtaining the attributes of an item from unstructured text, such as the title or detailed description of the item on the original platform.

In order to achieve the above object, embodiments of the present invention adopt the following technical solutions:

In a first aspect, an attribute acquisition method is provided, including:

Extracting a target word that matches the preset attribute from the unstructured text used to describe the target object;

Determining an attribute of the target object according to the target word.

In a second aspect, an attribute obtaining apparatus is provided, including:

An extraction module, configured to extract, from the unstructured text used to describe the target object, a target word that matches the preset attribute;

A determining module, configured to determine an attribute of the target object according to the target word.

The attribute acquisition method and apparatus provided by the embodiments of the present invention extract, from the unstructured text used by the original platform to describe the target object, a target word that matches a preset attribute of the target platform, and then determine the attribute of the target object in the target platform according to the target word. For an e-commerce platform, the attributes of a product can thus be extracted from unstructured text such as the product title and detailed description, which solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of a product from the original platform.

The above description is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are set forth below.

DRAWINGS

Various other advantages and benefits will become apparent to those skilled in the art from the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference numerals refer to the same parts. In the drawings:

FIG. 1 is a schematic flowchart diagram of an attribute obtaining method according to Embodiment 1;

FIG. 2 is a schematic diagram of an application scenario of an attribute acquisition method;

FIG. 3 is a schematic flowchart of an attribute obtaining method according to Embodiment 2 of the present invention;

FIG. 4 is a schematic structural diagram of an attribute obtaining apparatus according to Embodiment 3 of the present invention;

FIG. 5 is a schematic structural diagram of an attribute obtaining apparatus according to Embodiment 4 of the present invention.

Detailed description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be more thoroughly understood and so that its scope can be fully conveyed to those skilled in the art.

The attribute acquisition method and apparatus provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1

FIG. 1 is a schematic flowchart of an attribute acquisition method according to Embodiment 1. The method provided in this embodiment may be used in an e-commerce platform; that is, the object mentioned in this embodiment may be a commodity, and the embodiment may be used to obtain the attributes of a product in the target platform before the product in the original platform is released to the target platform. As shown in FIG. 1, the method includes:

Step 101: Extract, from the unstructured text used to describe the target object, a target word that matches the preset attribute.

The preset attribute includes a preset attribute item and a preset attribute value. For a given preset attribute item, the corresponding preset attribute value may consist of one or more words. Optionally, after the correspondence between preset attribute items and preset attribute values is set, a correspondence between each preset attribute value and a plurality of preset attribute sub-values may also be set, where the preset attribute sub-values have semantics similar to those of the preset attribute value.

For example, for the preset attribute item of clothing style, words describing different clothing styles may be set as preset attribute values. Further, for each clothing style word, a plurality of words with similar semantics may be set as preset attribute sub-values. Specifically, if ethnic style is set as a preset attribute value, words naming specific ethnic groups, such as Miao, Han, and Tibetan, may be set as its preset attribute sub-values; if college style is set as a preset attribute value, words describing the college style, such as campus, literary, and small fresh, may be set as its preset attribute sub-values.

It should be noted that the matching mentioned here refers not only to exact matching but also to partial matching.

Specifically, the words in the unstructured text are matched against the words corresponding to the preset attribute item. If at least one word matches, that word is considered to match the preset attribute and is determined to be a target word. Before matching, the unstructured text, such as the title and detailed description of the target object in the original platform, is obtained and preprocessed. The preprocessing operations mainly include word segmentation, full-width to half-width conversion, case unification, text normalization, accurate identification of brand words, and processing of single words. Then, the preset attributes under the category to which the target object belongs are queried in the target platform. A similarity algorithm is used to perform string matching between the unstructured text and the preset attributes, yielding the matched words as target words together with the matching degree between each target word and the preset attribute. String matching thus finds words in the unstructured text that are similar to the preset attributes. The similarity algorithms that may be used here include edit distance, cosine similarity, Euclidean distance, the Jaccard similarity coefficient (Jaccard distance), a bigram (2-gram) language model, the longest common subsequence, the longest common substring, and the like.
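The string matching in this step can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: it assumes the text has already been segmented into tokens, uses two of the listed similarity measures (edit distance and a Jaccard coefficient over character bigrams), and the function names, the normalization of edit distance into a matching degree, and the sample data are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance to a matching degree in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard coefficient between the sets of character bigrams of a and b."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def match_targets(tokens, preset_attributes, similarity=edit_similarity):
    """For each token, return (token, item, value, degree) of its best preset match."""
    matches = []
    for token in tokens:
        best = max(
            ((item, value, similarity(token.lower(), value.lower()))
             for item, values in preset_attributes.items()
             for value in values),
            key=lambda m: m[2],
        )
        matches.append((token, *best))
    return matches


# Hypothetical preset attributes of one category and a segmented title.
preset = {"color": ["red", "navy blue"], "style": ["college", "ethnic"]}
tokens = ["Red", "colege", "dress"]
results = match_targets(tokens, preset)
```

Lowercasing before comparison stands in for the case unification mentioned above; any of the other listed measures could be passed in through the `similarity` parameter.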

In this step, target words may be extracted from the unstructured text not only by the string matching described above but also in other ways, such as semantic matching.

It should be noted that the category mentioned above refers to the category to which the object belongs. The granularity of the categories can be set by the user: for example, objects can be divided coarsely into clothing, shoes, hats, electronic products, and so on, and the division can be further refined, for example by dividing clothing into finer-grained categories such as shirts, dresses, and pants. The finer the granularity of the category division, the higher the accuracy of the attributes obtained, but the more preset attributes need to be maintained. The granularity can be chosen with reference to the differences between the preset attributes of different categories: the categories should be divided so that the preset attributes of any two categories differ to some extent, so that a preset attribute set of appropriate size is maintained while the accuracy of the obtained attributes is ensured.

Step 102: Determine an attribute of the target object according to the target word.

As a possible implementation manner, the attribute of the target object is determined from the target word according to the matching degree of the target word and the preset attribute.

The matching degree between the target word and the preset attribute may be obtained by matching the target word against the preset attribute value and/or the preset attribute sub-values in the preset attribute, and the attribute of the target object is then determined from the target words according to this matching degree. Specifically, two similarity thresholds, a first threshold and a second threshold, are set in advance, where the first threshold is greater than the second threshold. A target word whose matching degree is higher than the first threshold is determined as an attribute of the target object in the target platform. A target word whose matching degree is higher than the second threshold but lower than the first threshold is taken as a candidate attribute; semantic discrimination is used to determine whether the candidate attribute is an attribute in the target platform, and the attribute of the target object in the target platform is determined from the candidate attributes according to the discrimination result.

Generally, the matching degree is between 0 and 1, and the matching degree obtained in the previous step is compared with the first threshold and the second threshold. There are three cases:

In the first case, a target word whose matching degree is greater than the first threshold is considered very likely to be an attribute of the target object;

In the second case, a target word whose matching degree is less than the first threshold but greater than the second threshold is considered possibly an attribute of the target object. Such target words are taken as candidate attributes and require further judgment, which in this embodiment is performed by semantic discrimination;

In the third case, a target word whose matching degree is smaller than the second threshold is considered unlikely to be an attribute of the target object and is discarded directly.
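The three cases can be sketched as a simple triage over (target word, matching degree) pairs. The threshold values 0.8 and 0.5 and the function name are hypothetical, and the handling of a matching degree exactly equal to a threshold, which the text does not specify, is an assumption of this sketch.

```python
def triage(matches, first_threshold=0.8, second_threshold=0.5):
    """Split (word, degree) pairs into attributes and candidate attributes.

    Words at or below the second threshold are discarded. A degree equal to
    the first threshold is treated as a candidate (an assumption; the text
    leaves the boundary cases open).
    """
    if not 0 < second_threshold < first_threshold < 1:
        raise ValueError("expected 0 < second_threshold < first_threshold < 1")
    attributes, candidates = [], []
    for word, degree in matches:
        if degree > first_threshold:
            attributes.append(word)      # first case: accepted directly
        elif degree > second_threshold:
            candidates.append(word)      # second case: needs semantic judgment
        # third case: dropped
    return attributes, candidates


attributes, candidates = triage(
    [("red", 1.0), ("colege", 0.86), ("dres", 0.6), ("xyz", 0.2)]
)
```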

It can be seen that the scheme of extracting, from the unstructured text used by the original platform to describe the target object, target words that match the preset attributes of the target platform, and then determining the attributes of the target object in the target platform from the target words according to the matching degree between each target word and the preset attribute, makes it possible to extract the attributes of a commodity from unstructured text such as its title and detailed description. This solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of a commodity from the original platform.

As another possible implementation manner, the attribute of the target object may be obtained by analyzing the semantics of the target word. For example, the target word extracted from the detailed description page of a product may be "Miao traditional costumes". Analyzing the semantics of this target word determines that "Miao traditional costumes" describes the ethnic style, so ethnic style can be used as an attribute of the commodity. The semantic analysis here can be based on several semantic relationships, such as similar semantics and generalized semantics. Similar semantics means that the attribute and the target word have similar meanings; generalized semantics means that the attribute and the target word stand in a relationship of broader and narrower concepts.

Because the preset attribute values and preset attribute sub-values described above are semantically related, the preset attribute value corresponding to the preset attribute sub-value matched by the target word can be used as the attribute value of the item, and the preset attribute item corresponding to that preset attribute value can be used as the attribute item of the item.
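A minimal sketch of this lookup, assuming the item, value, and sub-value correspondences are held in a nested dictionary. The data layout, the English labels for the style words, and the function names are illustrative assumptions rather than part of the embodiment.

```python
# Hypothetical preset attributes: item -> value -> list of sub-values.
PRESET = {
    "style": {
        "ethnic": ["Miao", "Han", "Tibetan"],
        "college": ["campus", "literary", "small fresh"],
    }
}


def build_index(preset):
    """Map every value and sub-value (lowercased) to its (item, value) pair."""
    index = {}
    for item, values in preset.items():
        for value, sub_values in values.items():
            index[value.lower()] = (item, value)
            for sub_value in sub_values:
                index[sub_value.lower()] = (item, value)
    return index


def resolve(target_word, index):
    """Return the (attribute item, attribute value) for a matched target word, or None."""
    return index.get(target_word.lower())


index = build_index(PRESET)
```

With this index, a target word that matched the sub-value "Miao" resolves to the attribute item "style" with the attribute value "ethnic".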

It should be noted that, in actual use, other methods based on the semantics of the target word may also be used to obtain the attributes of the target object, for example a classifier from data mining that is obtained on the basis of word semantics.

Through the foregoing attribute acquisition method, the attributes of a product in the target platform can be obtained from the description page of the product in the original platform. FIG. 2 is a schematic diagram of an application scenario of the attribute acquisition method. As shown in FIG. 2, the left part is a product page in the original platform, which contains the product title and product details. Target words are extracted from the product title and product details, and the product attribute list shown in the right part is obtained from the extracted target words. The product attribute list can be used for filtering products. Each product attribute includes an attribute item and an attribute value: the first column of the list is the attribute item, and the second column is the attribute value.

Embodiment 2

This embodiment describes in detail, for the e-commerce application scenario in which an original platform accesses a target platform, how to obtain the attributes in the target platform of products from the original platform. FIG. 3 is a schematic flowchart of an attribute acquisition method according to Embodiment 2 of the present invention. As shown in FIG. 3, the method includes:

Step 201: Based on the unstructured text used to describe the target product in the original platform, predict the category to which the target product belongs in the target platform.

Specifically, a classification model may be constructed in advance; for example, the classification model may be a naive Bayes classification model. Keywords searched by users and the click data following each search are collected. The category corresponding to each keyword is determined from the category of the products clicked after the search, which yields a correspondence between keywords and categories. Each keyword is then segmented into terms, and the keyword in the keyword-category correspondence is replaced by its terms, which yields a correspondence between terms and categories. This term-category correspondence is used as the training set to train the classification model, which completes the construction of the classification model.
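The construction of the training set and the classifier can be sketched as below. This is a small multinomial naive Bayes written from scratch with add-one smoothing; the smoothing, the whitespace segmentation (a stand-in for a real word segmenter), the class and function names, and the click log are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_training_set(click_log, segment=str.split):
    """Turn (keyword, clicked category) pairs into (terms, category) pairs."""
    return [(segment(keyword), category) for keyword, category in click_log]


class NaiveBayes:
    """Multinomial naive Bayes over terms, with add-one (Laplace) smoothing."""

    def fit(self, training_set):
        self.class_counts = Counter()
        self.term_counts = defaultdict(Counter)
        self.vocabulary = set()
        for terms, category in training_set:
            self.class_counts[category] += 1
            self.term_counts[category].update(terms)
            self.vocabulary.update(terms)
        return self

    def predict(self, terms):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)

        def log_posterior(category):
            counts = self.term_counts[category]
            denominator = sum(counts.values()) + vocab_size
            score = math.log(self.class_counts[category] / total)
            for term in terms:
                score += math.log((counts[term] + 1) / denominator)
            return score

        return max(self.class_counts, key=log_posterior)


# Hypothetical search keywords paired with the category of the clicked product.
click_log = [
    ("red summer dress", "dresses"),
    ("floral dress", "dresses"),
    ("running shoes", "shoes"),
    ("leather shoes men", "shoes"),
]
model = NaiveBayes().fit(build_training_set(click_log))
predicted = model.predict(["summer", "floral", "dress"])
```

Working in log space avoids underflow when a title contains many terms; a production system would more likely use an existing library implementation.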

Then, based on the unstructured text of the target object, the trained classification model is used for data mining to obtain the category to which the target object belongs in the target platform. The unstructured text may be the title and/or the detail page description.

For example, when a third-party platform such as Intime, acting as the original platform, needs to access Taobao as the target platform, the title of the target product in the third-party platform can be segmented to obtain the terms of the title, and the terms are then tagged with parts of speech to obtain the part-of-speech information of each term. A word-dropping algorithm processes the terms according to the part-of-speech information, so that interfering words in the target product title are discarded and only product words, modifiers, brand words, time and season words, promotional words, and the like are retained. The retained terms are input into the trained classification model to obtain the category of the target product on the Taobao platform.

Because categories are often divided differently on different platforms, predicting the category in this way yields an accurate category for the target product in the target platform. Target words can then be obtained by matching against the preset attributes of that category, which increases the likelihood that the obtained target words are attributes of the target product.

Step 202: Extract a target word that matches the preset attribute under the predicted category from the unstructured text.

Specifically, similarity calculation is performed on the preprocessed unstructured text to obtain the target words that match the preset attributes, together with their matching degrees. For convenience of description, the matching degree is denoted sim1. The matching degree describes the degree of similarity between the target word and the preset attribute.

The preset attribute includes two parts, namely an attribute item and an attribute value. If the target word is similar to an attribute value in the preset attribute, the target word matches the preset attribute, and the target word and the attribute item of the matched attribute value may be combined to form an attribute pair, denoted PV.

Step 203: Determine, according to the matching degree of the target word, the attribute and the candidate attribute of the target object in the target platform from the target word.

For example, a target word whose matching degree sim1 is greater than a preset threshold a is used as an attribute of the target object in the target platform, and a target word whose matching degree is smaller than the preset threshold a and larger than a preset threshold b is used as a candidate attribute, where 0<b<a<1.

Step 204: For the target words determined as attributes, match against the target platform products stored in the database, and extract the attributes of the matched candidate products.

Specifically, the database includes a product library and a commodity library. Compared with the commodity library, the product library does not include the merchant field; the remaining data may be identical. That is, each record in the product library corresponds to one product, and each record in the commodity library corresponds to one product offered by one merchant.

First, a query is performed in the product library to obtain the candidate products in the product library that match all the target words determined as attributes.

Then, a query is performed in the commodity library to obtain the candidate products in the commodity library that match all the target words determined as attributes.

The attributes of all the candidate products obtained by the two queries are used as the attributes of the target item, and the confidence of each attribute is calculated.
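The two queries can be sketched as a subset test over attribute pairs. The in-memory lists stand in for the two libraries of the database, and the record layout, identifiers, and function name are hypothetical.

```python
def find_candidates(records, target_pairs):
    """Return the records whose attribute pairs include every target pair."""
    required = set(target_pairs)
    return [record for record in records if required <= set(record["attributes"])]


# Hypothetical records; the second library carries an extra merchant field.
product_library = [
    {"id": "prod-1",
     "attributes": [("brand", "A"), ("color", "red"), ("style", "college")]},
    {"id": "prod-2",
     "attributes": [("brand", "A"), ("color", "blue")]},
]
commodity_library = [
    {"id": "item-9", "merchant": "shop-1",
     "attributes": [("brand", "A"), ("color", "red"), ("material", "cotton")]},
]

# Attribute pairs formed from the target words determined as attributes.
targets = [("brand", "A"), ("color", "red")]
candidates = (find_candidates(product_library, targets)
              + find_candidates(commodity_library, targets))
candidate_ids = [record["id"] for record in candidates]
```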

Step 205: Calculate a confidence level of each attribute of the candidate item.

Among them, the confidence level is used to indicate the accuracy of describing the target item in the target platform.

If the target words determined as attributes include the brand and the model, and the candidate product is unique, the confidence of each attribute of the candidate product may be set directly to 100%, or it may be calculated by the confidence formula below; the results are the same. The confidence formula is as follows:

Confidence = (number of candidate products whose attributes include the attribute / total number of candidate products) × 100%

For example:

The attribute pairs formed by the target words are: P1V1 and P2V2

If there are 3 matching candidate products in the product library, the PV pairs of the candidate products are:

P1V1, P2V2, P3V3, P6V6

P1V1, P2V2, P7V7

P1V1, P2V2, P8V8

Then, P1V1, P2V2, P3V3, P6V6, P7V7, and P8V8 are output as attributes of the target item.

Further, according to the confidence formula, the confidence levels of P1V1, P2V2, P3V3, P6V6, P7V7, and P8V8 are calculated as 100%, 100%, 33.3%, 33.3%, 33.3%, and 33.3%, respectively.
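The confidence formula can be checked with a short sketch applied to the three candidate products listed in the example. The function name is hypothetical, and each attribute pair is counted at most once per candidate product, which is an assumption about how occurrences are counted.

```python
from collections import Counter


def attribute_confidence(candidate_products):
    """Percentage of candidate products that carry each attribute pair."""
    total = len(candidate_products)
    counts = Counter(
        pair for attributes in candidate_products for pair in set(attributes)
    )
    return {pair: 100.0 * count / total for pair, count in counts.items()}


# PV pairs of the three matching candidate products from the example.
candidates = [
    ["P1V1", "P2V2", "P3V3", "P6V6"],
    ["P1V1", "P2V2", "P7V7"],
    ["P1V1", "P2V2", "P8V8"],
]
confidence = attribute_confidence(candidates)
```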

Step 206: For the target words determined as candidate attributes, use semantic discrimination to determine the confidence that each candidate attribute is an attribute in the target platform.

First, semantic discrimination is performed based on the relationship between characters. The preset attribute values in the target platform are split into individual characters in advance and used as training text to train a model with the word2vec algorithm. The characters of a target word determined as a candidate attribute are input into the trained discriminant model to obtain character vectors, the character vectors are accumulated to obtain a word vector, and the cosine value of the word vector is used as the confidence sim2 that the candidate attribute is an attribute in the target platform.

Second, semantic discrimination is performed based on the context of the target word in the unstructured text. The title or detail page of each product in the target platform is used as a corpus, and the word segmentation result is used as training text to train a model with the word2vec algorithm. The target word determined as a candidate attribute is input into the trained discriminant model to obtain a word vector, and the cosine value of the word vector is used as the confidence sim3 that the candidate attribute is an attribute in the target platform.

Finally, the confidence S that the candidate attribute is an attribute in the target platform is determined from the similarities sim2 and sim3 obtained by the two semantic discrimination methods, for example by calculating S as a weighted sum or weighted average of sim2 and sim3.
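The vector arithmetic in this step can be sketched as follows. The sketch does not train a word2vec model: the two-dimensional character vectors are made up, and because the text does not state what the cosine is measured against, comparing the accumulated vector with a reference vector for a preset attribute value is an assumption. The weight of 0.5 and the placeholder value of sim3 are likewise hypothetical.

```python
import math


def accumulate(vectors):
    """Sum a list of equal-length vectors component by component."""
    return [sum(components) for components in zip(*vectors)]


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine(sim2, sim3, weight=0.5):
    """Weighted average of the two semantic confidences."""
    return weight * sim2 + (1.0 - weight) * sim3


# Made-up character vectors standing in for the output of a trained model.
char_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
reference = [1.0, 0.0]  # assumed vector of a preset attribute value

word_vector = accumulate([char_vectors[ch] for ch in "ab"])
sim2 = cosine(word_vector, reference)
sim3 = 0.6  # placeholder for the context-based confidence
S = combine(sim2, sim3, weight=0.5)
```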

As a possible implementation manner, after the confidence S is calculated, it may be corrected with reference to the candidate products from the previous step: the frequency with which each candidate attribute occurs in the attributes of the candidate products is counted, and the corrected confidence S is obtained.

Step 207: Summarize the target words determined as attributes, the target words determined as candidate attributes, and the attributes of the candidate products, and determine the attributes of the target product from the summary according to confidence.

The confidence threshold can be determined according to the accuracy required of the obtained attributes: the higher the required accuracy, the higher the confidence threshold; if the required accuracy is lower, a lower confidence threshold can be set. The entries whose confidence is greater than the confidence threshold are selected from the summary as the attributes of the target product.
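The final selection can be sketched as a filter over the summarized entries. Keeping the highest confidence when the same attribute pair arrives from more than one source is an assumption of this sketch, as are the function name, the sample confidences, and the threshold of 0.5.

```python
def select_attributes(summary, confidence_threshold):
    """Keep the attribute pairs whose best confidence exceeds the threshold."""
    best = {}
    for pair, confidence in summary:
        # Assumed merge rule: keep the highest confidence seen for a pair.
        best[pair] = max(confidence, best.get(pair, 0.0))
    return {pair: c for pair, c in best.items() if c > confidence_threshold}


# Hypothetical summary of entries gathered from the three sources.
summary = [
    ("P1V1", 1.0),    # target word determined as an attribute
    ("P3V3", 0.333),  # attribute taken from the candidate products
    ("P4V4", 0.72),   # candidate attribute scored by semantic discrimination
    ("P4V4", 0.60),   # the same pair seen again with a lower score
]
selected = select_attributes(summary, confidence_threshold=0.5)
```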

Embodiment 3

FIG. 4 is a schematic structural diagram of an attribute obtaining apparatus according to Embodiment 3 of the present invention. As shown in FIG. 4, the apparatus includes an extracting module 31 and a determining module 32.

The extracting module 31 is configured to extract, from the unstructured text used to describe the target object, a target word that matches the preset attribute;

Specifically, the extracting module 31 is configured to perform string matching between the unstructured text and the preset attribute by using a similarity algorithm, to obtain the matched target words and the corresponding matching degrees.

The determining module 32 is configured to determine an attribute of the target object according to the target word.

Specifically, the determining module 32 is configured to determine the attribute of the target object from the target words according to the matching degree between each target word and the preset attribute.

Alternatively, the determining module 32 is configured to perform analysis based on the semantics of the target word to obtain the attribute of the target object.

In this embodiment, a target word matching a preset attribute of the target platform is extracted from the unstructured text used by the original platform to describe the target object, and the attribute of the target object in the target platform is then determined according to the target word. This makes it possible to extract the attributes of a product from unstructured text such as its title and detailed description, which solves the technical problem that the prior art cannot process unstructured text to obtain the attributes, on the target platform, of a product from the original platform.

Embodiment 4

FIG. 5 is a schematic structural diagram of an attribute obtaining apparatus according to Embodiment 4 of the present invention. On the basis of the attribute obtaining apparatus provided in FIG. 4, the determining module 32 further includes a first determining unit 321 and a second determining unit 322.

The first determining unit 321 is configured to determine a target word whose matching degree is higher than the first threshold as an attribute of the target object in the target platform.

The second determining unit 322 is configured to take a target word whose matching degree is higher than the second threshold but lower than the first threshold as a candidate attribute, determine by semantic discrimination whether the candidate attribute is an attribute in the target platform, and determine the attribute of the target object in the target platform from the candidate attributes according to the discrimination result.

Further, the second determining unit 322 may include at least one of a first discriminating subunit 3221 and a second discriminating subunit 3222. As a schematic representation of one possible implementation, the second determining unit 322 in FIG. 5 includes both the first discriminating subunit 3221 and the second discriminating subunit 3222.

The first discriminating subunit 3221 is configured to perform semantic discrimination based on the relationship between characters in the candidate attribute, and obtain the confidence that the candidate attribute is an attribute in the target platform.

Specifically, the first discriminating subunit 3221 is configured to: input each character of the candidate attribute into a pre-trained inter-character semantic discriminant model to obtain character vectors, where the inter-character semantic discriminant model is obtained by training with each character of the attributes in the target platform as training text; accumulate the character vectors to obtain a first word vector; and use the cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.

The second discriminating subunit 3222 is configured to perform semantic discrimination based on the context of the candidate attribute in the unstructured text, and obtain the confidence that the candidate attribute is an attribute in the target platform.

Specifically, the second discriminating subunit 3222 is configured to: input each word of the unstructured text into a pre-trained inter-word semantic discriminant model to obtain a second word vector, where the inter-word semantic discriminant model is obtained by training with each word of the unstructured text in the target platform as training text; and use the cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.

Further, the second determining unit 322 may further include: an attribute determining subunit 3223.

The attribute determining sub-unit 3223 is configured to determine, according to the confidence, an attribute of the target object in the target platform from the candidate attributes.

Further, the determining module 32 further includes: a matching unit 323.

The matching unit 323 is configured to: match the target words whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in the database, to obtain the matched candidate objects; calculate, according to the frequency with which each attribute of the candidate objects occurs among the attributes of all candidate objects, the probability that the attribute is an attribute of the target object in the target platform; and determine, according to the calculated probability, the attributes of the target object in the target platform from among the attributes of the candidate objects.

Further, the attribute obtaining apparatus provided in this embodiment further includes: a category prediction module 33 and a preset attribute determining module 34.

The category prediction module 33 is configured to predict, according to the unstructured text, the category of the target object in the target platform.

The preset attribute determining module 34 is configured to use an attribute under the category in the target platform as the preset attribute.

The category prediction module 33 includes: a mining unit 331 and a modeling unit 332.

The mining unit 331 is configured to perform data mining using the trained classification model based on the unstructured text of the target object, and obtain the category of the target object in the target platform.

The modeling unit 332 is configured to: acquire a user search keyword and the category to which the object selected from the search results belongs; perform word segmentation on the keyword to obtain search terms; generate a training set according to the search terms and the category to which the selected object belongs; and train the classification model using the training set.
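The training flow of the modeling unit can be sketched as follows. Whitespace splitting stands in for real word segmentation, and a simple term-count model stands in for the classification model, whose type the text does not specify; all names and the sample search log are hypothetical.

```python
from collections import Counter, defaultdict


def segment(keyword):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return keyword.lower().split()


def build_training_set(search_log):
    """Pair the search terms with the category of the object the user chose."""
    return [(segment(keyword), category) for keyword, category in search_log]


def train(training_set):
    """Count how often each search term co-occurs with each category."""
    model = defaultdict(Counter)
    for terms, category in training_set:
        for term in terms:
            model[term][category] += 1
    return model


def predict(model, text):
    """Return the category with the highest summed term count, or None."""
    scores = Counter()
    for term in segment(text):
        scores.update(model.get(term, {}))
    return scores.most_common(1)[0][0] if scores else None


# Hypothetical log: (search keyword, category of the selected object).
search_log = [
    ("red summer dress", "dresses"),
    ("floral dress", "dresses"),
    ("running shoes", "shoes"),
]
model = train(build_training_set(search_log))
```

The same `predict` function can then be applied to the unstructured text of a target object, which corresponds to the role of the mining unit 331.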

One of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described above may be implemented by program instructions running on associated hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

An attribute acquisition method, comprising:

extracting, from unstructured text used to describe a target object, a target word that matches a preset attribute; and

determining an attribute of the target object according to the target word.
The attribute acquisition method according to claim 1, wherein the extracting the target word matching the preset attribute from the unstructured text used to describe the target object comprises:

performing string matching between the unstructured text and the preset attribute by using a similarity algorithm, to obtain a matched target word and a corresponding matching degree.
The attribute acquisition method according to claim 1, wherein the determining the attribute of the target object according to the target word comprises:

determining the attribute of the target object from the target word according to a matching degree of the target word and the preset attribute.
The attribute acquisition method according to claim 1, wherein the determining the attribute of the target object according to the target word comprises:

performing an analysis based on the semantics of the target word to obtain the attribute of the target object.
The attribute acquisition method according to claim 3, wherein the determining the attribute of the target object from the target word according to the matching degree of the target word and the preset attribute comprises:

determining a target word whose matching degree is higher than a first threshold as an attribute of the target object in the target platform; and

taking a target word whose matching degree is higher than a second threshold but lower than the first threshold as a candidate attribute, determining, by means of semantic discrimination, whether the candidate attribute is an attribute in the target platform, and determining, according to the discrimination result, the attribute of the target object in the target platform from the candidate attributes.
The attribute acquisition method according to claim 5, wherein the determining, by means of semantic discrimination, whether the candidate attribute is an attribute in the target platform comprises:

performing semantic discrimination based on the relationship between the characters in the candidate attribute, to obtain a confidence that the candidate attribute is an attribute in the target platform;

and/or performing semantic discrimination based on the context of the candidate attribute in the unstructured text, to obtain a confidence that the candidate attribute is an attribute in the target platform.
The attribute acquisition method according to claim 6, wherein the performing semantic discrimination based on the relationship between the characters in the candidate attribute comprises:

inputting each character in the candidate attribute into a pre-trained inter-word semantic discriminant model to obtain a word vector, wherein the inter-word semantic discriminant model is obtained by training with each character in the attributes of the target platform as training text;

accumulating the word vectors to obtain a first word vector; and

using a cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.
The attribute acquisition method according to claim 6, wherein the performing semantic discrimination based on the context of the candidate attribute in the unstructured text comprises:

inputting each word in the unstructured text into a pre-trained inter-word semantic discriminant model to obtain a second word vector, wherein the inter-word semantic discriminant model is obtained by training with each word in the unstructured text in the target platform as training text; and

using a cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
The attribute acquisition method according to claim 6, wherein the determining, according to the discrimination result, the attribute of the target object in the target platform from the candidate attributes comprises:

determining, according to the confidence, the attribute of the target object in the target platform from the candidate attributes.
The attribute acquisition method according to claim 5, wherein after the determining a target word whose matching degree is higher than the first threshold as an attribute of the target object in the target platform, the method further comprises:

matching the target words whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in the database, to obtain the matched candidate objects;

calculating, according to the frequency with which the attributes of each candidate object occur among the attributes of all candidate objects, a probability that an attribute of the candidate object is an attribute of the target object in the target platform; and

determining, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
The attribute acquisition method according to any one of claims 1 to 10, wherein before the extracting the target word matching the preset attribute from the unstructured text used to describe the target object, the method further comprises:

predicting, according to the unstructured text, the category of the target object in the target platform; and

using the attributes under the category in the target platform as the preset attribute.
The attribute acquisition method according to claim 11, wherein the predicting, according to the unstructured text, the category of the target object in the target platform comprises:

performing data mining with a trained classification model based on the unstructured text of the target object, to obtain the category to which the target object belongs in the target platform.
The attribute acquisition method according to claim 12, wherein before the data mining with the trained classification model, the method further comprises:

obtaining a user search keyword and the category to which the object selected from the search results belongs;

performing word segmentation on the keyword to obtain search terms;

generating a training set according to the search terms and the category to which the selected object belongs; and

training the classification model using the training set.
An attribute acquisition device, comprising:

an extraction module, configured to extract, from unstructured text used to describe a target object, a target word that matches a preset attribute; and

a determining module, configured to determine an attribute of the target object according to the target word.
The attribute acquisition device according to claim 14, wherein

the extraction module is specifically configured to perform string matching between the unstructured text and the preset attribute by using a similarity algorithm, to obtain a matched target word and a corresponding matching degree.
The attribute acquisition device according to claim 14, wherein

the determining module is specifically configured to determine the attribute of the target object from the target word according to a matching degree of the target word and the preset attribute.
The attribute acquisition device according to claim 14, wherein

the determining module is specifically configured to perform an analysis based on the semantics of the target word to obtain the attribute of the target object.
The attribute acquisition device according to claim 16, wherein the determining module comprises:

a first determining unit, configured to determine a target word whose matching degree is higher than a first threshold as an attribute of the target object in the target platform; and

a second determining unit, configured to take a target word whose matching degree is higher than a second threshold but lower than the first threshold as a candidate attribute, determine, by means of semantic discrimination, whether the candidate attribute is an attribute in the target platform, and determine, according to the discrimination result, the attribute of the target object in the target platform from the candidate attributes.
The attribute acquisition device according to claim 18, wherein the second determining unit comprises:

a first discriminating subunit, configured to perform semantic discrimination based on the relationship between the characters in the candidate attribute, to obtain a confidence that the candidate attribute is an attribute in the target platform;

and/or a second discriminating subunit, configured to perform semantic discrimination based on the context of the candidate attribute in the unstructured text, to obtain a confidence that the candidate attribute is an attribute in the target platform.
The attribute acquisition device according to claim 19, wherein

the first discriminating subunit is specifically configured to: input each character in the candidate attribute into a pre-trained inter-word semantic discriminant model to obtain a word vector, wherein the inter-word semantic discriminant model is obtained by training with each character in the attributes of the target platform as training text; accumulate the word vectors to obtain a first word vector; and use a cosine value of the first word vector as the confidence that the candidate attribute is an attribute in the target platform.
The attribute acquisition device according to claim 19, wherein

the second discriminating subunit is specifically configured to: input each word in the unstructured text into a pre-trained inter-word semantic discriminant model to obtain a second word vector, wherein the inter-word semantic discriminant model is obtained by training with each word in the unstructured text in the target platform as training text; and use a cosine value of the second word vector as the confidence that the candidate attribute is an attribute in the target platform.
The attribute acquisition device according to claim 19, wherein the second determining unit further comprises:

an attribute determining subunit, configured to determine, according to the confidence, the attribute of the target object in the target platform from the candidate attributes.
The attribute acquisition device according to claim 18, wherein the determining module further comprises:

a matching unit, configured to: match the target words whose matching degree is higher than the first threshold against the attributes of each object in the target platform stored in the database, to obtain the matched candidate objects; calculate, according to the frequency with which the attributes of each candidate object occur among the attributes of all candidate objects, a probability that an attribute of the candidate object is an attribute of the target object in the target platform; and determine, according to the calculated probability, the attribute of the target object in the target platform from the attributes of the candidate objects.
The attribute acquisition device according to any one of claims 14 to 23, wherein the device further comprises:

a category prediction module, configured to predict, according to the unstructured text, the category of the target object in the target platform; and

a preset attribute determining module, configured to use the attributes under the category in the target platform as the preset attribute.
The attribute acquisition device according to claim 24, wherein the category prediction module comprises:

a mining unit, configured to perform data mining with a trained classification model based on the unstructured text of the target object, to obtain the category of the target object in the target platform.
The attribute acquisition device according to claim 25, wherein the category prediction module further comprises:

a modeling unit, configured to: acquire a user search keyword and the category to which the object selected from the search results belongs; perform word segmentation on the keyword to obtain search terms; generate a training set according to the search terms and the category to which the selected object belongs; and train the classification model using the training set.