WO2023060910A1 - Information extraction method and apparatus

Info

Publication number: WO2023060910A1
Application number: PCT/CN2022/096657
Authority: WO
Inventors: 唐波
Original assignee: 北京达佳互联信息技术有限公司
Priority date: 2021-10-11
Filing date: 2022-06-01
Publication date: 2023-04-20
Also published as: CN113627201A; CN113627201B

Abstract

The present disclosure relates to the technical field of deep learning, and relates to an information extraction method and apparatus, a device, and a storage medium. The information extraction method comprises: performing language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information; performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain a plurality of native semantic units and a plurality of target semantic units; constructing semantic unit matching groups, wherein each semantic unit matching group comprises a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, and the plurality of native semantic units correspond to different languages; and on the basis of a semantic clustering result of the plurality of target semantic units and the semantic unit matching groups, obtaining multilingual attribute description information corresponding to the plurality of objects.

Description

Information extraction method and device

Cross References to Related Applications

This application is based on a Chinese patent application with application number 202111180788.5 and a filing date of October 11, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference into this application.

Technical Field

The present disclosure relates to the technical field of deep learning, and in particular to an information extraction method, device, equipment and storage medium.

Background

At present, e-commerce is shifting from the traditional business model to a content e-commerce model. Content e-commerce centers on content as the source of value: by integrating and disseminating the resources of brand owners, e-commerce platforms and other parties, it can accurately reach target users and increase conversion rates. Evaluations are the largest body of user-generated content (UGC) in a content e-commerce system, and how well evaluation content is organized affects the user's decision-making time and the conversion rate. A relatively novel way of organizing evaluation content is public impression words, a function that mainly classifies and summarizes the evaluation content. A public impression word may refer to a short phrase that appears frequently in evaluation text and is used to describe the target object.

Summary

The disclosure provides an information extraction method, device, equipment and storage medium.

According to the first aspect of the embodiments of the present disclosure, an information extraction method is provided, including:

Performing language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein the native evaluation information of the multiple objects includes native evaluation information in multiple languages;

performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;

Constructing a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages;

Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, the multilingual attribute description information corresponding to the multiple objects is obtained.

In an exemplary embodiment, the information extraction method further includes:

generating semantic vectors corresponding to the plurality of target semantic units;

Semantic clustering is performed on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;

A plurality of clustered semantic units are determined from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.

In an exemplary embodiment, the obtaining the multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group includes:

Based on the semantic unit matching group, determine a plurality of native semantic units that match each of the clustered semantic units;

Each of the clustered semantic units, together with the multiple native semantic units matching that clustered semantic unit, is determined as multilingual attribute description information corresponding to the multiple objects.

In an exemplary embodiment, the generating semantic vectors corresponding to the plurality of target semantic units includes:

Based on the word vector of each word in each of the target semantic units, the word vectors contained in the target semantic unit are obtained;

Taking the average value of the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;

Semantic vectors corresponding to the multiple target semantic units are obtained based on the semantic vectors corresponding to each target semantic unit.
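The averaging step can be illustrated with a minimal Python sketch. The function name `semantic_vector`, the toy two-dimensional embeddings, and the choice to skip words that have no vector are all assumptions made for illustration; the disclosure does not specify an embedding model or how unknown words are handled.

```python
from typing import Dict, List, Sequence


def semantic_vector(unit: str, word_vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Average the word vectors of the words in one target semantic unit."""
    # Assumption: words without a known vector are skipped.
    vectors = [word_vectors[w] for w in unit.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError(f"no known words in semantic unit: {unit!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional embeddings, invented for illustration only.
toy_vectors = {"high": [1.0, 0.0], "quality": [0.0, 1.0], "sewing": [1.0, 1.0]}
vec = semantic_vector("High quality sewing", toy_vectors)
```

Repeating the call for every target semantic unit gives the set of semantic vectors that is later clustered.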

In an exemplary embodiment, the determining a plurality of clustered semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group includes:

determining a central semantic vector for each of said target classes;

Based on the distance between each semantic vector in each of the target classes and the central semantic vector, determine a candidate semantic vector for each of the target classes;

According to the target semantic unit corresponding to the candidate semantic vector of each target class, a plurality of candidate semantic units are obtained;

determining the number of native semantic units matching each candidate semantic unit based on the semantic unit matching group;

The clustered semantic units are determined from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
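A possible reading of these steps is sketched below in Python for a single target class. The helper names, the `top_k` parameter, the use of Euclidean distance, and the rule "pick the candidate with the most matched native semantic units" are assumptions; the disclosure only states that the selection is based on the distance to the central semantic vector and on the number of matching native semantic units.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Central semantic vector of a class: the component-wise mean."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pick_clustered_unit(
    cluster: Dict[str, Sequence[float]],     # target semantic unit -> semantic vector
    matching_groups: Dict[str, List[str]],   # target semantic unit -> matched native units
    top_k: int = 2,                          # hypothetical number of candidates per class
) -> str:
    center = centroid(list(cluster.values()))
    # Candidates: the top_k units whose vectors lie closest to the center.
    candidates = sorted(cluster, key=lambda u: math.dist(cluster[u], center))[:top_k]
    # Assumption: prefer the candidate backed by the most native semantic units.
    return max(candidates, key=lambda u: len(matching_groups.get(u, [])))


# Toy data, invented for illustration only.
cluster = {
    "good quality": [1.0, 1.0],
    "nice quality": [1.2, 1.0],
    "great fabric feel": [3.0, 3.0],
}
groups = {"good quality": ["n1", "n2", "n3"], "nice quality": ["n4"]}
chosen = pick_clustered_unit(cluster, groups)
```

In this toy class the two units nearest the center become candidates, and the one with more matched native semantic units is kept, so the chosen phrase is one that can actually be expressed in several languages.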

In an exemplary embodiment, the information extraction method further includes:

When determining the attribute description information of each object, traversing each clustered semantic unit and performing the following operations for each clustered semantic unit:

Searching for the current clustered semantic unit in the native evaluation information of the object;

In response to the native evaluation information of the object including the current clustered semantic unit, determining the current clustered semantic unit as attribute description information of the object.
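The traversal can be sketched as follows. This is a hedged illustration: the function name `attributes_for_object` is hypothetical, matching is done by case-insensitive substring search, and the search also tries the native semantic units matched to each clustered semantic unit so that evaluations in other languages can be found. The disclosure does not state how the lookup is performed.

```python
from typing import Dict, List


def attributes_for_object(
    evaluations: List[str],                  # native evaluation texts of one object
    clustered_units: Dict[str, List[str]],   # clustered unit -> matched native units
) -> List[str]:
    """Return the clustered semantic units that occur in the object's evaluations."""
    attributes = []
    for unit, native_units in clustered_units.items():
        # Assumption: a unit counts as present if it, or any matched native
        # form of it, occurs as a substring of some evaluation.
        forms = [unit, *native_units]
        if any(form.lower() in text.lower() for form in forms for text in evaluations):
            attributes.append(unit)
    return attributes


# Toy data, invented for illustration only.
evaluations = ["Buena calidad, llegó rápido", "Good quality and nice color"]
clustered = {
    "good quality": ["buena calidad"],
    "fast delivery": ["entrega rápida"],
}
found = attributes_for_object(evaluations, clustered)
```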

In an exemplary embodiment, the information extraction method further includes:

For each attribute description information of the object, determine the sentiment value of the attribute description information;

Determining native evaluation information of the object that includes the attribute description information and whose sentiment value is consistent with that of the attribute description information as the native evaluation information matching the attribute description information;

Mounting the native evaluation information matching the attribute description information onto the attribute description information.
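The mounting rule can be sketched in Python as below. The tiny word lists and the `sentiment` function are placeholders invented for illustration; the disclosure does not say how the sentiment value is computed, and a real system would presumably use a trained sentiment model.

```python
from typing import List

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "nice"}
NEGATIVE = {"bad", "poor", "terrible"}


def sentiment(text: str) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral) from word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def mount_evaluations(attribute: str, evaluations: List[str]) -> List[str]:
    """Keep evaluations that contain the attribute and share its sentiment."""
    target = sentiment(attribute)
    return [
        e for e in evaluations
        if attribute.lower() in e.lower() and sentiment(e) == target
    ]


evaluations = [
    "good quality fabric",
    "not good quality, terrible seams bad fit",
    "fast shipping",
]
mounted = mount_evaluations("good quality", evaluations)
```

The second evaluation contains the phrase but carries the opposite sentiment, so it is not mounted; this is the case the sentiment check is meant to filter out.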

In an exemplary embodiment, the information extraction method further includes:

Perform similarity calculation on any two items of attribute description information of the object;

Determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value;

Counting the number of pieces of native evaluation information mounted on each item of attribute description information of the object;

Sorting the attribute description information in descending order of the number of pieces of native evaluation information mounted;

Mounting the native evaluation information corresponding to the lower-ranked attribute description information in each similar attribute information pair onto the native evaluation information corresponding to the higher-ranked attribute description information in that pair.
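A minimal sketch of this merging step follows. Jaccard similarity over word sets, the threshold value, and the decision to drop the lower-ranked attribute after its evaluations are moved are all assumptions made for illustration; the disclosure only requires a similarity greater than a preset value and a ranking by mounted-evaluation count.

```python
from itertools import combinations
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity, used here as a stand-in similarity measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def merge_similar(mounted: Dict[str, List[str]], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Merge each similar pair into its higher-ranked attribute."""
    merged = {attr: list(evals) for attr, evals in mounted.items()}
    # Rank attributes by descending number of mounted evaluations.
    ranked = sorted(merged, key=lambda a: len(merged[a]), reverse=True)
    # combinations() preserves the ranking, so `high` always outranks `low`.
    for high, low in combinations(ranked, 2):
        if high in merged and low in merged and jaccard(high, low) > threshold:
            merged[high].extend(merged.pop(low))
    return merged


# Toy data, invented for illustration only.
mounted = {
    "good quality": ["r1", "r2", "r3"],
    "quality good": ["r4"],
    "fast delivery": ["r5", "r6"],
}
result = merge_similar(mounted)
```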

In an exemplary embodiment, the semantic unit splitting of the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units includes:

performing semantic unit splitting on the native evaluation information to obtain a plurality of first semantic units;

Deduplicating the plurality of first semantic units to obtain the plurality of native semantic units;

performing semantic unit splitting on the target evaluation information to obtain a plurality of second semantic units;

Deduplication is performed on the plurality of second semantic units to obtain the plurality of target semantic units.

According to a second aspect of an embodiment of the present disclosure, an information extraction device is provided, including:

The language conversion unit is configured to perform language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein the native evaluation information of the multiple objects includes native evaluation information in multiple languages;

a semantic unit splitting unit configured to split the native evaluation information and the target evaluation information into semantic units to obtain multiple native semantic units and multiple target semantic units;

A semantic unit matching group construction unit configured to construct a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages;

The information generation unit is configured to obtain multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering results of the plurality of target semantic units and the matching group of the semantic units.

In an exemplary embodiment, the information extraction device further includes:

a semantic vector generating unit configured to generate semantic vectors corresponding to the plurality of target semantic units;

A semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes;

The first determining unit is configured to determine a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.

In an exemplary embodiment, the information generation unit includes:

The second determining unit is configured to determine a plurality of native semantic units matching each of the clustering semantic units based on the semantic unit matching group;

The third determining unit is configured to determine each of the clustering semantic units and multiple native semantic units matching the clustering semantic units as the multilingual attribute description information corresponding to the multiple objects.

In an exemplary embodiment, the semantic vector generation unit includes:

The first word vector determination unit is configured to obtain the word vector contained in the target semantic unit based on the word vector of each word in each of the target semantic units;

an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;

The second word vector determining unit is configured to obtain semantic vectors corresponding to the plurality of target semantic units based on the semantic vectors corresponding to each target semantic unit.

In an exemplary embodiment, the first determination unit includes:

a central semantic vector determining unit configured to determine the central semantic vector of each of the target classes;

a candidate semantic vector determining unit configured to determine a candidate semantic vector for each of the target classes based on the distance between each semantic vector in each of the target classes and the central semantic vector;

The candidate semantic unit determining unit is configured to obtain a plurality of candidate semantic units according to the target semantic unit corresponding to the candidate semantic vector of each target class;

The first quantity determining unit is configured to determine the quantity of native semantic units matching each candidate semantic unit based on the semantic unit matching group;

The clustering semantic unit determining unit is configured to determine the clustering semantic unit from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.

In an exemplary embodiment, the information extraction device further includes:

The traversal unit is configured to, in response to determining the attribute description information of each object, traverse each of the clustering semantic units, and perform the following operations based on each of the clustering semantic units:

a search unit configured to search for the current clustering semantic unit in the original evaluation information of the object;

The fourth determining unit is configured to determine the current clustered semantic unit as attribute description information of the object in response to the native evaluation information of the object containing the current clustered semantic unit.

In an exemplary embodiment, the information extraction device further includes:

a sentiment value determination unit configured to determine, for each item of attribute description information of the object, the sentiment value of the attribute description information;

The fifth determination unit is configured to determine native evaluation information of the object that includes the attribute description information and whose sentiment value is consistent with that of the attribute description information as the native evaluation information matching the attribute description information;

The first mounting unit is configured to mount native evaluation information matching the attribute description information into the attribute description information.

In an exemplary embodiment, the information extraction device further includes:

The second quantity unit is configured to count the number of pieces of native evaluation information mounted on each item of attribute description information of the object;

The sorting unit is configured to sort the attribute description information based on the descending order of the number of native evaluation information mounted;

A similarity calculation unit configured to perform similarity calculations on each attribute description information of the object;

The similar attribute information pair determination unit is configured to determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;

The second mounting unit is configured to mount the native evaluation information corresponding to the lower-ranked attribute description information in each similar attribute information pair onto the native evaluation information corresponding to the higher-ranked attribute description information in that pair.

In an exemplary embodiment, the semantic unit splitting unit includes:

The first splitting unit is configured to split the native evaluation information into semantic units to obtain multiple first semantic units;

The first deduplication unit is configured to deduplicate the plurality of first semantic units to obtain the plurality of native semantic units;

The second splitting unit is configured to split the target evaluation information into semantic units to obtain a plurality of second semantic units;

The second deduplication unit is configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.

According to a third aspect of an embodiment of the present disclosure, there is provided an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the information extraction method described above.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-volatile computer-readable storage medium. When instructions in the computer-readable storage medium are executed by a processor of a server, the server can execute the above-mentioned information extraction method.

According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product includes a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, so that the device executes the above information extraction method.

The present disclosure performs language conversion on the native evaluation information of multiple objects to obtain corresponding target evaluation information, so that evaluation information in multiple languages is converted into evaluation information in a unified target language, which makes subsequent processing based on the target evaluation information more convenient. The native evaluation information and the target evaluation information of the multiple objects are then split into semantic units, and semantic unit matching groups are constructed based on the splitting results. Each semantic unit matching group includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit, and the multiple native semantic units correspond to different languages, so that semantic units in different languages with the same semantics have a matching relationship. Finally, the multilingual attribute description information corresponding to the multiple objects is obtained based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups. Because the multilingual attribute description information in the present disclosure is extracted from the native evaluation information, its localized expression in each language is improved and the inaccuracy that arises when relying on machine translation is avoided, thereby improving the accuracy of the multilingual attribute description information.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Brief Description of the Drawings

The accompanying drawings here are incorporated into the specification and constitute a part of the specification, show embodiments consistent with the disclosure, and are used together with the description to explain the principle of the disclosure, and do not constitute an improper limitation of the disclosure.

Fig. 1 is a schematic diagram showing an implementation environment according to an exemplary embodiment.

Fig. 2 is a flowchart of an information extraction method according to an exemplary embodiment.

Fig. 3 is a schematic diagram of language conversion according to an exemplary embodiment.

Fig. 4 is a flowchart of a semantic unit splitting method according to an exemplary embodiment.

Fig. 5 is a schematic diagram showing semantic unit splitting according to an exemplary embodiment.

Fig. 6 is a schematic diagram of a multilingual phrase matching process according to an exemplary embodiment.

Fig. 7 is a schematic diagram showing a multilingual phrase matching table according to an exemplary embodiment.

Fig. 8 is a flowchart of a semantic clustering method according to an exemplary embodiment.

Fig. 9 is a flow chart of a method for determining multilingual attribute description information corresponding to multiple objects according to an exemplary embodiment.

Fig. 10 is a flowchart showing a method for generating semantic vectors according to an exemplary embodiment.

Fig. 11 is a flowchart showing a method for determining clustering semantic units according to an exemplary embodiment.

Fig. 12 is a flowchart of a method for determining corresponding attribute description information for each object according to an exemplary embodiment.

Fig. 13 is a flow chart of a method for evaluating and mounting according to an exemplary embodiment.

Fig. 14 is a flowchart of a method for merging attribute description information according to an exemplary embodiment.

Fig. 15 is a schematic diagram of an information extraction device according to an exemplary embodiment.

Fig. 16 is a schematic structural diagram of a device according to an exemplary embodiment.

Detailed Description

In order to enable ordinary persons in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings.

It should be noted that the terms "first" and "second" in the specification and claims of the present disclosure and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as recited in the appended claims.

First, the relevant terms involved in the embodiments of the present disclosure are described as follows:

Clustering: The process of semantically merging text into multiple classes.

Native evaluation: the actual evaluation text input by the user.

Word vector: also known as word embedding, a general term for language models and representation learning techniques in natural language processing. Conceptually, it refers to embedding a high-dimensional space, whose dimensionality equals the number of all words, into a continuous vector space of much lower dimension; each word or phrase is mapped to a vector over the real numbers.

In the related art, public impression words are mainly generated through manual addition assisted by algorithms, and their multilingual expression is completed by machine translation. The accuracy of multilingual public impression words obtained in this way is limited by the quality of machine translation, and their localized expression in each language is poor.

Please refer to FIG. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present disclosure. The implementation environment may include at least one first terminal 110 and a second terminal 120, and the first terminal 110 and the second terminal 120 may communicate data through a network.

In some embodiments, the first terminal 110 can publish evaluation information on multiple objects in the relevant object platform; the second terminal 120 can obtain the evaluation information on the multiple objects and perform text analysis and information extraction on it to generate attribute description information corresponding to each object; thus, when the evaluation information of an object is browsed through the first terminal 110, the attribute description information corresponding to the object can be displayed.

The first terminal 110 may communicate with the second terminal 120 based on a browser/server mode (Browser/Server, B/S) or a client/server mode (Client/Server, C/S). The first terminal 110 may include physical devices such as smart phones, tablet computers, notebook computers, digital assistants, smart wearable devices, vehicle terminals and servers, and may also include software running on physical devices, such as application programs. The operating system running on the first terminal 110 in the embodiment of the present disclosure may include, but is not limited to, Android, iOS, Linux, Windows and so on.

The second terminal 120 and the first terminal 110 can establish a communication connection in a wired or wireless manner. The second terminal 120 can include an independently operated server, a distributed server, or a server cluster composed of multiple servers, wherein the server can be a cloud server.

To avoid the accuracy of multilingual public impression words being limited by the quality of machine translation, and their localized expression being poor, an embodiment of the present disclosure provides an information extraction method. The method may be executed by the second terminal in FIG. 1, which may be a server. Referring to FIG. 2, the information extraction method may include: step S210, performing language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the multiple objects includes native evaluation information in multiple languages; step S220, performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units; step S230, constructing semantic unit matching groups, wherein each semantic unit matching group includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, and the plurality of native semantic units correspond to different languages; and step S240, obtaining multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups.

S210. Perform language conversion on native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein, the native evaluation information of multiple objects includes native evaluation information in multiple languages.

That the native evaluation information of multiple objects includes native evaluation information in multiple languages may mean that the native evaluation information of different objects may include native evaluation information in the same language and may also include native evaluation information in different languages; that is, the set of languages covered by the native evaluation information may differ from object to object.

For example, the native evaluation information of object 1 may include native evaluation information in language 1 and language 2, and that of object 2 may include native evaluation information in language 2 and language 3. Object 1 and object 2 then both have native evaluation information in language 2, and also have native evaluation information in different languages, namely language 1 and language 3. Native evaluation information in the same language shares only the language; the evaluation content is not necessarily the same.

Before language conversion, the native evaluation information may also be preprocessed. The preprocessing may include: first, performing language identification on the native evaluation information to obtain the actual language of each native evaluation; then performing special-character processing to remove meaningless characters from the text; and finally checking the spelling of words and correcting misspelled words. This yields more standardized text data in preparation for the subsequent algorithms.

In some embodiments, the target language may be English. Referring to FIG. 3, which shows a schematic diagram of language conversion, native evaluation information 1, whose language is Russian, is translated into corresponding English evaluation information 1; similarly, native evaluation information 2, whose language is Spanish, can be translated into corresponding English evaluation information 2.
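The special-character step of this preprocessing can be sketched with the Python standard library alone. The function name `clean_text` and the set of punctuation that is kept are assumptions; language identification and spelling correction are omitted here because the disclosure does not name the tools used for them.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Normalize a native evaluation and drop characters that carry no meaning."""
    # NFKC folds compatibility forms (e.g. full-width letters) to canonical ones.
    text = unicodedata.normalize("NFKC", text)
    # Assumption: keep letters and digits of any script, whitespace, and basic
    # punctuation; everything else (symbols, emoji, decorations) is removed.
    kept = "".join(
        ch for ch in text if ch.isalnum() or ch.isspace() or ch in ".,!?'-"
    )
    # Collapse runs of whitespace left behind by the removal.
    return re.sub(r"\s+", " ", kept).strip()


cleaned = clean_text("Great  dress!!! \u2605\u2605\u2605 \n love it \u2764")
```

Because `str.isalnum` is Unicode-aware, text in scripts such as Cyrillic passes through unchanged, which matters for multilingual evaluations.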

S220. Perform semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units.

Please refer to FIG. 4 , which shows a semantic unit splitting method, including the following steps S410 to S440.

S410. Perform semantic unit splitting on the native evaluation information to obtain multiple first semantic units.

S420. Deduplicate the multiple first semantic units to obtain the multiple native semantic units.

S430. Perform semantic unit splitting on the target evaluation information to obtain multiple second semantic units.

S440. Deduplicate the multiple second semantic units to obtain the multiple target semantic units.

In the embodiment of the present disclosure, a semantic unit may be a short clause. In native evaluation information, users often combine multiple evaluation targets in one evaluation clause, so the corresponding target evaluation information also contains parallel constructions, which degrades the text clustering effect. For example, in an e-commerce scenario, the object can be clothing, and the evaluation targets for clothing can be material quality, workmanship, and logistics. The present disclosure recognizes parallel evaluation targets in the evaluation information through conjunctions, and then splits the evaluation information into multiple complete clauses through grammatical rules. In addition, because multiple pieces of native evaluation information and multiple pieces of target evaluation information are split, some of the resulting semantic units may be repeated. In this case, the semantic units can be deduplicated to avoid redundancy and improve data processing efficiency.

Please refer to FIG. 5, which shows a schematic diagram of semantic unit splitting. As FIG. 5 shows, the English evaluation information "High quality sewing and material" is split into "High quality sewing" and "High quality material"; the evaluation targets here are "sewing" and "material".
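The splitting and deduplication described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed grammar rules: the conjunction pattern, the assumption that the shared modifier is everything before the last word of the first part, and the function names `split_semantic_units` and `deduplicate` are all hypothetical.

```python
import re

# Hypothetical conjunction pattern; the disclosure does not enumerate its conjunctions.
CONJUNCTIONS = r"\band\b|\bor\b|,"

def split_semantic_units(clause):
    """Split a clause with parallel evaluation targets into complete short clauses."""
    parts = [p.strip() for p in re.split(CONJUNCTIONS, clause) if p.strip()]
    if len(parts) < 2:
        return parts
    # Assumption: the first part is "<modifier> <target>", and the modifier
    # (everything before its last word) is shared by the later targets.
    modifier = " ".join(parts[0].split()[:-1])
    units = [parts[0]]
    for target in parts[1:]:
        units.append(f"{modifier} {target}".strip())
    return units

def deduplicate(units):
    """Drop repeated units (case-insensitive) while keeping first-seen order."""
    seen = set()
    unique = []
    for unit in units:
        key = unit.lower()
        if key not in seen:
            seen.add(key)
            unique.append(unit)
    return unique

units = split_semantic_units("High quality sewing and material")
```

Real evaluation text would need language-specific rules; the single-modifier assumption only covers the simple pattern shown in FIG. 5.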

S230. Construct semantic unit matching groups; each semantic unit matching group includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages.

In the present disclosure, the Champollion algorithm can be used for text alignment to obtain matching pairs of native semantic units and target semantic units for each object, wherein each semantic unit matching group includes multiple semantic units of different languages with the same semantics.

Please refer to FIG. 6, which shows a schematic diagram of a multilingual phrase matching process. Based on identical semantics, each native semantic unit is mapped to its corresponding target semantic unit to form a semantic unit matching relationship; each matching pair (pair) includes a native semantic unit and a target semantic unit.

Please refer to FIG. 7, which shows a schematic diagram of a multilingual phrase matching table. Among the multiple matching pairs in FIG. 6, some pairs are semantically identical, and semantic unit matching groups can be generated from them. For example, for pair(Rc_k, Tm_n) and pair(Se_f, Tc_j) in FIG. 6, where Tm_n = Tc_j (that is, the target semantic units are the same), the corresponding Rc_k and Se_f also have the same semantics. Because Rc_k and Se_f correspond to different languages, the semantic unit matching group (Tm_n, Rc_k, Se_f) can be constructed. A semantic unit matching table can be formed from multiple semantic unit matching groups, and the table can serve as an index for subsequent multilingual expression.

Because the content of native evaluation information differs between objects (for example, some objects have few native evaluations, or native evaluation information in only one language), a multilingual semantic unit matching relationship cannot always be constructed from an object's own native evaluation information. In the present disclosure, matching is performed based on the matching pairs of every object, so that a multilingual semantic unit matching relationship can be constructed by complementing evaluation information between objects.
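One way to picture the grouping of matching pairs is the Python sketch below. It assumes the alignment step has already produced (native unit, language, target unit) triples; the function name `build_matching_groups`, the language codes, and the extra unit names `Ra_b` and `Tx_y` are hypothetical placeholders.

```python
from collections import defaultdict

def build_matching_groups(pairs):
    """Group matching pairs that share the same target semantic unit.

    pairs: iterable of (native_unit, language, target_unit) triples, assumed
    to come from a prior text-alignment step.
    Returns {target_unit: {language: native_unit}}.
    """
    groups = defaultdict(dict)
    for native_unit, language, target_unit in pairs:
        # Keep the first native unit seen for each language.
        groups[target_unit].setdefault(language, native_unit)
    # A matching group needs native units in at least two different languages.
    return {target: langs for target, langs in groups.items() if len(langs) >= 2}

pairs = [
    ("Rc_k", "ru", "Tm_n"),  # language codes are illustrative assumptions
    ("Se_f", "es", "Tm_n"),
    ("Ra_b", "ru", "Tx_y"),  # only one language, so no group is formed
]
groups = build_matching_groups(pairs)
```

Because the pairs of all objects are pooled before grouping, an object with evaluations in only one language can still benefit from groups built from other objects.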

S240. Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, obtain multilingual attribute description information corresponding to the multiple objects.

Please refer to FIG. 8, which shows a semantic clustering method, including the following steps S810 to S830.

S810. Generate semantic vectors corresponding to the multiple target semantic units.

S820. Perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes.

S830. Based on the semantic vectors in each of the target classes and the semantic unit matching group, determine a plurality of clustered semantic units from the plurality of target semantic units.

In the present disclosure, the K-means clustering algorithm can be used to obtain the category to which each semantic unit belongs. K-means is a common, efficient unsupervised clustering algorithm. With this algorithm, semantic units with the same semantics can be clustered into the same class, where K is determined by the silhouette coefficient.

The clustering of the target semantic units in the present disclosure can be realized based on the corresponding semantic vectors, since the semantic vectors can fully reflect the characteristic information of the corresponding semantic units and are easy to calculate, thus improving the accuracy and convenience of the semantic unit clustering.
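As an illustration of choosing K by the silhouette coefficient, the following pure-Python sketch runs K-means for several candidate values of K and keeps the one with the highest mean silhouette. The farthest-point initialisation, the function names, and the toy 2-D points are assumptions made for repeatability; a production system would more likely use an existing clustering library.

```python
import math

def init_centers(points, k):
    """Deterministic farthest-point initialisation (an assumption, for repeatability)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated here as worst case
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, label in zip(points, labels) if label == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, candidates):
    """Pick the K whose clustering has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))

# Toy semantic vectors forming three well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
best_k = choose_k(points, range(2, 6))
```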

Please refer to FIG. 9, which shows a method for determining multilingual attribute description information corresponding to multiple objects, including steps S910 and S920.

S910. Based on the semantic unit matching group, determine a plurality of native semantic units that match each of the clustered semantic units.

S920. Determine each of the clustering semantic units and multiple native semantic units matching the clustering semantic units as multilingual attribute description information corresponding to the multiple objects.

The attribute description information here can be used to represent the extraction and generalization information of the characteristics of multiple objects, which can reflect the characteristic information of the object, and the corresponding object can be roughly understood through the attribute description information. The multilingual attribute description information obtained here may refer to comprehensive attribute description information of multiple objects, or multiple attribute description information, and each item of attribute description information includes attribute description information in multiple languages and has the same semantics.
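Steps S910 and S920 amount to a lookup in the semantic unit matching groups. A minimal sketch, assuming the groups are stored as a mapping from target unit to `{language: native unit}` and that the target language is English (the function name, the `"en"` default and the sample phrases are illustrative only):

```python
def multilingual_descriptions(cluster_units, matching_groups, target_language="en"):
    """Attach the matched native units to each clustered (target-language) unit.

    Returns one {language: expression} mapping per clustered semantic unit.
    """
    descriptions = []
    for unit in cluster_units:
        expressions = {target_language: unit}
        # S910: native units that match this clustered unit.
        expressions.update(matching_groups.get(unit, {}))
        # S920: the unit together with its matches is one item of description information.
        descriptions.append(expressions)
    return descriptions

matching_groups = {"good quality": {"es": "buena calidad", "pt": "boa qualidade"}}
descriptions = multilingual_descriptions(["good quality"], matching_groups)
```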

Please refer to FIG. 10, which shows a method for generating a semantic vector, including steps S1010 to S1030.

S1010. Obtain the word vector contained in the target semantic unit based on the word vector of each word in each target semantic unit.

S1020. Average the word vectors included in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit.

S1030. Based on the semantic vectors corresponding to each target semantic unit, obtain semantic vectors corresponding to the multiple target semantic units.

In the present disclosure, before the semantic vector of a target semantic unit is determined, the word vector of each word in the target semantic unit needs to be calculated. The words in each target semantic unit are obtained through word segmentation, together with part-of-speech tagging. The semantic units are fed into a Word2Vec model by category to train the word vector of each word, and the segmented words are indexed in a word vector table. For each target semantic unit, the words it contains are looked up in the word vector table to obtain the set of word vectors for that unit, and the mean of all word vectors in the set is taken as the semantic vector of the target semantic unit. Computing the semantic vector of the target semantic unit from pre-generated word vectors can improve the accuracy and convenience of semantic vector calculation.

The word vectors can also be generated using a dynamic semantic vector model.
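The averaging step (S1020) can be sketched as follows. The word vector table here is a hand-written toy dictionary standing in for vectors trained with Word2Vec, and whitespace splitting stands in for real word segmentation; both are assumptions for illustration.

```python
def semantic_vector(unit, word_vectors):
    """Average the word vectors of the words in one target semantic unit."""
    words = unit.lower().split()  # assumption: whitespace tokenisation
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError(f"no known words in unit: {unit!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 2-dimensional word vector table (hypothetical values).
word_vectors = {
    "high": [1.0, 0.0],
    "quality": [0.0, 1.0],
    "sewing": [2.0, 2.0],
}
vector = semantic_vector("High quality sewing", word_vectors)
```

Words missing from the table are skipped in this sketch; how out-of-vocabulary words are handled is not specified in the disclosure.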

Please refer to FIG. 11, which shows a method for determining a clustering semantic unit, including the following steps S1110 to S1150.

S1110. Determine the center semantic vector of each target class.

S1120. Based on the distance between each semantic vector in each of the target classes and the central semantic vector, determine a candidate semantic vector for each of the target classes.

S1130. Obtain a plurality of candidate semantic units according to the target semantic units corresponding to the candidate semantic vectors of each target class.

S1140. Based on the semantic unit matching group, determine the number of native semantic units matching each candidate semantic unit.

S1150. Based on the number of native semantic units matching each candidate semantic unit, determine the cluster semantic unit from the candidate semantic units.

In each target class, the central semantic vector can be determined first, and then the distance between each other semantic vector in the target class and the central semantic vector can be calculated. The semantic vectors are sorted from nearest to farthest by their distance to the central semantic vector, and, for example, the top 10% of semantic vectors are selected as the candidate semantic vectors of the target class.

After the candidate semantic vectors are obtained, the corresponding candidate semantic units can be obtained, and, based on the semantic unit matching groups above, the number of native semantic units matched by each candidate semantic unit can be determined. In the embodiment of the present disclosure, the candidate semantic unit with the larger number of matching native semantic units is selected as the clustering semantic unit: the more native semantic units it matches, the more languages the corresponding multilingual expression can cover. Multilingual semantic unit expression improves the diversity and richness of semantic unit expression.
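Steps S1110 to S1150 can be sketched for a single target class as below. The sketch assumes the central semantic vector is the mean of the class's vectors and that exactly one clustering semantic unit is chosen per class; the function name, the `top_fraction` parameter and the toy data are hypothetical.

```python
import math

def select_cluster_unit(class_units, vectors, matching_groups, top_fraction=0.1):
    """Pick the clustering semantic unit for one target class.

    class_units: target semantic units assigned to the class.
    vectors: {unit: semantic vector}.
    matching_groups: {target unit: {language: native unit}}.
    """
    dim = len(vectors[class_units[0]])
    # S1110: central semantic vector (assumed to be the class mean).
    center = [sum(vectors[u][i] for u in class_units) / len(class_units) for i in range(dim)]
    # S1120/S1130: keep the units nearest the center as candidates.
    ranked = sorted(class_units, key=lambda u: math.dist(vectors[u], center))
    n_candidates = max(1, math.ceil(len(ranked) * top_fraction))
    candidates = ranked[:n_candidates]
    # S1140/S1150: prefer the candidate matched by the most native units.
    return max(candidates, key=lambda u: len(matching_groups.get(u, {})))

vectors = {"u1": (0.0, 0.0), "u2": (1.0, 0.0), "u3": (0.0, 1.0), "u4": (9.0, 9.0)}
matching_groups = {"u2": {"ru": "n1"}, "u3": {"ru": "n2", "es": "n3"}}
chosen = select_cluster_unit(["u1", "u2", "u3", "u4"], vectors, matching_groups, top_fraction=0.5)
```

In the toy data, "u2" and "u3" are equally close to the center, and "u3" is chosen because it is matched in two languages rather than one.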

Please refer to FIG. 12, which shows a method for determining corresponding attribute description information for each object, including steps S1210 to S1230.

S1210. In response to determining the attribute description information of each object, traverse each of the clustering semantic units.

S1220. Find the current clustering semantic unit in the original evaluation information of the object.

S1230. In response to the original evaluation information of the object including the current clustering semantic unit, determine the current clustering semantic unit as attribute description information of the object.

Because the clustering semantic units above are determined from the native evaluation information of multiple objects, they describe the objects collectively, and not every object corresponds to all of them. The attribute description information therefore needs to be personalized for each object separately. Each clustering semantic unit is matched against the native evaluation information of each object to determine that object's attribute description information, which further improves the personalized display of object attribute information. The clustering semantic units are generated from the native evaluation information of multiple objects in order to realize the complementarity of multilingual expression between objects.
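The per-object traversal (S1210 to S1230) might look like the sketch below. How a target-language clustering semantic unit is "found" in native-language evaluation text is not spelled out here, so the sketch assumes a case-insensitive substring search for either the unit itself or any of its matched native units; the function name and sample texts are illustrative.

```python
def attributes_for_object(evaluations, cluster_units, matching_groups):
    """Return the clustering semantic units that occur in one object's evaluations.

    evaluations: native evaluation texts of a single object.
    matching_groups: {target unit: {language: native unit}}.
    """
    texts = [text.lower() for text in evaluations]
    attributes = []
    for unit in cluster_units:  # S1210: traverse every clustering semantic unit
        # Assumption: the unit counts as present if it, or a matched native unit, appears.
        variants = [unit, *matching_groups.get(unit, {}).values()]
        if any(v.lower() in text for v in variants for text in texts):  # S1220
            attributes.append(unit)  # S1230
    return attributes

matching_groups = {
    "good quality": {"es": "buena calidad", "pt": "boa qualidade"},
    "slow delivery": {"es": "entrega lenta"},
}
attrs = attributes_for_object(
    ["Buena calidad, muy contento"],
    ["good quality", "slow delivery"],
    matching_groups,
)
```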

Please refer to FIG. 13, which shows a method for mounting evaluation information, including steps S1310 to S1330.

S1310. For each item of attribute description information of the object, determine the sentiment value of the attribute description information.

S1320. Determine the original evaluation information of the object that includes the attribute description information and is consistent with the sentiment value of the attribute description information as the original evaluation information that matches the attribute description information.

S1330. Mount native evaluation information matching the attribute description information into the attribute description information.

Sentiment values can include positive, negative, and neutral. When evaluation information is mounted, requiring the sentiment values to be consistent improves the accuracy of mounting. Through the attribute description information, the user can quickly gain a general understanding of the current object. To let the user obtain more detailed evaluation information, each item of attribute description information can be mounted with its corresponding evaluation information, which classifies the evaluation information and allows it to be retrieved by category. The evaluation information related to an item of attribute description information can then be obtained from that item, which improves the convenience of obtaining evaluation information.
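The sentiment-consistent mounting in S1310 to S1330 can be illustrated as follows. The sentiment classifier is passed in as a callable because the disclosure does not name one; `toy_sentiment` is a deliberately naive keyword stand-in, and the function names are hypothetical.

```python
def mount_evaluations(attribute, attribute_sentiment, evaluations, sentiment_of):
    """Select the evaluations to mount on one item of attribute description information.

    An evaluation is mounted only if it contains the attribute text and its
    sentiment value equals the attribute's sentiment value.
    """
    return [
        text for text in evaluations
        if attribute.lower() in text.lower() and sentiment_of(text) == attribute_sentiment
    ]

def toy_sentiment(text):
    """Naive stand-in classifier: 'negative' if the text contains 'not'."""
    return "negative" if "not" in text.lower().split() else "positive"

evaluations = [
    "Good quality and fast shipping",
    "Not good quality at all",
    "Nice color",
]
mounted = mount_evaluations("good quality", "positive", evaluations, toy_sentiment)
```

The second evaluation mentions the attribute but has the opposite sentiment value, so it is not mounted.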

Please refer to FIG. 14, which shows a method for merging attribute description information, including steps S1410 to S1470.

S1410. Perform similarity calculation on any two items of attribute description information of the object.

S1420. Determine whether there is similar attribute description information among the items of attribute description information of the object; if there is, perform step S1430; if there is not, perform step S1470.

S1430. Determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value.

S1440. Count the number of native evaluation information items mounted on each item of attribute description information of the object.

S1450. Sort the items of attribute description information in descending order of the number of mounted native evaluation information items.

S1460. Mount the native evaluation information corresponding to the lower-ranked attribute description information in the similar attribute information pair to the native evaluation information corresponding to the higher-ranked attribute description information in the similar attribute information pair.

S1470. Determine that the current attribute description information is the attribute description information of the object.

For each object, the attribute description information obtained by the above method may contain items with inconsistent semantic granularity; for example, in an e-commerce scenario, the attribute description information may include items such as "logistics fast" and "delivery fast". In this case, the items with inconsistent semantic granularity can be merged, which avoids the inconsistent semantic granularity that unsupervised clustering produces and makes the semantic level of the attribute description information more consistent.

The method for merging attribute description information in the embodiment of the present disclosure may include steps 1 to 3 below.

1. ESIM model training: first, similar attribute information pairs are obtained from open-source data sets and rules to construct a training data set, and then the ESIM model is trained. ESIM stands for Enhanced Sequential Inference Model. In this embodiment, the ESIM model is used to judge the similarity of attribute description information.

2. Judging the similarity of attribute description information: sort the attribute description information in descending order of the number of mounted native evaluation information items, and use the model from step 1 to identify the similarity relationship between higher-ranked and lower-ranked attribute description information.

3. Merging similar attribute description information: if step 2 judges that a higher-ranked item and a lower-ranked item of attribute description information are similar, replace the lower-ranked item with the higher-ranked item, and mount the evaluation information corresponding to the lower-ranked item to the higher-ranked item.

Repeat steps 2 and 3 until no two items of the object's attribute description information are similar to each other.
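The repeated rank-compare-merge loop of steps 2 and 3 can be sketched as below. The similarity judgment is passed in as a callable, so a trained ESIM model could be plugged in; the lambda used in the example is only a hard-coded stand-in, and the function name and sample data are hypothetical.

```python
def merge_similar(mounted, is_similar):
    """Merge similar attribute descriptions until none are similar.

    mounted: {attribute description: list of mounted evaluations}.
    is_similar: callable (higher_ranked, lower_ranked) -> bool, standing in
    for the trained similarity model.
    """
    merged = {attr: list(evals) for attr, evals in mounted.items()}
    changed = True
    while changed:
        changed = False
        # Step 2: rank by number of mounted evaluations, largest first.
        ranked = sorted(merged, key=lambda a: len(merged[a]), reverse=True)
        for i, high in enumerate(ranked):
            for low in ranked[i + 1:]:
                if is_similar(high, low):
                    # Step 3: the higher-ranked item absorbs the lower-ranked one.
                    merged[high].extend(merged.pop(low))
                    changed = True
                    break
            if changed:
                break  # re-rank before looking for the next similar pair
    return merged

mounted = {
    "logistics fast": ["r1", "r2", "r3"],
    "delivery fast": ["r4"],
    "good material": ["r5", "r6"],
}
similar_pairs = {frozenset({"logistics fast", "delivery fast"})}
result = merge_similar(mounted, lambda a, b: frozenset({a, b}) in similar_pairs)
```

Re-ranking after every merge keeps the ordering consistent with the updated evaluation counts, at the cost of repeated sorting.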

In the embodiment of the present disclosure, when the user terminal displays the attribute description information, the corresponding display language may be determined based on the user's definition, or may be determined based on the location information of the user terminal.
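A small sketch of display-language selection is given below. The priority of the user's own setting over the terminal's location, the region-to-language table, and the fallback to the target language are all assumptions for illustration; the disclosure only states that either source may be used.

```python
# Hypothetical mapping from terminal region code to display language.
REGION_TO_LANGUAGE = {"RU": "ru", "ES": "es", "BR": "pt"}

def choose_display_language(user_setting, region_code, available, fallback="en"):
    """Pick the language in which to display an item of attribute description information.

    user_setting: language chosen by the user, or None.
    region_code: region derived from the terminal's location, or None.
    available: languages in which the description actually exists.
    """
    preferred = user_setting or REGION_TO_LANGUAGE.get(region_code)
    return preferred if preferred in available else fallback

description = {"en": "good quality", "es": "buena calidad", "pt": "boa qualidade"}
language = choose_display_language(None, "ES", description.keys())
text = description[language]
```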

The present disclosure mines the attribute description information in the object dimension and improves the personalization of the attribute description information through the differences in evaluation information between objects. On the clustering results, the ESIM algorithm is used to merge similar attribute description information, which avoids the inconsistent semantic granularity of attribute description information in unsupervised clustering and makes its semantic level more consistent. Through the complementarity of evaluation information content between objects, the matching relationship between native evaluation information and target evaluation information is constructed, so that the attribute description information is more localized in multilingual presentation.

The present disclosure performs language conversion on the native evaluation information of multiple objects to obtain corresponding target evaluation information. Converting multilingual evaluation information into evaluation information in a unified target language improves the convenience of subsequent processing based on the target evaluation information. The native evaluation information and target evaluation information of the multiple objects are then split into semantic units, and semantic unit matching groups are constructed from the splitting results. Each semantic unit matching group includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages, so that semantic units of different languages with the same semantics have a matching relationship. Multilingual attribute description information corresponding to the multiple objects is then obtained based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups. Because the multilingual attribute description information in the present disclosure is extracted from the native evaluation information, the effect of localized multilingual expression is improved and the inaccuracy of machine translation is avoided, which improves the accuracy of the multilingual attribute description information.

Fig. 15 is a block diagram of an information extraction device according to an exemplary embodiment. Referring to FIG. 15 , the information extraction device includes a language conversion unit 1510 , a semantic unit splitting unit 1520 , a semantic unit matching group construction unit 1530 and an information generation unit 1540 .

The language conversion unit 1510 is configured to perform language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein the native evaluation information of the multiple objects includes multilingual native evaluation information.

The semantic unit splitting unit 1520 is configured to split the native evaluation information and the target evaluation information into semantic units to obtain multiple native semantic units and multiple target semantic units.

The semantic unit matching group construction unit 1530 is configured to construct a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; The multiple native semantic units correspond to different languages.

The information generation unit 1540 is configured to obtain multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group.

In an exemplary embodiment, the information extraction device further includes:

In an exemplary embodiment, the information generating unit 1540 includes:

In an exemplary embodiment, the semantic vector generation unit includes:

In an exemplary embodiment, the first determination unit includes:

In an exemplary embodiment, the information extraction device further includes:

In an exemplary embodiment, the semantic unit splitting unit includes:

Regarding the apparatus in the foregoing embodiments, the specific manner in which each module executes operations has been described in detail in the embodiments related to the method, and will not be described in detail here.

In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions. In some embodiments, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like; when the instructions in the computer-readable storage medium are executed by the processor of the server, the server can perform any method as described above.

In an exemplary embodiment, there is also provided a computer program product comprising a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the device to perform any of the above methods.

Further, FIG. 16 shows a schematic diagram of a hardware structure of a device for implementing the method provided by the embodiment of the present disclosure, and the device may participate in constituting or include the apparatus provided by the embodiment of the present disclosure. As shown in FIG. 16, the device 10 may include one or more processors 102 (shown as 102a, 102b, ..., 102n in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for a communication function. In addition, it can also include a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which can be included as one of the ports of the I/O interface), a network interface, a power supply and/or a camera. Those skilled in the art can understand that the structure shown in FIG. 16 is only a schematic diagram, which does not limit the structure of the above-mentioned electronic device. For example, device 10 may also include more or fewer components than shown in FIG. 16, or have a different configuration than that shown in FIG. 16.

It should be noted that the one or more processors 102 and/or other data processing circuits described above may generally be referred to herein as "data processing circuits". The data processing circuit may be implemented in whole or in part as software, hardware, firmware or other arbitrary combinations. In addition, the data processing circuitry can be a single independent processing module, or be fully or partially integrated into any of the other elements in the device 10 (or mobile device). As involved in the embodiments of the present disclosure, the data processing circuit serves as a processor control (for example, the selection of the variable resistor terminal path connected to the interface).

The memory 104 can be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the method described in the embodiments of the present disclosure. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above-mentioned information extraction method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, and such remote memory may be connected to device 10 via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Examples of the aforementioned networks may include wireless networks provided by the communications provider of device 10 . In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet in a wireless manner.

The display may, for example, be a touchscreen liquid crystal display (LCD), which may enable a user to interact with the user interface of device 10 (or mobile device).

Any of the above-mentioned methods in this embodiment can be implemented based on the device shown in FIG. 16 .

All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, which are all regarded as the scope of protection required by the present disclosure.

Claims

An information extraction method, comprising:

Performing language conversion on the original evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of original evaluation information; wherein, the original evaluation information of the multiple objects includes multilingual original evaluation information;

performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;

Constructing a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages;

Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, the multilingual attribute description information corresponding to the multiple objects is obtained.
The information extraction method according to claim 1, further comprising:

generating semantic vectors corresponding to the plurality of target semantic units;

Semantic clustering is performed on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;

A plurality of clustered semantic units are determined from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
The information extraction method according to claim 2, wherein obtaining the multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering results of the plurality of target semantic units and the semantic unit matching group comprises:

Based on the semantic unit matching group, determine a plurality of native semantic units that match each of the clustered semantic units;

Each of the clustering semantic units and multiple native semantic units matching the clustering semantic units are determined as multilingual attribute description information corresponding to the multiple objects.
The information extraction method according to claim 2, wherein said generating semantic vectors corresponding to said plurality of target semantic units comprises:

Based on the word vector of each word in each of the target semantic units, the word vector contained in the target semantic unit is obtained;

Taking the average value of the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;

Semantic vectors corresponding to the multiple target semantic units are obtained based on the semantic vectors corresponding to each target semantic unit.
The information extraction method according to claim 2, wherein determining a plurality of clustered semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group comprises:

determining a central semantic vector for each of said target classes;

Based on the distance between each semantic vector in each of the target classes and the central semantic vector, determine a candidate semantic vector for each of the target classes;

According to the target semantic unit corresponding to the candidate semantic vector of each target class, a plurality of candidate semantic units are obtained;

determining the number of native semantic units matching each candidate semantic unit based on the semantic unit matching group;

The clustered semantic units are determined from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
The information extraction method according to claim 2, further comprising:

In response to determining the attribute description information of each object, traverse each clustering semantic unit, and perform the following operations based on each clustering semantic unit:

Find the current clustering semantic unit in the original evaluation information of the object;

In response to the original evaluation information of the object including the current cluster semantic information, the current cluster semantic unit is determined as the attribute description information of the object.
The information extraction method according to claim 6, further comprising:

For each attribute description information of the object, determine the sentiment value of the attribute description information;

Determining the original evaluation information of the object that includes the attribute description information and is consistent with the emotional value of the attribute description information as the original evaluation information that matches the attribute description information;

Mount the original evaluation information matching the attribute description information into the attribute description information.
The information extraction method according to claim 7, further comprising:

Perform similarity calculation on any two items of attribute description information of the object;

Determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value;

Counting the number of native evaluation information attached to each attribute description information of the object;

Based on the descending order of the number of native evaluation information mounted, sort the attribute description information;

Mount the native evaluation information corresponding to the attribute description information sorted last in the similar attribute information pair to the native evaluation information corresponding to the attribute description information sorted in front of the similar attribute information pair.
The information extraction method according to claim 1, wherein said splitting said native evaluation information and said target evaluation information into semantic units to obtain a plurality of original semantic units and a plurality of target semantic units comprises:

performing semantic unit splitting on the native evaluation information to obtain a plurality of first semantic units;

Deduplicating the plurality of first semantic units to obtain the plurality of original semantic units;

performing semantic unit splitting on the target evaluation information to obtain a plurality of second semantic units;

Deduplication is performed on the plurality of second semantic units to obtain the plurality of target semantic units.
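A minimal sketch of splitting followed by deduplication is given below. The splitting rule (breaking at punctuation and lowercasing) is an assumption made for illustration, since the claim does not fix how semantic units are delimited; `split_semantic_units` and `unique_units` are hypothetical names.

```python
import re


def split_semantic_units(review):
    """Split a review into semantic units at punctuation (an assumed, simple rule)."""
    parts = re.split(r"[,.;!?，。；！？]+", review)
    return [p.strip().lower() for p in parts if p.strip()]


def unique_units(reviews):
    """Split every review and deduplicate the units, preserving first-seen order."""
    seen = {}
    for review in reviews:
        for unit in split_semantic_units(review):
            seen.setdefault(unit, None)
    return list(seen)
```

The same function would be applied once to the native evaluation information and once to the target evaluation information.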
An information extraction device, comprising:

The language conversion unit is configured to perform language conversion on the native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein the native evaluation information of the plurality of objects includes multilingual native evaluation information;

a semantic unit splitting unit configured to split the native evaluation information and the target evaluation information into semantic units to obtain multiple native semantic units and multiple target semantic units;

A semantic unit matching group construction unit configured to construct semantic unit matching groups; wherein each of the semantic unit matching groups includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages;

The information generation unit is configured to obtain multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering results of the plurality of target semantic units and the matching group of the semantic units.
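One possible data structure for the semantic unit matching groups is sketched below. It assumes that an upstream step has already aligned each native semantic unit with its translated target semantic unit and supplies `(target_unit, language, native_unit)` triples; that alignment step and the name `build_matching_groups` are hypothetical and not taken from the claims.

```python
from collections import defaultdict


def build_matching_groups(aligned_units):
    """aligned_units: iterable of (target_unit, language, native_unit) triples.

    Returns {target_unit: {language: native_unit}}, i.e. one matching group per
    target semantic unit, holding same-meaning native units in different languages.
    """
    groups = defaultdict(dict)
    for target_unit, language, native_unit in aligned_units:
        # Keep the first native unit seen for each language.
        groups[target_unit].setdefault(language, native_unit)
    return dict(groups)
```

Keying the inner mapping by language is one way to satisfy the requirement that the native semantic units in a group correspond to different languages.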
The information extraction device according to claim 10, further comprising:

a semantic vector generating unit configured to generate semantic vectors corresponding to the plurality of target semantic units;

A semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes;

The first determining unit is configured to determine a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
The information extraction device according to claim 11, wherein the information generating unit comprises:

The second determining unit is configured to determine a plurality of native semantic units matching each of the clustering semantic units based on the semantic unit matching group;

The third determining unit is configured to determine each of the clustering semantic units and multiple native semantic units matching the clustering semantic units as the multilingual attribute description information corresponding to the multiple objects.
The information extraction device according to claim 11, wherein the semantic vector generating unit comprises:

The first word vector determination unit is configured to obtain the word vector contained in the target semantic unit based on the word vector of each word in each of the target semantic units;

an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;

The second word vector determining unit is configured to obtain semantic vectors corresponding to the plurality of target semantic units based on the semantic vectors corresponding to each target semantic unit.
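The averaging step can be illustrated with a short sketch. The lookup table `word_vectors`, whitespace tokenization, and the choice to skip words without a vector are assumptions for illustration only.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def semantic_vector(unit, word_vectors):
    """Average the vectors of the words in a target semantic unit.

    word_vectors is a hypothetical lookup {word: [float, ...]}; words missing
    from it are skipped. Returns None if no word has a vector.
    """
    found = [word_vectors[w] for w in unit.split() if w in word_vectors]
    return mean_vector(found) if found else None
```

Applying `semantic_vector` to every target semantic unit yields the semantic vectors that are later clustered.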
The information extraction device according to claim 11, wherein the first determining unit comprises:

a central semantic vector determining unit configured to determine the central semantic vector of each of the target classes;

a candidate semantic vector determining unit configured to determine a candidate semantic vector for each of the target classes based on the distance between each semantic vector in each of the target classes and the central semantic vector;

The candidate semantic unit determining unit is configured to obtain a plurality of candidate semantic units according to the target semantic unit corresponding to the candidate semantic vector of each target class;

The first quantity determining unit is configured to determine the quantity of native semantic units matching each candidate semantic unit based on the semantic unit matching group;

The clustering semantic unit determining unit is configured to determine the clustering semantic unit from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
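The selection of a clustering semantic unit within one target class might look like the sketch below. It assumes Euclidean distance, a candidate count of `top_k=2`, and that the candidate with the most matching native semantic units is chosen; the claim only states that the choice is based on that number, so these specifics are illustrative.

```python
import math


def centroid(vectors):
    """Element-wise mean of the vectors, used here as the central semantic vector."""
    n = len(vectors)
    return [sum(c) / n for c in zip(*vectors)]


def pick_clustering_unit(cluster, matching_groups, top_k=2):
    """cluster: list of (target_unit, vector) pairs belonging to one target class.
    matching_groups: {target_unit: collection of matched native units}.
    """
    center = centroid([vec for _, vec in cluster])
    # Candidates: the top_k units whose vectors lie closest to the center.
    by_distance = sorted(cluster, key=lambda item: math.dist(item[1], center))
    candidates = [unit for unit, _ in by_distance[:top_k]]
    # Among the candidates, prefer the unit matched by the most native units.
    return max(candidates, key=lambda u: len(matching_groups.get(u, ())))
```

Restricting the choice to units near the center keeps outliers from being selected even when they happen to have many matches.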
The information extraction device according to claim 11, further comprising:

The traversal unit is configured to, in response to determining the attribute description information of each object, traverse each of the clustering semantic units, and perform the following operations based on each of the clustering semantic units:

a search unit configured to search for the current clustering semantic unit in the original evaluation information of the object;

The fourth determining unit is configured to determine the current clustering semantic unit as the attribute description information of the object in response to the original evaluation information of the object containing the current clustering semantic unit.
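The traversal and search can be sketched as a simple loop. As an assumption not stated in the claim, a clustering semantic unit is treated as found when either the unit itself or any native semantic unit from its matching group occurs as a case-insensitive substring of a review; `attributes_for_object` is a hypothetical name.

```python
def attributes_for_object(reviews, clustering_units, matching_groups):
    """Return the clustering units that occur in at least one review of the object."""
    found = []
    lowered = [r.lower() for r in reviews]
    for unit in clustering_units:
        # Assumption: a unit "occurs" if it or any matched native unit is a substring.
        variants = [unit, *matching_groups.get(unit, ())]
        if any(v.lower() in review for v in variants for review in lowered):
            found.append(unit)
    return found
```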
The information extraction device according to claim 15, further comprising:

an emotional value determination unit configured to determine the emotional value of the attribute description information for each item of attribute description information of the object;

The fifth determination unit is configured to determine native evaluation information of the object that includes the attribute description information and is consistent with the emotional value of the attribute description information as the native evaluation information that matches the attribute description information;

The first mounting unit is configured to mount native evaluation information matching the attribute description information into the attribute description information.
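The sentiment-consistent mounting can be sketched as below. The toy word lexicon stands in for whatever sentiment model an implementation would use; the lexicon contents, the three-valued emotional value, and the name `mount_reviews` are all assumptions for illustration.

```python
POSITIVE = {"good", "great", "fast"}  # toy lexicon, assumed for illustration
NEGATIVE = {"bad", "poor", "slow"}


def sentiment(text):
    """Return 1, -1 or 0 from a toy word-count lexicon (stand-in for a real model)."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def mount_reviews(attributes, reviews):
    """Attach to each attribute the reviews that contain it and share its sentiment."""
    mounted = {}
    for attr in attributes:
        polarity = sentiment(attr)
        mounted[attr] = [
            r for r in reviews
            if attr.lower() in r.lower() and sentiment(r) == polarity
        ]
    return mounted
```

A review that mentions the attribute but carries the opposite overall sentiment is left unmounted, which is the filtering behaviour the claim describes.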
The information extraction device according to claim 16, further comprising:

The second quantity unit is configured to count the number of pieces of native evaluation information mounted to each item of attribute description information of the object;

The sorting unit is configured to sort the attribute description information based on the descending order of the number of native evaluation information mounted;

A similarity calculation unit configured to perform similarity calculations on each attribute description information of the object;

The similar attribute information pair determination unit is configured to determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;

The second mounting unit is configured to mount the native evaluation information corresponding to the attribute description information ranked last in the similar attribute information pair to the native evaluation information corresponding to the attribute description information ranked first in the similar attribute information pair.
The information extraction device according to claim 10, wherein the semantic unit splitting unit comprises:

The first splitting unit is configured to split the native evaluation information into semantic units to obtain multiple first semantic units;

The first deduplication unit is configured to deduplicate the plurality of first semantic units to obtain the plurality of native semantic units;

The second splitting unit is configured to split the target evaluation information into semantic units to obtain a plurality of second semantic units;

The second deduplication unit is configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
An electronic device comprising:

a processor;

a memory for storing instructions executable by the processor;

Wherein, the processor is configured to execute the instructions to achieve the following steps:

Performing language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein, the native evaluation information of the multiple objects includes multilingual native evaluation information;

performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;

Constructing a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages;

Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, the multilingual attribute description information corresponding to the multiple objects is obtained.
The electronic device of claim 19, wherein the processor is further configured to:

generating semantic vectors corresponding to the plurality of target semantic units;

Semantic clustering is performed on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;

A plurality of clustered semantic units are determined from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
The electronic device of claim 20, wherein the processor is further configured to:

Based on the semantic unit matching group, determine a plurality of native semantic units that match each of the clustered semantic units;

Each of the clustering semantic units and multiple native semantic units matching the clustering semantic units are determined as multilingual attribute description information corresponding to the multiple objects.
The electronic device of claim 20, wherein the processor is further configured to:

Based on the word vector of each word in each of the target semantic units, the word vector contained in the target semantic unit is obtained;

Taking the average value of the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;

Semantic vectors corresponding to the multiple target semantic units are obtained based on the semantic vectors corresponding to each target semantic unit.
The electronic device of claim 20, wherein the processor is further configured to:

determining a central semantic vector for each of said target classes;

Based on the distance between each semantic vector in each of the target classes and the central semantic vector, determine a candidate semantic vector for each of the target classes;

According to the target semantic unit corresponding to the candidate semantic vector of each target class, a plurality of candidate semantic units are obtained;

determining the number of native semantic units matching each candidate semantic unit based on the semantic unit matching group;

Determining the clustering semantic units from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
The electronic device of claim 20, wherein the processor is further configured to:

In response to determining the attribute description information of each object, traverse each clustering semantic unit, and perform the following operations based on each clustering semantic unit:

Find the current clustering semantic unit in the original evaluation information of the object;

In response to the original evaluation information of the object including the current clustering semantic unit, the current clustering semantic unit is determined as the attribute description information of the object.
The electronic device of claim 24, wherein the processor is further configured to:

For each attribute description information of the object, determine the sentiment value of the attribute description information;

Determining the original evaluation information of the object that includes the attribute description information and is consistent with the emotional value of the attribute description information as the original evaluation information that matches the attribute description information;

Mount the original evaluation information matching the attribute description information into the attribute description information.
The electronic device of claim 25, wherein the processor is further configured to:

Perform similarity calculation on any two items of attribute description information of the object;

Determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value;

Counting the number of native evaluation information attached to each attribute description information of the object;

Based on the descending order of the number of native evaluation information mounted, sort the attribute description information;

Mounting the original evaluation information corresponding to the attribute description information ranked last in the similar attribute information pair to the original evaluation information corresponding to the attribute description information ranked first in the similar attribute information pair.
The electronic device of claim 19, wherein the processor is further configured to:

performing semantic unit splitting on the native evaluation information to obtain a plurality of first semantic units;

Deduplicating the plurality of first semantic units to obtain the plurality of native semantic units;

performing semantic unit splitting on the target evaluation information to obtain a plurality of second semantic units;

Deduplication is performed on the plurality of second semantic units to obtain the plurality of target semantic units.
A non-volatile computer-readable storage medium, wherein, when the instructions in the computer-readable storage medium are executed by a processor of the electronic device, the electronic device is enabled to perform the following steps:

Performing language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein, the native evaluation information of the multiple objects includes multilingual native evaluation information;

performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;

Constructing a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages;

Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, the multilingual attribute description information corresponding to the multiple objects is obtained.
A computer program product comprising computer programs/instructions, wherein the computer programs/instructions implement the following steps when executed by a processor:

Performing language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein, the native evaluation information of the multiple objects includes multilingual native evaluation information;

performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;

Constructing a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages;

Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, the multilingual attribute description information corresponding to the multiple objects is obtained.