WO2023060910A1 - Information extraction method and apparatus - Google Patents

Information extraction method and apparatus

Info

Publication number
WO2023060910A1
WO2023060910A1 PCT/CN2022/096657 CN2022096657W WO2023060910A1 WO 2023060910 A1 WO2023060910 A1 WO 2023060910A1 CN 2022096657 W CN2022096657 W CN 2022096657W WO 2023060910 A1 WO2023060910 A1 WO 2023060910A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
unit
target
information
units
Prior art date
Application number
PCT/CN2022/096657
Other languages
French (fr)
Chinese (zh)
Inventor
唐波
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2023060910A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • the present disclosure relates to the technical field of deep learning, and in particular to an information extraction method, device, equipment and storage medium.
  • e-commerce is shifting from a traditional business model to a content e-commerce model.
  • Content e-commerce places its emphasis on content. By integrating and disseminating the resources of brand owners, e-commerce platforms and other parties, it can reach target users accurately and increase conversion rates.
  • Evaluations are the largest body of user-generated content (UGC) in the content e-commerce system.
  • How well the evaluation content is organized affects the user's decision-making time and the conversion rate.
  • A relatively novel approach at present is the "public impression word" function, which mainly classifies and summarizes the evaluation content.
  • the public impression word may refer to a short sentence that frequently appears in the evaluation text and is used to describe the target object.
  • the disclosure provides an information extraction method, device, equipment and storage medium.
  • an information extraction method including:
  • the original evaluation information of the multiple objects includes multilingual original evaluation information
  • each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages;
  • the multilingual attribute description information corresponding to the multiple objects is obtained.
  • the information extraction method further includes:
  • Semantic clustering is performed on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;
  • a plurality of clustered semantic units are determined from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
  • the obtaining the multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group includes:
  • Each of the clustering semantic units and multiple native semantic units matching the clustering semantic units are determined as multilingual attribute description information corresponding to the multiple objects.
  • the generating semantic vectors corresponding to the plurality of target semantic units includes:
  • the word vector contained in the target semantic unit is obtained;
  • Semantic vectors corresponding to the multiple target semantic units are obtained based on the semantic vectors corresponding to each target semantic unit.
  • the determining a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group includes:
  • the clustered semantic units are determined from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
  • the information extraction method further includes:
  • In response to determining the attribute description information of each object, traverse each clustering semantic unit, and perform the following operations based on each clustering semantic unit:
  • the current cluster semantic unit is determined as the attribute description information of the object.
  • the information extraction method further includes:
  • the information extraction method further includes:
  • each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value
  • the semantic unit splitting of the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units includes:
  • Deduplication is performed on the plurality of second semantic units to obtain the plurality of target semantic units.
  • an information extraction device including:
  • the language conversion unit is configured to perform language conversion on the original evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of original evaluation information; wherein the original evaluation information of the plurality of objects includes multilingual original evaluation information;
  • semantic unit splitting unit configured to split the native evaluation information and the target evaluation information into semantic units to obtain multiple native semantic units and multiple target semantic units
  • a semantic unit matching group construction unit configured to construct a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages;
  • the information generation unit is configured to obtain multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering results of the plurality of target semantic units and the matching group of the semantic units.
  • the information extraction device further includes:
  • a semantic vector generating unit configured to generate semantic vectors corresponding to the plurality of target semantic units
  • a semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes
  • the first determining unit is configured to determine a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
  • the information generation unit includes:
  • the second determining unit is configured to determine a plurality of native semantic units matching each of the clustering semantic units based on the semantic unit matching group;
  • the third determining unit is configured to determine each of the clustering semantic units and multiple native semantic units matching the clustering semantic units as the multilingual attribute description information corresponding to the multiple objects.
  • the semantic vector generation unit includes:
  • the first word vector determination unit is configured to obtain the word vector contained in the target semantic unit based on the word vector of each word in each of the target semantic units;
  • an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit
  • the second word vector determining unit is configured to obtain semantic vectors corresponding to the plurality of target semantic units based on the semantic vectors corresponding to each target semantic unit.
  • the first determination unit includes:
  • a central semantic vector determining unit configured to determine the central semantic vector of each of the target classes
  • a candidate semantic vector determining unit configured to determine a candidate semantic vector for each of the target classes based on the distance between each semantic vector in each of the target classes and the central semantic vector;
  • the candidate semantic unit determining unit is configured to obtain a plurality of candidate semantic units according to the target semantic unit corresponding to the candidate semantic vector of each target class;
  • the first quantity determining unit is configured to determine the quantity of native semantic units matching each candidate semantic unit based on the semantic unit matching group;
  • the clustering semantic unit determining unit is configured to determine the clustering semantic unit from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
  • the information extraction device further includes:
  • the traversal unit is configured to, in response to determining the attribute description information of each object, traverse each of the clustering semantic units, and perform the following operations based on each of the clustering semantic units:
  • a search unit configured to search for the current clustering semantic unit in the original evaluation information of the object
  • the fourth determining unit is configured to determine the current clustering semantic unit as the attribute description information of the object in response to the original evaluation information of the object containing the current clustering semantic unit.
  • the information extraction device further includes:
  • an emotional value determination unit configured to determine the emotional value of the attribute description information for each item of attribute description information of the object
  • the fifth determination unit is configured to determine the native evaluation information of the object that includes the attribute description information and is consistent with the emotional value of the attribute description information as the native evaluation information that matches the attribute description information;
  • the first mounting unit is configured to mount native evaluation information matching the attribute description information into the attribute description information.
  • the information extraction device further includes:
  • the second quantity unit is configured to count the number of pieces of native evaluation information mounted on each item of attribute description information of the object
  • the sorting unit is configured to sort the attribute description information based on the descending order of the number of native evaluation information mounted;
  • a similarity calculation unit configured to perform similarity calculations on each attribute description information of the object
  • the similar attribute information pair determination unit is configured to determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;
  • the second mounting unit is configured to mount the native evaluation information corresponding to the attribute description information ranked last in the similar attribute information pair to the native evaluation information corresponding to the attribute description information ranked first in the similar attribute information pair.
  • the semantic unit splitting unit includes:
  • the first splitting unit is configured to split the original evaluation information into semantic units to obtain multiple first semantic units
  • the first deduplication unit is configured to deduplicate the plurality of first semantic units to obtain the plurality of original semantic units
  • the second splitting unit is configured to split the target evaluation information into semantic units to obtain a plurality of second semantic units
  • the second deduplication unit is configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
  • an electronic device including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the information extraction method described above.
  • a non-volatile computer-readable storage medium; when instructions in the computer-readable storage medium are executed by a processor of a server, the server is enabled to execute the above-mentioned information extraction method.
  • a computer program product including a computer program, the computer program being stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, so that the device executes the above information extraction method.
  • This disclosure performs language conversion on the original evaluation information of multiple objects to obtain corresponding target evaluation information, so that multilingual evaluation information becomes evaluation information in a unified target language, which makes subsequent processing based on the target evaluation information more convenient. The original evaluation information and target evaluation information of the multiple objects are then split into semantic units, and semantic unit matching groups are constructed from the splitting results. Each matching group includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages, so that semantic units with the same semantics in different languages have a matching relationship. Multilingual attribute description information corresponding to the multiple objects is then obtained based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups.
  • Because the multilingual attribute description information in this disclosure is extracted from the original evaluation information, it reads as localized expression in each language and avoids the inaccuracy of machine-translated text, which improves the accuracy with which the multilingual attribute description information is expressed.
  • Fig. 1 is a schematic diagram showing an implementation environment according to an exemplary embodiment.
  • Fig. 2 is a flowchart of an information extraction method according to an exemplary embodiment.
  • Fig. 3 is a schematic diagram of language conversion according to an exemplary embodiment.
  • Fig. 4 is a flowchart of a semantic unit splitting method according to an exemplary embodiment.
  • Fig. 5 is a schematic diagram showing semantic unit splitting according to an exemplary embodiment.
  • Fig. 6 is a schematic diagram of a multilingual phrase matching process according to an exemplary embodiment.
  • Fig. 7 is a schematic diagram showing a multilingual phrase matching table according to an exemplary embodiment.
  • Fig. 8 is a flowchart of a semantic clustering method according to an exemplary embodiment.
  • Fig. 9 is a flow chart of a method for determining multilingual attribute description information corresponding to multiple objects according to an exemplary embodiment.
  • Fig. 10 is a flowchart showing a method for generating semantic vectors according to an exemplary embodiment.
  • Fig. 11 is a flowchart showing a method for determining clustering semantic units according to an exemplary embodiment.
  • Fig. 12 is a flowchart of a method for determining corresponding attribute description information for each object according to an exemplary embodiment.
  • Fig. 13 is a flow chart of a method for evaluating and mounting according to an exemplary embodiment.
  • Fig. 14 is a flowchart of a method for merging attribute description information according to an exemplary embodiment.
  • Fig. 15 is a schematic diagram of an information extraction device according to an exemplary embodiment.
  • Fig. 16 is a schematic structural diagram of a device according to an exemplary embodiment.
  • Clustering: the process of semantically merging text into multiple classes.
  • Native evaluation: the actual evaluation text entered by the user.
  • Word vector: also known as word embedding, a general term for language-modeling and representation-learning techniques in natural language processing. Conceptually, it embeds words from a high-dimensional space, whose dimension equals the number of all words, into a continuous vector space of much lower dimension, so that each word or phrase is mapped to a vector over the real numbers.
  • FIG. 1 shows a schematic diagram of an implementation environment provided by an embodiment of the present disclosure.
  • the implementation environment may include at least one first terminal 110 and a second terminal 120; the first terminal 110 and the second terminal 120 can exchange data via a network.
  • the first terminal 110 can publish evaluation information on multiple objects on the relevant object platform; the second terminal 120 can obtain the evaluation information on the multiple objects and perform text analysis and information extraction on it to generate attribute description information corresponding to each object. Thus, when the evaluation information of an object is browsed through the first terminal 110, the attribute description information corresponding to the object can be displayed.
  • the first terminal 110 may communicate with the second terminal 120 based on a browser/server mode (Browser/Server, B/S) or a client/server mode (Client/Server, C/S).
  • the first terminal 110 may include physical devices such as smart phones, tablet computers, notebook computers, digital assistants, smart wearable devices, vehicle terminals and servers, and may also include software running on physical devices, such as application programs.
  • the operating system running on the first terminal 110 in the embodiment of the present disclosure may include, but is not limited to, Android, iOS, Linux and Windows.
  • the second terminal 120 and the first terminal 110 can establish a wired or wireless communication connection. The second terminal 120 can include an independently operated server, a distributed server, or a server cluster composed of multiple servers, wherein the server can be a cloud server.
  • the embodiment of the present disclosure provides an information extraction method whose execution subject may be the second terminal in Fig. 1, which may be a server. Referring to Fig. 2, the information extraction method may include the following steps:
  • Step S210: perform language conversion on the original evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of original evaluation information, wherein the original evaluation information of the multiple objects includes multilingual original evaluation information.
  • Step S220: split the original evaluation information and the target evaluation information into semantic units to obtain multiple original semantic units and multiple target semantic units.
  • Step S230: construct semantic unit matching groups, wherein each semantic unit matching group includes a target semantic unit and multiple original semantic units having the same semantics as the target semantic unit, and the multiple original semantic units correspond to different languages.
  • Step S240: obtain multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups.
  • S210 Perform language conversion on native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information; wherein, the native evaluation information of multiple objects includes native evaluation information in multiple languages.
  • That the native evaluation information of multiple objects includes native evaluation information in multiple languages may mean that the native evaluation information of different objects may include evaluations in the same language and may also include evaluations in different languages; that is, the number of languages of the native evaluations varies from object to object.
  • For example, the native evaluation information of object 1 may include native evaluation information in language 1 and language 2, and the native evaluation information of object 2 may include native evaluation information in language 2 and language 3. Object 1 and object 2 then both have native evaluation information in language 2, and differ in that one has native evaluation information in language 1 and the other in language 3. Native evaluation information in the same language shares only the language; the evaluation content is not necessarily the same.
  • The original evaluation information can also be preprocessed: first, language identification is performed on the original evaluation information to obtain the actual language of the evaluation; then special characters are processed to remove meaningless characters from the text; finally, spelling is checked and misspelled words are corrected, yielding more standardized text data in preparation for the subsequent algorithms.
  • the target language can be English.
  • FIG. 3 shows a schematic diagram of language conversion. It can be seen from FIG. 3 that native evaluation information 1, whose language is Russian, is translated into the corresponding English evaluation information 1; similarly, native evaluation information 2, whose language is Spanish, can be translated into the corresponding English evaluation information 2.
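  • The preprocessing and language conversion of step S210 can be sketched as follows. This is a minimal illustration only: `clean_review`, `to_target_language` and the `translate` callable are hypothetical names, the character whitelist is an assumption, and language identification and spelling correction are left out because the text does not name the tools used for them.

```python
import re
import unicodedata

def clean_review(text: str) -> str:
    """Remove characters that carry no meaning for later text analysis."""
    text = unicodedata.normalize("NFKC", text)
    # Assumption: keep letters, digits, whitespace and basic punctuation only.
    kept = (ch for ch in text if ch.isalnum() or ch.isspace() or ch in ".,!?'-")
    # Collapse runs of whitespace left behind by the removed characters.
    return re.sub(r"\s+", " ", "".join(kept)).strip()

def to_target_language(reviews, translate, target_lang="en"):
    """Translate (language, text) pairs into the unified target language.

    `translate(text, source_lang, target_lang)` is a hypothetical callable
    standing in for whatever machine-translation service is available.
    """
    return [translate(clean_review(text), lang, target_lang)
            for lang, text in reviews]

# Stand-in translator used only to show the data flow.
fake_translate = lambda text, src, dst: f"[{src}->{dst}] {text}"
converted = to_target_language([("ru", "Отличное   качество!!! ★★★")], fake_translate)
```

  • Keeping the native text alongside the converted text matters here, because later steps match native and target semantic units against each other.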
  • S220 Perform semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units.
  • FIG. 4 shows a semantic unit splitting method, including the following steps S410 to S440.
  • S410 Perform semantic unit splitting on the native evaluation information to obtain multiple first semantic units.
  • a semantic unit may be a short sentence.
  • the object can be clothing, and the evaluation targets for clothing can be material quality, workmanship, and logistics evaluation.
  • the disclosure recognizes parallel evaluation targets in the evaluation information through conjunctions, and then splits the evaluation information into multiple complete clauses through grammatical rules.
  • The semantic units can be deduplicated to avoid redundant semantic units and improve data processing efficiency.
  • Figure 5 shows a schematic diagram of semantic unit splitting. It can be seen from Figure 5 that the original English evaluation information "High quality sewing and material" is split into "High quality sewing" and "High quality material"; the evaluation targets here are "sewing" and "material".
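  • The conjunction-based split and the deduplication can be illustrated with a toy heuristic. This is a sketch under stated assumptions, not the grammatical rules of the disclosure: it assumes the last word of each conjunct is the evaluation target and that a one-word right conjunct inherits the modifiers of the left conjunct. The function names are hypothetical.

```python
CONJUNCTIONS = {"and", "or"}

def split_semantic_units(review: str) -> list[str]:
    """Split a review on its first conjunction into two short clauses."""
    words = review.split()
    for i, word in enumerate(words):
        if word.lower() in CONJUNCTIONS and 0 < i < len(words) - 1:
            left, right = words[:i], words[i + 1:]
            # A bare target on the right shares the modifiers of the left one.
            if len(right) == 1 and len(left) > 1:
                right = left[:-1] + right
            return [" ".join(left), " ".join(right)]
    return [" ".join(words)]

def deduplicate(units: list[str]) -> list[str]:
    """Drop repeated units (case-insensitive) while keeping first-seen order."""
    seen, result = set(), []
    for unit in units:
        key = unit.lower()
        if key not in seen:
            seen.add(key)
            result.append(unit)
    return result

units = split_semantic_units("High quality sewing and material")
```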
  • each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different language.
  • the Champollion algorithm can be used for text alignment to obtain a matching pair of the original semantic unit and the target semantic unit of each object, wherein each semantic unit matching group includes multiple semantic units of different languages with the same semantic meaning.
  • each matching pair includes a native semantic unit and a target semantic unit.
  • FIG. 7 shows a schematic diagram of a multilingual phrase matching table.
  • the semantic unit matching group (Tm_n, Rc_k, Se_f) can be constructed.
  • a semantic unit matching table can be formed based on multiple semantic unit matching groups, and the semantic unit matching table can provide indexes for subsequent multilingual expressions.
  • Because the content of the native evaluation information differs between objects (for example, some objects have few native evaluations, or native evaluation information in only one language), a multilingual semantic unit matching relationship cannot always be constructed from an object's own native evaluation information.
  • matching can be performed based on matching pairs of each object, and a multilingual semantic unit matching relationship can be constructed by complementing evaluation information between objects.
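  • How matching pairs from different objects complement each other can be sketched as a grouping by target semantic unit. The alignment itself (for example with the Champollion algorithm) is assumed to have already produced the pairs; the tuple layout, the function name and the sample phrases are illustrative assumptions.

```python
from collections import defaultdict

def build_matching_groups(matching_pairs):
    """Group native units by the target unit they were aligned with.

    Each pair is (object_id, language, native_unit, target_unit). Pairs from
    all objects are pooled, so an object reviewed in one language still ends
    up with multilingual expressions contributed by other objects.
    """
    groups = defaultdict(dict)
    for _object_id, language, native_unit, target_unit in matching_pairs:
        # Keep the first native expression seen for each language.
        groups[target_unit.lower()].setdefault(language, native_unit)
    return dict(groups)

pairs = [
    ("object_1", "ru", "быстрая доставка", "Fast delivery"),
    ("object_2", "es", "entrega rápida", "fast delivery"),
    ("object_2", "es", "buena calidad", "good quality"),
]
groups = build_matching_groups(pairs)
```

  • The resulting dictionary plays the role of the semantic unit matching table: it can be indexed by a target semantic unit to retrieve its expressions in each language.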
  • FIG. 8 shows a semantic clustering method, including the following steps S810 to S830.
  • the K-means clustering algorithm can be used to obtain the category to which each sentence belongs.
  • The K-means clustering algorithm is a common and efficient unsupervised clustering algorithm. With this algorithm, semantic units with the same semantics can be clustered into the same class, where the value of K is determined using the silhouette coefficient.
  • the clustering of the target semantic units in the present disclosure can be realized based on the corresponding semantic vectors, since the semantic vectors can fully reflect the characteristic information of the corresponding semantic units and are easy to calculate, thus improving the accuracy and convenience of the semantic unit clustering.
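  • Choosing K by the silhouette coefficient can be sketched in plain Python as below. The deterministic farthest-point initialisation, the iteration cap and the candidate range for K are assumptions made so the example is reproducible; a production system would more likely rely on an established clustering library.

```python
import math

def kmeans(points, k, iters=50):
    """Plain K-means on tuples, with farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    if len(set(labels)) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance inside the own cluster
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
                / labels.count(other)
                for other in set(labels) if other != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_range):
    """Return the K whose clustering has the highest silhouette coefficient."""
    return max(k_range, key=lambda k: silhouette(points, kmeans(points, k)))

vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k = choose_k(vectors, range(2, 6))
```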
  • FIG. 9 shows a method for determining multilingual attribute description information corresponding to multiple objects, including steps S910 and S920.
  • the attribute description information here can be used to represent the extraction and generalization information of the characteristics of multiple objects, which can reflect the characteristic information of the object, and the corresponding object can be roughly understood through the attribute description information.
  • the multilingual attribute description information obtained here may refer to comprehensive attribute description information of multiple objects, or multiple attribute description information, and each item of attribute description information includes attribute description information in multiple languages and has the same semantics.
  • FIG. 10 shows a method for generating a semantic vector, including steps S1010 to S1030.
  • the word vector of each word in the target semantic unit needs to be calculated.
  • Computing the semantic vector of the target semantic unit based on the pre-generated word vector can improve the accuracy and convenience of semantic vector calculation.
  • the generation method of the word vector can also be realized by using a dynamic semantic vector model.
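  • Averaging pre-generated word vectors into a semantic vector can be sketched as follows. The embedding table is a made-up two-dimensional example, and skipping out-of-vocabulary words is an assumption rather than something the text specifies.

```python
def semantic_vector(unit: str, word_vectors: dict) -> list:
    """Average the word vectors of the words contained in a semantic unit."""
    vectors = [word_vectors[w] for w in unit.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError(f"no known words in semantic unit: {unit!r}")
    # Average each dimension over the words of the unit.
    return [sum(component) / len(vectors) for component in zip(*vectors)]

# Hypothetical pre-generated word vectors (real ones have far more dimensions).
toy_vectors = {"high": [1.0, 0.0], "quality": [0.0, 1.0], "sewing": [2.0, 2.0]}
vector = semantic_vector("High quality sewing", toy_vectors)
```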
  • FIG. 11 shows a method for determining a clustering semantic unit, including the following steps S1110 to S1150.
  • the corresponding central semantic vector can be determined first, and then the distance between other semantic vectors in the target class and the central semantic vector can be calculated, and the semantic vectors can be sorted from near to far based on the distance from the central semantic vector , for example, select the top 10% semantic vectors as the candidate semantic vectors corresponding to the target class.
  • the corresponding candidate semantic units can be obtained, and based on the above semantic unit matching group, the number of original semantic units matched by each candidate semantic unit can be determined.
  • The candidate semantic unit with the larger number of matching native semantic units is selected as the clustering semantic unit, because the more native semantic units it matches, the more languages its multilingual expression can cover. Expressing semantic units in multiple languages improves the diversity and richness of semantic unit expression.
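  • The selection of a clustering semantic unit for one target class can be sketched as below. Taking the mean of the class vectors as the central semantic vector and rounding the candidate count up are assumptions; `match_groups` is a hypothetical mapping from a target semantic unit to its native expressions per language.

```python
import math

def pick_cluster_unit(units, vectors, match_groups, top_ratio=0.1):
    """Pick the representative semantic unit of one target class."""
    # Assumption: the central semantic vector is the mean of the class vectors.
    center = tuple(sum(c) / len(vectors) for c in zip(*vectors))
    # Rank the units from nearest to farthest from the center.
    ranked = sorted(range(len(units)), key=lambda i: math.dist(vectors[i], center))
    count = max(1, math.ceil(len(units) * top_ratio))
    candidates = [units[i] for i in ranked[:count]]
    # Prefer the candidate matched by the most native semantic units.
    return max(candidates, key=lambda u: len(match_groups.get(u, {})))

class_units = ["fast delivery", "quick shipping", "arrived fast"]
class_vectors = [(1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
match_groups = {
    "fast delivery": {"ru": "быстрая доставка", "es": "entrega rápida"},
    "quick shipping": {"ru": "быстрая доставка"},
}
chosen = pick_cluster_unit(class_units, class_vectors, match_groups, top_ratio=0.5)
```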
  • FIG. 12 shows a method for determining corresponding attribute description information for each object, including steps S1210 to S1230.
  • The clustering semantic units are obtained for the multiple objects as a whole, and not every object corresponds to all of them, so the attribute description information needs to be determined separately for each object.
  • Each clustering semantic unit is matched with the original evaluation information of each object to determine the attribute description information of each object, which further improves the personalized display of object attribute information. The reason why it is necessary to generate clustering semantic units based on the original evaluation information of multiple objects is to realize the complementarity of multilingual information expressions between objects.
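  • The per-object traversal can be sketched as a search of each clustering semantic unit in the object's own evaluations. Searching for the matched native expressions as well as the target-language unit, and using a case-insensitive substring test, are assumptions made for this illustration.

```python
def attributes_for_object(object_reviews, cluster_units, matching_groups):
    """Keep the clustering semantic units that occur in this object's reviews."""
    attributes = []
    for unit in cluster_units:
        # Look for the unit itself and for its matched native expressions.
        phrases = [unit, *matching_groups.get(unit, {}).values()]
        if any(phrase.lower() in review.lower()
               for phrase in phrases for review in object_reviews):
            attributes.append(unit)
    return attributes

matching_groups = {"fast delivery": {"ru": "быстрая доставка", "es": "entrega rápida"}}
reviews = ["Очень быстрая доставка, спасибо"]
found = attributes_for_object(reviews, ["fast delivery", "good quality"], matching_groups)
```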
  • FIG. 13 shows a method for evaluating and mounting, including steps S1310 to S1330.
  • S1320 Determine the original evaluation information of the object that includes the attribute description information and is consistent with the sentiment value of the attribute description information as the original evaluation information that matches the attribute description information.
  • Emotional values can include positive, negative, and neutral.
  • the accuracy of mounting can be improved based on the premise that the emotional value is consistent; the user can quickly identify the current object through attribute description information.
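  • Mounting evaluations onto an item of attribute description information under the emotional-value constraint can be sketched as follows. The keyword-based `toy_sentiment` function is a placeholder invented for this example; the text does not say how emotional values are computed.

```python
POSITIVE = ("fast", "good", "great")
NEGATIVE = ("not ", "slow", "bad")

def toy_sentiment(text: str) -> str:
    """Placeholder classifier returning positive, negative or neutral."""
    lowered = text.lower()
    if any(word in lowered for word in NEGATIVE):
        return "negative"
    if any(word in lowered for word in POSITIVE):
        return "positive"
    return "neutral"

def mount_reviews(attribute, reviews, sentiment=toy_sentiment):
    """Return the reviews that contain the attribute and share its sentiment."""
    target = sentiment(attribute)
    return [review for review in reviews
            if attribute.lower() in review.lower() and sentiment(review) == target]

mounted = mount_reviews(
    "fast delivery",
    ["Fast delivery, great seller", "Not fast delivery at all", "Nice fabric"],
)
```

  • The second review contains the phrase but is excluded because its emotional value differs, which is the case the consistency check is meant to catch.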
  • FIG. 14 shows a method for merging attribute description information, including steps S1410 to S1470.
  • S1420: Determine whether similar attribute description information exists among the items of attribute description information of the object; if it does, perform step S1430; if it does not, perform step S1470.
  • each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value.
  • the attribute description information obtained by the above method may contain attribute description information with inconsistent semantic granularity; for example, in an e-commerce scenario, the corresponding attribute description information may include "logistics fast", "delivery fast", ...; in this case, attribute description information with inconsistent semantic granularity can be merged, which avoids the inconsistent semantic granularity that arises in unsupervised clustering and makes the semantic level of the attribute description information more consistent.
  • the method for merging attribute description information in the embodiment of the present disclosure may include steps 1 to 3 below.
  • ESIM model training: first, similar attribute information pairs are obtained from open-source data sets and rules to construct a training data set, and then the ESIM model is trained.
  • ESIM stands for Enhanced Sequential Inference Model. In this embodiment, the ESIM model is used to judge the similarity of attribute description information.
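The merging of similar attribute description information can be sketched as below. This is an illustration under assumptions: the default `similarity` function uses `difflib.SequenceMatcher` purely as a stand-in for the trained ESIM model, and the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in for the trained ESIM model: character-level ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def merge_similar(mounted, threshold=0.8, sim=similarity):
    """Merge similar attributes.

    mounted: dict mapping an attribute description to its list of mounted reviews.
    Attributes are ranked by number of mounted reviews (descending); for each
    similar pair, the lower-ranked attribute's reviews are mounted onto the
    higher-ranked one and the lower-ranked attribute is dropped.
    """
    ranked = sorted(mounted, key=lambda a: len(mounted[a]), reverse=True)
    merged = {a: list(mounted[a]) for a in ranked}
    # combinations() preserves ranking order, so `first` always outranks `second`.
    for first, second in combinations(ranked, 2):
        if first in merged and second in merged and sim(first, second) > threshold:
            merged[first].extend(merged.pop(second))
    return merged


def pair_sim(a, b):
    # Hypothetical oracle used only for this example.
    return 1.0 if {a, b} == {"fast logistics", "fast delivery"} else 0.0


result = merge_similar(
    {"fast delivery": ["r1"], "fast logistics": ["r2", "r3"], "good quality": ["r4"]},
    sim=pair_sim,
)
```

Passing the similarity function as a parameter keeps the ranking-and-mounting logic independent of whichever model judges similarity.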
  • when the user terminal displays the attribute description information, the display language may be determined based on the user's setting, or based on the location information of the user terminal.
  • This disclosure mines the attribute description information at the object level and improves the personalization of the attribute description information through the differences in evaluation information between objects; on the clustering results, the ESIM algorithm is used to merge similar attribute description information, which avoids the inconsistent semantic granularity of attribute description information that arises in unsupervised clustering and makes the semantic level of the attribute description information more consistent; through the complementarity of evaluation content between objects, a matching relationship between the original evaluation information and the target evaluation information is constructed, so that the multilingual presentation of the attribute description information is more localized.
  • This disclosure performs language conversion on the original evaluation information of multiple objects to obtain corresponding target evaluation information, converting multilingual evaluation information into evaluation information in a unified target language, which makes subsequent processing based on the target evaluation information more convenient. The original evaluation information and the target evaluation information of the multiple objects are then split into semantic units, and semantic unit matching groups are constructed based on the splitting results. Each semantic unit matching group includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages, so that semantic units of different languages with the same semantics have a matching relationship. Multilingual attribute description information corresponding to the multiple objects is then obtained based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups.
  • the multilingual attribute description information in this disclosure is extracted from the original evaluation information, which improves the effect of multilingual localized expression and avoids the inaccuracy that results from relying on machine translation, thereby improving the accuracy with which the multilingual attribute description information is expressed.
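The construction of semantic unit matching groups can be sketched as follows. How native and target semantic units are aligned is not spelled out in this passage; the sketch assumes that the units of a review and of its translation line up by position, and it skips reviews where the counts differ. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def build_matching_groups(reviews):
    """Build matching groups: target unit -> {language: native unit}.

    reviews: iterable of (language, native_units, target_units), where the two
    unit lists come from splitting a review and its translation.
    Positional alignment of the two lists is an assumption of this sketch.
    """
    groups = defaultdict(dict)
    for language, native_units, target_units in reviews:
        if len(native_units) != len(target_units):
            continue  # cannot align by position; skip this review
        for native, target in zip(native_units, target_units):
            # Keep the first native unit seen for each language.
            groups[target].setdefault(language, native)
    return dict(groups)


groups = build_matching_groups([
    ("es", ["entrega rápida", "buena calidad"], ["fast delivery", "good quality"]),
    ("zh", ["物流很快"], ["fast delivery"]),
    ("de", ["gut", "schnell", "billig"], ["good"]),
])
```

Because the native units come from real reviews rather than from translating the target unit back, each group carries expressions that native speakers actually wrote.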
  • Fig. 15 is a block diagram of an information extraction device according to an exemplary embodiment.
  • the information extraction device includes a language conversion unit 1510 , a semantic unit splitting unit 1520 , a semantic unit matching group construction unit 1530 and an information generation unit 1540 .
  • the language conversion unit 1510 is configured to perform language conversion on the original evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of original evaluation information; wherein the original evaluation information of the multiple objects includes multilingual native evaluation information.
  • the semantic unit splitting unit 1520 is configured to split the native evaluation information and the target evaluation information into semantic units to obtain multiple native semantic units and multiple target semantic units.
  • the semantic unit matching group construction unit 1530 is configured to construct a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; The multiple native semantic units correspond to different languages.
  • the information generation unit 1540 is configured to obtain multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group.
  • the information extraction device further includes:
  • a semantic vector generating unit configured to generate semantic vectors corresponding to the plurality of target semantic units
  • a semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes
  • the first determining unit is configured to determine a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
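The semantic clustering step can be illustrated with a small k-means routine. The passage does not name a clustering algorithm, so k-means, the naive initialization from the first `k` vectors, and the function name are all assumptions made for illustration.

```python
import math


def kmeans(vectors, k, iterations=10):
    """Assign each vector to one of k target classes; returns a list of labels.

    Naive k-means: centers start as the first k vectors (an assumption; real
    systems usually use a better initialization).
    """
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


labels = kmeans([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]], k=2)
```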
  • the information generating unit 1540 includes:
  • the second determining unit is configured to determine a plurality of native semantic units matching each of the clustering semantic units based on the semantic unit matching group;
  • the third determining unit is configured to determine each of the clustering semantic units and multiple native semantic units matching the clustering semantic units as the multilingual attribute description information corresponding to the multiple objects.
  • the semantic vector generation unit includes:
  • the first word vector determination unit is configured to obtain the word vector contained in the target semantic unit based on the word vector of each word in each of the target semantic units;
  • an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit
  • the second word vector determining unit is configured to obtain semantic vectors corresponding to the plurality of target semantic units based on the semantic vectors corresponding to each target semantic unit.
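The averaging of word vectors into a semantic vector can be sketched as below. The whitespace tokenization, the `word_vectors` lookup table, and the choice to skip out-of-vocabulary words are assumptions of the sketch.

```python
def semantic_vector(unit, word_vectors):
    """Average the word vectors of the words in `unit`.

    word_vectors: dict mapping a word to a list of floats (all the same length).
    Words without a vector are skipped; returns None if no word has a vector.
    """
    vectors = [word_vectors[w] for w in unit.split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


word_vectors = {"fast": [1.0, 0.0], "delivery": [0.0, 1.0]}
vec = semantic_vector("fast delivery", word_vectors)
```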
  • the first determination unit includes:
  • a central semantic vector determining unit configured to determine the central semantic vector of each of the target classes
  • a candidate semantic vector determining unit configured to determine a candidate semantic vector for each of the target classes based on the distance between each semantic vector in each of the target classes and the central semantic vector;
  • the candidate semantic unit determining unit is configured to obtain a plurality of candidate semantic units according to the target semantic unit corresponding to the candidate semantic vector of each target class;
  • the first quantity determining unit is configured to determine the quantity of native semantic units matching each candidate semantic unit based on the semantic unit matching group;
  • the clustering semantic unit determining unit is configured to determine the clustering semantic unit from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
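Selecting a clustering semantic unit from one target class can be sketched as follows. The sketch assumes that candidates are the `top_k` units nearest the class centroid and that the candidate with the most matching native semantic units wins; the text only says the choice is "based on" that number, so the `max` rule, `top_k`, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def pick_cluster_unit(members, groups, top_k=2):
    """Pick the representative unit of one target class.

    members: list of (target_unit, semantic_vector) pairs in the class.
    groups: matching groups, target unit -> {language: native unit}.
    """
    center = centroid([vec for _, vec in members])
    # Candidates: the top_k units closest to the central semantic vector.
    by_distance = sorted(members, key=lambda m: math.dist(m[1], center))
    candidates = [unit for unit, _ in by_distance[:top_k]]
    # Prefer the candidate matched by the most native semantic units.
    return max(candidates, key=lambda u: len(groups.get(u, {})))


members = [
    ("fast delivery", [1.0, 0.0]),
    ("quick shipping", [0.9, 0.1]),
    ("arrived early", [0.0, 1.0]),
]
groups = {
    "fast delivery": {"es": "entrega rápida", "zh": "物流很快"},
    "quick shipping": {"es": "envío rápido"},
}
chosen = pick_cluster_unit(members, groups)
```

Counting matched native units favors representatives that can actually be shown in several languages.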
  • the information extraction device further includes:
  • the traversal unit is configured to, in response to determining the attribute description information of each object, traverse each of the clustering semantic units, and perform the following operations based on each of the clustering semantic units:
  • a search unit configured to search for the current clustering semantic unit in the original evaluation information of the object
  • the fourth determining unit is configured to determine the current clustering semantic unit as the attribute description information of the object in response to the original evaluation information of the object containing the current clustering semantic information.
  • the information extraction device further includes:
  • an emotional value determination unit configured to determine the emotional value of the attribute description information for each item of attribute description information of the object
  • the fifth determination unit is configured to determine the native evaluation information of the object that includes the attribute description information and is consistent with the emotional value of the attribute description information as the native evaluation information that matches the attribute description information;
  • the first mounting unit is configured to mount native evaluation information matching the attribute description information into the attribute description information.
  • the information extraction device further includes:
  • the second quantity unit is configured to count the number of native evaluation information loaded by each attribute description information of the object
  • the sorting unit is configured to sort the attribute description information based on the descending order of the number of native evaluation information mounted;
  • a similarity calculation unit configured to perform similarity calculations on each attribute description information of the object
  • the similar attribute information pair determination unit is configured to determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;
  • the second mounting unit is configured to mount the native evaluation information corresponding to the attribute description information ranked last in the similar attribute information pair to the native evaluation information corresponding to the attribute description information ranked first in the similar attribute information pair.
  • the semantic unit splitting unit includes:
  • the first splitting unit is configured to split the original evaluation information into semantic units to obtain multiple first semantic units
  • the first deduplication unit is configured to deduplicate the plurality of first semantic units to obtain the plurality of original semantic units
  • the second splitting unit is configured to split the target evaluation information into semantic units to obtain a plurality of second semantic units
  • the second deduplication unit is configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
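Splitting followed by deduplication can be sketched as below. The punctuation-based splitting rule is an assumption (the passage does not say how semantic units are delimited), and the function name is hypothetical; `dict.fromkeys` removes duplicates while preserving first-seen order.

```python
import re

# Assumed delimiters: common ASCII and CJK clause punctuation.
_DELIMITERS = re.compile(r"[,.;!?，。；！？]+")


def split_and_dedup(texts):
    """Split each text into semantic units and drop duplicate units."""
    units = []
    for text in texts:
        units.extend(u.strip() for u in _DELIMITERS.split(text) if u.strip())
    # dict keys keep insertion order, so the first occurrence of each unit survives.
    return list(dict.fromkeys(units))


units = split_and_dedup(["Fast delivery, good quality.", "good quality!"])
```

The same routine can be applied once to the native evaluation information and once to the target evaluation information.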
  • the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, an optical data storage device, etc.; when the instructions in the computer-readable storage medium are executed by the processor of the server, the server can perform any method as described above.
  • a computer program product comprising a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the device to perform any of the above methods.
  • FIG. 16 shows a schematic diagram of a hardware structure of a device for implementing the method provided by the embodiment of the present disclosure, and the device may participate in constituting or include the apparatus provided by the embodiment of the present disclosure.
  • the device 10 may include one or more processors 102 (shown as 102a, 102b, ..., 102n in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions.
  • FIG. 16 is only a schematic diagram, which does not limit the structure of the above-mentioned electronic device.
  • device 10 may also include more or fewer components than shown in FIG. 16 , or have a different configuration than that shown in FIG. 16 .
  • the one or more processors 102 and/or other data processing circuits described above may generally be referred to herein as "data processing circuits".
  • the data processing circuit may be implemented in whole or in part as software, hardware, firmware or other arbitrary combinations.
  • the data processing circuitry can be a single independent processing module, or be fully or partially integrated into any of the other elements in the device 10 (or mobile device).
  • the data processing circuit serves as a processor control (for example, the selection of the variable resistor terminal path connected to the interface).
  • the memory 104 can be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the method described in the embodiments of the present disclosure; the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, thereby implementing the above-mentioned method.
  • the memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • memory 104 may further include memory located remotely from processor 102 , and such remote memory may be connected to device 10 via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is used to receive or transmit data via a network.
  • Examples of the aforementioned networks may include wireless networks provided by the communications provider of device 10 .
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet in a wireless manner.
  • the display may, for example, be a touchscreen liquid crystal display (LCD), which may enable a user to interact with the user interface of device 10 (or mobile device).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to the technical field of deep learning, and relates to an information extraction method and apparatus, a device, and a storage medium. The information extraction method comprises: performing language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information; performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain a plurality of native semantic units and a plurality of target semantic units; constructing semantic unit matching groups, wherein each semantic unit matching group comprises a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, and the plurality of native semantic units correspond to different languages; and on the basis of a semantic clustering result of the plurality of target semantic units and the semantic unit matching groups, obtaining multilingual attribute description information corresponding to the plurality of objects.

Description

Information extraction method and apparatus
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111180788.5, filed on October 11, 2021, the entire content of which is hereby incorporated by reference into this application.
Technical Field
The present disclosure relates to the technical field of deep learning, and in particular to an information extraction method, apparatus, device, and storage medium.
Background
E-commerce is currently shifting from the traditional business model to a content e-commerce model. Content e-commerce delivers content that users find valuable to target users precisely, through integrated dissemination by brand owners, e-commerce platforms, and other resources, thereby increasing the conversion rate. Evaluations are the largest body of UGC (User Generated Content) in a content e-commerce system, and how well evaluation content is organized affects users' decision-making time and the conversion rate. A relatively novel way of organizing evaluation content is the "public impression word" feature, which mainly classifies and summarizes evaluation content. A public impression word may refer to a short phrase that frequently appears in evaluation text and is used to describe a target object.
Summary
The present disclosure provides an information extraction method, apparatus, device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, an information extraction method is provided, including:
performing language conversion on native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the multiple objects includes multilingual native evaluation information;
performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;
constructing semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and multiple native semantic units having the same semantics as the target semantic unit, and the multiple native semantic units correspond to different languages;
obtaining multilingual attribute description information corresponding to the multiple objects based on a semantic clustering result of the multiple target semantic units and the semantic unit matching groups.
In an exemplary embodiment, the information extraction method further includes:
generating semantic vectors corresponding to the multiple target semantic units;
performing semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes;
determining multiple clustering semantic units from the multiple target semantic units based on the semantic vectors in each target class and the semantic unit matching groups.
In an exemplary embodiment, obtaining the multilingual attribute description information corresponding to the multiple objects based on the semantic clustering result of the multiple target semantic units and the semantic unit matching groups includes:
determining, based on the semantic unit matching groups, multiple native semantic units matching each clustering semantic unit;
determining each clustering semantic unit, together with the multiple native semantic units matching that clustering semantic unit, as the multilingual attribute description information corresponding to the multiple objects.
In an exemplary embodiment, generating the semantic vectors corresponding to the multiple target semantic units includes:
obtaining the word vectors contained in each target semantic unit based on the word vector of each word in that target semantic unit;
averaging the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;
obtaining the semantic vectors corresponding to the multiple target semantic units based on the semantic vector corresponding to each target semantic unit.
In an exemplary embodiment, determining the multiple clustering semantic units from the multiple target semantic units based on the semantic vectors in each target class and the semantic unit matching groups includes:
determining a central semantic vector of each target class;
determining candidate semantic vectors of each target class based on the distance between each semantic vector in the target class and the central semantic vector;
obtaining multiple candidate semantic units according to the target semantic units corresponding to the candidate semantic vectors of each target class;
determining, based on the semantic unit matching groups, the number of native semantic units matching each candidate semantic unit;
determining the clustering semantic units from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
In an exemplary embodiment, the information extraction method further includes:
in response to determining the attribute description information of each object, traversing each clustering semantic unit and performing the following operations for each clustering semantic unit:
searching for the current clustering semantic unit in the native evaluation information of the object;
in response to the native evaluation information of the object containing the current clustering semantic information, determining the current clustering semantic unit as the attribute description information of the object.
In an exemplary embodiment, the information extraction method further includes:
for each item of attribute description information of the object, determining an emotional value of the attribute description information;
determining, as native evaluation information matching the attribute description information, the native evaluation information of the object that contains the attribute description information and whose emotional value is consistent with that of the attribute description information;
mounting the native evaluation information matching the attribute description information to the attribute description information.
In an exemplary embodiment, the information extraction method further includes:
performing similarity calculation on any two items of the attribute description information of the object;
determining similar attribute information pairs based on the similarity calculation results, each similar attribute information pair including two items of attribute description information whose similarity is greater than a preset value;
counting the number of pieces of native evaluation information mounted to each item of attribute description information of the object;
sorting the items of attribute description information in descending order of the number of pieces of mounted native evaluation information;
mounting the native evaluation information corresponding to the lower-ranked attribute description information in each similar attribute information pair to the native evaluation information corresponding to the higher-ranked attribute description information in that pair.
In an exemplary embodiment, performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain the multiple native semantic units and the multiple target semantic units includes:
performing semantic unit splitting on the native evaluation information to obtain multiple first semantic units;
deduplicating the multiple first semantic units to obtain the multiple native semantic units;
performing semantic unit splitting on the target evaluation information to obtain multiple second semantic units;
deduplicating the multiple second semantic units to obtain the multiple target semantic units.
According to a second aspect of the embodiments of the present disclosure, an information extraction apparatus is provided, including:
a language conversion unit configured to perform language conversion on native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the multiple objects includes multilingual native evaluation information;
a semantic unit splitting unit configured to perform semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;
a semantic unit matching group construction unit configured to construct semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and multiple native semantic units having the same semantics as the target semantic unit, and the multiple native semantic units correspond to different languages;
an information generation unit configured to obtain multilingual attribute description information corresponding to the multiple objects based on a semantic clustering result of the multiple target semantic units and the semantic unit matching groups.
In an exemplary embodiment, the information extraction apparatus further includes:
a semantic vector generation unit configured to generate semantic vectors corresponding to the multiple target semantic units;
a semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes;
a first determining unit configured to determine multiple clustering semantic units from the multiple target semantic units based on the semantic vectors in each target class and the semantic unit matching groups.
In an exemplary embodiment, the information generation unit includes:
a second determining unit configured to determine, based on the semantic unit matching groups, multiple native semantic units matching each clustering semantic unit;
a third determining unit configured to determine each clustering semantic unit, together with the multiple native semantic units matching that clustering semantic unit, as the multilingual attribute description information corresponding to the multiple objects.
在一示例性实施例中,所述语义向量生成单元包括:In an exemplary embodiment, the semantic vector generation unit includes:
第一词向量确定单元,被配置为基于每个所述目标语义单元中每个词语的词向量,得到所述目标语义单元包含的词向量;The first word vector determination unit is configured to obtain the word vector contained in the target semantic unit based on the word vector of each word in each of the target semantic units;
平均值计算单元,被配置为对所述目标语义单元包含的词向量取平均值,得到所述目标语义单元对应的所述语义向量;an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;
第二词向量确定单元,被配置为基于各目标语义单元对应的所述语义向量,得到与所述多个目标语义单元对应的语义向量。The second word vector determining unit is configured to obtain semantic vectors corresponding to the plurality of target semantic units based on the semantic vectors corresponding to each target semantic unit.
In an exemplary embodiment, the first determining unit includes:
a central semantic vector determining unit configured to determine the central semantic vector of each target class;
a candidate semantic vector determining unit configured to determine candidate semantic vectors of each target class based on the distance between each semantic vector in the target class and the central semantic vector;
a candidate semantic unit determining unit configured to obtain a plurality of candidate semantic units from the target semantic units corresponding to the candidate semantic vectors of each target class;
a first quantity determining unit configured to determine, based on the semantic unit matching groups, the number of native semantic units matching each candidate semantic unit;
a clustering semantic unit determining unit configured to determine the clustering semantic units from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
In an exemplary embodiment, the information extraction apparatus further includes:
a traversing unit configured to, in response to determining attribute description information for each object, traverse each clustering semantic unit and perform the following operations for each clustering semantic unit:
a searching unit configured to search the native evaluation information of the object for the current clustering semantic unit;
a fourth determining unit configured to determine the current clustering semantic unit as attribute description information of the object in response to the native evaluation information of the object containing the current clustering semantic unit.
In an exemplary embodiment, the information extraction apparatus further includes:
a sentiment value determining unit configured to determine, for each item of attribute description information of the object, the sentiment value of the attribute description information;
a fifth determining unit configured to determine, as the native evaluation information matching the attribute description information, the native evaluation information of the object that contains the attribute description information and whose sentiment value is consistent with that of the attribute description information;
a first attaching unit configured to attach the native evaluation information matching the attribute description information to the attribute description information.
In an exemplary embodiment, the information extraction apparatus further includes:
a second quantity unit configured to count the pieces of native evaluation information attached to each item of attribute description information of the object;
a sorting unit configured to sort the items of attribute description information in descending order of the number of pieces of attached native evaluation information;
a similarity calculating unit configured to calculate the similarity between the items of attribute description information of the object;
a similar attribute information pair determining unit configured to determine similar attribute information pairs based on the similarity calculation result, each similar attribute information pair including two items of attribute description information whose similarity is greater than a preset value;
a second attaching unit configured to attach the native evaluation information corresponding to the lower-ranked item of attribute description information in a similar attribute information pair to the native evaluation information corresponding to the higher-ranked item in that pair.
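As a non-limiting illustration, the sorting and merging performed by the units above might be sketched as follows. The disclosure does not specify the similarity measure, so the token-level Jaccard similarity, the function names `jaccard` and `merge_similar_attributes`, and the threshold value are assumptions made only for this sketch.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for the similarity calculation)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def merge_similar_attributes(attached: dict, threshold: float = 0.4) -> dict:
    """attached maps an attribute description to its list of attached native evaluations.

    Attributes are ranked by the number of attached evaluations (descending).
    When a lower-ranked attribute is similar to a higher-ranked one, its
    evaluations are attached to the higher-ranked attribute instead.
    """
    ranked = sorted(attached, key=lambda attr: len(attached[attr]), reverse=True)
    merged = {}
    for attr in ranked:
        # Look for an already kept (higher-ranked) attribute that is similar enough.
        target = next((kept for kept in merged if jaccard(attr, kept) > threshold), None)
        if target is None:
            merged[attr] = list(attached[attr])
        else:
            merged[target].extend(attached[attr])
    return merged
```

In this sketch the lower-ranked attribute disappears from the result once its evaluations have been moved; whether an actual implementation keeps it is not stated in the disclosure.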
In an exemplary embodiment, the semantic unit splitting unit includes:
a first splitting unit configured to perform semantic unit splitting on the native evaluation information to obtain a plurality of first semantic units;
a first deduplication unit configured to deduplicate the plurality of first semantic units to obtain the plurality of native semantic units;
a second splitting unit configured to perform semantic unit splitting on the target evaluation information to obtain a plurality of second semantic units;
a second deduplication unit configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the information extraction method described above.
According to a fourth aspect of the embodiments of the present disclosure, a non-volatile computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by a processor of a server, the server is enabled to perform the information extraction method described above.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product includes a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the device to perform the information extraction method described above.
In the present disclosure, language conversion is performed on the native evaluation information of a plurality of objects to obtain corresponding target evaluation information. Converting multilingual evaluation information into evaluation information in a single target language makes subsequent processing based on the target evaluation information more convenient. Semantic unit splitting is then performed on the native evaluation information and the target evaluation information of the plurality of objects, and semantic unit matching groups are constructed based on the splitting results. Each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages, so that semantic units in different languages with the same semantics have a matching relationship. Multilingual attribute description information corresponding to the plurality of objects is then obtained based on the semantic clustering result of the plurality of target semantic units and the semantic unit matching groups. Because the multilingual attribute description information in the present disclosure is extracted from the native evaluation information, the localized expression in each language is improved and the inaccuracy introduced by machine translation is avoided, which improves the accuracy of the multilingual attribute description information.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
Fig. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment.
Fig. 2 is a flowchart of an information extraction method according to an exemplary embodiment.
Fig. 3 is a schematic diagram of language conversion according to an exemplary embodiment.
Fig. 4 is a flowchart of a semantic unit splitting method according to an exemplary embodiment.
Fig. 5 is a schematic diagram of semantic unit splitting according to an exemplary embodiment.
Fig. 6 is a schematic diagram of a multilingual short-sentence matching process according to an exemplary embodiment.
Fig. 7 is a schematic diagram of a multilingual short-sentence matching table according to an exemplary embodiment.
Fig. 8 is a flowchart of a semantic clustering method according to an exemplary embodiment.
Fig. 9 is a flowchart of a method for determining multilingual attribute description information corresponding to a plurality of objects according to an exemplary embodiment.
Fig. 10 is a flowchart of a semantic vector generation method according to an exemplary embodiment.
Fig. 11 is a flowchart of a method for determining clustering semantic units according to an exemplary embodiment.
Fig. 12 is a flowchart of a method for determining attribute description information for each object according to an exemplary embodiment.
Fig. 13 is a flowchart of a method for attaching evaluations according to an exemplary embodiment.
Fig. 14 is a flowchart of a method for merging attribute description information according to an exemplary embodiment.
Fig. 15 is a schematic diagram of an information extraction apparatus according to an exemplary embodiment.
Fig. 16 is a schematic structural diagram of a device according to an exemplary embodiment.
Detailed Description
To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the embodiments of the present disclosure are first explained as follows:
Clustering: the process of merging texts into a plurality of classes according to their semantics.
Native evaluation: the evaluation text actually entered by a user.
Word vector (word embedding): a general term for language modeling and representation learning techniques in natural language processing. Conceptually, it refers to embedding a high-dimensional space, whose dimensionality equals the number of all words, into a continuous vector space of much lower dimensionality, with each word or phrase mapped to a vector over the real numbers.
In the related art, public impression words are mainly generated manually with algorithmic assistance, and their multilingual versions are produced by machine translation. The accuracy of multilingual public impression words obtained in this way is limited by the quality of the machine translation, and the localized expression of the public impression words is poor.
The present disclosure provides an information extraction method, apparatus, device, and storage medium.
Referring to Fig. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present disclosure, the implementation environment may include at least one first terminal 110 and a second terminal 120, and the first terminal 110 and the second terminal 120 may exchange data over a network.
In some embodiments, evaluation information on a plurality of objects in a relevant object platform may be published through the first terminal 110. The second terminal 120 may obtain the evaluation information on the plurality of objects, perform text analysis and information extraction on it, and generate attribute description information corresponding to each object, so that when the evaluation information of an object is browsed through the first terminal 110, the attribute description information corresponding to that object can be displayed.
The first terminal 110 may communicate with the second terminal 120 in a browser/server (B/S) mode or a client/server (C/S) mode. The first terminal 110 may include a physical device such as a smartphone, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, a vehicle-mounted terminal, or a server, and may also include software running on a physical device, such as an application program. The operating system running on the first terminal 110 in the embodiments of the present disclosure may include, but is not limited to, Android, iOS, Linux, and Windows.
The second terminal 120 and the first terminal 110 may establish a communication connection in a wired or wireless manner. The second terminal 120 may include an independently operating server, a distributed server, or a server cluster composed of multiple servers, where a server may be a cloud server.
To address the problems that the accuracy of multilingual public impression words is limited by the quality of machine translation and that their localized expression is poor, an embodiment of the present disclosure provides an information extraction method. The method may be executed by the second terminal in Fig. 1, which may be a server. Referring to Fig. 2, the information extraction method may include: step S210, performing language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the plurality of objects includes native evaluation information in multiple languages; step S220, performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain a plurality of native semantic units and a plurality of target semantic units; step S230, constructing semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, the plurality of native semantic units corresponding to different languages; and step S240, obtaining multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering result of the plurality of target semantic units and the semantic unit matching groups.
S210. Perform language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the plurality of objects includes native evaluation information in multiple languages.
That the native evaluation information of the plurality of objects includes native evaluation information in multiple languages may mean that the native evaluation information of different objects may include native evaluation information in the same language as well as in different languages; that is, the set of languages in which native evaluations exist may differ from object to object.
For example, the native evaluation information of object 1 may include native evaluation information in language 1 and language 2, and the native evaluation information of object 2 may include native evaluation information in language 2 and language 3. Object 1 and object 2 thus both have native evaluation information in language 2, while only object 1 has native evaluation information in language 1 and only object 2 has it in language 3. Native evaluation information in the same language shares only the language; the evaluation content is not necessarily the same.
Before language conversion, the native evaluation information may also be preprocessed. The preprocessing may include: first performing language identification on the native evaluation information to obtain the actual language of each native evaluation; then processing special characters to remove meaningless characters from the text; and finally performing a spelling check to correct misspelled words. This yields relatively well-formed text data in preparation for the subsequent algorithms.
In some embodiments, the target language may be English. Referring to Fig. 3, which shows a schematic diagram of language conversion, native evaluation information 1 in Russian is translated into corresponding English evaluation information 1; similarly, native evaluation information 2 in Spanish may be translated into corresponding English evaluation information 2.
S220. Perform semantic unit splitting on the native evaluation information and the target evaluation information to obtain a plurality of native semantic units and a plurality of target semantic units.
Referring to Fig. 4, which shows a semantic unit splitting method, the method includes the following steps S410 to S440.
S410. Perform semantic unit splitting on the native evaluation information to obtain a plurality of first semantic units.
S420. Deduplicate the plurality of first semantic units to obtain the plurality of native semantic units.
S430. Perform semantic unit splitting on the target evaluation information to obtain a plurality of second semantic units.
S440. Deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
In the embodiments of the present disclosure, a semantic unit may be a short sentence. In native evaluation information, users tend to combine several evaluation targets in a single evaluation clause, so that the corresponding target evaluation information also contains coordinated sentences, which degrades text clustering. For example, in an e-commerce scenario the object may be a garment, and the evaluation targets for the garment may be material quality, workmanship, logistics, and so on. The present disclosure uses conjunctions to identify coordinated evaluation targets in the evaluation information and then uses grammatical rules to split the evaluation information into multiple complete clauses. In addition, because semantic splitting is performed on many pieces of native evaluation information and many pieces of target evaluation information, the resulting semantic units may contain duplicates. The semantic units can then be deduplicated, which avoids redundancy and improves data processing efficiency.
Referring to Fig. 5, which shows a schematic diagram of semantic unit splitting, the original English evaluation information "High quality sewing and material" is split into "High quality sewing" and "High quality material"; the evaluation targets here are "sewing" and "material".
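As a minimal sketch of the splitting and deduplication described above, the following code handles only the simple pattern shown in Fig. 5 (a shared modifier followed by targets joined by "and"). The rule that the last word before the conjunction is the first evaluation target, and the function names `split_coordinated_targets` and `split_and_dedupe`, are assumptions of this sketch; the grammatical rules actually used are not detailed in the disclosure.

```python
import re


def split_coordinated_targets(clause: str) -> list:
    """Split 'MODIFIER a and b' into ['MODIFIER a', 'MODIFIER b']."""
    parts = re.split(r"\s+and\s+", clause.strip())
    if len(parts) < 2:
        return [clause.strip()]
    head_words = parts[0].split()
    # Assumption: the last word of the first part is the first evaluation
    # target, and everything before it is the modifier shared by all targets.
    modifier = " ".join(head_words[:-1])
    targets = [head_words[-1]] + parts[1:]
    return [f"{modifier} {target}".strip() for target in targets]


def split_and_dedupe(evaluations: list) -> list:
    """Split every evaluation and drop duplicates (case-insensitive), keeping order."""
    seen = {}
    for text in evaluations:
        for unit in split_coordinated_targets(text):
            seen.setdefault(unit.lower(), unit)
    return list(seen.values())
```

Deduplication here compares lowercased strings; a production system might normalize more aggressively before comparing.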
S230. Construct semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, the plurality of native semantic units corresponding to different languages.
The present disclosure may use the Champollion algorithm for text alignment to obtain matching pairs of native semantic units and target semantic units for each object; each semantic unit matching group contains semantic units in multiple different languages that have the same semantics.
Referring to Fig. 6, which shows a schematic diagram of a multilingual short-sentence matching process, native semantic units and the corresponding target semantic units are associated on the basis of identical semantics to form semantic unit matching relationships; each matching pair (pair) includes one native semantic unit and one target semantic unit.
Referring to Fig. 7, which shows a schematic diagram of a multilingual short-sentence matching table, some of the matching pairs in Fig. 6 have the same semantics, and semantic unit matching groups can be generated from such pairs. For example, for pair(Rc_k, Tm_n) and pair(Se_f, Tc_j) in Fig. 6, where Tm_n = Tc_j, the target semantic units are the same, so the corresponding Rc_k and Se_f also have the same semantics. Rc_k and Se_f correspond to different languages, so the semantic unit matching group (Tm_n, Rc_k, Se_f) can be constructed. A semantic unit matching table can be formed from multiple semantic unit matching groups, and this table can serve as an index for subsequent multilingual expression.
Because the native evaluation information differs between objects (for example, some objects have few native evaluations, or have native evaluation information in only one language), multilingual semantic unit matching relationships cannot always be constructed from an object's own native evaluation information. In the present disclosure, matching may be performed across the matching pairs of all objects, so that the evaluation information of different objects complements one another and multilingual semantic unit matching relationships can be constructed.
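The grouping of matching pairs by a shared target semantic unit might be sketched as follows. This is only an illustration that assumes the alignment step has already produced (native unit, language, target unit) triples; the function name `build_matching_groups`, the language labels, and the extra pair (Rd_p, Tq_r) are hypothetical.

```python
from collections import defaultdict


def build_matching_groups(pairs):
    """Group aligned pairs by target semantic unit.

    pairs: iterable of (native_unit, language, target_unit) triples.
    Returns {target_unit: {language: [native_unit, ...]}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for native_unit, language, target_unit in pairs:
        bucket = groups[target_unit][language]
        if native_unit not in bucket:  # skip duplicate native units
            bucket.append(native_unit)
    return {target: dict(by_language) for target, by_language in groups.items()}


# Pairs in the spirit of Fig. 6, where Tm_n and Tc_j are the same target unit.
pairs = [
    ("Rc_k", "language_R", "Tm_n"),
    ("Se_f", "language_S", "Tm_n"),
    ("Rd_p", "language_R", "Tq_r"),  # hypothetical extra pair
]
matching_table = build_matching_groups(pairs)
```

Because pairs from all objects can be passed in together, an object with evaluations in only one language can still benefit from groups built from other objects' evaluations.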
S240. Obtain multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering result of the plurality of target semantic units and the semantic unit matching groups.
Referring to Fig. 8, which shows a semantic clustering method, the method includes the following steps S810 to S830.
S810. Generate semantic vectors corresponding to the plurality of target semantic units.
S820. Perform semantic clustering on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes.
S830. Determine a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each target class and the semantic unit matching groups.
In the present disclosure, the K-means clustering algorithm may be used to obtain the class to which each sentence belongs. K-means is a common and efficient unsupervised clustering algorithm; with it, semantic units having the same semantics can be grouped into the same class. The value of K in the K-means algorithm is determined by the silhouette coefficient.
In the present disclosure, the clustering of target semantic units may be based on the corresponding semantic vectors. Because a semantic vector fully reflects the feature information of the corresponding semantic unit and is easy to compute with, the accuracy and convenience of semantic unit clustering are improved.
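The choice of K by silhouette coefficient can be illustrated with a small, dependency-free sketch. The deterministic farthest-point initialization, the function names `kmeans`, `silhouette`, and `choose_k`, and the two-dimensional toy points are assumptions made for this illustration; a real system would cluster high-dimensional semantic vectors and would more likely rely on an established library.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-means with deterministic farthest-point initialization."""
    points = [tuple(p) for p in points]
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    return labels, centers


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in clusters if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the candidate K (K >= 2) with the highest mean silhouette coefficient."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)[0]))
```

For toy data made of three well-separated groups of points, `choose_k(points, [2, 3, 4])` would be expected to select 3, since both merging two groups and splitting one group lower the mean silhouette coefficient.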
Referring to Fig. 9, which shows a method for determining multilingual attribute description information corresponding to a plurality of objects, the method includes steps S910 and S920.
S910. Determine, based on the semantic unit matching groups, a plurality of native semantic units matching each clustering semantic unit.
S920. Determine each clustering semantic unit, together with the plurality of native semantic units matching that clustering semantic unit, as the multilingual attribute description information corresponding to the plurality of objects.
The attribute description information here represents a distillation and summary of the features of the plurality of objects; it reflects the characteristics of an object, so that the object can be roughly understood from its attribute description information. The multilingual attribute description information obtained here may refer to comprehensive attribute description information of the plurality of objects and may comprise multiple items, each item including attribute description information in multiple languages with the same semantics.
请参阅图10,其示出了一种语义向量生成方法,包括步骤S1010至步骤S1030。Please refer to FIG. 10 , which shows a method for generating a semantic vector, including steps S1010 to S1030.
S1010.基于每个所述目标语义单元中每个词语的词向量,得到所述目标语义单元包含的词向量。S1010. Obtain the word vector contained in the target semantic unit based on the word vector of each word in each target semantic unit.
S1020.对所述目标语义单元包含的词向量取平均值,得到所述目标语义单元对应的所述语义向量。S1020. Average the word vectors included in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit.
S1030.基于各目标语义单元对应的所述语义向量,得到与所述多个目标语义单元对应的语义向量。S1030. Based on the semantic vectors corresponding to each target semantic unit, obtain semantic vectors corresponding to the multiple target semantic units.
本公开在确定目标语义单元的语义向量之前,还需要计算其中每个词语的词向量。通过分词得到每一个目标语义单元所涉及的若干词语,并且进行分词和词性标注,按照类目将语义单元放入到Word2Vec模型中,训练出每一个单词的词向量,同时将分词的词语在词向量表中进行索引;对于每个目标语义单元,将其中包含的词语在词向量表中进行检索,得到每个目标语义单元所包含的向量组合,最后通过对向量组合中所有词语词向量取均值,得到该目标语义单元的语义向量表达。基于预先生成的词向量来计算目标语义单元的语义向量,能够提高语义向量计算的准确性和便利性。In the present disclosure, before determining the semantic vector of the target semantic unit, the word vector of each word in the target semantic unit needs to be calculated. Obtain a number of words involved in each target semantic unit through word segmentation, and perform word segmentation and part-of-speech tagging. Put the semantic units into the Word2Vec model according to the category, train the word vector of each word, and at the same time put the word segmentation in the word Index in the vector table; for each target semantic unit, retrieve the words contained in it in the word vector table to obtain the vector combination contained in each target semantic unit, and finally take the mean value of all the word vectors in the vector combination , to get the semantic vector expression of the target semantic unit. Computing the semantic vector of the target semantic unit based on the pre-generated word vector can improve the accuracy and convenience of semantic vector calculation.
其中,对于词向量的生成方法还可以采用动态语义向量模型实现。Among them, the generation method of the word vector can also be realized by using a dynamic semantic vector model.
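Steps S1010 to S1030 can be sketched as follows. This is a minimal illustration, assuming a pre-built word-vector table (the hypothetical dictionary `word_vectors`, which in the disclosure would come from a trained Word2Vec model) and units that have already been segmented into words. The function names, the handling of missing words, and the toy two-dimensional vectors are assumptions and are not part of the disclosure.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def unit_semantic_vector(words: Sequence[str], word_vectors: Dict[str, Vector]) -> Vector:
    """S1010/S1020: look up each word's vector and average them element-wise."""
    # Words missing from the table are skipped (an assumption; the source does not say).
    found = [word_vectors[w] for w in words if w in word_vectors]
    if not found:
        raise ValueError("no word of the unit is present in the word-vector table")
    dim = len(found[0])
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


def all_semantic_vectors(
    units: Dict[str, Sequence[str]], word_vectors: Dict[str, Vector]
) -> Dict[str, Vector]:
    """S1030: one semantic vector per target semantic unit."""
    return {unit: unit_semantic_vector(words, word_vectors) for unit, words in units.items()}


# Toy word-vector table with hypothetical values.
word_vectors = {"fast": [1.0, 0.0], "delivery": [0.0, 1.0], "logistics": [0.0, 3.0]}
units = {"fast delivery": ["fast", "delivery"], "fast logistics": ["fast", "logistics"]}
vectors = all_semantic_vectors(units, word_vectors)
```

Averaging keeps the semantic vector in the same space as the word vectors, so units of different lengths remain directly comparable during clustering.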
请参阅图11,其示出了一种聚类语义单元确定方法,包括以下步骤S1110至步骤S1150。Please refer to FIG. 11 , which shows a method for determining a clustering semantic unit, including the following steps S1110 to S1150.
S1110.确定每个所述目标类的中心语义向量。S1110. Determine the central semantic vector of each target class.
S1120.基于每个所述目标类中的各语义向量与所述中心语义向量的距离,确定每个所述目标类的候选语义向量。S1120. Based on the distance between each semantic vector in each target class and the central semantic vector, determine candidate semantic vectors for each target class.
S1130.根据每个所述目标类的候选语义向量对应的目标语义单元,得到多个候选语义单元。S1130. Obtain a plurality of candidate semantic units according to the target semantic units corresponding to the candidate semantic vectors of each target class.
S1140.基于所述语义单元匹配组,确定与每个候选语义单元相匹配的原生语义单元的数量。S1140. Based on the semantic unit matching group, determine the number of native semantic units matching each candidate semantic unit.
S1150.基于与每个候选语义单元相匹配的原生语义单元的数量,从所述候选语义单元中确定出所述聚类语义单元。S1150. Based on the number of native semantic units matching each candidate semantic unit, determine the cluster semantic unit from the candidate semantic units.
在每个目标类中,首先可确定相应的中心语义向量,然后计算该目标类中其他语义向量与该中心语义向量的距离,并基于与中心语义向量的距离由近及远对语义向量进行排序,例如,选择排序靠前10%的语义向量作为与该目标类对应的候选语义向量。In each target class, the corresponding central semantic vector can be determined first, and then the distance between other semantic vectors in the target class and the central semantic vector can be calculated, and the semantic vectors can be sorted from near to far based on the distance from the central semantic vector , for example, select the top 10% semantic vectors as the candidate semantic vectors corresponding to the target class.
在得到候选语义向量之后,可得到相应的候选语义单元,基于上述的语义单元匹配组,可确定每个候选语义单元相匹配的原生语义单元的数量。本公开实施例中,选择匹配原生语义单元数量较多的候选语义单元作为聚类语义单元,因为匹配的原生语义单元数量越多,相应的多语种表达的语种类型就越多,从而可实现更多语种的语义单元表达,提高语义单元表达形式的多样性和丰富性。After the candidate semantic vectors are obtained, the corresponding candidate semantic units can be obtained, and, based on the semantic unit matching groups described above, the number of native semantic units matching each candidate semantic unit can be determined. In the embodiments of the present disclosure, the candidate semantic unit that matches the larger number of native semantic units is selected as the clustering semantic unit: the more native semantic units are matched, the more languages the corresponding multilingual expression covers, which allows the semantic unit to be expressed in more languages and improves the diversity and richness of its forms of expression.
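The selection in steps S1110 to S1150 can be sketched for a single target class as follows. The sketch assumes that the central semantic vector is the element-wise mean of the class and that distance is Euclidean; the source fixes neither choice. The names `pick_cluster_unit`, `matching_groups`, and `top_fraction` are hypothetical, and the 10% default mirrors the example given above.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """S1110: take the element-wise mean as the central semantic vector (assumed)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def pick_cluster_unit(
    target_class: Dict[str, Vector],
    matching_groups: Dict[str, List[str]],
    top_fraction: float = 0.1,
) -> str:
    """S1120-S1150 for one target class; returns the chosen clustering semantic unit."""
    center = centroid(list(target_class.values()))
    # S1120: rank units from nearest to farthest from the center (Euclidean, assumed).
    ranked = sorted(target_class, key=lambda u: math.dist(target_class[u], center))
    # S1130: keep the nearest fraction as candidates, but never fewer than one.
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    candidates = ranked[:keep]
    # S1140/S1150: prefer the candidate matched by the most native semantic units.
    return max(candidates, key=lambda u: len(matching_groups.get(u, [])))


# Toy data with hypothetical vectors and matching groups.
target_class = {
    "fast delivery": [0.5, 0.5],
    "quick shipping": [0.6, 0.5],
    "arrived soon": [0.9, 0.1],
}
matching_groups = {
    "fast delivery": ["发货快", "envío rápido"],
    "quick shipping": ["发货快"],
}
chosen = pick_cluster_unit(target_class, matching_groups, top_fraction=0.5)
```

With two candidates kept, the unit backed by native expressions in more languages wins even though it is slightly farther from the center, which is the trade-off the paragraph above describes.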
请参阅图12,其示出了对每个对象确定相应属性描述信息的方法,包括步骤S1210至步骤S1230。Please refer to FIG. 12 , which shows a method for determining corresponding attribute description information for each object, including steps S1210 to S1230.
S1210.响应于确定每个对象的属性描述信息,遍历每个所述聚类语义单元。S1210. In response to determining the attribute description information of each object, traverse each of the clustering semantic units.
S1220.在所述对象的原生评价信息中查找当前聚类语义单元。S1220. Search the native evaluation information of the object for the current clustering semantic unit.
S1230.响应于所述对象的原生评价信息中包含所述当前聚类语义信息,将所述当前聚类语义单元确定为所述对象的属性描述信息。S1230. In response to the original evaluation information of the object including the current cluster semantic information, determine the current cluster semantic unit as attribute description information of the object.
由于上述的聚类语义单元的确定是基于多对象的原生评价信息生成的,从而相应的聚类语义单元是针对多个对象而言的,并不是每个对象均对应上述的多个聚类语义单元,此时需要分别为每个对象进行个性化处理。将每个聚类语义单元与每个对象的原生评价信息进行匹配,从而确定每个对象的属性描述信息,进一步提高了对象属性信息的个性化展示。之所以需要先基于多个对象的原生评价信息生成聚类语义单元,是为了实现对象之间多语种信息表达的互补。Because the clustering semantic units described above are generated from the native evaluation information of multiple objects, they apply to the objects as a whole, and not every object corresponds to all of them; each object therefore needs to be handled individually. Each clustering semantic unit is matched against the native evaluation information of each object to determine the attribute description information of that object, which further improves the personalized display of object attribute information. The clustering semantic units are first generated from the native evaluation information of multiple objects so that the multilingual expressions of different objects can complement one another.
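The per-object matching of steps S1210 to S1230 can be sketched as follows. It assumes that a clustering semantic unit counts as found when the unit itself, or any native expression matched to it, occurs as a substring of one of the object's reviews; the source only states that the unit is searched for in the native evaluation information. The function and variable names are hypothetical.

```python
from typing import Dict, List


def attributes_for_object(
    native_reviews: List[str],
    cluster_units: Dict[str, List[str]],
) -> List[str]:
    """S1210-S1230: keep the clustering units that occur in this object's reviews."""
    selected = []
    for unit, native_expressions in cluster_units.items():  # S1210: traverse
        # Assumption: the unit or any matched native expression may be the hit.
        expressions = [unit, *native_expressions]
        # S1220: search every review of the object for any of the expressions.
        if any(expr in review for review in native_reviews for expr in expressions):
            selected.append(unit)  # S1230: the unit describes this object
    return selected


# Toy data: clustering units with their matched native expressions (hypothetical).
cluster_units = {
    "fast delivery": ["发货快", "envío rápido"],
    "good quality": ["质量好"],
}
reviews_a = ["发货快,包装也不错", "Arrived on time"]
reviews_b = ["质量好", "fast delivery, thanks"]
attrs_a = attributes_for_object(reviews_a, cluster_units)
attrs_b = attributes_for_object(reviews_b, cluster_units)
```

Object A receives only the attribute its own reviews support, while object B receives both, which illustrates why the shared clustering units still need a per-object pass.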
请参阅图13,其示出了一种评价挂载方法,包括步骤S1310至步骤S1330。Please refer to FIG. 13, which shows an evaluation mounting method including steps S1310 to S1330.
S1310.对于所述对象的每项属性描述信息,确定所述属性描述信息的情感值。S1310. For each item of attribute description information of the object, determine the sentiment value of the attribute description information.
S1320.将所述对象的原生评价信息中包含所述属性描述信息,与所述属性描述信息的情感值一致的原生评价信息确定为与所述属性描述信息相匹配的原生评价信息。S1320. Among the native evaluation information of the object, determine the native evaluation information that contains the attribute description information and whose sentiment value is consistent with that of the attribute description information as the native evaluation information matching the attribute description information.
S1330.将与所述属性描述信息相匹配的原生评价信息挂载到所述属性描述信息中。S1330. Mount native evaluation information matching the attribute description information into the attribute description information.
情感值可包括正向、负向,以及中性,在进行评价信息挂载的情况下,基于情感值一致这一前提能够提高挂载的准确性;通过属性描述信息能够使得用户快速对当前对象有一个大致了解,为了能够进一步获取详细的评价信息,可将每项属性描述信息与相应的评价信息进行挂载,实现了对评价信息的分类,能够进行评价信息的分类获取;基于属性描述信息便可获取与该属性描述信息相关的评价信息,提高了评价信息获取的便利性。The sentiment value may be positive, negative, or neutral. When evaluation information is mounted, requiring consistent sentiment values improves the accuracy of the mounting. The attribute description information gives the user a quick overview of the current object; so that detailed evaluation information can also be obtained, each item of attribute description information is mounted with the corresponding evaluation information. This classifies the evaluation information and allows it to be retrieved by category: the evaluation information related to an item of attribute description information can be retrieved from that item, which makes the evaluation information more convenient to obtain.
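Steps S1310 to S1330 can be sketched as follows. The sketch assumes that sentiment labels have already been computed for both the attributes and the reviews (how they are computed is outside this passage) and that "contains" means substring containment. The names `mount_reviews` and `attribute_sentiment` and the three label strings are hypothetical.

```python
from typing import Dict, List, Tuple

Review = Tuple[str, str]  # (text, sentiment label: "positive" | "negative" | "neutral")


def mount_reviews(
    attribute_sentiment: Dict[str, str],
    reviews: List[Review],
) -> Dict[str, List[str]]:
    """S1310-S1330: attach to each attribute the reviews that contain it
    and carry the same sentiment value."""
    mounted: Dict[str, List[str]] = {attr: [] for attr in attribute_sentiment}
    for attr, sentiment in attribute_sentiment.items():
        for text, review_sentiment in reviews:
            # S1320: the review must mention the attribute AND agree in sentiment.
            if attr in text and review_sentiment == sentiment:
                mounted[attr].append(text)  # S1330: mount the review
    return mounted


# Toy data with hypothetical, pre-labelled sentiment values.
attribute_sentiment = {"fast delivery": "positive"}
reviews = [
    ("fast delivery and well packed", "positive"),
    ("not exactly fast delivery, took two weeks", "negative"),
    ("nice colour", "positive"),
]
mounted = mount_reviews(attribute_sentiment, reviews)
```

The second review mentions the attribute but disagrees in sentiment, so it is not mounted; this is the accuracy gain that the sentiment-consistency premise is meant to provide.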
请参阅图14,其示出了一种属性描述信息归并方法,包括步骤S1410至步骤S1470。Please refer to FIG. 14 , which shows a method for merging attribute description information, including steps S1410 to S1470.
S1410.对所述对象的各项属性描述信息中任意两项属性描述信息进行相似度计算。S1410. Perform similarity calculation on any two items of attribute description information of the object.
S1420.判断所述对象的各项属性描述信息中是否存在相似的属性描述信息;在所述对象的各项属性描述信息中存在相似的属性描述信息的情况下,执行步骤S1430;在所述对象的各项属性描述信息中不存在相似的属性描述信息的情况下,执行步骤S1470。S1420. Determine whether there is similar attribute description information in the attribute description information of the object; if there is similar attribute description information in the attribute description information of the object, perform step S1430; In the case that there is no similar attribute description information among the attribute description information of each item, step S1470 is executed.
S1430.基于相似度计算结果,确定相似属性信息对;每项所述相似属性信息对中包括相似度大于预设值的两项属性描述信息。S1430. Determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two pieces of attribute description information whose similarity is greater than a preset value.
S1440.统计所述对象的各项属性描述信息所挂载的原生评价信息的数量。S1440. Count the number of pieces of native evaluation information mounted on each item of attribute description information of the object.
S1450.基于所挂载的原生评价信息的数量由大到小的顺序,对各项属性描述信息进行排序。S1450. Based on the descending order of the quantity of the mounted native evaluation information, sort each attribute description information.
S1460.将所述相似属性信息对中排序在后的属性描述信息对应的原生评价信息,挂载到所述相似属性信息对中排序在前的属性描述信息对应的原生评价信息中。S1460. Mount the native evaluation information corresponding to the lower attribute description information in the similar attribute information pair to the native evaluation information corresponding to the higher attribute description information in the similar attribute information pair.
S1470.确定当前属性描述信息为所述对象的属性描述信息。S1470. Determine that the current attribute description information is the attribute description information of the object.
对于每个对象,通过上述方法得到的属性描述信息中包含语义粗细粒度不一致的属性描述信息;例如在电商场景中,相应的属性描述信息可以包括“物流快”、“发货快”、“运输快”等,此时可对这些语义粗细粒度不一致的属性描述信息进行归并,避免了无监督聚类里属性描述信息语义粒度不一致的事实,让属性描述信息语义层级更加一致。For each object, the attribute description information obtained by the above method includes items of inconsistent semantic granularity. For example, in an e-commerce scenario, the attribute description information may include "fast logistics", "fast delivery", and "fast transport". Such items can be merged, which avoids the inconsistent semantic granularity that unsupervised clustering produces and makes the semantic level of the attribute description information more consistent.
本公开实施例中对属性描述信息进行归并的方法可以包括以下步骤1至步骤3。The method for merging attribute description information in the embodiments of the present disclosure may include steps 1 to 3 below.
1.ESIM模型训练;首先通过开源的数据集和规则捞取的方式得到相似属性信息对,构建出训练数据集,然后进行ESIM模型的训练。ESIM全称Enhanced Sequential Inference Model,是一种增强序列推断模型,所以本实施例中采用了ESIM模型来做属性描述信息的相似性判断。1. ESIM model training. Similar attribute information pairs are first obtained from open-source data sets and by rule-based retrieval to construct a training data set, and the ESIM model is then trained. ESIM stands for Enhanced Sequential Inference Model, an enhanced sequence inference model; this embodiment uses it to judge the similarity of attribute description information.
2.属性描述信息相似性判断;将所挂载的原生评价信息的数量由大到小的顺序,对各项属性描述信息进行排序,利用1中的模型判别排序靠前的属性描述信息和排序靠后的属性描述信息的相似关系。2. Similarity judgment of attribute description information. The items of attribute description information are sorted in descending order of the number of pieces of native evaluation information mounted on them, and the model from step 1 is used to judge whether a higher-ranked item and a lower-ranked item are similar.
3.相似属性描述信息合并;在2中判断了排序靠前的属性描述信息和排序靠后的属性描述信息是相似的情况下,那么将排序靠后的属性描述信息替换成排序靠前的属性描述信息,并且将排序靠后的属性描述信息对应评价信息挂载到排序靠前的属性描述信息中。3. Merging of similar attribute description information. If step 2 finds that a higher-ranked item and a lower-ranked item of attribute description information are similar, the lower-ranked item is replaced with the higher-ranked item, and the evaluation information corresponding to the lower-ranked item is mounted onto the higher-ranked item.
重复步骤2和步骤3,直到对象的属性描述信息互不相似为止。Steps 2 and 3 are repeated until no two items of the object's attribute description information are similar.
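The loop formed by steps 2 and 3 can be sketched as follows. The similarity judgment is passed in as a callable: in the disclosure it would be a trained ESIM model, whereas the example below substitutes a hand-written lookup purely for illustration. The names `merge_similar`, `is_similar`, and `FAST_GROUP` are hypothetical.

```python
from typing import Callable, Dict, List


def merge_similar(
    mounted: Dict[str, List[str]],
    is_similar: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Repeat steps 2 and 3 until no two remaining attributes are similar."""
    merged = {attr: list(reviews) for attr, reviews in mounted.items()}
    changed = True
    while changed:
        changed = False
        # Step 2: rank attributes by number of mounted reviews, largest first.
        ranked = sorted(merged, key=lambda a: len(merged[a]), reverse=True)
        for i, front in enumerate(ranked):
            for back in ranked[i + 1:]:
                if is_similar(front, back):
                    # Step 3: fold the lower-ranked attribute into the higher one.
                    merged[front].extend(merged.pop(back))
                    changed = True
                    break
            if changed:
                break  # re-rank before looking for the next similar pair
    return merged


# Stand-in for the ESIM judgment: a fixed set of mutually similar attributes.
FAST_GROUP = {"fast logistics", "fast delivery", "fast transport"}


def is_similar(a: str, b: str) -> bool:
    return a in FAST_GROUP and b in FAST_GROUP


mounted = {
    "fast logistics": ["r1", "r2", "r3"],
    "fast delivery": ["r4", "r5"],
    "fast transport": ["r6"],
    "good quality": ["r7"],
}
result = merge_similar(mounted, is_similar)
```

Re-ranking after every merge keeps the attribute with the most mounted reviews as the surviving label, which matches the rule that the lower-ranked item is replaced by the higher-ranked one.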
本公开实施例中,在用户终端进行属性描述信息展示的情况下,相应的显示语种可以是基于用户自定义确定的,也可以是基于用户终端的定位信息确定的。In the embodiment of the present disclosure, when the user terminal displays the attribute description information, the corresponding display language may be determined based on the user's definition, or may be determined based on the location information of the user terminal.
本公开基于对象维度进行属性描述信息的挖掘,通过对象之间评价信息的差异性提升属性描述信息的个性化程度;在聚类的结果上采用ESIM算法对相似属性描述信息进行合并,避免了无监督聚类里属性描述信息语义粒度不一致的事实,让属性描述信息语义层级更加一致;通过对象之间评价信息内容的互补,构建出原生评价信息与目标评价信息之间的匹配关系,让属性描述信息在多语种的展示上更加本地化。The present disclosure mines attribute description information at the object level and uses the differences between the evaluation information of different objects to make the attribute description information more personalized. The ESIM algorithm is applied to the clustering results to merge similar attribute description information, which avoids the inconsistent semantic granularity that unsupervised clustering produces and makes the semantic level of the attribute description information more consistent. By letting the evaluation information of different objects complement one another, a matching relationship between the native evaluation information and the target evaluation information is constructed, so that the attribute description information is displayed in a more localized way in each language.
本公开通过对多个对象的原生评价信息进行语种转换,得到相应的目标评价信息,通过将多语种的评价信息转换为统一的目标语种的评价信息,能够提高后续基于目标评价信息进行处理的便利性;再对多个对象的原生评价信息以及目标评价信息进行语义单元拆分,基于语义单元拆分结果构建语义单元匹配组,每个所述语义单元匹配组中包括一个目标语义单元,以及与所述目标语义单元具有相同语义的多个原生语义单元;所述多个原生语义单元对应不同的语种,从而具有相同语义的不同语种的语义单元具有匹配关系;然后基于对所述多个目标语义单元的语义聚类结果,以及所述语义单元匹配组,得到与所述多个对象对应的多语种属性描述信息。本公开中的多语种属性描述信息均是从原生评价信息中提取出来的,从而提高了多语言本地化表达的效果,并且能够避免基于机器翻译带来的翻译不准确的事实,从而提高了多语种属性描述信息表达的准确性。The present disclosure performs language conversion on the native evaluation information of multiple objects to obtain the corresponding target evaluation information; converting multilingual evaluation information into evaluation information in a single target language makes subsequent processing based on the target evaluation information more convenient. The native evaluation information and the target evaluation information of the multiple objects are then split into semantic units, and semantic unit matching groups are constructed from the splitting results. Each semantic unit matching group includes one target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; the multiple native semantic units correspond to different languages, so that semantic units in different languages with the same semantics are matched to one another. Multilingual attribute description information corresponding to the multiple objects is then obtained based on the semantic clustering results of the multiple target semantic units and the semantic unit matching groups. All of the multilingual attribute description information in the present disclosure is extracted from native evaluation information, which improves the quality of localized multilingual expression and avoids the inaccuracies introduced by machine translation, thereby improving the accuracy with which the multilingual attribute description information is expressed.
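The structure of a semantic unit matching group can be sketched as follows. The sketch assumes that each native semantic unit has already been aligned with the target semantic unit of the same meaning, and that one native expression is kept per language; how the alignment is obtained is not described in this passage. The names `Aligned` and `build_matching_groups` and the language codes are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Aligned = Tuple[str, str, str]  # (language code, native semantic unit, target semantic unit)


def build_matching_groups(aligned_units: Iterable[Aligned]) -> Dict[str, Dict[str, str]]:
    """Group native semantic units under the target semantic unit whose meaning they share."""
    groups: Dict[str, Dict[str, str]] = {}
    for language, native_unit, target_unit in aligned_units:
        # Keep the first native expression seen per language (an assumption).
        groups.setdefault(target_unit, {}).setdefault(language, native_unit)
    return groups


# Toy alignment data (hypothetical).
aligned_units = [
    ("zh", "发货快", "fast delivery"),
    ("es", "envío rápido", "fast delivery"),
    ("zh", "发货很快", "fast delivery"),
    ("zh", "质量好", "good quality"),
]
groups = build_matching_groups(aligned_units)
```

Because every native expression in a group was written by a reviewer, displaying the group entry for a user's language shows native wording and does not depend on a machine translation of the target unit.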
图15是根据一示例性实施例示出的一种信息抽取装置框图。参照图15,该信息抽取装置包括语种转换单元1510、语义单元拆分单元1520、语义单元匹配组构建单元1530和信息生成单元1540。Fig. 15 is a block diagram of an information extraction device according to an exemplary embodiment. Referring to FIG. 15 , the information extraction device includes a language conversion unit 1510 , a semantic unit splitting unit 1520 , a semantic unit matching group construction unit 1530 and an information generation unit 1540 .
语种转换单元1510,被配置为对多个对象的原生评价信息进行语种转换,得到与每条原生评价信息对应的目标评价信息;其中,所述多个对象的原生评价信息中包括多语种的原生评价信息。The language conversion unit 1510 is configured to perform language conversion on the native evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the multiple objects includes native evaluation information in multiple languages.
语义单元拆分单元1520,被配置为对所述原生评价信息和所述目标评价信息进行语义单元拆分,得到多个原生语义单元和多个目标语义单元。The semantic unit splitting unit 1520 is configured to split the native evaluation information and the target evaluation information into semantic units to obtain multiple native semantic units and multiple target semantic units.
语义单元匹配组构建单元1530,被配置为构建语义单元匹配组;其中每个所述语义单元匹配组中包括一个目标语义单元,以及与所述目标语义单元具有相同语义的多个原生语义单元;所述多个原生语义单元对应不同的语种。The semantic unit matching group construction unit 1530 is configured to construct a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and multiple native semantic units having the same semantics as the target semantic unit; The multiple native semantic units correspond to different languages.
信息生成单元1540,被配置为基于对所述多个目标语义单元的语义聚类结果,以及所述语义单元匹配组,得到与所述多个对象对应的多语种属性描述信息。The information generation unit 1540 is configured to obtain multilingual attribute description information corresponding to the multiple objects based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group.
在一示例性实施例中,所述信息抽取装置还包括:In an exemplary embodiment, the information extraction device further includes:
语义向量生成单元,被配置为生成与所述多个目标语义单元对应的语义向量;a semantic vector generating unit configured to generate semantic vectors corresponding to the plurality of target semantic units;
语义聚类单元,被配置为对所述与所述多个目标语义单元对应的语义向量进行语义聚类,得到多个目标类;A semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the multiple target semantic units to obtain multiple target classes;
第一确定单元,被配置为基于每个所述目标类中的语义向量,以及所述语义单元匹配组,从所述多个目标语义单元中确定出多个聚类语义单元。The first determining unit is configured to determine a plurality of clustering semantic units from the plurality of target semantic units based on the semantic vectors in each of the target classes and the semantic unit matching group.
在一示例性实施例中,所述信息生成单元1540包括:In an exemplary embodiment, the information generating unit 1540 includes:
第二确定单元,被配置为基于所述语义单元匹配组,确定与每项所述聚类语义单元相匹配的多个原生语义单元;The second determining unit is configured to determine a plurality of native semantic units matching each of the clustering semantic units based on the semantic unit matching group;
第三确定单元,被配置为将每项所述聚类语义单元,以及与所述聚类语义单元相匹配的多个原生语义单元确定为与所述多个对象对应的多语种属性描述信息。The third determining unit is configured to determine each of the clustering semantic units and multiple native semantic units matching the clustering semantic units as the multilingual attribute description information corresponding to the multiple objects.
在一示例性实施例中,所述语义向量生成单元包括:In an exemplary embodiment, the semantic vector generation unit includes:
第一词向量确定单元,被配置为基于每个所述目标语义单元中每个词语的词向量,得到所述目标语义单元包含的词向量;The first word vector determination unit is configured to obtain the word vector contained in the target semantic unit based on the word vector of each word in each of the target semantic units;
平均值计算单元,被配置为对所述目标语义单元包含的词向量取平均值,得到所述目标语义单元对应的所述语义向量;an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;
第二词向量确定单元,被配置为基于各目标语义单元对应的所述语义向量,得到与所述多个目标语义单元对应的语义向量。The second word vector determining unit is configured to obtain semantic vectors corresponding to the plurality of target semantic units based on the semantic vectors corresponding to each target semantic unit.
在一示例性实施例中,所述第一确定单元包括:In an exemplary embodiment, the first determination unit includes:
中心语义向量确定单元,被配置为确定每个所述目标类的中心语义向量;a central semantic vector determining unit configured to determine the central semantic vector of each of the target classes;
候选语义向量确定单元,被配置为基于每个所述目标类中的各语义向量与所述中心语义向量的距离,确定每个所述目标类的候选语义向量;a candidate semantic vector determining unit configured to determine a candidate semantic vector for each of the target classes based on the distance between each semantic vector in each of the target classes and the central semantic vector;
候选语义单元确定单元,被配置为根据每个所述目标类的候选语义向量对应的目标语义单元,得到多个候选语义单元;The candidate semantic unit determining unit is configured to obtain a plurality of candidate semantic units according to the target semantic unit corresponding to the candidate semantic vector of each target class;
第一数量确定单元,被配置为基于所述语义单元匹配组,确定与每个候选语义单元相匹配的原生语义单元的数量;The first quantity determining unit is configured to determine the quantity of native semantic units matching each candidate semantic unit based on the semantic unit matching group;
聚类语义单元确定单元,被配置为基于与每个候选语义单元相匹配的原生语义单元的数量,从所述候选语义单元中确定出所述聚类语义单元。The clustering semantic unit determining unit is configured to determine the clustering semantic unit from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
在一示例性实施例中,所述信息抽取装置还包括:In an exemplary embodiment, the information extraction device further includes:
遍历单元,被配置为响应于确定每个对象的属性描述信息,遍历每个所述聚类语义单元,基于每个所述聚类语义单元执行以下操作:The traversal unit is configured to, in response to determining the attribute description information of each object, traverse each of the clustering semantic units, and perform the following operations based on each of the clustering semantic units:
查找单元,被配置为在所述对象的原生评价信息中查找当前聚类语义单元;a search unit configured to search for the current clustering semantic unit in the original evaluation information of the object;
第四确定单元,被配置为响应于所述对象的原生评价信息中包含所述当前聚类语义信息,将所述当前聚类语义单元确定为所述对象的属性描述信息。The fourth determining unit is configured to determine the current clustering semantic unit as the attribute description information of the object in response to the original evaluation information of the object containing the current clustering semantic information.
在一示例性实施例中,所述信息抽取装置还包括:In an exemplary embodiment, the information extraction device further includes:
情感值确定单元,被配置为对于所述对象的每项属性描述信息,确定所述属性描述信息的情感值;an emotional value determination unit configured to determine the emotional value of the attribute description information for each item of attribute description information of the object;
第五确定单元,被配置为将所述对象的原生评价信息中包含所述属性描述信息,与所述属性描述信息的情感值一致的原生评价信息确定为与所述属性描述信息相匹配的原生评价信息;The fifth determination unit is configured to determine that the native evaluation information of the object includes the attribute description information and is consistent with the emotional value of the attribute description information as the original evaluation information that matches the attribute description information. evaluation information;
第一挂载单元,被配置为将与所述属性描述信息相匹配的原生评价信息挂载到所述属性描述信息中。The first mounting unit is configured to mount native evaluation information matching the attribute description information into the attribute description information.
在一示例性实施例中,所述信息抽取装置还包括:In an exemplary embodiment, the information extraction device further includes:
第二数量单元,被配置为统计所述对象的各项属性描述信息所挂载的原生评价信息的数量;a second quantity unit configured to count the number of pieces of native evaluation information mounted on each item of attribute description information of the object;
排序单元,被配置为基于所挂载的原生评价信息的数量由大到小的顺序,对各项属性描述信息进行排序;The sorting unit is configured to sort the attribute description information based on the descending order of the number of native evaluation information mounted;
相似度计算单元,被配置为对所述对象的各项属性描述信息进行相似度计算;A similarity calculation unit configured to perform similarity calculations on each attribute description information of the object;
相似属性信息对确定单元,被配置为基于相似度计算结果,确定相似属性信息对;每项所述相似属性信息对中包括相似度大于预设值的两项属性描述信息;The similar attribute information pair determination unit is configured to determine similar attribute information pairs based on similarity calculation results; each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;
第二挂载单元,被配置为将所述相似属性信息对中排序在后的属性描述信息对应的原生评价信息,挂载到所述相似属性信息对中排序在前的属性描述信息对应的原生评价信息中。The second mounting unit is configured to mount the native evaluation information corresponding to the attribute description information ranked last in the similar attribute information pair to the native evaluation information corresponding to the attribute description information ranked first in the similar attribute information pair. evaluation information.
在一示例性实施例中,所述语义单元拆分单元包括:In an exemplary embodiment, the semantic unit splitting unit includes:
第一拆分单元,被配置为对所述原生评价信息进行语义单元拆分,得到多个第一语义单元;The first splitting unit is configured to split the original evaluation information into semantic units to obtain multiple first semantic units;
第一去重单元,被配置为对所述多个第一语义单元进行去重,得到所述多个原生语义单元;The first deduplication unit is configured to deduplicate the plurality of first semantic units to obtain the plurality of original semantic units;
第二拆分单元,被配置为对所述目标评价信息进行语义单元拆分,得到多个第二语义单元;The second splitting unit is configured to split the target evaluation information into semantic units to obtain a plurality of second semantic units;
第二去重单元,被配置为对所述多个第二语义单元进行去重,得到所述多个目标语义单元。The second deduplication unit is configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
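The splitting and de-duplication performed by the units above can be sketched as follows. The sketch assumes that semantic units are delimited by clause punctuation, which is only one possible splitting rule; this passage does not specify how the split is performed. The names `split_units` and `split_and_dedupe` and the delimiter set are hypothetical.

```python
import re
from typing import List

# Assumed delimiters: common Western and Chinese clause punctuation.
_DELIMITERS = r"[,.;!?,。;!?]+"


def split_units(review: str) -> List[str]:
    """Split one piece of evaluation information into candidate semantic units."""
    return [part.strip() for part in re.split(_DELIMITERS, review) if part.strip()]


def split_and_dedupe(reviews: List[str]) -> List[str]:
    """Split every review, then drop repeated units while keeping first-seen order."""
    units = [unit for review in reviews for unit in split_units(review)]
    return list(dict.fromkeys(units))


native_units = split_and_dedupe(["发货快,质量好。", "质量好!", "fast delivery, good quality"])
```

The same function can be applied to the native evaluation information and to the target evaluation information to obtain the native semantic units and the target semantic units respectively.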
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the apparatus in the foregoing embodiments, the specific manner in which each module executes operations has been described in detail in the embodiments related to the method, and will not be described in detail here.
在示例性实施例中,还提供了一种包括指令的非易失性计算机可读存储介质,在一些实施例中,计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等;当所述计算机可读存储介质中的指令由服务器的处理器执行时,使得服务器能够执行如上所述的任一方法。In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions. In some embodiments, the computer-readable storage medium may be ROM, random access memory (RAM), CD- ROM, magnetic tape, floppy disk, and optical data storage device, etc.; when the instructions in the computer-readable storage medium are executed by the processor of the server, the server can perform any method as described above.
在示例性实施例中,还提供一种计算机程序产品,所述计算机程序产品包括计算机程序,所述计算机程序存储在可读存储介质中,计算机设备的至少一个处理器从所述可读存储介质读取并执行所述计算机程序,使得设备执行上述任一方法。In an exemplary embodiment, there is also provided a computer program product comprising a computer program stored in a readable storage medium from which at least one processor of a computer device reads Reading and executing the computer program causes the device to perform any of the above methods.
进一步地,图16示出了一种用于实现本公开实施例所提供的方法的设备的硬件结构示意图,所述设备可以参与构成或包含本公开实施例所提供的装置。如图16所示,设备10可以包括一个 或多个(图中采用102a、102b,……,102n来示出)处理器102(处理器102可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)、用于存储数据的存储器104、以及用于通信功能的传输装置106。除此以外,还可以包括:显示器、输入/输出接口(I/O接口)、通用串行总线(USB)端口(可以作为I/O接口的端口中的一个端口被包括)、网络接口、电源和/或相机。本领域普通技术人员可以理解,图16所示的结构仅为示意,其并不对上述电子装置的结构造成限定。例如,设备10还可包括比图16中所示更多或者更少的组件,或者具有与图16所示不同的配置。Further, FIG. 16 shows a schematic diagram of a hardware structure of a device for implementing the method provided by the embodiment of the present disclosure, and the device may participate in constituting or include the apparatus provided by the embodiment of the present disclosure. As shown in FIG. 16, the device 10 may include one or more (shown as 102a, 102b, ..., 102n in the figure) processor 102 (the processor 102 may include but not limited to a microprocessor MCU or programmable logic A processing device such as a device FPGA, etc.), a memory 104 for storing data, and a transmission device 106 for a communication function. In addition, it can also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which can be included as one of the ports of the I/O interface), a network interface, a power supply and/or camera. Those skilled in the art can understand that the structure shown in FIG. 16 is only a schematic diagram, which does not limit the structure of the above-mentioned electronic device. For example, device 10 may also include more or fewer components than shown in FIG. 16 , or have a different configuration than that shown in FIG. 16 .
应当注意到的是上述一个或多个处理器102和/或其他数据处理电路在本文中通常可以被称为“数据处理电路”。该数据处理电路可以全部或部分的体现为软件、硬件、固件或其他任意组合。此外,数据处理电路可为单个独立的处理模块,或全部或部分的结合到设备10(或移动设备)中的其他元件中的任意一个内。如本公开实施例中所涉及到的,该数据处理电路作为一种处理器控制(例如与接口连接的可变电阻终端路径的选择)。It should be noted that the one or more processors 102 and/or other data processing circuits described above may generally be referred to herein as "data processing circuits". The data processing circuit may be implemented in whole or in part as software, hardware, firmware or other arbitrary combinations. In addition, the data processing circuitry can be a single independent processing module, or be fully or partially integrated into any of the other elements in the device 10 (or mobile device). As involved in the embodiments of the present disclosure, the data processing circuit serves as a processor control (for example, the selection of the variable resistor terminal path connected to the interface).
存储器104可用于存储应用软件的软件程序以及模块,如本公开实施例中所述的方法对应的程序指令/数据存储装置,处理器102通过运行存储在存储器104内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的信息抽取方法。存储器104可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器104可进一步包括相对于处理器102远程设置的存储器,这些远程存储器可以通过网络连接至设备10。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the methods described in the embodiments of the present disclosure. The processor 102 runs the software programs and modules stored in the memory 104 to execute various functional applications and data processing, that is, to implement the information extraction method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the device 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
传输装置106用于经由一个网络接收或者发送数据。上述的网络的实例可包括设备10的通信供应商提供的无线网络。在一个实例中,传输装置106包括一个网络适配器(Network Interface Controller,NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,传输装置106可以为射频(Radio Frequency,RF)模块,其用于通过无线方式与互联网进行通讯。The transmission device 106 is used to receive or transmit data via a network. Examples of the aforementioned networks may include wireless networks provided by the communications provider of device 10 . In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet in a wireless manner.
显示器可以例如是触摸屏式的液晶显示器(LCD),该液晶显示器可使得用户能够与设备10(或移动设备)的用户界面进行交互。The display may be, for example, a touchscreen liquid crystal display (LCD) that enables a user to interact with the user interface of the device 10 (or mobile device).
本实施例上述的任一方法均可基于图16所示的设备进行实施。Any of the above-mentioned methods in this embodiment can be implemented based on the device shown in FIG. 16 .
本公开所有实施例均可以单独被执行,也可以与其他实施例相结合被执行,均视为本公开要求的保护范围。All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, which are all regarded as the scope of protection required by the present disclosure.

Claims (29)

  1. 一种信息抽取方法,包括:An information extraction method, comprising:
    对多个对象的原生评价信息进行语种转换,得到与每条原生评价信息对应的目标评价信息;其中,所述多个对象的原生评价信息中包括多语种的原生评价信息;Performing language conversion on the original evaluation information of multiple objects to obtain target evaluation information corresponding to each piece of original evaluation information; wherein, the original evaluation information of the multiple objects includes multilingual original evaluation information;
    对所述原生评价信息和所述目标评价信息进行语义单元拆分,得到多个原生语义单元和多个目标语义单元;performing semantic unit splitting on the native evaluation information and the target evaluation information to obtain multiple native semantic units and multiple target semantic units;
    构建语义单元匹配组;其中每个所述语义单元匹配组中包括一个目标语义单元,以及与所述目标语义单元具有相同语义的多个原生语义单元;所述多个原生语义单元对应不同的语种;Constructing a semantic unit matching group; wherein each of the semantic unit matching groups includes a target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit; the plurality of native semantic units correspond to different languages ;
    基于对所述多个目标语义单元的语义聚类结果,以及所述语义单元匹配组,得到与所述多个对象对应的多语种属性描述信息。Based on the semantic clustering results of the multiple target semantic units and the semantic unit matching group, the multilingual attribute description information corresponding to the multiple objects is obtained.
  2. The information extraction method according to claim 1, further comprising:
    generating semantic vectors corresponding to the plurality of target semantic units;
    performing semantic clustering on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;
    determining a plurality of clustered semantic units from the plurality of target semantic units based on the semantic vectors in each target class and on the semantic unit matching groups.
  3. The information extraction method according to claim 2, wherein obtaining the multilingual attribute description information corresponding to the plurality of objects based on the semantic clustering result of the plurality of target semantic units and on the semantic unit matching groups comprises:
    determining, based on the semantic unit matching groups, a plurality of native semantic units matching each clustered semantic unit;
    determining each clustered semantic unit, together with the plurality of native semantic units matching that clustered semantic unit, as the multilingual attribute description information corresponding to the plurality of objects.
  4. The information extraction method according to claim 2, wherein generating the semantic vectors corresponding to the plurality of target semantic units comprises:
    obtaining, based on the word vector of each word in each target semantic unit, the word vectors contained in the target semantic unit;
    averaging the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;
    obtaining the semantic vectors corresponding to the plurality of target semantic units based on the semantic vector corresponding to each target semantic unit.
  5. The information extraction method according to claim 2, wherein determining the plurality of clustered semantic units from the plurality of target semantic units based on the semantic vectors in each target class and on the semantic unit matching groups comprises:
    determining a central semantic vector of each target class;
    determining candidate semantic vectors of each target class based on the distance between each semantic vector in the target class and the central semantic vector;
    obtaining a plurality of candidate semantic units from the target semantic units corresponding to the candidate semantic vectors of each target class;
    determining, based on the semantic unit matching groups, the number of native semantic units matching each candidate semantic unit;
    determining the clustered semantic units from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
  6. The information extraction method according to claim 2, further comprising:
    in response to determining the attribute description information of each object, traversing each clustered semantic unit and performing the following operations for each clustered semantic unit:
    searching the native evaluation information of the object for the current clustered semantic unit;
    in response to the native evaluation information of the object containing the current clustered semantic information, determining the current clustered semantic unit as attribute description information of the object.
  7. The information extraction method according to claim 6, further comprising:
    for each item of attribute description information of the object, determining a sentiment value of the attribute description information;
    determining, as native evaluation information matching the attribute description information, the native evaluation information of the object that contains the attribute description information and is consistent with the sentiment value of the attribute description information;
    attaching the native evaluation information matching the attribute description information to the attribute description information.
  8. The information extraction method according to claim 7, further comprising:
    performing similarity calculation on any two items of attribute description information among the items of attribute description information of the object;
    determining similar attribute information pairs based on the similarity calculation result, wherein each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;
    counting the number of pieces of native evaluation information attached to each item of attribute description information of the object;
    sorting the items of attribute description information in descending order of the number of pieces of attached native evaluation information;
    attaching the native evaluation information corresponding to the lower-ranked item of attribute description information in each similar attribute information pair to the native evaluation information corresponding to the higher-ranked item of attribute description information in that pair.
  9. The information extraction method according to claim 1, wherein splitting the native evaluation information and the target evaluation information into semantic units to obtain the plurality of native semantic units and the plurality of target semantic units comprises:
    splitting the native evaluation information into semantic units to obtain a plurality of first semantic units;
    deduplicating the plurality of first semantic units to obtain the plurality of native semantic units;
    splitting the target evaluation information into semantic units to obtain a plurality of second semantic units;
    deduplicating the plurality of second semantic units to obtain the plurality of target semantic units.
  10. An information extraction apparatus, comprising:
    a language conversion unit configured to perform language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the plurality of objects includes native evaluation information in multiple languages;
    a semantic unit splitting unit configured to split the native evaluation information and the target evaluation information into semantic units to obtain a plurality of native semantic units and a plurality of target semantic units;
    a semantic unit matching group construction unit configured to construct semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, the plurality of native semantic units corresponding to different languages;
    an information generation unit configured to obtain multilingual attribute description information corresponding to the plurality of objects based on a semantic clustering result of the plurality of target semantic units and on the semantic unit matching groups.
  11. The information extraction apparatus according to claim 10, further comprising:
    a semantic vector generation unit configured to generate semantic vectors corresponding to the plurality of target semantic units;
    a semantic clustering unit configured to perform semantic clustering on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;
    a first determining unit configured to determine a plurality of clustered semantic units from the plurality of target semantic units based on the semantic vectors in each target class and on the semantic unit matching groups.
  12. The information extraction apparatus according to claim 11, wherein the information generation unit comprises:
    a second determining unit configured to determine, based on the semantic unit matching groups, a plurality of native semantic units matching each clustered semantic unit;
    a third determining unit configured to determine each clustered semantic unit, together with the plurality of native semantic units matching that clustered semantic unit, as the multilingual attribute description information corresponding to the plurality of objects.
  13. The information extraction apparatus according to claim 11, wherein the semantic vector generation unit comprises:
    a first word vector determining unit configured to obtain, based on the word vector of each word in each target semantic unit, the word vectors contained in the target semantic unit;
    an average calculation unit configured to average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;
    a second word vector determining unit configured to obtain the semantic vectors corresponding to the plurality of target semantic units based on the semantic vector corresponding to each target semantic unit.
  14. The information extraction apparatus according to claim 11, wherein the first determining unit comprises:
    a central semantic vector determining unit configured to determine a central semantic vector of each target class;
    a candidate semantic vector determining unit configured to determine candidate semantic vectors of each target class based on the distance between each semantic vector in the target class and the central semantic vector;
    a candidate semantic unit determining unit configured to obtain a plurality of candidate semantic units from the target semantic units corresponding to the candidate semantic vectors of each target class;
    a first quantity determining unit configured to determine, based on the semantic unit matching groups, the number of native semantic units matching each candidate semantic unit;
    a clustered semantic unit determining unit configured to determine the clustered semantic units from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
  15. The information extraction apparatus according to claim 11, further comprising:
    a traversal unit configured to, in response to determining the attribute description information of each object, traverse each clustered semantic unit and perform the following operations for each clustered semantic unit:
    a search unit configured to search the native evaluation information of the object for the current clustered semantic unit;
    a fourth determining unit configured to, in response to the native evaluation information of the object containing the current clustered semantic information, determine the current clustered semantic unit as attribute description information of the object.
  16. The information extraction apparatus according to claim 15, further comprising:
    a sentiment value determining unit configured to determine, for each item of attribute description information of the object, a sentiment value of the attribute description information;
    a fifth determining unit configured to determine, as native evaluation information matching the attribute description information, the native evaluation information of the object that contains the attribute description information and is consistent with the sentiment value of the attribute description information;
    a first attaching unit configured to attach the native evaluation information matching the attribute description information to the attribute description information.
  17. The information extraction apparatus according to claim 16, further comprising:
    a second quantity unit configured to count the number of pieces of native evaluation information attached to each item of attribute description information of the object;
    a sorting unit configured to sort the items of attribute description information in descending order of the number of pieces of attached native evaluation information;
    a similarity calculation unit configured to perform similarity calculation on the items of attribute description information of the object;
    a similar attribute information pair determining unit configured to determine similar attribute information pairs based on the similarity calculation result, wherein each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;
    a second attaching unit configured to attach the native evaluation information corresponding to the lower-ranked item of attribute description information in each similar attribute information pair to the native evaluation information corresponding to the higher-ranked item of attribute description information in that pair.
  18. The information extraction apparatus according to claim 10, wherein the semantic unit splitting unit comprises:
    a first splitting unit configured to split the native evaluation information into semantic units to obtain a plurality of first semantic units;
    a first deduplication unit configured to deduplicate the plurality of first semantic units to obtain the plurality of native semantic units;
    a second splitting unit configured to split the target evaluation information into semantic units to obtain a plurality of second semantic units;
    a second deduplication unit configured to deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
  19. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following steps:
    performing language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the plurality of objects includes native evaluation information in multiple languages;
    splitting the native evaluation information and the target evaluation information into semantic units to obtain a plurality of native semantic units and a plurality of target semantic units;
    constructing semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, the plurality of native semantic units corresponding to different languages;
    obtaining multilingual attribute description information corresponding to the plurality of objects based on a semantic clustering result of the plurality of target semantic units and on the semantic unit matching groups.
  20. The electronic device according to claim 19, wherein the processor is further configured to:
    generate semantic vectors corresponding to the plurality of target semantic units;
    perform semantic clustering on the semantic vectors corresponding to the plurality of target semantic units to obtain a plurality of target classes;
    determine a plurality of clustered semantic units from the plurality of target semantic units based on the semantic vectors in each target class and on the semantic unit matching groups.
  21. The electronic device according to claim 20, wherein the processor is further configured to:
    determine, based on the semantic unit matching groups, a plurality of native semantic units matching each clustered semantic unit;
    determine each clustered semantic unit, together with the plurality of native semantic units matching that clustered semantic unit, as the multilingual attribute description information corresponding to the plurality of objects.
  22. The electronic device according to claim 20, wherein the processor is further configured to:
    obtain, based on the word vector of each word in each target semantic unit, the word vectors contained in the target semantic unit;
    average the word vectors contained in the target semantic unit to obtain the semantic vector corresponding to the target semantic unit;
    obtain the semantic vectors corresponding to the plurality of target semantic units based on the semantic vector corresponding to each target semantic unit.
  23. The electronic device according to claim 20, wherein the processor is further configured to:
    determine a central semantic vector of each target class;
    determine candidate semantic vectors of each target class based on the distance between each semantic vector in the target class and the central semantic vector;
    obtain a plurality of candidate semantic units from the target semantic units corresponding to the candidate semantic vectors of each target class;
    determine, based on the semantic unit matching groups, the number of native semantic units matching each candidate semantic unit;
    determine the clustered semantic units from the candidate semantic units based on the number of native semantic units matching each candidate semantic unit.
  24. The electronic device according to claim 20, wherein the processor is further configured to:
    in response to determining the attribute description information of each object, traverse each clustered semantic unit and perform the following operations for each clustered semantic unit:
    search the native evaluation information of the object for the current clustered semantic unit;
    in response to the native evaluation information of the object containing the current clustered semantic information, determine the current clustered semantic unit as attribute description information of the object.
  25. The electronic device according to claim 24, wherein the processor is further configured to:
    for each item of attribute description information of the object, determine a sentiment value of the attribute description information;
    determine, as native evaluation information matching the attribute description information, the native evaluation information of the object that contains the attribute description information and is consistent with the sentiment value of the attribute description information;
    attach the native evaluation information matching the attribute description information to the attribute description information.
  26. The electronic device according to claim 25, wherein the processor is further configured to:
    perform similarity calculation on any two items of attribute description information among the items of attribute description information of the object;
    determine similar attribute information pairs based on the similarity calculation result, wherein each similar attribute information pair includes two items of attribute description information whose similarity is greater than a preset value;
    count the number of pieces of native evaluation information attached to each item of attribute description information of the object;
    sort the items of attribute description information in descending order of the number of pieces of attached native evaluation information;
    attach the native evaluation information corresponding to the lower-ranked item of attribute description information in each similar attribute information pair to the native evaluation information corresponding to the higher-ranked item of attribute description information in that pair.
  27. The electronic device according to claim 19, wherein the processor is further configured to:
    split the native evaluation information into semantic units to obtain a plurality of first semantic units;
    deduplicate the plurality of first semantic units to obtain the plurality of native semantic units;
    split the target evaluation information into semantic units to obtain a plurality of second semantic units;
    deduplicate the plurality of second semantic units to obtain the plurality of target semantic units.
  28. A non-volatile computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the following steps:
    performing language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the plurality of objects includes native evaluation information in multiple languages;
    splitting the native evaluation information and the target evaluation information into semantic units to obtain a plurality of native semantic units and a plurality of target semantic units;
    constructing semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, the plurality of native semantic units corresponding to different languages;
    obtaining multilingual attribute description information corresponding to the plurality of objects based on a semantic clustering result of the plurality of target semantic units and on the semantic unit matching groups.
  29. A computer program product, comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the following steps:
    performing language conversion on native evaluation information of a plurality of objects to obtain target evaluation information corresponding to each piece of native evaluation information, wherein the native evaluation information of the plurality of objects includes native evaluation information in multiple languages;
    splitting the native evaluation information and the target evaluation information into semantic units to obtain a plurality of native semantic units and a plurality of target semantic units;
    constructing semantic unit matching groups, wherein each semantic unit matching group includes one target semantic unit and a plurality of native semantic units having the same semantics as the target semantic unit, the plurality of native semantic units corresponding to different languages;
    obtaining multilingual attribute description information corresponding to the plurality of objects based on a semantic clustering result of the plurality of target semantic units and on the semantic unit matching groups.
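
For illustration only, the vector averaging of claim 4 and the representative selection of claim 5 might be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the disclosure: the function names, the whitespace tokenization, the Euclidean distance, the `top_k` cutoff for choosing candidates, and the toy word vectors are all hypothetical, and the claims do not fix any of them.

```python
from math import dist
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(component) for component in zip(*vectors)]


def unit_vector(unit, word_vectors):
    """Semantic vector of a unit: the mean of its word vectors (claim 4).

    Assumption: units are whitespace-tokenized and every word is in the table.
    """
    return mean_vector([word_vectors[word] for word in unit.split()])


def pick_clustered_unit(class_units, word_vectors, matching_groups, top_k=2):
    """Pick one representative unit for a target class (claim 5).

    Assumption: candidates are the top_k units nearest the class centre, and
    the winner is the candidate matched by the most native semantic units.
    """
    vectors = {unit: unit_vector(unit, word_vectors) for unit in class_units}
    centre = mean_vector(list(vectors.values()))  # central semantic vector
    candidates = sorted(class_units, key=lambda u: dist(vectors[u], centre))[:top_k]
    return max(candidates, key=lambda u: len(matching_groups.get(u, [])))


# Toy data (hypothetical): 2-dimensional word vectors and one target class.
word_vectors = {
    "fast": [1.0, 0.0], "quick": [0.9, 0.1],
    "delivery": [0.0, 1.0], "shipping": [0.1, 0.9],
}
class_units = ["fast delivery", "quick shipping", "quick delivery"]
# Matching groups: target semantic unit -> native semantic units with the same meaning.
matching_groups = {
    "fast delivery": ["entrega rápida"],
    "quick shipping": ["envío rápido", "快速发货"],
}
chosen = pick_clustered_unit(class_units, word_vectors, matching_groups)
```

Selecting by match count after filtering by distance favors a representative that is both central to its class and well covered across languages, which is one plausible reading of the two criteria recited in claim 5.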
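
The merging of similar attribute descriptions recited in claim 8 can likewise be sketched. The sketch below rests on assumptions that the claims do not state: Jaccard word overlap stands in for the unspecified similarity measure, the threshold of 0.5 stands in for the preset value, and the lower-ranked attribute is dropped after its reviews are moved. All names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for the unspecified similarity measure."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def merge_similar_attributes(attached, similarity=jaccard, threshold=0.5):
    """Merge reviews of similar attributes into the higher-ranked attribute.

    attached maps each attribute description to its attached native reviews.
    """
    # Sort attributes by number of attached reviews, largest first.
    ranked = sorted(attached, key=lambda attr: len(attached[attr]), reverse=True)
    merged = {attr: list(reviews) for attr, reviews in attached.items()}
    # combinations() preserves ranked order, so `high` always ranks before `low`.
    for high, low in combinations(ranked, 2):
        if high in merged and low in merged and similarity(high, low) > threshold:
            # Assumption: the lower-ranked attribute is removed once merged.
            merged[high].extend(merged.pop(low))
    return merged


attached = {
    "fast delivery": ["r1", "r2", "r3"],
    "very fast delivery": ["r4"],
    "good quality": ["r5", "r6"],
}
result = merge_similar_attributes(attached)
```

Ranking by attached-review count before merging means the more frequently supported wording survives as the attribute label.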
PCT/CN2022/096657 2021-10-11 2022-06-01 Information extraction method and apparatus WO2023060910A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111180788.5 2021-10-11
CN202111180788.5A CN113627201B (en) 2021-10-11 2021-10-11 Information extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023060910A1 true WO2023060910A1 (en) 2023-04-20

Family

ID=78390892

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096657 WO2023060910A1 (en) 2021-10-11 2022-06-01 Information extraction method and apparatus

Country Status (2)

Country Link
CN (1) CN113627201B (en)
WO (1) WO2023060910A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627201B (en) * 2021-10-11 2022-02-08 北京达佳互联信息技术有限公司 Information extraction method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042692A (en) * 2006-03-24 2007-09-26 富士通株式会社 translation obtaining method and apparatus based on semantic forecast
CN106897274A (en) * 2017-01-09 2017-06-27 北京众荟信息技术股份有限公司 Method is repeated in a kind of comment across languages
CN108763223A (en) * 2016-06-28 2018-11-06 大连民族大学 Method for constructing Chinese-English Mongolian Tibetan language multilingual parallel corpus
CN109726292A (en) * 2019-01-02 2019-05-07 山东省科学院情报研究所 Text analyzing method and apparatus towards extensive multilingual data
US20200311203A1 (en) * 2019-03-29 2020-10-01 Microsoft Technology Licensing, Llc Context-sensitive salient keyword unit surfacing for multi-language survey comments
CN113627201A (en) * 2021-10-11 2021-11-09 北京达佳互联信息技术有限公司 Information extraction method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732571A (en) * 2013-12-20 2015-06-24 上海莱凯数码科技有限公司 Method for translating subtitles in digital animation manufacturing process

Also Published As

Publication number Publication date
CN113627201A (en) 2021-11-09
CN113627201B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
US11657231B2 (en) Capturing rich response relationships with small-data neural networks
JP7223785B2 (en) TIME-SERIES KNOWLEDGE GRAPH GENERATION METHOD, APPARATUS, DEVICE AND MEDIUM
KR102564144B1 (en) Method, apparatus, device and medium for determining text relevance
Khuc et al. Towards building large-scale distributed systems for twitter sentiment analysis
CN112507715A (en) Method, device, equipment and storage medium for determining incidence relation between entities
US20160098645A1 (en) High-precision limited supervision relationship extractor
CN110619053A (en) Training method of entity relation extraction model and method for extracting entity relation
US20180089316A1 (en) Seamless integration of modules for search enhancement
CN108038725A (en) E-commerce product customer satisfaction analysis method based on machine learning
JPWO2014033799A1 (en) Word semantic relation extraction device
US11514034B2 (en) Conversion of natural language query
Qiu et al. Advanced sentiment classification of Tibetan microblogs on smart campuses based on multi-feature fusion
WO2019173085A1 (en) Intelligent knowledge-learning and question-answering
CN111400584A (en) Association word recommendation method and device, computer equipment and storage medium
CN113282762A (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN111984774B (en) Searching method, searching device, searching equipment and storage medium
CN112883165A (en) Intelligent full-text retrieval method and system based on semantic understanding
TWI640877B (en) Semantic analysis apparatus, method, and computer program product thereof
WO2023060910A1 (en) Information extraction method and apparatus
CN114840685A (en) Emergency plan knowledge graph construction method
Wei et al. Feature-level sentiment analysis based on rules and fine-grained domain ontology
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
CN114925185B (en) Interaction method, model training method, device, equipment and medium
CN112699672B (en) Method and device for selecting articles
zhen Liu et al. Automatic summarization in Chinese product reviews

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 22879859; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE