WO2021073119A1 - Entity disambiguation method and apparatus based on an intention recognition model, and computer device - Google Patents

Entity disambiguation method and apparatus based on an intention recognition model, and computer device

Info

Publication number
WO2021073119A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
preset
standard
word
entity
Prior art date
Application number
PCT/CN2020/093428
Other languages
English (en)
Chinese (zh)
Inventor
张师琲
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021073119A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures

Description

  • This application relates to the field of artificial intelligence, and in particular to an entity disambiguation method, device, computer equipment and storage medium based on an intention recognition model.
  • Entity disambiguation is a key task in natural language processing. Entity mentions (such as nouns) found in massive amounts of data can usually correspond to multiple named entity concepts, which undoubtedly creates a great obstacle to entity disambiguation.
  • The task of entity disambiguation is to match these ambiguous entity mentions, among a large number of candidate entities, to their corresponding target entities.
  • The inventor realizes that the accuracy of current entity disambiguation schemes is insufficient. For example, disambiguation by entity linking requires linking the named entity to be disambiguated to the corresponding entity in an external knowledge base, so the accuracy depends on the records of the external knowledge base, which are not precise enough to distinguish entities in different contexts. Therefore, the accuracy of current entity disambiguation needs to be improved.
  • the main purpose of this application is to provide an entity disambiguation method, device, computer equipment and storage medium based on an intention recognition model, aiming to improve the accuracy of entity disambiguation.
  • this application proposes an entity disambiguation method based on an intention recognition model, which includes the following steps:
  • acquiring the first sentence to be disambiguated, and performing ambiguity labeling processing on the first sentence according to a preset ambiguity labeling method, so as to obtain the entity words marked as ambiguous in the first sentence;
  • selecting a designated standard sentence from a preset standard sentence database according to a preset standard sentence selection method;
  • calculating the first distance between the first sentence and the designated standard sentence according to a preset distance calculation formula, and determining whether the first distance is less than a preset first distance threshold;
  • if the first distance is less than the preset first distance threshold, acquiring the designated intent recognition model corresponding to the designated standard sentence according to the preset correspondence between standard sentences and intent recognition models, wherein the designated intent recognition model is trained using sample data, and the sample data is only composed of sentences marked as the designated type of intent;
  • inputting the first sentence into the designated intent recognition model for calculation, so as to obtain a recognition result output by the designated intent recognition model, wherein the recognition result includes recognition success or recognition failure;
  • if the recognition result is recognition success, acquiring the designated entity meaning corresponding to the first sentence according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and performing a disambiguation labeling operation on the first sentence, so that the entity word marked as ambiguous is labeled with the designated entity meaning.
  • This application provides an entity disambiguation device based on an intention recognition model, including:
  • the entity word acquisition unit is used to acquire the first sentence to be disambiguated, and to perform ambiguity labeling processing on the first sentence according to a preset ambiguity labeling method, so as to obtain the entity words marked as ambiguous in the first sentence;
  • the designated standard sentence acquisition unit is used to select the designated standard sentence from the preset standard sentence database according to the preset standard sentence selection method;
  • the first distance judgment unit is configured to calculate the first distance between the first sentence and the designated standard sentence according to a preset distance calculation formula, and to determine whether the first distance is less than a preset first distance threshold;
  • a designated intent recognition model acquiring unit configured to, if the first distance is less than the preset first distance threshold, acquire the designated intent recognition model corresponding to the designated standard sentence according to the preset correspondence between standard sentences and intent recognition models, wherein the designated intent recognition model is trained using sample data, and the sample data is only composed of sentences marked as the designated type of intent;
  • a recognition result obtaining unit configured to input the first sentence into the designated intent recognition model for calculation, thereby obtaining a recognition result output by the designated intent recognition model, wherein the recognition result includes recognition success or recognition failure;
  • a recognition result judging unit for judging whether the recognition result is recognition success;
  • the designated entity meaning labeling unit is used to, if the recognition result is recognition success, obtain the designated entity meaning corresponding to the first sentence according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and to perform a disambiguation labeling operation on the first sentence, so that the entity word labeled as ambiguous is labeled with the designated entity meaning.
  • the present application provides a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the steps of any one of the above methods when the computer program is executed.
  • the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the methods described above are implemented.
  • The entity disambiguation method, apparatus, computer device and storage medium based on the intention recognition model of the present application acquire the first sentence to be disambiguated and obtain the entity words marked as ambiguous in the first sentence; select the designated standard sentence; calculate the first distance between the first sentence and the designated standard sentence; if the first distance is less than the preset first distance threshold, obtain the designated intent recognition model; and input the first sentence into the designated intent recognition model for calculation to obtain a recognition result, wherein the designated intent recognition model is trained using sample data, and the sample data is only composed of sentences marked as the designated type of intent. If the recognition result is recognition success, the designated entity meaning corresponding to the first sentence is obtained, and a disambiguation labeling operation is performed on the first sentence, so that the entity word marked as ambiguous is labeled with the designated entity meaning.
  • A new dimension, intent recognition, is thereby introduced to improve the accuracy of entity disambiguation.
  • Fig. 1 is a schematic flowchart of an entity disambiguation method based on an intention recognition model according to an embodiment of this application;
  • FIG. 2 is a schematic block diagram of the structure of an entity disambiguation apparatus based on an intention recognition model according to an embodiment of the application;
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • an embodiment of the present application provides an entity disambiguation method based on an intention recognition model, including the following steps:
  • In step S1, the first sentence to be disambiguated is acquired, and ambiguity labeling processing is performed on the first sentence according to the preset ambiguity labeling method, so as to obtain the entity words marked as ambiguous in the first sentence.
  • entity disambiguation in this application is to obtain the true meaning of ambiguous entity words, so the ambiguous entity words need to be marked as ambiguous.
  • The preset ambiguity labeling method is, for example, as follows: the first sentence is input into the bidirectional encoder in the preset ambiguity labeling model for processing, so as to obtain a first ambiguity annotation sequence corresponding one-to-one to the words in the first sentence, and the hidden state vector set of the last-layer conversion unit of the bidirectional encoder is obtained, wherein the ambiguity labeling model is composed of a bidirectional encoder and a support vector machine, and the bidirectional encoder includes multiple layers of conversion units;
  • the set of hidden state vectors is input into the support vector machine for calculation to obtain a second ambiguity annotation sequence corresponding one-to-one to the words in the first sentence;
  • the similarity value between the first ambiguity annotation sequence and the second ambiguity annotation sequence is calculated according to a preset similarity value calculation method, and it is determined whether the similarity value is greater than a preset similarity threshold; if the similarity value is greater than the preset similarity threshold, the entity words marked as ambiguous in the second ambiguity annotation sequence are obtained.
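  • The following is a minimal, illustrative sketch of this ambiguity labeling flow. The toy encoder, the linear per-token tagger standing in for the support vector machine, and the agreement-based similarity measure are stand-ins chosen for readability; they are not the trained components described in this application.

```python
import numpy as np

AMBIGUOUS, UNAMBIGUOUS = "B-AMB", "O"

def encode(tokens):
    """Toy bidirectional-encoder stand-in: a first tag sequence plus fake
    last-layer hidden states, one vector per token."""
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(len(tokens), 8))
    first_tags = [AMBIGUOUS if t == "apple" else UNAMBIGUOUS for t in tokens]
    return first_tags, hidden

def svm_tag(hidden, weights):
    """Linear per-token tagger standing in for the support vector machine."""
    scores = hidden @ weights                         # (n_tokens, 2) label scores
    labels = [AMBIGUOUS, UNAMBIGUOUS]
    return [labels[int(i)] for i in scores.argmax(axis=1)]

def tag_similarity(a, b):
    """Fraction of positions on which the two tag sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)

tokens = "my phone is broken lend me your apple".split()
first_tags, hidden = encode(tokens)
weights = np.random.default_rng(1).normal(size=(8, 2))   # untrained, illustration only
second_tags = svm_tag(hidden, weights)

similarity = tag_similarity(first_tags, second_tags)
if similarity > 0.8:                                      # preset similarity threshold
    print([t for t, tag in zip(tokens, second_tags) if tag == AMBIGUOUS])
else:
    print(f"labeling rejected, sequences agree on only {similarity:.0%} of tokens")
```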
  • In step S2, the designated standard sentence is selected from the preset standard sentence database according to the preset standard sentence selection method.
  • The standard sentence is used to select a suitable intent recognition model, so it is necessary to pick out the designated standard sentences that are similar to the first sentence.
  • The preset standard sentence selection method is, for example, as follows: the sentence similarity value sim between the first sentence and each standard sentence in the standard sentence database is calculated according to a preset formula, and it is determined whether there is a standard sentence in the standard sentence database whose sentence similarity value sim is greater than a preset sentence similarity threshold;
  • if so, the standard sentence whose sentence similarity value sim is greater than the preset sentence similarity threshold is recorded as the designated standard sentence.
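  • The selection step can be pictured with the sketch below. The exact sim formula appears only as an image in the source; cosine similarity over word-frequency vectors is assumed here because the description later states that sim is computed from word frequency vectors and reaches its maximum value of 1 when the two sentences contain exactly the same words. The sample sentences and the threshold of 0.6 are illustrative.

```python
from collections import Counter
import math

def sentence_sim(a: str, b: str) -> float:
    """Assumed form of sim: cosine similarity over word-frequency vectors."""
    fa, fb = Counter(a.split()), Counter(b.split())
    dot = sum(fa[w] * fb[w] for w in fa)
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_designated_standard_sentences(first_sentence, standard_db, threshold=0.6):
    """Keep every standard sentence whose sim with the first sentence exceeds the threshold."""
    return [s for s in standard_db if sentence_sim(first_sentence, s) > threshold]

standard_db = ["my phone is broken lend me your apple",
               "how do I peel an apple"]
print(select_designated_standard_sentences("my phone is broken lend me your apple please", standard_db))
```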
  • In step S3, the first distance between the first sentence and the designated standard sentence is calculated according to the preset distance calculation formula, and it is determined whether the first distance is less than the preset first distance threshold.
  • The first distance reflects the degree of similarity between the first sentence and the designated standard sentence: the smaller the first distance, the more similar the two sentences are. When the first sentence is exactly the same as the designated standard sentence, the first distance is equal to zero.
  • The preset distance calculation formula is, for example, as follows: a first word vector sequence I corresponding to the first sentence and a second word vector sequence R corresponding to the designated standard sentence are obtained by querying a preset word vector library, and the first distance D between the two sequences is then calculated according to a preset formula.
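  • The distance formula itself appears only as an image in the source. The sketch below uses an assumed form, the Euclidean distance between the averaged word vectors of the two sequences, which satisfies the property stated above that the distance is zero when the two sentences are identical; the toy word vectors are illustrative.

```python
import numpy as np

# Toy word-vector library; in practice this comes from the preset word vector library
# (for example one trained with word2vec, as described later).
word_vectors = {
    "phone":  np.array([0.9, 0.1, 0.0]),
    "apple":  np.array([0.7, 0.2, 0.1]),
    "broken": np.array([0.1, 0.8, 0.1]),
}

def sentence_vector(sentence: str) -> np.ndarray:
    """Average of the word vectors of the words found in the library."""
    vecs = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def first_distance(first_sentence: str, standard_sentence: str) -> float:
    """Assumed form of D: Euclidean distance between the averaged word-vector sequences."""
    return float(np.linalg.norm(sentence_vector(first_sentence) - sentence_vector(standard_sentence)))

print(first_distance("my apple phone is broken", "apple phone broken"))  # 0.0: same library words
print(first_distance("my apple phone is broken", "apple"))               # > 0: less similar
```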
  • In step S4, if the first distance is less than the preset first distance threshold, the designated intent recognition model corresponding to the designated standard sentence is obtained according to the preset correspondence between standard sentences and intent recognition models, wherein the designated intent recognition model is trained using sample data, and the sample data is only composed of sentences marked as the designated type of intent. A first distance below the preset first distance threshold indicates that an applicable intent recognition model exists, so the designated intent recognition model corresponding to the designated standard sentence is obtained according to this correspondence.
  • The designated intent recognition model used in this application is trained using sample data composed only of sentences marked as the designated type of intent, so the model is smaller, requires less training data, is easier to train, and recognizes intent more accurately for sentences within the limited range (that is, sentences similar to the designated standard sentence, such as the first sentence). Further, the sample data for training the designated intent recognition model consists of only a limited number of words, and these words are the same as or similar to the words in the first sentence, so training is faster and recognition of the first sentence is more efficient and accurate (because the vocabulary of the sample data is limited and matches the words in the first sentence, all training sentences can be enumerated by traversal, so the first sentence is necessarily a sentence that has appeared during training, and recognizing it is therefore more accurate and faster).
  • In step S5, the first sentence is input into the designated intent recognition model for calculation, so as to obtain a recognition result output by the designated intent recognition model, wherein the recognition result includes recognition success or recognition failure.
  • Since the designated intent recognition model can only recognize one type of intent (namely, the designated intent type), successful recognition means that the first sentence is of the designated intent type. If recognition fails, another intent recognition model needs to be used to recognize the sentence again.
  • In step S6, it is determined whether the recognition result is recognition success, because there are only two possible recognition results: recognition success or recognition failure.
  • If the recognition result is recognition success, it indicates that the first sentence is of the designated intent type; otherwise, the intent type of the first sentence cannot yet be determined.
  • In step S7, if the recognition result is recognition success, the designated entity meaning corresponding to the first sentence is obtained according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and a disambiguation labeling operation is performed on the first sentence, so that the entity word labeled as ambiguous is labeled with the designated entity meaning.
  • Ambiguous entity words have different meanings in different intent contexts, so if the specific intent type can be identified, the exact meaning of the ambiguous word can also be determined. Accordingly, this application obtains the designated entity meaning corresponding to the first sentence according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and performs the disambiguation labeling operation on the first sentence, so that the entity word marked as ambiguous is labeled with the designated entity meaning. The actual meaning of the entity word marked as ambiguous in the first sentence can then be read from the designated entity meaning. For example, the first sentence is: My phone is broken, let me borrow your apple.
  • In this case, the word apple in the first sentence is labeled with the designated entity meaning (phone).
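  • One simple way to realize the first sentence-standard sentence-intent recognition model-entity meaning correspondence is a lookup table keyed by the designated standard sentence and the recognized intent. The sketch below uses that assumed representation with illustrative entries for the apple example; the application itself does not prescribe a storage format.

```python
# Hypothetical lookup table: (standard sentence, recognized intent) -> entity meanings.
entity_meaning_table = {
    ("lend me your phone", "intent:borrow_device"): {"apple": "phone (mobile device)"},
    ("buy some fruit",     "intent:buy_fruit"):     {"apple": "apple (fruit)"},
}

def designated_entity_meaning(ambiguous_word, designated_standard_sentence, recognized_intent):
    """Return the designated entity meaning for the ambiguous word, if recognition succeeded."""
    meanings = entity_meaning_table.get((designated_standard_sentence, recognized_intent), {})
    return meanings.get(ambiguous_word)

print(designated_entity_meaning("apple", "lend me your phone", "intent:borrow_device"))
# -> 'phone (mobile device)': the word "apple" is labeled with the designated entity meaning
```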
  • the step S1 of performing ambiguity labeling processing on the first sentence according to the preset ambiguity labeling method, so as to obtain the entity words marked as ambiguous in the first sentence includes:
  • In this way, the ambiguity labeling processing of the first sentence is implemented, so as to obtain the entity words marked as ambiguous in the first sentence.
  • This application uses an ambiguity annotation model with a special structure for ambiguity annotation.
  • the ambiguity labeling model is composed of a bidirectional encoder and a support vector machine, thereby improving the accuracy of ambiguity labeling.
  • The support vector machine is a model that can be used for labeling, but its input features normally need to be set manually, so its accuracy on its own is low. Therefore, this application uses the hidden state vector set of the last-layer conversion unit of the bidirectional encoder as the input of the support vector machine, which improves the accuracy.
  • The bidirectional encoder includes multiple layers of conversion units, where each conversion unit is composed of multiple encoders and decoders and can output the first ambiguity annotation sequence, which is used as a reference for judging whether the second ambiguity annotation sequence is accurate. The similarity value between the first ambiguity annotation sequence and the second ambiguity annotation sequence is then calculated, and if the similarity value is greater than the preset similarity threshold, it indicates that the labeling of the ambiguity labeling model is accurate, so the entity words marked as ambiguous in the second ambiguity annotation sequence are obtained.
  • The preset similarity value calculation method may be any method; for example, a calculation method based on cosine similarity is adopted.
  • the step S2 of selecting a specified standard sentence from a preset standard sentence database according to a preset standard sentence selection method includes:
  • S202 Determine whether there is a standard sentence with the sentence similarity value sim greater than a preset sentence similarity threshold in the standard sentence database;
  • In this way, the designated standard sentence is selected from the preset standard sentence database.
  • This application calculates the sentence similarity value sim between the first sentence and each standard sentence in the standard sentence database according to a preset formula, determines whether there is a standard sentence in the standard sentence database whose sentence similarity value sim is greater than the preset sentence similarity threshold, and, if such a standard sentence exists, records it as the designated standard sentence.
  • the sentence similarity value sim is used to measure the similarity between two sentences, and its maximum value is 1. When the value is 1, it indicates that the two sentences have exactly the same words.
  • The word frequency vector is composed of the number of occurrences of each word, each count serving as one component of the vector.
  • For example, if the sentence is "I say I want a book", it contains four distinct words (I, say, want, book), which constitute the word frequency vector (2, 1, 1, 1).
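  • The worked example above can be reproduced in a few lines. As before, the cosine form of sim is an assumption made for illustration, since the formula itself is an image in the source; the example sentence is tokenized as the four words listed above so that the counts match.

```python
from collections import Counter
import math

def word_frequency_vector(words: list[str]) -> list[int]:
    """Counts of each distinct word, in first-occurrence order."""
    counts = Counter(words)
    return [counts[w] for w in dict.fromkeys(words)]

def sim(words_a: list[str], words_b: list[str]) -> float:
    """Assumed cosine similarity between word-frequency vectors; equals 1.0 when
    the two sentences contain exactly the same words with the same counts."""
    fa, fb = Counter(words_a), Counter(words_b)
    dot = sum(fa[w] * fb[w] for w in fa)
    na = math.sqrt(sum(v * v for v in fa.values()))
    nb = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (na * nb) if na and nb else 0.0

example = ["I", "say", "I", "want", "book"]      # the four words from the example
print(word_frequency_vector(example))            # [2, 1, 1, 1]
print(sim(example, example))                     # 1.0 for identical sentences
```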
  • the step S3 of calculating the first distance between the first sentence and the designated standard sentence according to a preset distance calculation formula includes:
  • S301 Obtain a first word vector sequence I corresponding to the first sentence by querying a preset word vector library, and obtain a second word vector sequence R corresponding to the designated standard sentence;
  • the word vector library stores word vectors, which are used to convert words into vector forms to facilitate computer understanding.
  • The word vector library can be obtained by using an existing database, or by training a pre-collected corpus with the word vector training tool word2vec. The first distance D is then calculated from the first word vector sequence I and the second word vector sequence R according to a preset formula.
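  • A minimal sketch of building such a word vector library with word2vec is shown below, using the gensim implementation as one common choice (the application itself does not name a toolkit beyond word2vec). The tiny corpus and the parameter values are illustrative, and the parameter names follow gensim 4.x.

```python
from gensim.models import Word2Vec

# A tiny illustrative corpus; a real word vector library would be trained on a
# large pre-collected corpus as described above.
corpus = [
    "my phone is broken lend me your apple".split(),
    "I want to buy an apple and a banana".split(),
]
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, workers=1, seed=0)

# Materialise the word vector library as a plain dict from word to vector.
word_vector_library = {w: model.wv[w] for w in model.wv.index_to_key}
print(len(word_vector_library), word_vector_library["apple"][:5])
```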
  • In this way, the designated intent recognition model corresponding to the designated standard sentence is obtained.
  • The training of the designated intent recognition model is thereby realized.
  • This application uses sample data for training.
  • The sample data is only composed of sentences marked as the designated type of intent, which reduces the amount of training data; and since only one type of intent needs to be recognized, the complex multi-classification task is transformed into a simple binary classification task, which improves the accuracy and speed of recognition.
  • the neural network model is, for example, VGG16 model, ResNet50 model, DPN131 model, InceptionV3 model, etc.
  • the stochastic gradient descent method refers to randomly sampling some training data for training, which can solve the problem of slow training speed caused by a large amount of training data.
  • The test data is then used to verify the intermediate intent recognition model, and if the verification is passed, the intermediate intent recognition model is recorded as the designated intent recognition model.
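  • The training procedure described above (train/test split, stochastic gradient descent, verification on the test data) can be sketched as follows. The bag-of-words features, the handful of negative sentences, and the accuracy-based verification criterion are illustrative assumptions added so that the binary classifier is trainable; the application itself only states that the sample data consists of sentences of the designated intent type.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

positives = ["lend me your apple", "can I borrow your apple phone", "may I use your phone"]
negatives = ["I ate an apple", "apples are my favourite fruit", "buy some fruit for me"]
sentences = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)    # 1 = designated intent type

train_x, test_x, train_y, test_y = train_test_split(
    sentences, labels, test_size=0.33, random_state=0, stratify=labels)

# Bag-of-words features + a linear classifier trained by stochastic gradient descent.
model = make_pipeline(CountVectorizer(), SGDClassifier(random_state=0))
model.fit(train_x, train_y)

accuracy = model.score(test_x, test_y)                  # verification with the test data
designated_intent_model = model if accuracy >= 0.5 else None
print("verification accuracy:", accuracy)
```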
  • the method includes:
  • if the recognition result is recognition failure, obtaining candidate standard sentences from the plurality of designated standard sentences, wherein the second distance between a candidate standard sentence and the first sentence is greater than the first distance threshold and smaller than a preset second distance threshold;
  • obtaining the candidate intent recognition model corresponding to the candidate standard sentence according to the preset correspondence between standard sentences and intent recognition models, and inputting the first sentence into the candidate intent recognition model for calculation, so as to obtain a second recognition result output by the candidate intent recognition model, wherein the second recognition result includes recognition success or recognition failure;
  • if the second recognition result is recognition success, obtaining the candidate entity meaning corresponding to the first sentence according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and performing a disambiguation labeling operation on the first sentence, so that the entity words labeled as ambiguous are labeled with the candidate entity meaning.
  • Because the intent recognition model of the present application is a small model that can only recognize one type of intent, there are cases in which recognition by the designated intent recognition model fails.
  • So that the intent type can still be recognized in such cases, this application adjusts the distance threshold to obtain a suitable model. Specifically: candidate standard sentences are obtained from the plurality of designated standard sentences, wherein the second distance between a candidate standard sentence and the first sentence is greater than the first distance threshold and less than the preset second distance threshold; and the candidate intent recognition model corresponding to the candidate standard sentence is obtained according to the preset correspondence between standard sentences and intent recognition models.
  • If the candidate intent recognition model recognizes the first sentence successfully, the purpose of disambiguation can still be achieved. Accordingly, the candidate entity meaning corresponding to the first sentence is obtained according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and a disambiguation labeling operation is performed on the first sentence, so that the entity word marked as ambiguous is labeled with the candidate entity meaning.
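  • The fallback just described can be summarized in a short sketch. The helper callables (distance, the standard sentence-to-model mapping, and recognise) and the threshold values are assumptions standing in for the components defined elsewhere in this application.

```python
def fallback_recognition(first_sentence, designated_standard_sentences,
                         distance, models_by_standard_sentence, recognise,
                         first_threshold=0.5, second_threshold=1.0):
    """Retry with candidate intent models when the designated model fails."""
    for standard in designated_standard_sentences:
        d = distance(first_sentence, standard)
        if first_threshold < d < second_threshold:           # candidate standard sentence
            candidate_model = models_by_standard_sentence[standard]
            if recognise(candidate_model, first_sentence):    # second recognition result
                return standard, candidate_model
    return None, None

# Toy usage with stubbed helpers.
standard, model = fallback_recognition(
    "lend me your apple",
    ["lend me a phone", "buy some fruit"],
    distance=lambda a, b: 0.8,
    models_by_standard_sentence={"lend me a phone": "model-A", "buy some fruit": "model-B"},
    recognise=lambda m, s: m == "model-A")
print(standard, model)   # lend me a phone model-A
```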
  • the method includes:
  • S642 Determine whether the number of the designated standard sentences is greater than a preset number threshold
  • If the second recognition result is recognition failure and the number of designated standard sentences is not greater than the preset number threshold, it indicates that the first sentence has only one intent, that is, there is no ambiguity in the first sentence, so the aforementioned ambiguity labeling is not accurate. Accordingly, a labeling modification operation is performed, wherein the labeling modification operation is used to modify the label of the entity word that is marked as ambiguous to an unambiguous label. In this way, ambiguity mislabeling errors can be prevented and the ambiguity label can be corrected quickly.
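  • A minimal sketch of this labeling modification operation follows; the tag names ("B-AMB"/"O") and the number threshold are illustrative placeholders.

```python
def correct_ambiguity_label(tags, second_recognition_failed, n_designated, number_threshold=3):
    """If the second recognition failed and few designated standard sentences exist,
    rewrite the ambiguous tag as the unambiguous label."""
    if second_recognition_failed and n_designated <= number_threshold:
        return ["O" if t == "B-AMB" else t for t in tags]
    return tags

print(correct_ambiguity_label(["O", "B-AMB", "O"], second_recognition_failed=True, n_designated=1))
# -> ['O', 'O', 'O']: the mistaken ambiguity label is corrected
```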
  • an embodiment of the present application provides an entity disambiguation device based on an intention recognition model, including:
  • the entity word acquisition unit 10 is configured to acquire the first sentence to be disambiguated, and to perform ambiguity labeling processing on the first sentence according to a preset ambiguity labeling method, so as to obtain the entity words marked as ambiguous in the first sentence;
  • the designated standard sentence acquisition unit 20 is configured to select a designated standard sentence from a preset standard sentence database according to a preset standard sentence selection method;
  • the first distance judgment unit 30 is configured to calculate the first distance between the first sentence and the designated standard sentence according to a preset distance calculation formula, and to determine whether the first distance is less than the preset first distance threshold;
  • the designated intent recognition model acquiring unit 40 is configured to, if the first distance is less than a preset first distance threshold, acquire the designated intent recognition model corresponding to the designated standard sentence according to the preset correspondence between standard sentences and intent recognition models, wherein the designated intent recognition model is trained using sample data, and the sample data is only composed of sentences marked as the designated type of intent;
  • the recognition result obtaining unit 50 is configured to input the first sentence into the designated intent recognition model to perform operations, thereby obtaining a recognition result output by the designated intent recognition model, wherein the recognition result includes recognition success or recognition failure;
  • the recognition result judging unit 60 is used to judge whether the recognition result is a successful recognition
  • the designated entity meaning labeling unit 70 is configured to, if the recognition result is recognition success, obtain the designated entity meaning corresponding to the first sentence according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and to perform a disambiguation labeling operation on the first sentence, so that the entity word that is labeled as ambiguous is labeled with the designated entity meaning.
  • the entity word acquiring unit 10 includes:
  • the bidirectional encoder processing subunit is used to input the first sentence into the bidirectional encoder in the preset ambiguity labeling model for processing, so as to obtain a first ambiguity annotation sequence corresponding one-to-one to the words in the first sentence, and to obtain the hidden state vector set of the last-layer conversion unit of the bidirectional encoder, wherein the ambiguity labeling model is composed of a bidirectional encoder and a support vector machine, and the bidirectional encoder includes multiple layers of conversion units;
  • the second ambiguity annotation sequence acquisition subunit is used to input the hidden state vector set into the support vector machine for calculation to obtain a second ambiguity annotation sequence corresponding one-to-one to the words of the first sentence, wherein, in the function used by the support vector machine for calculation, the output is the label value corresponding to the i-th word of the first sentence, y is the independent variable, y_i is the label corresponding to the i-th word of the first sentence, w_yi is the parameter vector corresponding to the i-th word, h_i is the hidden state vector corresponding to the i-th word, and w_yi and h_i have the same number of components (a sketch of one possible form of this function is given after this list);
  • the similarity value judgment subunit is used to calculate the similarity value between the first ambiguity annotation sequence and the second ambiguity annotation sequence according to a preset similarity value calculation method, and to determine whether the similarity value is greater than a preset similarity threshold;
  • the entity word acquiring subunit is configured to acquire the entity word marked as ambiguous in the second ambiguous annotation sequence if the similarity value is greater than a preset similarity threshold.
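  • The scoring function referenced above appears only as an image in the source. One reading consistent with the listed symbols is a softmax over the dot products w_y · h_i, which is what the following hedged sketch computes; the numeric values are illustrative.

```python
import numpy as np

def token_label_scores(h_i: np.ndarray, label_weights: dict[str, np.ndarray]) -> dict[str, float]:
    """Softmax over the dot products w_y . h_i for every candidate label y."""
    logits = {y: float(w @ h_i) for y, w in label_weights.items()}
    z = sum(np.exp(v) for v in logits.values())
    return {y: float(np.exp(v) / z) for y, v in logits.items()}

h_i = np.array([0.2, -0.1, 0.4])                      # hidden state vector of the i-th word
label_weights = {"B-AMB": np.array([1.0, 0.0, 0.5]),  # one parameter vector w_y per label,
                 "O":     np.array([-0.5, 0.3, 0.0])} # same number of components as h_i
scores = token_label_scores(h_i, label_weights)
print(max(scores, key=scores.get), scores)            # predicted label for the i-th word
```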
  • the designated standard sentence obtaining unit 20 includes:
  • the sentence similarity value sim calculation subunit is used to calculate, according to a preset formula, the sentence similarity value sim between the first sentence and a standard sentence in the standard sentence database, where A is the word frequency vector of the first sentence, B is the word frequency vector of the standard sentence, Ai is the number of times the i-th word of the first sentence appears in the whole sentence, and Bi is the number of times the i-th word of the standard sentence appears in the whole sentence;
  • the sentence similarity value sim judgment subunit is used to judge whether there is a standard sentence whose sentence similarity value sim is greater than a preset sentence similarity threshold in the standard sentence database;
  • the designated standard sentence marking subunit is used to record the standard sentence with the sentence similarity value sim greater than the preset sentence similarity threshold as the designated standard sentence if it exists.
  • the first distance determining unit 30 includes:
  • the word vector database query subunit is used to query a preset word vector database to obtain a first word vector sequence I corresponding to the first sentence, and to obtain a second word vector sequence R corresponding to the specified standard sentence ;
  • the first distance D calculation subunit is used to calculate the first distance D between the first word vector sequence I and the second word vector sequence R according to a preset formula;
  • the device includes:
  • the sample data dividing unit is used to obtain a plurality of pre-collected sample data, and divide the plurality of sample data into training data and test data; wherein, the sample data is a sentence marked as a specified type of intention;
  • the intermediate intent recognition model acquisition unit is used to input training data into the preset neural network model for training, where the training adopts the stochastic gradient descent method to obtain the intermediate intent recognition model;
  • the verification passing judgment unit is configured to verify the intermediate intention recognition model by using the test data, and judge whether the verification is passed;
  • the designated intent recognition model marking unit is used to record the intermediate intent recognition model as the designated intent recognition model if the verification is passed.
  • the device includes:
  • the candidate standard sentence obtaining unit is configured to obtain a candidate standard sentence from the plurality of designated standard sentences if the recognition result is recognition failure, wherein the second distance between the candidate standard sentence and the first sentence is greater than the first distance threshold and less than the preset second distance threshold;
  • the candidate intent recognition model obtaining unit is configured to obtain the candidate intent recognition model corresponding to the candidate standard sentence according to the preset corresponding relationship between the standard sentence and the intent recognition model;
  • the second recognition result acquisition unit is configured to input the first sentence into the candidate intent recognition model to perform operations, thereby obtaining a second recognition result output by the candidate intent recognition model, wherein the second recognition result Including recognition success or recognition failure;
  • the second recognition result judging unit is used to judge whether the second recognition result is a successful recognition
  • the candidate entity meaning labeling unit is configured to, if the second recognition result is recognition success, obtain the candidate entity meaning corresponding to the first sentence according to the preset first sentence-standard sentence-intent recognition model-entity meaning correspondence, and to perform a disambiguation labeling operation on the first sentence, so that the entity word that is labeled as ambiguous is labeled with the candidate entity meaning.
  • the device includes:
  • a quantity acquiring unit, configured to acquire the number of the designated standard sentences if the second recognition result is recognition failure;
  • a quantity threshold judging unit, for judging whether the number of the designated standard sentences is greater than a preset number threshold;
  • an annotation modification unit, configured to perform an annotation modification operation if the number of the designated standard sentences is not greater than the preset number threshold, wherein the annotation modification operation is used to modify the annotation of the entity word that is marked as ambiguous to an unambiguous label.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in the figure.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus, wherein the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the running of the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer equipment is used to store the data used in the entity disambiguation method based on the intention recognition model.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize an entity disambiguation method based on the intention recognition model.
  • the processor executes the above-mentioned entity disambiguation method based on the intention recognition model, and the steps included in the method correspond one-to-one to the steps of the entity disambiguation method based on the intention recognition model of the foregoing embodiment, which will not be repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, an entity disambiguation method based on an intention recognition model is implemented. The storage medium may be a volatile storage medium or a non-volatile storage medium. The steps included in the method correspond one-to-one to the steps of the entity disambiguation method based on the intention recognition model of the foregoing embodiment, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present invention belongs to the field of artificial intelligence and relates to an entity disambiguation method and apparatus based on an intention recognition model, a computer device, and a storage medium. The method comprises: acquiring a first sentence to be disambiguated and acquiring an entity word marked as ambiguous in the first sentence; selecting a designated standard sentence; calculating a first distance between the first sentence and the designated standard sentence; if the first distance is less than a first distance threshold, acquiring a designated intention recognition model; inputting the first sentence into the designated intention recognition model to obtain a recognition result, the designated intention recognition model being trained using sample data, and the sample data being composed only of sentences marked as having a designated intention type; and, if the recognition result is recognition success, acquiring a designated entity meaning corresponding to the first sentence and labeling the entity word with the designated entity meaning. A new dimension (intention recognition) is introduced into the disambiguation process, increasing the accuracy of entity disambiguation.
PCT/CN2020/093428 2019-10-15 2020-05-29 Entity disambiguation method and apparatus based on an intention recognition model, and computer device WO2021073119A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910978260.9 2019-10-15
CN201910978260.9A CN111079429B (zh) 2019-10-15 2019-10-15 基于意图识别模型的实体消歧方法、装置和计算机设备

Publications (1)

Publication Number Publication Date
WO2021073119A1 true WO2021073119A1 (fr) 2021-04-22

Family

ID=70310388

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093428 WO2021073119A1 (fr) 2019-10-15 2020-05-29 Procédé et appareil de désambiguïsation d'entité faisant appel à un modèle de reconnaissance d'intention et dispositif informatique

Country Status (2)

Country Link
CN (1) CN111079429B (fr)
WO (1) WO2021073119A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269244A (zh) * 2021-05-18 2021-08-17 上海睿翎法律咨询服务有限公司 Method, system, apparatus, processor and storage medium for disambiguating duplicate personal names across enterprises in business registration information
CN113761218A (zh) * 2021-04-27 2021-12-07 腾讯科技(深圳)有限公司 Entity linking method, apparatus, device and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079429B (zh) * 2019-10-15 2022-03-18 平安科技(深圳)有限公司 Entity disambiguation method, apparatus and computer device based on intention recognition model
CN111737962A (zh) * 2020-06-24 2020-10-02 平安科技(深圳)有限公司 Entity revision method, apparatus, computer device and readable storage medium
CN111985249B (zh) * 2020-09-03 2024-10-08 贝壳技术有限公司 Semantic analysis method and apparatus, computer-readable storage medium and electronic device
CN112650859A (zh) * 2020-12-29 2021-04-13 北京欧拉认知智能科技有限公司 User intention recognition method, device and model construction method
CN113157890B (zh) * 2021-04-25 2024-06-11 深圳壹账通智能科技有限公司 Intelligent question answering method and apparatus, electronic device and readable storage medium
CN115238701B (zh) * 2022-09-21 2023-01-10 北京融信数联科技有限公司 Multi-domain named entity recognition method and system based on subword-level adapters
CN115599900B (zh) * 2022-12-12 2023-03-21 深圳市人马互动科技有限公司 Method and related apparatus for distributing and processing ambiguous user input

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9245015B2 (en) * 2013-03-08 2016-01-26 Accenture Global Services Limited Entity disambiguation in natural language text
CN106407180A (zh) * 2016-08-30 2017-02-15 北京奇艺世纪科技有限公司 Entity disambiguation method and apparatus
CN110287283A (zh) * 2019-05-22 2019-09-27 中国平安财产保险股份有限公司 Intention model training method, intention recognition method, apparatus, device and medium
CN111079429A (zh) * 2019-10-15 2020-04-28 平安科技(深圳)有限公司 Entity disambiguation method, apparatus and computer device based on intention recognition model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677639A (zh) * 2016-01-10 2016-06-15 齐鲁工业大学 English word sense disambiguation method based on phrase structure syntax tree
CN107861939B (zh) * 2017-09-30 2021-05-14 昆明理工大学 Domain entity disambiguation method fusing word vectors and topic model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9245015B2 (en) * 2013-03-08 2016-01-26 Accenture Global Services Limited Entity disambiguation in natural language text
CN106407180A (zh) * 2016-08-30 2017-02-15 北京奇艺世纪科技有限公司 Entity disambiguation method and apparatus
CN110287283A (zh) * 2019-05-22 2019-09-27 中国平安财产保险股份有限公司 Intention model training method, intention recognition method, apparatus, device and medium
CN111079429A (zh) * 2019-10-15 2020-04-28 平安科技(深圳)有限公司 Entity disambiguation method, apparatus and computer device based on intention recognition model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761218A (zh) * 2021-04-27 2021-12-07 腾讯科技(深圳)有限公司 Entity linking method, apparatus, device and storage medium
CN113761218B (zh) * 2021-04-27 2024-05-10 腾讯科技(深圳)有限公司 Entity linking method, apparatus, device and storage medium
CN113269244A (zh) * 2021-05-18 2021-08-17 上海睿翎法律咨询服务有限公司 Method, system, apparatus, processor and storage medium for disambiguating duplicate personal names across enterprises in business registration information

Also Published As

Publication number Publication date
CN111079429A (zh) 2020-04-28
CN111079429B (zh) 2022-03-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20876741

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20876741

Country of ref document: EP

Kind code of ref document: A1