CN112560466B - Link entity association method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112560466B
Authority
CN
China
Prior art keywords
vector
link
text
entity
information
Prior art date
Legal status
Active
Application number
CN202011546360.3A
Other languages
Chinese (zh)
Other versions
CN112560466A (zh)
Inventor
雷谦
熊壮
张翔翔
姚后清
施鹏
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011546360.3A
Publication of CN112560466A
Application granted
Publication of CN112560466B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/20 - Ensemble learning


Abstract

The application discloses a link entity association method, apparatus, electronic device and storage medium, relating to the technical fields of deep learning and natural language processing. The method comprises the following steps: generating a first vector in response to receiving a first text and a first link word, the first link word being a word to be linked in the first text; determining, from a preset link entity vector library in which vector representations of a plurality of link entities are pre-stored, a second vector matching the first vector; and associating the link entity corresponding to the second vector with the first link word in the first text. By vectorizing both the text containing the link word and the link entities, an appropriate link entity can be determined for the link word from the link entity vector library by way of vector matching.

Description

Link entity association method, device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to deep learning and natural language processing technology, and specifically to a link entity association method, apparatus, electronic device and storage medium.
Background
Internet encyclopedia products provide users with a convenient way to retrieve information. The text of an encyclopedia entry contains internal links that let the user jump to other entries mentioned in the text, making it easy to associate knowledge points quickly. As the knowledge base grows, the number of candidate entries corresponding to the same term increases; that is, a word to be linked (the link word for short) may correspond to a plurality of candidate link entities, and an appropriate link entity must be selected for the link word from among them.
Disclosure of Invention
The application provides a method, a device, an electronic device, a storage medium and a computer program product for linking entity association.
According to a first aspect, the present application provides a method for linking entity association, including:
generating a first vector in response to receiving the first text and the first link word; the first link word is a word to be linked in the first text;
determining a second vector matched with the first vector from a preset link entity vector library, wherein the link entity vector library is pre-stored with vector representations of a plurality of link entities;
and associating the link entity corresponding to the second vector with the first link word in the first text.
According to a second aspect, the present application provides a linking entity associating apparatus, comprising:
the text vectorization module is used for generating a first vector in response to receiving the first text and the first link word; the first link word is a word to be linked in the first text;
the determining module is used for determining a second vector matched with the first vector from a preset link entity vector library, and the link entity vector library is pre-stored with vector representations of a plurality of link entities;
and the association module is used for associating the link entity corresponding to the second vector with the first link word in the first text.
According to a third aspect, the present application provides an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods of the first aspect.
According to a fourth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the methods of the first aspect.
According to a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements any of the methods of the first aspect.
According to the technology of the application, by vectorizing the text containing the link word and the link entities, an appropriate link entity can be determined for the link word from the link entity vector library by vector matching.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a method of linking entity association according to a first embodiment of the present application;
fig. 2 is a schematic structural diagram of a link entity associating apparatus according to a second embodiment of the present application;
fig. 3 is a block diagram of an electronic device for implementing a linked entity association method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Currently, a link entity can be associated with a link word in text mainly in the following ways:
first, manual editing: after the user has written the text content, the user can select a proper link entity for the link words in the text, and the link entity determined in the mode has higher accuracy but lower efficiency.
Second, machine linking: according to the link word in the text, the machine finds the candidate link entities corresponding to the link word in the knowledge base; if there are multiple candidates, each candidate link entity is scored and the scores are ranked to determine the link entity corresponding to the link word.
In view of this, the present application provides a linking entity association method, apparatus, electronic device, storage medium and computer program product based on linking entity vectorization.
Exemplary embodiments of the present application are described below.
As shown in fig. 1, the link entity association method includes the following steps:
step 101: generating a first vector in response to receiving the first text and the first link word; the first link word is a word to be linked in the first text.
The first text refers to a text including a link word, for example a sentence or paragraph containing the link word. The first text may include one link word (i.e., the first link word) or a plurality of link words. When the first text includes a plurality of link words, one of them may be taken as the first link word; after the link entity association for that link word is completed, another link word may be taken as a new first link word, and so on until link entity association is completed for all the link words.
In the application, a vectorization model can be constructed and trained in advance; after the model converges, it can be used to vectorize text. For example, the vectorization model may include a portion that encodes the text information, and this portion can be split off and used separately as a text encoder.
In use, the first text and the first link word may be fed to the text encoder as input; after receiving them, the text encoder generates the first vector from the first text and the first link word.
In a real scenario, the same link word expresses different meanings in different texts. Take the link word "红楼梦" (Dream of the Red Chamber) as an example: in the text "《红楼梦》是一本奇书" ("Dream of the Red Chamber is a remarkable book"), "红楼梦" refers to the novel written by Cao Xueqin, while in the text "李少红执导的《红楼梦》" ("the Dream of the Red Chamber directed by Li Shaohong"), "红楼梦" refers to a television series. Thus, when the same link word is located in different texts, the corresponding link entities may be different.
In this step, the first text and the first link word are input together, so that the generated first vector not only represents the first link word but also captures the relationship between the first link word and the first text, and therefore represents the first link word in the first text more accurately. In particular, when the first link word has a plurality of candidate link entities, the vectors generated for the first link word in different texts can be distinguished, so that an appropriate link entity can be determined more accurately from the plurality of candidates.
Step 102: and determining a second vector matched with the first vector from a preset link entity vector library, wherein the link entity vector library is pre-stored with vector representations of a plurality of link entities.
In the application, all the link entities in the knowledge base can be vectorized, and the vector representations of all the link entities are stored in a pre-created link entity vector library. The link entity vector library can also be called a link entity vector retrieval system, i.e., a retrieval and recall system for vectors. In use, a vector is input into the library, and a vector matching the input vector is retrieved from it.
In this step, a first vector may be input into the linked entity vector library, and a second vector matching the first vector may be determined from the linked entity vector library. After the second vector is determined, the link entity corresponding to the first link word in the first text is correspondingly determined, and the link entity vector library can return the ID or address of the link entity corresponding to the second vector.
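The retrieval interface just described, which takes a query vector and returns the ID of the matching stored entity, can be sketched as a minimal in-memory library. The class and method names (`LinkEntityVectorLibrary`, `add`, `query`) are illustrative assumptions; a production system would use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
class LinkEntityVectorLibrary:
    """Minimal in-memory stand-in for the link entity vector retrieval system."""

    def __init__(self):
        self._vectors = {}  # entity ID -> vector representation

    def add(self, entity_id, vector):
        """Store one link entity's vector representation under its ID."""
        self._vectors[entity_id] = vector

    def query(self, query_vector):
        """Return the ID of the stored vector closest to the query (brute force)."""
        def sq_distance(item):
            _, vec = item
            return sum((a - b) ** 2 for a, b in zip(query_vector, vec))

        entity_id, _ = min(self._vectors.items(), key=sq_distance)
        return entity_id
```

Under this sketch, `query` plays the role of steps 102 and 103 combined: the returned ID is what the library "returns" so the caller can associate the entity with the link word.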
Step 103: and associating the link entity corresponding to the second vector with the first link word in the first text.
In this step, the link entity corresponding to the second vector may be associated with the first link word in the first text according to the ID or address of the link entity returned by the link entity vector library.
According to the technology of the embodiment of the application, the text containing the link words and the link entities are vectorized, so that the proper link entities for the link words can be determined from the link entity vector library in a vector matching mode.
Optionally, the generating the first vector in response to receiving the first text and the first link word includes:
generating a vector corresponding to a first text in response to receiving the first text and a first link word, and generating a position code according to the position information of the first link word in the first text, wherein the length of the position code is the same as that of the first text;
and generating the first vector according to the vector corresponding to the first text and the position code.
This embodiment provides a specific implementation of generating the first vector from the first text and the first link word.
In this embodiment, after the first text and the first link word are input to the text encoder, the text encoder may directly convert the first text into a vector; for the first link word, a position code with the same length as the first text may be generated according to the position information of the link word in the first text, and this position code can be understood as a vector. As an example, the position code may take the form of a 0-1 code: positions covered by the link word are set to 1 and all other positions are set to 0. Assuming the first text is "红楼梦是一本奇书" ("Dream of the Red Chamber is a remarkable book", 8 characters) and the first link word is "红楼梦", the position code is 1,1,1,0,0,0,0,0.
After generating the vector corresponding to the first text and the position code corresponding to the first link word, the first vector may be generated according to the vector corresponding to the first text and the position code.
In this embodiment, by taking into account the position information of the first link word in the first text, the generated position code reflects the relationship between the first link word and the first text well; the generated first vector therefore also reflects this relationship and represents the first link word in the first text more accurately.
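The 0-1 position code described above can be generated as follows. This is a sketch under the assumption that the link word occurs once in the text and that the code is built per character:

```python
def position_code(text: str, link_word: str) -> list:
    """Build a 0-1 code the same length as the text: 1 where the link word sits."""
    code = [0] * len(text)
    start = text.find(link_word)  # assumes a single occurrence of the link word
    if start != -1:
        for i in range(start, start + len(link_word)):
            code[i] = 1
    return code
```

For example, `position_code("红楼梦是一本奇书", "红楼梦")` yields `[1, 1, 1, 0, 0, 0, 0, 0]`, matching the 0-1 code in the text.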
Optionally, the vector corresponding to the first text includes an initial vector corresponding to each character of the first text;
the generating the first vector according to the vector corresponding to the first text and the position code includes:
adding the position code, position by position, to the initial vectors corresponding to the characters of the first text to obtain a target vector corresponding to each character of the first text;
and generating the first vector according to the target vector corresponding to each character of the first text.
In this embodiment, for the first text, the text encoder may generate a vector for each character of the first text (i.e., an initial vector for each character), which may be a vector of dimension 512.
The generation process of the first vector is exemplified below:
let the first text be "dream of the red building is an odd book", and the first link word be "dream of the red building".
The first text contains a total of 8 characters, each corresponding to a vector, such as a "red" corresponding to a 512 vector, a "building" corresponding to a 512 vector, … …, resulting in a total of 8 initial vectors.
The position codes corresponding to the first link words are: 1,1,1,0,0,0,0,0.
And carrying out para-position addition on the position codes and the initial vectors corresponding to the characters to obtain target vectors corresponding to the characters, and obtaining target vectors of 8 characters. Finally, a first vector is generated from the target vectors of the 8 characters.
It should be noted that, in addition to the vector operation described above, the first vector may also be obtained with other suitable vector operations, which are not enumerated here.
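The position-wise addition in the worked example can be sketched as follows, under the interpretation that each character's position-code bit is added to every component of that character's initial vector (the function name is illustrative):

```python
def add_position_code(initial_vectors, code):
    """Add each character's position-code bit to its initial vector.

    initial_vectors: one vector (list of floats) per character of the text.
    code: the 0-1 position code, one bit per character.
    """
    return [[component + bit for component in vec]
            for vec, bit in zip(initial_vectors, code)]
```

With 2-dimensional toy vectors, `add_position_code([[0.5, 0.5], [0.2, 0.2]], [1, 0])` returns `[[1.5, 1.5], [0.2, 0.2]]`: only the character covered by the link word is shifted.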
Optionally, the determining, from a preset linked entity vector library, a second vector matching the first vector includes:
calculating the similarity between the first vector and each vector in the link entity vector library;
and determining the vector with the highest similarity with the first vector in the link entity vector library as the second vector.
In the application, a vector similarity calculation model can be constructed and trained in advance; it may reuse the preset framework of a text similarity calculation model to learn the similarity between text information and link entity information. After the vector similarity calculation model converges, it can be used to calculate the similarity between two vectors.
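The application does not fix a particular similarity measure; cosine similarity is one common choice, and under that assumption the selection of the second vector (the library vector most similar to the first vector) could be sketched as:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def second_vector(first_vector, library_vectors):
    """Return the library vector with the highest similarity to the first vector."""
    return max(library_vectors, key=lambda vec: cosine_similarity(first_vector, vec))
```

A trained similarity model, as described above, would replace `cosine_similarity` with a learned scoring function; the argmax over the library stays the same.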
Optionally, the method further comprises:
vectorizing a first link entity to obtain a vector representation corresponding to the first link entity;
and storing the vector representation corresponding to the first link entity in the link entity vector library.
In the application, a vectorization model can be trained and constructed in advance, and after the vectorization model converges, the vectorization model can be used for vectorizing the link entity. For example, the vectorization model may include a portion that encodes the linked entity, which may be separate and used as the linked entity encoder.
As an example, when constructing the vectorization model, the ERNIE (Enhanced Representation through kNowledge IntEgration) semantic understanding framework may be used to build a binary classification model. The main inputs of the model include the text information, the link word position information and the link entity information, and the internal network layers of the model may adopt a two-tower (dual-encoder) structure. After training converges, the model is able to vectorize both the link entity information and the text information: the part of the model that encodes the link entity information can be used independently as the link entity encoder, and the part that encodes the text information can be split off and used as the text encoder.
In use, the related information of the first link entity is input to the link entity encoder; after receiving it, the encoder generates the vector representation corresponding to the first link entity. If a two-tower model structure is employed, the output of the model's penultimate layer may be used as the final vector representation of the first link entity.
The training process of the vectorization model is briefly described below:
first, a training sample set is prepared, comprising text information, link words in the text and an alternative link entity corpus, wherein the alternative link entity corpus comprises link entities matched with the link words and link entities not matched with the link words. And secondly, training a classification task, wherein among alternative link entities, the link entity matched with the link word is marked as 1, and the non-matched link entity is marked as 0. For the link entity, feature information including entry information and attribute information is constructed. And then, inputting the text information, the link words in the text and the characteristic information of the alternative link entities into a vectorization model to obtain vector representation of the text information and vector representation of the link entities. Then, the similarity between the vector representation of the text information and the vector representation of the linked entity is trained using a preset framework of the text similarity calculation model.
The vectorization process of the link entity is specifically described below.
Optionally, the vectorizing the first link entity to obtain a vector representation corresponding to the first link entity includes:
acquiring entry information and attribute information of the first link entity;
and vectorizing the entry information and the attribute information to obtain vector representation corresponding to the first link entity.
In this embodiment, for the link entity, entry information and attribute information may be constructed as its feature information. The entry information is generally text information; the attribute information may include attribute names and attribute values and generally takes the form of a table.
The entry information and attribute information serve as key feature information for distinguishing link entities, and vectorizing them makes the vector representation of the link entity more accurate.
Optionally, the vectorizing the first link entity to obtain a vector representation corresponding to the first link entity includes:
acquiring entry information, attribute information and first information of the first link entity, wherein the first information is non-text information;
vectorizing the entry information and the attribute information to obtain a third vector;
converting the first information into a fourth vector;
and generating a vector representation corresponding to the first link entity according to the third vector and the fourth vector.
In some cases, in addition to textual feature information such as entry information and attribute information, the link entity may contain other, more critical non-text information (i.e., the first information), such as a picture. The first information may be encoded with a pre-training model and converted into the fourth vector. Further, the length of the fourth vector may be the same as the length of the third vector.
After the third vector and the fourth vector are obtained, the vector representation corresponding to the first link entity may be obtained by concatenation: for example, the third vector forms the first half and the fourth vector is spliced after it.
Entry information, attribute information and other, more critical non-text information are used as the key feature information for distinguishing link entities, and vectorizing all of this information makes the vector representation of the link entity more accurate.
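The splicing described above amounts to simple vector concatenation. A sketch, assuming (as stated) that the third and fourth vectors have the same length:

```python
def entity_vector(third_vector, fourth_vector):
    """Concatenate: text-feature vector first, non-text vector spliced after it."""
    assert len(third_vector) == len(fourth_vector)  # equal lengths per the embodiment
    return third_vector + fourth_vector  # list concatenation, not element-wise addition
```

For example, `entity_vector([1, 2], [3, 4])` returns `[1, 2, 3, 4]`: the first half comes from the entry and attribute information, the second half from the non-text information.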
For convenience of processing, the attribute information may be converted into text information; that is, the vectorizing of the entry information and the attribute information includes:
converting the attribute information into a second text;
and vectorizing the entry information and the second text.
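One way to convert the attribute table into the second text is to flatten each attribute name-value pair into a delimited string; the delimiters used here are an assumption, since the application does not specify the text format:

```python
def attributes_to_text(attributes: dict) -> str:
    """Flatten an attribute table (attribute name -> attribute value) into one string."""
    return ";".join(f"{name}:{value}" for name, value in attributes.items())
```

The resulting string can then be vectorized together with the entry information by the same text-oriented encoder.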
It should be noted that, in the link entity association method of the present application, the various optional embodiments may be implemented in combination with each other or separately, which is not limited in this application.
The above embodiments of the present application have at least the following advantages or benefits:
According to the technology of the application, by vectorizing the text containing the link word and the link entities, an appropriate link entity can be determined for the link word from the link entity vector library by vector matching.
As shown in fig. 2, the present application provides a link entity association apparatus 200, including:
a text vectorization module 201, configured to generate a first vector in response to receiving a first text and a first link word; the first link word is a word to be linked in the first text;
a determining module 202, configured to determine a second vector matching the first vector from a preset linked entity vector library, where the linked entity vector library stores vector representations of a plurality of linked entities in advance;
and the association module 203 is configured to associate the link entity corresponding to the second vector with the first link word in the first text.
Optionally, the text vectorization module 201 includes:
the first generation unit is used for responding to the received first text and the first link word, generating a vector corresponding to the first text, and generating a position code according to the position information of the first link word in the first text, wherein the length of the position code is the same as that of the first text;
and the second generating unit is used for generating the first vector according to the vector corresponding to the first text and the position code.
Optionally, the vector corresponding to the first text includes an initial vector corresponding to each character of the first text;
the second generating unit is specifically configured to:
adding the position code, position by position, to the initial vectors corresponding to the characters of the first text to obtain a target vector corresponding to each character of the first text;
and generating the first vector according to the target vector corresponding to each character of the first text.
Optionally, the determining module 202 includes:
the computing unit is used for computing the similarity between the first vector and each vector in the link entity vector library;
and the determining unit is used for determining the vector with highest similarity with the first vector in the link entity vector library as the second vector.
Optionally, the link entity associating apparatus 200 further includes:
the link entity vectorization module is used for vectorizing the first link entity to obtain a vector representation corresponding to the first link entity;
and the storage module is used for storing the vector representation corresponding to the first link entity in the link entity vector library.
Optionally, the link entity vectorization module includes:
the first acquisition unit is used for acquiring entry information and attribute information of the first link entity;
and the first vectorization unit is used for vectorizing the entry information and the attribute information to obtain vector representation corresponding to the first link entity.
Optionally, the link entity vectorization module includes:
the second acquisition unit is used for acquiring entry information, attribute information and first information of the first link entity, wherein the first information is non-text information;
the first vectorization unit is used for vectorizing the entry information and the attribute information to obtain a third vector;
the second vectorization unit is used for converting the first information into a fourth vector;
and the third generating unit is used for generating a vector representation corresponding to the first link entity according to the third vector and the fourth vector.
Optionally, the first vectorization unit includes:
a conversion subunit for converting the attribute information into a second text;
and the vectorization subunit is used for vectorizing the entry information and the second text.
The link entity association device 200 provided in the embodiment of the present application can implement each process in the embodiment of the link entity association method, and can achieve the same beneficial effects, so that repetition is avoided, and details are not repeated here.
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium and a computer program product.
FIG. 3 illustrates a schematic block diagram of an example electronic device 300 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 3, the electronic device 300 includes a computing unit 301 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 302 or a computer program loaded from a storage unit 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the device 300 can also be stored. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other by a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Various components in the electronic device 300 are connected to the I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, etc.; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, an optical disk, or the like; and a communication unit 309 such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 309 allows the electronic device 300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 301 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 301 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 301 performs the various methods and processes described above, such as the link entity association method. For example, in some embodiments, the link entity association method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 300 via the ROM 302 and/or the communication unit 309. When the computer program is loaded into the RAM 303 and executed by the computing unit 301, one or more steps of the link entity association method described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the link entity association method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted in the various flows described above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method of linking entity association, comprising:
generating a first vector in response to receiving the first text and the first link word; the first link word is a word to be linked in the first text;
determining a second vector matched with the first vector from a preset link entity vector library, wherein vector representations of a plurality of link entities are pre-stored in the link entity vector library;
associating the link entity corresponding to the second vector with the first link word in the first text;
wherein the generating a first vector in response to receiving the first text and the first link word comprises:
generating a vector corresponding to a first text in response to receiving the first text and a first link word, and generating a position code according to the position information of the first link word in the first text, wherein the length of the position code is the same as that of the first text;
generating the first vector according to the vector corresponding to the first text and the position code;
the determining, from a preset link entity vector library, a second vector matched with the first vector includes:
calculating the similarity between the first vector and each vector in the link entity vector library;
determining, as the second vector, the vector in the link entity vector library having the highest similarity to the first vector;
wherein the method further comprises:
vectorizing a first link entity to obtain a vector representation corresponding to the first link entity;
storing the vector representation corresponding to the first link entity in the link entity vector library;
the vectorizing the first link entity to obtain a vector representation corresponding to the first link entity includes:
acquiring entry information, attribute information and first information of the first link entity, wherein the first information is non-text information;
vectorizing the entry information and the attribute information to obtain a third vector;
converting the first information into a fourth vector;
and generating a vector representation corresponding to the first link entity according to the third vector and the fourth vector.
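The pipeline of claim 1 — a position-aware query vector for the link word, followed by a nearest-neighbour lookup in a pre-built link entity vector library — can be sketched as follows. This is an illustrative sketch only: the binary position code, the mean pooling, the cosine similarity measure, and all function names are assumptions for demonstration, not details specified by the patent.

```python
import numpy as np

def position_code(text: str, mention: str) -> np.ndarray:
    # Position code with the same length as the text: 1.0 for characters
    # inside the link word (mention), 0.0 elsewhere (binary scheme assumed).
    code = np.zeros(len(text))
    start = text.find(mention)
    if start >= 0:
        code[start:start + len(mention)] = 1.0
    return code

def first_vector(char_vectors: np.ndarray, code: np.ndarray) -> np.ndarray:
    # Element-wise addition of the position code to each character's initial
    # vector, then mean pooling into a single query ("first") vector.
    target = char_vectors + code[:, None]  # broadcast over embedding dim
    return target.mean(axis=0)

def link(query: np.ndarray, library: np.ndarray) -> int:
    # Cosine similarity of the query against every entity vector in the
    # library; the index of the most similar one plays the role of the
    # "second vector" selected by the method.
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    return int(np.argmax(lib @ q))
```

In practice `char_vectors` would come from a character-level encoder and the library rows from the entity vectorization described in the later limitations of the claim; here any real-valued matrices of matching dimensions suffice.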
2. The method of claim 1, wherein the vector corresponding to the first text comprises an initial vector corresponding to each character of the first text;
the generating the first vector according to the vector corresponding to the first text and the position code includes:
performing element-wise addition of the position code and the initial vector corresponding to each character of the first text, to obtain a target vector corresponding to each character of the first text;
and generating the first vector according to the target vector corresponding to each character of the first text.
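The element-wise ("para-position") addition of claim 2 — adding the position code, entry by entry, to each character's initial vector — can be shown with a toy NumPy example; the 3-dimensional embeddings and the binary code values are made up for illustration.

```python
import numpy as np

# Initial vectors for a 5-character text, embedding dimension 3 (toy values).
initial = np.arange(15, dtype=float).reshape(5, 3)

# Position code of the same length as the text: here the link word is
# assumed to span characters 2..3, so those positions are marked 1.
pos_code = np.array([0.0, 0.0, 1.0, 1.0, 0.0])

# Element-wise addition: each character's initial vector is shifted by that
# character's position-code value, yielding the per-character target vectors.
target = initial + pos_code[:, None]
```

Characters inside the link word are thereby offset relative to the rest of the text, which is how the position of the first link word is injected into the first vector.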
3. The method of claim 1, wherein the vectorizing the first linking entity to obtain the vector representation corresponding to the first linking entity comprises:
acquiring entry information and attribute information of the first link entity;
and vectorizing the entry information and the attribute information to obtain vector representation corresponding to the first link entity.
4. A method according to claim 1 or 3, wherein said vectorizing said term information and said attribute information comprises:
converting the attribute information into a second text;
and vectorizing the entry information and the second text.
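Claim 4 first converts the structured attribute information into a "second text" and only then vectorizes it together with the entry information. A minimal serialization sketch is below; the key-value format, the separators, and both function names are assumptions, since the patent does not specify the conversion format.

```python
def attributes_to_text(attributes: dict) -> str:
    # Flatten key-value attribute pairs into a single plain-text string
    # (the "second text" of claim 4).
    return "; ".join(f"{k}: {v}" for k, v in attributes.items())

def build_entity_text(entry: str, attributes: dict) -> str:
    # Concatenate the entry (term) information with the serialized
    # attributes; the combined string is what gets vectorized.
    return entry + " | " + attributes_to_text(attributes)
```

For example, `build_entity_text("Apple Inc.", {"industry": "technology", "founded": "1976"})` yields a single string that any text encoder can turn into the entity's vector representation.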
5. A linking entity association apparatus, comprising:
the text vectorization module is used for generating a first vector in response to receiving the first text and the first link word; the first link word is a word to be linked in the first text;
the determining module is used for determining a second vector matched with the first vector from a preset link entity vector library, wherein vector representations of a plurality of link entities are pre-stored in the link entity vector library;
the association module is used for associating the link entity corresponding to the second vector with the first link word in the first text;
wherein, the text vectorization module includes:
the first generation unit is used for responding to the received first text and the first link word, generating a vector corresponding to the first text, and generating a position code according to the position information of the first link word in the first text, wherein the length of the position code is the same as that of the first text;
the second generating unit is used for generating the first vector according to the vector corresponding to the first text and the position code;
wherein the determining module comprises:
the computing unit is used for computing the similarity between the first vector and each vector in the link entity vector library;
a determining unit, configured to determine, as the second vector, the vector in the link entity vector library having the highest similarity to the first vector;
wherein the apparatus further comprises:
the link entity vectorization module is used for vectorizing the first link entity to obtain a vector representation corresponding to the first link entity;
the storage module is used for storing the vector representation corresponding to the first link entity in the link entity vector library;
wherein, the link entity vectorization module includes:
the second acquisition unit is used for acquiring entry information, attribute information and first information of the first link entity, wherein the first information is non-text information;
the first vectorization unit is used for vectorizing the entry information and the attribute information to obtain a third vector;
a second vectorization unit, configured to convert the first information into a fourth vector;
and the third generating unit is used for generating a vector representation corresponding to the first link entity according to the third vector and the fourth vector.
6. The apparatus of claim 5, wherein the vector corresponding to the first text comprises an initial vector corresponding to each character of the first text;
the second generating unit is specifically configured to:
performing element-wise addition of the position code and the initial vector corresponding to each character of the first text, to obtain a target vector corresponding to each character of the first text;
and generating the first vector according to the target vector corresponding to each character of the first text.
7. The apparatus of claim 5, wherein the linked entity vectorization module comprises:
the first acquisition unit is used for acquiring entry information and attribute information of the first link entity;
and the first vectorization unit is used for vectorizing the entry information and the attribute information to obtain vector representation corresponding to the first link entity.
8. The apparatus of claim 5 or 7, wherein the first vectorization unit comprises:
a conversion subunit for converting the attribute information into a second text;
and the vectorization subunit is used for vectorizing the entry information and the second text.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 4.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 4.
CN202011546360.3A 2020-12-24 2020-12-24 Link entity association method, device, electronic equipment and storage medium Active CN112560466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011546360.3A CN112560466B (en) 2020-12-24 2020-12-24 Link entity association method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011546360.3A CN112560466B (en) 2020-12-24 2020-12-24 Link entity association method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112560466A CN112560466A (en) 2021-03-26
CN112560466B true CN112560466B (en) 2023-07-25

Family

ID=75030534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011546360.3A Active CN112560466B (en) 2020-12-24 2020-12-24 Link entity association method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112560466B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806552B (en) * 2021-08-30 2022-06-14 北京百度网讯科技有限公司 Information extraction method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295796A (en) * 2016-07-22 2017-01-04 浙江大学 Entity linking method based on deep learning
KR20180113444A (en) * 2017-04-06 2018-10-16 네이버 주식회사 Method, apparatus and system for named entity linking and computer program thereof
CN109241294A (en) * 2018-08-29 2019-01-18 国信优易数据有限公司 A kind of entity link method and device
CN110991187A (en) * 2019-12-05 2020-04-10 北京奇艺世纪科技有限公司 Entity linking method, device, electronic equipment and medium
CN111159485A (en) * 2019-12-30 2020-05-15 科大讯飞(苏州)科技有限公司 Tail entity linking method, device, server and storage medium
CN111553163A (en) * 2020-04-28 2020-08-18 腾讯科技(武汉)有限公司 Text relevance determining method and device, storage medium and electronic equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Entity linking method for Chinese microblogs based on word vectors; Mao Ersong; Wang Bo; Tang Yongwang; Liang Dan; Computer Applications and Software (04); full text *

Also Published As

Publication number Publication date
CN112560466A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN113313022B (en) Training method of character recognition model and method for recognizing characters in image
CN113553414B (en) Intelligent dialogue method, intelligent dialogue device, electronic equipment and storage medium
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN114281968B (en) Model training and corpus generation method, device, equipment and storage medium
CN110808032A (en) Voice recognition method and device, computer equipment and storage medium
CN113053367A (en) Speech recognition method, model training method and device for speech recognition
CN115099239B (en) Resource identification method, device, equipment and storage medium
EP3920074A2 (en) Method for industry text increment, related apparatus, and computer program product
CN112560466B (en) Link entity association method, device, electronic equipment and storage medium
CN117633194A (en) Large model prompt data processing method and device, electronic equipment and storage medium
CN113051896A (en) Method and device for correcting text, electronic equipment and storage medium
CN115658903B (en) Text classification method, model training method, related device and electronic equipment
CN116662484A (en) Text regularization method, device, equipment and storage medium
CN114758649B (en) Voice recognition method, device, equipment and medium
CN114118049B (en) Information acquisition method, device, electronic equipment and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN113553833B (en) Text error correction method and device and electronic equipment
JP2023012541A (en) Question answering method, device, and electronic apparatus based on table
CN116049370A (en) Information query method and training method and device of information generation model
CN115936018A (en) Method and device for translating terms, electronic equipment and storage medium
CN115510203A (en) Question answer determining method, device, equipment, storage medium and program product
CN112784599B (en) Method and device for generating poem, electronic equipment and storage medium
CN114841172A (en) Knowledge distillation method, apparatus and program product for text matching double tower model
CN114417862A (en) Text matching method, and training method and device of text matching model
CN112905917A (en) Inner chain generation method, model training method, related device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant