CN114117062A - Text vector representation method and device and electronic equipment - Google Patents

Text vector representation method and device and electronic equipment

Info

Publication number
CN114117062A
CN114117062A (application No. CN202111268351.7A)
Authority
CN
China
Prior art keywords
vector
word
text
entity
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111268351.7A
Other languages
Chinese (zh)
Inventor
刘伟硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202111268351.7A priority Critical patent/CN114117062A/en
Publication of CN114117062A publication Critical patent/CN114117062A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation


Abstract

The invention provides a text vector representation method, a text vector representation apparatus and an electronic device, relating to the technical field of data processing. When a text to be processed is given a vector representation, a plurality of words in the text are obtained first; an embedded vector corresponding to each word is then obtained according to a pre-established embedded vector representation matrix of the vocabulary; a prior knowledge vector corresponding to each word is obtained according to a pre-acquired knowledge graph and entity representation information established based on the knowledge graph; and the text vector representation corresponding to the text is determined according to the embedded vector and the prior knowledge vector corresponding to each word. By introducing prior knowledge vectors obtained from the knowledge graph and the entity representation information, the semantic information and common sense information carried by the text vector representation are increased, so that the feature space of the text vector is expanded and the influence of language noise is reduced.

Description

Text vector representation method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a text vector representation method, an apparatus, and an electronic device.
Background
At present, word vector models fall mainly into two types: traditional neural-network pre-training models, such as FastText, Word2Vec and GloVe, and pre-training word vector models based on the self-attention mechanism, such as the Transformer, BERT and ERNIE.
Traditional neural-network pre-training models have simple structures and shallow networks, so the feature space of the learned word vectors is limited and their expressive capability is restricted. Self-attention-based word vector models such as BERT, although they solve the problem of long-distance information attenuation, are susceptible to language noise.
Disclosure of Invention
The invention aims to provide a text vector representation method, a text vector representation apparatus and an electronic device that increase the semantic information and common sense information carried by a text vector representation, thereby expanding the feature space of the text vector and reducing the influence of language noise.
In a first aspect, an embodiment of the present invention provides a text vector representation method, including:
acquiring a plurality of words in a text to be processed;
acquiring an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the vocabulary;
acquiring a priori knowledge vector corresponding to each word according to a pre-obtained knowledge graph and entity representation information established based on the knowledge graph; the entity representation information comprises vector representations corresponding to each entity in the knowledge graph and relationship factors corresponding to relationships among the entities;
and determining the text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word.
Further, the text to be processed is a Chinese text, and the step of obtaining a plurality of words in the text to be processed includes:
and performing word segmentation processing on the text to be processed to obtain a plurality of words.
Further, the step of obtaining an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the vocabulary includes:
and extracting the embedded vector corresponding to each word from the embedded vector representation matrix.
Further, the step of obtaining a priori knowledge vector corresponding to each word according to a pre-obtained knowledge graph and entity representation information established based on the knowledge graph includes:
for each word, taking the word as an entity and querying, in the knowledge graph, all associated entities directly associated with the word;
and determining a priori knowledge vector corresponding to the word according to the entity representation information and all the associated entities.
Further, the step of querying, with the word taken as an entity, all associated entities directly associated with the word in the knowledge graph comprises:
querying whether a target entity corresponding to the word exists in the knowledge graph;
if the target entity exists, determining each entity directly associated with the target entity as an associated entity corresponding to the word;
and if the target entity does not exist, determining a null entity as the associated entity corresponding to the word, wherein the relationship between the null entity and the word is a null relationship, and the entity representation information further comprises a vector representation corresponding to the null entity and a relationship factor corresponding to the null relationship.
Further, the step of determining the prior knowledge vector corresponding to the word according to the entity representation information and all the associated entities comprises:
obtaining, according to the entity representation information, the vector representation corresponding to each associated entity and the relationship factor corresponding to the relationship between that entity and the word;
when there is one associated entity, determining the vector formed by the vector representation corresponding to that entity and its relationship factor as the prior knowledge vector corresponding to the word;
and when there are at least two associated entities, performing a weighted summation of the vector representations of all the associated entities, weighted by their corresponding relationship factors, to obtain the prior knowledge vector corresponding to the word.
Further, the step of determining the text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word includes:
concatenating the embedded vector corresponding to each word with the prior knowledge vector to obtain a target word vector corresponding to each word;
and determining a matrix formed by target word vectors corresponding to the words as text vector representation corresponding to the text to be processed.
In a second aspect, an embodiment of the present invention further provides a text vector representation apparatus, including:
the first acquisition module is used for acquiring a plurality of words in the text to be processed;
the second acquisition module is used for acquiring an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the vocabulary;
the third acquisition module is used for acquiring a priori knowledge vector corresponding to each word according to a pre-acquired knowledge graph and entity representation information established based on the knowledge graph; the entity representation information comprises vector representations corresponding to each entity in the knowledge graph and relationship factors corresponding to relationships among the entities;
and the determining module is used for determining the text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the text vector representation method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for representing text vectors according to the first aspect is performed.
According to the text vector representation method and apparatus and the electronic device provided by the embodiments of the invention, when a text to be processed is given a vector representation, a plurality of words in the text are obtained first; an embedded vector corresponding to each word is then obtained according to a pre-established embedded vector representation matrix of the vocabulary; a prior knowledge vector corresponding to each word is obtained according to a pre-acquired knowledge graph and entity representation information established based on the knowledge graph, the entity representation information comprising the vector representations corresponding to all entities in the knowledge graph and the relationship factors corresponding to the relationships among the entities; and the text vector representation corresponding to the text is determined according to the embedded vector and the prior knowledge vector corresponding to each word. By introducing prior knowledge vectors obtained from the knowledge graph and the entity representation information, the semantic information and common sense information carried by the text vector representation are increased, so that the feature space of the text vector is expanded and the influence of language noise is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flowchart of a text vector representation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating processing of a single word in a text vector representation method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a text vector representation apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, the feature space of word vectors learned by traditional neural-network pre-training models is limited, which restricts their expressive capability, while self-attention-based word vector models cannot effectively introduce prior knowledge and are easily affected by language noise. Based on this, in the text vector representation method and apparatus and the electronic device provided by the embodiments of the present invention, a Knowledge Graph is introduced so that prior knowledge related to the current text is injected at the vector representation stage (i.e., the word embedding stage) of the text. Through this knowledge fusion, the vector representation of the text carries more semantic information and common sense information, the feature space of the text vector is expanded, and the susceptibility of self-attention-based word vector models to language noise is alleviated.
Word vectors (word embedding technology) are the basic stage of NLP (Natural Language Processing) algorithms: every deep learning model must first convert language text into word vectors. The method can therefore be applied to any NLP deep learning model, such as a text classification task or a sequence labeling task. To facilitate understanding of the embodiments, the text vector representation method disclosed by the embodiments of the present invention is first described in detail.
The embodiment of the invention provides a text vector representation method, which can be executed by an electronic device with data processing capability, wherein the electronic device can be a notebook computer, a desktop computer, a palm computer, a tablet computer, a mobile phone or the like. Referring to fig. 1, a schematic flow chart of a text vector representation method mainly includes the following steps S102 to S108:
step S102, a plurality of words in the text to be processed are obtained.
When the text to be processed is a Chinese text, word segmentation needs to be performed on it to obtain a plurality of words; for specific word segmentation methods, reference may be made to the related prior art, which is not repeated here. When the text to be processed is an English text or the like, each word can be used directly, so that a plurality of words in the text to be processed are obtained.
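As a toy illustration of this segmentation step (the patent does not prescribe any particular algorithm, and real systems would typically use a dedicated tokenizer such as jieba), the following sketch performs forward maximum matching against a small hypothetical vocabulary; all names and data here are illustrative assumptions:

```python
# Illustrative sketch only: a forward-maximum-matching segmenter over a tiny
# hypothetical vocabulary. The patent does not prescribe a segmentation
# algorithm; production systems usually rely on a dedicated tokenizer.
def segment(text, vocab, max_len=4):
    """Greedily match the longest vocabulary entry at each position;
    unmatched characters fall back to single-character words."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + j]
            if j == 1 or candidate in vocab:
                words.append(candidate)
                i += j
                break
    return words

vocab = {"文本", "向量", "表示", "方法"}
print(segment("文本向量表示方法", vocab))  # ['文本', '向量', '表示', '方法']
```

Forward maximum matching is only one of many dictionary-based strategies; statistical and neural tokenizers generally give better results on ambiguous text.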
And step S104, acquiring an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the vocabulary.
An embedded vector representation matrix of the vocabulary is established in advance, and each word in the vocabulary has a corresponding vector representation. When text vector representation is performed, an embedded vector corresponding to each word is extracted from the embedded vector representation matrix.
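The lookup in step S104 can be sketched as follows, under the assumption that the embedded vector representation matrix is stored as a list of rows and the vocabulary maps each word to a row index; the matrix values, the out-of-vocabulary row and all names are illustrative, not taken from the patent:

```python
# A sketch of step S104 under assumed data structures: the embedded vector
# representation matrix is a list of rows and the vocabulary maps each word to
# its row index. The values and the out-of-vocabulary row are illustrative.
embedding_matrix = [
    [0.1, 0.2, 0.3],  # row for "knowledge"
    [0.4, 0.5, 0.6],  # row for "graph"
    [0.0, 0.0, 0.0],  # reserved row for out-of-vocabulary words
]
word_to_index = {"knowledge": 0, "graph": 1}
UNK_INDEX = 2  # hypothetical out-of-vocabulary index

def embed(word):
    """Extract the embedded vector corresponding to one word from the matrix."""
    return embedding_matrix[word_to_index.get(word, UNK_INDEX)]

print(embed("graph"))  # [0.4, 0.5, 0.6]
```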
Step S106, acquiring a priori knowledge vector corresponding to each word according to a pre-obtained knowledge graph and entity representation information established based on the knowledge graph; the entity representation information comprises vector representations corresponding to all entities in the knowledge graph and relationship factors corresponding to relationships among the entities.
A knowledge graph is obtained in advance, and a vector representation matrix is established for all entities in the knowledge graph, so that each entity has a corresponding vector representation. A factor, called a relationship factor, is set for each relationship between entities in the knowledge graph. In addition, a vector representation corresponding to a null entity and a relationship factor corresponding to a null relationship are established. The vector representations and relationship factors of all entities together constitute the entity representation information.
In some possible embodiments, step S106 may be implemented as follows: for each word, take the word as an entity and query, in the knowledge graph, all associated entities directly associated with the word; then determine the prior knowledge vector corresponding to the word according to the entity representation information and all the associated entities.
In a specific implementation, the associated entities corresponding to a word can be determined as follows: query whether a target entity corresponding to the word exists in the knowledge graph; if the target entity exists, determine each entity directly associated with it as an associated entity corresponding to the word; if the target entity does not exist, determine the null entity as the associated entity corresponding to the word, where the relationship between the null entity and the word is the null relationship, and the entity representation information further includes the vector representation corresponding to the null entity and the relationship factor corresponding to the null relationship.
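A minimal sketch of this associated-entity query, assuming the knowledge graph is held as a dictionary mapping each entity to its directly associated (entity, relationship) pairs; the graph contents and the null-entity markers are illustrative assumptions:

```python
# A sketch of the associated-entity query. The knowledge graph is assumed to be
# a dict mapping an entity to its directly associated (entity, relationship)
# pairs; the null entity and null relationship markers are illustrative.
NULL_ENTITY, NULL_RELATION = "<null_entity>", "<null_relation>"

knowledge_graph = {
    "apple": [("fruit", "is_a"), ("tree", "grows_on")],
}

def associated_entities(word):
    """Return the (entity, relationship) pairs directly associated with the
    word, or the null entity with the null relationship when the word has no
    corresponding target entity in the graph."""
    if word in knowledge_graph:
        return knowledge_graph[word]
    return [(NULL_ENTITY, NULL_RELATION)]

print(associated_entities("apple"))   # [('fruit', 'is_a'), ('tree', 'grows_on')]
print(associated_entities("quasar"))  # [('<null_entity>', '<null_relation>')]
```

The null-entity fallback guarantees that every word yields at least one (entity, relationship) pair, so the downstream vector construction never sees an empty result.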
The prior knowledge vector corresponding to a word may be determined as follows: according to the entity representation information, obtain the vector representation corresponding to each associated entity and the relationship factor corresponding to the relationship between that entity and the word; when there is one associated entity, determine the vector formed by its vector representation and relationship factor as the prior knowledge vector corresponding to the word; when there are at least two associated entities, perform a weighted summation of the vector representations of all the associated entities, weighted by their corresponding relationship factors, to obtain the prior knowledge vector corresponding to the word.
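The weighted summation for the multi-entity case can be sketched as follows, with made-up entity vectors and relationship factors; how the single-entity "vector formed by the vector representation and the relationship factor" is assembled is left open by the text, so only the weighted-sum branch is shown:

```python
# A sketch of the weighted summation for the multi-entity case. The entity
# vector representations and relationship factors are made-up toy values; how
# the single-entity vector is assembled is not fully specified by the text, so
# only the weighted-sum branch is shown.
entity_vectors = {"fruit": [1.0, 0.0], "tree": [0.0, 2.0]}
relation_factors = {"is_a": 0.7, "grows_on": 0.3}

def prior_knowledge_vector(pairs):
    """Sum the entity vectors, each weighted by its relationship factor."""
    dim = len(next(iter(entity_vectors.values())))
    result = [0.0] * dim
    for entity, relation in pairs:
        factor = relation_factors[relation]
        for k, component in enumerate(entity_vectors[entity]):
            result[k] += factor * component
    return result

pairs = [("fruit", "is_a"), ("tree", "grows_on")]
print(prior_knowledge_vector(pairs))  # 0.7*[1.0, 0.0] + 0.3*[0.0, 2.0]
```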
And S108, determining text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word.
In some possible embodiments, the embedded vector and the prior knowledge vector corresponding to each word may be concatenated to obtain a target word vector corresponding to each word, and the matrix formed by the target word vectors of all the words is determined as the text vector representation corresponding to the text to be processed.
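The concatenation and stacking of step S108 can be sketched as follows, with illustrative values:

```python
# A sketch of step S108 with illustrative values: each word's embedded vector
# is concatenated with its prior knowledge vector, and the per-word target
# vectors are stacked into the text's matrix representation.
def target_word_vector(embedding, prior):
    """Concatenate a word's embedded vector with its prior knowledge vector."""
    return embedding + prior  # plain list concatenation

embeddings = [[0.1, 0.2], [0.3, 0.4]]  # one embedded vector per word
priors = [[0.7, 0.6], [0.0, 0.0]]      # one prior knowledge vector per word
text_matrix = [target_word_vector(e, p) for e, p in zip(embeddings, priors)]
print(text_matrix)  # [[0.1, 0.2, 0.7, 0.6], [0.3, 0.4, 0.0, 0.0]]
```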
After the text vector representation corresponding to the text to be processed is obtained, a pre-training task represented by the text vector can be designed according to the requirements of subsequent text tasks, and all parameters in the optimization model are trained.
For ease of understanding, refer to the schematic diagram of processing a single word in the text vector representation method shown in Fig. 2. For a word word_i in the text, the processing is as follows: first, the embedding vector (word embedding) corresponding to word_i is fetched from the embedded vector representation matrix; then word_i is taken as an entity, and the entities directly associated with it in the Knowledge Graph are queried and extracted, yielding entity_1, entity_2, ..., entity_n; the vector representations (entity embeddings) of all the queried entities are extracted, together with the relationship factors α_1, α_2, ..., α_n corresponding to the relationships between these entities and word_i; a weighted summation of the vector representations of the queried entities, weighted by their relationship factors, yields the prior knowledge vector of word_i (entity embedding for word); finally, this prior knowledge vector is concatenated with the embedding vector (word embedding) to form the target word vector with prior knowledge information (word embedding with KG).
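Putting the pieces together, the single-word pipeline of Fig. 2 can be sketched end to end with toy data (all names, vectors and factors below are illustrative assumptions, not values from the patent):

```python
# End-to-end sketch of the single-word pipeline of Fig. 2 with toy data:
# embedding lookup, knowledge-graph query, relationship-factor-weighted
# summation, and concatenation into the final "word embedding with KG" vector.
# All names, vectors and factors are illustrative assumptions.
word_embeddings = {"apple": [0.1, 0.2]}
graph = {"apple": [("fruit", "is_a"), ("tree", "grows_on")]}
entity_vecs = {"fruit": [1.0, 0.0], "tree": [0.0, 2.0]}
factors = {"is_a": 0.7, "grows_on": 0.3}

def word_embedding_with_kg(word):
    emb = word_embeddings[word]        # word embedding
    prior = [0.0] * len(emb)           # entity embedding for word
    for entity, relation in graph.get(word, []):
        for k, component in enumerate(entity_vecs[entity]):
            prior[k] += factors[relation] * component
    return emb + prior                 # concatenation: word embedding with KG

print(word_embedding_with_kg("apple"))
```

The result has twice the embedding dimensionality: the first half carries the learned word embedding and the second half the knowledge-graph prior.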
According to the text vector representation method provided by the embodiment of the invention, when a text to be processed is given a vector representation, a plurality of words in the text are obtained first; an embedded vector corresponding to each word is then obtained according to a pre-established embedded vector representation matrix of the vocabulary; a prior knowledge vector corresponding to each word is obtained according to a pre-acquired knowledge graph and entity representation information established based on the knowledge graph, the entity representation information comprising the vector representations corresponding to all entities in the knowledge graph and the relationship factors corresponding to the relationships among the entities; and the text vector representation corresponding to the text is determined according to the embedded vector and the prior knowledge vector corresponding to each word. By introducing prior knowledge vectors obtained from the knowledge graph and the entity representation information, the semantic information and common sense information carried by the text vector representation are increased, so that the feature space of the text vector is expanded and the influence of language noise is reduced.
Corresponding to the above text vector representation method, an embodiment of the present invention further provides a text vector representation apparatus; referring to the schematic structural diagram shown in Fig. 3, the apparatus includes:
a first obtaining module 32, configured to obtain multiple words in a text to be processed;
a second obtaining module 34, configured to obtain an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the vocabulary;
a third obtaining module 36, configured to obtain, according to a pre-obtained knowledge graph and entity representation information established based on the knowledge graph, a priori knowledge vector corresponding to each word; the entity representation information comprises vector representations corresponding to all entities in the knowledge graph and relationship factors corresponding to relationships among the entities;
and the determining module 38 is configured to determine, according to the embedded vector and the prior knowledge vector corresponding to each word, a text vector representation corresponding to the text to be processed.
Optionally, the text to be processed is a Chinese text, and the first obtaining module 32 is specifically configured to: perform word segmentation on the text to be processed to obtain a plurality of words.
Optionally, the second obtaining module 34 is specifically configured to: from the embedding vector representation matrix, an embedding vector corresponding to each word is extracted.
Optionally, the third obtaining module 36 is specifically configured to: for each word, take the word as an entity and query, in the knowledge graph, all associated entities directly associated with the word; and determine the prior knowledge vector corresponding to the word according to the entity representation information and all the associated entities.
Further, the third obtaining module 36 is further configured to: query whether a target entity corresponding to the word exists in the knowledge graph; if the target entity exists, determine each entity directly associated with the target entity as an associated entity corresponding to the word; and if the target entity does not exist, determine the null entity as the associated entity corresponding to the word, where the relationship between the null entity and the word is the null relationship, and the entity representation information further includes the vector representation corresponding to the null entity and the relationship factor corresponding to the null relationship.
Further, the third obtaining module 36 is further configured to: obtain, according to the entity representation information, the vector representation corresponding to each associated entity and the relationship factor corresponding to the relationship between that entity and the word; when there is one associated entity, determine the vector formed by its vector representation and relationship factor as the prior knowledge vector corresponding to the word; and when there are at least two associated entities, perform a weighted summation of the vector representations of all the associated entities, weighted by their corresponding relationship factors, to obtain the prior knowledge vector corresponding to the word.
Further, the determining module 38 is specifically configured to: concatenate the embedded vector corresponding to each word with the prior knowledge vector to obtain a target word vector corresponding to each word; and determine the matrix formed by the target word vectors of all the words as the text vector representation corresponding to the text to be processed.
When the text vector representation apparatus provided by the embodiment of the invention performs text vector representation on a text to be processed, a plurality of words in the text are obtained first; an embedded vector corresponding to each word is then obtained according to a pre-established embedded vector representation matrix of the vocabulary; a prior knowledge vector corresponding to each word is obtained according to a pre-acquired knowledge graph and entity representation information established based on the knowledge graph, the entity representation information comprising the vector representations corresponding to all entities in the knowledge graph and the relationship factors corresponding to the relationships among the entities; and the text vector representation corresponding to the text is determined according to the embedded vector and the prior knowledge vector corresponding to each word. By introducing prior knowledge vectors obtained from the knowledge graph and the entity representation information, the semantic information and common sense information carried by the text vector representation are increased, so that the feature space of the text vector is expanded and the influence of language noise is reduced.
The device provided by the embodiment has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments.
Referring to fig. 4, an embodiment of the present invention further provides an electronic device 100, including: a processor 40, a memory 41, a bus 42 and a communication interface 43, wherein the processor 40, the communication interface 43 and the memory 41 are connected through the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.
The Memory 41 may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. The communication connection between the network elements of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), using the Internet, a wide area network, a local area network, a metropolitan area network, etc.
The bus 42 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory 41 is used for storing a program, the processor 40 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 40, or implemented by the processor 40.
The processor 40 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 40. The Processor 40 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory 41, and the processor 40 reads the information in the memory 41 and completes the steps of the method in combination with the hardware thereof.
Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it performs the text vector representation method described in the foregoing method embodiments. The computer-readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In all examples shown and described herein, any particular value should be construed as merely exemplary and not as a limitation; thus, other examples of the exemplary embodiments may have different values.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only one kind of logical division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical or electrical connection; or as a direct connection, an indirect connection through an intermediate medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Claims (10)

1. A method for text vector representation, comprising:
acquiring a plurality of words in a text to be processed;
acquiring an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the words;
acquiring a prior knowledge vector corresponding to each word according to a pre-obtained knowledge graph and entity representation information established based on the knowledge graph; wherein the entity representation information comprises a vector representation corresponding to each entity in the knowledge graph and a relation factor corresponding to each relation between entities;
and determining the text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word.
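As a rough illustrative sketch (not the patent's actual implementation), the four steps of claim 1 might be wired together as follows; the embedding table, entity vectors, and single relation factor are hypothetical toy values:

```python
# Toy stand-ins for the pre-established structures in claim 1 (all hypothetical):
embedding = {"knowledge": [0.1, 0.2], "graph": [0.3, 0.4]}   # embedded vector representation matrix
entity_vec = {"knowledge": [1.0, 0.0], "graph": [0.0, 1.0]}  # entity representation information
relation_factor = 0.5                                        # a single toy relation factor

def text_vector_representation(words):
    """Build the text vector representation: one row per word."""
    matrix = []
    for w in words:
        emb = embedding[w]                                    # step 2: embedded vector lookup
        prior = [relation_factor * x for x in entity_vec[w]]  # step 3: prior knowledge vector
        matrix.append(emb + prior)                            # step 4: combine per word
    return matrix

m = text_vector_representation(["knowledge", "graph"])
print(m)  # [[0.1, 0.2, 0.5, 0.0], [0.3, 0.4, 0.0, 0.5]]
```

Here each row concatenates a word's embedded vector with its prior knowledge vector, so the matrix has one row per word and width equal to the sum of the two dimensions.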
2. The method of claim 1, wherein the text to be processed is a Chinese text, and the step of acquiring a plurality of words in the text to be processed comprises:
and performing word segmentation processing on the text to be processed to obtain a plurality of words.
3. The text vector representation method of claim 1, wherein the step of obtaining the embedded vector corresponding to each word according to the pre-established vocabulary embedded vector representation matrix comprises:
and extracting the embedded vector corresponding to each word from the embedded vector representation matrix.
4. The method of claim 1, wherein the step of obtaining the prior knowledge vector corresponding to each word according to a pre-obtained knowledge-graph and entity representation information established based on the knowledge-graph comprises:
for each term, taking the term as an entity, and inquiring all associated entities directly associated with the term in the knowledge graph;
and determining a priori knowledge vector corresponding to the word according to the entity representation information and all the associated entities.
5. The method of claim 4, wherein the step of querying the knowledge-graph for all associated entities directly associated with the term as one entity comprises:
inquiring whether a target entity corresponding to the term exists in the knowledge graph;
when the target entity exists, determining each entity directly associated with the target entity as an associated entity corresponding to the word;
and when the target entity does not exist, determining an empty entity as the associated entity corresponding to the word, wherein the relation between the empty entity and the word is an empty relation, and the entity representation information further comprises a vector representation corresponding to the empty entity and a relation factor corresponding to the empty relation.
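A minimal sketch of this lookup, assuming (hypothetically) that the knowledge graph is stored as an adjacency dict; entity names and data are placeholders:

```python
# Hypothetical knowledge graph: entity -> entities directly associated with it.
knowledge_graph = {"apple": ["fruit", "company"], "fruit": ["apple"]}
EMPTY_ENTITY = "<empty>"  # stand-in for the empty entity with an empty relation

def associated_entities(word):
    """Return all entities directly associated with the word, per claim 5."""
    if word in knowledge_graph:    # a target entity corresponding to the word exists
        return knowledge_graph[word]
    return [EMPTY_ENTITY]          # no target entity: fall back to the empty entity

print(associated_entities("apple"))   # ['fruit', 'company']
print(associated_entities("banana"))  # ['<empty>']
```

The empty-entity fallback guarantees every word yields at least one associated entity, so the downstream prior-knowledge-vector step never receives an empty list.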
6. The text vector representation method of claim 4, wherein the step of determining the prior knowledge vector corresponding to the word according to the entity representation information and all the associated entities comprises:
acquiring, according to the entity representation information, the vector representation corresponding to each associated entity and the relation factor corresponding to the relation between each associated entity and the word;
when the number of associated entities is one, determining the vector formed by the vector representation corresponding to that associated entity and the corresponding relation factor as the prior knowledge vector corresponding to the word;
and when the number of associated entities is at least two, performing weighted summation on the vector representations corresponding to all the associated entities and their corresponding relation factors, to obtain the prior knowledge vector corresponding to the word.
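The two cases of claim 6 can be sketched as below; the entity vectors and relation factors are illustrative placeholders, and treating the relation factor as a scalar weight on the entity vector is one plausible reading of "vector formed by":

```python
# Vector representations of two associated entities and their relation factors
# (illustrative values only).
entity_vecs = [[1.0, 0.0], [0.0, 2.0]]
relation_factors = [0.75, 0.25]

def prior_knowledge_vector(vecs, factors):
    """One entity: scale its vector; several: weighted summation (claim 6)."""
    if len(vecs) == 1:
        return [factors[0] * x for x in vecs[0]]
    dim = len(vecs[0])
    return [sum(f * v[i] for f, v in zip(factors, vecs)) for i in range(dim)]

p = prior_knowledge_vector(entity_vecs, relation_factors)
print(p)  # [0.75, 0.5]
```

Weighting by the relation factor lets strongly related entities dominate the prior knowledge vector while weakly related ones contribute less.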
7. The method according to claim 1, wherein the step of determining the text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word comprises:
concatenating the embedded vector corresponding to each word with the corresponding prior knowledge vector to obtain a target word vector corresponding to each word;
and determining a matrix formed by target word vectors corresponding to the words as text vector representation corresponding to the text to be processed.
8. A text vector representation apparatus, comprising:
the first acquisition module is used for acquiring a plurality of words in the text to be processed;
the second acquisition module is used for acquiring an embedded vector corresponding to each word according to a pre-established embedded vector representation matrix of the vocabulary;
the third acquisition module is used for acquiring a prior knowledge vector corresponding to each word according to a pre-obtained knowledge graph and entity representation information established based on the knowledge graph; wherein the entity representation information comprises a vector representation corresponding to each entity in the knowledge graph and a relation factor corresponding to each relation between entities;
and the determining module is used for determining the text vector representation corresponding to the text to be processed according to the embedded vector and the prior knowledge vector corresponding to each word.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-7.
CN202111268351.7A 2021-10-29 2021-10-29 Text vector representation method and device and electronic equipment Pending CN114117062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111268351.7A CN114117062A (en) 2021-10-29 2021-10-29 Text vector representation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111268351.7A CN114117062A (en) 2021-10-29 2021-10-29 Text vector representation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114117062A true CN114117062A (en) 2022-03-01

Family

ID=80377442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111268351.7A Pending CN114117062A (en) 2021-10-29 2021-10-29 Text vector representation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114117062A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392192A (en) * 2022-10-27 2022-11-25 北京中科汇联科技股份有限公司 Text coding method and system for hybrid neural network and character information
CN115392192B (en) * 2022-10-27 2023-01-17 北京中科汇联科技股份有限公司 Text coding method and system for hybrid neural network and character information

Similar Documents

Publication Publication Date Title
US10282420B2 (en) Evaluation element recognition method, evaluation element recognition apparatus, and evaluation element recognition system
CN115203380A (en) Text processing system and method based on multi-mode data fusion
CN106997342B (en) Intention identification method and device based on multi-round interaction
CN111242291A (en) Neural network backdoor attack detection method and device and electronic equipment
CN111274797A (en) Intention recognition method, device and equipment for terminal and storage medium
CN110532107B (en) Interface calling method, device, computer equipment and storage medium
CN116884391B (en) Multimode fusion audio generation method and device based on diffusion model
CN110287951B (en) Character recognition method and device
CN115204183A (en) Knowledge enhancement based dual-channel emotion analysis method, device and equipment
CN112417878B (en) Entity relation extraction method, system, electronic equipment and storage medium
CN111651674A (en) Bidirectional searching method and device and electronic equipment
CN114117062A (en) Text vector representation method and device and electronic equipment
CN109829040B (en) Intelligent conversation method and device
CN110019952B (en) Video description method, system and device
CN107071553B (en) Method, device and computer readable storage medium for modifying video and voice
CN111949766A (en) Text similarity recognition method, system, equipment and storage medium
CN116994267A (en) Nameplate VIN code identification method, device, storage medium and equipment
CN116468038A (en) Information extraction method, method and device for training information extraction model
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN116303937A (en) Reply method, reply device, electronic equipment and readable storage medium
CN115906797A (en) Text entity alignment method, device, equipment and medium
CN113591862A (en) Text recognition method and device
CN116266394A (en) Multi-modal emotion recognition method, device and storage medium
CN113836297A (en) Training method and device for text emotion analysis model
CN113780239A (en) Iris recognition method, iris recognition device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination