CN112307754A - Statement acquisition method and device - Google Patents
- Publication number
- CN112307754A (application CN202010287542.7A)
- Authority
- CN
- China
- Prior art keywords
- word
- metaphor
- modifier
- vector
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a statement acquisition method and device. The method includes: obtaining a body word; determining, from a database according to the body word, at least one metaphor sentence corresponding to the body word, wherein the database includes a plurality of body words and at least one metaphor sentence corresponding to each body word, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, the at least one triple is determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triple, the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the vehicle word and the modifier, and the difference between the first vector cosine distance and the second vector cosine distance, and the vector cosine distance is calculated from the embedding vectors of the two words; and sending the at least one metaphor sentence to the terminal device. The number of metaphor sentences is thereby greatly increased, and the diversity of metaphor sentences is improved.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a statement acquisition method and apparatus.
Background
With the continuous expansion of computer application fields, natural language processing has attracted wide attention. A metaphor is a common rhetorical device in which a vehicle that shares similarities with the body (the tenor) is used to describe or explain it. Using metaphors in writing and conversation reflects a higher level of language skill, and handling metaphors is one of the difficulties of natural language processing. With the development of intelligent technology in recent years, chat robots and authoring robots are evolving from "accuracy" toward "openness" and being "human-like". Using metaphors in conversations or texts can greatly increase users' enjoyment and encourage them to keep talking or reading.
A conventional chat robot collects metaphor sentences from a corpus such as chat logs or comment logs, identifies the body words and vehicle words in the collected sentences, and stores them in a database, so that when a body word input by a user is received, the metaphor sentences corresponding to that body word can be used directly.
However, because the corpus of existing metaphor sentences collected in this way is limited, the number of metaphor sentences stored in the database is small and the same sentences may be used repeatedly, resulting in a poor user experience.
Disclosure of Invention
The application provides a sentence acquisition method and a sentence acquisition device, which can obtain usable metaphor sentences for all common Chinese words and greatly increase the number of metaphor sentences.
In a first aspect, the present application provides a statement acquisition method, including:
obtaining a body word;
determining at least one metaphor sentence corresponding to the body word from a database according to the body word, wherein the database includes a plurality of body words and at least one metaphor sentence corresponding to each body word, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, the at least one triple is determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triple, the triple includes a body word, a vehicle word and a modifier, the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the vehicle word and the modifier, and the difference between the first vector cosine distance and the second vector cosine distance, and the vector cosine distance is calculated according to the embedding vectors of the two words;
and sending the at least one metaphor sentence to the terminal equipment.
Optionally, the method further includes:
performing word segmentation processing on the received corpus to obtain a word sample set, and determining a word embedded vector index corresponding to the word sample set according to a preset training model, wherein the word embedded vector index is used for storing an embedded vector of each word;
determining a vehicle word embedding vector index according to the word embedding vector index and the vehicle word set, and determining a modifier embedding vector index according to the word embedding vector index and the modifier set;
for each body word in the body word set, determining M triples according to the vehicle word embedding vector index and the modifier embedding vector index, wherein M is a preset positive integer;
and generating at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template.
Optionally, the generating at least one metaphor sentence according to the correlation distance of each of the M triples and a preset metaphor sentence template includes:
respectively calculating the correlation distance of each triple in the M triples;
determining P triples with the minimum correlation distance from the M triples, wherein P is a preset positive integer;
and generating the at least one metaphorical sentence according to the P triples and a preset metaphorical sentence template.
Optionally, determining M triples according to the vehicle word embedding vector index and the modifier embedding vector index includes:
determining, through the modifier embedding vector index, N modifiers with the minimum vector cosine distance to the body word, wherein N is a preset positive integer;
for each modifier in the N modifiers, determining, through the vehicle word embedding vector index, Q vehicle words with the minimum vector cosine distance to the modifier, and determining, through the vehicle word embedding vector index, S vehicle words whose vector cosine distance to the modifier is smaller than or equal to a first distance, wherein the first distance is the vector cosine distance between the body word and the modifier;
and obtaining the M triples according to the Q vehicle words, the S vehicle words and the N modifiers, wherein M = N × (Q + S).
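A minimal Python sketch of this triple construction, for illustration only: brute-force sorting stands in for the modifier and vehicle-word embedding vector indexes, the vector cosine distance is assumed here to be 1 − cos θ (so smaller means more similar), and for brevity only the Q nearest vehicle words per modifier are enumerated (the S within-first-distance candidates would be added the same way). All names and toy vectors are assumptions, not from the application.

```python
import heapq
import math

def cos_dist(a, b):
    # Assumption: "vector cosine distance" taken as 1 - cos θ,
    # so that smaller values mean more similar words.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

def corr_dist(d1, d2, xi=1):
    # Correlation distance of a triple: first + second cosine distance
    # plus log(|difference| + ξ); xi=1 is an assumed value.
    return d1 + d2 + math.log(abs(d1 - d2) + xi)

def pick_triples(body, body_vec, modifiers, vehicles, n=2, q=2, p=3):
    """Enumerate candidate (body word, vehicle word, modifier) triples and
    keep the P with the smallest correlation distance."""
    # N modifiers nearest to the body word (stand-in for the modifier index)
    nearest_mods = sorted(modifiers, key=lambda m: cos_dist(body_vec, modifiers[m]))[:n]
    candidates = []
    for mod in nearest_mods:
        d1 = cos_dist(body_vec, modifiers[mod])              # first distance
        # Q vehicle words nearest to this modifier (stand-in for the vehicle index)
        nearest_veh = sorted(vehicles, key=lambda v: cos_dist(vehicles[v], modifiers[mod]))[:q]
        for veh in nearest_veh:
            d2 = cos_dist(vehicles[veh], modifiers[mod])     # second distance
            candidates.append((corr_dist(d1, d2), (body, veh, mod)))
    # P triples with the minimum correlation distance
    return [t for _, t in heapq.nsmallest(p, candidates)]
```

With toy 2-dimensional embeddings, a body word close to the modifier "sweet" pairs with the vehicle word nearest to "sweet", mirroring the claimed selection by minimum correlation distance.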
Optionally, the correlation distance is calculated by the following formula:
Dist(α,β,γ) = dist(α,γ) + dist(β,γ) + log(|dist(α,γ) − dist(β,γ)| + ξ)
wherein Dist(α,β,γ) is the correlation distance, dist(α,γ) is the first vector cosine distance, dist(β,γ) is the second vector cosine distance, |dist(α,γ) − dist(β,γ)| is the difference between the first vector cosine distance and the second vector cosine distance, and ξ is a preset integer that keeps the logarithm defined when the two distances are equal.
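As a sketch, the formula can be computed as follows; the concrete value of ξ is not specified in the application, so xi=1 below is an assumption.

```python
import math

def correlation_distance(first_dist, second_dist, xi=1):
    """Dist(α,β,γ) = dist(α,γ) + dist(β,γ) + log(|dist(α,γ) - dist(β,γ)| + ξ).

    first_dist:  vector cosine distance between the body word and the modifier
    second_dist: vector cosine distance between the vehicle word and the modifier
    xi:          preset integer; 1 is an assumed value that keeps the log defined
    """
    return first_dist + second_dist + math.log(abs(first_dist - second_dist) + xi)
```

A smaller correlation distance indicates a triple more likely to form a good metaphor sentence.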
Optionally, the determining a vehicle word embedding vector index from the word embedding vector index and the vehicle word set includes:
finding the word embedding vectors of all vehicle words in the vehicle word set from the word embedding vector index, and obtaining the vehicle word embedding vector index according to the word embedding vectors of all vehicle words;
determining a modifier-embedded vector index from the word-embedded vector index and the set of modifiers, comprising:
and finding out word embedding vectors of all modifiers in the modifier set from the word embedding vector index, and obtaining the modifier embedding vector index according to the word embedding vectors of all modifiers.
Optionally, the method further includes:
obtaining a body word corpus, a vehicle word corpus and a modifier word corpus;
extracting the body word set, the vehicle word set and the modifier set from the body word corpus, the vehicle word corpus and the modifier word corpus, respectively.
In a second aspect, the present application provides a sentence acquisition apparatus, including:
the acquisition module is used for acquiring the body words;
a processing module, configured to determine at least one metaphor sentence corresponding to the body word from a database according to the body word, wherein the database includes a plurality of body words and at least one metaphor sentence corresponding to each body word, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, the at least one triple is determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triple, the triple includes a body word, a vehicle word and a modifier, the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the vehicle word and the modifier, and the difference between the first vector cosine distance and the second vector cosine distance, and the vector cosine distance is calculated according to the embedding vectors of the two words;
and the sending module is used for sending the at least one metaphor sentence to the terminal equipment.
Optionally, the apparatus further comprises:
the word segmentation module is used for carrying out word segmentation on the received corpus to obtain a word sample set, and determining a word embedded vector index corresponding to the word sample set according to a preset training model, wherein the word embedded vector index is used for storing an embedded vector of each word;
a first determining module, configured to determine the vehicle word embedding vector index according to the word embedding vector index and the vehicle word set, and determine the modifier embedding vector index according to the word embedding vector index and the modifier set;
a second determining module, configured to determine, for each body word in the body word set, M triples according to the vehicle word embedding vector index and the modifier embedding vector index, where M is a preset positive integer;
and the generating module is used for generating at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template.
Optionally, the generating module is configured to:
respectively calculating the correlation distance of each triple in the M triples;
determining P triples with the minimum correlation distance from the M triples, wherein P is a preset positive integer;
and generating the at least one metaphorical sentence according to the P triples and a preset metaphorical sentence template.
Optionally, the second determining module is configured to:
determining, through the modifier embedding vector index, N modifiers with the minimum vector cosine distance to the body word, wherein N is a preset positive integer;
for each modifier in the N modifiers, determining, through the vehicle word embedding vector index, Q vehicle words with the minimum vector cosine distance to the modifier, and determining, through the vehicle word embedding vector index, S vehicle words whose vector cosine distance to the modifier is smaller than or equal to a first distance, wherein the first distance is the vector cosine distance between the body word and the modifier;
and obtaining the M triples according to the Q vehicle words, the S vehicle words and the N modifiers, wherein M = N × (Q + S).
Optionally, the correlation distance is calculated by the following formula:
Dist(α,β,γ) = dist(α,γ) + dist(β,γ) + log(|dist(α,γ) − dist(β,γ)| + ξ)
wherein Dist(α,β,γ) is the correlation distance, dist(α,γ) is the first vector cosine distance, dist(β,γ) is the second vector cosine distance, |dist(α,γ) − dist(β,γ)| is the difference between the first vector cosine distance and the second vector cosine distance, and ξ is a preset integer that keeps the logarithm defined when the two distances are equal.
Optionally, the first determining module is configured to:
finding the word embedding vectors of all vehicle words in the vehicle word set from the word embedding vector index, and obtaining the vehicle word embedding vector index according to the word embedding vectors of all vehicle words;
and finding out word embedding vectors of all modifiers in the modifier set from the word embedding vector index, and obtaining the modifier embedding vector index according to the word embedding vectors of all modifiers.
Optionally, the obtaining module is further configured to:
obtaining a body word corpus, a vehicle word corpus and a modifier word corpus;
the processing module is further configured to: extract the body word set, the vehicle word set and the modifier set from the body word corpus, the vehicle word corpus and the modifier word corpus, respectively.
According to the sentence acquisition method and device provided by the application, after a body word is obtained, at least one metaphor sentence corresponding to the body word is determined from the database according to the body word, and the at least one metaphor sentence is then sent to the terminal device. Because the body words in the database cover all commonly used Chinese words, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, and the at least one triple is determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triple, the database stores usable metaphor sentences for all commonly used Chinese words. The number of metaphor sentences is therefore greatly increased, repeated use is avoided, the diversity of metaphor sentences is improved, the range of application scenarios is expanded, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario of the present application;
FIG. 2 is a flowchart of an embodiment of a sentence acquisition method provided by the present application;
FIG. 3 is a flowchart of an embodiment of a sentence acquisition method provided by the present application;
FIG. 4 is a schematic diagram of a process for training a Word2vec model after corpus is received;
FIG. 5 is a schematic diagram of a word-embedded vector index;
FIG. 6 is a schematic diagram of a process for deriving a body word embedding vector index and a modifier embedding vector index from word embedding vector indices;
FIG. 7 is a schematic diagram of the relevance of ontology words, metaphorics, and modifiers;
fig. 8 is a schematic process diagram for determining M triples corresponding to each local word in the local word set provided by the present application;
fig. 9 is a schematic structural diagram of a sentence acquisition apparatus provided in the present application;
fig. 10 is a schematic structural diagram of a sentence acquisition apparatus provided in the present application;
fig. 11 is a schematic diagram of a hardware structure of an electronic device provided in the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First, some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
1. Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words: the network takes a word as input and guesses the words in adjacent positions, and under the bag-of-words assumption in word2vec the order of those words is unimportant. After training is complete, the word2vec model can map each word to a vector that represents word-to-word relationships; this vector is the hidden layer of the neural network.
2. TensorFlow is a symbolic mathematical system based on dataflow programming, and is widely applied to programming realization of various machine learning (machine learning) algorithms.
3. The word embedding vector index stores the embedding vector of each word. A Word2vec model is trained with TensorFlow on the word sample set; after training completes, the hidden-layer parameter W of the neural network contains the embedding (Embedding) vectors of all words in the word sample set. The word embedding vector index in this application can be implemented with any vector storage engine, such as Faiss or Milvus. Given the embedding vector v of an arbitrary word, the word embedding vector index can find the several words closest to v, together with their embedding vectors, within milliseconds.
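A toy in-memory version of such an index, for illustration only: real engines like Faiss or Milvus use approximate nearest-neighbour structures to reach millisecond latency at scale, and "closest" is assumed here to mean smallest 1 − cos θ.

```python
import math

class TinyVectorIndex:
    """Minimal stand-in for a word embedding vector index."""

    def __init__(self):
        self._vecs = {}

    def add(self, word, vec):
        # Store the embedding vector of one word.
        self._vecs[word] = vec

    def search(self, v, k):
        """Return the k words whose embeddings are closest to v,
        by 1 - cos θ (assumed notion of closeness)."""
        def dist(word):
            u = self._vecs[word]
            dot = sum(a * b for a, b in zip(u, v))
            norm = (math.sqrt(sum(a * a for a in u))
                    * math.sqrt(sum(b * b for b in v)))
            return 1.0 - dot / norm
        return sorted(self._vecs, key=dist)[:k]
```

A real deployment would replace the linear scan in `search` with the vector engine's indexed query.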
4. The vector cosine distance is the distance between two words computed from their embedding vectors, which are stored in the word embedding vector index. It is calculated as cos θ = (A · B) / (|A| |B|), wherein A and B are the embedding vectors of the two words and cos θ is the vector cosine distance, i.e., the dot product of A and B divided by the product of the lengths of A and B.
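The formula cos θ = (A · B) / (|A| |B|) can be sketched directly:

```python
import math

def cosine_similarity(a, b):
    """cos θ = (A · B) / (|A| |B|) for two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Orthogonal vectors give 0 and parallel vectors give 1, regardless of their lengths.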
5. A triplet refers to a combination that includes a body word, a metaphor word, and a modifier word.
In the conventional sentence acquisition method, the metaphor sentences in the database are obtained from collected existing metaphor sentences: the body words and vehicle words of the metaphor sentences collected in the corpus are identified, and the body words and the corresponding metaphor sentences are stored in the database. Because existing metaphor sentences are limited, the number stored in the database is small, the same sentences are used repeatedly during human-computer interaction, and the user experience is poor. To solve this problem, the present application provides a sentence acquisition method and apparatus. In the present application, the body words in the database cover all commonly used Chinese words, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, and the at least one triple is determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triple. The database therefore stores usable metaphor sentences for all commonly used Chinese words, the number of metaphor sentences is greatly increased, repeated use is avoided, the diversity of metaphor sentences is improved, the range of application scenarios is expanded, and the user experience is improved. The specific implementation of the statement acquisition method according to the embodiments of the present application is described in detail below with reference to the accompanying drawings.
The sentence acquisition method and device can be applied to scenarios such as machine-generated text and chat robots (e.g., customer service robots). Taking a customer service robot as an example, fig. 1 is a schematic view of an application scenario of the present application. As shown in fig. 1, the scenario involves a terminal device and a server, where the terminal device is an electronic device such as a mobile phone or a personal computer, and the interface shown in fig. 1 is a chat interface between a user and the customer service robot. For example, the user inputs the sentence "what is a crescent moon" on the chat interface; the server obtains the body word "crescent moon" from the sentence and queries the database for that body word. If it exists, at least one metaphor sentence corresponding to the body word is sent to the terminal device, for example: "The crescent moon is just like a hook/a sickle." If it does not exist, a message indicating that the body word is invalid is sent to the terminal device. The terminal device receives and displays the content, so that the user can see it.
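The lookup-and-respond flow above can be sketched as follows; the database contents and fallback message are illustrative assumptions, not text from the application.

```python
def get_metaphor_sentences(body_word, database,
                           fallback="No metaphor sentence is available for this word."):
    """Return the metaphor sentences stored for a body word, or a fallback
    message when the body word is absent from the database."""
    sentences = database.get(body_word)
    if sentences:
        return sentences
    return [fallback]
```

The server would send the returned list to the terminal device for display.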
Fig. 2 is a flowchart of an embodiment of a statement obtaining method provided in the present application, where an execution subject of the embodiment may be the server shown in fig. 1, and as shown in fig. 2, the method of the embodiment may include:
and S101, acquiring the body words.
Specifically, a user inputs a sentence or directly inputs a body word through the terminal device; the body word is a noun, and the sentence or word input by the user is analyzed to obtain the body word. For example, "life" input by the user is a body word, and likewise "moon" input by the user is a body word.
S102, determining at least one metaphor sentence corresponding to the body word from a database according to the body word, wherein the database includes a plurality of body words and at least one metaphor sentence corresponding to each body word, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, the at least one triple is determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triple, the triple includes a body word, a vehicle word and a modifier, and the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the vehicle word and the modifier, and the difference between the first vector cosine distance and the second vector cosine distance.
Specifically, the database includes a plurality of body words, each corresponding to at least one metaphor sentence. The server searches the database for the body word obtained in S101. If it exists, the server sends at least one corresponding metaphor sentence to the terminal device, for example: "The crescent moon is just like a sickle / The crescent moon is like a fishhook." If it does not exist, a message indicating that the body word is invalid is sent to the terminal device, and the terminal device displays the received content, thereby realizing interaction with the user. In this embodiment, the at least one metaphor sentence corresponding to each body word may be generated by the server or a metaphor sentence generating apparatus according to a plurality of triples and a preset metaphor sentence template, after the triples are determined according to the body word set, the vehicle word set, the modifier set and the correlation distance of the triples. The number of triples can be preset according to the computing capability and actual requirements of the server. Each triple includes a body word, a vehicle word and a modifier; its correlation distance is determined according to the first vector cosine distance between the body word and the modifier, the second vector cosine distance between the vehicle word and the modifier, and the difference between the two. The correlation distance of the triple represents the likelihood that the triple can serve as a metaphor sentence: the smaller the correlation distance, the greater the likelihood.
In addition, at least one metaphor sentence corresponding to each body word may be generated by the server, or may be generated by another metaphor sentence generating means, and if the metaphor sentence is generated by the server, the server stores a plurality of body words and at least one metaphor sentence corresponding to each body word in the database after the generation; if the metaphorical sentence generating means generates the metaphorical sentence, the server stores the plurality of body words generated by the metaphorical sentence generating means and at least one metaphorical sentence corresponding to each body word in the database.
The body word set, the vehicle word set and the modifier set are three different word sets: body words and vehicle words are nouns, while modifiers are verbs and adjectives. In one implementable manner, the three sets can be stored in advance. The body word of a metaphor is generally an abstract thing that is harder to grasp, such as "life" or "love", while the vehicle word is generally a common, concrete thing, such as "honey" or "harbor". Therefore, in this embodiment, the body words in the body word set can be words from modern poetry collections, modern prose collections and literary magazines; the vehicle words in the vehicle word set can be words from chat logs or comment logs; and the modifiers in the modifier set can be words from adjective, verb or noun dictionaries. In another implementable manner, the body word set, the vehicle word set and the modifier set may be obtained online, in which case the method of this embodiment may further include, before S101: obtaining a body word corpus, a vehicle word corpus and a modifier word corpus, and extracting the body word set, the vehicle word set and the modifier set from them, respectively. For example, the body word corpus can be modern poetry collections, modern prose collections and literary magazines, and all corpora in it are segmented by a word segmentation system to obtain the body word set; the vehicle word corpus can be chat logs or comment logs, and all corpora in it are segmented by the word segmentation system to obtain the vehicle word set; the modifier word corpus can be an adjective dictionary, a verb dictionary or a noun dictionary, and the dictionary words can be used directly as modifiers to obtain the modifier set.
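A hypothetical sketch of extracting a word set from a corpus: a real Chinese pipeline would use a proper word segmentation system, for which whitespace splitting stands in here, and the `keep` filter stands in for part-of-speech or dictionary-membership checks.

```python
def segment(sentence):
    # Stand-in for a word segmentation system (assumption: whitespace split).
    return sentence.split()

def extract_word_set(corpus, keep=None):
    """Collect the distinct words of a corpus of sentences; `keep` optionally
    filters candidates (e.g. by part of speech, not implemented here)."""
    words = set()
    for sentence in corpus:
        for w in segment(sentence):
            if keep is None or keep(w):
                words.add(w)
    return words
```

The same routine, run over the three corpora with appropriate filters, would yield the body word set, the vehicle word set and the modifier set.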
In this embodiment, the embedding vector of each word may be obtained by training a Word2vec model with TensorFlow on the word sample set; after training completes, the hidden-layer parameter W of the neural network contains the embedding (Embedding) vectors of all words in the word sample set. The word embedding vector index in this application can be implemented with any vector storage engine, such as Faiss or Milvus. After the embedding vector v of any word is input, the word embedding vector index can find the several words closest to v, together with their embedding vectors, within milliseconds.
S103, at least one metaphor sentence is sent to the terminal equipment.
Specifically, after receiving at least one metaphor sentence, the terminal device displays the metaphor sentence according to the received content, thereby realizing interaction with the user.
According to the sentence acquisition method provided by the embodiment, after the body words are acquired, at least one metaphor corresponding to the body words is determined from the database according to the body words, and then the at least one metaphor is sent to the terminal device.
In the embodiment shown in fig. 1, the plurality of body words and the at least one metaphor sentence corresponding to each body word stored in the database may be generated in advance and then stored, may be updated later according to the generation method below, or may be generated online, with the generated body words and corresponding metaphor sentences stored directly in the database. Before S101, the method may also include a process of generating the at least one metaphor sentence corresponding to each body word; this process is described in detail below with reference to fig. 3.
Fig. 3 is a flowchart of an embodiment of a statement obtaining method provided in the present application, where an execution subject of the embodiment may be the server shown in fig. 1, and as shown in fig. 3, the method of the embodiment may include:
S201, performing word segmentation processing on the received corpus to obtain a word sample set, and determining a word embedding vector index corresponding to the word sample set according to a preset training model, wherein the word embedding vector index is used for storing the embedding vector of each word.
The received corpus can be commodity comments, chat logs or user-authored documents. The received corpus comprises a plurality of sentences; each sentence can be segmented by a word segmentation system to obtain a word sample set, which is the collection of words obtained after segmentation. For example, the sentence "today's weather is clear" is segmented into "today", "weather" and "clear", and the words in the word sample set are stored in the natural order of the sentences. Then, a word embedding vector index corresponding to the word sample set is determined according to a preset training model. The preset training model may be a Word2vec model; in that case the Word2vec model is trained with TensorFlow on the word sample set. Fig. 4 is a schematic diagram of the process of training the Word2vec model after the corpus is received. As shown in fig. 4, after training over multiple iterations is completed, the input layer, hidden layer and output layer of the neural network are obtained, and the hidden-layer parameter W (of size D × 300, where D is the total number of words in the word sample set) is the matrix of embedding vectors of all words in the word sample set. In this embodiment, for example, the width of the hidden layer may be set to 300 (which is also the length of the trained word embedding vectors), and the width of the context window may be set to 8. D, the total number of words, differs across sample sets and is generally several tens of thousands to several hundreds of thousands. After the embedding vectors of all words in the word sample set are obtained, they are stored in a vector storage engine, yielding the word embedding vector index. Fig. 5 is a schematic diagram of a word embedding vector index. As shown in fig. 5, the word embedding vector index stores each word and its embedding vector; for example, the embedding vector corresponding to the word "life" is "0.23576 0.81324 0.33255 -0.27385 ……".
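The relationship between the hidden-layer parameter W and the word embeddings can be illustrated with a toy sketch. Here the random W merely stands in for trained parameters, and the vocabulary and sizes are illustrative:

```python
import numpy as np

# toy sizes: real vocabularies run to hundreds of thousands of words
vocab = {"today": 0, "weather": 1, "clear": 2, "life": 3, "honey": 4}
D, width = len(vocab), 300          # the D x 300 hidden-layer parameter W

rng = np.random.default_rng(0)
W = rng.normal(size=(D, width)).astype(np.float32)

def embed(word):
    """Multiplying a one-hot input vector by W selects one row of W:
    that row is the word's embedding vector."""
    one_hot = np.zeros(D, dtype=np.float32)
    one_hot[vocab[word]] = 1.0
    return one_hot @ W              # identical to W[vocab[word]]

assert np.allclose(embed("life"), W[vocab["life"]])
assert embed("life").shape == (300,)
```

This is why, once training finishes, W itself can be stored in the vector storage engine as the embedding table: no further forward passes are needed to look a word up.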
S202, determining a metaphor word embedding vector index according to the word embedding vector index and the metaphor word set, and determining a modifier embedding vector index according to the word embedding vector index and the modifier set.
Specifically, as a practicable manner, fig. 6 is a schematic diagram of the process of obtaining the metaphor word embedding vector index and the modifier embedding vector index from the word embedding vector index. After the word embedding vector index is obtained in S201, the metaphor word set may be intersected with the word embedding vector index to obtain the metaphor word embedding vector index, and the modifier set may be intersected with the word embedding vector index to obtain the modifier embedding vector index.
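The intersection step of S202 can be sketched as a simple dictionary filter (a sketch; all words and vectors are illustrative):

```python
# full table produced in S201: every word -> embedding vector
word_embeddings = {
    "life":  [0.1, 0.8],
    "honey": [0.2, 0.8],
    "sweet": [0.2, 0.7],
    "run":   [0.9, 0.1],
}

def subset_index(word_set, embeddings):
    """'Intersection' of S202: keep only the words that appear both in
    the given word set and in the word embedding vector index."""
    return {w: embeddings[w] for w in word_set if w in embeddings}

metaphor_index = subset_index({"honey", "ocean"}, word_embeddings)
modifier_index = subset_index({"sweet", "run"}, word_embeddings)
# "ocean" has no trained embedding, so it is dropped from the index
```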
S203, for each body word in the body word set, determining M triples according to the metaphor word embedding vector index and the modifier embedding vector index, where M is a preset positive integer.
Specifically, M triples are determined based on the metaphor word embedding vector index and the modifier embedding vector index, i.e., M triples that can constitute metaphors. In general, the body word and the metaphor word of a metaphor are not strongly related (e.g., "life" and "honey"), but they must share a commonality ("sweet"), which can be expressed as a modifier. Fig. 7 is a diagram illustrating the correlation between body words, metaphor words and modifiers. As shown in fig. 7, for the body word "life", the modifier embedding vector index may be searched to find the four modifiers whose vector cosine distance from "life" is smallest, such as "red", "sweet", "brave" and "distressed"; the number between every two words is their vector cosine distance (e.g., the vector cosine distance between "life" and "red" is 0.816). Then, for each determined modifier, the metaphor word embedding vector index may be searched for the metaphor words whose vector cosine distance from that modifier is smallest, such as the three metaphor words "honey", "fresh blood" and "hell". Multiple triples may be obtained by combining the body word with the metaphor words and modifiers.
As an implementation manner, S203 may specifically be:
S2031, determining, through the modifier embedding vector index, N modifiers with the minimum vector cosine distance from the body word, wherein N is a preset positive integer.
S2032, for each modifier in the N modifiers, determining, through the metaphor word embedding vector index, Q metaphor words with the minimum vector cosine distance from the modifier, and determining, through the metaphor word embedding vector index, S metaphor words whose vector cosine distance from the modifier is smaller than or equal to a first distance, wherein the first distance is the vector cosine distance between the body word and the modifier.
S2033, obtaining M triples according to the Q metaphor words, the S metaphor words and the N modifiers, wherein M is Q + N + S.
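Steps S2031 to S2033 can be sketched as follows. This is a minimal illustration: the brute-force sorts stand in for the vector-index queries, and all vectors in the test data are made up:

```python
import numpy as np

def cos_dist(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triples_for_body_word(body, body_vec, modifier_index, metaphor_index, n, q):
    """modifier_index / metaphor_index map word -> embedding vector."""
    # S2031: the N modifiers with the smallest cosine distance to the body word
    modifiers = sorted(modifier_index,
                       key=lambda m: cos_dist(body_vec, modifier_index[m]))[:n]
    triples = []
    for mod in modifiers:
        mod_vec = modifier_index[mod]
        first_distance = cos_dist(body_vec, mod_vec)
        ranked = sorted(metaphor_index,
                        key=lambda w: cos_dist(mod_vec, metaphor_index[w]))
        # S2032: Q metaphor words closest to the modifier, plus the S metaphor
        # words no farther from the modifier than the body word is
        q_words = ranked[:q]
        s_words = [w for w in ranked
                   if cos_dist(mod_vec, metaphor_index[w]) <= first_distance]
        # S2033: combine into (body word, metaphor word, modifier) triples
        for word in dict.fromkeys(q_words + s_words):
            triples.append((body, word, mod))
    return triples
```

The `dict.fromkeys` call deduplicates metaphor words that appear in both the Q-nearest and the S-threshold lists while preserving order.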
And S204, generating at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template.
Specifically, S204 may specifically include:
and S2041, respectively calculating the correlation distance of each triple in the M triples.
Wherein, taking the body word α, the metaphor word β and the modifier γ as an example, the correlation distance can be calculated by the following formula:
Dist_{α,β,γ} = dist_{α,γ} + dist_{β,γ} + log(|dist_{α,γ} - dist_{β,γ}| + ξ)
wherein Dist_{α,β,γ} is the correlation distance, dist_{α,γ} is the first vector cosine distance, dist_{β,γ} is the second vector cosine distance, |dist_{α,γ} - dist_{β,γ}| is the difference between the first vector cosine distance and the second vector cosine distance, and ξ is an integer, such as 1. The smaller Dist_{α,β,γ} is, the greater the likelihood that the triple (α, β, γ) can be used as a metaphor.
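The formula can be written as a small scoring function (a sketch; the distances and candidate triples below are illustrative):

```python
import math

def correlation_distance(dist_body_mod, dist_metaphor_mod, xi=1):
    """Dist(α,β,γ) = dist(α,γ) + dist(β,γ) + log(|dist(α,γ) - dist(β,γ)| + ξ).

    xi = 1 keeps the logarithm finite when the two distances coincide;
    a smaller result marks a more plausible metaphor triple."""
    diff = abs(dist_body_mod - dist_metaphor_mod)
    return dist_body_mod + dist_metaphor_mod + math.log(diff + xi)

# rank hypothetical candidate triples: smallest correlation distance first
scores = {
    ("life", "honey", "sweet"): correlation_distance(0.21, 0.18),
    ("life", "stone", "sweet"): correlation_distance(0.21, 0.95),
}
best = min(scores, key=scores.get)  # ("life", "honey", "sweet")
```

The log term rewards triples in which the body word and the metaphor word sit at a similar distance from their shared modifier, on top of both distances being small.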
S2042, determining P triples with the minimum correlation distance from the M triples, wherein P is a preset positive integer.
And S2043, generating at least one metaphor sentence according to the P triples and a preset metaphor sentence template.
Specifically, the preset metaphorical sentence templates may be one or more, and the following table one is an example of three metaphorical sentence templates and corresponding metaphorical sentences:
Specifically, for each of the P triples, the body word, metaphor word and modifier of the triple replace the corresponding wildcards in the template, generating a complete metaphor sentence. For example, substituting the triple ("life", "honey", "sweet") into the template "[body word] is just as [modifier] as [metaphor word]" generates the metaphor sentence "life is just as sweet as honey".
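Template substitution can be sketched as simple string formatting. The templates here are illustrative stand-ins, since "table one" is not reproduced in this text:

```python
# hypothetical metaphor sentence templates (illustrative stand-ins)
TEMPLATES = [
    "{body} is just as {modifier} as {metaphor}",
    "{body} is like {metaphor}, so {modifier}",
]

def fill(triple, template):
    """Replace the wildcards of a template with the body word,
    metaphor word and modifier of a triple."""
    body, metaphor, modifier = triple
    return template.format(body=body, metaphor=metaphor, modifier=modifier)

print(fill(("life", "honey", "sweet"), TEMPLATES[0]))
# -> life is just as sweet as honey
```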
The sentence acquisition method provided in this embodiment performs word segmentation on the received corpus to obtain a word sample set and determines a word embedding vector index corresponding to the word sample set according to a preset training model. It then determines a metaphor word embedding vector index and a modifier embedding vector index according to the word embedding vector index, the metaphor word set and the modifier set; next, for each body word in the body word set, it determines M triples according to the metaphor word embedding vector index and the modifier embedding vector index; finally, it generates at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template. In this way, usable metaphors can be generated for all commonly used words in Chinese according to the received corpus, the latent metaphorical relationships of any word can be discovered intelligently, and many metaphors that humans would not think of can be found. The method expands the figurative boundary of text, has strong generalization capability, and can generate metaphor sentences similar to human-written ones, making it suitable for scenarios such as machine-generated text and chat robots, and improving the user experience.
Fig. 8 is a schematic diagram of the process for determining the M triples corresponding to each body word in the body word set provided in the present application. In this embodiment, taking N = 100, Q = 10 and S = 10 as an example, correspondingly M = Q × N + S × N = 10 × 100 + 10 × 100 = 2000. As shown in fig. 8, the process of determining the M triples may include:
S301, judging whether the body word set has any unprocessed body words.
If yes, go to step S302, otherwise, terminate.
S302, one body word is taken from the body word set and is set as W.
And S303, determining, through the modifier embedding vector index, the 100 modifiers with the minimum vector cosine distance from the body word W.
S304, for each modifier in the 100 modifiers, determining, through the metaphor word embedding vector index, the 10 metaphor words with the minimum vector cosine distance from the modifier.
S305, determining, through the metaphor word embedding vector index, the 10 metaphor words whose vector cosine distance from the modifier is smaller than or equal to a first distance, wherein the first distance is the vector cosine distance between the body word W and the modifier.
S306, obtaining 2000 triples according to the body word W, the determined metaphor words and the 100 modifiers.
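The count in S306 follows from the figures above and can be checked with a line of arithmetic (variable names are ours; M here is the per-body-word triple count):

```python
N, Q, S = 100, 10, 10   # values used in the example of fig. 8

# each of the N modifiers contributes Q metaphor words from S304 and
# S metaphor words from S305; every (metaphor word, modifier) pair
# then forms one triple with the body word W
M = (Q + S) * N
print(M)  # 2000
```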
Fig. 9 is a schematic structural diagram of a sentence acquisition apparatus provided in the present application, and as shown in fig. 9, the apparatus of this embodiment may include: an acquisition module 11, a processing module 12 and a sending module 13, wherein,
the obtaining module 11 is used for obtaining the body word;
the processing module 12 is configured to determine, according to a body word, at least one metaphor sentence corresponding to the body word from a database, where the database includes a plurality of body words and at least one metaphor sentence corresponding to each body word; the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template; the at least one triple is determined according to the body word set, the metaphor word set, the modifier set and the correlation distance of the triple; the triple includes a body word, a metaphor word and a modifier; the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the metaphor word and the modifier, and the difference between the first vector cosine distance and the second vector cosine distance; and the vector cosine distance is calculated according to the embedding vectors of the two words;
the sending module 13 is configured to send at least one metaphor sentence to the terminal device.
The apparatus provided in the embodiment of the present application may implement the method embodiment shown in fig. 2, and for details of the implementation principle and technical effect, reference may be made to the method embodiment, which is not described herein again.
Fig. 10 is a schematic structural diagram of a sentence acquisition apparatus provided in the present application, and as shown in fig. 10, the apparatus of the present embodiment may further include, on the basis of the apparatus shown in fig. 9: a segmentation module 14, a first determination module 15, a second determination module 16 and a generation module 17, wherein,
the segmentation module 14 is configured to perform segmentation processing on the received corpus to obtain a word sample set, and determine, according to a preset training model, a word embedded vector index corresponding to the word sample set, where the word embedded vector index is used to store an embedded vector of each word;
the first determining module 15 is configured to determine the metaphor word embedding vector index according to the word embedding vector index and the metaphor word set, and determine the modifier embedding vector index according to the word embedding vector index and the modifier set;
the second determining module 16 is configured to determine, for each body word in the body word set, M triples according to the metaphor word embedding vector index and the modifier embedding vector index, where M is a preset positive integer;
the generating module 17 is configured to generate at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template.
Further, the generating module 17 is configured to:
respectively calculating the correlation distance of each triple in the M triples;
determining P triples with the minimum correlation distance from the M triples, wherein P is a preset positive integer;
and generating at least one metaphor sentence according to the P triples and a preset metaphor sentence template.
Further, the second determining module 16 is configured to:
determine, through the modifier embedding vector index, N modifiers with the minimum vector cosine distance from the body word, where N is a preset positive integer; for each modifier in the N modifiers, determine, through the metaphor word embedding vector index, Q metaphor words with the minimum vector cosine distance from the modifier, and determine, through the metaphor word embedding vector index, S metaphor words whose vector cosine distance from the modifier is smaller than or equal to a first distance, where the first distance is the vector cosine distance between the body word and the modifier; and obtain M triples according to the Q metaphor words, the S metaphor words and the N modifiers, where M is Q + N + S.
Optionally, the correlation distance is calculated by the following formula:
Dist_{α,β,γ} = dist_{α,γ} + dist_{β,γ} + log(|dist_{α,γ} - dist_{β,γ}| + ξ)
wherein Dist_{α,β,γ} is the correlation distance, dist_{α,γ} is the first vector cosine distance, dist_{β,γ} is the second vector cosine distance, |dist_{α,γ} - dist_{β,γ}| is the difference between the first vector cosine distance and the second vector cosine distance, and ξ is an integer.
Further, the first determining module 15 is configured to:
finding word embedding vectors of all metaphors in the metaphor word set from the word embedding vector index, and obtaining the metaphor word embedding vector index according to the word embedding vectors of all metaphors;
and finding out word embedding vectors of all modifiers in the modifier set from the word embedding vector indexes, and obtaining the modifier embedding vector indexes according to the word embedding vectors of all the modifiers.
Further, the obtaining module 11 is further configured to:
obtaining a body word corpus, a metaphor word corpus and a modifier word corpus;
the processing module is further configured to: extract the body word set, the metaphor word set and the modifier word set from the body word corpus, the metaphor word corpus and the modifier word corpus, respectively.
The apparatus provided in the embodiment of the present application may implement the method embodiment shown in fig. 3, and for details of the implementation principle and technical effect, reference may be made to the method embodiment, which is not described herein again.
Fig. 11 is a schematic diagram of a hardware structure of an electronic device provided in the present application. As shown in fig. 11, the electronic device 20 of the present embodiment may include: a memory 21 and a processor 22;
a memory 21 for storing a computer program;
a processor 22 for executing the computer program stored in the memory to implement the sentence acquisition method in the above-described embodiments. Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 21 may be separate or integrated with the processor 22.
When the memory 21 is a device separate from the processor 22, the electronic device 20 may further include:
a bus 23 for connecting the memory 21 and the processor 22.
Optionally, this embodiment further includes: a communication interface 24, which can be connected to the processor 22 via the bus 23. The processor 22 may control the communication interface 24 to implement the above-described receiving and transmitting functions of the electronic device 20.
The electronic device provided by this embodiment can be used to execute the above method, and its implementation manner and technical effect are similar, and this embodiment is not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The computer-readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (16)
1. A sentence acquisition method, comprising:
obtaining a body word;
determining at least one metaphor sentence corresponding to the body word from a database according to the body word, wherein the database comprises a plurality of body words and at least one metaphor sentence corresponding to each body word, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, the at least one triple is determined according to the body word set, the metaphor word set, the modifier set and the correlation distance of the triple, the triple comprises the body word, the metaphor word and the modifier, the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the metaphor word and the modifier and a difference value between the first vector cosine distance and the second vector cosine distance, and the vector cosine distance is calculated according to the embedded vectors of the two words;
and sending the at least one metaphor sentence to the terminal equipment.
2. The method of claim 1, further comprising:
performing word segmentation processing on the received corpus to obtain a word sample set, and determining a word embedded vector index corresponding to the word sample set according to a preset training model, wherein the word embedded vector index is used for storing an embedded vector of each word;
determining the metaphor embedded vector index according to the word embedded vector index and the metaphor set, and determining the modifier embedded vector index according to the word embedded vector index and the modifier set;
for each body word in the body word set, determining M triples according to the metaphor word embedded vector index and the modifier embedded vector index, wherein M is a preset positive integer;
and generating at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template.
3. The method of claim 2, wherein the generating at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template comprises:
respectively calculating the correlation distance of each triple in the M triples;
determining P triples with the minimum correlation distance from the M triples, wherein P is a preset positive integer;
and generating the at least one metaphorical sentence according to the P triples and a preset metaphorical sentence template.
4. The method of claim 2, wherein the determining M triples according to the metaphor word embedding vector index and the modifier embedding vector index comprises:
determining, through the modifier embedding vector index, N modifiers with the minimum vector cosine distance from the body word, wherein N is a preset positive integer;
for each modifier in the N modifiers, determining, through the metaphor word embedding vector index, Q metaphor words with the minimum vector cosine distance from the modifier, and determining, through the metaphor word embedding vector index, S metaphor words whose vector cosine distance from the modifier is smaller than or equal to a first distance, wherein the first distance is the vector cosine distance between the body word and the modifier;
and obtaining the M triples according to the Q metaphors, the S metaphors and the N modifiers, wherein M is Q + N + S.
5. The method according to any one of claims 1-4, wherein the correlation distance is calculated by the following formula:
Dist_{α,β,γ} = dist_{α,γ} + dist_{β,γ} + log(|dist_{α,γ} - dist_{β,γ}| + ξ)
wherein Dist_{α,β,γ} is the correlation distance, dist_{α,γ} is the first vector cosine distance, dist_{β,γ} is the second vector cosine distance, |dist_{α,γ} - dist_{β,γ}| is the difference between the first vector cosine distance and the second vector cosine distance, and ξ is an integer.
6. The method of any of claims 2-4, wherein said determining a metaphor word embedding vector index according to the word embedding vector index and the metaphor word set comprises:
finding word embedding vectors of all metaphors in the metaphor word set from the word embedding vector index, and obtaining the metaphor word embedding vector index according to the word embedding vectors of all metaphors;
determining a modifier-embedded vector index from the word-embedded vector index and the set of modifiers, comprising:
and finding out word embedding vectors of all modifiers in the modifier set from the word embedding vector index, and obtaining the modifier embedding vector index according to the word embedding vectors of all modifiers.
7. The method according to any one of claims 1-4, further comprising:
obtaining a body word corpus, a metaphor word corpus and a modifier word corpus;
extracting the body word set, the metaphor word set and the modifier word set from the body word corpus, the metaphor word corpus and the modifier word corpus, respectively.
8. A sentence acquisition apparatus, comprising:
the acquisition module is used for acquiring the body words;
a processing module, configured to determine at least one metaphor sentence corresponding to the body word from a database according to the body word, wherein the database comprises a plurality of body words and at least one metaphor sentence corresponding to each body word, the at least one metaphor sentence corresponding to each body word is generated according to at least one triple and a preset metaphor sentence template, the at least one triple is determined according to the body word set, the metaphor word set, the modifier set and the correlation distance of the triple, the triple comprises a body word, a metaphor word and a modifier, the correlation distance is determined according to a first vector cosine distance between the body word and the modifier, a second vector cosine distance between the metaphor word and the modifier and a difference value between the first vector cosine distance and the second vector cosine distance, and the vector cosine distance is calculated according to embedded vectors of two words;
and the sending module is used for sending the at least one metaphor sentence to the terminal equipment.
9. The apparatus of claim 8, further comprising:
the word segmentation module is used for carrying out word segmentation on the received corpus to obtain a word sample set, and determining a word embedded vector index corresponding to the word sample set according to a preset training model, wherein the word embedded vector index is used for storing an embedded vector of each word;
a first determining module, configured to determine the metaphor word embedded vector index according to the word embedded vector index and the metaphor word set, and determine the modifier embedded vector index according to the word embedded vector index and the modifier set;
a second determining module, configured to determine, for each body word in the body word set, M triples according to the metaphor word embedding vector index and the modifier embedding vector index, where M is a preset positive integer;
and the generating module is used for generating at least one metaphor sentence according to the correlation distance of each triple in the M triples and a preset metaphor sentence template.
10. The apparatus of claim 9, wherein the generating module is configured to:
respectively calculating the correlation distance of each triple in the M triples;
determining P triples with the minimum correlation distance from the M triples, wherein P is a preset positive integer;
and generating the at least one metaphorical sentence according to the P triples and a preset metaphorical sentence template.
11. The apparatus of claim 9, wherein the second determining module is configured to:
determine, through the modifier embedding vector index, N modifiers with the minimum vector cosine distance from the body word, wherein N is a preset positive integer;
for each modifier in the N modifiers, determine, through the metaphor word embedding vector index, Q metaphor words with the minimum vector cosine distance from the modifier, and determine, through the metaphor word embedding vector index, S metaphor words whose vector cosine distance from the modifier is smaller than or equal to a first distance, wherein the first distance is the vector cosine distance between the body word and the modifier;
and obtaining the M triples according to the Q metaphors, the S metaphors and the N modifiers, wherein M is Q + N + S.
12. The apparatus according to any one of claims 8-11, wherein the correlation distance is calculated by the following formula:
Dist_{α,β,γ} = dist_{α,γ} + dist_{β,γ} + log(|dist_{α,γ} - dist_{β,γ}| + ξ)
wherein Distα,β,γFor the correlation distance, distα,γIs the first vector cosine distance, distβ,γIs the second vector cosine distance, | distα,γ-distβ,γAnd | is the difference value of the first vector cosine distance and the second vector cosine distance, and ξ is an integer.
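The claim 12 formula translates directly into code. A minimal sketch, assuming a natural logarithm (the claim does not state the base) and ξ = 1, with made-up distance values:

```python
import math

def correlation_distance(dist_body_mod, dist_meta_mod, xi=1):
    """Correlation distance of a (body word, metaphor word, modifier) triple.

    dist_body_mod: cosine distance between the body word and the modifier.
    dist_meta_mod: cosine distance between the metaphor word and the modifier.
    xi: the integer from claim 12; it keeps the log argument positive even
        when the two distances are equal.
    """
    return dist_body_mod + dist_meta_mod + math.log(
        abs(dist_body_mod - dist_meta_mod) + xi)

# The log term penalises triples whose two distances are imbalanced: both
# examples below have the same distance sum (0.6), but the skewed one scores
# worse, so balanced triples rank first.
balanced = correlation_distance(0.3, 0.3)  # 0.6 + log(1)   = 0.6
skewed = correlation_distance(0.1, 0.5)    # 0.6 + log(1.4) ≈ 0.936
```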
13. The apparatus of any one of claims 9-11, wherein the first determining module is configured to:
find the word embedding vectors of all metaphor words in the metaphor word set from the word embedding vector index, and obtain the metaphor word embedding vector index according to those word embedding vectors; and
find the word embedding vectors of all modifiers in the modifier set from the word embedding vector index, and obtain the modifier embedding vector index according to those word embedding vectors.
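The sub-index construction of claim 13 amounts to filtering the full word-embedding index down to the words of each role. A toy sketch with invented words and vectors (a real deployment would persist these as separate vector-database collections):

```python
# Toy full word-embedding vector index (word -> vector); all values invented.
word_index = {
    "honey": [0.9, 0.2],
    "ice":   [-0.9, 0.3],
    "sweet": [1.0, 0.1],
    "run":   [0.0, 1.0],
}
metaphor_word_set = {"honey", "ice"}
modifier_set = {"sweet"}

# Look up each set's vectors in the full index and build the per-role
# embedding vector indexes from them (claim 13).
metaphor_word_index = {w: word_index[w]
                       for w in metaphor_word_set if w in word_index}
modifier_index = {w: word_index[w]
                  for w in modifier_set if w in word_index}
```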
14. The apparatus of any one of claims 8-11, wherein the obtaining module is further configured to:
obtain a metaphor word corpus, a body word corpus and a modifier corpus;
and the processing module is further configured to: extract the metaphor word set, the body word set and the modifier set from the metaphor word corpus, the body word corpus and the modifier corpus, respectively.
15. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the statement acquisition method of any one of claims 1-7.
16. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the statement acquisition method of any of claims 1-7 via execution of the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010287542.7A CN112307754B (en) | 2020-04-13 | 2020-04-13 | Statement acquisition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112307754A true CN112307754A (en) | 2021-02-02 |
CN112307754B CN112307754B (en) | 2024-09-20 |
Family
ID=74336757
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010287542.7A Active CN112307754B (en) | 2020-04-13 | 2020-04-13 | Statement acquisition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112307754B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111664A (en) * | 2021-04-30 | 2021-07-13 | 网易(杭州)网络有限公司 | Text generation method and device, storage medium and computer equipment |
CN113157932A (en) * | 2021-03-02 | 2021-07-23 | 首都师范大学 | Metaphor calculation and device based on knowledge graph representation learning |
CN114328970A (en) * | 2021-12-30 | 2022-04-12 | 达闼机器人有限公司 | Triple extraction method, equipment and computer storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622342A (en) * | 2011-01-28 | 2012-08-01 | 上海肇通信息技术有限公司 | Interlanguage system and interlanguage engine and interlanguage translation system and corresponding method |
CN106502981A (en) * | 2016-10-09 | 2017-03-15 | 广西师范大学 | Automatically analyzed and decision method based on the Figures of Speech sentence of part of speech, syntax and dictionary |
CN109299455A (en) * | 2017-12-20 | 2019-02-01 | 北京联合大学 | A kind of Computer Language Processing method of the extraordinary collocation of Chinese gerund |
KR102081512B1 (en) * | 2018-09-14 | 2020-02-25 | 울산대학교 산학협력단 | Apparatus and method for generating metaphor sentence |
- 2020-04-13 CN CN202010287542.7A patent/CN112307754B/en active Active
Non-Patent Citations (3)
Title |
---|
曾华琳; 周昌乐; 陈毅东; 史晓东: "Chinese metaphor computation based on automatic feature selection", Journal of Xiamen University (Natural Science), no. 03, 31 May 2016 (2016-05-31) * |
林鸿飞; 许侃; 任惠: "Explicit emotional metaphor recognition based on lexical categories and semantic similarity", Journal of Dalian University of Technology, no. 05, 15 September 2012 (2012-09-15) * |
汪梦翔; 饶琪; 顾澄; 王厚峰: "Metaphor knowledge representation and acquisition for Chinese nouns", Journal of Chinese Information Processing, no. 06, 15 November 2017 (2017-11-15) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113157932A (en) * | 2021-03-02 | 2021-07-23 | 首都师范大学 | Metaphor calculation and device based on knowledge graph representation learning |
CN113111664A (en) * | 2021-04-30 | 2021-07-13 | 网易(杭州)网络有限公司 | Text generation method and device, storage medium and computer equipment |
CN113111664B (en) * | 2021-04-30 | 2024-07-23 | 网易(杭州)网络有限公司 | Text generation method and device, storage medium and computer equipment |
CN114328970A (en) * | 2021-12-30 | 2022-04-12 | 达闼机器人有限公司 | Triple extraction method, equipment and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112307754B (en) | 2024-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110717017B (en) | Method for processing corpus | |
WO2022007823A1 (en) | Text data processing method and device | |
CN107992543B (en) | Question-answer interaction method and device, computer equipment and computer readable storage medium | |
CN109918676B (en) | Method and device for detecting intention regular expression and terminal equipment | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
KR101937778B1 (en) | System, method and recording medium for machine-learning based korean language conversation using artificial intelligence | |
CN110750959A (en) | Text information processing method, model training method and related device | |
JP6677419B2 (en) | Voice interaction method and apparatus | |
CN113268586A (en) | Text abstract generation method, device, equipment and storage medium | |
CN110347790B (en) | Text duplicate checking method, device and equipment based on attention mechanism and storage medium | |
CN112307754A (en) | Statement acquisition method and device | |
CN110895656B (en) | Text similarity calculation method and device, electronic equipment and storage medium | |
CN112287085B (en) | Semantic matching method, system, equipment and storage medium | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN112349294B (en) | Voice processing method and device, computer readable medium and electronic equipment | |
CN111832318A (en) | Single sentence natural language processing method and device, computer equipment and readable storage medium | |
CN113095072B (en) | Text processing method and device | |
CN115186080A (en) | Intelligent question-answering data processing method, system, computer equipment and medium | |
CN114241279A (en) | Image-text combined error correction method and device, storage medium and computer equipment | |
WO2024109597A1 (en) | Training method for text merging determination model, and text merging determination method | |
CN116913278B (en) | Voice processing method, device, equipment and storage medium | |
CN113705207A (en) | Grammar error recognition method and device | |
CN111401070B (en) | Word meaning similarity determining method and device, electronic equipment and storage medium | |
CN114238587A (en) | Reading understanding method and device, storage medium and computer equipment | |
CN114067362A (en) | Sign language recognition method, device, equipment and medium based on neural network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||