CN110019648A - Method, apparatus and storage medium for training data - Google Patents
Method, apparatus and storage medium for training data
- Publication number
- CN110019648A (application CN201711269292.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- candidate
- hash
- vector
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A method, apparatus and storage medium for training data. The method includes: obtaining a corpus set to be processed; extracting an entity set from the corpus set, and extracting a candidate hypernym set from the entity set; combining each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate-pair set, the candidate-pair set including a plurality of candidate pairs, a candidate pair referring to a combination of an entity and a hypernym having an association relationship; configuring each candidate pair and the sentences associated with it into a prediction datum, and generalizing the sentences associated with the candidate pair in the prediction data; performing word segmentation on the sentences associated with each candidate pair to obtain a word set; inputting each word in the word set into a generalization layer for conversion to obtain a vector set; and training and predicting on the vector set according to the prediction data with a long short-term memory (LSTM) artificial neural network. By using this scheme, the efficiency of training data generation can be improved.
Description
Technical field
This application relates to the field of big data processing, and in particular to a method, apparatus and storage medium for training data.
Background
In the field of recurrent neural networks, long short-term memory networks (LSTM) are generally used to process and predict important events with long intervals and long delays in a time series. Before prediction with an LSTM, hypernyms need to be mined from the corpus set, and the problem is converted into a classification problem: given a candidate entity-hypernym pair, predict whether the candidate is a real entity-hypernym pair. Typical prediction methods perform word segmentation, extract features, and then classify the candidate entity-hypernym pair with a traditional classifier. However, this approach demands substantial domain knowledge, the classification results may not generalize, and the range that can be predicted is small.
Current methods based on deep learning classify candidate entity-hypernym pairs by automatically extracting features from the corpus set and generating training data in batches, then predicting on the batched training data. This improves classification performance, but because deep networks are very complex, every additional named entity requires more training data, and generating large amounts of training data takes a long time, so efficiency is low.
Summary of the invention
This application provides a method, apparatus and storage medium for training data, which can solve the problem of low training-data efficiency in the prior art.
A first aspect of this application provides a method of training data, the method comprising:
obtaining a corpus set to be processed;
extracting an entity set from the corpus set, the entity set including a plurality of named entities;
extracting a candidate hypernym set from the entity set;
combining each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate-pair set, the candidate-pair set including a plurality of candidate pairs, a candidate pair referring to a combination of an entity and a hypernym having an association relationship;
configuring each candidate pair and the sentences associated with it into a prediction datum, and generalizing the sentences associated with the candidate pair in the prediction data;
performing word segmentation on the sentences associated with each candidate pair to obtain a word set;
inputting each word in the word set into a generalization layer for conversion to obtain a vector set; and
training and predicting on the vector set according to the prediction data with a long short-term memory (LSTM) artificial neural network.
A second aspect of this application provides an apparatus for training data, having the function of implementing the method of training data provided in the first aspect above. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function; the modules may be software and/or hardware.
In one possible design, the apparatus includes:
an obtaining module, for obtaining a corpus set to be processed;
a processing module, for extracting an entity set from the corpus set, the entity set including a plurality of named entities; extracting a candidate hypernym set from the entity set; combining each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate-pair set, the candidate-pair set including a plurality of candidate pairs, a candidate pair referring to a combination of an entity and a hypernym having an association relationship; configuring each candidate pair and the sentences associated with it into a prediction datum, and generalizing the sentences associated with the candidate pair in the prediction data; performing word segmentation on the sentences associated with each candidate pair to obtain a word set; inputting each word in the word set into a generalization layer for conversion to obtain a vector set; and training and predicting on the vector set according to the prediction data with a long short-term memory (LSTM) artificial neural network.
Another aspect of this application provides an apparatus for training data, comprising at least one connected processor, a memory and a transceiver, wherein the memory stores program code, and the processor calls the program code in the memory to execute the method described in the first aspect above.
Another aspect of this application provides a computer storage medium comprising instructions which, when run on a computer, cause the computer to execute the method described in the first aspect above.
Compared with the prior art, in the scheme provided by this application, after the entity set and the candidate hypernym set are extracted, each entity in the entity set is combined with each hypernym in the candidate hypernym set to obtain a candidate-pair set; each candidate pair and the sentences associated with it are configured into a prediction datum, and the sentences associated with the candidate pair in the prediction data are generalized; word segmentation is performed on the sentences associated with each candidate pair to obtain a word set; each word in the word set is input into a generalization layer for conversion to obtain a vector set. Processing through the generalization layer reduces the order of magnitude of the data, allowing fast convergence on a small amount of prediction data and reducing the number of parameters required for training and prediction, thereby improving the efficiency of training data generation.
Description of the drawings
Fig. 1 is a flow diagram of a method of training data in an embodiment of this application;
Fig. 2 is another flow diagram of a method of training data in an embodiment of this application;
Fig. 3 is a schematic diagram of the LSTM network structure in an embodiment of this application;
Fig. 4 is a schematic diagram of converting a word in the char layer of the LSTM in an embodiment of this application;
Fig. 5 is a schematic diagram of converting a word in the hash layer of the LSTM in an embodiment of this application;
Fig. 6 is a structural schematic diagram of an apparatus for training data in an embodiment of this application;
Fig. 7 is another structural schematic diagram of an apparatus for training data in an embodiment of this application;
Fig. 8 is a structural schematic diagram of a terminal device in an embodiment of this application;
Fig. 9 is a structural schematic diagram of a server in an embodiment of this application.
Detailed description of the embodiments
The terms "first", "second", etc. in the description, claims and drawings of this application are used to distinguish similar objects, not to describe a particular order or precedence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or modules is not necessarily limited to those steps or modules clearly listed, but may include other steps or modules not clearly listed or inherent to such a process, method, product or device. The division of modules in this application is only a logical division; other divisions are possible in practical implementations, for example multiple modules may be combined or integrated into another system, or some features may be ignored or not executed. The mutual couplings, direct couplings or communication connections shown or discussed may be through interfaces; indirect couplings or communication connections between modules may be electrical or of other similar forms, and are not limited in this application. Modules or sub-modules described as separate components may or may not be physically separated, may or may not be physical modules, or may be distributed across multiple circuit modules; some or all of them may be selected according to actual needs to achieve the purpose of the scheme of this application.
This application provides a method, apparatus and storage medium for training data, used with artificial neural networks. An artificial neural network is an algorithmic mathematical model that imitates the behavioral features of animal neural networks and performs distributed parallel information processing. It is an operational model: a non-linear, adaptive information processing system composed of a large number of interconnected nodes (also called neurons or processing units). Each node represents a specific output function, called an activation function. Each connection between two nodes carries a weighted value for the signal passing through that connection, called a weight, which is equivalent to the memory of the artificial neural network. The output of the network differs according to the connection pattern, the weights and the activation functions. The network itself usually approximates some algorithm or function in nature, or expresses a logical strategy. Depending on the complexity of the system, an artificial neural network processes information by adjusting the interconnections among its large number of internal nodes.
Artificial neural networks have the abilities of self-learning, associative storage, high-speed search for an optimal solution, self-organization, adaptation and real-time learning.
It should be noted that the terminal device involved in this application may be a device that provides voice and/or data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem. A wireless terminal may communicate with one or more core networks via a radio access network (RAN). A wireless terminal may be a mobile terminal, such as a mobile phone (or "cellular" phone) or a computer with a mobile terminal, for example a portable, pocket-sized, handheld, computer-built-in or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. Examples include personal communication service (PCS) phones, cordless phones, Session Initiation Protocol (SIP) phones, wireless local loop (WLL) stations and personal digital assistants (PDA). A wireless terminal may also be referred to as a system, subscriber unit, subscriber station, mobile station (Mobile Station or Mobile), remote station, access point, remote terminal, access terminal, user terminal, terminal device, user agent, user device or user equipment.
Referring to Fig. 1, a method of training data provided by this application is introduced below. The embodiment of this application mainly includes:
101. Obtain a corpus set to be processed.
The corpus set refers to the set of corpora collected over a statistical period; each corpus may come from at least one platform. The corpus set includes multiple corpora, each corpus includes multiple words, and the words constitute a word set. For example, the corpus set may be data from a forum or from news. The corpus set may be obtained by means such as a web crawler; the specific means is not limited in this application. The corpus set may also be data from an enterprise, which may include employee information, company information, intellectual property, legal information, employee promotion/demotion relationships, staff attendance, staff assessment, enterprise news, the enterprise's product sales information and production data, etc. In addition, to facilitate subsequent data processing, denoising may be performed on the corpus set; the specifics are not limited in this application.
102. Extract an entity set from the corpus set.
The entity set includes multiple named entities; an entity can be any noun, for example a person's name, a place name, the name of a thing, an organization name, a technical term, etc.
103. Extract a candidate hypernym set from the entity set.
For example, the entity set includes entities such as Liu Dehua, Yao Chen, party, attend, famous star, album, release, eat, apple and lichee. From this entity set it can be inferred that "famous star" is a hypernym of "Liu Dehua" and "Yao Chen", and that "fruit" is a hypernym of "apple" and "lichee".
104. Combine each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate-pair set.
The candidate-pair set includes multiple candidate pairs; a candidate pair refers to the combination of an entity and a hypernym that have an association relationship.
After the candidate hypernyms are inferred in step 103, it can be inferred that "famous star" is the hypernym of "Liu Dehua" and "Yao Chen", so (Liu Dehua, famous star) and (Yao Chen, famous star) can each be taken as a candidate pair. Likewise, (apple, fruit) and (lichee, fruit) can each be taken as a candidate pair.
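A minimal sketch of step 104, assuming candidate pairs are simply the Cartesian product of the entity set and the candidate hypernym set; the patent's association filtering between entities and hypernyms is not specified here:

```python
from itertools import product

def build_candidate_pairs(entities, hypernyms):
    """Combine every entity with every candidate hypernym (step 104)."""
    # Exclude the degenerate case where an entity is paired with itself.
    return [(e, h) for e, h in product(entities, hypernyms) if e != h]

entities = ["Liu Dehua", "Yao Chen", "apple", "lichee"]
hypernyms = ["famous star", "fruit"]
pairs = build_candidate_pairs(entities, hypernyms)
# 4 entities x 2 hypernyms = 8 candidate pairs before any filtering
```

In practice the pairing would be restricted to entities and hypernyms co-occurring in a sentence, which the later embodiment (step 1 of Fig. 2) performs.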
105. Configure each candidate pair and the sentences associated with it into a prediction datum, and generalize the sentences associated with the candidate pair in the prediction data.
In some embodiments, a prediction datum can be denoted (pair, generalized sentence), where "generalized sentence" refers to the sentence obtained after generalizing a sentence associated with the candidate pair, and "pair" denotes the candidate pair composed of the entity and the candidate hypernym.
For example, the candidate entity in pair 1 is Liu Dehua and its candidate hypernym is famous star; the candidate entity in pair 2 is Yao Chen and its candidate hypernym is famous star. The sentences associated with candidate pair 1 may include: "Famous stars such as Liu Dehua and Yao Chen attended the party.", "Famous stars such as Liu Dehua and Yao Chen performed in a film together.", "Famous stars such as Liu Dehua and Fan Bingbing sang a song in chorus." ...
After generalizing the sentences associated with candidate pair 1 above, the following generalized sentences are obtained: "Tags such as Nr and Yao Chen attended the party", "Tags such as Nr and Yao Chen performed in a film together", "Tags such as Nr and Fan Bingbing sang a song in chorus" ...
Here Nr denotes a generalized named entity. Taking "Famous stars such as Liu Dehua and Yao Chen attended the party." as an example: if the pair is for "Liu Dehua", then "Liu Dehua" in that sentence is generalized to Nr; if the pair is for "Yao Chen", then "Yao Chen" in that sentence is generalized to Nr.
Tag denotes the label of the hypernym of the generalized entity attribute; for example, in "famous stars such as Liu Dehua and Yao Chen", "famous star" refers to the hypernym of the person entities "Liu Dehua, Yao Chen, etc.".
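The generalization in step 105 can be sketched with plain string replacement, assuming the entity and hypernym surface forms occur literally in the sentence; a real implementation would use the entity spans produced by named entity recognition:

```python
def generalize(sentence, entity, hypernym):
    """Replace the pair's entity with Nr and its hypernym with Tag (step 105)."""
    return sentence.replace(entity, "Nr").replace(hypernym, "Tag")

sentence = "Famous star such as Liu Dehua and Yao Chen attended the party."
pair = ("Liu Dehua", "Famous star")
prediction_datum = (pair, generalize(sentence, *pair))
# -> (("Liu Dehua", "Famous star"), "Tag such as Nr and Yao Chen attended the party.")
```

Note that only the entity belonging to the pair is replaced; other entities in the sentence (here "Yao Chen") are left as-is, matching the Nr/Tag examples above.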
106. Perform word segmentation on the sentences associated with each candidate pair to obtain a word set.
The word set includes N words. For example, segmenting the sentence "Famous stars such as Liu Dehua and Yao Chen attended the party" yields: Liu Dehua, and, Yao Chen, such as, famous, star, attended, party.
107. Input each word in the word set into the generalization layer for conversion to obtain a vector set.
Optionally, in some embodiments of this application, the generalization layer includes a character layer (char level) and a hash layer (hash level), and inputting each word in the word set into the generalization layer for conversion to obtain the converted word set includes:
1. Input each word in the word set into the character layer; the character layer converts each input word into a word vector, obtaining a word-vector set.
In some embodiments, a first word can be matched against the characters in a character look-up table to obtain the n vectors corresponding to its n characters, and a word vector is then generated from the n vectors and the first word through a bidirectional LSTM. The first word refers to a word in the word set to be trained on and predicted.
For example, as shown in Fig. 4, the word in Fig. 4 is the first word. After the word enters the char layer, it is matched and combined with the character look-up table (char lookup table) in the char layer. For instance, the word is combined with char1 through charN respectively; combining the word with char1 yields output1, and similarly for the others, finally yielding N outputs, i.e. output1 through outputN.
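A toy sketch of the char-level conversion, assuming each character maps to an M-dimensional vector in a lazily grown look-up table; the patent's bidirectional-LSTM combination step is approximated here by simple averaging, so this only illustrates the shape of the computation:

```python
import random

M = 32  # embedding dimension per character (the patent suggests 20-50)
random.seed(0)
char_lookup = {}  # char -> M-dim vector, grown lazily

def char_vector(ch):
    if ch not in char_lookup:
        char_lookup[ch] = [random.uniform(-1, 1) for _ in range(M)]
    return char_lookup[ch]

def char_level(word):
    """Convert a word to one M-dim vector from its characters' vectors.

    Averaging stands in for the bidirectional LSTM of the patent.
    """
    vecs = [char_vector(ch) for ch in word]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

v = char_level("apple")
# len(v) == M regardless of word length
```

The key property illustrated is that an out-of-vocabulary word still gets a vector, since it is built from characters rather than looked up as a whole word.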
2. Input each word in the word set into the hash layer; the hash layer converts each input word into a hash vector (hash vector), obtaining a hash-vector set.
In some embodiments, a hash function can be used to map the N words into K hash buckets; the N words are thus compressed into the hash buckets, obtaining K hash vectors, each hash vector corresponding to the N words, where N and K are positive integers and N > K.
For example, as shown in Fig. 5, after word1 through wordN enter the hash layer, the hash function in the hash layer maps word1 through wordN into hash1 through hashK, where hash1 through hashK denote the hash buckets. For instance, the words mapped to hash1 finally yield one hash vector, i.e. hash1 vector, and similarly for the others; the hash layer finally outputs K hash vectors, i.e. hash1 vector through hashK vector.
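A minimal sketch of the hash-layer bucketing, using a stable stdlib hash (MD5) so bucket assignment is reproducible across runs; the learned per-bucket vectors of the hash look-up table are not modeled, only the N-words-into-K-buckets compression:

```python
import hashlib

K = 4  # number of hash buckets (the patent leaves K to be set empirically)

def bucket(word, k=K):
    """Map a word to one of k hash buckets via a stable hash."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % k

def hash_level(words, k=K):
    """Group N words into k buckets; with N > k the vocabulary is compressed."""
    buckets = {i: [] for i in range(k)}
    for w in words:
        buckets[bucket(w, k)].append(w)
    return buckets

words = ["Liu Dehua", "Yao Chen", "famous", "star", "attended", "party"]
grouped = hash_level(words)
# every word lands in exactly one of the K buckets
```

Because every word maps to some bucket, the hash layer (like the char layer) handles words never seen in training, at the cost of collisions between words sharing a bucket.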
3. Obtain the vector set from the word-vector set and the hash-vector set.
In some embodiments, the word vector and the K hash vectors can be spliced or concatenated to obtain the vector set.
In some embodiments, after each word in the word set is input into the generalization layer for conversion and the vector set is obtained, the vector of each word is available. Since a word can appear both in a sentence and in a candidate pair, the finally obtained vector of each word can correspond to two matrices: a sentence matrix and a candidate-pair matrix. These are introduced below, taking a first sentence in the corpus set and a first candidate pair in the candidate-pair set as examples:
1. The sentence matrix
For example, a first matrix can be obtained for the first sentence. The first matrix is obtained from the number of words after segmenting the first sentence, the vector dimension output after generalization by the character layer, and the vector dimension set for generalization by the hash layer.
In some embodiments, the first matrix can be denoted L1*(char_N+hash_N), where L1 is the number of words after segmenting the sentence, char_N is the vector dimension output after char-level generalization, and hash_N is the vector dimension set in the hash look-up table (hash lookup table).
2. The candidate-pair matrix
For example, a second matrix can be obtained for the first candidate pair. The second matrix is obtained from the number of words after segmenting the candidate pair, the vector dimension output after generalization by the character layer, and the vector dimension set for generalization by the hash layer.
In some embodiments, the second matrix can be denoted L2*(char_N+hash_N), where L2 is the number of words after separately segmenting the candidate entity and candidate hypernym of the first candidate pair, char_N is the vector dimension output after char-level generalization, and hash_N is the hash vector dimension.
108. Train and predict on the vector set according to the prediction data with a long short-term memory (LSTM) artificial neural network.
Compared with current mechanisms, in the embodiments of this application, after the entity set and the candidate hypernym set are extracted, each entity in the entity set is combined with each hypernym in the candidate hypernym set to obtain the candidate-pair set; each candidate pair and the sentences associated with it are configured into a prediction datum, and the sentences associated with the candidate pair in the prediction data are generalized; word segmentation is performed on the sentences associated with each candidate pair to obtain the word set; each word in the word set is input into the generalization layer for conversion to obtain the vector set. Obtaining the vector set through the generalization layer allows fast convergence on a small amount of prediction data, and training on the vector set reduces the number of parameters required for training and prediction, thereby improving the efficiency of training data generation and reducing its production cost and training time. Moreover, processing through the generalization layer also reduces deep learning's over-reliance on large training sample sizes and its slow convergence, directly reaching relatively good performance through training on a small amount of data, without requiring manual feature extraction.
For ease of understanding, the method of training data provided in the embodiments of this application is introduced below with a concrete application scenario. As shown in Fig. 2, the embodiment of this application may include:
Step 1: Segment the sentences in the corpus set, obtain candidate pairs based on the corpus set, and generalize the sentences using the candidate pairs.
For each sentence in the corpus set, first use named entity recognition to obtain the entity set it contains, then take all possible nouns, noun phrases, etc. as the candidate hypernym set, and treat every pairwise combination of an entity in the entity set and a hypernym in the candidate hypernym set as a candidate pair. Then use each candidate pair, together with the sentences corresponding to that candidate pair, to construct a prediction datum, while generalizing the sentences.
Named entity recognition (NER) is a background task of natural language processing whose purpose is to identify named entities such as person names, place names and organization names in a corpus set. Since the number of these named entities keeps growing, they can rarely be listed exhaustively in a dictionary, and their construction methods each follow certain regularities; therefore the identification of these words is usually handled separately from lexical morphological processing tasks (such as Chinese word segmentation), and is called named entity recognition. Named entity recognition is an indispensable component of a variety of natural language processing technologies such as information extraction, information retrieval, machine translation and question answering systems.
Considering that the entity and candidate hypernym in a candidate pair may come from different sentences, suppose for example the corpus set includes the following two sentences:
(1) Famous stars such as Liu Dehua and Yao Chen attended the party.
(2) Famous stars such as Liu Dehua and Yao Chen performed in a film together.
Then, when constructing prediction data here, multiple generalized sentences will appear, but they all correspond to the same candidate pair.
Take the sentence "Famous stars such as Liu Dehua and Yao Chen attended the party." as an example. Liu Dehua and Yao Chen are person-name entities, and "famous star" is the corresponding candidate hypernym. By combining entities and candidate hypernyms, 2 candidate pairs are obtained: (Liu Dehua, famous star) and (Yao Chen, famous star). Based on these two candidate pairs, combined with the sentences where the entity and candidate hypernym of each pair are located, the corresponding prediction data can be constructed:
(1) Prediction datum 1: pair (Liu Dehua, famous star), generalized sentence (Tags such as Nr and Yao Chen attended the party).
(2) Prediction datum 2: pair (Yao Chen, famous star), generalized sentence (Tags such as Liu Dehua and Nr attended the party).
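Step 1 end-to-end can be sketched as follows; the NER and hypernym-extraction stages are stubbed out with fixed lists, since the patent does not fix a particular NER tool, and a sentence is treated as associated with a pair when both its entity and hypernym occur in it:

```python
def generalize(sentence, entity, hypernym):
    # Replace the pair's entity with Nr and its hypernym with Tag.
    return sentence.replace(entity, "Nr").replace(hypernym, "Tag")

def build_prediction_data(sentences, pairs):
    """One prediction datum per (pair, associated sentence): step 1 of Fig. 2."""
    data = []
    for entity, hypernym in pairs:
        for s in sentences:
            if entity in s and hypernym in s:  # sentence associated with the pair
                data.append(((entity, hypernym), generalize(s, entity, hypernym)))
    return data

sentences = [
    "Famous star such as Liu Dehua and Yao Chen attended the party.",
    "Famous star such as Liu Dehua and Yao Chen performed a film together.",
]
pairs = [("Liu Dehua", "Famous star"), ("Yao Chen", "Famous star")]
data = build_prediction_data(sentences, pairs)
# 2 pairs x 2 associated sentences = 4 prediction data, each keyed by one pair
```

This reproduces the property noted above: one candidate pair can own several generalized sentences, one per sentence it is associated with.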
Step 2: Each word obtained after segmenting a sentence is processed by the generalization layer to generate a word vector. Converting each word through the generalization layer effectively reduces the number of parameters and achieves fast convergence on a small amount of training data.
Step 3: to by extensive layer, treated that data are trained and are predicted using LSTM network, inputting as candidate
Pair sentence corresponding with candidate pair.
In some embodiments of the present application, LSTM network structure is described below, is carried out using LSTM network structure extensive
The process of layer processing.
As shown in Fig. 3, the LSTM network structure includes a softmax classifier, a sentence template, a pair constraint template, and a generalization layer.
The generalization layer includes a character layer (char level) and a hash layer (hash level). The char level includes a bidirectional LSTM and a character lookup table (char lookup table). The char lookup table may include N different chars, where N may take a value from 1 to 20,000; the application does not limit the value of N.
The hash level includes a hash function and a hash lookup table. The hash lookup table includes K hash buckets, where K can be set empirically; the application does not limit the value of K.
The vector of each char and of each hash bucket in its respective lookup table may be M-dimensional, with M between 20 and 50.
The softmax classifier models the output as a multinomial distribution and can distinguish multiple mutually exclusive classes; it maps (compresses) any K-dimensional real vector into another K-dimensional real vector. The softmax classifier serves as the output layer of the artificial neural network.
The sentence template refers to the LSTM that processes the sentence matrix.
The pair constraint template refers to the LSTM that processes the pair matrix.
I. Processing principle of the generalization layer:
The processing flow of the generalization layer includes: replacing the word level with the char level, and mapping words to hash vectors with a hash function.
1. The char level replaces the word level.
As shown in Fig. 4, for each segmented word in the word lookup table, the vector of each of its chars is obtained through the char lookup table (char1 ... charN); a bidirectional LSTM then combines the resulting n vectors with the word to generate a new word vector. This retains the information of the word itself while greatly reducing the parameter explosion problem caused by using the original word lookup table.
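The data flow of the char level can be sketched as follows. This is only a shape-level illustration under stated assumptions: the toy character inventory, the dimensions, and especially the mean over character vectors (standing in for the bidirectional LSTM combiner the patent describes) are all simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 30                                   # per-char vector dimension (20-50 per the text)
chars = sorted(set("liudehua yaochen"))  # toy character inventory (assumption)
char_table = {c: rng.standard_normal(M) for c in chars}  # char lookup table

def word_vector(word):
    """Decompose a word into chars, look each char up, and combine the
    char vectors into one word vector. A mean stands in for the BiLSTM."""
    vecs = [char_table[c] for c in word if c in char_table]
    return np.mean(vecs, axis=0)

v = word_vector("liudehua")
print(v.shape)  # (30,)
```

However the combiner is implemented, the point is that only |chars| × M parameters are stored instead of one M-dimensional vector per vocabulary word.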
2. Mapping words to hash vectors with a hash function.
As shown in Fig. 5, the N words in the word lookup table are mapped into K hash buckets with a hash function. Here K can be far smaller than N, guaranteeing an order-of-magnitude reduction in the number of parameters. Multiple words are thereby compressed together and share one hash vector. The shared-hash-vector mechanism greatly speeds up training and yields good results on a small training dataset.
Each hash bucket corresponds to one hash vector. Sharing a hash vector here means that the N words are each mapped to one of buckets hash1 through hashK, and all of the words mapped to bucket hash1 share the vector hash1.
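The bucket-sharing mechanism can be sketched as below. The choice of MD5 as the hash function and the sizes K and M are illustrative assumptions; the patent does not specify a particular hash function.

```python
import hashlib

import numpy as np

rng = np.random.default_rng(1)
K, M = 8, 30
hash_table = rng.standard_normal((K, M))  # K bucket vectors (the "hash lookup table")

def bucket(word):
    # A stable hash, so the same word always lands in the same bucket.
    return int(hashlib.md5(word.encode()).hexdigest(), 16) % K

def hash_vector(word):
    # All words that hash to the same bucket share that bucket's vector.
    return hash_table[bucket(word)]

words = ["Liu Dehua", "Yao Chen", "party", "film", "star"]
print({w: bucket(w) for w in words})  # with K=8, collisions are expected
```

Only K × M parameters are trained regardless of vocabulary size, which is the order-of-magnitude reduction the text describes.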
Thus, the LSTM network structure is used to obtain a vector representation of the sentence corresponding to a pair and, at the same time, a vector representation of the pair itself; the two kinds of vectors are then classified together. Using both the pair information and the sentence information, i.e., data from these two dimensions, allows fast data convergence and significantly reduces the parameter explosion phenomenon.
II. Generalization processing based on the LSTM network structure.
The process of generalization with the LSTM network structure is described below (steps 1 to 4):
1. Initialize the char lookup table matrix and the hash lookup table matrix.
In some embodiments, random initialization may be used.
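Step 1 amounts to allocating the two embedding matrices. A minimal sketch, with all sizes and the small-variance normal initializer assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
N_char, K, M = 5000, 256, 30          # illustrative sizes, not from the patent
char_lookup = rng.normal(0.0, 0.1, size=(N_char, M))  # char lookup table matrix
hash_lookup = rng.normal(0.0, 0.1, size=(K, M))       # hash lookup table matrix
print(char_lookup.shape, hash_lookup.shape)
```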
2. For a candidate pair and a segmented sentence, the generalization layer can be used to obtain an input vector for each word:
(a) For the sentence, a sentence matrix of size L1 × (char_N + hash_N) is obtained.
Here L1 is the number of words after segmenting the sentence, char_N is the dimension of the vector output by the char-level generalization, and hash_N is the vector dimension set for the hash lookup table.
The char_N-dimensional and hash_N-dimensional vectors obtained in step (a) are spliced together, giving the sentence-matrix output of the generalization layer.
(b) For the pair, a pair matrix of size L2 × (char_N + hash_N) is obtained.
Here L2 is the number of words after separately segmenting the candidate entity and the candidate hypernym in the pair, char_N is the dimension of the vector output by the char-level generalization, and hash_N is the vector dimension set for the hash lookup table.
The char_N-dimensional and hash_N-dimensional vectors obtained in step (b) are appended together, giving the pair-matrix output of the generalization layer.
(c) The sentence matrix is input to the clause model, and the pair matrix is input to the pair constraint model.
3. The two results obtained in step (c), after processing by the clause model and the pair constraint model respectively, are appended together to obtain an output vector of dimension (clause h1 + pair constraint h2).
Here, clause h1 is the h1-dimensional vector output by the clause model, pair constraint h2 is the h2-dimensional vector output by the pair constraint model, and "append" refers to directly splicing multiple vectors together.
4. The (clause h1 + pair constraint h2)-dimensional vector obtained by splicing in step 3 is classified with the softmax classifier.
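The shapes flowing through steps 2 to 4 can be traced with a small sketch. All sizes are assumptions, and mean pooling plus a projection stands in for each of the two LSTMs (the clause model and the pair constraint model); only the append-then-softmax structure follows the steps above.

```python
import numpy as np

rng = np.random.default_rng(2)
char_N, hash_N = 30, 30
L1, L2 = 12, 4                  # words in the sentence / in the pair
h1, h2, n_classes = 64, 32, 2   # clause dim, pair-constraint dim, classes

sent_matrix = rng.standard_normal((L1, char_N + hash_N))  # step (a) output
pair_matrix = rng.standard_normal((L2, char_N + hash_N))  # step (b) output

W_sent = rng.standard_normal((char_N + hash_N, h1))
W_pair = rng.standard_normal((char_N + hash_N, h2))
W_out = rng.standard_normal((h1 + h2, n_classes))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

sent_vec = sent_matrix.mean(axis=0) @ W_sent  # clause-model stand-in -> h1
pair_vec = pair_matrix.mean(axis=0) @ W_pair  # pair-constraint stand-in -> h2
joint = np.concatenate([sent_vec, pair_vec])  # step 3 "append" -> h1 + h2
probs = softmax(joint @ W_out)                # step 4 classification
print(joint.shape, probs.shape)  # (96,) (2,)
```

Whatever the actual recurrent models compute, the appended vector always has dimension h1 + h2, and the softmax output is a probability distribution over the classes.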
Any technical feature in the embodiments corresponding to any of Fig. 1 to Fig. 5 applies equally to the embodiments corresponding to Fig. 6 to Fig. 8; similar details are not repeated below.
The method of training data in the present application has been described above; the apparatus for performing the above method of training data is described below. The apparatus may be a functional module installed on a terminal device or server, may itself be a terminal device or server, or may combine functional modules with hardware modules; the application does not specifically limit this.
Referring to Fig. 6, the apparatus includes:
an obtaining module, configured to obtain a corpus set to be processed;
a processing module, configured to extract an entity set from the corpus set, the entity set including multiple named entities;
extract a candidate hypernym set from the entity set;
combine each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate pair set, the candidate pair set including multiple candidate pairs, where a candidate pair refers to a combination of an entity and a hypernym that have an association relationship;
configure one piece of prediction data for each candidate pair and its associated sentence, and perform generalization on the sentence associated with the candidate pair in the prediction data;
perform word segmentation on the sentence associated with each candidate pair to obtain a word set;
input each word in the word set into a generalization layer for conversion to obtain a vector set; and
train and predict on the vector set according to the prediction data and a long short-term memory (LSTM) artificial neural network.
In this embodiment of the present application, after the processing module extracts the entity set and the candidate hypernym set, it combines each entity in the entity set with each hypernym in the candidate hypernym set to obtain the candidate pair set; configures one piece of prediction data for each candidate pair and its associated sentence, and generalizes the sentence associated with the candidate pair in the prediction data; performs word segmentation on the sentence associated with each candidate pair to obtain a word set; and inputs each word in the word set into the generalization layer for conversion to obtain a vector set. Processing by the generalization layer reduces the order of magnitude of the data, enabling fast convergence on a small amount of prediction data and reducing the number of parameters needed for training and prediction, thereby improving the efficiency of training data.
Optionally, in some embodiments of the present application, the generalization layer includes a character layer and a hash layer, and the processing module is specifically configured to:
input each word in the word set into the character layer, convert, in the character layer, each word input to the character layer into a word vector, and obtain a word-vector set;
input each word in the word set into the hash layer, convert, in the hash layer, each word input to the hash layer into a hash vector, and obtain a hash-vector set; and
obtain the vector set according to the word-vector set and the hash-vector set.
Optionally, in some embodiments of the present application, the word set includes N words, and the processing module is specifically configured to:
match a first word with the characters in a character lookup table to obtain n vectors corresponding to n characters, and generate a word vector from the n vectors and the first word by a bidirectional LSTM, the first word referring to a word in the word set to be trained and predicted.
Optionally, in some embodiments of the present application, the processing module is specifically configured to:
map the N words into K hash buckets with a hash function, and compress the words in each hash bucket to obtain K hash vectors, each hash vector corresponding to the words mapped to its bucket, where N and K are both positive integers and N > K.
Optionally, in some embodiments of the present application, the processing module is specifically configured to:
splice the word vectors and the K hash vectors to obtain the vector set.
Optionally, in some embodiments of the present application, a first sentence in the corpus set corresponds to a first matrix, the first matrix being obtained from the number of words after segmenting the first sentence, the vector dimension output after the generalization of the character layer, and the vector dimension set for the generalization of the hash layer;
a first candidate pair in the candidate pair set corresponds to a second matrix, the second matrix being obtained from the number of words after segmenting the candidate pair, the vector dimension output after the generalization of the character layer, and the vector dimension set for the generalization of the hash layer.
The server and terminal device in the embodiments of the present application have been described above from the perspective of modular functional entities; they are described below from the perspective of hardware processing. It should be noted that the entity device corresponding to the obtaining module in the embodiment corresponding to Fig. 6 may be an input/output unit, and the entity device corresponding to the processing module may be a processor. The apparatus shown in Fig. 6 may have the structure shown in Fig. 7; when it does, the processor and input/output unit in Fig. 7 implement the same or similar functions as the processing module and obtaining module provided by the foregoing apparatus embodiment, and the memory in Fig. 7 stores the program code that the processor calls when executing the above method of training data.
The embodiments of the present application also provide a terminal device. As shown in Fig. 8, for ease of description, only the parts relevant to the embodiments of the present application are shown; for specific technical details not disclosed, please refer to the method part of the embodiments. The terminal may be any terminal device, including a mobile phone, tablet computer, personal digital assistant (PDA), point-of-sale (POS) terminal, or in-vehicle computer.
Fig. 8 is a block diagram of the partial structure of a terminal device related to the apparatus for training data provided by the embodiments of the present application. Referring to Fig. 8, the terminal device includes components such as a radio frequency (RF) circuit 88, a memory 820, an input unit 830, a display unit 840, a sensor 850, an audio circuit 860, a wireless fidelity (WiFi) module 870, a processor 880, and a power supply 890. Those skilled in the art will appreciate that the terminal device structure shown in Fig. 8 does not limit the terminal device, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
Each component of the terminal device is described below with reference to Fig. 8:
The RF circuit 88 can be used to receive and send signals during messaging or a call; in particular, it passes downlink information received from a base station to the processor 880 for processing, and sends uplink data to the base station. In general, the RF circuit 88 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, and so on. In addition, the RF circuit 88 can communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to the Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and so on.
The memory 820 can be used to store software programs and modules; the processor 880 executes the various functional applications and data processing of the terminal device by running the software programs and modules stored in the memory 820. The memory 820 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created according to the use of the terminal device (such as audio data or a phone book), and the like. In addition, the memory 820 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or flash memory device, or other volatile solid-state storage devices.
The input unit 830 can be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the terminal device. Specifically, the input unit 830 may include a touch panel 831 and other input devices 832. The touch panel 831, also referred to as a touch screen, collects the user's touch operations on or near it (such as operations performed on or near the touch panel 831 with a finger, stylus, or any other suitable object or accessory) and drives the corresponding connected apparatus according to a preset program. Optionally, the touch panel 831 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 880, and can receive and execute commands sent by the processor 880. Furthermore, the touch panel 831 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 831, the input unit 830 may also include other input devices 832. Specifically, the other input devices 832 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick.
The display unit 840 can be used to display information input by the user, information provided to the user, and the various menus of the terminal device. The display unit 840 may include a display panel 841; optionally, the display panel 841 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 831 may cover the display panel 841; after detecting a touch operation on or near it, the touch panel 831 transmits the operation to the processor 880 to determine the type of the touch event, and the processor 880 then provides a corresponding visual output on the display panel 841 according to the type of the touch event. Although in Fig. 8 the touch panel 831 and the display panel 841 are two independent components implementing the input and output functions of the terminal device, in some embodiments the touch panel 831 and the display panel 841 may be integrated to implement the input and output functions of the terminal device.
The terminal device may also include at least one sensor 850, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 841 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 841 and/or the backlight when the terminal device is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the posture of the terminal device (such as landscape/portrait switching, related games, and magnetometer pose calibration) and in vibration-recognition related functions (such as a pedometer or tapping). Other sensors that can be configured on the terminal device, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.
The audio circuit 860, a loudspeaker 861, and a microphone 862 can provide an audio interface between the user and the terminal device. The audio circuit 860 can transmit the electrical signal converted from received audio data to the loudspeaker 861, which converts it into a sound signal for output; on the other hand, the microphone 862 converts a collected sound signal into an electrical signal, which the audio circuit 860 receives and converts into audio data; the audio data is then output to the processor 880 for processing and, for example, sent through the RF circuit 88 to another terminal device, or output to the memory 820 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 870, the terminal device can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides wireless broadband Internet access for the user. Although Fig. 8 shows the WiFi module 870, it is understood that the module is not a necessary component of the terminal device and can be omitted as needed without changing the essence of the application.
The processor 880 is the control center of the terminal device. It connects the various parts of the entire terminal device through various interfaces and lines, and executes the various functions of the terminal device and processes data by running or executing the software programs and/or modules stored in the memory 820 and calling the data stored in the memory 820, thereby monitoring the terminal device as a whole. Optionally, the processor 880 may include one or more processing units; preferably, the processor 880 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and so on, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 880.
The terminal device also includes a power supply 890 (such as a battery) that supplies power to the components. Preferably, the power supply can be logically connected to the processor 880 through a power management system, thereby implementing functions such as charging management, discharging management, and power consumption management through the power management system.
Although not shown, the terminal device may also include a camera, a Bluetooth module, and so on, which are not described here.
In the embodiment of the present application, the processor 880 included in the terminal device also controls the execution of the method flow performed by the apparatus shown in Fig. 6 above. For example, by calling the instructions in the memory 820, the processor 880 performs the following operations:
obtaining a corpus set to be processed;
extracting an entity set from the corpus set, the entity set including multiple named entities;
extracting a candidate hypernym set from the entity set;
combining each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate pair set, the candidate pair set including multiple candidate pairs, where a candidate pair refers to a combination of an entity and a hypernym that have an association relationship;
configuring one piece of prediction data for each candidate pair and its associated sentence, and performing generalization on the sentence associated with the candidate pair in the prediction data;
performing word segmentation on the sentence associated with each candidate pair to obtain a word set;
inputting each word in the word set into a generalization layer for conversion to obtain a vector set; and
training and predicting on the vector set according to the prediction data and a long short-term memory (LSTM) artificial neural network.
Fig. 9 is a schematic diagram of a server structure provided by an embodiment of the present application. The server may differ considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 922 (for example, one or more processors), a memory 932, and one or more storage media 930 (such as one or more mass storage devices) storing an application program 1542 or data 944. The memory 932 and the storage medium 930 may provide transient or persistent storage. The program stored in the storage medium 930 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 922 may be configured to communicate with the storage medium 930 and execute, on the server 920, the series of instruction operations in the storage medium 930.
The server 920 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, and/or one or more operating systems 941, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD.
The steps performed by the apparatus in the above embodiment shown in Fig. 6 may be based on the server structure shown in Fig. 9. For example, by calling the instructions in the memory 932, the processor 922 performs the following operations:
obtaining a corpus set to be processed;
extracting an entity set from the corpus set, the entity set including multiple named entities;
extracting a candidate hypernym set from the entity set;
combining each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate pair set, the candidate pair set including multiple candidate pairs, where a candidate pair refers to a combination of an entity and a hypernym that have an association relationship;
configuring one piece of prediction data for each candidate pair and its associated sentence, and performing generalization on the sentence associated with the candidate pair in the prediction data;
performing word segmentation on the sentence associated with each candidate pair to obtain a word set;
inputting each word in the word set into a generalization layer for conversion to obtain a vector set; and
training and predicting on the vector set according to the prediction data and a long short-term memory (LSTM) artificial neural network.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference can be made to the related descriptions of other embodiments.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and modules described above can refer to the corresponding processes in the foregoing method embodiments, and are not described here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely exemplary: the division into modules is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional modules in each embodiment of the present application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (such as a solid-state disk (SSD)), or the like.
The technical solutions provided by the present application have been described in detail above. Specific examples have been applied in this application to explain the principles and embodiments of the application, and the above description of the embodiments is only intended to help understand the method of the present application and its core idea. At the same time, for those skilled in the art, there will be changes in the specific embodiments and scope of application according to the idea of the present application. In conclusion, the contents of this specification should not be construed as limiting the present application.
Claims (14)
1. A method of training data, characterized in that the method comprises:
obtaining a corpus set to be processed;
extracting an entity set from the corpus set, the entity set comprising multiple named entities;
extracting a candidate hypernym set from the entity set;
combining each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate pair set, the candidate pair set comprising multiple candidate pairs, wherein a candidate pair refers to a combination of an entity and a hypernym having an association relationship;
configuring one piece of prediction data for each candidate pair and its associated sentence, and performing generalization on the sentence associated with the candidate pair in the prediction data;
performing word segmentation on the sentence associated with each candidate pair to obtain a word set;
inputting each word in the word set into a generalization layer for conversion to obtain a vector set; and
training and predicting on the vector set according to the prediction data and a long short-term memory (LSTM) artificial neural network.
2. The method according to claim 1, characterized in that the generalization layer comprises a character layer and a hash layer, and inputting each word in the word set into the generalization layer for conversion to obtain the converted word set comprises:
inputting each word in the word set into the character layer, and converting, in the character layer, each word input to the character layer into a word vector to obtain a word-vector set;
inputting each word in the word set into the hash layer, and converting, in the hash layer, each word input to the hash layer into a hash vector to obtain a hash-vector set; and
obtaining the vector set according to the word-vector set and the hash-vector set.
3. The method according to claim 2, wherein the word set comprises N words, and inputting each word in the word set into the character layer, the character layer converting each input word into a word vector to obtain the word vector set, comprises:
matching a first word against the characters in a character lookup table to obtain n vectors corresponding to n characters, and generating a word vector from the n vectors and the first word by means of a bidirectional LSTM, the first word being a word in the word set to be trained and predicted.
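A minimal sketch of the character-layer lookup of claim 3, under stated assumptions: the lookup table here is randomly initialized, and mean-pooling stands in for the bidirectional LSTM, which would instead read the n character vectors in forward and backward order. All names and dimensions are illustrative:

```python
import numpy as np

DIM = 8  # illustrative character-vector dimension
rng = np.random.default_rng(0)
# character lookup table: one vector per character (claim 3)
char_table = {c: rng.standard_normal(DIM) for c in "abcdefghijklmnopqrstuvwxyz"}

def word_vector(word):
    """Match each character of the word against the lookup table to get
    n vectors for n characters, then pool them into one word vector.
    Mean-pooling is a stand-in for the bidirectional LSTM of claim 3."""
    chars = np.stack([char_table[c] for c in word])  # n vectors for n characters
    return chars.mean(axis=0)

vec = word_vector("apple")  # one DIM-dimensional word vector
```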
4. The method according to claim 2 or 3, wherein inputting each word in the word set into the hash layer, the hash layer converting each input word into a hash vector to obtain the hash vector set, comprises:
mapping the N words into K hash buckets using a hash function, and compressing the N words within each hash bucket to obtain K hash vectors, each hash vector corresponding to the N words, wherein N and K are positive integers and N > K.
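The bucket-mapping step of claim 4 can be sketched as follows; the choice of MD5 as the hash function and all names are assumptions for illustration, not specified by the patent:

```python
import hashlib

def hash_into_buckets(words, k):
    """Map N words into K hash buckets (claim 4, with N > K). Each word
    lands in exactly one bucket; the words in a bucket would then be
    compressed into one shared hash vector, so N words are represented
    by only K vectors."""
    buckets = [[] for _ in range(k)]
    for w in words:
        idx = int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % k
        buckets[idx].append(w)
    return buckets

# N = 5 words into K = 3 buckets
buckets = hash_into_buckets(["apple", "banana", "cherry", "date", "fig"], 3)
```

Because N > K, at least two words must share a bucket, which is what makes the hash layer a compressed (generalized) representation.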
5. The method according to claim 4, wherein obtaining the vector set according to the word vector set and the hash vector set comprises:
concatenating the word vectors and the K hash vectors to obtain the vector set.
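The splicing of claim 5 is a plain concatenation; the dimensions below are invented for the example:

```python
import numpy as np

def splice(word_vec, hash_vecs):
    """Concatenate a word vector with the K hash vectors (claim 5),
    yielding one flat feature vector per word."""
    return np.concatenate([word_vec, *hash_vecs])

# an 8-dim word vector spliced with two 4-dim hash vectors -> 16 dims
combined = splice(np.ones(8), [np.zeros(4), np.zeros(4)])
```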
6. The method according to claim 5, wherein after each word in the word set is input into the generalization processing layer for conversion to obtain the vector set, a first sentence in the corpus set corresponds to a first matrix, the first matrix being obtained from the number of words after the first sentence is segmented, the vector dimension output after generalization processing by the character layer, and the vector dimension set during generalization processing by the hash layer; and
a first candidate pair in the candidate-pair set corresponds to a second matrix, the second matrix being obtained from the number of words after the candidate pair is segmented, the vector dimension output after generalization processing by the character layer, and the vector dimension set during generalization processing by the hash layer.
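Reading claim 6 as a shape computation, each sentence (or candidate pair) yields a matrix with one row per segmented word and one column per concatenated feature dimension. This is a sketch under that reading; the specific dimensions are illustrative:

```python
def sentence_matrix_shape(num_words, char_dim, hash_dim):
    """Shape of the first/second matrix of claim 6: one row per word
    after segmentation; columns = character-layer output dimension
    plus the dimension set for the hash layer."""
    return (num_words, char_dim + hash_dim)

# a 6-word sentence, 64-dim character-layer output, 16-dim hash layer
shape = sentence_matrix_shape(6, 64, 16)
```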
7. An apparatus for training data, characterized in that the apparatus comprises:
an obtaining module, configured to obtain a corpus set to be processed; and
a processing module, configured to: extract an entity set from the corpus set, the entity set comprising a plurality of named entities;
extract a candidate hypernym set from the entity set;
combine each entity in the entity set with each hypernym in the candidate hypernym set to obtain a candidate-pair set, the candidate-pair set comprising a plurality of candidate pairs, a candidate pair being a combination of an entity and a hypernym that have an association relationship;
configure one piece of prediction data for each sentence associated with a candidate pair, and perform generalization processing on the sentence associated with the candidate pair in the prediction data;
perform word segmentation on each sentence associated with a candidate pair to obtain a word set;
input each word in the word set into a generalization processing layer for conversion, to obtain a vector set; and
train and predict on the vector set according to the prediction data and a long short-term memory (LSTM) artificial neural network.
8. The apparatus according to claim 7, wherein the generalization processing layer comprises a character layer and a hash layer, and the processing module is specifically configured to:
input each word in the word set into the character layer, the character layer converting each input word into a word vector, to obtain a word vector set;
input each word in the word set into the hash layer, the hash layer converting each input word into a hash vector, to obtain a hash vector set; and
obtain the vector set according to the word vector set and the hash vector set.
9. The apparatus according to claim 8, wherein the word set comprises N words, and the processing module is specifically configured to:
match a first word against the characters in a character lookup table to obtain n vectors corresponding to n characters, and generate a word vector from the n vectors and the first word by means of a bidirectional LSTM, the first word being a word in the word set to be trained and predicted.
10. The apparatus according to claim 8 or 9, wherein the processing module is specifically configured to:
map the N words into K hash buckets using a hash function, and compress the N words within each hash bucket to obtain K hash vectors, each hash vector corresponding to the N words, wherein N and K are positive integers and N > K.
11. The apparatus according to claim 10, wherein the processing module is specifically configured to:
concatenate the word vectors and the K hash vectors to obtain the vector set.
12. The apparatus according to claim 11, wherein a first sentence in the corpus set corresponds to a first matrix, the first matrix being obtained from the number of words after the first sentence is segmented, the vector dimension output after generalization processing by the character layer, and the vector dimension set during generalization processing by the hash layer; and
a first candidate pair in the candidate-pair set corresponds to a second matrix, the second matrix being obtained from the number of words after the candidate pair is segmented, the vector dimension output after generalization processing by the character layer, and the vector dimension set during generalization processing by the hash layer.
13. A computer storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
14. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711269292.9A CN110019648B (en) | 2017-12-05 | 2017-12-05 | Method and device for training data and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711269292.9A CN110019648B (en) | 2017-12-05 | 2017-12-05 | Method and device for training data and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110019648A true CN110019648A (en) | 2019-07-16 |
CN110019648B CN110019648B (en) | 2021-02-02 |
Family
ID=67185955
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711269292.9A Active CN110019648B (en) | 2017-12-05 | 2017-12-05 | Method and device for training data and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110019648B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8086549B2 (en) * | 2007-11-09 | 2011-12-27 | Microsoft Corporation | Multi-label active learning |
CN105808525A (en) * | 2016-03-29 | 2016-07-27 | National Computer Network and Information Security Administration Center | Domain concept hypernym-hyponym relation extraction method based on similar concept pairs |
CN106407211A (en) * | 2015-07-30 | 2017-02-15 | Fujitsu Ltd. | Method and device for classifying semantic relationships among entity words |
CN106570179A (en) * | 2016-11-10 | 2017-04-19 | Institute of Information Engineering, Chinese Academy of Sciences | Evaluative text-oriented kernel entity identification method and apparatus |
CN106649819A (en) * | 2016-12-29 | 2017-05-10 | Beijing Qihoo Technology Co., Ltd. | Method and device for extracting entity words and hypernyms |
CN106919977A (en) * | 2015-12-25 | 2017-07-04 | iFlytek Co., Ltd. | Feedforward sequential memory neural network and construction method and system thereof |
CN106980608A (en) * | 2017-03-16 | 2017-07-25 | Sichuan University | Chinese electronic medical record word segmentation and named entity recognition method and system |
US20170221474A1 (en) * | 2016-02-02 | 2017-08-03 | Mitsubishi Electric Research Laboratories, Inc. | Method and System for Training Language Models to Reduce Recognition Errors |
WO2017130089A1 (en) * | 2016-01-26 | 2017-08-03 | Koninklijke Philips N.V. | Systems and methods for neural clinical paraphrase generation |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | China University of Mining and Technology | Network text named entity recognition method based on neural network probabilistic disambiguation |
CN107273357A (en) * | 2017-06-14 | 2017-10-20 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Artificial-intelligence-based word segmentation model correction method, apparatus, device and medium |
Non-Patent Citations (6)
Title |
---|
CHAO MA et al.: "Unsupervised Video Hashing by Exploiting Spatio-Temporal Feature", International Conference on Neural Information Processing * |
JULIAN GEORG ZILLY et al.: "Recurrent highway networks", arXiv:1607.03474v5 * |
ZIMING ZHANG et al.: "Efficient Training of Very Deep Neural Networks for Supervised Hashing", arXiv:1511.04524v2 * |
ZHANG, JUNCHI: "Research on Dependency Parsing Models Based on Recurrent Neural Networks", China Master's Theses Full-text Database, Information Science and Technology * |
LI, YANPENG: "Feature Coupling Generalization and Its Application in Text Mining", China Doctoral Dissertations Full-text Database, Information Science and Technology * |
HU, XINCHEN: "Research on Semantic Relation Classification Based on LSTM", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765244A (en) * | 2019-09-18 | 2020-02-07 | Ping An Technology (Shenzhen) Co., Ltd. | Method and device for obtaining answering operation, computer equipment and storage medium |
CN110765244B (en) * | 2019-09-18 | 2023-06-06 | Ping An Technology (Shenzhen) Co., Ltd. | Method, device, computer equipment and storage medium for obtaining answering operation |
US11501070B2 (en) | 2020-07-01 | 2022-11-15 | International Business Machines Corporation | Taxonomy generation to insert out of vocabulary terms and hypernym-hyponym pair induction |
Also Published As
Publication number | Publication date |
---|---|
CN110019648B (en) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102646667B1 (en) | Methods for finding image regions, model training methods, and related devices | |
WO2020108483A1 (en) | Model training method, machine translation method, computer device and storage medium | |
CN111428516B (en) | Information processing method and device | |
CN111553162B (en) | Intention recognition method and related device | |
CN108280458B (en) | Group relation type identification method and device | |
CN111046227B (en) | Video duplicate checking method and device | |
CN108304388A (en) | Machine translation method and device | |
WO2019062413A1 (en) | Method and apparatus for managing and controlling application program, storage medium, and electronic device | |
WO2020147369A1 (en) | Natural language processing method, training method, and data processing device | |
CN111816159B (en) | Language identification method and related device | |
CN108228270A (en) | Start resource loading method and device | |
JP2017514204A (en) | Contact grouping method and apparatus | |
CN110019825B (en) | Method and device for analyzing data semantics | |
CN110069715A (en) | A kind of method of information recommendation model training, the method and device of information recommendation | |
CN114444579B (en) | General disturbance acquisition method and device, storage medium and computer equipment | |
CN113821589A (en) | Text label determination method and device, computer equipment and storage medium | |
CN113723378A (en) | Model training method and device, computer equipment and storage medium | |
CN108846051A (en) | Data processing method, device and computer readable storage medium | |
CN110019648A (en) | A kind of method, apparatus and storage medium of training data | |
CN112862021B (en) | Content labeling method and related device | |
CN114840563B (en) | Method, device, equipment and storage medium for generating field description information | |
CN113569043A (en) | Text category determination method and related device | |
CN114840499A (en) | Table description information generation method, related device, equipment and storage medium | |
CN111062198A (en) | Big data-based enterprise category analysis method and related equipment | |
CN110781274A (en) | Question-answer pair generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |