CN108363685A - Self-media data text representation method based on a recursive variational autoencoder model - Google Patents
Self-media data text representation method based on a recursive variational autoencoder model Download PDF Info
- Publication number
- CN108363685A CN108363685A CN201711417351.2A CN201711417351A CN108363685A CN 108363685 A CN108363685 A CN 108363685A CN 201711417351 A CN201711417351 A CN 201711417351A CN 108363685 A CN108363685 A CN 108363685A
- Authority
- CN
- China
- Prior art keywords
- text
- encoding
- variational
- recursive
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The present invention provides a self-media data text representation method based on a recursive variational autoencoder model. The method includes: preprocessing the input corpus text and encoding it with a recursive neural network encoding model to generate a text vector of fixed dimension; generating a mean vector and a variance vector from the fixed-dimension text vector, drawing a sample from the standard normal distribution, and generating a latent coded representation z from the mean vector, the variance vector, and the sample by means of variational inference; then decoding with a recursive neural network decoding model to obtain a decoded sequence, computing the coding loss between the encoded sequence and the decoded sequence as well as the divergence between the latent coded representation z and the standard normal distribution, and updating the parameters of the recursive variational autoencoder model with the coding loss and the divergence. The method of the present invention has high coding performance, adapts well to the coded representation of self-media data, and can describe the distribution of the data while fitting its content.
Description
Technical field
The present invention relates to the technical fields of deep learning and self-media text content analysis, and in particular to a self-media data text representation method based on a recursive variational autoencoder model.
Background technology
With the development of social media in recent years, users generate large amounts of short self-media text. Because such text lacks effective contextual information, it is difficult to represent with traditional bag-of-words models.
Deep learning grew out of research on artificial neural networks; a multilayer network with multiple hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level attribute classes or features, thereby discovering distributed feature representations of the data. The concept of deep learning was proposed by Hinton et al. in 2006, who put forward an unsupervised greedy layer-wise training algorithm based on the deep belief network (DBN) to solve the optimization problems associated with deep structures, and then proposed the multilayer autoencoder deep structure. The convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm, which exploits spatial correlation to reduce the number of parameters and improve training performance. The computation involved in producing an output from an input in deep learning can be represented by a flow graph, in which each node represents a basic computation and the value that computation produces, and the result of the computation is applied to the values of that node's child nodes. Deep learning imitates the layer-by-layer, progressively abstract process of human cognition: simple concepts are learned first and then used to represent more abstract ideas and concepts. This approach has been applied successfully to fields such as computer vision and speech recognition. Although the application of deep learning methods to natural language processing has received great attention in recent years, most work focuses on model design and lacks the introduction of knowledge.
As for techniques for representing text content, traditional representation learning of self-media text is mostly based on bag-of-words models with word representations such as one-hot encoding, which inevitably leads to the "lexical gap" phenomenon: even semantically similar words are mutually orthogonal in the vector representation. Although these methods are reasonably effective for traditional text, they suffer from severe data sparsity when applied to self-media text representation. Traditional methods generally use hand-crafted features for feature extraction in representation learning of self-media text content, but such methods depend on human experience, and some professional domains require corresponding experts to build knowledge bases before the text of such self-media data can be represented well. Various text analysis methods exist in the prior art, but most of them analyze common or partly domain-specific self-media text content, usually fit the data with only simple text encodings, and lack a description of the data distribution, which leads to problems such as inaccurate text representation.
Invention content
It is an object of the present invention to provide a self-media data text representation method based on a recursive variational autoencoder model that has high coding performance, adapts well to the coded representation of self-media data, and can describe the distribution of the data while fitting its content.
The present invention provides a self-media data text representation method based on a recursive variational autoencoder model, wherein the method includes the following steps:
Step S100: preprocess the input corpus text to obtain an encoded sequence;
Step S200: encode the encoded sequence with a recursive neural network encoding model to generate a text vector of fixed dimension;
Step S300: generate a mean vector and a variance vector from the fixed-dimension text vector, then draw a sample from the standard normal distribution and generate a latent coded representation z from the mean vector, the variance vector, and the sample by means of variational inference;
Step S400: decode the latent coded representation z with a recursive neural network decoding model to obtain a decoded sequence, compute the coding loss between the encoded sequence and the decoded sequence as well as the divergence between the latent coded representation z and the standard normal distribution, and update the parameters of the recursive variational autoencoder model with the coding loss and the divergence.
Preferably, preprocessing the input corpus text in step S100 includes the following steps:
Step S110: filter each input corpus text, remove its tags, punctuation marks, and links, and segment the content of the corpus text into words to generate a text T;
Step S120: count the words in the corpus text, build a dictionary of the words in the corpus text, and initialize a vector for each word, where the dimension of the initialization vectors is set according to experimental performance;
Step S130: perform dependency structure analysis on the text T and serialize the resulting structure to obtain the encoded sequence.
Preferably, step S130 further includes:
analyzing the content of the text T with the Stanford dependency parser to generate a dependency tree structure;
serializing the dependency tree as a binary tree to obtain the encoded sequence.
Preferably, when the encoded sequence is encoded with the recursive neural network encoding model in step S200, the word vectors used for encoding include the initialization vectors and/or pre-trained word vectors.
Preferably, encoding the encoded sequence with the recursive neural network encoding model in step S200 to generate the text vector of fixed dimension includes the following steps:
S210: select two child nodes c1 and c2, and generate the first parent node p1 from c1 and c2;
S220: form a new child pair from the generated parent node p1 and a word in the encoded sequence to generate the second parent node p2;
S230: continue encoding recursively as in step S220, each time generating a new parent node from a parent node and a word in the encoded sequence, until all words in the encoded sequence have been encoded; wherein,
during encoding, the encoding weight We is shared across all encoding steps, so that the text encoding produced is represented as a vector of the fixed dimension.
Preferably, the mean vector and the variance vector are generated in step S300 by mappings of the same form.
Preferably, step S300 includes:
obtaining a variable for generating the latent coded representation z, the distribution of the variable being represented by the standard normal distribution and used for the divergence computation during model training;
multiplying the variable by the variance vector, then summing the resulting product with the mean vector to obtain the latent coded representation z.
Preferably, the decoding of the latent coded representation z in step S400 includes the following steps:
S410: on the basis of the coded representation z, generate an input vector x whose dimension is twice that of the coded representation z, where one part of the input vector x is a child node c and the other part is the parent node p used for decoding;
S420: continue decoding the parent node p to obtain new child nodes c1' and p1', where p1' is the new parent node used for decoding;
S430: continue decoding recursively as in step S420, each time using a new child node as the parent node for the next decoding step, until a decoded sequence of the same length as the encoded sequence is generated.
Preferably, the coding loss between the decoded sequence and the encoded sequence is computed in step S400 by Euclidean distance.
Preferably, the parameters of the recursive variational autoencoder model are updated in step S400 by the backpropagation algorithm.
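A minimal sketch of how the coding loss and the divergence of step S400 might combine into one training objective follows; the toy numbers, the squared-Euclidean form of the loss, and the log-variance parameterization are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

# Toy encoded sequence and its reconstruction (2 words, 2 dimensions each).
x  = np.array([[1.0, 0.0], [0.0, 1.0]])      # word vectors of the encoded sequence
xr = np.array([[0.9, 0.1], [0.1, 0.8]])      # word vectors of the decoded sequence
recon = np.sum((x - xr) ** 2)                # squared Euclidean coding loss

# Divergence between N(mu, sigma^2) and the standard normal N(0, I),
# in closed form, using a log-variance parameterization for stability.
mu      = np.array([0.1, -0.2])
log_var = np.array([-0.1, 0.05])
kl = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

loss = recon + kl     # the scalar that backpropagation differentiates
print(loss > 0.0)
```

Backpropagation would then update the shared encoding and decoding weights with the gradient of this scalar.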
The present invention has the following advantages and beneficial effects:
1. In representing text content, the self-media data text representation method based on the recursive variational autoencoder model of the present invention overcomes the representation problems caused by the lack of context that traditional methods suffer from when representing self-media text content, and introduces heuristic knowledge into the representation of text content through existing text processing tools, improving the performance of the text representation.
2. The method of the present invention uses a recursive neural network encoding model that can not only encode text content sequentially but can also encode text content with a tree structure, effectively avoiding the shortcoming of conventional methods that can only encode text content sequentially; it better represents the text in combination with its real structure, so that the structure of the coded representation better matches actual needs.
3. The method of the present invention uses variational inference to better embody the process by which deep learning methods model the true distribution of the data.
4. The method of the present invention uses an unfolding recursive neural network decoding model that can reconstruct the input text content, measures the coding performance of the model by means such as Euclidean distance computation, and optimizes the model's representation of self-media text content with new model parameters.
5. The method of the present invention introduces the standard normal distribution and computes the mean vector and variance vector of the input text to obtain the latent coded representation z; this latent coded representation z accumulates knowledge such as word-vector knowledge and text structure, satisfies a definite distribution, and its vector dimension can be set as needed. It contains more feature information than a traditional recursive coding vector, which benefits the representation and computation of text.
6. The method of the present invention can update the parameters of the recursive variational autoencoder model with the coding loss and the divergence, thereby optimizing the model, fitting the training data better, and improving coding performance.
Description of the drawings
The drawings used in this application are briefly described below; it should be clear that these drawings serve only to explain the design of the present invention.
Fig. 1 is a flowchart of the self-media data text representation method based on the recursive variational autoencoder model of the present invention;
Fig. 2 is a flowchart of the text dependency structure obtained with the dependency parser in the method of the present invention;
Fig. 3 is a structural diagram of encoding with the recursive neural network encoding model in the method of the present invention;
Fig. 4 is a flowchart of generating the mean vector and the variance vector in the method of the present invention;
Fig. 5 is a flowchart of sampling from the standard normal distribution and generating the latent coded representation in the method of the present invention;
Fig. 6 is a structural diagram of the recursive variational autoencoder model in the method of the present invention.
Specific implementation mode
Hereinafter, embodiments of the self-media data text representation method based on the recursive variational autoencoder model of the present invention are described with reference to the drawings.
The embodiments recorded herein are specific concrete implementations of the present invention, serve to illustrate its design, and are explanatory and exemplary; they should not be construed as limiting the embodiments of the present invention or its scope. Beyond the embodiments recorded herein, those skilled in the art can also adopt other obvious technical solutions based on the claims of this application and what the specification discloses, including technical solutions that make any obvious substitution or modification of the embodiments recorded herein.
The drawings of this specification are schematic diagrams that aid in illustrating the design of the present invention; they schematically indicate the shape of each part and its interrelations. Note that, for the sake of clearly showing the structure of each part of the embodiments of the present invention, the drawings are not necessarily drawn to the same scale. The same or similar reference markers are used to indicate the same or similar parts.
Referring to Fig. 1, the present invention provides a self-media data text representation method based on a recursive variational autoencoder model, wherein the method includes the following steps:
Step S100: preprocess the input corpus text to obtain an encoded sequence;
Step S200: encode the encoded sequence with a recursive neural network encoding model to generate a text vector of fixed dimension;
Step S300: generate a mean vector and a variance vector from the fixed-dimension text vector, then draw a sample from the standard normal distribution and generate a latent coded representation z from the mean vector, the variance vector, and the sample by means of variational inference;
Step S400: decode the latent coded representation z with a recursive neural network decoding model to obtain a decoded sequence, compute the coding loss between the encoded sequence and the decoded sequence as well as the divergence between the latent coded representation z and the standard normal distribution, and update the parameters of the recursive variational autoencoder model with the coding loss and the divergence.
The latent coded representation z computed by the method of the present invention accumulates knowledge such as word-vector knowledge and text structure, and it satisfies a definite distribution; moreover, the dimension of the vector can be set according to actual needs, so compared with a traditional recursive coding vector it contains more feature information, which benefits the representation and computation of text, reduces the coding dimension, and improves computational efficiency. In addition, by computing the coding loss between the encoded sequence and the decoded sequence via the latent code, as well as the divergence between the latent coded representation z and the standard normal distribution, and automatically updating the parameters of the recursive variational autoencoder model with the coding loss and the divergence, the method effectively improves the coding performance of the model; for different input texts, the recursive variational autoencoder model automatically updates its parameters according to the text content, so that different texts are represented accurately.
Further, preprocessing the input corpus text in step S100 of the present invention includes the following steps:
Step S110: filter each input corpus text, remove its tags, punctuation marks, and links, and segment the content of the corpus text into words to generate a text T;
Step S120: count the words in the corpus text, build a dictionary of the words in the corpus text, and initialize a vector for each word in each corpus text, where the dimension of the initialization vectors is set according to experimental performance;
Step S130: perform dependency structure analysis on the text T and serialize the resulting structure to obtain the encoded sequence.
Further, in step S130 the content of the text T is analyzed with the Stanford dependency parser to generate a dependency tree structure, and the dependency tree is serialized as a binary tree to obtain the encoded sequence. Analyzing the structure of the text overcomes the shortcoming of conventional methods that can only encode text content sequentially, better represents the text in combination with its real structure, and better matches actual needs.
Fig. 2 is a flowchart of the text dependency structure obtained with the dependency parser in the self-media data text representation method based on the recursive variational autoencoder model of the present invention. The method of the present invention is further illustrated below with reference to Fig. 2 and a specific embodiment.
Fig. 2 shows the process of running the input self-media text content "My cat also likes eating fish and hamburger" through the dependency parser. After the input self-media text passes through the dependency parser, a dependency tree structure of the text is produced: the word "likes" in the text connects the two parts "My cat" and "eating fish and hamburger"; the adverb "also" modifies the verb "likes"; "My cat" consists of the words "My" and "cat"; "eating fish and hamburger" can be further split into the two parts "eating" and "fish and hamburger"; and "fish" and "hamburger" form a coordinate structure through the conjunction "and". Through this dependency analysis tool, the structure of self-media text can be represented explicitly using the knowledge of external resources, and encoding proceeds from this explicit representation. A structure of this kind visually depicts the dependency relations among the words, expressing the syntactic matching relations between words; these matching relations are associated with semantics, making the context of the coded representation more coherent.
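The structure described above can be sketched as nested data; the exact tree layout and the post-order serialization below are illustrative assumptions standing in for the parser output and the binary-tree serialization of step S130.

```python
# Dependency tree of "My cat also likes eating fish and hamburger",
# written as (head_word, [dependent subtrees]); a hypothetical sketch.
tree = ("likes", [
    ("cat", [("My", [])]),                           # "My cat" - subject
    ("also", []),                                    # adverb modifying "likes"
    ("eating", [                                     # object part
        ("fish", [("and", []), ("hamburger", [])]),  # coordination via "and"
    ]),
])

def serialize(node):
    """Post-order traversal: dependents first, then the head, yielding the
    word order that the binary recursive encoder would consume."""
    head, children = node
    out = []
    for child in children:
        out += serialize(child)
    out.append(head)
    return out

print(serialize(tree))
```

The traversal order is one of several reasonable choices; any serialization that expresses the tree as nested binary combinations would serve the same role.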
Further, when the encoded sequence is encoded with the recursive neural network encoding model in step S200, the word vectors used for encoding include the initialization vectors and/or pre-trained word vectors; heuristic knowledge can thus be introduced, reducing the amount of encoding computation and improving coding efficiency.
Specifically, encoding the encoded sequence with the recursive neural network encoding model in step S200 to generate the text vector of fixed dimension includes the following steps:
S210: select two child nodes c1 and c2, and generate the first parent node p1 from c1 and c2;
S220: form a new child pair from the generated parent node p1 and a word in the encoded sequence to generate the second parent node p2;
S230: continue encoding recursively as in step S220, each time generating a new parent node from a parent node and a word in the encoded sequence, until all words in the encoded sequence have been encoded; wherein,
during encoding, the encoding weight We is shared across all encoding steps, so that the text encoding produced is represented as a vector of fixed dimension.
Fig. 3 shows the process of encoding self-media text content; here the encoding procedure is described by encoding the input sequence x = w1, w2, …, w4 with the recursive neural network. The encoding structure first concatenates the input word vectors w1 and w2 into a child-node vector [c1; c2] of dimension 2n — note that (w1, w2) = (c1, c2) — and then uses the formula p = f(We[c1; c2] + be) to compute the parent node p1 = f(We[w1; w2] + be). Next, w3 and the computed p1 are combined into a new [c1; c2], i.e. (c1, c2) = (p1, w3), and the formula p = f(We[c1; c2] + be) is applied again to compute the parent node p2 = f(We[p1; w3] + be); then p3 = f(We[p2; w4] + be) computes the parent node p3, and the recursion continues in this way until all words in the encoded sequence have been encoded. Because the recursive encoding model represents text through such binary combinations, the text must be expressed as a binary structure in some definite manner; performing dependency structure analysis on the text in step S130 is exactly the process of expressing the sequential organization of the text as a hierarchical structure, which broadens the applicability of the model of the method of the present invention.
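The encoding recursion of Fig. 3 can be sketched in NumPy as follows; the toy dimension, the random initialization, and the choice of tanh for f are illustrative assumptions. Each merge applies p = f(We[c1; c2] + be) with the shared weight We, so a sequence of any length collapses into one fixed-dimension vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n   = 4                                       # toy word-vector dimension
W_e = rng.normal(0.0, 0.1, size=(n, 2 * n))   # shared encoding weight We
b_e = np.zeros(n)                             # encoding bias be

def encode(words):
    """Fold the sequence: p1 = f(We[w1; w2] + be),
    then pk = f(We[p_{k-1}; w_{k+1}] + be) until all words are consumed."""
    p = np.tanh(W_e @ np.concatenate([words[0], words[1]]) + b_e)
    for w in words[2:]:
        p = np.tanh(W_e @ np.concatenate([p, w]) + b_e)
    return p

words = [rng.normal(size=n) for _ in range(4)]   # w1..w4, as in Fig. 3
text_vec = encode(words)
print(text_vec.shape)
```

Because We and be are reused at every merge, the parameter count is independent of sentence length, which is what makes the weight sharing of step S230 possible.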
Further, the mean vector and the variance vector are generated in step S300 by mappings of the same form.
Figs. 4 and 5 show the process of performing variational inference on the obtained coded representation. The generated latent representation z must satisfy the condition of obeying the distribution N(μ, σ), where μ denotes the generated mean vector and σ the generated variance vector; the process of generating the mean vector and the variance vector is shown in Fig. 4. As shown in Fig. 5, the latent coded representation is generated by z = μ + ε·σ, where ε ~ N(0, I). A variable for generating the latent coded representation z is obtained; its distribution is represented by the standard normal distribution and is used for the divergence computation during model training. The variable is multiplied by the variance vector, and the resulting product is summed with the mean vector to obtain the latent coded representation z. That is, Figs. 4 and 5 describe the reparameterization of the coded representation by variational inference: because the generated coded representation z obeys the distribution N(μ, σ), the resulting code is distributed over a region rather than a single point, which describes the distribution of the data better.
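The reparameterization of Figs. 4 and 5 can be sketched as follows. The patent gives z = μ + ε·σ with ε ~ N(0, I); the linear mappings producing μ and σ and the log-variance form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
h = rng.normal(size=n)                 # fixed-dimension text vector from the encoder

# Two mappings of the same form produce the mean and (log-)variance vectors.
W_mu, W_s = rng.normal(0.0, 0.1, (n, n)), rng.normal(0.0, 0.1, (n, n))
mu      = W_mu @ h                     # mean vector
log_var = W_s @ h                      # log-variance, so sigma stays positive
sigma   = np.exp(0.5 * log_var)

eps = rng.standard_normal(n)           # sample drawn from N(0, I)
z   = mu + eps * sigma                 # latent coded representation z

# Divergence between N(mu, sigma^2) and N(0, I) used during training.
kl = 0.5 * np.sum(mu**2 + sigma**2 - log_var - 1.0)
print(z.shape, kl >= 0.0)
```

Sampling ε (rather than z directly) is what lets the gradient flow through μ and σ during backpropagation, which is the point of the reparameterization.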
Specifically, the decoding of the latent coded representation z in step S400 includes the following steps: S410: on the basis of the coded representation z, generate an input vector x whose dimension is twice that of the coded representation z, where one part of the input vector x is a child node c and the other part is the parent node p used for decoding; S420: continue decoding the parent node p to obtain new child nodes c1' and p1', where p1' is the new parent node used for decoding; S430: continue decoding recursively as in step S420, each time using a new child node as the parent node for the next decoding step, until a decoded sequence of the same length as the encoded sequence is generated.
Fig. 6 is a structural diagram of the recursive variational autoencoder model of the self-media data text representation method of the present invention. As the figure shows, after obtaining the latent coded representation z, the method of the present invention converts the generated latent coded representation z into an input representation for decoding. For example, if the dimension of the word vectors of the self-media text content is 100 and the vector dimension of the generated coded representation z is 50, then z must be converted by a neural network into a 100-dimensional vector representation. After this transformation, the coded representation p3' for generating child nodes is obtained. Taking the same four-word input used for encoding as the example for decoding: first, p3' generates a 200-dimensional vector through the decoding matrix Wd; this vector is divided into two parts, the first 100 dimensions being the decoded w4' and the last 100 dimensions being the parent node p2' for subsequent decoding. Through the parent node p2', w3' and the parent node p1' are generated in turn, and then w2' and w1' are generated from that parent node, completing the decoding process of the model. The coding loss between the decoded sequence and the encoded sequence is computed by Euclidean distance, and the parameters of the recursive variational autoencoder model are updated and the model optimized by the backpropagation algorithm. Through the encoding and decoding of the model, the encoding of the text input and the reconstruction of the text input can be completed, realizing an unsupervised representation of self-media text content; owing to this unsupervised characteristic, the method adapts better to the coded representation of self-media data.
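The unfolding decoder of Fig. 6 can be sketched in the same style; the toy dimensions, the initialization, and the tanh nonlinearity are illustrative assumptions. Each step pushes the current parent through the shared decoding matrix Wd and splits the 2n-dimensional output into a reconstructed word and the next parent.

```python
import numpy as np

rng = np.random.default_rng(2)
n   = 4
W_d = rng.normal(0.0, 0.1, size=(2 * n, n))   # shared decoding matrix Wd
b_d = np.zeros(2 * n)

def decode(z, length):
    """Unfold z into `length` word vectors: each step yields (word, next parent),
    mirroring the encoder's binary combinations in reverse."""
    words, p = [], z
    for _ in range(length - 1):
        out = np.tanh(W_d @ p + b_d)
        words.append(out[:n])      # first half: reconstructed word (e.g. w4')
        p = out[n:]                # second half: parent for the next step (e.g. p2')
    words.append(p)                # in this sketch the last parent stands for the final word
    return words[::-1]             # reverse so the order matches the input w1..w4

decoded = decode(rng.normal(size=n), 4)
print(len(decoded))
```

The Euclidean distance between these reconstructed word vectors and the encoder's inputs would then serve as the coding loss described above.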
The method of the present invention encodes the input self-media text with the recursive neural network encoding model and the recursive neural network decoding model, computes the latent coded representation z, and then, by decoding the latent coded representation z, computing the coding loss and the divergence between the latent coded representation z and the standard normal distribution, and updating the parameters of the recursive variational autoencoder model with this coding loss and divergence, improves the coding performance of the model. Moreover, the model can generate different latent coded representations z for different input texts, thereby achieving an accurate coded representation of each input text.
The embodiments of the self-media data text representation method based on the recursive variational autoencoder model of the present invention have been described above. The specific features of the method can be designed concretely according to the effects of the features disclosed above, and such designs are realizable by those skilled in the art. Moreover, the technical features disclosed above are not limited to the disclosed combinations with other features; those skilled in the art can also make other combinations of the technical features according to the purpose of the present invention, so long as the purpose of the present invention is achieved.
Claims (10)
1. A self-media data text representation method based on a recursive variational autoencoder model, wherein the method includes the following steps:
Step S100: preprocess the input corpus text to obtain an encoded sequence;
Step S200: encode the encoded sequence with a recursive neural network encoding model to generate a text vector of fixed dimension;
Step S300: generate a mean vector and a variance vector from the fixed-dimension text vector, then draw a sample from the standard normal distribution and generate a latent coded representation z from the mean vector, the variance vector, and the sample by means of variational inference;
Step S400: decode the latent coded representation z with a recursive neural network decoding model to obtain a decoded sequence, compute the coding loss between the encoded sequence and the decoded sequence as well as the divergence between the latent coded representation z and the standard normal distribution, and update the parameters of the recursive variational autoencoder model with the coding loss and the divergence.
2. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein preprocessing the input corpus text in step S100 includes the following steps:
Step S110: filter each input corpus text, remove its tags, punctuation marks, and links, and segment the content of the corpus text into words to generate a text T;
Step S120: count the words in the corpus text, build a dictionary of the words in the corpus text, and initialize a vector for each word in each corpus text, where the dimension of the initialization vectors is set according to experimental performance;
Step S130: perform dependency structure analysis on the text T and serialize the resulting structure to obtain the encoded sequence.
3. The self-media data text representation method based on a recursive variational autoencoder model according to claim 2, wherein step S130 further comprises:
performing textual content analysis on the text T with the Stanford dependency parser to generate a dependency tree structure;
serializing the dependency tree structure as a binary tree to obtain the coding sequence.
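The parse-and-serialize step above can be illustrated with a small sketch. The claim names the Stanford dependency parser; here a hard-coded toy dependency tree stands in for the parser output, and the post-order traversal of the binarized tree is an assumption (the claim fixes only that the tree is serialized as a binary tree), so `binarize` and `serialize` are hypothetical helper names:

```python
# Sketch of step S130 on a toy dependency tree. The Stanford dependency
# parser is replaced by a hard-coded head -> dependents map; the post-order
# traversal order is an assumption, not taken from the patent.

def binarize(node, children):
    """Recursively turn an n-ary dependency tree into a binary tree."""
    kids = children.get(node, [])
    if not kids:
        return node
    left = binarize(kids[0], children)
    for kid in kids[1:]:
        left = (left, binarize(kid, children))
    return (left, node)  # attach the head word last

def serialize(tree):
    """Post-order serialization of the binary tree into a coding sequence."""
    if isinstance(tree, tuple):
        return serialize(tree[0]) + serialize(tree[1])
    return [tree]

# Toy parse of "the cat sat": head "sat" governs "cat", which governs "the".
children = {"sat": ["cat"], "cat": ["the"]}
coding_sequence = serialize(binarize("sat", children))
print(coding_sequence)  # word order induced by the dependency structure
```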
4. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein, when the coding sequence is encoded with the recursive neural network encoding model in step S200, the word vectors used for encoding comprise the initialization vectors and/or pre-trained word vectors.
5. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein encoding the coding sequence with the recursive neural network encoding model in step S200 to generate the text vector of fixed dimension comprises the following steps:
S210: selecting two child nodes c1 and c2, and generating a first parent node p1 from c1 and c2;
S220: forming a new pair of child nodes from the generated parent node p1 and a word in the coding sequence to generate a second parent node p2;
S230: encoding recursively as in step S220, generating a new parent node each time from one parent node and one word of the coding sequence, until all words in the coding sequence have been encoded; wherein,
during encoding, the encoding weight We is shared across every encoding step, so that the text encoding produced is a vector of the fixed dimension.
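Steps S210 to S230 describe a recursive-autoencoder-style fold with a single shared weight We. A minimal numpy sketch, assuming a tanh nonlinearity, a toy dimension d = 8, and random toy word vectors (the activation and the names `W_e`, `encode_pair`, `encode_sequence` are illustrative assumptions, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                          # illustrative vector dimension
W_e = rng.standard_normal((d, 2 * d)) * 0.1    # shared encoding weight We

def encode_pair(c1, c2):
    """Merge two child vectors into one parent vector (steps S210/S220)."""
    return np.tanh(W_e @ np.concatenate([c1, c2]))

def encode_sequence(word_vectors):
    """Step S230: fold the whole coding sequence into one fixed-size vector."""
    parent = encode_pair(word_vectors[0], word_vectors[1])
    for w in word_vectors[2:]:
        parent = encode_pair(parent, w)        # the same W_e at every step
    return parent

words = [rng.standard_normal(d) for _ in range(5)]
text_vector = encode_sequence(words)
print(text_vector.shape)  # (8,) — fixed dimension regardless of text length
```

Because the same W_e is applied at every merge, the output dimension stays d no matter how many words the coding sequence contains, which is what makes the representation fixed-dimensional.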
6. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein the mean vector and the variance vector are generated in step S300 by mappings of the same form.
7. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein step S300 comprises:
collecting, from the standard normal distribution, a variable used for generating the latent coded representation z, wherein the distribution of the variable indicates that it is used in the divergence computation during model training;
multiplying the variable by the variance vector, and summing the resulting product with the mean vector to obtain the latent coded representation z.
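Claim 7, together with claim 6, amounts to the reparameterization trick of variational inference. A sketch assuming linear mean/variance heads and a predicted log-variance for numerical stability (the claim says only that the sampled variable is multiplied by the variance vector and summed with the mean; the log-variance parameterization and the names `W_mu`, `W_var` are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
text_vector = rng.standard_normal(d)           # stand-in for the S200 output

# Mean and variance heads; using two linear maps of the same form is an
# assumption consistent with claim 6.
W_mu = rng.standard_normal((d, d)) * 0.1
W_var = rng.standard_normal((d, d)) * 0.1
mu = W_mu @ text_vector
log_var = W_var @ text_vector                  # predict log-variance for stability

# Claim 7: sample eps ~ N(0, I), multiply by the deviation term derived from
# the variance vector, then sum with the mean vector.
eps = rng.standard_normal(d)
z = mu + np.exp(0.5 * log_var) * eps
print(z.shape)  # (8,)
```

Sampling eps outside the network keeps z a deterministic, differentiable function of mu and log_var, so the gradient of the training loss can flow back through the sampling step.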
8. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein the process of decoding the latent coded representation z in step S400 comprises the following steps:
S410: generating, on the basis of the coded representation z, an input vector x whose dimension is twice that of the coded representation z, wherein one part of the input vector x is a child node c and the other part is a parent node p used for decoding;
S420: continuing to decode the parent node p to obtain new child nodes c1' and p1', wherein p1' is the new parent node used for decoding;
S430: decoding recursively as in step S420, with one new child node serving as the parent node of the next decoding step each time, until a decoding sequence of the same length as the coding sequence is generated.
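Steps S410 to S430 mirror the encoder in reverse: each parent vector is expanded into a vector of twice its dimension and split into a child and a new parent. A sketch under the same toy assumptions as the encoding sketch (a shared decoding weight `W_d` and a tanh nonlinearity, both assumed rather than specified by the patent):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W_d = rng.standard_normal((2 * d, d)) * 0.1    # shared decoding weight (assumed name)

def decode_step(parent):
    """Steps S410/S420: expand one parent vector into a child and a new parent."""
    out = np.tanh(W_d @ parent)                # dimension 2*d, twice the parent's
    return out[:d], out[d:]                    # (child c, new parent p')

def decode(z, length):
    """Step S430: decode recursively until `length` child vectors are produced."""
    children, parent = [], z
    while len(children) < length:
        c, parent = decode_step(parent)
        children.append(c)
    return children

decoded = decode(rng.standard_normal(d), 5)
print(len(decoded), decoded[0].shape)  # 5 (8,)
```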
9. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein the coding loss between the decoding sequence and the coding sequence in step S400 is computed by the Euclidean distance.
10. The self-media data text representation method based on a recursive variational autoencoder model according to claim 1, wherein the parameters of the recursive variational autoencoder model in step S400 are updated by the back-propagation algorithm.
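Claims 9 and 10 fix the training objective: a Euclidean reconstruction (coding) loss plus the divergence of claim 1, minimized by back-propagation. A sketch of the loss computation on random toy data, using the standard closed-form KL divergence between a diagonal Gaussian and N(0, I) (the closed form is a conventional VAE choice; the patent itself only names "the divergence"):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 8
coding_seq = rng.standard_normal((n, d))       # toy stand-in for the coding sequence
decoding_seq = rng.standard_normal((n, d))     # toy stand-in for the decoding sequence
mu = rng.standard_normal(d)
log_var = rng.standard_normal(d)

# Claim 9: coding loss as the Euclidean distance between corresponding
# vectors of the coding sequence and the decoding sequence.
coding_loss = float(np.sum(np.linalg.norm(coding_seq - decoding_seq, axis=1)))

# KL divergence between the diagonal Gaussian N(mu, exp(log_var)) and the
# standard normal N(0, I), in its usual closed form.
kl = float(0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

total_loss = coding_loss + kl                  # objective minimized by backprop (claim 10)
print(total_loss >= 0.0)  # True: both terms are non-negative
```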
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711417351.2A CN108363685B (en) | 2017-12-25 | 2017-12-25 | Self-media data text representation method based on recursive variation self-coding model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108363685A true CN108363685A (en) | 2018-08-03 |
CN108363685B CN108363685B (en) | 2021-09-14 |
Family
ID=63010041
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711417351.2A Active CN108363685B (en) | 2017-12-25 | 2017-12-25 | Self-media data text representation method based on recursive variation self-coding model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363685B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1510931A1 (en) * | 2003-08-28 | 2005-03-02 | DVZ-Systemhaus GmbH | Process for platform-independent archiving and indexing of digital media assets |
CN101645786A (en) * | 2009-06-24 | 2010-02-10 | 中国联合网络通信集团有限公司 | Method for issuing blog content and business processing device thereof |
US9053431B1 (en) * | 2010-10-26 | 2015-06-09 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
CN105469065A (en) * | 2015-12-07 | 2016-04-06 | 中国科学院自动化研究所 | Recurrent neural network-based discrete emotion recognition method |
CN106844327A (en) * | 2015-12-07 | 2017-06-13 | 科大讯飞股份有限公司 | Text code method and system |
CN107220311A (en) * | 2017-05-12 | 2017-09-29 | 北京理工大学 | A kind of document representation method of utilization locally embedding topic modeling |
Non-Patent Citations (2)
Title |
---|
DIEDERIK P. KINGMA et al.: "Auto-Encoding Variational Bayes", arXiv:1312.6114v10 [stat.ML], 1 May 2014 *
Anonymous: "[Learning Notes] Variational Auto-Encoder (VAE)", https://blog.csdn.net/jackytintin/article/details/53641885 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109213975A (en) * | 2018-08-23 | 2019-01-15 | 重庆邮电大学 | It is a kind of that special document representation method is pushed away from coding based on character level convolution variation |
CN109213975B (en) * | 2018-08-23 | 2022-04-12 | 重庆邮电大学 | Twitter text representation method based on character level convolution variation self-coding |
CN109886388A (en) * | 2019-01-09 | 2019-06-14 | 平安科技(深圳)有限公司 | A kind of training sample data extending method and device based on variation self-encoding encoder |
WO2020143321A1 (en) * | 2019-01-09 | 2020-07-16 | 平安科技(深圳)有限公司 | Training sample data augmentation method based on variational autoencoder, storage medium and computer device |
CN109886388B (en) * | 2019-01-09 | 2024-03-22 | 平安科技(深圳)有限公司 | Training sample data expansion method and device based on variation self-encoder |
CN111581916A (en) * | 2020-05-15 | 2020-08-25 | 北京字节跳动网络技术有限公司 | Text generation method and device, electronic equipment and computer readable medium |
CN111581916B (en) * | 2020-05-15 | 2022-03-01 | 北京字节跳动网络技术有限公司 | Text generation method and device, electronic equipment and computer readable medium |
CN113379068A (en) * | 2021-06-29 | 2021-09-10 | 哈尔滨工业大学 | Deep learning architecture searching method based on structured data |
CN113379068B (en) * | 2021-06-29 | 2023-08-08 | 哈尔滨工业大学 | Deep learning architecture searching method based on structured data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106202010B (en) | Method and apparatus for constructing legal-text syntax trees based on deep neural networks | |
CN110321417B (en) | Dialog generation method, system, readable storage medium and computer equipment | |
CN108830287A (en) | Chinese image semantic description method using a residual-connected Inception network fused with multi-layer GRU | |
CN107203511A (en) | A network-text named entity recognition method based on neural network probabilistic disambiguation | |
CN108363695B (en) | User comment attribute extraction method based on bidirectional dependency syntax tree representation | |
CN109359297B (en) | Relationship extraction method and system | |
CN109213975B (en) | Twitter text representation method based on character level convolution variation self-coding | |
CN104598611B (en) | Method and system for ranking search entries | |
CN107748757A (en) | A question answering method based on knowledge graphs | |
CN108363685A (en) | Self-media data text representation method based on a recursive variational autoencoder model | |
CN107273913B (en) | Short text similarity calculation method based on multi-feature fusion | |
CN110609899A (en) | Specific target emotion classification method based on improved BERT model | |
CN107608953B (en) | Word vector generation method based on variable-length context | |
CN110851575B (en) | Dialogue generating system and dialogue realizing method | |
CN112949647B (en) | Three-dimensional scene description method and device, electronic equipment and storage medium | |
CN113254610B (en) | Multi-turn dialogue generation method for patent consultation | |
CN113435211B (en) | Text implicit emotion analysis method combined with external knowledge | |
CN112417289B (en) | Information intelligent recommendation method based on deep clustering | |
CN111581966A (en) | Aspect-level sentiment classification method and device with context feature fusion | |
US20200334410A1 (en) | Encoding textual information for text analysis | |
CN111858940A (en) | Multi-head attention-based legal case similarity calculation method and system | |
CN114936287A (en) | Knowledge injection method for pre-training language model and corresponding interactive system | |
CN114528898A (en) | Scene graph modification based on natural language commands | |
CN111540470B (en) | Social network depression tendency detection model based on BERT transfer learning and training method thereof | |
CN113761220A (en) | Information acquisition method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||