CN110168542A - Electronic device for compressing language model, electronic device for providing recommended words, and operating methods thereof - Google Patents
Electronic device for compressing language model, electronic device for providing recommended words, and operating methods thereof
- Publication number
- CN110168542A (application number CN201880005774.XA)
- Authority
- CN
- China
- Prior art keywords
- matrix
- projection
- word
- sharing
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/041—Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
- G06F3/044—Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by capacitive means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/274—Converting codes to words; Guess-ahead of partial word inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
An electronic device for compressing a language model is provided. The electronic device includes: a storage configured to store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences; and a processor configured to: convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix; convert the transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix; and update the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
Description
Technical field
This disclosure relates in such as cognition, the judgement using the machine learning algorithms such as deep learning simulation human brain
Etc. in artificial intelligence (AI) system of functions in the application system electronic equipment of compression language model, for providing recommend word
Electronic equipment and its operating method, for example, be related to for based on for its execute recurrent neural network (RNN) training language
Model come compress language model electronic equipment, for provide recommend word electronic equipment and its operating method.
Background Art
An artificial intelligence (AI) system is a computer system that embodies intelligence comparable to human intelligence. Unlike existing rule-based smart systems, an AI system becomes intelligent by training itself and making determinations. The more an AI system is used, the more its recognition rate improves and the more accurately it comes to understand user preferences. Accordingly, rule-based smart systems are being replaced by AI systems based on deep learning.
AI technology may be configured with machine learning (deep learning) and element technologies using machine learning. Machine learning refers to an algorithmic technique that classifies and learns the characteristics of input data by itself, and element technology refers to technology that replicates functions of the human brain, such as cognition and judgment, using machine learning algorithms such as deep learning. Element technologies include technical fields such as language understanding, visual understanding, inference/prediction, knowledge representation, and operation control.
AI technology is used and applied in various fields. Language understanding technology may relate to recognizing and applying/processing human language/text, and may include natural language processing, machine translation, dialogue systems, question answering, speech recognition/synthesis, and the like. Visual understanding technology may relate to recognizing objects as human vision does, and may include object recognition, object tracking, image search, person recognition, scene understanding, space understanding, image enhancement, and the like. Inference and prediction technology may relate to judging, inferring, and predicting information, and may include knowledge/probability-based inference, optimized prediction, preference-based planning, recommendation, and the like. Knowledge representation technology may refer to processing human experience information into knowledge data, and may include knowledge building (data generation/classification), knowledge management (data utilization), and the like. Operation control technology may refer to controlling the autonomous driving of vehicles, the motion of robots, and the like, and may include motion control (navigation, collision avoidance, driving), manipulation control (behavior control), and the like.
For example, an AI system may learn various sentences, and the system may generate a language model according to the results of the learning. In addition, the AI system may provide new words that complete a sentence through a process similar to the process by which the generated language model was learned.
Such a language model may be generated by learning a large number of sentences, and the higher the dimensionality of the language model, the more its completeness can be improved. However, if the dimensionality of the language model becomes higher, the data volume of the language model may increase exponentially, and it may be difficult to use the language model in a device without sufficient storage space. In addition, if the dimensionality is reduced to generate a language model usable in a device without sufficient storage space, the performance may also be degraded. Therefore, a method for reducing the data volume while minimizing and/or reducing the performance degradation of a language model is needed.
Summary of the invention
[Technical Problem]
An aspect of the example embodiments of the present disclosure relates to an electronic device for compressing a language model for which RNN training has been performed, without performance degradation, an electronic device for providing recommended words based on the compressed language model, and operating methods thereof.
[Technical Solution]
According to an example embodiment, an electronic device is provided, including: a storage configured to store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences; and a processor configured to: convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix; convert the transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix; and update the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
The processor may calculate (determine) a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix through a test module; in response to the word perplexity being equal to or greater than a predetermined value, obtain a new shared matrix whose size is greater than the size of the shared matrix, and in response to the word perplexity being lower than the predetermined value, obtain a new shared matrix whose size is less than the size of the shared matrix; and recalculate the first projection matrix, the second projection matrix, and the shared matrix using the obtained new shared matrix.
The processor may calculate (determine) a reference word perplexity with respect to the embedding matrix and the softmax matrix through the test module, and determine the predetermined value based on the reference word perplexity.
The processor may recalculate (redetermine) the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix of the smallest size, from among a plurality of shared matrices, whose word perplexity is lower than the predetermined value, and generate a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
The processor may convert the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD); update the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training on them based on the basic data; and generate a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix whose elements have been updated.
The processor may obtain first data in which a first vector corresponding to a first word included in one of the plurality of sentences is mapped to a vector space based on a first random matrix; in response to input of a second word that follows the first word in the sentence, obtain second data in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix; generate third data based on the first data and the second data; obtain a recovery vector from the third data based on a second random matrix; and perform training by updating the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a third vector corresponding to a third word that follows the second word.
The processor may update the elements of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences, and store, in the storage, the first random matrix and the second random matrix whose elements have been updated based on the remaining sentences as the embedding matrix and the softmax matrix, respectively.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size.
According to an example embodiment, an electronic device for providing recommended words is provided, including: a storage configured to store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix used as a softmax matrix; and a processor configured to: in response to input of a first word, obtain first data in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; obtain a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and provide a recommended word based on the second vector.
In response to input of a second word after the input of the first word, the processor may obtain second data in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; generate third data based on the first data and the second data; obtain a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and provide a recommended word based on the fourth vector.
According to an example embodiment, an operating method of an electronic device for compressing a language model is provided, the electronic device storing a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences. The method includes: converting the embedding matrix into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix, and converting the transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix; and updating the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
The method may further include: calculating (determining) a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix through a test module; in response to the word perplexity being equal to or greater than a predetermined value, obtaining a new shared matrix whose size is greater than the size of the shared matrix, and in response to the word perplexity being lower than the predetermined value, obtaining a new shared matrix whose size is less than the size of the shared matrix; and recalculating the first projection matrix, the second projection matrix, and the shared matrix using the obtained new shared matrix.
The method may further include calculating (determining) a reference word perplexity with respect to the embedding matrix and the softmax matrix through the test module, and determining the predetermined value based on the reference word perplexity.
The method may further include recalculating (redetermining) the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix of the smallest size, from among a plurality of shared matrices, whose word perplexity is lower than the predetermined value, and generating a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
The method may further include: converting the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD); updating the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training on them based on the basic data; and generating a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix whose elements have been updated.
The method may further include: obtaining first data in which a first vector corresponding to a first word included in one of the plurality of sentences is mapped to a vector space based on a first random matrix; in response to input of a second word that follows the first word in the sentence, obtaining second data in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix; generating third data based on the first data and the second data; obtaining a recovery vector from the third data based on a second random matrix; and performing training by updating the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a third vector corresponding to a third word that follows the second word.
The method may further include updating the elements of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences, and storing, in the storage, the first random matrix and the second random matrix whose elements have been updated based on the remaining sentences as the embedding matrix and the softmax matrix, respectively.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size.
According to an example embodiment, an operating method of an electronic device for providing recommended words is provided, the electronic device storing a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix used as a softmax matrix. The method includes: in response to input of a first word, obtaining first data in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; obtaining a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and providing a recommended word based on the second vector.
The method may further include: in response to input of a second word after the input of the first word, obtaining second data in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix.
[Advantageous Effects]
According to one or more example embodiments, an electronic device can compress the data of a language model for which recurrent neural network (RNN) training has been performed, an electronic device with a relatively small storage space can store the compressed language model, and recommended words can be provided based on the compressed language model while minimizing and/or reducing performance degradation.
Detailed description of the invention
The above and other aspects, features, and attendant advantages of the present disclosure will become more readily apparent from the following detailed description taken in conjunction with the accompanying drawings, in which like reference numerals refer to like elements, and in which:
Figure 1A is a block diagram illustrating an example of an electronic device according to an example embodiment;
Figure 1B is a block diagram illustrating an example configuration of an electronic device according to an example embodiment;
Fig. 2 is a block diagram illustrating an electronic device according to another example embodiment;
Fig. 3A and Fig. 3B are diagrams illustrating examples of conventional RNN training;
Fig. 4 is a diagram illustrating a compression method according to an example embodiment;
Fig. 5 is a diagram illustrating the performance and compression ratio of a compressed language model according to an example embodiment;
Fig. 6 is a flowchart illustrating an example operating method for compressing a language model in an electronic device according to an example embodiment; and
Fig. 7 is a flowchart illustrating an example operating method of an electronic device for providing recommended words according to an example embodiment.
Detailed Description
The example embodiments of the present disclosure may be variously modified. Accordingly, specific example embodiments are illustrated in the drawings and described in detail in the detailed description. However, it is to be understood that the present disclosure is not limited to the specific example embodiments, but includes all modifications, equivalents, and substitutions without departing from the scope and spirit of the present disclosure. Also, well-known functions or constructions are not described in detail where they would obscure the disclosure with unnecessary detail.
Hereinafter, various example embodiments will be described in greater detail with reference to the accompanying drawings.
Figure 1A is a block diagram illustrating an electronic device 100 according to an example embodiment. As shown in Figure 1A, the electronic device 100 may include a storage 110 and a processor (e.g., including processing circuitry) 120.
The electronic device 100 may be capable of performing artificial intelligence training. For example, the electronic device 100 may be implemented as, for example and without limitation, a desktop PC, a laptop computer, a smartphone, a tablet PC, a server, or the like. The electronic device 100 may also refer to a system in which a cloud computing environment is established. However, the example embodiments are not limited thereto, and the electronic device 100 may be implemented as any device capable of performing artificial intelligence training.
The storage 110 may store a language model. A language model may be, for example, data created by modeling the language actually used by users, such as sentences and phrases. Using a language model, the most suitable recommended word may be provided as the word that follows sequentially input words.
The storage 110 may store basic data including a plurality of sentences. The basic data may be the data required to generate the language model. In other words, the language model may be generated by performing training on the basic data.
The storage 110 may store the language model before compression. The storage 110 may also store the language model compressed by the processor 120, as described in greater detail below.
The storage 110 may store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences. An RNN may be a kind of deep learning model for learning data that changes over time, such as time-series data. The RNN training method is described in greater detail below together with the embedding matrix and the softmax matrix.
The processor 120 may include various processing circuitry and control the overall operation of the electronic device 100.
According to an example embodiment, the processor 120 may be implemented as, for example and without limitation, a digital signal processor (DSP), a microprocessor, a time controller (TCON), or the like, but is not limited thereto. The processor 120 may be, for example and without limitation, one or more of: a dedicated processor, a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a communication processor (CP), an ARM processor, or the like, or may be defined as one of the above. In addition, the processor 120 may be implemented as a system on chip (SoC) in which a processing algorithm is provided, or in the form of a field programmable gate array (FPGA) or the like, but is not limited thereto.
The processor 120 may generate the language model before compression and store the language model in the storage 110. The processor 120 may also receive a language model before compression generated by an external device and store the language model in the storage 110. For ease of description, in addition to describing the embedding matrix and the softmax matrix, the method for generating a language model through RNN training is described first, followed by the method for compressing the language model.
The processor 120 may obtain first data in which a first vector corresponding to a first word included in one of the plurality of sentences stored in the storage 110 is mapped to a vector space based on a first random matrix. For example, one of the sentences may be "I am a boy", and the first word may be "I".
The vector corresponding to a word may have a size of "1 × m", and "m" may be determined based on the kinds of words included in the basic data. For example, if there are 15,000 different words, the size of the vector may be "1 × 15000". In addition, only one of the values of the 15,000 columns of the vector may be "1" while the values of the remaining columns are 0, and the word may be determined based on the position of the column whose value is "1". For example, if the value of the first column among the 15,000 columns is "1", the vector may indicate the word "I", and if the value of the second column among the 15,000 columns is "1", it may indicate the word "you". The processor 120 may obtain the first vector corresponding to the first word by the above method.
The first random matrix may have a size of "m × n", may have random elements, and may be used to map the first vector to an n-dimensional vector space. In other words, the processor 120 may obtain the first data, in which the first vector is mapped to the n-dimensional vector space, by multiplying the first vector by the first random matrix.
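By way of illustration only, this mapping may be sketched as follows. This is a minimal NumPy sketch, not part of the original disclosure; the vocabulary size of 15,000, the 600-dimensional space, and the word index are assumed for illustration:

```python
import numpy as np

m, n = 15000, 600                # assumed vocabulary size and vector-space dimension

first_random_matrix = np.random.randn(m, n).astype(np.float32)

word_index = 0                   # assume the word "I" corresponds to the first column
first_vector = np.zeros((1, m), dtype=np.float32)   # 1 x m one-hot vector
first_vector[0, word_index] = 1.0

# Multiplying the 1 x m vector by the m x n random matrix maps the word
# to the n-dimensional vector space, yielding the "first data".
first_data = first_vector @ first_random_matrix     # shape (1, n)
```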
If a second word that is included in the same sentence and follows the first word is input, the processor 120 may obtain second data in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix. In the above example, the processor 120 may obtain the second data, in which the second vector corresponding to the second word "am" is mapped to the n-dimensional vector space, by multiplying the second vector by the first random matrix.
The processor 120 may generate third data based on the first data and the second data. For example, the processor 120 may generate the third data based on the first data and the second data by a long short-term memory (LSTM) method. The LSTM method is a known technique, and a detailed description thereof is therefore not provided. The processor 120 may also generate the third data based on a weighted sum of the first data and the second data.
The processor 120 may obtain a recovery vector from the third data based on a second random matrix. The second random matrix may have a size of "n × m", may have random elements, and may be used to restore data mapped to the n-dimensional space back into a vector. Accordingly, the size of the first random matrix may be the same as the size of the transposed matrix of the second random matrix. The recovery vector may have a size of "1 × m". The value of each column may be between 0 and 1, and the sum of the values of all columns may be "1".
The processor 120 may perform training by updating the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a third vector corresponding to a third word that follows the second word. In the above example, the third word may be "a", and the processor 120 may update the elements of the first random matrix and the second random matrix so that the recovery vector is restored to the third vector corresponding to the third word "a".
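A minimal sketch of one such training step follows, assuming a simple weighted sum in place of the LSTM step and cross-entropy as the measure of the difference between the recovery vector and the third vector; the word indices and the loss formulation are illustrative assumptions, not the disclosed implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

m, n = 15000, 600
W1 = np.random.randn(m, n) * 0.01     # first random matrix
W2 = np.random.randn(n, m) * 0.01     # second random matrix

i1, i2, i3 = 10, 25, 77               # assumed indices of "I", "am", "a"

# A row of W1 equals the one-hot product, so indexing stands in for the mapping.
hidden = 0.5 * W1[i1] + 0.5 * W1[i2]  # "third data": weighted sum of first and second data
recovery = softmax(hidden @ W2)       # 1 x m recovery vector; columns sum to 1

loss = -np.log(recovery[i3])          # difference from the third vector (cross-entropy)
# Training updates the elements of W1 and W2 along the gradient of this loss.
```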
The processor 120 may perform the above process for each sentence. For example, the processor 120 may receive the words "I am a", map each of the words to the n-dimensional space, obtain a recovery vector from their weighted sum, update the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a fourth vector corresponding to the word "boy", and perform training based on the updated elements. In other words, the processor 120 may perform training based on more than two words in a sentence. The processor 120 may also perform training based on a single word.
The processor 120 may complete the training related to one sentence and perform another training related to another sentence by the above method. In this case, the processor 120 may not consider the previous sentence for which training has been completed. In other words, the processor 120 may perform training sentence by sentence, and may perform training related to the relationships between the words in a sentence by various methods.
The processor 120 may update the elements of each of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences. If the processor 120 completes the training related to one sentence, the processor 120 may perform training related to the other sentences included in the basic data, performing the above process for each sentence.
The processor 120 may store the first random matrix and the second random matrix, whose elements have been updated based on the remaining sentences, in the storage 110 as the embedding matrix and the softmax matrix. In other words, once the training related to all of the sentences included in the basic data is complete, the first random matrix and the second random matrix may be stored in the storage 110 as the embedding matrix and the softmax matrix, respectively.
The size of the embedding matrix and the size of the transposed matrix of the softmax matrix may be the same. In addition, the embedding matrix and the transposed matrix of the softmax matrix may have different corresponding elements. Accordingly, even if the same single word is input, the recommended words may differ.
The embedding matrix and the softmax matrix may be used as a language model for recommending words. For example, if a user inputs the word "I", the processor 120 may obtain first data in which the first vector corresponding to the word "I" is mapped to the vector space based on the embedding matrix, generate a first recovery vector from the first data based on the softmax matrix, and provide a recommended word based on the first recovery vector.
For example, the processor 120 may provide the word corresponding to the column of the largest value among the element values of the recovery vector as the first recommended word, and provide the word corresponding to the column of the second largest value among the element values of the recovery vector as the second recommended word. For example, the processor 120 may provide the word "am" as the first recommended word and the word "was" as the second recommended word.
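A short sketch of how the two recommended words may be read off the recovery vector; the toy vocabulary and probabilities are assumed for illustration:

```python
import numpy as np

vocab = ["I", "am", "a", "boy", "girl"]                 # assumed toy vocabulary
recovery = np.array([0.05, 0.10, 0.15, 0.45, 0.25])     # recovery vector, sums to 1

top2 = np.argsort(recovery)[::-1][:2]                   # columns of the two largest values
print(vocab[top2[0]], vocab[top2[1]])                   # boy girl
```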
If the user then inputs the words "am" and "a" in order, the processor 120 may obtain second data and third data in which the second vector corresponding to the word "am" and the third vector corresponding to the word "a" are mapped to the vector space based on the embedding matrix, and generate fourth data based on a weighted sum of the first data, the second data, and the third data.
The processor 120 may generate a second recovery vector from the fourth data based on the softmax matrix, and provide a recommended word based on the second recovery vector.
For example, the processor 120 may provide the word corresponding to the column of the largest value among the element values of the second recovery vector as the first recommended word, and provide the word corresponding to the column of the second largest value among the element values of the second recovery vector as the second recommended word. For example, the processor 120 may provide the word "boy" as the first recommended word and the word "girl" as the second recommended word.
As described above, the processor 120 may perform RNN training on the basic data, obtain the embedding matrix and the softmax matrix, and generate a language model including the embedding matrix and the softmax matrix. When the language model is generated, the processor 120 may provide recommended words based on the language model.
However, the sizes of the embedding matrix and the softmax matrix may be large. For example, if there are 15,000 different words in the basic data and a 600-dimensional vector space is used, an embedding matrix of size "15000 × 600" and a softmax matrix of size "600 × 15000" may be generated. In this case, 18,000,000 elements may need to be stored, requiring a large storage space. If the dimensionality is reduced, the number of elements to be stored can be reduced, but the word-recommendation performance may degrade because the learning capability deteriorates.
Accordingly, a method for compressing a language model while minimizing and/or reducing performance degradation is described in greater detail below.
The processor 120 may convert the embedding matrix stored in the storage 110 into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix, and convert the transposed matrix of the softmax matrix stored in the storage 110 into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix.
For example, if the size of the embedding matrix is "m × n", the processor 120 may convert the embedding matrix into a first projection matrix of size "m × l" and a shared matrix of size "l × n". The elements of the first projection matrix and the shared matrix may be determined randomly, and may be unrelated to the elements of the embedding matrix.
If the size of the softmax matrix is "n × m", the processor 120 may convert the softmax matrix into a second projection matrix of size "m × l" and the shared matrix of size "l × n". The elements of the second projection matrix and the shared matrix may be determined randomly, and may be unrelated to the elements of the softmax matrix.
For example, if the size of the embedding matrix is "15000 × 600" and the size of the softmax matrix is "600 × 15000", the processor 120 may generate a first projection matrix of size "15000 × 100", a second projection matrix of size "15000 × 100", and a shared matrix of size "100 × 600". In this case, the 18,000,000 elements of the embedding matrix and the softmax matrix can be reduced to the 3,060,000 elements of the first projection matrix, the second projection matrix, and the shared matrix. The smaller "l" is, the more the compression efficiency increases.
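The conversion and the resulting element counts may be sketched as follows; the sizes are those of the example above, and the random initialization reflects the statement that the new elements are unrelated to the original matrices:

```python
import numpy as np

m, n, l = 15000, 600, 100

P1 = np.random.randn(m, l)      # first projection matrix
P2 = np.random.randn(m, l)      # second projection matrix
S  = np.random.randn(l, n)      # shared matrix

assert (P1 @ S).shape == (m, n)     # same size as the embedding matrix
assert (P2 @ S).T.shape == (n, m)   # same size as the softmax matrix

print(2 * m * n)                    # 18,000,000 elements before compression
print(P1.size + P2.size + S.size)   # 3,060,000 elements after compression
```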
The processor 120 may update the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on them based on the basic data. The update method may be the same as the above-described method for generating the embedding matrix and the softmax matrix.
The processor 120 may multiply the first projection matrix by the shared matrix and use the product as the first random matrix, and multiply the second projection matrix by the shared matrix and use the transposed matrix of the product as the second random matrix. The processor 120 may update the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing training on all of the sentences included in the basic data.
If the size of the shared matrix is small (that is, if "l" is small), the performance of the language model may degrade, and if the size of the shared matrix is large, the compression efficiency may deteriorate. Therefore, a shared matrix of the optimal size needs to be obtained to improve the compression efficiency while maintaining the performance of the language model. Accordingly, a method for calculating the word perplexity and obtaining a shared matrix of the optimal size is described below. Perplexity may refer, for example, to a measure of how well a probability distribution or probabilistic model predicts a sample, and perplexity may be used for comparing probabilistic models. The lower the perplexity, the more successful the prediction.
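A minimal sketch of the perplexity measure follows, assuming it is computed from the probabilities the model assigns to held-out test words (the exact test module is not specified in the disclosure):

```python
import numpy as np

def perplexity(target_probs):
    """Perplexity over the probabilities assigned to the correct next words."""
    p = np.asarray(target_probs)
    return float(np.exp(-np.mean(np.log(p))))

print(perplexity([0.5, 0.5, 0.5]))   # 2.0  - better prediction, lower perplexity
print(perplexity([0.1, 0.1, 0.1]))   # 10.0 - worse prediction, higher perplexity
```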
The processor 120 may calculate a word perplexity related to the first projection matrix, the second projection matrix, and the shared matrix through a test module. The test module may refer to a module (e.g., including processing circuitry and/or program elements) for testing a language model, and the type of the module is not limited.
If the word perplexity is greater than a predetermined value, the processor 120 may obtain a new shared matrix whose size is greater than the size of the shared matrix, and if the word perplexity is less than the predetermined value, the processor 120 may obtain a new shared matrix whose size is less than the size of the shared matrix. Using the newly obtained shared matrix, the first projection matrix, the second projection matrix, and the shared matrix may be recalculated. Recalculating the first projection matrix, the second projection matrix, and the shared matrix may indicate updating their elements through training.
For example, if the word perplexity of the shared matrix of size "100 × 600" is greater than the predetermined value, the processor 120 may obtain a shared matrix of size "110 × 600", and if the word perplexity is less than the predetermined value, the processor 120 may obtain a shared matrix of size "90 × 600". The above example in which the value of "l" is 110 or 90 is only an example, and the value of "l" may be set differently.
The processor 120 may calculate a reference word perplexity with respect to the embedding matrix and the softmax matrix through the test module, and determine the predetermined value based on the reference word perplexity.
For example, the processor 120 may calculate the reference word perplexity related to the embedding matrix and the softmax matrix through the same test module, and determine the reference word perplexity as the predetermined value.
The language model using the embedding matrix and the softmax matrix may be optimized for a given dimensionality. Even with the same dimensionality, the performance of a language model using a shared matrix whose value "l" is less than the rank of the embedding matrix and the softmax matrix may be lower than the performance of the language model using the embedding matrix and the softmax matrix. The value "l" of the shared matrix does not need to be greater than the rank of the embedding matrix and the softmax matrix, but if the value "l" is too small, the generated model may differ too much from the optimal model. To address this, the processor 120 may perform training on a plurality of shared matrices.
In other words, the processor 120 may update the elements of a new shared matrix and of the first projection matrix and the second projection matrix, and recalculate the word perplexity. For example, the processor 120 may update the elements of a plurality of shared matrices and of the first projection matrices and the second projection matrices corresponding to the plurality of shared matrices, and calculate the word perplexity corresponding to each of the plurality of shared matrices.
The processor 120 may recalculate the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix of the smallest size, from among the plurality of shared matrices, whose word perplexity is lower than the predetermined value, and generate a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
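The selection over a plurality of shared matrices may be sketched as follows; the retrain and evaluate callables are hypothetical stand-ins for the RNN training and the test module, which are not detailed at this level in the disclosure:

```python
def smallest_passing_l(candidate_ls, threshold, retrain, evaluate):
    """Return the smallest shared-matrix size "l" whose word perplexity stays
    below the threshold, or None if no candidate qualifies.

    retrain(l) -> (P1, P2, S) stands in for RNN training at size l, and
    evaluate(P1, P2, S) -> float stands in for the test module's perplexity.
    The threshold may be the reference word perplexity of the uncompressed model.
    """
    passing = [l for l in sorted(candidate_ls) if evaluate(*retrain(l)) < threshold]
    return passing[0] if passing else None
```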
As described above, the processor 120 may compress the data by converting the language model including the embedding matrix and the softmax matrix into a language model including the first projection matrix, the second projection matrix, and the shared matrix. In addition, since the dimensionality of the language model including the embedding matrix and the softmax matrix may be the same as the dimensionality of the language model including the first projection matrix, the second projection matrix, and the shared matrix, the compression efficiency can be improved and the performance degradation can be minimized and/or reduced by appropriately setting the size of the shared matrix.
The processor 120 may convert the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD). For example, the processor 120 may convert the shared matrix of size "l × n" into a first matrix of size "l × l", a second matrix of size "l × r", and a third matrix of size "r × n".
For example, the processor 120 may convert the shared matrix of size "100 × 600" into a first matrix of size "100 × 100", a second matrix of size "100 × 20", and a third matrix of size "20 × 600". In this case, the 60,000 elements of the shared matrix can be reduced to 24,000 elements. In other words, the compression efficiency can be improved by further decomposing the shared matrix.
SVD may refer, for example, to singular value decomposition, and since singular value decomposition is a well-known technique, a detailed description thereof is not provided. In addition, the elements of the first matrix, the second matrix, and the third matrix may be unrelated to the elements of the shared matrix.
The processor 120 may update the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training on them based on the basic data, and generate a compressed language model using the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix whose elements have been updated. The method of updating elements through training has been described above, and a detailed description thereof is therefore not repeated.
Figure 1B is a block diagram illustrating an example configuration of the electronic device 100. According to Figure 1B, the electronic device 100 may include a storage 110, a processor (e.g., including processing circuitry) 120, a communicator (e.g., including communication circuitry) 130, a user interface (e.g., including interface circuitry) 140, a display 150, an audio processor (e.g., including audio processing circuitry) 160, and a video processor (e.g., including video processing circuitry) 170. Detailed descriptions of the elements shown in Figure 1B that overlap with the elements shown in Figure 1A are not repeated.
The processor 120 may include various processing circuitry and control the overall operation of the electronic device 100 using the various programs stored in the storage 110.
The processor 120 may include, for example and without limitation, a RAM 121, a ROM 122, a main CPU 123, a graphics processor 124, first to n-th interfaces 125-1 to 125-n, and a bus 126.
The RAM 121, the ROM 122, the main CPU 123, the graphics processor 124, and the first to n-th interfaces 125-1 to 125-n may be connected to one another via the bus 126.
The first to n-th interfaces 125-1 to 125-n may be connected to the above-described components. One of the interfaces may be a network interface connected to an external device via a network.
The main CPU 123 may access the storage 110 and perform booting using the operating system (O/S) stored in the storage 110, and perform various operations using the various programs stored in the storage 110.
The ROM 122 may store an instruction set for system booting and the like. Once a turn-on command is input and power is supplied, the main CPU 123 may copy the O/S stored in the storage 110 to the RAM 121 according to the instructions stored in the ROM 122, execute the O/S, and boot the system. Once booting is complete, the main CPU 123 may copy the various application programs stored in the storage 110 to the RAM 121, execute the application programs copied to the RAM 121, and perform various operations.
The graphics processor 124 may generate a screen including various objects such as icons, images, and text using a computing unit (not shown) and a rendering unit (not shown). The computing unit may calculate attribute values, such as coordinate values, shapes, sizes, and colors, with which each object is to be displayed according to the layout of the screen, based on a received control command. The rendering unit may generate screens of various layouts including objects based on the attribute values calculated by the computing unit. The screen generated by the rendering unit may be displayed in the display area of the display 150.
The above-described operations of the processor 120 may be performed by programs stored in the storage 110.
The storage 110 may store various data, such as an operating system (O/S) software module for driving the electronic device 100, a language model including an embedding matrix and a softmax matrix, a compression module for compressing the language model, an RNN training module, and the like.
The communicator 130 may include various communication circuitry and communicate with various types of external devices by various communication methods. The communicator 130 may include various communication circuitry, such as, for example and without limitation, a Wi-Fi chip 131, a Bluetooth chip 132, a wireless communication chip 133, an NFC chip 134, and the like. The processor 120 may communicate with various external devices using the communicator 130.
The Wi-Fi chip 131 and the Bluetooth chip 132 may perform communication by Wi-Fi and Bluetooth, respectively. When the Wi-Fi chip 131 or the Bluetooth chip 132 is used, connection information such as an SSID and a session key may first be transmitted and received, a communication connection may be established using the information, and various information may then be transmitted and received. The wireless communication chip 133 may refer to a chip that performs communication according to various communication standards such as IEEE, Zigbee, 3rd generation (3G), 3rd Generation Partnership Project (3GPP), and Long Term Evolution (LTE). The NFC chip 134 may refer to a chip that operates by a near field communication (NFC) method using the 13.56 MHz band among various RF-ID frequency bands such as 135 kHz, 13.56 MHz, 433 MHz, 860 MHz to 960 MHz, and 2.45 GHz.
The processor 120 may receive a language model including an embedding matrix and a softmax matrix from an external device via the communicator 130.
The user interface 140 may include various interface circuitry and receive various user interactions. The user interface 140 may be implemented in various forms depending on the example embodiment of the electronic device 100. For example, the user interface 140 may include, for example and without limitation, a button provided in the electronic device 100, a microphone receiving user input, a camera detecting user motion, and the like. In addition, if the electronic device 100 is implemented as a touch-based electronic device, the user interface 140 may be implemented as, for example and without limitation, a touch screen forming a layered structure with a touch pad. In this case, the user interface 140 may serve as the above-described display 150.
The audio processor 160 may include various circuitry for processing audio data. The audio processor 160 may perform various processing operations on audio data, such as decoding, amplification, and noise filtering.
The video processor 170 may include various circuitry for performing processing on video data. The video processor 170 may perform various image processing operations, such as decoding, scaling, noise filtering, frame-rate conversion, and resolution conversion.
Through the above methods, the processor 120 may convert the language model including the embedding matrix and the softmax matrix into a compressed language model including the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix while minimizing and/or reducing the performance degradation of the language model.
A method for providing recommended words through the language model compressed as described above is described in greater detail below.
Fig. 2 is the block diagram for showing the electronic equipment 200 according to another example embodiment.As shown in Fig. 2, electronic equipment 200
It may include storage device 210 and processor (e.g., including processing circuit) 220.
Electronic equipment 200 can provide recommendation word.For example, electronic equipment 200 can receive the input of user spoken utterances and
Recommendation word after user spoken utterances is provided.For example, if input user spoken utterances " weather of today be ... ", electronics is set
Standby 200, which can provide " sunny ", " cold " etc., recommends word.
Electronic equipment 200 may be implemented as, and such as, but not limited to, Desktop PC, smart phone, is put down at laptop computer
Plate PC, server etc..In addition, electronic equipment 200 may be implemented as the equipment with small memory space.
The storage 210 may store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix that are used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix that are used as a softmax matrix. The storage 210 may store the compressed language model described with reference to Figs. 1A and 1B.
The processor 220 may control the overall operation of the electronic device 200.
According to an example embodiment, the processor 220 may be implemented as, for example and without limitation, a digital signal processor (DSP), a microprocessor, a timing controller (TCON), and the like. The processor 220 may include, for example and without limitation, one or more of a dedicated processor, a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a communication processor (CP), an ARM processor, and the like, or may be defined as one of the above. The processor 220 may also be implemented as a system on chip (SoC) or large scale integration (LSI) in which a processing algorithm is embedded, or in the form of a field programmable gate array (FPGA).
The processor 220 may obtain first data in response to input of a first word, in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix. For example, if a user inputs the word "I", the processor 220 may multiply the first vector corresponding to the word "I" by the first projection matrix, the first matrix, the second matrix, and the third matrix, and obtain high-dimensional first data.
The processor 220 may obtain a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix. For example, the value of each column of the second vector may lie between 0 and 1, and the values of all the columns may sum to 1.
The processor 220 may provide a recommendation word based on the second vector. For example, the processor 220 may provide the word corresponding to the column holding the largest element value of the second vector as a first recommendation word, and provide the word corresponding to the column holding the second-largest element value of the second vector as a second recommendation word. For example, the processor 220 may provide the word "am" as the first recommendation word and the word "was" as the second recommendation word.
However, the example embodiments are not limited to the above example. The processor 220 may provide any other number of recommendation words.
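For illustration only, the lookup and output computation described above can be sketched with the factored matrices as follows. All dimensions, the random factor values, and the word ids are assumptions for the sketch; the patent does not specify any of them, and real factors would come from the training described with reference to Figs. 1A and 1B.

```python
import numpy as np

# Illustrative sizes: vocabulary, embedding dimension, shared-matrix rows, SVD rank.
V, D, m, r = 10000, 600, 300, 100

rng = np.random.default_rng(0)
P1 = rng.standard_normal((V, m))   # first projection matrix (embedding side)
P2 = rng.standard_normal((V, m))   # second projection matrix (softmax side)
M1 = rng.standard_normal((m, r))   # first matrix  \
M2 = rng.standard_normal((r, r))   # second matrix  > factors of the shared matrix
M3 = rng.standard_normal((r, D))   # third matrix  /

def embed(word_id):
    """Map the one-hot vector of a word into the vector space ("first data")."""
    one_hot = np.zeros(V)
    one_hot[word_id] = 1.0
    return one_hot @ P1 @ M1 @ M2 @ M3               # shape (D,)

def softmax_distribution(hidden):
    """Obtain an output vector whose entries lie in (0, 1) and sum to 1, using
    the transpose of (P2 @ M1 @ M2 @ M3) in place of the softmax matrix."""
    logits = hidden @ (P2 @ M1 @ M2 @ M3).T          # shape (V,)
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = softmax_distribution(embed(42))
first, second = np.argsort(probs)[-2:][::-1]         # first and second recommendation words
```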
The processor 220 may obtain second data in response to input of a second word after the input of the first word, in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix.
For example, if the word "am" is input after the word "I", the processor 220 may multiply the third vector corresponding to the word "am" by the first projection matrix, the first matrix, the second matrix, and the third matrix, and obtain high-dimensional second data. In other words, the processor 220 may consider both the previously input word "I" and the currently input word "am".
The processor 220 may generate third data based on the first data and the second data. For example, the processor 220 may generate the third data from the first data and the second data using a long short-term memory (LSTM) method. The processor 220 may also generate the third data by taking a weighted sum of the first data and the second data.
The processor 220 may obtain a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix.
For example, the processor 220 may obtain the transpose of the product of the second projection matrix, the first matrix, the second matrix, and the third matrix, and obtain the fourth vector by multiplying the third data by this transposed matrix. For example, the value of each column of the fourth vector may lie between 0 and 1, and the values of all the columns may sum to 1.
The processor 220 may provide a recommendation word based on the fourth vector. For example, the processor 220 may provide the word corresponding to the column holding the largest element value of the fourth vector as a first recommendation word, and provide the word corresponding to the column holding the second-largest element value of the fourth vector as a second recommendation word. For example, the processor 220 may provide the word "a" as the first recommendation word and the word "busy" as the second recommendation word.
However, the example embodiments are not limited to the above example. The processor 220 may receive more words and provide more recommendation words. However, the number of words input during a previous period that the processor 220 refers to may be limited. For example, when a current word is input, the processor 220 may refer only to three or fewer words input during the previous period.
In addition, the previous period may be a predetermined period counted backward from the current time. For example, when the current word is input, the processor 220 may refer only to the words input during the most recent 10 seconds.
Further, the processor 220 may receive one word and provide one recommendation word. In other words, the processor 220 may operate without referring to words input during a previous period.
The electronic device 200 may further include an input unit (not shown) and an output unit (not shown). The input unit may include various input circuitry for receiving words from a user, and may be implemented as, for example and without limitation, a microphone, a keyboard, and the like. The output unit may include various output circuitry configured to provide recommendation words, and may be implemented as, for example and without limitation, a display, a speaker, and the like.
The structure of the processor 220 may be the same as that of the processor 120 in Fig. 1B, and thus a detailed description thereof will not be repeated.
The electronic device 200 may provide recommendation words as described above. Meanwhile, the electronic device 200 may store a language model including the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix, and accordingly may further perform a first calculation of multiplying by the first projection matrix, the first matrix, the second matrix, and the third matrix used as the embedding matrix, a second calculation of multiplying by the second projection matrix, the first matrix, the second matrix, and the third matrix, and a third calculation of computing, based on the second calculation, the transposed matrix of the matrix that may be used as the softmax matrix. The time required for such calculations may be very short, and therefore there may be no problem in providing recommendation words.
It has been described that the electronic device 100 of Figs. 1A and 1B may be a component separate from the electronic device 200 of Fig. 2, but the two electronic devices may also be implemented as a single device.
The operation of the electronic device for compressing a language model and the operation of the electronic device for providing a recommendation word are described in greater detail below with reference to the accompanying drawings.
Figs. 3A and 3B are diagrams illustrating an example of conventional RNN training.
As shown in Fig. 3A, the processor 120 may perform a word embedding operation in which a vector corresponding to an input word is mapped to a vector space. The embedding matrix may be used for the word embedding.
The processor 120 may map, in order, a first word input at time t-3, a second word input at time t-2, and a third word input at time t-1 to the vector space, and may generate fourth data in a recurrent hidden layer stage based on the first data, the second data, and the third data mapped to the vector space. For example, the processor 120 may generate the fourth data from the first data, the second data, and the third data using a long short-term memory (LSTM) method or a weighted-sum method.
In a softmax layer stage, the processor 120 may convert the fourth data in the vector space into a recovery vector. The softmax matrix may be used for the conversion. The processor 120 may compare the recovery vector with a fourth word input at time t, and update the elements of the embedding matrix and the softmax matrix. The above process may be referred to as training.
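To make the training step concrete, the following is a toy PyTorch sketch of one such update. The dimensions, learning rate, and word ids are assumptions for illustration, not values from the patent:

```python
import torch
import torch.nn as nn

V, D = 1000, 64                          # illustrative vocabulary and hidden sizes
embedding = nn.Embedding(V, D)           # embedding matrix
lstm = nn.LSTM(D, D, batch_first=True)   # recurrent hidden layer
softmax_layer = nn.Linear(D, V)          # softmax matrix
optimizer = torch.optim.SGD(
    [*embedding.parameters(), *lstm.parameters(), *softmax_layer.parameters()],
    lr=0.1)

def train_step(context_ids, target_id):
    """Map the words input at times t-3, t-2, and t-1 to the vector space,
    generate the fourth data in the recurrent hidden layer, convert it into a
    recovery vector, compare it with the word input at time t, and update the
    matrix elements."""
    x = embedding(torch.tensor([context_ids]))         # shape (1, 3, D)
    output, _ = lstm(x)                                # fourth data
    logits = softmax_layer(output[:, -1, :])           # recovery vector over the vocabulary
    loss = nn.functional.cross_entropy(logits, torch.tensor([target_id]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

train_step([1, 2, 1], 3)   # e.g. "I", "hope", "I" -> "will" with toy ids
```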
Fig. 3B is a diagram illustrating the training process using a more specific example of basic data, and Fig. 3B will be described in connection with Fig. 3A.
When the processor 120 performs training with respect to the first sentence in Fig. 3B, the processor 120 may map, in order, the word "I" input at time t-3, the word "hope" input at time t-2, and the word "I" input at time t-1 to the vector space, and generate the fourth data based on the first data, the second data, and the third data mapped to the vector space.
The processor 120 may convert the fourth data in the vector space into a recovery vector, compare the recovery vector with the word "will" input at time t, and update the elements of the embedding matrix and the softmax matrix based on the comparison. In other words, the elements of the embedding matrix and the softmax matrix may be updated so that the word "will" is output in response to the words "I", "hope", and "I" being input in order.
In addition, at time t+1, the words "I", "hope", "I", and "will" may be input in order, and the processor 120 may perform training in the same manner. In other words, the elements of the embedding matrix and the softmax matrix may be updated so that the word "succeed" is output in response to the words "I", "hope", "I", and "will" being input in order. Once the training related to one sentence is completed, the processor 120 may perform training related to the other four sentences.
A language model may be generated based on the training to provide the most suitable recommendation words. For example, if RNN training based on the basic data is performed on the language model, the word "am" may be provided as a recommendation word when the word "I" is input at time t-1. This is because, when the word "I" is the first word in each of the five sentences, the second words may be "hope", "am", "do", "am", and "am", and since the word "am" is repeated three times during the training process, the embedding matrix and the softmax matrix may be updated so that the most suitable recommendation word following the word "I" is "am".
Fig. 3B is provided for purposes of description, and in practice the elements of the embedding matrix and the softmax matrix may be updated by performing training related to a large number of sentences.
Fig. 4 is a diagram illustrating an example compression method according to an example embodiment.
Knowledge distillation in Fig. 4 may refer to, for example, a method of generating a plurality of language models and improving the performance of a language model using the average of the recommendation words output from each of the plurality of language models.
Compression and retraining may refer to, for example, a method of compressing the language model including the embedding matrix and the softmax matrix, and the method may have two steps.
In the first step, the embedding matrix may be converted into the product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix; the transposed matrix of the softmax matrix may be converted into the product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix; the elements of the first projection matrix, the second projection matrix, and the shared matrix may be updated by RNN training; and the performance may be determined. The above process may be repeated for shared matrices of various sizes to obtain the shared-matrix size that shows almost no performance degradation while having the best compression efficiency, and the obtained shared matrix may be used to generate a first compressed language model.
In the second step, the shared matrix may be converted into a first matrix, a second matrix, and a third matrix, for example by singular value decomposition (SVD), and the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix may be updated by RNN training to generate a second compressed language model.
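A rough sketch of the two steps under assumed sizes is shown below. Note that the patent updates the factor elements by RNN retraining, whereas this sketch only illustrates the shapes with a least-squares and truncated-SVD initialization:

```python
import numpy as np

V, D, m, r = 10000, 600, 300, 100       # assumed sizes

embedding = np.random.randn(V, D)       # trained embedding matrix
softmax_t = np.random.randn(V, D)       # transpose of trained softmax matrix

# Step 1: embedding ~ P1 @ S and softmax_t ~ P2 @ S for one shared matrix S.
S = np.random.randn(m, D)
P1 = np.linalg.lstsq(S.T, embedding.T, rcond=None)[0].T   # (V, m)
P2 = np.linalg.lstsq(S.T, softmax_t.T, rcond=None)[0].T   # (V, m)

# Step 2: factor the shared matrix into three matrices by truncated SVD.
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
M1, M2, M3 = U[:, :r], np.diag(sigma[:r]), Vt[:r, :]      # first/second/third matrix

params_before = 2 * V * D
params_after = 2 * V * m + m * r + r * r + r * D
print(f"parameter ratio ~ {params_before / params_after:.2f}")
```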
Fig. 5 is a diagram illustrating the performance and compression efficiency of the compressed language models. In Fig. 5, PP may refer to, for example, word perplexity, and CR may refer to, for example, compression ratio.
For the baseline using the basic language model, the PP may be 56.55 and the data size may be 56.76. If the knowledge distillation (KD) of Fig. 4 is applied, the performance may be improved, and the PP may become 55.76, which is lower than the baseline.
When the shared matrix of the first compressed language model is used, the PP may be 55.07 and the data size may be 33.87. In other words, with the shared matrix, the PP may be similar to that of the baseline or KD, but the data size may be reduced by a factor of 1.68 (in terms of CR).
When the low-rank factorization and retraining of the second compressed language model are used, the PP may be 59.78 and the data size may be 14.80. In other words, with low-rank factorization and retraining, the PP may increase slightly above that of the baseline or KD, slightly reducing performance, but the data size may be reduced by a factor of 3.84 (in terms of CR). The compression ratio of low-rank factorization and retraining may be higher than that of the shared matrix.
The processor 120 may quantize the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix of the second compressed language model, and generate a third compressed language model. For example, the processor 120 may quantize the four-byte elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix into two-byte elements, and generate the third compressed language model.
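For example, the element quantization could be sketched as a cast from four-byte floats to two-byte floats; P1, P2, M1, M2, and M3 here reuse the assumed factor matrices from the sketch above:

```python
import numpy as np

# Quantize four-byte elements into two-byte elements for each factor matrix.
quantized = {name: mat.astype(np.float16)
             for name, mat in [("P1", P1), ("P2", P2),
                               ("M1", M1), ("M2", M2), ("M3", M3)]}
```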
As shown in Fig. 5, when the language model in which the elements of the second compressed language model are quantized is used, the PP may be 59.78 and the data size may be 7.40. In other words, with quantization, the PP may be identical to that of low-rank factorization and retraining, and the data size may be reduced by a factor of 7.68 (in terms of CR). The compression ratio of quantization is higher than that of low-rank factorization and retraining.
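For reference, the figures of Fig. 5 quoted above can be collected as follows (the CR values are those stated above, relative to the baseline data size of 56.76; the KD data size is not stated in Fig. 5):

| Model | PP | Data size | CR |
|---|---|---|---|
| Baseline | 56.55 | 56.76 | 1.00 |
| Knowledge distillation (KD) | 55.76 | n/a | n/a |
| Shared matrix (first compression) | 55.07 | 33.87 | 1.68 |
| Low-rank and retraining (second compression) | 59.78 | 14.80 | 3.84 |
| Quantization (third compression) | 59.78 | 7.40 | 7.68 |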
As set forth above, data can be compressed while minimizing and/or reducing performance degradation by dividing the embedding matrix and the softmax matrix into a plurality of small matrices and quantizing the elements of the matrices.
Fig. 6 is a flowchart illustrating an example method of operating an electronic device for compressing a language model according to an example embodiment. The electronic device may, for example, store a language model including an embedding matrix and a softmax matrix on which recurrent neural network (RNN) training has been performed based on basic data including a plurality of sentences.
The embedding matrix may be converted into the product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix, and the transposed matrix of the softmax matrix may be converted into the product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix (S610). The elements of the first projection matrix, the second projection matrix, and the shared matrix may be updated by performing RNN training with respect to the first projection matrix, the second projection matrix, and the shared matrix based on the basic data (S620).
The method may further include: calculating a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module; obtaining a new shared matrix having a size larger than the size of the shared matrix in response to the word perplexity being equal to or greater than a predetermined value, and obtaining a new shared matrix having a size smaller than the size of the shared matrix in response to the word perplexity being less than the predetermined value; and recalculating the first projection matrix, the second projection matrix, and the shared matrix using the obtained new shared matrix.
The method may further include calculating a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determining the predetermined value based on the reference word perplexity.
The recalculating may include recalculating the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix having the smallest size among a plurality of shared matrices whose word perplexity is less than the predetermined value, and generating a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
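As an illustration of this size search, the following sketch scans candidate shared-matrix sizes from largest to smallest. Here evaluate_pp is an assumed helper that retrains the factored model at a given size and returns its word perplexity on the test module, and deriving the predetermined value as a tolerance over the reference perplexity is likewise an assumption:

```python
def find_shared_matrix_size(reference_pp, candidate_sizes, evaluate_pp, tolerance=1.02):
    """Return the smallest shared-matrix size whose word perplexity stays
    below the predetermined value derived from the reference perplexity."""
    threshold = reference_pp * tolerance         # assumed predetermined value
    best = None
    for size in sorted(candidate_sizes, reverse=True):
        if evaluate_pp(size) < threshold:
            best = size                          # a smaller matrix is still acceptable
        else:
            break                                # degradation too large; stop shrinking
    return best
```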
The method may further include: converting the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD); updating the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training with respect to the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data; and generating a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix having the updated elements.
The method may further include: obtaining first data in which a first vector corresponding to a first word included in one of the plurality of sentences is mapped to a vector space based on a first random matrix; obtaining second data in response to input of a second word that follows the first word in the sentence, in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix; generating third data based on the first data and the second data; obtaining a recovery vector from the third data based on a second random matrix; and performing training by updating the elements of the first random matrix and the second random matrix based on a difference between the recovery vector and a third vector corresponding to a third word that follows the second word.
The method may further include updating the elements of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences, and storing, in the storage, the first random matrix and the second random matrix having the elements updated based on the remaining sentences as the embedding matrix and the softmax matrix, respectively.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size.
Fig. 7 is a flowchart illustrating an example method of operating an electronic device for providing a recommendation word according to an example embodiment. The electronic device may store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix that are used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix that are used as a softmax matrix.
First data may be obtained in response to input of a first word, in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix used as the embedding matrix (S710). A second vector may be obtained from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix used as the softmax matrix (S720). A recommendation word may be provided based on the second vector (S730).
The method may further include: obtaining second data in response to input of a second word after the input of the first word, in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; generating third data based on the first data and the second data; obtaining a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and providing a recommendation word based on the fourth vector.
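Reusing the assumed helpers from the earlier sketches (embed, combine, and softmax_distribution), the two-word flow of Fig. 7 might be exercised as follows, with toy word ids:

```python
first_data = embed(10)                             # first word input (S710)
second_data = embed(42)                            # second word input
third_data = combine(first_data, second_data)
fourth_vector = softmax_distribution(third_data)   # (S720)
recommendation = int(np.argmax(fourth_vector))     # recommendation word id (S730)
```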
According to the one or more example embodiments described above, an electronic device may compress the data of a language model on which RNN training is performed, and an electronic device having a relatively small storage space may store the compressed language model. In addition, the electronic device may provide recommendation words based on the compressed language model while minimizing and/or reducing performance degradation.
According to an example embodiment, the one or more example embodiments described above may be implemented as software including instructions stored in a machine-readable storage medium. A machine may call a stored instruction from the storage medium and operate according to the called instruction, and may include the electronic device according to the example embodiments. When an instruction is executed by a processor, the processor may directly perform a function corresponding to the instruction, or other components may perform the function under the control of the processor. An instruction may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" only denotes that the storage medium is tangible and does not include a signal, and does not distinguish whether data is stored semi-permanently or temporarily in the storage medium.
In addition, according to an example embodiment, the methods described in the one or more example embodiments may be included and provided in a computer program product. The computer program product may be traded as a commodity between a seller and a buyer. The computer program product may have the form of a machine-readable storage medium (e.g., a compact disc read-only memory (CD-ROM)), or may be distributed online via an application store (e.g., PlayStore™). In the case of online distribution, at least a part of the computer program product may be at least temporarily stored, or temporarily generated, in a storage medium such as a memory of a manufacturer's server, a server of an application store, or a relay server.
The various example embodiments described above may be embodied in a recording medium readable by a computer or a similar device using software, hardware, or any combination thereof. In some cases, the example embodiments described herein may be embodied by a processor itself. According to a software implementation, the example embodiments such as the procedures and functions described herein may be embodied as separate software modules. Each of the software modules may perform one or more of the functions and operations described in the example embodiments.
In addition, computer instructions for performing the processing operations of a device according to the one or more example embodiments may be stored in a non-transitory computer-readable medium. The computer instructions stored in the non-transitory computer-readable medium, when executed by a processor of a specific device, may cause the specific device to perform the processing operations according to the various example embodiments. A non-transitory computer-readable medium refers to a machine-readable medium or device that stores data. Examples of the non-transitory computer-readable medium may include, but are not limited to, a compact disc (CD), a digital versatile disc (DVD), a Blu-ray disc, a universal serial bus (USB) stick, a memory card, a ROM, and the like.
In addition, each of the components (e.g., modules or programs) described in the one or more example embodiments above may be configured as one or more entities, and some of the corresponding sub-components described above may be omitted, or other sub-components may be further included in the example embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into one entity and may equally or similarly perform the functions performed by each corresponding component before the integration. Operations performed by a module, a program, or another component according to the various example embodiments may be executed sequentially, in parallel, repetitively, or heuristically, or at least some of the operations may be executed in a different order or omitted, or another operation may be added.
The foregoing example embodiments and advantages are merely examples and are not to be construed as limiting the example embodiments. The description of the example embodiments is intended to be illustrative, and not to limit the scope of the disclosure as defined by the appended claims, and many alternatives, modifications, and variations will be apparent to those skilled in the art.
Claims (15)
1. An electronic device configured to compress a language model, the electronic device comprising:
a storage configured to store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences; and
a processor configured to:
convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product of the first projection matrix and the shared matrix having a size identical to a size of the embedding matrix, and convert a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product of the second projection matrix and the shared matrix having a size identical to a size of the transposed matrix of the softmax matrix, and
update elements of the first projection matrix, the second projection matrix, and the shared matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
2. The electronic device as claimed in claim 1, wherein the processor is configured to:
determine a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module,
obtain a new shared matrix having a size larger than a size of the shared matrix in response to the word perplexity being equal to or greater than a predetermined value, and obtain a new shared matrix having a size smaller than the size of the shared matrix in response to the word perplexity being less than the predetermined value, and
re-determine the first projection matrix, the second projection matrix, and the shared matrix using the new shared matrix.
3. The electronic device as claimed in claim 2, wherein the processor is configured to: determine a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determine the predetermined value based on the reference word perplexity.
4. The electronic device as claimed in claim 3, wherein the processor is configured to: re-determine the first projection matrix, the second projection matrix, and the shared matrix using a shared matrix having a smallest size among a plurality of shared matrices whose word perplexity is less than the predetermined value, and generate a compressed language model based on the re-determined first projection matrix, second projection matrix, and shared matrix.
5. The electronic device as claimed in claim 1, wherein the processor is further configured to:
convert the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD),
update elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data, and
generate a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix having the updated elements.
6. The electronic device as claimed in claim 1, wherein the processor is configured to:
obtain first data in which a first vector corresponding to a first word included in one of the plurality of sentences is mapped to a vector space based on a first random matrix, and obtain second data in response to receiving an input of a second word that follows the first word in the sentence, in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix,
generate third data based on the first data and the second data, and
obtain a recovery vector from the third data based on a second random matrix, and perform training by updating elements of the first random matrix and the second random matrix based on a difference between the recovery vector and a third vector corresponding to a third word that follows the second word.
7. The electronic device as claimed in claim 6, wherein the processor is configured to:
update the elements of the first random matrix and the second random matrix based on remaining sentences among the plurality of sentences, and
store, in the storage, the first random matrix and the second random matrix having the elements updated based on the remaining sentences as the embedding matrix and the softmax matrix, respectively.
8. The electronic device as claimed in claim 1, wherein the embedding matrix and the transposed matrix of the softmax matrix have an identical size.
9. An electronic device configured to provide a recommendation word, the electronic device comprising:
a storage configured to store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix used as a softmax matrix; and
a processor configured to:
obtain first data in response to input of a first word, in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix,
obtain a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix, and
provide a recommendation word based on the second vector.
10. The electronic device as claimed in claim 9, wherein the processor is configured to:
obtain second data in response to receiving an input of a second word after the input of the first word, in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix,
generate third data based on the first data and the second data,
obtain a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix, and
provide the recommendation word based on the fourth vector.
11. A method by which an electronic device compresses a language model, the electronic device storing a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences, the method comprising:
converting the embedding matrix into a product of a first projection matrix and a shared matrix, the product of the first projection matrix and the shared matrix having a size identical to a size of the embedding matrix, and converting a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product of the second projection matrix and the shared matrix having a size identical to a size of the transposed matrix of the softmax matrix; and
updating elements of the first projection matrix, the second projection matrix, and the shared matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
12. The method as claimed in claim 11, further comprising:
determining a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module;
obtaining a new shared matrix having a size larger than a size of the shared matrix in response to the word perplexity being equal to or greater than a predetermined value, and obtaining a new shared matrix having a size smaller than the size of the shared matrix in response to the word perplexity being less than the predetermined value; and
re-determining the first projection matrix, the second projection matrix, and the shared matrix using the new shared matrix.
13. The method as claimed in claim 12, further comprising:
determining a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module; and
determining the predetermined value based on the reference word perplexity.
14. The method as claimed in claim 13, wherein the re-determining further comprises: re-determining the first projection matrix, the second projection matrix, and the shared matrix using a shared matrix having a smallest size among a plurality of shared matrices whose word perplexity is less than the predetermined value, and generating a compressed language model based on the re-determined first projection matrix, second projection matrix, and shared matrix.
15. The method as claimed in claim 11, further comprising:
converting the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD);
updating elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data; and
generating a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix having the updated elements.