CN110168542A - Electronic device for compressing a language model, electronic device for providing recommended words, and operating methods thereof - Google Patents

Electronic device for compressing a language model, electronic device for providing recommended words, and operating methods thereof

Info

Publication number
CN110168542A
CN110168542A (application CN201880005774.XA)
Authority
CN
China
Prior art keywords
matrix
projection
word
sharing
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201880005774.XA
Other languages
Chinese (zh)
Other versions
CN110168542B (en)
Inventor
Seunghak Yu
Nilesh Kulkarni
Heejun Song
Haejun Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority claimed from PCT/KR2018/001611 (WO2018164378A1)
Publication of CN110168542A
Application granted
Publication of CN110168542B
Active (current legal status)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3347 - Query execution using vector based model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/044 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by capacitive means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/216 - Parsing using statistical methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/274 - Converting codes to words; Guess-ahead of partial word inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G06N5/022 - Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

Provided is an electronic device for compressing a language model, the electronic device including: a storage configured to store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences; and a processor configured to: convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product of the first projection matrix and the shared matrix having the same size as the embedding matrix; convert a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product of the second projection matrix and the shared matrix having the same size as the transposed matrix of the softmax matrix; and update elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.

Description

Electronic device for compressing a language model, electronic device for providing recommended words, and operating methods thereof
Technical Field
This disclosure relates to an electronic device for compressing a language model, an electronic device for providing recommended words, and operating methods thereof, in an artificial intelligence (AI) system that simulates functions of the human brain, such as cognition and judgment, using machine learning algorithms such as deep learning, and in applications of such a system; for example, it relates to an electronic device for compressing a language model based on a language model for which recurrent neural network (RNN) training has been performed, an electronic device for providing recommended words, and operating methods thereof.
Background Art
An artificial intelligence (AI) system is a computer system that embodies intelligence comparable to human intelligence. Unlike rule-based intelligent systems, an AI system trains itself and makes determinations, becoming intelligent on its own. The more an AI system is used, the more its recognition rate improves and the more accurately it comes to understand user preferences. Rule-based intelligent systems are therefore being replaced by deep-learning-based AI systems.
AI technology may consist of machine learning (deep learning) and element technologies that use machine learning.
Machine learning refers to an algorithmic technique that classifies and learns the characteristics of input data by itself, and element technology refers to technology that uses machine learning algorithms such as deep learning to replicate functions of the human brain such as cognition and judgment; it includes technical fields such as language understanding, visual understanding, inference/prediction, knowledge representation, and operation control.
AI technology is used in and applied to various fields. Language understanding technology may relate to recognizing and applying/processing human language/text, and may include natural language processing, machine translation, dialogue systems, question answering, speech recognition/synthesis, and the like. Visual understanding technology may relate to recognizing objects in the manner of human vision, and may include object recognition, object tracking, image search, person recognition, scene understanding, spatial understanding, image enhancement, and the like. Inference and prediction technology may relate to judging, reasoning about, and predicting information, and may include knowledge/probability-based inference, optimization prediction, preference-based planning, recommendation, and the like. Knowledge representation technology may refer to processing human experience information into knowledge data, and may include knowledge building (data generation/classification), knowledge management (data utilization), and the like. Operation control technology may refer to controlling the autonomous driving of vehicles, the motion of robots, and the like, and may include motion control (navigation, collision avoidance, driving), manipulation control (behavior control), and the like.
For example, an AI system may learn various sentences, and the system may generate a language model from the results of the learning. In addition, the AI system may provide new words that complete a sentence through a process similar to the process by which the language model was learned.
Such a language model may be generated by learning a large number of sentences, and the higher the dimensionality of the language model, the more its completeness can improve. However, as the dimensionality of the language model increases, the amount of data in the language model may increase exponentially, and it may be difficult to use the language model in a device without sufficient storage space. Conversely, if the dimensionality is lowered to generate a language model usable in a device without sufficient storage space, performance may degrade. Therefore, a method is needed that reduces the amount of data while minimizing and/or reducing the performance degradation of the language model.
Summary of the Invention
[Technical Problem]
An aspect of the example embodiments of the disclosure relates to an electronic device for compressing, without performance degradation, a language model for which RNN training has been performed, an electronic device for providing recommended words based on the compressed language model, and operating methods thereof.
[Technical Solution]
According to an example embodiment, an electronic device is provided, the electronic device including: a storage configured to store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences; and a processor configured to: convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix; convert a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix; and update elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
The processor may calculate (determine) a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix through a test module; obtain, in response to the word perplexity being equal to or greater than a predetermined value, a new shared matrix whose size is larger than the size of the shared matrix, and obtain, in response to the word perplexity being lower than the predetermined value, a new shared matrix whose size is smaller than the size of the shared matrix; and recalculate the first projection matrix, the second projection matrix, and the shared matrix using the obtained new shared matrix.
The processor may calculate (determine) a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determine the predetermined value based on the reference word perplexity.
The processor may recalculate (redetermine) the first projection matrix, the second projection matrix, and the shared matrix using a shared matrix of the minimum size among a plurality of shared matrices whose word perplexities are lower than the predetermined value, and generate a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
The processor may convert the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD); update elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data; and generate a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix whose elements have been updated.
The processor may obtain first data in which a first vector corresponding to a first word included in one of the plurality of sentences is mapped to a vector space based on a first random matrix; obtain, in response to input of a second word that is included in the sentence and follows the first word, second data in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix; generate third data based on the first data and the second data; obtain a recovery vector from the third data based on a second random matrix; and perform training by updating elements of the first random matrix and the second random matrix based on a difference between the recovery vector and a third vector corresponding to a third word that follows the second word.
The processor may update the elements of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences, and store the first random matrix and the second random matrix whose elements have been updated based on the remaining sentences in the storage as the embedding matrix and the softmax matrix.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size.
According to an example embodiment, an electronic device for providing recommended words is provided, the electronic device including: a storage configured to store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix that are used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix that are used as a softmax matrix; and a processor configured to: obtain, in response to input of a first word, first data in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; obtain a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and provide a recommended word based on the second vector.
The processor may obtain, in response to input of a second word after input of the first word, second data in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; generate third data based on the first data and the second data; obtain a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and provide a recommended word based on the fourth vector.
According to an example embodiment, a method is provided for operating an electronic device that compresses a language model, the electronic device storing a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences, the method including: converting the embedding matrix into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix, and converting a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix; and updating elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
The method may further include: calculating (determining) a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module; obtaining, in response to the word perplexity being equal to or greater than a predetermined value, a new shared matrix whose size is larger than the size of the shared matrix, and obtaining, in response to the word perplexity being lower than the predetermined value, a new shared matrix whose size is smaller than the size of the shared matrix; and recalculating the first projection matrix, the second projection matrix, and the shared matrix using the obtained new shared matrix.
The method may further include calculating (determining) a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determining the predetermined value based on the reference word perplexity.
The method may further include recalculating (redetermining) the first projection matrix, the second projection matrix, and the shared matrix using a shared matrix of the minimum size among a plurality of shared matrices whose word perplexities are lower than the predetermined value, and generating a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
The method may further include: converting the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD); updating elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data; and generating a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix whose elements have been updated.
The method may further include: obtaining first data in which a first vector corresponding to a first word included in one of the plurality of sentences is mapped to a vector space based on a first random matrix; obtaining, in response to input of a second word that is included in the sentence and follows the first word, second data in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix; generating third data based on the first data and the second data; obtaining a recovery vector from the third data based on a second random matrix; and performing training by updating elements of the first random matrix and the second random matrix based on a difference between the recovery vector and a third vector corresponding to a third word that follows the second word.
The method may further include updating the elements of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences, and storing the first random matrix and the second random matrix whose elements have been updated based on the remaining sentences in the storage as the embedding matrix and the softmax matrix.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size.
According to an example embodiment, a method is provided for operating an electronic device that provides recommended words, the electronic device storing a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix that are used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix that are used as a softmax matrix, the method including: obtaining, in response to input of a first word, first data in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; obtaining a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and providing a recommended word based on the second vector.
The method may further include obtaining, in response to input of a second word after input of the first word, second data in which a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix.
[Advantageous Effects]
According to one or more example embodiments, an electronic device can compress the data of a language model for which recurrent neural network (RNN) training has been performed, an electronic device with a relatively small storage space can store the compressed language model, and recommended words can be provided based on the compressed language model while performance degradation is minimized and/or reduced.
Brief Description of the Drawings
The above and other aspects, features, and attendant advantages of the disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which like reference numerals refer to like elements, and in which:
FIG. 1A is a block diagram illustrating an example of an electronic device according to an example embodiment;
FIG. 1B is a block diagram illustrating an example configuration of an electronic device according to an example embodiment;
FIG. 2 is a block diagram illustrating an electronic device according to another example embodiment;
FIG. 3A and FIG. 3B are diagrams illustrating an example of conventional RNN training;
FIG. 4 is a diagram illustrating a compression method according to an example embodiment;
FIG. 5 is a diagram illustrating the performance and compression rates of compressed language models according to an example embodiment;
FIG. 6 is a flowchart illustrating an example method of operating an electronic device to compress a language model according to an example embodiment; and
FIG. 7 is a flowchart illustrating an example method of operating an electronic device to provide recommended words according to an example embodiment.
Detailed Description
The example embodiments of the disclosure may be variously modified. Accordingly, specific example embodiments are illustrated in the drawings and described in detail below. However, it should be understood that the disclosure is not limited to the specific example embodiments, but includes all modifications, equivalents, and substitutions that do not depart from the scope and spirit of the disclosure. In addition, well-known functions or constructions are not described in detail where they would obscure the disclosure with unnecessary detail.
Hereinafter, various example embodiments will be described in greater detail with reference to the accompanying drawings.
FIG. 1A is a block diagram illustrating an electronic device 100 according to an example embodiment. As illustrated in FIG. 1A, the electronic device 100 may include a storage 110 and a processor (e.g., including processing circuitry) 120.
The electronic device 100 may be capable of performing artificial intelligence training. For example, the electronic device 100 may be implemented as, for example and without limitation, a desktop PC, a laptop computer, a smartphone, a tablet PC, a server, or the like. The electronic device 100 may also refer to a system in which a cloud computing environment is established. However, the example embodiments are not limited thereto; the electronic device 100 may be implemented as any device capable of performing artificial intelligence training.
The storage 110 may store a language model. The language model may be, for example, data created by modeling language actually used by users, such as sentences and phrases. Using the language model, the most suitable recommended word to follow input words may be provided based on the sequentially input words.
The storage 110 may store basic data including a plurality of sentences. The basic data may be the data required to generate the language model. In other words, the language model may be generated by performing training on the basic data.
The storage 110 may store the language model before compression. The storage 110 may also store the language model compressed by the processor 120, which will be described in greater detail below.
The storage 110 may store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including a plurality of sentences. An RNN may be a type of deep learning model for learning data that changes over time, such as time-series data. The RNN training method will be described in greater detail below together with the embedding matrix and the softmax matrix.
The processor may include various processing circuitry and control the overall operation of the electronic device 100.
According to an example embodiment, the processor 120 may be implemented as, for example and without limitation, a digital signal processor (DSP), a microprocessor, a time controller (TCON), or the like, but is not limited thereto. The processor may be, for example and without limitation, one or more of: a dedicated processor, a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a communication processor (CP), an ARM processor, and the like, or may be defined as one of the above. In addition, the processor 120 may be implemented as a system on chip (SoC) with a processing algorithm, or implemented in the form of a field programmable gate array (FPGA) or the like, but is not limited thereto.
The processor 120 may generate the language model before compression and store the language model in the storage 110. The processor 120 may also receive a pre-compression language model generated by an external device and store the language model in the storage 110. For convenience of description, a method of generating a language model through RNN training is described together with the description of the embedding matrix and the softmax matrix, and a method of compressing the language model is described as well.
The processor 120 may obtain first data in which a first vector corresponding to a first word included in one of the plurality of sentences stored in the storage 110 is mapped to a vector space based on a first random matrix. For example, one of the sentences may be "I am a boy", and the first word may be "I".
A vector corresponding to a word may have a size of "1 × m", and "m" may be determined based on the number of distinct words included in the basic data. For example, if there are 15,000 different words, the size of the vector may be "1 × 15000". In addition, only one of the values of the 15,000 columns of the vector may be "1" while the values of the remaining columns are 0, and the word may be determined by the position of the column whose value is "1". For example, if the value of the first column among the 15,000 columns is "1", the vector may indicate the word "I", and if the value of the second column among the 15,000 columns is "1", it may indicate the word "you". The processor 120 may obtain the first vector corresponding to the first word by the above method.
The first random matrix may have a size of "m × n", may have random elements, and may be used to map the first vector to an n-dimensional vector space. In other words, the processor 120 may obtain the first data, in which the first vector is mapped to the n-dimensional vector space, by multiplying the first vector by the first random matrix.
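By way of illustration only (this sketch is not part of the patent), the mapping just described is a one-hot lookup followed by a matrix product; the vocabulary size, dimension, and word index below are assumptions taken from the running example.

```python
import numpy as np

m, n = 15000, 600                  # vocabulary size and vector-space dimension from the example
rng = np.random.default_rng(0)
first_random_matrix = rng.standard_normal((m, n))   # "m x n" first random matrix

word_index = 0                     # hypothetical index of the word "I" in the vocabulary
first_vector = np.zeros((1, m))    # "1 x m" one-hot vector
first_vector[0, word_index] = 1.0

first_data = first_vector @ first_random_matrix     # "1 x n" point in the vector space
print(first_data.shape)            # (1, 600)
```

Because the input is one-hot, the product simply selects one row of the matrix, which is why the rows of the trained matrix behave as word embeddings.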
If a second word that is included in the same sentence and follows the first word is input, the processor 120 may obtain second data in which a second vector corresponding to the second word is mapped to the vector space based on the first random matrix. In the above example, the processor 120 may obtain second data in which the second vector corresponding to the second word "am" is mapped to the n-dimensional vector space by multiplying the second vector by the first random matrix.
The processor 120 may generate third data based on the first data and the second data. For example, the processor 120 may generate the third data from the first data and the second data by a long short-term memory (LSTM) method. The LSTM method is known in the art, and therefore a detailed description thereof is not provided. The processor 120 may also generate the third data based on a weighted sum of the first data and the second data.
The processor 120 may obtain a recovery vector from the third data based on a second random matrix. The second random matrix may have a size of "n × m", may have random elements, and may be used to restore data mapped to the n-dimensional space to a vector. Accordingly, the size of the first random matrix may be the same as the size of the transposed matrix of the second random matrix.
The recovery vector may have a size of "1 × m". The value of each column may be between 0 and 1, and the sum of the values of all the columns may be "1".
The processor 120 may update the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a third vector corresponding to a third word that follows the second word, and perform training. In the above example, the third word may be "a", and the processor 120 may update the elements of the first random matrix and the second random matrix so that the recovery vector is restored toward the third vector corresponding to the third word "a".
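The following is a deliberately simplified sketch of one such training step, not the patent's exact procedure: it stands in for the LSTM with a plain average of the mapped context words and uses a basic cross-entropy gradient step; the learning rate and word indices are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_step(W_in, W_out, context_indices, target_index, lr=0.1):
    hidden = W_in[context_indices].mean(axis=0)   # stand-in for the LSTM combination
    recovery = softmax(hidden @ W_out)            # recovery vector: entries in (0, 1), sums to 1
    target = np.zeros(W_out.shape[1])
    target[target_index] = 1.0                    # one-hot vector of the next word
    error = recovery - target                     # cross-entropy gradient at the output
    grad_hidden = W_out @ error                   # backpropagated into the hidden state
    W_out -= lr * np.outer(hidden, error)         # update the second random matrix in place
    W_in[context_indices] -= lr * grad_hidden / len(context_indices)  # and the first

m, n = 15000, 600
rng = np.random.default_rng(0)
W_in = rng.standard_normal((m, n)) * 0.01         # first random matrix ("m x n")
W_out = rng.standard_normal((n, m)) * 0.01        # second random matrix ("n x m")
train_step(W_in, W_out, [0, 1, 2], 3)             # e.g. "I am a" -> "boy" (hypothetical indices)
```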
The processor 120 may perform the above process for each sentence. For example, the processor 120 may receive the words "I am a", map each of the words to the n-dimensional space, obtain a recovery vector from the weighted sum, update the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a fourth vector corresponding to the word "boy", and perform training based on the updated elements. In other words, the processor 120 may perform training based on more than two words in one sentence. The processor 120 may also perform training based on a single word.
The processor 120 may complete the training related to one sentence and perform further training related to another sentence by the above method. In this case, the processor 120 may disregard the previous sentence for which training has been completed. In other words, the processor 120 may perform training sentence by sentence, and the training related to the relationships between the words in a sentence may be performed by various methods.
The processor 120 may update the elements of each of the first random matrix and the second random matrix based on the remaining sentences among the plurality of sentences. When the processor 120 completes the training related to one sentence, the processor 120 may perform training related to the other sentences included in the basic data, performing the above process for each sentence.
The processor 120 may store the first random matrix and the second random matrix, whose elements have been updated based on the remaining sentences, in the storage 110 as the embedding matrix and the softmax matrix, respectively. In other words, once the training related to all the sentences included in the basic data is completed, the first random matrix and the second random matrix may be stored in the storage 110 as the embedding matrix and the softmax matrix, respectively.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size. Nevertheless, the embedding matrix and the transposed matrix of the softmax matrix may have different elements. Therefore, even for the same input word, the resulting recommended words may differ.
The embedding matrix and the softmax matrix may be used as a language model for recommending words. For example, if a user inputs the word "I", the processor 120 may obtain first data in which a first vector corresponding to the word "I" is mapped to the vector space based on the embedding matrix, generate a first recovery vector from the first data based on the softmax matrix, and provide recommended words based on the first recovery vector.
For example, the processor 120 may provide the word corresponding to the column with the largest value among the element values of the recovery vector as a first recommended word, and provide the word corresponding to the column with the second-largest value among the element values of the recovery vector as a second recommended word. For example, the processor 120 may provide the word "am" as the first recommended word and the word "was" as the second recommended word.
If the user inputs the words "am" and "a" in order, the processor 120 may obtain second data and third data in which the second vector corresponding to the word "am" and the third vector corresponding to the word "a" are mapped to the vector space based on the embedding matrix, and generate fourth data based on a weighted sum of the first data, the second data, and the third data.
The processor 120 may generate a second recovery vector from the fourth data based on the softmax matrix, and provide recommended words based on the second recovery vector.
For example, the processor 120 may provide the word corresponding to the column with the largest value among the element values of the second recovery vector as a first recommended word, and provide the word corresponding to the column with the second-largest value among the element values of the second recovery vector as a second recommended word. For example, the processor 120 may recommend the word "boy" as the first recommended word and the word "girl" as the second recommended word.
As described above, the processor 120 may perform RNN training on the basic data, obtain the embedding matrix and the softmax matrix, and generate a language model including the embedding matrix and the softmax matrix. When the language model is generated, the processor 120 may provide recommended words based on the language model.
However, the sizes of the embedding matrix and the softmax matrix may be large. For example, if there are 15,000 different words in the basic data and a 600-dimensional vector space is used, an embedding matrix of size "15000 × 600" and a softmax matrix of size "600 × 15000" may be generated. In this case, 18,000,000 elements may need to be stored, which requires a large storage space. If the dimensionality is reduced, the number of elements to be stored decreases, but the word recommendation performance may drop because the learning capability degrades.
Therefore, a method of compressing the language model while minimizing and/or reducing performance degradation is described in greater detail below.
The processor 120 may convert the embedding matrix stored in the storage 110 into a product of a first projection matrix and a shared matrix, the product having the same size as the embedding matrix, and convert the transposed matrix of the softmax matrix stored in the storage 110 into a product of a second projection matrix and the shared matrix, the product having the same size as the transposed matrix of the softmax matrix.
For example, if the size of the embedding matrix is "m × n", the processor 120 may convert the embedding matrix into a first projection matrix of size "m × l" and a shared matrix of size "l × n". The elements of the first projection matrix and the shared matrix may be determined randomly, and the elements may be unrelated to the elements of the embedding matrix.
If the size of the softmax matrix is "n × m", the processor 120 may convert the softmax matrix into a second projection matrix of size "m × l" and a shared matrix of size "l × n". The elements of the second projection matrix and the shared matrix may be determined randomly, and the elements may be unrelated to the elements of the softmax matrix.
For example, if the size of the embedding matrix is "15000 × 600" and the size of the softmax matrix is "600 × 15000", the processor 120 may generate a first projection matrix of size "15000 × 100", a second projection matrix of size "15000 × 100", and a shared matrix of size "100 × 600". In this case, the 18,000,000 elements of the embedding matrix and the softmax matrix may be reduced to the 3,060,000 elements of the first projection matrix, the second projection matrix, and the shared matrix. The smaller "l" is, the higher the compression efficiency.
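A minimal sketch (illustrative, not from the patent) of this factorization and the resulting element counts, using the numbers from the example above:

```python
import numpy as np

m, n, l = 15000, 600, 100
rng = np.random.default_rng(0)

P1 = rng.standard_normal((m, l))   # first projection matrix ("m x l")
P2 = rng.standard_normal((m, l))   # second projection matrix ("m x l")
S = rng.standard_normal((l, n))    # shared matrix ("l x n"), common to both products

W_in = P1 @ S                      # plays the role of the embedding matrix ("m x n")
W_out = (P2 @ S).T                 # plays the role of the softmax matrix ("n x m")

original = m * n * 2                        # 18,000,000 elements before compression
compressed = P1.size + P2.size + S.size     # 3,060,000 elements after compression
print(original, compressed, W_in.shape, W_out.shape)
```

During training, gradients would flow through the products into P1, P2, and S rather than into W_in and W_out directly, which is what ties the two factored matrices together.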
The processor 120 may update the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing RNN training on the first projection matrix, the second projection matrix, and the shared matrix based on the basic data. The update method may be the same as the method described above for generating the embedding matrix and the softmax matrix.
The processor 120 may multiply the first projection matrix by the shared matrix and use the product as the first random matrix, and may multiply the second projection matrix by the shared matrix and use the transposed matrix of the product as the second random matrix. The processor 120 may update the elements of the first projection matrix, the second projection matrix, and the shared matrix by performing training on all the sentences included in the basic data.
If the size of the shared matrix is small (that is, if "l" is small), the performance of the language model may degrade, and if the size of the shared matrix is large, the compression efficiency may suffer. Therefore, a shared matrix of an optimal size needs to be obtained to improve the compression efficiency while maintaining the performance of the language model. Accordingly, a method of calculating a word perplexity and obtaining a shared matrix of the optimal size is described below. Perplexity may refer, for example, to a measure of how successfully a probability distribution or probability model predicts a sample, and perplexity may be used to compare probability models. The lower the perplexity, the more successful the prediction.
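The patent does not spell out the formula; as a point of reference, the word perplexity of a model over a held-out sequence of $N$ words is conventionally defined as

$$\mathrm{PP}(w_1,\dots,w_N) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(w_i \mid w_1,\dots,w_{i-1}\right)\right),$$

so a lower value means the model assigns higher probability to the words actually observed.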
The processor 120 may calculate a word perplexity related to the first projection matrix, the second projection matrix, and the shared matrix based on a test module. The test module may refer to a module (e.g., including processing circuitry and/or program elements) for testing the language model, and the type of the module is not limited.
If the word perplexity is greater than the predetermined value, the processor 120 may obtain a new shared matrix whose size is larger than the size of the shared matrix, and if the word perplexity is less than the predetermined value, the processor 120 may obtain a new shared matrix whose size is smaller than the size of the shared matrix. Using the newly obtained shared matrix, the first projection matrix, the second projection matrix, and the shared matrix may be recalculated. Recalculating the first projection matrix, the second projection matrix, and the shared matrix may mean updating their elements through training.
If the word perplexity of a shared matrix of size "100 × 600" is greater than the predetermined value, the processor 120 may obtain a shared matrix of size "110 × 600", and if the word perplexity is less than the predetermined value, the processor 120 may obtain a shared matrix of size "90 × 600". The above example, in which the value of "l" becomes 110 or 90, is merely an example, and the value of "l" may be set differently.
The processor 120 may calculate a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determine the predetermined value based on the reference word perplexity.
For example, the processor 120 may calculate the reference word perplexity related to the embedding matrix and the softmax matrix based on the same test module, and determine the reference word perplexity as the predetermined value.
The language model using the embedding matrix and the softmax matrix may be optimized for the determined dimensionality. Even at the same dimensionality, the performance of a language model using a shared matrix whose value "l" is smaller than the rank of the embedding matrix and the softmax matrix may be lower than the performance of the language model using the embedding matrix and the softmax matrix. The value "l" of the shared matrix need not exceed the rank of the embedding matrix and the softmax matrix, but if the value "l" is too small, the generated model may differ too much from the optimal model. To address this, the processor 120 may perform training on a plurality of shared matrices.
In other words, the processor may update the elements of the new shared matrix and of the first projection matrix and the second projection matrix, and recalculate the word perplexity. For example, the processor 120 may update the elements of a plurality of shared matrices and of the first and second projection matrices corresponding to the plurality of shared matrices, and calculate a word perplexity corresponding to each of the plurality of shared matrices.
The processor 120 may recalculate the first projection matrix, the second projection matrix, and the shared matrix using a shared matrix of the minimum size among the plurality of shared matrices whose word perplexities are lower than the predetermined value, and generate a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
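One way to read this selection rule, simplified to a linear sweep rather than the grow/shrink iteration described above, is sketched below; train_and_eval is a hypothetical helper that retrains the three matrices at a given size and returns the test perplexity.

```python
def find_smallest_l(train_and_eval, reference_ppl, candidates=range(60, 601, 10)):
    """Return the smallest 'l' whose word perplexity stays below the reference value."""
    for l in candidates:                       # candidates sorted smallest-first
        if train_and_eval(l) < reference_ppl:  # retrain P1, P2, S at this size and test
            return l                           # minimum size that met the threshold
    return None                                # no candidate met the threshold
```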
As described above, the processor 120 may compress the data by converting the language model including the embedding matrix and the softmax matrix into a language model including the first projection matrix, the second projection matrix, and the shared matrix. In addition, because the dimensionality of the language model including the embedding matrix and the softmax matrix may be the same as the dimensionality of the language model including the first projection matrix, the second projection matrix, and the shared matrix, the compression efficiency can be improved, and performance degradation can be minimized and/or reduced, by suitably setting the size of the shared matrix.
The processor 120 may convert the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD). For example, the processor 120 may convert a shared matrix of size "l × n" into a first matrix of size "l × l", a second matrix of size "l × r", and a third matrix of size "r × n".
For example, the processor 120 may convert a shared matrix of size "100 × 600" into a first matrix of size "100 × 100", a second matrix of size "100 × 20", and a third matrix of size "20 × 600". In this case, the 60,000 elements of the shared matrix may be reduced to 24,000 elements. In other words, the compression efficiency can be improved by further decomposing the shared matrix.
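A sketch of that decomposition with numpy (the shapes and element counts match the example above; the rank r = 20 is taken from it):

```python
import numpy as np

l, n, r = 100, 600, 20
rng = np.random.default_rng(0)
S = rng.standard_normal((l, n))          # shared matrix ("l x n")

U, s, Vt = np.linalg.svd(S)              # full SVD: U is l x l, Vt is n x n
first = U                                # "l x l"  -> 10,000 elements
second = np.zeros((l, r))
second[:r, :r] = np.diag(s[:r])          # "l x r"  ->  2,000 elements (top-r singular values)
third = Vt[:r]                           # "r x n"  -> 12,000 elements

approx = first @ second @ third          # rank-r approximation of the shared matrix
print(S.size, first.size + second.size + third.size)   # 60000 24000
```

A rank-20 truncation of a 100 × 600 matrix is of course lossy on its own; in the patent, the three factors are subsequently retrained by RNN training, which is what recovers the performance.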
SVD may refer, for example, to singular value decomposition; since singular value decomposition is a well-known technique, a detailed description thereof is not provided. In addition, the elements of the first matrix, the second matrix, and the third matrix may be unrelated to the shared matrix.
The processor 120 may update the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing RNN training on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data, and generate a compressed language model using the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix whose elements have been updated. The method of updating elements through training has been described above, and a detailed description thereof is therefore not repeated.
FIG. 1B is a block diagram illustrating an example configuration of the electronic device 100. According to FIG. 1B, the electronic device 100 may include a storage 110, a processor (e.g., including processing circuitry) 120, a communicator (e.g., including communication circuitry) 130, a user interface (e.g., including interface circuitry) 140, a display 150, an audio processor (e.g., including audio processing circuitry) 160, and a video processor (e.g., including video processing circuitry) 170. Detailed descriptions of the elements shown in FIG. 1B that overlap with the elements shown in FIG. 1A are not repeated.
The processor 120 may include various processing circuitry and control the overall operation of the electronic device 100 using the various programs stored in the storage 110.
The processor 120 may include, for example and without limitation, a RAM 121, a ROM 122, a main CPU 123, a graphics processor 124, first to n-th interfaces 125-1 to 125-n, and a bus 126.
The RAM 121, the ROM 122, the main CPU 123, the graphics processor 124, and the first to n-th interfaces 125-1 to 125-n may be connected to one another via the bus 126.
The first to n-th interfaces 125-1 to 125-n may be connected to the above-described components. One of the interfaces may be a network interface connected to an external device via a network.
The main CPU 123 may access the storage 110 and perform booting using the operating system (O/S) stored in the storage 110, and may perform various operations using the various programs stored in the storage 110.
The ROM 122 may store a command set for system booting and the like. In response to input of a turn-on command and supply of power, the main CPU 123 may copy the O/S stored in the storage 110 to the RAM 121 according to the commands stored in the ROM 122, execute the O/S, and boot the system. When booting is completed, the main CPU 123 may copy the various application programs stored in the storage 110 to the RAM 121, execute the application programs copied to the RAM 121, and perform various operations.
The graphics processor 124 may generate a screen including various objects, such as icons, images, and text, using a computing unit (not shown) and a rendering unit (not shown). The computing unit may compute attribute values, such as coordinate values, shapes, sizes, and colors, with which each object is to be displayed according to the layout of the screen, based on a received control command. The rendering unit may generate screens of various layouts including the objects based on the attribute values computed by the computing unit. The screens generated by the rendering unit may be displayed in a display area of the display 150.
The operations of the processor 120 described above may be performed by programs stored in the storage 110.
The storage 110 may store various data, such as an operating system (O/S) software module for driving the electronic device 100, a language model including the embedding matrix and the softmax matrix, a compression module for compressing the language model, an RNN training module, and the like.
The communicator 130 may include various communication circuitry and communicate with various types of external devices through various communication methods. The communicator 130 may include various communication circuitry, such as, for example and without limitation, a Wi-Fi chip 131, a Bluetooth chip 132, a wireless communication chip 133, an NFC chip 134, and the like. The processor 120 may communicate with various external devices using the communicator 130.
The Wi-Fi chip 131 and the Bluetooth chip 132 may perform communication by Wi-Fi and Bluetooth, respectively. When the Wi-Fi chip 131 or the Bluetooth chip 132 is used, connection information such as an SSID and a session key may first be transmitted and received, communication may be established using the information, and various information may then be transmitted and received. The wireless communication chip 133 may refer to a chip that performs communication according to various communication standards, such as IEEE, Zigbee, 3rd Generation (3G), 3rd Generation Partnership Project (3GPP), Long Term Evolution (LTE), and the like. The NFC chip 134 may refer to a chip that operates in a near field communication (NFC) method using the 13.56 MHz band among various RF-ID frequency bands, such as 135 kHz, 13.56 MHz, 433 MHz, 860 MHz to 960 MHz, and 2.45 GHz.
The processor 120 may receive a language model including the embedding matrix and the softmax matrix from an external device via the communicator 130.
The user interface 140 may include various interface circuitry and receive various user interactions. The user interface 140 may be implemented in various forms depending on the example embodiment of the electronic device 100. For example, the user interface 140 may include, for example and without limitation, a button provided in the electronic device 100, a microphone receiving user input, a camera detecting user motion, and the like. In addition, if the electronic device 100 is implemented as a touch-based electronic device, the user interface 140 may be implemented as, for example and without limitation, a touch screen forming a layered structure with a touch pad. In this case, the user interface 140 may serve as the display 150 described above.
The audio processor 160 may include various circuitry for processing audio data. The audio processor 160 may perform various processing operations on audio data, such as decoding, amplification, and noise filtering.
The video processor 170 may include various circuitry for performing processing on video data. The video processor 170 may perform various image processing operations, such as decoding, scaling, noise filtering, frame rate conversion, and resolution conversion.
By the above methods, the processor 120 may convert the language model including the embedding matrix and the softmax matrix into a language model whose data is compressed and which includes the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix, while minimizing and/or reducing the performance degradation of the language model.
A method of providing recommended words using the language model compressed as described above is described in greater detail below.
FIG. 2 is a block diagram illustrating an electronic device 200 according to another example embodiment. As illustrated in FIG. 2, the electronic device 200 may include a storage 210 and a processor (e.g., including processing circuitry) 220.
The electronic device 200 may provide recommended words. For example, the electronic device 200 may receive input of a user utterance and provide a recommended word to follow the user utterance. For example, if the user utterance "The weather today is" is input, the electronic device 200 may provide recommended words such as "sunny" and "cold".
The electronic device 200 may be implemented as, for example and without limitation, a desktop PC, a laptop computer, a smartphone, a tablet PC, a server, or the like. In addition, the electronic device 200 may be implemented as a device with a small storage space.
The storage 210 may store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix that are used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix that are used as a softmax matrix. The storage 210 may store the compressed language model described with reference to FIGS. 1A and 1B.
The processor 220 may control the overall operation of the electronic device 200.
According to an example embodiment, the processor 220 may be implemented as, for example and without limitation, a digital signal processor (DSP), a microprocessor, a time controller (TCON), or the like, but is not limited thereto. The processor 220 may include, for example and without limitation, one or more of: a dedicated processor, a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a communication processor (CP), an ARM processor, and the like, or the processor 220 may be defined as one of the above. The processor 220 may also be implemented as a system on chip (SoC) with a processing algorithm or as large scale integration (LSI), or implemented in the form of a field programmable gate array (FPGA) or the like.
In response to input of a first word, the processor 220 may obtain first data in which a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix. For example, if a user inputs the word "I", the processor 220 may multiply the first vector corresponding to the word "I" by the first projection matrix, the first matrix, the second matrix, and the third matrix, and obtain first data in the high-dimensional vector space.
The processor 220 may obtain a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix. For example, the value of each column of the second vector may be between 0 and 1, and the sum of the values of all the columns may be 1.
The processor 220 may provide recommended words based on the second vector. For example, the processor 220 may provide the word corresponding to the column with the largest value among the element values of the second vector as a first recommended word, and provide the word corresponding to the column with the second-largest value among the element values of the second vector as a second recommended word. For example, the processor 220 may provide the word "am" as the first recommended word and the word "was" as the second recommended word.
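Putting the pieces together, a single-word recommendation with the compressed model might look like the following sketch (illustrative only; the factors are random here, so the output is meaningful only once they have been trained):

```python
import numpy as np

m, n, l, r = 15000, 600, 100, 20
rng = np.random.default_rng(0)
P1 = rng.standard_normal((m, l))         # first projection matrix
P2 = rng.standard_normal((m, l))         # second projection matrix
first = rng.standard_normal((l, l))      # the three SVD factors of the shared matrix
second = rng.standard_normal((l, r))
third = rng.standard_normal((r, n))

shared = first @ second @ third          # reassembled shared matrix ("l x n")
embedding = P1 @ shared                  # used as the embedding matrix ("m x n")
softmax_t = P2 @ shared                  # transposed softmax matrix ("m x n")

def recommend(word_index, k=2):
    first_data = embedding[word_index]   # one-hot input selects one embedding row
    logits = first_data @ softmax_t.T    # scores over all m words
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # second vector: entries in (0, 1), summing to 1
    return np.argsort(probs)[-k:][::-1]  # indices of the top-k recommended words

print(recommend(0))                      # top-2 word indices for a hypothetical input word
```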
However, example embodiment is not limited to above-mentioned example.Processor 220 can provide the recommendation word of any different number.
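For illustration only, the following minimal numpy sketch shows one way such factorized inference may be organized; the sizes, the initialization, the word ids, and the function names are assumptions for the example rather than part of the disclosure, and the recurrent layer is omitted for a single input word.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): vocabulary V, embedding dimension D,
# projection width r, and SVD truncation rank k.
V, D, r, k = 10000, 600, 300, 100

P1 = rng.standard_normal((V, r)) * 0.01   # first projection matrix (embedding side)
P2 = rng.standard_normal((V, r)) * 0.01   # second projection matrix (softmax side)
U  = rng.standard_normal((r, k)) * 0.01   # first matrix
Sg = np.diag(rng.random(k))               # second matrix (diagonal)
Vt = rng.standard_normal((k, D)) * 0.01   # third matrix

def embed(word_id):
    # Multiplying a one-hot row by P1 @ U @ Sg @ Vt reduces to a row lookup
    # in P1; the result is the "first data" mapped into the vector space.
    return P1[word_id] @ U @ Sg @ Vt

def next_word_probs(hidden):
    # The softmax matrix is the transpose of P2 @ U @ Sg @ Vt, so the logits
    # are obtained by multiplying that product with the hidden representation.
    logits = (P2 @ U @ Sg @ Vt) @ hidden
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # each entry lies between 0 and 1; all entries sum to 1

probs = next_word_probs(embed(42))        # 42: hypothetical id of the input word
top_two = np.argsort(probs)[-2:][::-1]    # first and second recommendation words
```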
The processor 220 may obtain second data in response to an input of a second word after the first word is input, wherein a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix.
For example, if the word "am" is input after the word "I", the processor 220 may obtain high-dimensional second data by multiplying the third vector corresponding to the word "am" by the first projection matrix, the first matrix, the second matrix, and the third matrix. In other words, the processor 220 may take into account both the previously input word "I" and the currently input word "am".
The processor 220 may generate third data based on the first data and the second data. For example, the processor 220 may generate the third data from the first data and the second data through a long short-term memory (LSTM) method. The processor 220 may also generate the third data through a weighted sum of the first data and the second data.
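For illustration only, the weighted-sum alternative may be sketched as follows, reusing embed and next_word_probs from the sketch above; the weight and the word ids are assumptions.

```python
alpha = 0.5                                  # illustrative weight; an LSTM cell could be used instead
first_data  = embed(42)                      # mapped vector for the previously input word
second_data = embed(7)                       # mapped vector for the currently input word
third_data  = alpha * first_data + (1 - alpha) * second_data
probs = next_word_probs(third_data)          # fourth vector; its largest entries give the recommendation words
```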
The processor 220 may obtain a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix.
For example, the processor 220 may obtain the transposed matrix of the product of the second projection matrix, the first matrix, the second matrix, and the third matrix, and obtain the fourth vector by multiplying the third data by the transposed matrix. For example, the value of each column of the fourth vector may be between 0 and 1, and the values of all the columns may sum to 1.
The processor 220 may provide recommendation words based on the fourth vector. For example, the processor 220 may provide the word corresponding to the column containing the largest value among the element values of the fourth vector as a first recommendation word, and provide the word corresponding to the column containing the second largest value among the element values of the fourth vector as a second recommendation word. For example, the processor 220 may provide the word "a" as the first recommendation word and the word "busy" as the second recommendation word.
However, example embodiments are not limited to the above example. The processor 220 may receive more words and provide more recommendation words. However, the number of previously input words to which the processor 220 refers may be limited. For example, when the current word is input, the processor 220 may refer only to three or fewer words input in the preceding period.
In addition, the preceding period may be a predetermined period of time counted back from the current time. For example, when the current word is input, the processor 220 may refer only to the words input during the most recent 10 seconds.
In addition, the processor 220 may receive one word and provide one recommendation word. In other words, the processor 220 may operate without referring to any word input in the preceding period.
The electronic device 200 may further include an input unit (not shown) and an output unit (not shown). The input unit may include various input circuitry for receiving a word from a user, and may be implemented as, for example and without limitation, a microphone, a keyboard, or the like. The output unit may include various output circuitry configured to provide a recommendation word, and may be implemented as, for example and without limitation, a display, a speaker, or the like.
The structure of the processor 220 may be the same as the structure of the processor 120 in Fig. 1B, and therefore a detailed description thereof will not be repeated.
The electronic device 200 may provide recommendation words, as described above. Meanwhile, the electronic device 200 may store the language model including the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix, and accordingly may further perform a first calculation of multiplying by the first projection matrix, the first matrix, the second matrix, and the third matrix used as the embedding matrix, a second calculation of multiplying by the second projection matrix, the first matrix, the second matrix, and the third matrix, and a third calculation, based on the second calculation, involving the transposed matrix of the matrix usable as the softmax matrix. The time required for such calculations can be very short, and therefore there is no problem in providing recommendation words.
It has been described that the electronic device 100 of Figs. 1A and 1B may be a component separate from the electronic device 200 of Fig. 2, but the two may also be implemented as a single component.
The operation of the electronic device for compressing a language model and the operation of the electronic device for providing a recommendation word are described in greater detail below with reference to the accompanying drawings.
Figs. 3A and 3B are diagrams illustrating conventional RNN training.
As shown in Fig. 3A, the processor 120 may perform a word embedding operation in which a vector corresponding to an input word is mapped to a vector space. An embedding matrix may be used for the word embedding.
The processor 120 may map a first word input at time t-3, a second word input at time t-2, and a third word input at time t-1 to the vector space in order, and, in the recurrent hidden layer stage, generate fourth data based on the first data, the second data, and the third data mapped to the vector space. For example, the processor 120 may generate the fourth data based on the first data, the second data, and the third data through a long short-term memory (LSTM) method or a weighted sum method.
In the softmax layer stage, the processor 120 may convert the fourth data in the vector space into a recovery vector. A softmax matrix may be used for the conversion. The processor 120 may compare the recovery vector with a fourth word input at time t, and update the elements of the embedding matrix and the softmax matrix. The above process may be referred to as training.
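For illustration only, one training step of this kind may be sketched as follows; the sizes and word ids are assumptions, and a plain average of the mapped context words stands in for the LSTM (or weighted-sum) recurrent hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 10000, 600                                 # illustrative vocabulary and embedding sizes
E = rng.standard_normal((V, D)) * 0.01            # embedding matrix
W = rng.standard_normal((D, V)) * 0.01            # softmax matrix

def train_step(E, W, context_ids, target_id, lr=0.1):
    h = E[context_ids].mean(axis=0)               # map the context words and combine them
    logits = h @ W                                # softmax-layer stage
    p = np.exp(logits - logits.max())
    p /= p.sum()                                  # recovery vector over the vocabulary
    grad = p.copy()
    grad[target_id] -= 1.0                        # cross-entropy gradient with respect to the logits
    dh = W @ grad
    W -= lr * np.outer(h, grad)                   # update the softmax-matrix elements in place
    np.subtract.at(E, context_ids, lr * dh / len(context_ids))  # update the embedding rows

train_step(E, W, [10, 57, 10], 23)                # hypothetical ids, e.g. "I", "hope", "I" -> "will"
```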
Fig. 3B is a diagram illustrating the training process using a more specific example of basic data, and is described in association with Fig. 3A.
When the processor 120 performs training with respect to the first sentence in Fig. 3B, the processor 120 may map, in order, the word "I" input at time t-3, the word "hope" input at time t-2, and the word "I" input at time t-1 to the vector space, and generate the fourth data based on the first data, the second data, and the third data mapped to the vector space.
The processor 120 may convert the fourth data in the vector space into a recovery vector, compare the recovery vector with the word "will" input at time t, and update the elements of the embedding matrix and the softmax matrix based on the comparison. In other words, the elements of the embedding matrix and the softmax matrix may be updated such that the word "will" is output in response to the sequential input of the words "I", "hope", and "I".
In addition, at time t+1, the words "I", "hope", "I", and "will" may be input in order, and the processor 120 may perform training in the same manner. In other words, the elements of the embedding matrix and the softmax matrix may be updated such that the word "succeed" is output in response to the sequential input of the words "I", "hope", "I", and "will". Once the training related to one sentence is completed, the processor 120 may perform training related to the other four sentences.
A language model may be generated based on the training to provide the best recommendation words. For example, if the RNN training related to the basic data is performed on the language model, the word "am" may be provided as a recommendation word when the word "I" is input at time t-1. This is because, if the word "I" is the first word of each of the five sentences, the second words may be "hope", "am", "do", "am", and "am", and since the word "am" is repeated three times in the training process, the embedding matrix and the softmax matrix may be updated such that the most suitable recommendation word following the word "I" is "am".
Fig. 3B is provided for purposes of description, and the elements of the embedding matrix and the softmax matrix may be updated by performing training related to a large number of sentences.
Fig. 4 is a diagram illustrating an example compression method according to an example embodiment.
The knowledge distillation in Fig. 4 may refer, for example, to a method of generating multiple language models and improving the performance of a language model using the average of the recommendation-word outputs from each of the multiple language models.
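For illustration only, the averaging step of such knowledge distillation may be sketched as follows; prob_fns is a hypothetical list of callables, each mapping a hidden representation to the probability vector output by one trained language model.

```python
import numpy as np

def distillation_target(prob_fns, hidden):
    # Average the softmax outputs of several language models; a single smaller
    # model can then be trained toward this averaged distribution.
    return np.mean([f(hidden) for f in prob_fns], axis=0)
```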
Compression and retraining may refer, for example, to a method of compressing a language model including an embedding matrix and a softmax matrix, and the method may have two steps.
In the first step, the embedding matrix may be converted into the product of a first projection matrix and a shared matrix, the product having the same size as the size of the embedding matrix, and the transposed matrix of the softmax matrix may be converted into the product of a second projection matrix and the shared matrix, the product having the same size as the size of the transposed matrix of the softmax matrix; the elements of the first projection matrix, the second projection matrix, and the shared matrix may then be updated through RNN training, and the performance may be determined. The above process may be repeated for shared matrices of various sizes, a shared matrix size with almost no performance degradation and the best compression efficiency may be obtained, and the obtained shared matrix may be used to generate a first compressed language model.
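For illustration only, the factorization in this first step may be sketched as follows; the sizes are assumptions, and initializing the projection matrices through a least-squares fit with the pseudoinverse is also an assumption, since the disclosure updates the factors through RNN training.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, r = 10000, 600, 300                  # illustrative sizes; r is the shared-matrix size being tried
E  = rng.standard_normal((V, D))           # trained embedding matrix
Wt = rng.standard_normal((V, D))           # transposed matrix of the trained softmax matrix

S  = rng.standard_normal((r, D))           # shared matrix candidate
P1 = E  @ np.linalg.pinv(S)                # V x r first projection matrix
P2 = Wt @ np.linalg.pinv(S)                # V x r second projection matrix

# P1 @ S and P2 @ S have the same sizes as E and Wt, but the 2*V*D original
# parameters are replaced by (2*V + D)*r parameters before RNN retraining.
assert (P1 @ S).shape == E.shape and (P2 @ S).shape == Wt.shape
```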
In the second step, the shared matrix may be converted into a first matrix, a second matrix, and a third matrix, for example through SVD, and the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix may be updated through RNN training to generate a second compressed language model.
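Continuing the sketch above for illustration only, the shared matrix S may be factored by a truncated SVD; the truncation rank is an assumption.

```python
k = 100                                             # illustrative truncation rank
U_, s_, Vt_ = np.linalg.svd(S, full_matrices=False) # S (r x D) -> (r x r)(r)(r x D)
M1, M2, M3 = U_[:, :k], np.diag(s_[:k]), Vt_[:k]    # first, second, and third matrices
S_lowrank = M1 @ M2 @ M3                            # low-rank stand-in for the shared matrix before retraining
```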
Fig. 5 is a diagram illustrating the performance and compression efficiency of the compressed language models. PP in Fig. 5 may refer, for example, to word perplexity, and CR may refer, for example, to compression ratio.
With the baseline of the basic language model, the PP may be 56.55 and the data size may be 56.76. If the knowledge distillation (KD) of Fig. 4 is applied, the performance may be improved and the PP may become 55.76, slightly smaller than that of the baseline.
In the case of using the shared matrix of the first compressed language model, the PP may be 55.07 and the data size may be 33.87. In other words, with the shared matrix, the PP may be similar to that of the baseline or KD, but the data size may be reduced by a factor of 1.68 (based on CR).
In the case of using the low rank and retraining of the second compressed language model, the PP may be 59.78 and the data size may be 14.80. In other words, with low rank and retraining, the PP may increase slightly above the PP of the baseline or KD, slightly reducing performance, but the data size may be reduced by a factor of 3.84 (based on CR). The compression ratio of low rank and retraining may be higher than that of the shared matrix.
The processor 120 may quantize the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix of the second compressed language model, and generate a third compressed language model. For example, the processor 120 may quantize the four-byte elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix into two-byte elements, and generate the third compressed language model.
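Continuing the sketches above for illustration only, the quantization step may be realized by storing four-byte float32 elements as two-byte float16 elements; float16 is one plausible two-byte format, since the disclosure only specifies the byte width.

```python
float32_factors = [m.astype(np.float32) for m in (P1, P2, M1, M2, M3)]  # four-byte elements
float16_factors = [m.astype(np.float16) for m in float32_factors]       # two-byte elements
ratio = sum(m.nbytes for m in float32_factors) / sum(m.nbytes for m in float16_factors)
print(ratio)  # 2.0: the quantized factors occupy half the size of the float32 factors
```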
As shown in Fig. 5, in the case of using the language model in which the elements of the second compressed language model are quantized, the PP may be 59.78 and the data size may be 7.40. In other words, with quantization, the PP may be the same as the PP of low rank and retraining, and the data size may be reduced by a factor of 7.68 (based on CR). The compression ratio of quantization is higher than the compression ratio of low rank and retraining.
As set forth above, data may be compressed while minimizing and/or reducing performance degradation by dividing the embedding matrix and the softmax matrix into multiple small matrices and quantizing the elements of the matrices.
Fig. 6 is a flowchart illustrating an example operating method of an electronic device for compressing a language model according to an example embodiment. The electronic device may, for example, store a language model including an embedding matrix and a softmax matrix on which recurrent neural network (RNN) training has been performed based on basic data including multiple sentences.
The embedding matrix may be converted into the product of a first projection matrix and a shared matrix, the product having the same size as the size of the embedding matrix, and the transposed matrix of the softmax matrix may be converted into the product of a second projection matrix and the shared matrix, the product having the same size as the size of the transposed matrix of the softmax matrix (S610). The elements of the first projection matrix, the second projection matrix, and the shared matrix may be updated by performing the RNN training with respect to the first projection matrix, the second projection matrix, and the shared matrix based on the basic data (S620).
The method may further include: calculating a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module; obtaining a new shared matrix of a size larger than the size of the shared matrix in response to the word perplexity being equal to or greater than a predetermined value, and obtaining a new shared matrix of a size smaller than the size of the shared matrix in response to the word perplexity being less than the predetermined value; and recalculating the first projection matrix, the second projection matrix, and the shared matrix using the obtained new shared matrix.
The method may further include calculating a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determining the predetermined value based on the reference word perplexity.
The recalculating may include recalculating the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix of the smallest size among multiple shared matrices whose word perplexity is less than the predetermined value, and generating a compressed language model based on the recalculated first projection matrix, second projection matrix, and shared matrix.
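For illustration only, the search for the smallest viable shared-matrix size may be sketched as follows; build_and_train and word_perplexity are hypothetical helpers standing in for the factorization plus RNN retraining and for the test-module evaluation, and a monotone relation between size and perplexity is assumed.

```python
def smallest_viable_size(candidate_sizes, build_and_train, word_perplexity, threshold):
    best = None
    for r in sorted(candidate_sizes, reverse=True):  # shrink the shared matrix while quality holds
        model = build_and_train(r)                   # factorize at size r, then retrain with RNN training
        if word_perplexity(model) >= threshold:      # too much degradation: stop shrinking
            break
        best = (r, model)                            # smallest size so far that stays below the threshold
    return best
```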
The method may further include: converting the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD); updating the elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data; and generating a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix having the updated elements.
The method may further include: obtaining first data, wherein a first vector corresponding to a first word included in one of the multiple sentences is mapped to a vector space based on a first random matrix; obtaining second data in response to an input of a second word included in the sentence after the first word, wherein a second vector corresponding to the second word is mapped to the vector space based on the first random matrix; generating third data based on the first data and the second data; obtaining a recovery vector from the third data based on a second random matrix; and updating the elements of the first random matrix and the second random matrix based on the difference between the recovery vector and a third vector corresponding to a third word following the second word, thereby performing training.
The method may further include updating the elements of the first random matrix and the second random matrix based on the remaining sentences among the multiple sentences, and storing, in the storage device, the first random matrix and the second random matrix having the elements updated based on the remaining sentences as the embedding matrix and the softmax matrix, respectively.
The embedding matrix and the transposed matrix of the softmax matrix may have the same size.
Fig. 7 is a flowchart illustrating an example operating method of an electronic device for providing a recommendation word according to an example embodiment. The electronic device may store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix used as a softmax matrix.
First data may be obtained in response to an input of a first word, wherein a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix used as the embedding matrix (S710). A second vector may be obtained from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix used as the softmax matrix (S720). A recommendation word may be provided based on the second vector (S730).
The method may further include: obtaining second data in response to an input of a second word after the first word is input, wherein a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix; generating third data based on the first data and the second data; obtaining a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix; and providing a recommendation word based on the fourth vector.
According to one or more of the above example embodiments, an electronic device may compress the data of a language model on which RNN training is performed, and an electronic device with a relatively small storage capacity may store the compressed language model. In addition, the electronic device may provide recommendation words based on the compressed language model while minimizing and/or reducing performance degradation.
According to an example embodiment, one or more of the above example embodiments may be implemented as software including instructions stored in a machine-readable storage medium. A machine may call an instruction stored in the storage medium and operate according to the called instruction, and may include an electronic device according to the example embodiments. If the instruction is executed by a processor, the processor may directly perform the function corresponding to the instruction, or the function may be performed by other components under the control of the processor. The instruction may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" does not necessarily mean that the storage medium excludes signals, but may simply mean that the storage medium is tangible; the term does not distinguish between data stored semi-permanently and data stored temporarily.
In addition, the methods described in one or more of the example embodiments may be included and provided in a computer program product. The computer program product may be traded as a commodity between a seller and a buyer. The computer program product may have the form of a machine-readable storage medium (for example, a compact disc read-only memory [CD-ROM]), or may be distributed online via an application store (for example, PlayStore™). If the computer program product is distributed online, at least a part of the computer program product may be temporarily stored in, or temporarily generated in, a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.
The various example embodiments described above may be implemented in a recording medium readable by a computer or a similar device using software, hardware, or any combination thereof. In some cases, the example embodiments described above may be embodied as a processor itself. According to a software implementation, the example embodiments such as the procedures and functions described herein may be embodied as separate software modules. Each of the software modules may perform one or more of the functions and operations described in the example embodiments.
In addition, computer instructions for performing the processing operations of a device according to one or more of the example embodiments may be stored in a non-transitory computer-readable medium. The computer instructions stored in the non-transitory computer-readable medium, when executed by a processor of a device, may cause the device to perform the processing operations according to the various example embodiments. The non-transitory computer-readable medium may refer to a machine-readable medium or device that stores data. Examples of the non-transitory computer-readable medium may include, but are not limited to, a compact disc (CD), a digital versatile disc (DVD), a Blu-ray disc, a universal serial bus (USB) stick, a memory card, a ROM, and the like.
In addition, each of the components (for example, modules or programs) described in one or more of the example embodiments may be configured as one or more entities, and some of the corresponding sub-components described above may be omitted, or other sub-components may be further included in the example embodiments. Alternatively or additionally, some components (for example, modules or programs) may be integrated into one entity and may equally or similarly perform the functions performed by each of the respective components before the integration. Operations performed by a module, a program, or another component according to the various example embodiments may be carried out sequentially, in parallel, repeatedly, or heuristically, or at least some of the operations may be performed in a different order or omitted, or another operation may be added.
The foregoing example embodiments and advantages are merely exemplary and are not to be construed as limiting the example embodiments. The description of the example embodiments is intended to be illustrative and not to limit the scope of the present disclosure as defined by the appended claims, and many alternatives, modifications, and variations will be apparent to those skilled in the art.

Claims (15)

1. An electronic device configured to compress a language model, the electronic device comprising:
a storage device configured to store a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including multiple sentences; and
a processor configured to:
convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product of the first projection matrix and the shared matrix having the same size as the size of the embedding matrix, and convert a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product of the second projection matrix and the shared matrix having the same size as the size of the transposed matrix of the softmax matrix, and
update elements of the first projection matrix, the second projection matrix, and the shared matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
2. The electronic device of claim 1, wherein the processor is configured to:
determine a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module,
obtain a new shared matrix of a size larger than the size of the shared matrix in response to the word perplexity being equal to or greater than a predetermined value, and obtain a new shared matrix of a size smaller than the size of the shared matrix in response to the word perplexity being less than the predetermined value, and
redetermine the first projection matrix, the second projection matrix, and the shared matrix using the new shared matrix.
3. The electronic device of claim 2, wherein the processor is configured to: determine a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module, and determine the predetermined value based on the reference word perplexity.
4. The electronic device of claim 3, wherein the processor is configured to: redetermine the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix of the smallest size among multiple shared matrices whose word perplexity is less than the predetermined value, and generate a compressed language model based on the redetermined first projection matrix, second projection matrix, and shared matrix.
5. The electronic device of claim 1, wherein the processor is further configured to:
convert the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD),
update elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data, and
generate a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix having the updated elements.
6. The electronic device of claim 1, wherein the processor is configured to:
obtain first data, wherein a first vector corresponding to a first word included in one of the multiple sentences is mapped to a vector space based on a first random matrix, and obtain second data in response to receiving an input of a second word included in the sentence after the first word, wherein a second vector corresponding to the second word is mapped to the vector space based on the first random matrix,
generate third data based on the first data and the second data, and
obtain a recovery vector from the third data based on a second random matrix, update elements of the first random matrix and the second random matrix based on a difference between the recovery vector and a third vector corresponding to a third word following the second word, and perform training.
7. The electronic device of claim 6, wherein the processor is configured to:
update the elements of the first random matrix and the second random matrix based on remaining sentences among the multiple sentences, and
store, in the storage device, the first random matrix and the second random matrix having the elements updated based on the remaining sentences as the embedding matrix and the softmax matrix, respectively.
8. The electronic device of claim 1, wherein the embedding matrix and the transposed matrix of the softmax matrix have the same size.
9. An electronic device configured to provide a recommendation word, the electronic device comprising:
a storage device configured to store a language model including a first projection matrix, a first matrix, a second matrix, and a third matrix used as an embedding matrix, and a second projection matrix, the first matrix, the second matrix, and the third matrix used as a softmax matrix; and
a processor configured to:
obtain first data in response to an input of a first word, wherein a first vector corresponding to the first word is mapped to a vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix,
obtain a second vector from the first data based on the second projection matrix, the first matrix, the second matrix, and the third matrix, and
provide a recommendation word based on the second vector.
10. The electronic device of claim 9, wherein the processor is configured to:
obtain second data in response to receiving an input of a second word after the first word is input, wherein a third vector corresponding to the second word is mapped to the vector space based on the first projection matrix, the first matrix, the second matrix, and the third matrix,
generate third data based on the first data and the second data,
obtain a fourth vector from the third data based on the second projection matrix, the first matrix, the second matrix, and the third matrix, and
provide the recommendation word based on the fourth vector.
11. A method by which an electronic device compresses a language model, the electronic device storing a language model including an embedding matrix and a softmax matrix generated by performing recurrent neural network (RNN) training based on basic data including multiple sentences, the method comprising:
converting the embedding matrix into a product of a first projection matrix and a shared matrix, the product of the first projection matrix and the shared matrix having the same size as the size of the embedding matrix, and converting a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product of the second projection matrix and the shared matrix having the same size as the size of the transposed matrix of the softmax matrix; and
updating elements of the first projection matrix, the second projection matrix, and the shared matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, and the shared matrix based on the basic data.
12. The method of claim 11, further comprising:
determining a word perplexity with respect to the first projection matrix, the second projection matrix, and the shared matrix based on a test module;
obtaining a new shared matrix of a size larger than the size of the shared matrix in response to the word perplexity being equal to or greater than a predetermined value, and obtaining a new shared matrix of a size smaller than the size of the shared matrix in response to the word perplexity being less than the predetermined value; and
redetermining the first projection matrix, the second projection matrix, and the shared matrix using the new shared matrix.
13. The method of claim 12, further comprising:
determining a reference word perplexity with respect to the embedding matrix and the softmax matrix based on the test module; and
determining the predetermined value based on the reference word perplexity.
14. The method of claim 13, wherein the redetermining further comprises: redetermining the first projection matrix, the second projection matrix, and the shared matrix using the shared matrix of the smallest size among multiple shared matrices whose word perplexity is less than the predetermined value, and generating a compressed language model based on the redetermined first projection matrix, second projection matrix, and shared matrix.
15. The method of claim 11, further comprising:
converting the shared matrix into a first matrix, a second matrix, and a third matrix using singular value decomposition (SVD);
updating elements of the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix based on the basic data; and
generating a compressed language model based on the first projection matrix, the second projection matrix, the first matrix, the second matrix, and the third matrix having the updated elements.
CN201880005774.XA 2017-03-09 2018-02-06 Electronic device for compressing language model, electronic device for providing recommended word, and operating method thereof Active CN110168542B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201762469089P 2017-03-09 2017-03-09
US62/469,089 2017-03-09
KR10-2017-0147922 2017-11-08
KR1020170147922A KR102488338B1 (en) 2017-03-09 2017-11-08 Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof
PCT/KR2018/001611 WO2018164378A1 (en) 2017-03-09 2018-02-06 Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof

Publications (2)

Publication Number Publication Date
CN110168542A 2019-08-23
CN110168542B CN110168542B (en) 2023-11-24

Family

ID=63719251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880005774.XA Active CN110168542B (en) 2017-03-09 2018-02-06 Electronic device for compressing language model, electronic device for providing recommended word, and operating method thereof

Country Status (3)

Country Link
EP (1) EP3577571A4 (en)
KR (1) KR102488338B1 (en)
CN (1) CN110168542B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102234155B1 (en) 2018-08-31 2021-03-30 주식회사 엘지화학 System and method for correcting current value of shunt resistor
KR102331242B1 (en) * 2019-05-20 2021-11-25 에스케이텔레콤 주식회사 Memory network apparatus and deducing method using the same
KR20210098247A (en) 2020-01-31 2021-08-10 삼성전자주식회사 Electronic device and operating method for the same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9728184B2 (en) * 2013-06-18 2017-08-08 Microsoft Technology Licensing, Llc Restructuring deep neural network acoustic models
JP6628350B2 (en) * 2015-05-11 2020-01-08 国立研究開発法人情報通信研究機構 Method for learning recurrent neural network, computer program therefor, and speech recognition device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011221895A (en) * 2010-04-13 2011-11-04 Fujifilm Corp Matrix generation device, method, program and information processing device
CN105679317A (en) * 2014-12-08 2016-06-15 三星电子株式会社 Method and apparatus for language model training and speech recognition
CN105810193A (en) * 2015-01-19 2016-07-27 三星电子株式会社 Method and apparatus for training language model, and method and apparatus for recognizing language
CN104636317A (en) * 2015-03-09 2015-05-20 湘潭大学 Method for optimizing interactive projection measurement matrix based on feature value decomposition
CN106407211A (en) * 2015-07-30 2017-02-15 富士通株式会社 Method and device for classifying semantic relationships among entity words

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
OFIR PRESS AND LIOR WOLF: "Using the Output Embedding to Improve Language Models", ARXIV *
WANG LONG ET AL.: "Research on Adaptive Algorithm of RNN-based Chinese Language Model", FIRE CONTROL & COMMAND CONTROL *
QI WENQING: "An Improved Chinese Word Segmentation Algorithm", JOURNAL OF HUANGSHI INSTITUTE OF TECHNOLOGY *
GU SIYUAN ET AL.: "Fuzzy Class Language Model Based on Soft Clustering", MILITARY COMMUNICATIONS TECHNOLOGY *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200257970A1 (en) * 2019-02-08 2020-08-13 Korea Advanced Institute Of Science And Technology Data processing apparatus by learning of neural network, data processing method by learning of neural network, and recording medium recording the data processing method
US11741356B2 (en) * 2019-02-08 2023-08-29 Korea Advanced Institute Of Science & Technology Data processing apparatus by learning of neural network, data processing method by learning of neural network, and recording medium recording the data processing method
CN110781690A (en) * 2019-10-31 2020-02-11 北京理工大学 Fusion and compression method of multi-source neural machine translation model
CN110781690B (en) * 2019-10-31 2021-07-13 北京理工大学 Fusion and compression method of multi-source neural machine translation model
CN111597825A (en) * 2020-05-13 2020-08-28 北京字节跳动网络技术有限公司 Voice translation method and device, readable medium and electronic equipment
CN111597825B (en) * 2020-05-13 2021-07-23 北京字节跳动网络技术有限公司 Voice translation method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
EP3577571A1 (en) 2019-12-11
KR20180103671A (en) 2018-09-19
CN110168542B (en) 2023-11-24
EP3577571A4 (en) 2020-02-26
KR102488338B1 (en) 2023-01-13

Similar Documents

Publication Publication Date Title
CN110168542A (en) For compressing the electronic equipment of language model, for providing the electronic equipment and its operating method of recommending word
US10691886B2 (en) Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof
KR102473447B1 (en) Electronic device and Method for controlling the electronic device thereof
US11651214B2 (en) Multimodal data learning method and device
KR101882704B1 (en) Electronic apparatus and control method thereof
CN111816159B (en) Language identification method and related device
EP3702905A1 (en) Electronic device and control method thereof
CN110069715A (en) A kind of method of information recommendation model training, the method and device of information recommendation
US20210279589A1 (en) Electronic device and control method thereof
US11443116B2 (en) Electronic apparatus and control method thereof
US10733481B2 (en) Cloud device, terminal device, and method for classifying images
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
KR20200044173A (en) Electronic apparatus and control method thereof
US10997947B2 (en) Electronic device and control method thereof
CN111797854A (en) Scene model establishing method and device, storage medium and electronic equipment
KR102071198B1 (en) Control method, device and system of knowledge sharing platform for managing character, programming file and programming instruction
KR20190135888A (en) Electronic apparatus and contorl method thereof
KR102071197B1 (en) Control method, device and system of knowledge sharing platform for content management of coding education game
KR102059017B1 (en) Control method, apparatus and system for knowledge sharing platform
WO2023051678A1 (en) Recommendation method and related device
US20230106213A1 (en) Machine learning model compression using weighted low-rank factorization
CN114357138A (en) Question and answer identification method and device, electronic equipment and readable storage medium
KR20230013876A (en) System and method for providing interface to applied actual feeling learning contents platform
KR20200027085A (en) Electronic apparatus and control method thereof
KR102071199B1 (en) Control method, device and system of knowledge sharing platform for 2d / 3d design result management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant