EP3937060A1 - Method and apparatus for training semantic representation model, device and computer storage medium - Google Patents

Method and apparatus for training semantic representation model, device and computer storage medium

Info

Publication number
EP3937060A1
Authority
EP
European Patent Office
Prior art keywords
language
training
semantic representation
layers
representation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP21163589.1A
Other languages
German (de)
French (fr)
Inventor
Shuohuan Wang
Jiaxiang Liu
Xuan OUYANG
Yu Sun
Hua Wu
Haifeng Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30: Semantic analysis
    • G06N: Computing arrangements based on specific computational models
    • G06N 20/00: Machine learning
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models

Definitions

  • the memory 602 is configured as the non-transitory computer readable storage medium according to the present application.
  • the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method for training a semantic representation model according to the present application.
  • the non-transitory computer readable storage medium according to the present application stores computer instructions for causing a computer to perform the method for training a semantic representation model according to the present application.
  • the memory 602, which is a non-transitory computer readable storage medium, may be configured to store non-transitory software programs, non-transitory computer executable programs and modules, such as program instructions/modules corresponding to the method for training a semantic representation model according to the embodiment of the present application.
  • the processor 601 executes various functional applications and data processing of a server, that is, implements the method for training a semantic representation model according to the above-mentioned embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 602.
  • the memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to use of the electronic device, or the like. Furthermore, the memory 602 may include a high-speed random access memory, or a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid state storage devices. In some embodiments, optionally, the memory 602 may include memories remote from the processor 601, and such remote memories may be connected to the electronic device via a network. Examples of such a network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device may further include an input apparatus 603 and an output apparatus 604.
  • the processor 601, the memory 602, the input apparatus 603 and the output apparatus 604 may be connected by a bus or other means, and Fig. 6 takes the connection by a bus as an example.
  • the input apparatus 603, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or the like, may receive input numeric or character information and generate key signal input related to user settings and function control of the electronic device.
  • the output apparatus 604 may include a display device, an auxiliary lighting apparatus (for example, an LED) and a tactile feedback apparatus (for example, a vibrating motor), or the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and technologies described here may be implemented in digital electronic circuitry, integrated circuitry, application specific integrated circuits (ASIC), computer hardware, firmware, software, and/or combinations thereof.
  • the systems and technologies may be implemented in one or more computer programs which are executable and/or interpretable on a programmable system including at least one programmable processor, and the programmable processor may be special or general, and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input apparatus, and at least one output apparatus.
  • to provide interaction with a user, the systems and technologies described here may be implemented on a computer having: a display apparatus (for example, a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (for example, a mouse or a trackball) by which the user may provide input for the computer.
  • Other kinds of apparatuses may also be used to provide interaction with a user; for example, feedback provided for a user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback); and input from a user may be received in any form (including acoustic, voice or tactile input).
  • the systems and technologies described here may be implemented in a computing system (for example, as a data server) which includes a back-end component, or a computing system (for example, an application server) which includes a middleware component, or a computing system (for example, a user computer having a graphical user interface or a web browser through which a user may interact with an implementation of the systems and technologies described here) which includes a front-end component, or a computing system which includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected through any form or medium of digital data communication (for example, a communication network). Examples of the communication network include: a local area network (LAN), a wide area network (WAN) and the Internet.
  • a computer system may include a client and a server.
  • the client and the server are remote from each other and interact through the communication network.
  • the relationship between the client and the server is generated by virtue of computer programs which run on respective computers and have a client-server relationship to each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a method and apparatus for training a semantic representation model, a device, a computer storage medium and a computer program product, which relates to the field of natural language processing technologies in artificial intelligence. An implementation includes: acquiring a semantic representation model which has been trained for a first language as a first semantic representation model; taking a bottom layer and a top layer of the first semantic representation model as trained layers, initializing the trained layers, keeping model parameters of other layers unchanged, and training the trained layers using training language materials of a second language until a training ending condition is met; successively bringing the untrained layers into the trained layers from bottom to top and, for each of these layers, keeping the model parameters of the layers other than the trained layers unchanged and training the trained layers using the training language materials of the second language until the training ending condition is met; and obtaining a semantic representation model for the second language after all the layers are trained.

Description

    Field of the Disclosure
  • The present application relates to the technical field of computer applications, and particularly to an artificial intelligence technology.
  • Background of the Disclosure
  • In recent years, pre-trained models represented by the Bidirectional Encoder Representation from Transformers (BERT) model have greatly improved the effect of Natural Language Processing (NLP) tasks. However, current mainstream semantic representation models focus on common languages, such as English, Chinese, French, German, or the like, while there are thousands of languages in the world, most of which have far fewer language materials than common languages such as English; these languages are called low-resource languages. A lot of computing resources are required for training a pre-trained model, which makes the cost expensive: the cost of each model may be as high as hundreds of thousands or even millions of yuan. Therefore, it is difficult to construct enough language materials and train a model for each language. For a language with a quite small number of language materials, such as Czech, it is even difficult to collect enough language materials for training.
  • Summary of the Disclosure
  • In view of this, the present application provides a method and apparatus for training a semantic representation model, a device, a computer storage medium and a computer program product, for a language with a small number of language materials.
  • In a first aspect, the present application provides a method for training a semantic representation model, including:
    • acquiring a semantic representation model which has been trained for a first language as a first semantic representation model;
    • taking a bottom layer and a top layer of the first semantic representation model as trained layers, initializing the trained layers, keeping model parameters of other layers unchanged, and training the trained layers using training language materials of a second language until a training ending condition is met;
    • successively bringing the untrained layers into the trained layers from bottom to top and, for each of these layers, keeping the model parameters of the layers other than the trained layers unchanged and training the trained layers using the training language materials of the second language until the training ending condition is met; and
    • obtaining a semantic representation model for the second language after all the layers are trained.
  • In a second aspect, the present application further provides an apparatus for training a semantic representation model, including:
    • a first acquiring unit configured to acquire a semantic representation model which has been trained for a first language as a first semantic representation model; and
    • a training unit configured to take a bottom layer and a top layer of the first semantic representation model as trained layers, initialize the trained layers, keep model parameters of other layers unchanged, and train the trained layers using training language materials of a second language until a training ending condition is met; successively bring the untrained layers into the trained layers from bottom to top and, for each of these layers, keep the model parameters of the layers other than the trained layers unchanged and train the trained layers using the training language materials of the second language until the training ending condition is met; and obtain a semantic representation model for the second language after all the layers are trained.
  • In a third aspect, the present application provides an electronic device, including:
    • at least one processor; and
    • a memory connected with the at least one processor communicatively;
    • wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as mentioned above.
  • In a fourth aspect, the present application further provides a non-transitory computer readable storage medium with computer instructions stored thereon, wherein the computer instructions are used for causing a computer to perform the method as mentioned above.
  • In a fifth aspect, the present application further provides a computer program product, comprising instructions which, when the program is executed by a computer, cause the computer to perform the method as mentioned above.
  • In the present application, the trained semantic representation model for the existing language is fully utilized, and each layer is successively migrated and trained to obtain the semantic representation model for another language, which remarkably reduces the cost of collecting training samples for a language with a quite small number of language materials and achieves a higher training efficiency.
  • Other effects of the above-mentioned alternatives will be described below in conjunction with embodiments.
  • Brief Description of Drawings
  • The drawings are used for better understanding the present solution and do not constitute a limitation of the present application. In the drawings:
    • Fig. 1 is a flow chart of a method for training a semantic representation model according to a first embodiment of the present application;
    • Fig. 2 is a schematic diagram of each stage of training the semantic representation model according to the first embodiment of the present application;
    • Fig. 3 is a schematic diagram of training a model using parallel language materials according to a second embodiment of the present application;
    • Fig. 4 is a diagram of an example of a working principle of an alignment model according to the second embodiment of the present application;
    • Fig. 5 is a structural diagram of an apparatus for training a semantic representation model according to a third embodiment of the present application; and
    • Fig. 6 is a block diagram of an electronic device configured to implement the embodiment of the present application.
    Detailed Description of Preferred Embodiments
  • The following part will illustrate exemplary embodiments of the present application with reference to the drawings, including various details of the embodiments of the present application for a better understanding. The embodiments should be regarded only as exemplary ones. Therefore, those skilled in the art should appreciate that various changes or modifications can be made with respect to the embodiments described herein without departing from the scope and spirit of the present application. Similarly, for clarity and conciseness, the descriptions of the known functions and structures are omitted in the descriptions below.
  • The present application has a core idea that a semantic representation model of a first language which has been sufficiently trained is utilized to assist in training a semantic representation model of a second language. For convenience of description and understanding, the examples referred to in the following embodiments are described with English as the first language and Chinese as the second language, but the present application is not limited thereto and may be applied to any languages.
  • In addition, a semantic representation model in the present application may be configured as a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), a Transformer model, or the like. As a typical semantic representation model, the Transformer model is used as an example in the following embodiments; other models have similar implementation principles.
  • First Embodiment
  • Fig. 1 is a flow chart of a method for training a semantic representation model according to a first embodiment of the present application. The method is executed by an apparatus for training a semantic representation model, which may be configured as an application located in a computer system/server, or as a functional unit, such as a plug-in or Software Development Kit (SDK), located in an application in the computer system/server. As shown in Fig. 1, the method may include the following steps:
    101: Acquiring a semantic representation model which has been trained for a first language as a first semantic representation model.
  • For example, English serves as the first language; since English is internationally common and usually has abundant language materials, a semantic representation model, such as a Transformer model, can be trained well using English. In this step, a trained English Transformer model is used as the first semantic representation model for a subsequent migration training process to assist in training a Chinese Transformer model.
  • 102: Taking a bottom layer and a top layer of the first semantic representation model as trained layers, initializing the trained layers, keeping model parameters of other layers unchanged, and training the trained layers using training language materials of a second language until a training ending condition is met.
  • For ease of understanding, the training language materials in the present application will be briefly described first. For the Transformer model, a training language material is usually a text containing a mask and the character corresponding to the mask. Taking one Chinese training language material, glossed in English as "I [mask] an apple", as an example, the character corresponding to [mask] is the character meaning "ate". Taking another Chinese training language material, glossed as "I run a long [mask] before I [mask] you", the characters corresponding to the two [mask] positions are the characters meaning "way" and "find" respectively. The Transformer model has the function of predicting the character corresponding to the mask in the training language material and making the predicted result match the expectation (the character corresponding to the mask in the training language material) as much as possible.
  • The Transformer model has a multilayer structure, as shown in Fig. 2. The bottom layer is an embedding layer, represented by Embedding Layer, and is configured to determine the vector representation of each character in the training language material. The top layer is a fully-connected layer, usually represented by Task Layer, and is configured to map the vector representations processed by the middle layers of the Transformer model, so as to obtain the content prediction for the mask in the training language material. A plurality of layers, usually represented by Transformer Block, are contained between the bottom layer and the top layer. Each Transformer Block is used for processing the input vector representation of each character into a global vector representation with an Attention mechanism, and refers to the global vector representations of the previous layer when performing the Attention mechanism. The working mechanism of each Transformer Block is not detailed here. In the embodiment of the present application, three Transformer Blocks are shown in Fig. 2 as an example.
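  • The layered structure described above can be sketched roughly as follows. This is a minimal PyTorch-style illustration under assumed dimensions, not the patent's actual implementation; encode() returns the per-character representations produced by the Transformer Blocks, and forward() adds the Task Layer mapping used for mask prediction.

```python
import torch
import torch.nn as nn

class SemanticRepresentationModel(nn.Module):
    """Sketch of the structure in Fig. 2: an Embedding Layer at the bottom,
    a stack of Transformer Blocks in the middle, and a fully-connected
    Task Layer at the top that maps each position onto the vocabulary."""

    def __init__(self, vocab_size, d_model=256, n_blocks=3, n_heads=4, max_len=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)     # Embedding Layer
        self.position = nn.Embedding(max_len, d_model)         # position identifiers
        self.blocks = nn.ModuleList([                          # Transformer Block1..3
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_blocks)
        ])
        self.task_layer = nn.Linear(d_model, vocab_size)       # Task Layer

    def encode(self, input_ids):
        """Per-character global vector representations from the block stack."""
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.embedding(input_ids) + self.position(positions)
        for block in self.blocks:
            h = block(h)                # self-attention over the whole sequence
        return h                        # shape: (batch, seq_len, d_model)

    def forward(self, input_ids):
        return self.task_layer(self.encode(input_ids))  # per-position vocabulary scores
```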
  • Usually, the bottom layer of the Transformer model focuses more on literal (surface-level) logic, the top layer focuses more on semantic logic, and the semantic logic of the top layer is more consistent across different languages. Based on this assumption, in the embodiment of the present application, the layers are trained successively: the bottom layer and the top layer are trained first, and then each middle layer is trained in combination with the bottom layer and the top layer.
  • In stage (a) as shown in Fig. 2, the Embedding Layer and the Task Layer in the English Transformer model are initialized as the trained layers; that is, the model parameters are initialized. The parameters of the other layers, i.e., the Transformer Blocks, are kept unchanged; that is, the parameters of each Transformer Block still keep the model parameters obtained in the previous English training process. Then, Chinese training language materials are input to train the trained Embedding Layer and Task Layer.
  • Each training process of the trained layers has the training target that the prediction result of the mask by the Task Layer meets the expectation. That is, a loss function may be constructed according to the training target, and the model parameters of the trained layers may be optimized using values of the loss function. The training ending condition for each set of trained layers is that the loss gradually converges or the number of iterations reaches a preset threshold.
  • That is, in stage (a), in the process of training the Embedding Layer and the Task Layer using the Chinese training language materials, iteration is driven by the loss, and the parameters of the Embedding Layer and the Task Layer are optimized gradually until the loss converges or the number of iterations reaches the preset threshold.
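  • A sketch of stage (a) under the assumptions of the SemanticRepresentationModel class from the earlier sketch: the bottom and top layers are re-initialized and trained on Chinese materials while the Transformer Blocks keep their English parameters. The optimizer, learning rate and convergence test are illustrative choices, not specified by the patent; english_model and chinese_dataloader are assumed to be given.

```python
import copy
import torch
import torch.nn.functional as F

def train_layers(model, trained_modules, dataloader, max_iters=10000, tol=1e-4):
    """Train only the parameters of `trained_modules`; everything else stays
    frozen. Stops when the loss change falls below `tol` (a rough stand-in for
    'the loss converges') or when `max_iters` iterations are reached."""
    for p in model.parameters():
        p.requires_grad_(False)
    params = []
    for m in trained_modules:
        for p in m.parameters():
            p.requires_grad_(True)
            params.append(p)
    optimizer = torch.optim.Adam(params, lr=1e-4)
    prev_loss, it = float("inf"), 0
    for input_ids, labels in dataloader:
        logits = model(input_ids)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               labels.view(-1), ignore_index=-100)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        it += 1
        if abs(prev_loss - loss.item()) < tol or it >= max_iters:
            break
        prev_loss = loss.item()
    return model

# Stage (a): copy the trained English model, re-initialize the Embedding Layer
# and Task Layer, and train only them on Chinese materials (assuming a shared
# sub-word vocabulary; otherwise these layers would be re-created with the
# Chinese vocabulary size).
chinese_model = copy.deepcopy(english_model)      # english_model: assumed pre-trained
chinese_model.embedding.reset_parameters()
chinese_model.position.reset_parameters()
chinese_model.task_layer.reset_parameters()
train_layers(chinese_model,
             [chinese_model.embedding, chinese_model.position, chinese_model.task_layer],
             chinese_dataloader)                  # chinese_dataloader: assumed given
```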
  • 103: Successively bringing the untrained layers into the trained layers from bottom to top and, for each of these layers, keeping the model parameters of the layers other than the trained layers unchanged and training the trained layers using the training language materials of the second language until the training ending condition is met.
  • In stage (b) as shown in Fig. 2, Transformer Block1 is first brought into the trained layers from bottom to top; at this point, the trained layers include the Embedding Layer, the Task Layer and Transformer Block1. The current parameters of the Embedding Layer and the Task Layer are the parameters obtained after the training process in 102, and the parameters of Transformer Block1 are the parameters of Transformer Block1 in the English Transformer model. The Embedding Layer, the Task Layer and Transformer Block1 are trained with the parameters of Transformer Block2 and Transformer Block3 kept unchanged. In the process of training these layers using the Chinese training language materials, iteration is driven by the loss, and their parameters are optimized gradually until the loss converges or the number of iterations reaches a preset threshold.
  • In stage (c) as shown in Fig. 2, Transformer Block2 is then brought into the trained layers; at this point, the trained layers include the Embedding Layer, the Task Layer, Transformer Block1 and Transformer Block2. These layers are trained with the parameters of Transformer Block3 kept unchanged. In the process of training them using the Chinese training language materials, iteration is driven by the loss, and their parameters are optimized gradually until the loss converges or the number of iterations reaches a preset threshold.
  • In stage (d) as shown in Fig. 2, Transformer Block3 is brought into the trained layers; at this point, the trained layers include the Embedding Layer, the Task Layer, Transformer Block1, Transformer Block2 and Transformer Block3. In the process of training all these layers using the Chinese training language materials, iteration is driven by the loss, and their parameters are optimized gradually until the loss converges or the number of iterations reaches a preset threshold.
  • It can be observed from the above-mentioned process that each English middle-layer Transformer Block is actually used as a warm start for training the corresponding Chinese Transformer Block. In addition to the above-mentioned way of training the middle layers one by one from bottom to top, if the number of middle layers is large, the middle layers may also be brought in two at a time from bottom to top, or even more layers may be trained together in each stage.
  • 104: Obtaining a semantic representation model for the second language after all the layers are trained.
  • After completion of the training process in stage (d) as shown in Fig. 2, the Chinese Transformer model is obtained; that is, a gradual migration training process from the trained English Transformer model yields the Chinese Transformer model.
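  • Putting stages (a) through (d) together, the layer-by-layer migration amounts to growing the set of trained layers from the outside in, reusing the train_layers helper sketched earlier; whether blocks are added one, two or more at a time is the design choice noted above.

```python
# Stage (a): bottom and top layers only; stages (b)-(d): bring the
# Transformer Blocks into the trained layers one by one, from bottom to top.
trained = [chinese_model.embedding, chinese_model.position, chinese_model.task_layer]
train_layers(chinese_model, trained, chinese_dataloader)       # stage (a)

for block in chinese_model.blocks:                             # stages (b), (c), (d)
    trained.append(block)       # warm start: the block keeps its English parameters
    train_layers(chinese_model, trained, chinese_dataloader)

# chinese_model is now the semantic representation model for the second language.
```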
  • In the present embodiment, monolingual language materials, i.e., Chinese language materials, are used to train the Chinese Transformer model by means of migration from the English Transformer model. Utilizing some bilingual parallel language materials may further reduce the training cost and improve the training effect. In that case, the training process may be performed with the method in the second embodiment.
  • Second Embodiment
  • In the present embodiment, on the basis of the first embodiment, the semantic representation model trained in the first language is further acquired as a second semantic representation model. The first semantic representation model is used as a basis for performing the layer-by-layer migration training process, and the second semantic representation model is configured to align a result of the first language output by the second semantic representation model and a result output by the first semantic representation model in the process of training the semantic representation model of the second language.
  • Here, an additional alignment model is required to assist the migration training process of the first semantic representation model, and is configured to perform the above-mentioned alignment.
  • Taking the training process in stage (a) in Fig. 2 as an example, as shown in Fig. 3, the English training language material in the Chinese-English parallel language materials is input into the pre-trained English Transformer model, and an English result output by the Task Layer is input into the alignment model. Meanwhile, the Chinese training language material corresponding to the English training language material is input into the Chinese Transformer model in the training process corresponding to the stage (a), and a Chinese result output by the Task Layer is also input into the alignment model. The alignment model processes the output result of the English Transformer model with the Attention mechanism using the output result of the Chinese Transformer model being trained, and then maps an Attention processing result to obtain the prediction result of the mask in the Chinese training language material. Similarly, the training target is that the prediction result of the mask conforms to the expected character in the training language material. The Loss is constructed using the prediction result of the alignment model, the parameters of the Chinese Transformer model (i.e., the model parameters of the trained layers) being trained are optimized using the values of the Loss, and meanwhile, model parameters of the alignment model are optimized.
  • In the Attention processing process of the alignment model, it is assumed that the i-th character output by the Chinese Transformer model is represented as x_i, and the j-th character output by the English Transformer model is represented as y_j. The dot product of x_i and y_j is taken as A_ij, and the y_j are weighted with A_ij:
    x_i' = \sum_{j=1}^{n} A_{ij} y_j
    wherein n is the total number of characters output by the English Transformer model.
  • Then, the vector formed by the x_i' obtained after the Attention processing is mapped by the fully-connected layer with Softmax, so as to predict the mask value in the Chinese training language material.
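  • A minimal sketch of this alignment step, reusing the model class from the earlier sketch: the Chinese outputs x_i attend over the English outputs y_j through the dot products A_ij, and the weighted sums x_i' are mapped onto the Chinese vocabulary. Feeding per-character representations (rather than the Task Layer outputs themselves) and the exact projection size are simplifying assumptions.

```python
import torch
import torch.nn as nn

class AlignmentModel(nn.Module):
    """Implements x_i' = sum_j A_ij * y_j with A_ij = dot(x_i, y_j), followed
    by a fully-connected mapping onto the second-language vocabulary; Softmax
    is applied when predicting (or implicitly inside the cross-entropy loss)."""

    def __init__(self, d_model, zh_vocab_size):
        super().__init__()
        self.proj = nn.Linear(d_model, zh_vocab_size)

    def forward(self, x_zh, y_en):
        # x_zh: (batch, len_zh, d_model) from the Chinese model being trained
        # y_en: (batch, len_en, d_model) from the frozen English model
        A = torch.matmul(x_zh, y_en.transpose(1, 2))   # A_ij = dot(x_i, y_j)
        x_prime = torch.matmul(A, y_en)                # x_i' = sum_j A_ij * y_j
        return self.proj(x_prime)                      # scores over the Chinese vocabulary
```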
  • Similar to the training process of other stages, the output result of the English Transformer model is also aligned, and details are not repeated.
  • For example, it is assumed that there exist such a set of parallel language materials:
    • English: I ate an apple.
    • Chinese: the corresponding masked sentence, glossed as "I [mask] an apple".
  • The desired character of the mask is the character glossed as "ate".
  • As shown in Fig. 4, the Chinese language material and a position identifier of each character (in the drawing, the position identifier of the first character, glossed as "I", is "0", the position identifier of [mask] is "1", and so on) are input into the Chinese Transformer model in the training process. The parallel English language material and a position identifier of each character (in the drawing, the position identifier of "I" is "0", the position identifier of "ate" is "1", and so on) are input into the trained English Transformer model. Each English character output by the English Transformer model and each Chinese character output by the Chinese Transformer model are output to the alignment model; after performing Attention on the output result of the English Transformer model using the output result of the Chinese Transformer model, the alignment model performs a Softmax mapping operation on the Attention result to obtain each predicted Chinese character. The Loss is determined using the characters obtained in the Chinese prediction and the expected characters of the Chinese language material, and the model parameters of the trained layers in the Chinese Transformer model trained layer by layer and the model parameters of the alignment model are then updated.
  • In addition, in the above-mentioned second embodiment, if bilingual parallel language materials are adopted, the adopted training data may not be masked. For example, it is assumed that there exists such a set of parallel language materials:
    • English: I ate an apple.
    • Chinese: the corresponding unmasked Chinese sentence ("I ate an apple").
  • The Attention processing process by the alignment model is the same as the process described in the second embodiment, and after Softmax, each character in the Chinese training language material is also predicted. The Loss is determined using the characters obtained in Chinese prediction and the expected characters of the Chinese language material, and the model parameters of the trained layers in the Chinese Transformer model trained layer by layer and the model parameters of the alignment model are then updated.
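  • For this unmasked variant, the only change relative to the previous sketch is that the labels cover every position of the Chinese sentence instead of only the masked ones; zh_input_ids, en_input_ids and optimizer are assumed to be prepared as before.

```python
# Unmasked parallel pair: every Chinese character is a prediction target,
# so the labels are simply the Chinese input ids themselves (no -100 entries).
zh_labels = zh_input_ids.clone()
loss = bilingual_step(chinese_model, english_model, alignment_model,
                      zh_input_ids, zh_labels, en_input_ids, optimizer)
```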
  • In the way of the second embodiment, the bilingual parallel language materials and the language materials of the high-resource language are fully utilized, which further reduces the training cost and improves the training effect of the semantic representation model of the low-resource language.
  • The method according to the present application is described above in detail, and an apparatus according to the present application will be described below in detail in conjunction with an embodiment.
  • Third Embodiment
  • Fig. 5 is a structural diagram of an apparatus for training a semantic representation model according to a third embodiment of the present application, and as shown in Fig. 5, the apparatus includes a first acquiring unit 01 and a training unit 02, and may further include a second acquiring unit 03. The main functions of each constitutional unit are as follows.
  • The first acquiring unit 01 is configured to acquire a semantic representation model which has been trained for a first language as a first semantic representation model.
  • The training unit 02 is configured to take a bottom layer and a top layer of the first semantic representation model as trained layers, initialize the trained layers, keep model parameters of other layers unchanged, and train the trained layers using training language materials of a second language until a training ending condition is met; successively bring the untrained layers into the trained layers from bottom to top and, for each of these layers, keep the model parameters of the layers other than the trained layers unchanged and train the trained layers using the training language materials of the second language until the training ending condition is met; and obtain a semantic representation model for the second language after all the layers are trained.
  • The bottom layer is configured as an embedding layer, and the top layer is configured as a fully-connected layer. The semantic representation model may be configured as a CNN, an RNN, a Transformer model, or the like.
  • The training language material of the second language includes a text with a mask in the second language and a character corresponding to the mask.
  • When training each layer of the first semantic representation model, the training unit 02 has a training target that the prediction result of the mask by the top layer accords with the character corresponding to the mask in the training language material.
  • Each training process of the trained layers has the training target that the prediction result of the mask by the top layer meets the expectation. That is, a loss function may be constructed according to the training target, and the model parameters of the trained layers may be optimized using values of the loss function. The training ending condition for each set of trained layers is that the loss gradually converges or the number of iterations reaches a preset threshold.
  • Utilizing some bilingual parallel language materials may further reduce the training cost and improve the training effect. In this case, the second acquiring unit 03 is configured to acquire the semantic representation model trained for the first language as a second semantic representation model.
  • When training the trained layers using the training language material of the second language, the training unit 02 inputs the parallel language material of the first language corresponding to the training language material of the second language into the second semantic representation model; and aligns an output result of the second semantic representation model with an output result of the first semantic representation model.
  • Specifically, the training unit 02 may align the output result of the second semantic representation model with the output result of the first semantic representation model by the following (a sketch of this processing is given after the list below):
    • inputting the output result of the first semantic representation model and the output result of the second semantic representation model into an alignment model; and
    • processing, by the alignment model, the output result of the second semantic representation model with an attention mechanism using the output result of the first semantic representation model, and mapping a processing result of the attention mechanism to obtain a prediction result of the character in the training language material of the second language.
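As an illustrative, non-limiting sketch of the alignment model just described, the following Python/PyTorch code applies a cross-attention mechanism and then maps the attention result to the vocabulary so that the characters of the second-language material can be predicted. One plausible reading, assumed here, is that the output of the first semantic representation model serves as the query and the output of the second semantic representation model serves as the keys and values; the class name AlignmentModel and the layer choices are hypothetical.

```python
# A non-limiting sketch of the alignment model: cross-attention followed by a
# mapping to the vocabulary. It assumes (one plausible reading of the above) that
# the output of the first semantic representation model serves as the query and
# the output of the second semantic representation model serves as keys/values.
# The class name and layer choices are hypothetical.
import torch
import torch.nn as nn


class AlignmentModel(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 8):
        super().__init__()
        self.cross_attention = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.to_vocab = nn.Linear(hidden_size, vocab_size)   # mapping of the attention result

    def forward(self, first_model_out: torch.Tensor,
                second_model_out: torch.Tensor) -> torch.Tensor:
        """first_model_out: outputs for the second-language text (model being retrained).
        second_model_out: outputs for the parallel first-language text (fixed model)."""
        attended, _ = self.cross_attention(query=first_model_out,
                                           key=second_model_out,
                                           value=second_model_out)
        logits = self.to_vocab(attended)   # [batch, seq_len, vocab]
        # Applying Softmax over the vocabulary dimension yields the per-character
        # prediction; for training, the raw logits would typically feed a cross-entropy loss.
        return logits
```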
  • If the training language material of the second language in the parallel language materials includes a text with a mask in the second language and a character corresponding to the mask, the training target is that the prediction result of the mask in the training language material of the second language accords with the character corresponding to the mask in the training language material.
  • If the training language material of the second language in the parallel language materials is a text without a mask in the second language, the training target is that the prediction result of each character in the training language material of the second language accords with each character in the training language material.
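As an illustrative, non-limiting sketch of the two training targets just described, the following Python/PyTorch code selects the positions that enter the loss depending on whether the second-language parallel text carries a mask: if it does, only the masked positions are predicted; if it does not, every character is predicted. The function name parallel_target_loss and the tensor shapes are assumptions for the sketch.

```python
# A non-limiting sketch of the two training targets described above: with a masked
# parallel text, only the masked positions enter the loss; with an unmasked parallel
# text, every character is predicted. Names and shapes are assumptions for the sketch.
from typing import Optional

import torch
import torch.nn.functional as F


def parallel_target_loss(logits: torch.Tensor, expected_ids: torch.Tensor,
                         mask_positions: Optional[torch.Tensor]) -> torch.Tensor:
    """logits: [batch, seq_len, vocab]; expected_ids: [batch, seq_len];
    mask_positions: boolean [batch, seq_len], or None for unmasked parallel text."""
    if mask_positions is not None:               # masked text: predict only the masked characters
        logits = logits[mask_positions]          # [num_masked, vocab]
        expected_ids = expected_ids[mask_positions]
        return F.cross_entropy(logits, expected_ids)
    vocab = logits.size(-1)                      # unmasked text: predict every character
    return F.cross_entropy(logits.reshape(-1, vocab), expected_ids.reshape(-1))
```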
  • According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
  • Fig. 6 is a block diagram of an electronic device for a method for training a semantic representation model according to the embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementation of the present application described and/or claimed herein.
  • As shown in Fig. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces configured to connect the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a GUI at an external input/output apparatus, such as a display device coupled to the interface. In other implementations, plural processors and/or plural buses may be used with plural memories, if desired. Also, plural electronic devices may be connected, with each device providing some of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). In Fig. 6, one processor 601 is taken as an example.
  • The memory 602 is configured as the non-transitory computer readable storage medium according to the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method for training a semantic representation model according to the present application. The non-transitory computer readable storage medium according to the present application stores computer instructions for causing a computer to perform the method for training a semantic representation model according to the present application.
  • The memory 602 which is a non-transitory computer readable storage medium may be configured to store non-transitory software programs, non-transitory computer executable programs and modules, such as program instructions/modules corresponding to the method for training a semantic representation model according to the embodiment of the present application. The processor 601 executes various functional applications and data processing of a server, that is, implements the method for training a semantic representation model according to the above-mentioned embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 602.
  • The memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to use of the electronic device, or the like. Furthermore, the memory 602 may include a high-speed random access memory, or a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid state storage devices. In some embodiments, optionally, the memory 602 may include memories remote from the processor 601, and such remote memories may be connected to the electronic device via a network. Examples of such a network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The electronic device may further include an input apparatus 603 and an output apparatus 604. The processor 601, the memory 602, the input apparatus 603 and the output apparatus 604 may be connected by a bus or other means, and Fig. 6 takes the connection by a bus as an example.
  • The input apparatus 603 may receive input numeric or character information and generate key signal input related to user settings and function control of the electronic device; examples of the input apparatus include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or the like. The output apparatus 604 may include a display device, an auxiliary lighting apparatus (for example, an LED), a tactile feedback apparatus (for example, a vibrating motor), or the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and technologies described here may be implemented in digital electronic circuitry, integrated circuitry, application specific integrated circuits (ASIC), computer hardware, firmware, software, and/or combinations thereof. The systems and technologies may be implemented in one or more computer programs which are executable and/or interpretable on a programmable system including at least one programmable processor, and the programmable processor may be special-purpose or general-purpose and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input apparatus, and at least one output apparatus.
  • These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine readable medium" and "computer readable medium" refer to any computer program product, device and/or apparatus (for example, magnetic discs, optical disks, memories, programmable logic devices (PLD)) for providing machine instructions and/or data to a programmable processor, including a machine readable medium which receives machine instructions as a machine readable signal. The term "machine readable signal" refers to any signal for providing machine instructions and/or data to a programmable processor.
  • To provide interaction with a user, the systems and technologies described here may be implemented on a computer having: a display apparatus (for example, a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to a user; and a keyboard and a pointing apparatus (for example, a mouse or a trackball) by which a user may provide input for the computer. Other kinds of apparatuses may also be used to provide interaction with a user; for example, feedback provided for a user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback); and input from a user may be received in any form (including acoustic, voice or tactile input).
  • The systems and technologies described here may be implemented in a computing system (for example, as a data server) which includes a back-end component, or a computing system (for example, an application server) which includes a middleware component, or a computing system (for example, a user computer having a graphical user interface or a web browser through which a user may interact with an implementation of the systems and technologies described here) which includes a front-end component, or a computing system which includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected through any form or medium of digital data communication (for example, a communication network). Examples of the communication network include: a local area network (LAN), a wide area network (WAN) and the Internet.
  • A computer system may include a client and a server. Generally, the client and the server are remote from each other and interact through the communication network. The relationship between the client and the server is generated by virtue of computer programs which run on respective computers and have a client-server relationship to each other.
  • It should be understood that various forms of the flows shown above may be used and reordered, and steps may be added or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solution disclosed in the present application may be achieved.
  • The above-mentioned implementations are not intended to limit the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principle of the present application shall be included within the scope of protection of the present application.

Claims (15)

  1. A method for training a semantic representation model, comprising:
    acquiring (101) a semantic representation model which has been trained for a first language as a first semantic representation model;
    taking (102) a bottom layer and a top layer of the first semantic representation model as trained layers, initializing (102) the trained layers, keeping (102) model parameters of other layers unchanged, and training (102) the trained layers using training language materials of a second language until a training ending condition is met;
    successively bringing (103) the untrained layers into the trained layers from bottom to top, and executing (103) these layers respectively: keeping (103) the model parameters of other layers than the trained layers unchanged, and training (103) the trained layers using the training language materials of the second language until the training ending condition is met respectively; and
    obtaining (104) a semantic representation model for the second language after all the layers are trained.
  2. The method according to claim 1, wherein the semantic representation model comprises a Transformer model.
  3. The method according to claim 1 or 2, wherein the training language material of the second language comprises a text with a mask in the second language and a character corresponding to the mask; and
    the training process of each layer of the first semantic representation model has a training target that the prediction result of the mask by the top layer accords with the character corresponding to the mask in the training language material.
  4. The method according to claim 1 or 2, further comprising:
    acquiring the semantic representation model trained for the first language as a second semantic representation model; and
    when the trained layers are trained using the training language material of the second language, inputting the parallel language material of the first language corresponding to the training language material of the second language into the second semantic representation model; and aligning an output result of the second semantic representation model with an output result of the first semantic representation model.
  5. The method according to claim 4, wherein
    the aligning an output result of the second semantic representation model with an output result of the first semantic representation model comprises:
    inputting the output result of the first semantic representation model and the output result of the second semantic representation model into an alignment model; and
    processing, by the alignment model, the output result of the second semantic representation model with an attention mechanism using the output result of the first semantic representation model, and mapping a processing result of the attention mechanism to obtain a prediction result of the character in the training language material of the second language.
  6. The method according to claim 5, wherein if the training language material of the second language comprises a text with a mask in the second language and a character corresponding to the mask, the training target is that the prediction result of the mask in the training language material of the second language accords with the character corresponding to the mask in the training language material; and
    if the training language material of the second language is a text without a mask in the second language, the training target is that the prediction result of each character in the training language material of the second language accords with each character in the training language material.
  7. An apparatus for training a semantic representation model, comprising:
    a first acquiring unit (01) configured to acquire a semantic representation model which has been trained for a first language as a first semantic representation model; and
    a training unit (02) configured to take a bottom layer and a top layer of the first semantic representation model as trained layers, initialize the trained layers, keep model parameters of other layers unchanged, and train the trained layers using training language materials of a second language until a training ending condition is met; successively bring the untrained layers into the trained layers from bottom to top, and execute these layers respectively: keep the model parameters of other layers than the trained layers unchanged, and train the trained layers using the training language materials of the second language until the training ending condition is met respectively; and obtain a semantic representation model for the second language after all the layers are trained.
  8. The apparatus according to claim 7, wherein the semantic representation model comprises a Transformer model.
  9. The apparatus according to claim 7 or 8, wherein the training language material of the second language comprises a text with a mask in the second language and a character corresponding to the mask; and
    when training each layer of the first semantic representation model, the training unit (02) has a training target that the prediction result of the mask by the top layer accords with the character corresponding to the mask in the training language material.
  10. The apparatus according to claim 7 or 8, further comprising:
    a second acquiring unit (03) configured to acquire the semantic representation model trained for the first language as a second semantic representation model;
    wherein the training unit (02) is further configured to, when the trained layers are trained using the training language material of the second language, input the parallel language material of the first language corresponding to the training language material of the second language into the second semantic representation model; and align an output result of the second semantic representation model with an output result of the first semantic representation model.
  11. The apparatus according to claim 10, wherein the training unit (02) aligns the output result of the second semantic representation model with the output result of the first semantic representation model specifically by:
    inputting the output result of the first semantic representation model and the output result of the second semantic representation model into an alignment model; and
    processing, by the alignment model, the output result of the second semantic representation model with an attention mechanism using the output result of the first semantic representation model, and mapping a processing result of the attention mechanism to obtain a prediction result of the character in the training language material of the second language.
  12. The apparatus according to claim 10, wherein if the training language material of the second language comprises a text with a mask in the second language and a character corresponding to the mask, the training target is that the prediction result of the mask in the training language material of the second language accords with the character corresponding to the mask in the training language material; and
    if the training language material of the second language is a text without a mask in the second language, the training target is that the prediction result of each character in the training language material of the second language accords with each character in the training language material.
  13. An electronic device, comprising:
    at least one processor;
    a memory connected with the at least one processor communicatively;
    wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
  14. A non-transitory computer readable storage medium with computer instructions stored thereon, wherein the computer instructions are used for causing a computer to perform the method according to any one of claims 1 to 6.
  15. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to perform the method according to any one of claims 1 to 6.
EP21163589.1A 2020-07-06 2021-03-19 Method and apparatus for training semantic representation model, device and computer storage medium Ceased EP3937060A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010638228.9A CN111539227B (en) 2020-07-06 2020-07-06 Method, apparatus, device and computer storage medium for training semantic representation model

Publications (1)

Publication Number Publication Date
EP3937060A1 true EP3937060A1 (en) 2022-01-12

Family

ID=71968594

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21163589.1A Ceased EP3937060A1 (en) 2020-07-06 2021-03-19 Method and apparatus for training semantic representation model, device and computer storage medium

Country Status (5)

Country Link
US (1) US11914964B2 (en)
EP (1) EP3937060A1 (en)
JP (1) JP7267342B2 (en)
KR (1) KR102567635B1 (en)
CN (1) CN111539227B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475226B2 (en) * 2020-09-21 2022-10-18 International Business Machines Corporation Real-time optimized translation
CN112528669B (en) * 2020-12-01 2023-08-11 北京百度网讯科技有限公司 Training method and device for multilingual model, electronic equipment and readable storage medium
CN113033801A (en) * 2021-03-04 2021-06-25 北京百度网讯科技有限公司 Pre-training method and device of neural network model, electronic equipment and medium
CN112989844A (en) * 2021-03-10 2021-06-18 北京奇艺世纪科技有限公司 Model training and text recognition method, device, equipment and storage medium
CN113011126B (en) * 2021-03-11 2023-06-30 腾讯科技(深圳)有限公司 Text processing method, text processing device, electronic equipment and computer readable storage medium
CN113590865B (en) * 2021-07-09 2022-11-22 北京百度网讯科技有限公司 Training method of image search model and image search method
CN114926460B (en) * 2022-07-19 2022-10-25 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Training method of fundus image classification model, and fundus image classification method and system
CN115982583A (en) * 2022-12-30 2023-04-18 北京百度网讯科技有限公司 Training method, device, equipment and medium for pre-training language model
CN116932728B (en) * 2023-08-30 2024-01-26 苏州浪潮智能科技有限公司 Language interaction method, device, communication equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846126B (en) * 2018-06-29 2021-07-27 北京百度网讯科技有限公司 Generation of associated problem aggregation model, question-answer type aggregation method, device and equipment
CN111160016B (en) * 2019-04-15 2022-05-03 深圳碳云智能数字生命健康管理有限公司 Semantic recognition method and device, computer readable storage medium and computer equipment
US11586930B2 (en) * 2019-04-16 2023-02-21 Microsoft Technology Licensing, Llc Conditional teacher-student learning for model training
US11604965B2 (en) * 2019-05-16 2023-03-14 Salesforce.Com, Inc. Private deep learning
US11620515B2 (en) * 2019-11-07 2023-04-04 Salesforce.Com, Inc. Multi-task knowledge distillation for language model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209817A (en) * 2019-05-31 2019-09-06 安徽省泰岳祥升软件有限公司 Training method and device of text processing model and text processing method
CN110717339A (en) * 2019-12-12 2020-01-21 北京百度网讯科技有限公司 Semantic representation model processing method and device, electronic equipment and storage medium
CN111310474A (en) * 2020-01-20 2020-06-19 桂林电子科技大学 Online course comment sentiment analysis method based on activation-pooling enhanced BERT model
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JACOB DEVLIN ET AL: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 11 October 2018 (2018-10-11), XP080923817 *

Also Published As

Publication number Publication date
CN111539227B (en) 2020-12-18
US11914964B2 (en) 2024-02-27
JP2022014429A (en) 2022-01-19
KR102567635B1 (en) 2023-08-16
JP7267342B2 (en) 2023-05-01
KR20220005384A (en) 2022-01-13
US20220004716A1 (en) 2022-01-06
CN111539227A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
EP3937060A1 (en) Method and apparatus for training semantic representation model, device and computer storage medium
KR102484617B1 (en) Method and apparatus for generating model for representing heterogeneous graph node, electronic device, storage medium and program
EP3866025A1 (en) Natural language and knowledge graph-based method and device for representating learning
EP3933659A1 (en) Method and apparatus for generating relationship of events, electronic device, and storage medium
EP3851977A1 (en) Method, apparatus, electronic device, and storage medium for extracting spo triples
JP2022018095A (en) Multi-modal pre-training model acquisition method, apparatus, electronic device and storage medium
US11995560B2 (en) Method and apparatus for generating vector representation of knowledge graph
EP3916613A1 (en) Method and apparatus for obtaining word vectors based on language model, device and storage medium
JP2021174516A (en) Knowledge graph construction method, device, electronic equipment, storage medium, and computer program
JP7222040B2 (en) Model training, image processing method and device, storage medium, program product
CN111582477B (en) Training method and device for neural network model
JP7044839B2 (en) End-to-end model training methods and equipment
JP7297038B2 (en) Neural network model pre-training method, device, electronic device and medium
CN112528669B (en) Training method and device for multilingual model, electronic equipment and readable storage medium
EP3852013A1 (en) Method, apparatus, and storage medium for predicting punctuation in text
JP2021192289A (en) Method, apparatus, electronic device and medium for adversarial training of machine learning model
US11321370B2 (en) Method for generating question answering robot and computer device
CN111709252A (en) Model improvement method and device based on pre-trained semantic model
CN112529180A (en) Method and apparatus for model distillation
CN111611808A (en) Method and apparatus for generating natural language model
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN111310481B (en) Speech translation method, device, computer equipment and storage medium
CN114490968B (en) Dialogue state tracking method, model training method and device and electronic equipment
CN111859981B (en) Language model acquisition and Chinese semantic understanding method, device and storage medium
US20210390255A1 (en) Text prediction method, device and storage medium

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20210319

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

B565 Issuance of search results under rule 164(2) epc

Effective date: 20211005

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20220704

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20230622