EP3937060A1 - Method and apparatus for training semantic representation model, device and computer storage medium - Google Patents

Method and apparatus for training semantic representation model, device and computer storage medium

Info

Publication number
EP3937060A1
Authority
EP
European Patent Office
Prior art keywords
language
training
semantic representation
layers
representation model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP21163589.1A
Other languages
German (de)
English (en)
Inventor
Shuohuan Wang
Jiaxiang Liu
Xuan OUYANG
Yu Sun
Hua Wu
Haifeng Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-06
Filing date
2021-03-19
Publication date
2022-01-12
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Definitions

  • the present application relates to the technical field of computer applications, and particularly to an artificial intelligence technology.
  • the present application provides a method and apparatus for training a semantic representation model, a device, a computer storage medium and a computer program product, for a language with a small number of language materials.
  • the present application provides a method for training a semantic representation model, including:
  • the present application further provides an apparatus for training a semantic representation model, including:
  • the present application provides an electronic device, including:
  • the present application further provides a non-transitory computer readable storage medium with computer instructions stored thereon, wherein the computer instructions are used for causing a computer to perform the method as mentioned above.
  • the present application further provides a computer program product, comprising instructions which, when the program is executed by a computer, cause the computer to perform the method as mentioned above.
  • the trained semantic representation model for the existing language is fully reused, and its layers are successively migrated and trained to obtain the semantic representation model for another language, which markedly reduces the cost of collecting training samples for a language with few language materials and achieves higher training efficiency.
  • the core idea of the present application is that a sufficiently trained semantic representation model of a first language is used to assist in training a semantic representation model of a second language.
  • the examples in the following embodiments are described with English as the first language and Chinese as the second language, but the present application is not limited thereto and may be applied to any language.
  • a semantic representation model in the present application may be configured as a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), a Transformer model, or the like.
  • the Transformer model is used as the example in the following embodiments; other models follow similar implementation principles.
  • Fig. 1 is a flow chart of a method for training a semantic representation model according to a first embodiment of the present application, and an apparatus for training a semantic representation model serves as a subject for executing this method, and may be configured as an application located in a computer system/server, or as a functional unit, such as a plug-in or Software Development Kit (SDK) located in the application in the computer system/server.
  • the method may include the following steps: 101: Acquiring a semantic representation model which has been trained for a first language as a first semantic representation model.
  • English serves as the first language; since English is internationally common and usually has many language materials, a semantic representation model, such as a Transformer model, may be easily and well trained using English. In this step, a trained English Transformer model is used as the first semantic representation model for a subsequent migration training process to assist in training a Chinese Transformer model.
  • 102: Taking a bottom layer and a top layer of the first semantic representation model as trained layers, initializing the trained layers, keeping model parameters of other layers unchanged, and training the trained layers using training language materials of a second language until a training ending condition is met.
  • the language material is usually a text containing a mask and a character corresponding to the mask.
  • the character corresponding to [mask] is the Chinese character meaning "ate".
  • the characters corresponding to the two [mask] tokens are the Chinese words meaning "way" and "find", respectively.
  • the Transformer model has a function of predicting the character corresponding to the mask in the training language material and making the predicted result meet expectation (the character corresponding to the mask in the training language material) as much as possible.
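As a minimal sketch of the training language material described above (a text containing a mask plus the character expected at the mask), the following builds one masked example. The function name `make_masked_example`, the list-of-tokens representation, and the English stand-in sentence are illustrative assumptions; the source only specifies that the material pairs a masked text with the corresponding character.

```python
MASK = "[mask]"

def make_masked_example(tokens, mask_positions):
    """Return (masked_tokens, labels) for one training language material.

    masked_tokens: the text with the chosen positions replaced by [mask].
    labels: maps each masked position to the character the model should predict.
    """
    positions = set(mask_positions)
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in sorted(positions)}
    return masked, labels

# Hypothetical English stand-in for a second-language sentence.
tokens = ["I", "ate", "an", "apple"]
masked, labels = make_masked_example(tokens, [1])
```

Here `masked` holds the text shown to the model and `labels` holds the expectation that the prediction at the mask is compared against.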
  • the Transformer model has a multilayer structure, as shown in Fig. 2 .
  • a bottom layer is an embedding layer represented by Embedding Layer, and is configured to determine vector representation of each character in the training language material.
  • a top layer is a fully-connected layer usually represented by Task Layer, and is configured to map the vector representation processed by each middle layer of the Transformer model, so as to obtain content prediction of the mask in the training language material.
  • a plurality of layers are contained between the top layer and the bottom layer and usually represented by Transformer Block.
  • Each Transformer Block is used for processing the input vector representation of each character into global vector representation with an Attention mechanism.
  • Each Transformer Block refers to the global vector representation of the previous layer when performing the Attention mechanism.
  • the working mechanism of each Transformer Block is not detailed here. For example, three Transformer Blocks exist in Fig. 2 in the embodiment of the present application.
  • the bottom layer of the Transformer model focuses more on processing literal logic
  • the top layer focuses more on semantic logic
  • the semantic logic of the top layer is more consistent across different languages.
  • each layer is trained successively, the bottom layer and the top layer are trained first, and then, each middle layer is trained in combination with the bottom layer and the top layer.
  • in stage (a), as shown in Fig. 2, the Embedding Layer and the Task Layer in the English Transformer model are taken as the trained layers and initialized; that is, their model parameters are initialized.
  • the parameters of the other layers, i.e., the Transformer Blocks, are kept unchanged; that is, the parameters of each Transformer Block still keep the model parameters obtained in the previous English training process.
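The setup for stage (a) can be sketched as follows, assuming a model is represented as a plain dictionary mapping layer names to flat parameter lists. The names `init_for_second_language`, `embedding`, `task` and `block1` to `block3`, the use of small random values for initialization, and the unchanged parameter shapes are all assumptions made for illustration; the source states only that the bottom and top layers are initialized while the Transformer Blocks keep their English-trained parameters.

```python
import copy
import random

def init_for_second_language(first_model, seed=0):
    """Copy the first-language model, then re-initialize only the
    bottom (embedding) and top (task) layers with small random values."""
    rng = random.Random(seed)
    model = copy.deepcopy(first_model)  # middle blocks keep their trained values
    for name in ("embedding", "task"):
        model[name] = [rng.uniform(-0.1, 0.1) for _ in first_model[name]]
    return model

# Hypothetical toy parameters standing in for a trained English model.
english = {
    "embedding": [0.5, -0.3],
    "block1": [1.0, 2.0],
    "block2": [3.0],
    "block3": [4.0],
    "task": [0.7],
}
chinese = init_for_second_language(english)
```

The deep copy keeps the English model intact, so it can still serve as a reference model later.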
  • Chinese training language materials are input to train the trained Embedding Layer and Task Layer.
  • Each training process of the trained layer has a training target that the prediction result of the mask by the Task Layer meets expectation. That is, a loss function may be constructed according to the training target, and the model parameters of the trained layer may be optimized using values of the loss function.
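One common way to turn this training target into a loss function is cross-entropy on the masked position. The sketch below assumes the Task Layer outputs a probability for each candidate character; the choice of cross-entropy, the function name `masked_cross_entropy` and the example probabilities are assumptions, since the source does not name a specific loss.

```python
import math

def masked_cross_entropy(predicted_probs, expected_char, eps=1e-12):
    """Negative log-probability assigned to the expected character.

    predicted_probs: dict mapping candidate characters to probabilities
    for one masked position. eps guards against log(0).
    """
    p = predicted_probs.get(expected_char, 0.0)
    return -math.log(max(p, eps))

# A confident, correct prediction gives a small loss ...
good = masked_cross_entropy({"ate": 0.9, "way": 0.1}, "ate")
# ... and a prediction that favors the wrong character gives a larger one.
bad = masked_cross_entropy({"ate": 0.1, "way": 0.9}, "ate")
```

The loss shrinks toward zero as the predicted probability of the expected character approaches one, which matches the stated goal that the prediction meets the expectation.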
  • Each trained layer has a training ending condition that Loss converges gradually or an iteration number reaches a preset threshold.
  • in stage (a), while the Embedding Layer and the Task Layer are trained using the Chinese training language materials, the Loss drives each iteration, and the parameters of the Embedding Layer and the Task Layer are optimized gradually until the Loss converges or the iteration number reaches the preset threshold.
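The training ending condition (the Loss converges or the iteration number reaches a preset threshold) can be sketched as a small driver loop. The convergence test used here, a change in loss smaller than `tol` between consecutive iterations, is an assumption, as are the names `train_until_done` and `toy_step` and the toy one-parameter objective; the source does not define how convergence is measured.

```python
def train_until_done(step, max_iters=1000, tol=1e-4):
    """Call step() (one optimization step returning the loss) until the loss
    stops changing by more than tol or max_iters steps have been run."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    prev_loss = None
    for iteration in range(1, max_iters + 1):
        loss = step()
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            return iteration, loss, "converged"
        prev_loss = loss
    return max_iters, prev_loss, "max_iters"

# Toy stand-in for one training step: gradient descent on (w - 3)^2.
state = {"w": 0.0}

def toy_step(lr=0.1, target=3.0):
    loss = (state["w"] - target) ** 2
    state["w"] -= lr * 2.0 * (state["w"] - target)  # gradient of the squared error
    return loss

iters, final_loss, reason = train_until_done(toy_step)
```

The same driver could be reused for every stage, since each stage shares the same ending condition and differs only in which layers the step function updates.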
  • Transformer Block1 is first brought into the trained layers from bottom to top, and at this point, the trained layers include the Embedding Layer, the Task Layer and the Transformer Block1.
  • the current parameters of the Embedding Layer and the Task Layer are parameters after the training process in 102, and the parameters of the Transformer Block1 are parameters of Transformer Block1 in the English Transformer model.
  • the Embedding Layer, the Task Layer and the Transformer Block1 are trained with parameters of Transformer Block2 and Transformer Block3 kept unchanged.
  • the Transformer Block2 is brought into the trained layers from bottom to top, and at this point, the trained layers include the Embedding Layer, the Task Layer, the Transformer Block1 and the Transformer Block2.
  • the Embedding Layer, the Task Layer, the Transformer Block1 and the Transformer Block2 are trained with parameters of the Transformer Block3 kept unchanged.
  • the Loss drives each iteration, and the parameters of the Embedding Layer, the Task Layer, the Transformer Block1 and the Transformer Block2 are optimized gradually until the Loss converges or the iteration number reaches a preset threshold.
  • the Transformer Block3 is brought into the trained layers from bottom to top, and at this point, the trained layers include the Embedding Layer, the Task Layer, the Transformer Block1, the Transformer Block2 and the Transformer Block3.
  • the Loss drives each iteration, and the parameters of the Embedding Layer, the Task Layer, the Transformer Block1, the Transformer Block2 and the Transformer Block3 are optimized gradually until the Loss converges or the iteration number reaches a preset threshold.
  • each English middle layer Transformer Block is actually used to perform warm start to train each Chinese Transformer Block.
  • the middle layers may be trained two by two from bottom to top, or more layers may be trained successively.
  • the Chinese Transformer model is obtained, such that a gradual migration training process is performed from the trained English Transformer model to obtain the Chinese Transformer model.
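The order in which layers join the trained layers can be written down as a schedule. The sketch below is illustrative only: the function name `unfreeze_schedule`, the layer names and the `blocks_per_stage` parameter are assumptions. It lists, for each stage, the layers being trained; all other layers keep their current parameters.

```python
def unfreeze_schedule(num_blocks, blocks_per_stage=1):
    """List the trained layers for each stage, bottom to top.

    Stage 0 trains only the embedding (bottom) and task (top) layers; each
    later stage adds the next blocks_per_stage middle blocks from the bottom.
    """
    if num_blocks < 0 or blocks_per_stage < 1:
        raise ValueError("num_blocks must be >= 0 and blocks_per_stage >= 1")
    stages = [["embedding", "task"]]
    unfrozen = []
    for start in range(1, num_blocks + 1, blocks_per_stage):
        end = min(start + blocks_per_stage - 1, num_blocks)
        unfrozen.extend(f"block{i}" for i in range(start, end + 1))
        stages.append(["embedding", *unfrozen, "task"])
    return stages

# Three Transformer Blocks, one block added per stage, as in Fig. 2.
one_by_one = unfreeze_schedule(3)
# The "two by two" variant mentioned above.
two_by_two = unfreeze_schedule(3, blocks_per_stage=2)
```

With three blocks and one block per stage this yields four stages, ending with every layer being trained; the two-by-two variant reaches the same final stage in fewer steps.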
  • in the above embodiment, only a single language material, i.e., the Chinese language material, is used to train the Chinese Transformer model by means of migration from the English Transformer model.
  • Utilization of some bilingual parallel language materials may further reduce a training cost and improve a training effect.
  • the training process may be performed with a method in the second embodiment.
  • the semantic representation model trained in the first language is further acquired as a second semantic representation model.
  • the first semantic representation model is used as a basis for performing the layer-by-layer migration training process
  • the second semantic representation model is configured to align a result of the first language output by the second semantic representation model and a result output by the first semantic representation model in the process of training the semantic representation model of the second language.
  • an additional alignment model is required to assist the migration training process of the first semantic representation model, and configured to perform the above-mentioned alignment.
  • the English training language material in the Chinese-English parallel language materials is input into the pre-trained English Transformer model, and an English result output by the Task Layer is input into the alignment model.
  • the Chinese training language material corresponding to the English training language material is input into the Chinese Transformer model in the training process corresponding to the stage (a), and a Chinese result output by the Task Layer is also input into the alignment model.
  • the alignment model processes the output result of the English Transformer model with the Attention mechanism using the output result of the Chinese Transformer model being trained, and then maps an Attention processing result to obtain the prediction result of the mask in the Chinese training language material.
  • the training target is that the prediction result of the mask conforms to the expected character in the training language material.
  • the Loss is constructed using the prediction result of the alignment model, the parameters of the Chinese Transformer model (i.e., the model parameters of the trained layers) being trained are optimized using the values of the Loss, and meanwhile, model parameters of the alignment model are optimized.
  • a fully-connected layer with Softmax then maps the vector formed by each attended representation x_i obtained from the Attention processing, so as to predict the mask value in the Chinese training language material.
  • the expected character of the mask is the Chinese character meaning "ate".
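The alignment step can be sketched as attention followed by a Softmax projection. In this sketch the outputs of the Chinese model act as queries over the outputs of the English model. Dot-product scoring, the function names `attend` and `project_to_vocab`, and the toy two-dimensional vectors and two-character vocabulary are assumptions; the source states only that Attention is applied and the result is mapped with Softmax.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(zh_outputs, en_outputs):
    """For each Chinese output vector, return the attention-weighted
    average of the English output vectors."""
    dim = len(en_outputs[0])
    attended = []
    for query in zh_outputs:
        weights = softmax([dot(query, key) for key in en_outputs])
        attended.append(
            [sum(w * vec[d] for w, vec in zip(weights, en_outputs)) for d in range(dim)]
        )
    return attended

def project_to_vocab(vector, vocab_weights):
    """Fully-connected layer plus Softmax: one probability per character."""
    return softmax([dot(vector, row) for row in vocab_weights])

# Hypothetical toy values: two positions, two-dimensional outputs.
zh_outputs = [[10.0, 0.0], [0.0, 10.0]]
en_outputs = [[1.0, 0.0], [0.0, 1.0]]
vocab_weights = [[5.0, 0.0], [0.0, 5.0]]  # one row per candidate character

attended = attend(zh_outputs, en_outputs)
probs = project_to_vocab(attended[0], vocab_weights)
```

In training, `probs` at the masked position would be compared with the expected character to form the Loss that updates both the trained layers and the alignment model.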
  • the Chinese language material and a position identifier of each character are input into the Chinese Transformer model in the training process.
  • the parallel English language material and a position identifier of each character are input into the trained English Transformer model.
  • Each English character output by the English Transformer model and each Chinese character output by the Chinese Transformer model are output to the alignment model, and after performing Attention on the output result of the English Transformer model using the output result of the Chinese Transformer model, the alignment model performs a Softmax mapping operation on a result obtained by the Attention to obtain each Chinese character in Chinese prediction.
  • the Loss is determined using the characters obtained in Chinese prediction and the expected characters of the Chinese language material, and the model parameters of the trained layers in the Chinese Transformer model trained layer by layer and the model parameters of the alignment model are then updated.
  • the Attention processing process by the alignment model is the same as the process described in the second embodiment, and after Softmax, each character in the Chinese training language material is also predicted.
  • the Loss is determined using the characters obtained in Chinese prediction and the expected characters of the Chinese language material, and the model parameters of the trained layers in the Chinese Transformer model trained layer by layer and the model parameters of the alignment model are then updated.
  • the bilingual parallel language materials are utilized fully, and the language material of the high-resource language is utilized fully, which further reduces the training cost, and improves the training effect of the semantic representation model of the low-resource language.
  • Fig. 5 is a structural diagram of an apparatus for training a semantic representation model according to a third embodiment of the present application, and as shown in Fig. 5 , the apparatus includes a first acquiring unit 01 and a training unit 02, and may further include a second acquiring unit 03.
  • the main functions of each constitutional unit are as follows.
  • the first acquiring unit 01 is configured to acquire a semantic representation model which has been trained for a first language as a first semantic representation model.
  • the training unit 02 is configured to: take a bottom layer and a top layer of the first semantic representation model as trained layers, initialize the trained layers, keep model parameters of other layers unchanged, and train the trained layers using training language materials of a second language until a training ending condition is met; successively bring the untrained layers into the trained layers from bottom to top and, each time a layer is brought in, keep the model parameters of the layers other than the trained layers unchanged and train the trained layers using the training language materials of the second language until the training ending condition is met; and obtain a semantic representation model for the second language after all the layers are trained.
  • the bottom layer is configured as an embedding layer
  • the top layer is configured as a fully-connected layer.
  • the semantic representation model may be configured as a CNN, an RNN, a Transformer model, or the like.
  • the training language material of the second language includes a text with a mask in the second language and a character corresponding to the mask.
  • when training each layer of the first semantic representation model, the training unit 02 has a training target that the prediction result of the mask by the top layer accords with the character corresponding to the mask in the training language material.
  • Each training process of the trained layer has a training target that the prediction result of the mask by the top layer meets expectation. That is, a loss function may be constructed according to the training target, and the model parameters of the trained layer may be optimized using values of the loss function.
  • Each trained layer has a training ending condition that Loss converges gradually or an iteration number reaches a preset threshold.
  • Utilization of some bilingual parallel language materials may further reduce a training cost and improve a training effect.
  • the second acquiring unit 03 is configured to acquire the semantic representation model trained for the first language as a second semantic representation model.
  • when training the trained layers using the training language material of the second language, the training unit 02 inputs the parallel language material of the first language corresponding to the training language material of the second language into the second semantic representation model, and aligns an output result of the second semantic representation model with an output result of the first semantic representation model.
  • the training unit 02 may align the output result of the second semantic representation model with the output result of the first semantic representation model specifically by:
  • the training target is that the prediction result of the mask in the training language material of the second language accords with the character corresponding to the mask in the training language material.
  • the training target is that the prediction result of each character in the training language material of the second language accords with each character in the training language material.
  • an electronic device and a readable storage medium.
  • Fig. 6 is a block diagram of an electronic device for a method for training a semantic representation model according to the embodiment of the present application.
  • the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other appropriate computers.
  • the electronic device may also represent various forms of mobile apparatuses, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing apparatuses.
  • the components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementation of the present application described and/or claimed herein.
  • the electronic device includes one or more processors 601, a memory 602, and interfaces configured to connect the components, including high-speed interfaces and low-speed interfaces.
  • the components are interconnected using different buses and may be mounted at a common motherboard or in other manners as desired.
  • the processor may process instructions for execution within the electronic device, including instructions stored in or at the memory to display graphical information for a GUI at an external input/output apparatus, such as a display device coupled to the interface.
  • plural processors and/or plural buses may be used with plural memories, if desired.
  • plural electronic devices may be connected, with each device providing some of necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system).
  • one processor 601 is taken as an example.
  • the memory 602 is configured as the non-transitory computer readable storage medium according to the present application.
  • the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method for training a semantic representation model according to the present application.
  • the non-transitory computer readable storage medium according to the present application stores computer instructions for causing a computer to perform the method for training a semantic representation model according to the present application.
  • the memory 602 which is a non-transitory computer readable storage medium may be configured to store non-transitory software programs, non-transitory computer executable programs and modules, such as program instructions/modules corresponding to the method for training a semantic representation model according to the embodiment of the present application.
  • the processor 601 executes various functional applications and data processing of a server, that is, implements the method for training a semantic representation model according to the above-mentioned embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 602.
  • the memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to use of the electronic device, or the like. Furthermore, the memory 602 may include a high-speed random access memory, or a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid state storage devices. In some embodiments, optionally, the memory 602 may include memories remote from the processor 601, and such remote memories may be connected to the electronic device via a network. Examples of such a network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device may further include an input apparatus 603 and an output apparatus 604.
  • the processor 601, the memory 602, the input apparatus 603 and the output apparatus 604 may be connected by a bus or other means, and Fig. 6 takes the connection by a bus as an example.
  • the input apparatus 603 may receive input numeric or character information and generate key signal input related to user settings and function control of the electronic device, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or the like.
  • the output apparatus 604 may include a display device, an auxiliary lighting apparatus (for example, an LED) and a tactile feedback apparatus (for example, a vibrating motor), or the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and technologies described here may be implemented in digital electronic circuitry, integrated circuitry, application specific integrated circuits (ASIC), computer hardware, firmware, software, and/or combinations thereof.
  • the systems and technologies may be implemented in one or more computer programs which are executable and/or interpretable on a programmable system including at least one programmable processor; the programmable processor may be special-purpose or general-purpose, and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input apparatus, and at least one output apparatus.
  • a computer having: a display apparatus (for example, a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to a user; and a keyboard and a pointing apparatus (for example, a mouse or a trackball) by which a user may provide input for the computer.
  • Other kinds of apparatuses may also be used to provide interaction with a user; for example, feedback provided for a user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback); and input from a user may be received in any form (including acoustic, voice or tactile input).
  • the systems and technologies described here may be implemented in a computing system (for example, as a data server) which includes a back-end component, or a computing system (for example, an application server) which includes a middleware component, or a computing system (for example, a user computer having a graphical user interface or a web browser through which a user may interact with an implementation of the systems and technologies described here) which includes a front-end component, or a computing system which includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected through any form or medium of digital data communication (for example, a communication network). Examples of the communication network include: a local area network (LAN), a wide area network (WAN) and the Internet.
  • a computer system may include a client and a server.
  • the client and the server are remote from each other and interact through the communication network.
  • the relationship between the client and the server is generated by virtue of computer programs which run on respective computers and have a client-server relationship to each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
EP21163589.1A 2020-07-06 2021-03-19 Method and apparatus for training semantic representation model, device and computer storage medium Ceased EP3937060A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010638228.9A CN111539227B (zh) 2020-07-06 2020-07-06 Method, apparatus, device and computer storage medium for training semantic representation model

Publications (1)

Publication Number Publication Date
EP3937060A1 (fr) 2022-01-12

Family

ID=71968594

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21163589.1A Ceased EP3937060A1 (fr) Method and apparatus for training semantic representation model, device and computer storage medium

Country Status (5)

Country Link
US (1) US11914964B2 (fr)
EP (1) EP3937060A1 (fr)
JP (1) JP7267342B2 (fr)
KR (1) KR102567635B1 (fr)
CN (1) CN111539227B (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475226B2 (en) * 2020-09-21 2022-10-18 International Business Machines Corporation Real-time optimized translation
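CN112528669B (zh) 2020-12-01 2023-08-11 北京百度网讯科技有限公司 Training method and apparatus for multilingual model, electronic device and readable storage medium
CN113033801A (zh) * 2021-03-04 2021-06-25 北京百度网讯科技有限公司 Pre-training method and apparatus for neural network model, electronic device and medium
CN112989844A (zh) * 2021-03-10 2021-06-18 北京奇艺世纪科技有限公司 Model training and text recognition method, apparatus, device and storage medium
CN113011126B (zh) * 2021-03-11 2023-06-30 腾讯科技(深圳)有限公司 Text processing method and apparatus, electronic device and computer-readable storage medium
CN113590865B (zh) * 2021-07-09 2022-11-22 北京百度网讯科技有限公司 Training method for image search model and image search method
CN114926460B (zh) * 2022-07-19 2022-10-25 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Training method for fundus image classification model, and fundus image classification method and system
CN115982583A (zh) * 2022-12-30 2023-04-18 北京百度网讯科技有限公司 Training method, apparatus, device and medium for pre-trained language model
CN116932728B (zh) * 2023-08-30 2024-01-26 苏州浪潮智能科技有限公司 Language interaction method and apparatus, communication device and storage medium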

Citations (4)

Publication number Priority date Publication date Assignee Title
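CN110209817A (zh) * 2019-05-31 2019-09-06 安徽省泰岳祥升软件有限公司 Training method and apparatus for text processing model, and text processing method
CN110717339A (zh) * 2019-12-12 2020-01-21 北京百度网讯科技有限公司 Processing method and apparatus for semantic representation model, electronic device and storage medium
CN111159416A (zh) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and apparatus, electronic device and storage medium
CN111310474A (zh) * 2020-01-20 2020-06-19 桂林电子科技大学 Sentiment analysis method for online course reviews based on an activation-pooling enhanced BERT model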

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
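CN108846126B (zh) * 2018-06-29 2021-07-27 北京百度网讯科技有限公司 Generation of associated question aggregation model, and question-answer aggregation method, apparatus and device
CN111160016B (zh) * 2019-04-15 2022-05-03 深圳碳云智能数字生命健康管理有限公司 Semantic recognition method and apparatus, computer-readable storage medium and computer device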
US11586930B2 (en) * 2019-04-16 2023-02-21 Microsoft Technology Licensing, Llc Conditional teacher-student learning for model training
US11604965B2 (en) * 2019-05-16 2023-03-14 Salesforce.Com, Inc. Private deep learning
US11620515B2 (en) * 2019-11-07 2023-04-04 Salesforce.Com, Inc. Multi-task knowledge distillation for language model

Non-Patent Citations (1)

Title
JACOB DEVLIN ET AL: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 11 October 2018 (2018-10-11), XP080923817 *

Also Published As

Publication number Publication date
KR20220005384A (ko) 2022-01-13
JP2022014429A (ja) 2022-01-19
CN111539227B (zh) 2020-12-18
CN111539227A (zh) 2020-08-14
US11914964B2 (en) 2024-02-27
KR102567635B1 (ko) 2023-08-16
JP7267342B2 (ja) 2023-05-01
US20220004716A1 (en) 2022-01-06

Similar Documents

Publication Publication Date Title
EP3937060A1 (fr) Method and apparatus for training semantic representation model, device and computer storage medium
KR102484617B1 (ko) Method and apparatus for generating model for representing heterogeneous graph nodes, electronic device, storage medium and program
EP3866025A1 (fr) Natural language and knowledge graph-based method and device for representation learning
EP3916612A1 (fr) Method and apparatus for training language model based on various word vectors, device, medium and computer program product
EP3933659A1 (fr) Method and apparatus for generating event relationship, electronic device and storage medium
JP2022018095A (ja) Multimodal pre-trained model acquisition method, apparatus, electronic device and storage medium
EP3851977A1 (fr) Method, apparatus, electronic device and storage medium for extracting SPO triples
US20210374343A1 (en) Method and apparatus for obtaining word vectors based on language model, device and storage medium
US11995560B2 (en) Method and apparatus for generating vector representation of knowledge graph
CN111582477B (zh) Training method and apparatus for neural network model
JP2021174516A (ja) Knowledge graph construction method, apparatus, electronic device, storage medium and computer program
CN111079945B (zh) Training method and apparatus for end-to-end model
CN112528669B (zh) Training method and apparatus for multilingual model, electronic device and readable storage medium
EP3852013A1 (fr) Method, apparatus and storage medium for predicting punctuation in text
US20210334659A1 (en) Method and apparatus for adversarial training of machine learning model, and medium
JP7297038B2 (ja) Pre-training method and apparatus for neural network model, electronic device and medium
US11321370B2 (en) Method for generating question answering robot and computer device
JP2021192286A (ja) Model training and image processing method and device, storage medium and program product
CN112561056A (zh) Training method and apparatus for neural network model, electronic device and storage medium
CN112529180A (zh) Method and apparatus for model distillation
CN111611808A (zh) Method and apparatus for generating natural language model
CN112270169B (zh) Dialogue role prediction method and apparatus, electronic device and storage medium
CN115688796B (zh) Training method and apparatus for pre-trained model in the field of natural language processing
CN114490968B (zh) Dialogue state tracking method, model training method, apparatus and electronic device
CN111859981B (zh) Language model acquisition and Chinese semantic understanding method, apparatus and storage medium

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20210319

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

B565 Issuance of search results under rule 164(2) epc

Effective date: 20211005

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20220704

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20230622