WO2020132985A1 - Model self-training method and apparatus, computer device, and storage medium - Google Patents

Model self-training method and apparatus, computer device, and storage medium

Info

Publication number
WO2020132985A1
WO2020132985A1 (PCT/CN2018/124032, CN2018124032W)
Authority
WO
WIPO (PCT)
Prior art keywords
training
model
corpus
text
trained
Prior art date
Application number
PCT/CN2018/124032
Other languages
English (en)
French (fr)
Inventor
熊友军
罗沛鹏
廖洪涛
Original Assignee
深圳市优必选科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技有限公司 filed Critical 深圳市优必选科技有限公司
Priority to PCT/CN2018/124032
Publication of WO2020132985A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Definitions

  • the invention relates to the field of computer processing, in particular to a model self-training method, device, computer equipment and storage medium.
  • deep learning technology has a certain entry barrier, as it requires a solid foundation in mathematics and programming.
  • when faced with large-scale demands, even professional deep-learning practitioners find it difficult to keep up.
  • an embodiment of the present invention provides a method for self-training a model.
  • the method includes:
  • Receiving a customized corpus template and entities; generating a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities;
  • Using the training text as the input of the model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, and obtaining a target model when training is complete.
  • an embodiment of the present invention provides a model self-training device.
  • the device includes:
  • a receiving module, configured to receive customized corpus templates and entities;
  • a generating module, configured to generate a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities;
  • a training module, configured to use the training text as the input of the model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, obtaining a target model when training is complete.
  • an embodiment of the present invention provides a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • Receiving a customized corpus template and entities; generating a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities;
  • Using the training text as the input of the model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, and obtaining a target model when training is complete.
  • an embodiment of the present invention provides a computer-readable storage medium that stores a computer program.
  • when the computer program is executed by a processor, the processor is caused to perform the following steps:
  • Receiving a customized corpus template and entities; generating a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities;
  • Using the training text as the input of the model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, and obtaining a target model when training is complete.
  • In the above model self-training method, a customized corpus template and entities are received, and a training corpus is then generated from the corpus template and the entities; the training corpus includes training text and the corresponding text annotations, the training text being generated by combining the corpus template with the entities; model training is then completed automatically from the training samples and the corresponding text annotations to obtain the target model.
  • With the above model self-training method, the user only needs to configure the corpus template and entities; the training corpus is generated automatically and the model is trained on it automatically, which is simple and convenient and enables machine corpora to be recognized easily, conveniently, and efficiently.
  • FIG. 1 is an application environment diagram of a model self-training method in an embodiment
  • FIG. 2 is a flowchart of a model self-training method in an embodiment
  • FIG. 3 is a schematic diagram of an interface of a custom corpus template and an entity in an embodiment
  • FIG. 4 is a schematic diagram of a model of a deep convolutional neural network in an embodiment
  • FIG. 5 is a schematic flowchart of a model self-training method in an embodiment
  • FIG. 6 is a structural block diagram of a model self-training device in an embodiment
  • FIG. 7 is a structural block diagram of a model self-training device in another embodiment
  • FIG. 8 is a structural block diagram of a model self-training device in yet another embodiment
  • FIG. 9 is a structural block diagram of a model self-training device in still another embodiment.
  • FIG. 10 is a structural block diagram of a model self-training device in still another embodiment
  • FIG. 11 is an internal structure diagram of a computer device in an embodiment.
  • FIG. 1 is an application environment diagram of a model self-training method in an embodiment.
  • the self-training of the model is applied to the self-training system of the model.
  • the self-training system of the model includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
  • the server 120 may be implemented by an independent server or a server cluster composed of multiple servers.
  • the terminal 110 is used to receive a customized corpus template and entity, and then upload the customized corpus template and entity to the server 120.
  • after receiving the customized corpus template and entities, the server 120 generates a training corpus based on the corpus template and the entities.
  • the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities; the training text is used as the input of the model to be trained and the corresponding text annotations are used as the expected output of the model to be trained to train it, and the target model is obtained after training is completed.
  • the self-training method of the above model can be directly applied to the terminal 110.
  • the terminal 110 is used to receive a customized corpus template and entities and to generate a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations and the training text is generated by combining the corpus template with the entities.
  • the training text is used as the input of the model to be trained and the corresponding text annotations are used as its expected output to train the model, and the target model is obtained after training is completed.
  • the model self-training method can be applied to a terminal or a server.
  • in this embodiment, application to a server is taken as an example for description.
  • the model self-training method includes the following steps:
  • Step 202 Receive a customized corpus template and entity.
  • the corpus template refers to a template for generating training corpus. Entities are keywords combined with corpus templates.
  • the corpus template refers to the corpus template set by the user independently, and the entity is also set by the user.
  • in a music application scenario, corresponding music templates can be configured, for example {I want to listen to [singer]'s songs}, {I want to listen to [singer]'s [song]}, and {I want to request a [song]}, and some entities can be configured at the same time; for example, the singer entities can be set to Andy Lau, Jacky Cheung, Song Zuying, and so on, and the song entities can be set to Happy Birthday, Merry Christmas, Happy New Year, Teen Years, and so on.
  • as shown in FIG. 3, which is a schematic diagram of an interface for customizing corpus templates and entities in one embodiment, the interface includes custom settings for both the corpus template and the entities.
  • the corpus template carries a corresponding intent annotation, here: Chinese to English.
  • the corpus template also contains the positions where entities are to be filled in.
  • the corresponding training corpus is generated by combining the configured entities and corpus templates.
  • in one embodiment, different types of corpus templates may be provided for different content types.
  • for music, templates such as {I want to listen to [singer]'s songs} can be configured.
  • for movies, templates such as {I want to watch [actor]'s movie} and {I want to watch [movie/TV series]} can be set, and for stories, templates such as {I want to listen to [character]'s story} can be set.
  • Step 204 Generate a training corpus based on the corpus template and the entity.
  • the training corpus includes training text and corresponding text annotations.
  • the training text is generated based on the combination of the corpus template and the entity.
  • the corresponding training corpus is generated through free combination. Assuming there are 100 corpus templates and 100 entities, theoretically 10,000 training texts can be generated. In one embodiment, if the number of generated sentences is too large, each entity may randomly select a part of the corpus template to generate the training corpus. For example, an entity may randomly select 20 from 100 templates for combination. In one embodiment, the templates are divided into different types of templates, for example, music and film. For different types, the templates need to be stored separately in order to determine the corresponding type of the generated training text.
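  • As an illustration of the combination step above, the following is a minimal Python sketch, not taken from the patent: the square-bracket slot syntax and the random selection of at most 20 templates per entity follow the description, while the function and variable names are assumptions, and for brevity only single-slot templates are filled.

```python
import random

def generate_training_texts(templates, entities, max_templates_per_entity=20, seed=0):
    """Combine corpus templates with entity values to produce training texts.

    templates: list of strings with slots, e.g. "I want to listen to [singer]'s songs"
    entities:  dict mapping slot name -> list of values, e.g. {"singer": ["Andy Lau", ...]}
    """
    rng = random.Random(seed)
    corpus = []
    for slot, values in entities.items():
        usable = [t for t in templates if f"[{slot}]" in t]
        for value in values:
            # If too many sentences would be generated, use only a random subset per entity.
            chosen = rng.sample(usable, min(max_templates_per_entity, len(usable)))
            for template in chosen:
                corpus.append(template.replace(f"[{slot}]", value))
    return corpus

templates = ["I want to listen to [singer]'s songs", "I want to request a [song]"]
entities = {"singer": ["Andy Lau", "Jacky Cheung"], "song": ["Happy Birthday", "Merry Christmas"]}
print(generate_training_texts(templates, entities))
```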
  • the training text refers to the samples used to train the model.
  • Text annotation refers to the annotation of training text.
  • the model is a supervised training model, and each training sample corresponds to a corresponding sample annotation.
  • the corresponding text annotation may be determined according to the entities contained in the training text.
  • if the model to be trained is a classification model, positive corpus templates and negative corpus templates can be configured separately, and the corresponding text annotations can simply be divided into 0 and 1, with 0 denoting the positive corpus and 1 the negative corpus. For example, for junk-content classification, the text label corresponding to the dirty corpus can be set to 1 and the label corresponding to the normal corpus to 0.
  • if the model to be trained is an intent recognition model, the corresponding text annotation can be determined according to the type of the corpus template. In one embodiment, the music category is labeled 0, the story category is labeled 1, and the movie category is labeled 2. The corresponding text annotation can then be determined from the type of corpus produced by each type of corpus template. For example, if the generated training text is "I want to listen to Andy Lau's songs", the label is 0; for "I want to listen to Andy Lau's story", the label is 1; and for "I want to watch Andy Lau's movie", the label is 2.
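  • A minimal sketch of this labeling-by-template-type scheme: the 0/1/2 labels come from the example above, and keeping templates separated per type follows the earlier remark about storing them separately; the dictionary layout and names are assumptions.

```python
INTENT_LABELS = {"music": 0, "story": 1, "movie": 2}   # labels from the example above

# Templates are stored separately per type so the type of a generated text is known.
TEMPLATES_BY_TYPE = {
    "music": "I want to listen to [singer]'s songs",
    "story": "I want to listen to [character]'s story",
    "movie": "I want to watch [actor]'s movie",
}
SLOT_FILLERS = {"[singer]": "Andy Lau", "[character]": "Andy Lau", "[actor]": "Andy Lau"}

def label_for(template_type):
    """Intent label of any text generated from a template of this type."""
    return INTENT_LABELS[template_type]

for t_type, template in TEMPLATES_BY_TYPE.items():
    text = template
    for slot, value in SLOT_FILLERS.items():
        text = text.replace(slot, value)
    print(text, "->", label_for(t_type))
```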
  • Step 206 The training text is used as the input of the model to be trained, and the corresponding text annotation is used as the expected output of the model to be trained to train the model and obtain the target model.
  • the model to be trained may be an intent recognition model, an entity recognition model, or a classification model.
  • the intent recognition model is used to identify the intent of the text. For example, whether the intent of the text is to listen to a story or music.
  • the entity recognition model refers to the recognition of entities in the text.
  • the classification model is used to classify the text.
  • the target model refers to the model obtained after the training is completed.
  • the target model is used for subsequent predictions.
  • for example, if the target model is an entity recognition model, its purpose is to predict the entities in the input text.
  • the user is queried for the music he wants to listen to by identifying the entity in the text. For example, enter "I want to listen to Andy Lau's song", and by identifying the entity "Andy Lau", you can find the corresponding song list of Andy Lau in the corresponding music library.
  • the model to be trained is based on a deep learning algorithm.
  • in one embodiment, the model to be trained is built on a convolutional neural network (CNN) and includes an input layer, convolutional layers, pooling layers, and a fully connected layer; there may be multiple convolutional layers and multiple pooling layers.
  • as shown in FIG. 4, which is a schematic diagram of a deep convolutional neural network model in an embodiment, the training text is converted into vector form and fed to the input layer. The text input of the input layer can be expressed as follows: let x_i ∈ R^k be the word vector of the i-th word in a text of length n, with dimension k.
  • let x_{i:j} = x_i ⊕ x_{i+1} ⊕ … ⊕ x_j denote the concatenation of the word vectors of the i-th through j-th words of a text of length n.
  • the convolutional layer performs the convolution operation with a convolution kernel.
  • in one embodiment, h is the number of words in the window covered by the convolution kernel, so the size of the convolution kernel is h×k;
  • let w ∈ R^{h×k} be the weight matrix of the convolution kernel;
  • let c_i = f(w · x_{i:i+h-1} + b) be the output of the convolution kernel at word position i, where b is a bias term and f is the activation function;
  • c = [c_1, c_2, …, c_{n-h+1}] is the resulting convolutional-layer feature.
  • the pooling layer uses max-pooling (maximum pooling), that is, the maximum value of each feature of the convolution layer is output.
  • the final output layer is a fully connected layer using softmax.
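  • The following PyTorch sketch shows a text CNN of the kind described above (input embeddings, a convolution over h-word windows, max-pooling, and a softmax-style fully connected output). It is an illustrative assumption rather than the patent's actual implementation; the vocabulary size, embedding dimension k, window size h, filter count, and class count are placeholder values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, k=128, h=3, num_filters=100, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, k)          # input layer: word -> k-dim vector
        self.conv = nn.Conv1d(k, num_filters, kernel_size=h)  # convolution over h-word windows
        self.fc = nn.Linear(num_filters, num_classes)         # fully connected output layer

    def forward(self, token_ids):                  # token_ids: (batch, n)
        x = self.embedding(token_ids)              # (batch, n, k)
        x = x.transpose(1, 2)                      # (batch, k, n) for Conv1d
        c = F.relu(self.conv(x))                   # (batch, num_filters, n-h+1)
        pooled = c.max(dim=2).values               # max-pooling over each feature map
        return self.fc(pooled)                     # softmax is applied inside the loss

model = TextCNN()
dummy = torch.randint(0, 5000, (4, 20))            # batch of 4 texts, 20 tokens each
loss = nn.CrossEntropyLoss()(model(dummy), torch.tensor([0, 1, 2, 0]))
loss.backward()
```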
  • In the above model self-training method, a customized corpus template and entities are received, and a training corpus is then generated from the corpus template and the entities; the training corpus includes training text and the corresponding text annotations, the training text being generated by combining the corpus template with the entities; model training is then completed automatically from the training samples and the corresponding text annotations to obtain the target model.
  • With the above model self-training method, the user only needs to configure the corpus template and entities; the training corpus is generated automatically and the model is trained on it automatically, which is simple and convenient and enables machine corpora to be recognized easily, conveniently, and efficiently.
  • in one embodiment, when the model to be trained is an intent recognition model, the text annotation is determined according to the type of the corpus template; when the model to be trained is an entity recognition model, the text annotation is determined according to the entities in the training sample.
  • the model to be trained may be an intent recognition model, an entity recognition model, or, of course, a classification model.
  • for different models to be trained, the corresponding text annotations differ.
  • the intent recognition model the purpose is to identify the user's intent, for example, to identify whether the user wants to listen to stories, music, or movies.
  • the corresponding text annotations can be made according to the type of corpus template.
  • the corpus templates are divided into different types according to intent, for example music templates, story templates, and film-and-television templates, so the corresponding intent annotation of the text can be determined from the type of the corpus template.
  • for the entity recognition model, the purpose is to identify the entities in the training sample, so the corresponding text annotation can be determined according to the entities in the training sample.
  • in one embodiment, each word in the training text has a corresponding label. For example, the common BIO format can be used (B marks the beginning of an entity, I marks the remaining characters of an entity, and 0 marks non-entity characters) to label every character in the training text. The label to use is decided by the specific entity type: "B-type" marks the first character of an entity, "I-type" marks the non-initial characters of the entity, and "0" marks all other, non-entity characters.
  • different entity types are set in advance according to the entities to be filled into the corpus template. For example, if the corpus template is {I want to listen to [singer]'s [song]}, a singer-name entity is tagged "SINGER" and a song-name entity is tagged "SONG"; if the generated training text is "我想听刘德华的笨小孩" ("I want to listen to Andy Lau's Stupid Child"), the character-level annotation is "我/0 想/0 听/0 刘/B-SINGER 德/I-SINGER 华/I-SINGER 的/0 笨/B-SONG 小/I-SONG 孩/I-SONG".
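  • A minimal Python sketch of producing such character-level BIO annotations while a template is being filled (a sketch under the assumptions above; the slot-to-tag mapping and the function name are not from the patent):

```python
SLOT_TAGS = {"singer": "SINGER", "song": "SONG"}  # entity type assigned to each template slot

def fill_and_tag(template, slot_values):
    """Fill a template such as "我想听[singer]的[song]" and emit per-character BIO labels."""
    chars, labels = [], []
    i = 0
    while i < len(template):
        if template[i] == "[":                      # start of a slot like [singer]
            end = template.index("]", i)
            slot = template[i + 1:end]
            for pos, ch in enumerate(slot_values[slot]):
                chars.append(ch)
                prefix = "B-" if pos == 0 else "I-"
                labels.append(prefix + SLOT_TAGS[slot])
            i = end + 1
        else:                                       # literal template character -> non-entity
            chars.append(template[i])
            labels.append("0")
            i += 1
    return list(zip(chars, labels))

print(fill_and_tag("我想听[singer]的[song]", {"singer": "刘德华", "song": "笨小孩"}))
```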
  • in one embodiment, before receiving the customized corpus template and entities, the method further includes: receiving a model training request for an application, and assigning a unique model identifier to the application's model according to the training request.
  • the model identifier is used to distinguish the models of different applications;
  • receiving the customized corpus template and entities then includes: receiving the customized corpus template and entities corresponding to the model identifier.
  • each registered user is assigned a unique model identifier for each application, and the data, corpus, parameters, and models of each application are distinguished by the model identifier. Since the platform can provide model training for many applications, it is necessary to assign model identifiers to distinguish models corresponding to different applications.
  • a unique model identifier is assigned to the model of the target application according to the training request.
  • the model identifier is used to distinguish the models of different applications; a unique identifier is assigned to each application, so it can also be called an "app ID".
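  • A tiny Python sketch of assigning such per-application model identifiers; the use of UUIDs and the registry structure are assumptions for illustration.

```python
import uuid

class ModelRegistry:
    """Assigns one unique model identifier ("app ID") per registered user's application."""

    def __init__(self):
        self._app_to_model_id = {}

    def assign(self, user, application):
        key = (user, application)
        if key not in self._app_to_model_id:          # one identifier per application
            self._app_to_model_id[key] = uuid.uuid4().hex
        return self._app_to_model_id[key]

registry = ModelRegistry()
print(registry.assign("alice", "music-bot"))           # new identifier
print(registry.assign("alice", "music-bot"))           # same identifier on repeat requests
print(registry.assign("alice", "story-bot"))           # different application, new identifier
```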
  • in one embodiment, after the training corpus is generated, the method further includes: checking the training text in the training corpus according to preset rules, where the preset rules include at least one of whether duplicate training texts have been configured and whether the length of the training text meets a preset length; and generating a reminder notification when the training text does not meet the preset rules.
  • to prevent non-standard corpora from making training abnormal or inaccurate, once the user has configured the training corpus, corpus check rules automatically inspect it, for example to detect whether duplicate texts have been configured; this matters especially for classification corpora, where different categories may contain many duplicate utterances, so a check is made and the user is reminded to revise them.
  • it is also necessary to detect abnormally configured data, for example whether the length of a corpus entry exceeds the preset length; if it does, the user is reminded to rewrite it in the standard form.
  • when the training text violates any of the configured rules, a corresponding reminder notification is generated to prompt the user to rewrite it according to the specification.
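  • A small Python sketch of such corpus checks (duplicate detection and a length limit); the rule set shown and the 50-character limit are assumptions used for illustration.

```python
from collections import Counter

MAX_LENGTH = 50  # assumed preset length

def check_corpus(training_texts):
    """Return reminder notifications for duplicate or over-long training texts."""
    reminders = []
    counts = Counter(training_texts)
    for text, n in counts.items():
        if n > 1:
            reminders.append(f"duplicate text configured {n} times: {text!r}")
        if len(text) > MAX_LENGTH:
            reminders.append(f"text exceeds preset length {MAX_LENGTH}: {text!r}")
    return reminders

corpus = ["I want to listen to Andy Lau's songs",
          "I want to listen to Andy Lau's songs",
          "I want to request a song " + "la" * 40]
for reminder in check_corpus(corpus):
    print(reminder)
```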
  • in one embodiment, after the training corpus is generated, the method further includes: counting the remaining computing resources of the training server, and when the remaining computing resources are greater than a preset resource threshold, sending the training task of the model to be trained to the training server.
  • in this case, training the model to be trained with the training text as its input and the corresponding text annotations as its expected output to obtain the target model includes: using, by the training server, the training text as the input of the model to be trained and the corresponding text annotations as its expected output to train the model and obtain the target model.
  • because the platform serves multiple users, several models may need to be trained at the same time.
  • after the training corpus is generated, the remaining computing resources of the training server are therefore counted so that an appropriate amount of resources can be allocated for training; if resources are insufficient, the task must queue. Specifically, the remaining computing resources of the training server are counted, it is judged whether they exceed a preset resource threshold, and if so the training task of the model to be trained is sent to the training server so that the training server can train it and obtain the target model.
  • because training a deep-learning-based model requires a large amount of computation, two servers are deployed so that predictions served by the user's existing models are not affected; one of them is a prediction server used for model prediction.
  • the other is a training server, which is used to perform training tasks and train models. In this way, even if the amount of calculation at the time of training is too large and the training server is down, it will not affect the prediction server.
  • the training server can use an offline server to reduce the occupation of the network.
  • in one embodiment, counting the remaining computing resources of the training server includes: acquiring indicators that affect the remaining computing resources of the training server, the indicators including at least one of memory usage, CPU usage, GPU (graphics processing unit) usage, and GPU memory; obtaining the number of models currently being trained and the size of the training corpus; and calculating the remaining computing resources of the training server from the indicators, the number of models being trained, and the size of the training corpus.
  • the remaining computing resources of the training server are calculated by collecting several indicators.
  • the collected indicators include memory usage, CPU usage, GPU usage, and GPU memory; the number of models currently being trained and the size of the training corpus for the model to be trained this time are then obtained.
  • the size of the training corpus refers to the amount of space the training corpus occupies. Specifically, the remaining computing resources can be measured through the average CPU and GPU usage and the memory requirement, where average usage = usage / number of models being trained, and memory requirement = used memory / number of models being trained + size of the training corpus.
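  • A Python sketch of this resource check using the two formulas above; the threshold values, function names, and units are assumptions, and the indicator values are taken as plain inputs rather than read from any particular monitoring API.

```python
def remaining_resources(cpu_usage, gpu_usage, used_memory_gb, num_training, corpus_size_gb):
    """Estimate load on the training server from the collected indicators.

    average usage      = usage / number of models being trained
    memory requirement = used memory / number of models being trained + corpus size
    """
    n = max(num_training, 1)                       # avoid dividing by zero when idle
    avg_cpu = cpu_usage / n
    avg_gpu = gpu_usage / n
    memory_requirement = used_memory_gb / n + corpus_size_gb
    return avg_cpu, avg_gpu, memory_requirement

def can_schedule(cpu_usage, gpu_usage, used_memory_gb, num_training, corpus_size_gb,
                 cpu_limit=80.0, gpu_limit=80.0, memory_limit_gb=32.0):
    """Send the training task only if the estimated load stays under the preset thresholds."""
    avg_cpu, avg_gpu, mem_need = remaining_resources(
        cpu_usage, gpu_usage, used_memory_gb, num_training, corpus_size_gb)
    return avg_cpu < cpu_limit and avg_gpu < gpu_limit and mem_need < memory_limit_gb

print(can_schedule(cpu_usage=60.0, gpu_usage=70.0, used_memory_gb=20.0,
                   num_training=2, corpus_size_gb=1.5))
```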
  • in one embodiment, the above model self-training method further includes: publishing the target model to the prediction server according to the model identifier, and recording the version number of the target model and the corresponding training status in a log, where the prediction server is used to host the trained target model.
  • the target model is synchronized to the prediction server, and the target model and the model identifier are stored in association.
  • the version number of the target model and the corresponding training status are recorded in the log according to the model identification. Since the model corresponding to the same model identifier is constantly being updated, when the target model is obtained from the first training, it is recorded as version 1, if it is obtained from the second training, then the corresponding version number is marked as 2, and so on.
  • the training status refers to whether the model is successfully trained.
  • the training status can be divided into success, failure (server problem), and error report (code problem). Through the recorded logs, you can timely understand the version and status of each model.
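  • A short Python sketch of recording the version number and training status per model identifier; the auto-incremented version and the success/failure/error statuses follow the description, while the log structure and names are assumptions.

```python
TRAINING_STATUSES = {"success", "failure", "error"}   # failure: server problem, error: code problem

class TrainingLog:
    """Records an auto-incremented version number and a training status per model identifier."""

    def __init__(self):
        self._versions = {}      # model_id -> latest version number
        self._entries = []       # (model_id, version, status)

    def record(self, model_id, status):
        if status not in TRAINING_STATUSES:
            raise ValueError(f"unknown training status: {status}")
        version = self._versions.get(model_id, 0) + 1   # each retraining adds 1 to the version
        self._versions[model_id] = version
        self._entries.append((model_id, version, status))
        return version

log = TrainingLog()
print(log.record("app-123", "success"))   # 1
print(log.record("app-123", "success"))   # 2
print(log.record("app-123", "failure"))   # 3
```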
  • the prediction server is used to carry the trained target model, and the corresponding prediction service is provided by publishing the target model to the prediction server.
  • in one embodiment, the above model self-training method further includes: when the target model corresponding to the model identifier comes from the first training, loading the target model into the memory of the prediction server and, when loading is complete, starting the target model to provide a prediction service; when the target model corresponding to the model identifier does not come from the first training, loading the target model into the memory of the prediction server and, when loading is complete, switching prediction calls from the old model corresponding to the model identifier to the target model.
  • specifically, when the target model corresponding to the model identifier comes from the first training, the target model is loaded into the memory of the prediction server and then started, which provides the corresponding prediction service. Because a model is generally updated repeatedly, a prediction model corresponding to the model identifier may already exist. When the target model does not come from the first training, then after the target model is loaded into the memory of the prediction server, prediction calls must be switched quickly from the old model to the new target model, and the old model is then stopped, which ensures that the online prediction model can serve predictions without interruption.
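  • As a minimal sketch of such an uninterrupted switch-over on the prediction server: the sequence of loading the new model, redirecting calls, and then stopping the old model follows the description, while the class and method names and the dummy model are assumptions.

```python
import threading

class DummyModel:
    """Stand-in for a trained target model."""
    def __init__(self, version):
        self.version = version
    def predict(self, text):
        return f"v{self.version} prediction for {text!r}"
    def stop(self):
        print(f"v{self.version} stopped, memory released")

class PredictionServer:
    """Keeps one active model per model identifier and hot-swaps new versions."""
    def __init__(self):
        self._models = {}                        # model_id -> loaded model
        self._lock = threading.Lock()

    def publish(self, model_id, new_model):
        with self._lock:                         # switch prediction calls atomically
            old_model = self._models.get(model_id)
            self._models[model_id] = new_model
        if old_model is None:
            print(f"{model_id}: first version, model started")
        else:
            old_model.stop()                     # stop the old model only after the switch

    def predict(self, model_id, text):
        with self._lock:
            model = self._models[model_id]
        return model.predict(text)

server = PredictionServer()
server.publish("app-123", DummyModel(1))
print(server.predict("app-123", "I want to listen to Andy Lau's songs"))
server.publish("app-123", DummyModel(2))         # old version keeps serving until the swap
print(server.predict("app-123", "I want to listen to Andy Lau's songs"))
```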
  • as shown in FIG. 5, which is a schematic flowchart of the model self-training method in an embodiment:
  • (1) First assign a unique model ID to each application of the user, and different applications correspond to different model IDs.
  • (2) Receive the input corpus template and entity.
  • (3) Generate training corpus based on corpus template and entity.
  • (4) Send the training corpus to the training server and create a new training task.
  • (5) Count the remaining computing resources of the training server so that an appropriate amount of resources can be allocated for training; if resources are insufficient, wait in a queue.
  • (6) Record the version number and training status of the trained model. Each time the model for the same model identifier is retrained, the version number is automatically incremented by 1. The training status is one of success, failure, and error.
  • (7) Synchronize the target model obtained by the training server to the prediction server, placing it at the location corresponding to the model identifier.
  • (8) Judge whether the target model comes from the first training; if so, go to (9); if not, go to (10).
  • (9) Load the target model into the memory of the prediction server and start it to provide the prediction service for callers.
  • (10) Keep the old model serving predictions, load the new model at a new memory address and initialize it there, then switch calls from the old model to the new model so that the new model provides the prediction service.
  • (11) Stop the old model's service and release its memory.
  • as shown in FIG. 6, in one embodiment, a model self-training device is provided, which includes:
  • the receiving module 602 is used to receive custom corpus templates and entities
  • the generating module 604 is configured to generate a training corpus based on the corpus template and the entity, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template and the entity;
  • the training module 606 is configured to use the training text as the input of the model to be trained, mark the corresponding text as the expected output of the model to be trained, and train the model to be trained, and the target model can be obtained after the training is completed.
  • in one embodiment, when the model to be trained is an intent recognition model, the text annotation is determined according to the type of the corpus template; when the model to be trained is an entity recognition model, the text annotation is determined according to the entities in the training sample.
  • in one embodiment, before the customized corpus template and entities are received, the device further includes:
  • the allocation module 601 is configured to receive a model training request for an application, and assign a unique model identifier to the model of the application according to the training request, and the model identifier is used to distinguish models of different applications;
  • the receiving module is also used to receive a custom corpus template and entity corresponding to the model identification.
  • the self-training device of the above model also includes:
  • the checking module 605 is configured to check the training text in the training corpus according to preset rules, where the preset rules include at least one of whether duplicate training texts have been configured and whether the length of the training text meets a preset length, and to generate a reminder notification when the training text does not meet the preset rules.
  • the self-training device of the above model also includes:
  • the statistics module 608 is configured to count the remaining computing resources of the training server, and when the remaining computing resources are greater than a preset resource threshold, send the training task of the model to be trained to the training server;
  • the training module is further configured to train the model to be trained through the training server, using the training text as the input of the model to be trained and the corresponding text annotations as its expected output, to obtain the target model.
  • the statistics module is further configured to acquire indicators that affect the remaining computing resources of the training server, the indicators including at least one of memory usage, CPU usage, GPU usage, and GPU memory; to obtain the number of models being trained and the size of the training corpus; and to calculate the remaining computing resources of the training server from the indicators, the number of models being trained, and the size of the training corpus.
  • the self-training device of the above model further includes:
  • the startup module 610 is configured to obtain the model identifier corresponding to the target model.
  • when the target model corresponding to the model identifier comes from the first training, the target model is loaded into the memory of the prediction server and, when loading is complete, the target model is started to provide an intent prediction service.
  • when the target model corresponding to the model identifier does not come from the first training, the target model is loaded into the memory of the prediction server and, when loading is complete, prediction calls are switched from the old model corresponding to the model identifier to the target model.
  • FIG. 11 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may be a terminal or a server.
  • the computer device includes a processor, a memory, and a network interface connected by a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • when this computer program is executed by the processor, the processor is caused to implement the model self-training method.
  • a computer program can also be stored in the internal memory, and when this computer program is executed by the processor, the processor is caused to execute the model self-training method.
  • the network interface is used to communicate with the outside world.
  • those skilled in the art can understand that the structure shown in FIG. 11 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the self-training method of the model provided in this application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 11.
  • the program modules that make up the model self-training device can be stored in the memory of the computer device, for example the receiving module 602, the generating module 604, and the training module 606.
  • a computer device includes a memory and a processor.
  • the memory stores a computer program.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • Receiving a customized corpus template and entities; generating a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities;
  • Using the training text as the input of the model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, and obtaining a target model when training is complete.
  • a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the following steps:
  • Receiving a customized corpus template and entities; generating a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities;
  • Using the training text as the input of the model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, and obtaining a target model when training is complete.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A model self-training method and apparatus, a computer device, and a storage medium. The method includes: receiving a customized corpus template and entities (202); generating a training corpus based on the corpus template and the entities, where the training corpus includes training text and corresponding text annotations, and the training text is generated by combining the corpus template with the entities (204); and using the training text as the input of a model to be trained and the corresponding text annotations as the expected output of the model to be trained to train the model, obtaining a target model when training is complete (206). With this self-training method, the user only needs to define corpus templates and entities; the training corpus is generated automatically and the model is trained on it automatically to obtain the target model, which is simple and convenient and enables machine corpora to be recognized easily, conveniently, and efficiently.

Description

模型的自训练方法、装置、计算机设备及存储介质 技术领域
本发明涉及计算机处理领域,尤其是涉及一种模型的自训练方法、装置、计算机设备及存储介质。
背景技术
随着计算能力的提升和深度学习算法的广泛应用,越来越多的问题能够被深度学习解决。然而,深度学习技术具有一定的门槛,需要比较扎实的数学、编程基础,当面对庞大的信息需求时,专业的深度学习者也难于支持。
技术问题
尤其,在机器学习中,因传统的深度学习应用的门槛高,大多数人无法通过简易便捷的方式解决机器人语料识别的问题,导致降低机器的分类或实体识别任务的效率。
技术解决方案
基于此,有必要针对上述问题,提供了一种能简易、便捷、高效地识别机器语料的模型的自训练方法、装置、计算机设备及存储介质。
第一方面,本发明实施例提供一种模型的自训练方法,所述方法包括:
接收自定义的语料模板和实体;
根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
第二方面,本发明实施例提供一种模型的自训练装置,所述装置包括:
接收模块,用于接收自定义的语料模板和实体;
生成模块,用于根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
训练模块,用于将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
第三方面,本发明实施例提供一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:
接收自定义的语料模板和实体;
根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
第四方面,本发明实施例提供一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如下步骤:
接收自定义的语料模板和实体;
根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
有益效果
上述模型的自训练方法,通过接收自定义的语料模板和实体,然后根据语料模板和实体生成训练语料,训练语料中包括训练文本和文本对应的标注,训练文本是根据语料模板和实体进行组合生成的,然后根据训练样本和相应的文本标注自动完成模型的训练,得到目标模型。上述模型的自训练方法只需要用户自定义配置语料模板和实体便可自动生成训练语料,并自动根据训练语料进行模型的训练,简单方便,实现了简易、便捷、高效地识别机器语料。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
其中:
图1为一个实施例中模型的自训练方法的应用环境图;
图2为一个实施例中模型的自训练方法的流程图;
图3为一个实施例中自定义语料模板和实体的界面示意图;
图4为一个实施例中深度卷积神经网络的模型示意图;
图5为一个实施例中模型的自训练方法的流程示意图;
图6为一个实施例中模型的自训练装置的结构框图;
图7为另一个实施例中模型的自训练装置的结构框图;
图8为又一个实施例中模型的自训练装置的结构框图;
图9为再一个实施例中模型的自训练装置的结构框图;
图10为还一个实施例中模型的自训练装置的结构框图;
图11为一个实施例中计算机设备的内部结构图。
本发明的实施方式
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。
图1为一个实施例中模型的自训练方法的应用环境图。参照图1,该模型的自训练应用于模型的自训练系统。该模型的自训练系统包括终端110和服务器120。终端110和服务器120通过网络连接,终端110具体可以是台式终端或移动终端,移动终端具体可以是手机、平板电脑、笔记本电脑等中的至少一种。服务器120可以用独立的服务器或者是多个服务器组成的服务器集群来实现。终端110用于接收自定义的语料模板和实体,然后将自定义的语料模板和实体上传到服务器120,服务器120接收到自定义的语料模板和实体后,根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
在另一个实施例中,上述模型的自训练方法可以直接应用于终端110,终端110用于接收自定义的语料模板和实体,根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
如图2所示,提出了一种模型的自训练方法,该模型的自训练方法可以应用于终端,也可以应用于服务器,本实施例中以应用于服务器为例说明,该模型的自训练方法具体包括以下步骤:
步骤202,接收自定义的语料模板和实体。
其中,语料模板是指生成训练语料的模板。实体是指与语料模板进行组合的关键字。语料模板是指用户自主设置的语料模板,实体也是用户自定义设置的。在一个音乐应用的场景中,可以设置相应的音乐模板,比如,设置{我想听[歌手]的歌}、{我要听[歌手]的[歌曲]}、{我想点一首[歌曲]}等模板,然后同时设置一些实体,比如,相应的歌手实体可以设置为:刘德华、张学友、宋祖英等,歌曲设置为:生日快乐、圣诞快乐、流年、青春年华等等。如图3所示,为一个实施例中,自定义语料模板和实体的界面示意图,包括语料模板和实体的自定义设置。其中,语料模板中有相应的意图标注:Chinese to English。语料模板中包含的有代填实体的位置。通过将配置的实体和语料模板结合生成相应的训练语料。
在一个实施例中,可以针对不同的类型,配备不同类型的语料模板。比如,针对音乐类的,可以配备{我想听[歌手]的歌}这类的模板,对于影视类的,可以设置{我想看[演员]的电影}、{我想看[电影/电视剧]},对于故事类的,可以设置为{我想听[人物]的故事}等模板。
步骤204,根据语料模板和实体生成训练语料,训练语料中包括训练文本和对应的文本标注,训练文本是根据语料模板和实体进行组合生成的。
其中,在获取到语料模板和实体后通过自由组合生成相应的训练语料。假设有100个语料模板和100个实体,理论上就可以生成10000个训练文本。在一个实施例中,如果生成的语句数量过大,可以每个实体随机选择一部分语料模板进行训练语料的生成,比如,一个实体可以随机从100个模板中选择20个进行组合。在一个实施例中,模板分为不同类型的模板,比如,分为音乐类和影视类,对于不同的种类,需要将模板进行分开存储,以便确定生成的训练文本对应的类型。
训练文本是指用于训练模型的样本。文本标注是指对训练文本的标注,该模型为有监督的训练模型, 每个训练样本都对应有相应的样本标注。在一个实施例中,如果要训练的模型为实体识别模型,那么可以根据训练文本中包含的实体来确定相应的文本标注。如果要训练的模型为分类模型,那么可以分别设置正语料模板和负语料模板,相应的文本标注可以简单分为0和1,0表示正语料,1表示负语料。举个例子,对于垃圾分类,可以将脏语料对应的文本标注设为1,将正常语料对应的文本标注设为0。
如果要训练的模型是意图识别模型,那么可以根据语料模板的类型来确定相应的文本标注。在一个实施例中,如果是音乐类的标注为0,如果是故事类的标注为1,如果是影视类的标注为2。那么就可以根据不同类型的语料模板生成的语料类型来确定相应的文本标注。比如,如果生成的训练文本为:我想听刘德华的歌,相应的标注为0;我想听刘德华的故事,相应的标注为1;我想看刘德华的电影,相应的标注为2。
步骤206,将训练文本作为待训练模型的输入,将对应的文本标注作为待训练模型的期望输出对模型进行训练,得到目标模型。
其中,在得到训练语料后,即得到训练文本和对应的文本标注后,分别将训练文本作为待训练模型的输入,将相应的文本标注作为模型的期望输出来对模型进行训练,然后得到目标模型。待训练模型可以是意图识别模型,也可以是实体识别模型,还可以是分类模型等。意图识别模型用于对文本的意图进行识别,比如,识别文本的意图是想听故事,还是想听音乐等。实体识别模型是指文本中实体进行识别。分类模型用于对文本进行分类。
目标模型是指训练完成后得到的模型。目标模型用于后续的预测。比如,如果目标模型为目标实体识别模型,那么该目标模型的目的是就是预测出输入文本中的实体。在一个音乐应用场景的实施例中,通过识别文本中的实体来为用户查询想要听的音乐。比如,输入“我想听刘德华的歌”,通过识别实体“刘德华”就可以在相应的音乐曲库中查找到刘德华相应的歌曲列表。
待训练模型是基于深度学习算法建立的,在一个实施例中,待训练模型是基于卷积神经网络模型(CNN)建立的,该模型包括输入层,卷积层、池化层和全连接层,其中,卷积层可以有多个,池化层也可以有多个,如图4所示,为一个实施例中,深度卷积神经网络的模型示意图。具体地,通过将训练文本转换为向量的形式作为输入层的输入。输入层对应的文本输入可以表示如下:令x i∈R k为一个长度为n的文本中第i个字的字向量,维度为k。令x i:j= x i⊕x i+j⊕…⊕x j表示在长度为n的文本中,第i个字到第j个字的字向量拼接。卷积层用于根据卷积核进行卷积运算,在一个实施例中,卷积层中可以令h为卷积核所围窗口中单词的个数,卷积核的尺寸为h*k;令w∈R k为卷积核的权重矩阵;令c i=f(w*x i:i+h-1+b)为卷积核在字i位置上的输出,b∈R k为偏置(bias),f为激活函数;c=[c 1, c 2, …, c n-h+1]为得到的卷积层特征。池化层采用max-pooling(最大池化),即输出卷积层每个特征的最大值。最后一层输出层是采用softmax的全连接层。上述模型的自训练方法只需要用户自定义配置语料模板和实体即可,操作简便,门槛低,且有利于降低开发成本。
上述模型的自训练方法,通过接收自定义的语料模板和实体,然后根据语料模板和实体生成训练语料,训练语料中包括训练文本和文本对应的标注,训练文本是根据语料模板和实体进行组合生成的,然后根据训练样本和相应的文本标注自动完成模型的训练,得到目标模型。上述模型的自训练方法只需要用户自定义配置语料模板和实体便可自动生成训练语料,并自动根据训练语料进行模型的训练,简单方便,实现了简易、便捷、高效地识别机器语料。
在一个实施例中,当待训练模型为意图识别模型时,文本标注是根据语料模板的类型确定的;当待训练模型为实体识别模型时,文本标注是根据训练样本中的实体确定的。
其中,待训练模型可以是意图识别模型,也可以是实体识别模型,当然也可以是分类模型等。针对不同的待训练模型,相应的文本标注不同。对于意图识别模型,其目的是为了识别用户的意图,比如,识别用户是想听故事,还是想听音乐,或者想看电影。那么就可以根据语料模板的类型来进行相应的文本标注,语料模板根据不同的意图分为不同类型,比如,分为音乐类的模板、故事类的模板和影视类的模板,所以根据语料模板的类型就可以确定相应的意图文本标注。
对于实体识别模型,其目的是识别训练样本中的实体,所以对应的文本标注可以根据训练样本中的实体确定。在一个实施例中,针对训练文本中的每个字都对应有相应的标注。比如,采用通用的BIO格式(B表示实体的开始、I表示实体的其他部分、0表示非实体的字)对训练文本中的每个字进行标注。根据实体的具体类型决定需要标注的标签,然后采用“B-类型”标注词条开头的词,采用“I-类型”标注词条的非开头,用“0”表示其他非实体部分。举个例子,“我很生气啊”根据实体“生气”的类型为“ANGRY”,可以对应的标注为“我0很0生B-ANGRY 气I-ANGRY啊0”。根据语料模板中要填充的实体来预先设置不同的实体类型。比如,语料模板为{我要听[歌手]的[歌曲]},如果是歌手的名字实体,则标注为“SINGER”,如果是歌曲的名字的实体,则标注为“SONG”,比如,如果生成的训练文本为“我想听[刘德华]的[笨小孩]”,那么每个字对应的文本标注为“我0想0听0刘B-SINGER德I-SINGER华I-SINGER的0笨B-SONG小I-SONG孩I-SONG”。
在一个实施例中,在接收自定义的语料模板和实体之前,还包括:接收针对应用的模型训练请求,根据训练请求为应用的模型分配唯一的模型标识,模型标识用于区分不同应用的模型;接收自定义的语料模板和实体,包括:接收与模型标识对应的自定义语料模板和实体。
其中,为了很好地区分和管理模型,针对每个注册用户的每个应用分配一个唯一的模型标识,每个应用的数据、语料、参数、模型均靠模型标识来区分。由于该平台可以为很多应用提供模型的训练,所以需要分配模型标识来区分不同应用对应的模型。具体地,在接收到针对目标应用的模型训练请求后,根据该训练请求为目标应用的模型分配唯一的模型标识。模型标识用于区分不同应用的模型,针对每个应用分配唯一的模型,所以也可以称为“app ID”。
在一个实施例中,在根据语料模板和实体生成训练语料之后,还包括:根据预设规则对训练语料中的训练文本进行检查,预设规则包括是否存在重复配置的训练文本、训练文本的长度是否符合预设长度中的至少一种;当训练文本不符合预设规则时,则生成提醒通知。
其中,为了避免语料的不规范导致训练异常或不准确,当用户配置了训练语料后,设置语料检查规则自动对训练语料进行检测,比如,检测是否出现了配置重复的文本,尤其是对分类语料,不同类别中可能出现较多的重复语料,所以要进行检查,提醒用户进行修改。另外,还需要检测是否有配置异常的数据,比如,检测语料的长度是否超过了预设的长度,如果超过了预设长度,则提醒用户改写成规范的。当训练文本不符合设置的任意一个规则时,则生成相应的提醒通知,提醒用户进行相应的规范改写。
在一个实施例中,在根据语料模板和实体生成训练语料之后,还包括:统计训练服务器的剩余计算资源,当剩余计算资源大于预设资源阈值时,则将模型的训练任务发送给训练服务器;将训练文本作为待训练模型的输入,将对应的文本标注作为待训练模型的期望输出对待训练模型进行训练,得到目标模型,包括:通过训练服务器将训练文本作为待训练模型的输入,将对应的文本标注作为待训练模型的期望输出对待训练模型进行训练,得到目标模型。
其中,由于该平台面向的是多个使用者,所以在同一时间可能有多个模型需要训练,在生成训练语料后,要统计训练服务器的计算资源,以便于分配适量的资源进行训练,如果资源不够,则需要排队。具体地,统计训练服务器的剩余计算资源,然后判断剩余计算资源是否大于预设资源阈值,如果是,则将待训练模型的训练任务发送给训练服务器,以使训练服务器对该待训练模型进行训练得到目标模型。在一个实施例中,由于基于深度学习的模型的训练需要消耗大量运算,为了不影响使用者的已有模型的预测,需要部署两台服务器,一台是预测服务器,用于模型的预测。另一台是训练服务器,用于执行训练任务,对模型进行训练。这样,即使在训练时的计算量过大致使训练服务器宕机,也不会影响预测服务器。另外,训练服务器可以采用离线服务器,减少网络的占用。
在一个实施例中,统计训练服务器的剩余计算资源,包括:获取影响训练服务器剩余计算资源的指标,指标包括:内存的占用、CPU使用率、GPU(图形处理器)使用率,GPU的显存中的至少一种;获取正在训练的模型数量和训练语料的大小;根据指标、正在训练的模型数量和训练语料的大小计算得到训练服务器的剩余计算资源。
其中,训练服务器的剩余计算资源的计算是通过统计多个指标来计算得到的。统计的指标包括:内存的占用、CPU的使用率、GPU的使用率、GPU的显存。然后获取正在训练的模型数量和本次待训练模型对应的训练语料的大小,训练语料的大小是指训练语料所占的空间大小。然后计算得到训练服务器的剩余计算资源。具体地,可以通过计算CPU、GPU的平均使用率和内存需求来衡量剩余计算资源。对于CPU、GPU的平均使用率可以采用如下计算方式:平均使用率=使用率/正在训练数量。对于内存、显存的判断为:内存需求=已使用的内存/正在训练的数量+训练语料的大小。
在一个实施例中,上述模型的自训练方法还包括:根据模型标识将目标模型发布到预测服务器,并在日志中记录目标模型的版本号和相应的训练状态,预测服务器用于承载训练好的目标模型。
其中,在训练得到目标模型后,将目标模型同步到预测服务器,将目标模型与模型标识进行关联存储。同时在日志中根据模型标识记录目标模型的版本号和相应的训练状态。由于同一模型标识对应的模型处于不断更新中,当该目标模型是第一次训练得到的,记录为版本1,如果是第二训练得到的,那么相应的版本号记为2,依次类推。训练状态是指模型是否训练成功的状态,训练状态可以分为成功、失败(服务器问题)、报错(代码问题)。通过记录的日志可以及时了解各个模型的版本和状态。预测服务器用于承载训练好的目标模型,通过将目标模型发布到预测服务器来提供相应的预测服务。
在一个实施例中,上述模型的自训练方法还包括:当模型标识对应的目标模型为第一次训练时,则将目标模型加载到预测服务器的内存,当加载完成后,启动目标模型提供预测服务;当模型标识对应的目标模型不是第一次训练时,则将目标模型加载到预测服务器的内存,当加载完成后,将预测调用从模型标识对应的旧的模型切换到目标模型。
其中,当模型标识对应的目标模型为第一次训练时,则将目标模型加载到预测服务器的内存,然后启动该目标模型即可提供相应的预测服务。由于模型一般处于不断地更新换代中,所以在之前可能已经存在了模型标识对应的预测模型。当模型标识对应的目标模型不是第一次训练,将目标模型加载到预测服务器的内存后,需要迅速将预测的调用从老模型切换到该新的目标模型,然后再将老模型停止,这样可以保证线上的预测模型可以不间断地进行预测。
如图5所示,为一个实施例中,模型的自训练方法的流程示意图。(1)首先为使用者的每个应用分配唯一的模型标识,不同应用对应不同的模型标识。(2)接收录入的语料模板和实体。(3)根据语料模板和实体生成训练语料。(4)将训练语料发送到训练服务器,新建一个训练任务。(5)统计训练服务器的剩余计算资源,便于分配适量资源进行训练,如果资源不够,则排队等待。(6)记录训练模型的版本号和训练状态,对于同一模型标识对应的模型每重新训练一次,则版本号自动加1。训练状态分为成功、失败和报错。(7)将训练服务器训练得到的目标模型同步到预测服务器,根据模型标识将训练好的目标模型同步到预测服务器的相应位置。(8)判断目标模型是不是第一次训练,如果是,则进入(9),如果不是,则进入(10)。(9)将目标模型加载到预测服务器的内存,然后启动该模型,提供预测服务,供使用者调用。(10)先保持老模型预测不变,将新模型加载到新的内存地址,并在新内存上初始化模型,同时将老模型的调用改为新模型的调用,提供预测服务。(11)将老模型服务停掉,释放老模型的内存空间。
如图6所示,在一个实施例中,提出了一种模型的自训练装置,该装置包括:
接收模块602,用于接收自定义的语料模板和实体;
生成模块604,用于根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
训练模块606,用于将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
在一个实施例中,当所述待训练模型为意图识别模型时,所述文本标注是根据所述语料模板的类型确定的;当所述待训练模型为实体识别模型时,所述文本标注是根据所述训练样本中的实体确定的。
如图7所示,在一个实施例中,在所述接收自定义的语料模板和实体之前,所述装置还包括:
分配模块601,用于接收针对应用的模型训练请求,根据所述训练请求为所述应用的模型分配唯一的模型标识,所述模型标识用于区分不同应用的模型;
所述接收模块还用于接收与所述模型标识对应的自定义语料模板和实体。
如图8所示,上述模型的自训练装置还包括:
检查模块605,用于根据预设规则对所述训练语料中的训练文本进行检查,所述预设规则包括是否存在重复配置的训练文本、训练文本的长度是否符合预设长度中的至少一种,当所述训练文本不符合所述预设规则时,则生成提醒通知。
如图9所示,上述模型的自训练装置还包括:
统计模块608,用于统计训练服务器的剩余计算资源,当所述剩余计算资源大于预设资源阈值时,则将所述待训练模型的训练任务发送给所述训练服务器;
所述训练模块还用于通过所述训练服务器将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,得到目标模型。
在一个实施例中,所述统计模块还用于获取影响所述训练服务器剩余计算资源的指标,所述指标包括:内存的占用、CPU使用率、GPU使用率,GPU的显存中的至少一种;获取正在训练的模型数量和所述训练语料的大小;根据所述指标、所述正在训练的模型数量和所述训练语料的大小计算得到所述训练服务器的剩余计算资源。
如图10所示,在一个实施例中,上述模型的自训练装置还包括:
启动模块610,用于获取所述目标模型对应的模型标识,当所述模型标识对应的目标模型为第一次训练时,则将所述目标模型加载到预测服务器的内存,当加载完成后,启动所述目标模型提供意图预测服务;当所述模型标识对应的目标模型不是第一次训练时,则将所述目标模型加载到预测服务器的内存,当加载完成后,将预测调用从所述模型标识对应的旧的模型切换到所述目标模型。
图11示出了一个实施例中计算机设备的内部结构图。该计算机设备可以是终端,也可以是服务器。如图11所示,该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现模型的自训练方法。该内存储器中也可储存有计算机程序,该计算机程序被处理器执行时,可使得处理器执行模型的自训练方法。网络接口用于与外界进行通信。本领域技术人员可以理解,图11中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,本申请提供的模型的自训练方法可以实现为一种计算机程序的形式,计算机程序可在如图11所示的计算机设备上运行。计算机设备的存储器中可存储组成该模型的自训练装置的各个程序模板。比如,接收模块602、生成模块604和训练模块606。
一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:
接收自定义的语料模板和实体;
根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如下步骤:
接收自定义的语料模板和实体;
根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink) DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (10)

  1. 一种模型的自训练方法,其特征在于,所述方法包括:
    接收自定义的语料模板和实体;
    根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
    将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,得到目标模型。
  2. 根据权利要求1所述的方法,其特征在于,当所述待训练模型为意图识别模型时,所述文本标注是根据所述语料模板的类型确定的;当所述待训练模型为实体识别模型时,所述文本标注是根据所述训练样本中的实体的类型确定的。
  3. 根据权利要求1所述的方法,其特征在于,在所述接收自定义的语料模板和实体之前,还包括:
    接收针对应用的模型训练请求,根据所述训练请求为所述应用的模型分配唯一的模型标识,所述模型标识用于区分不同应用的模型;
    所述接收自定义的语料模板和实体,包括:
    接收与所述模型标识对应的自定义语料模板和实体。
  4. 根据权利要求1所述的方法,其特征在于,在所述根据所述语料模板和实体生成训练语料之后,还包括:
    根据预设规则对所述训练语料中的训练文本进行检查,所述预设规则包括是否存在重复配置的训练文本、训练文本的长度是否符合预设长度中的至少一种;
    当所述训练文本不符合所述预设规则时,则生成提醒通知。
  5. 根据权利要求1所述的方法,其特征在于,在所述根据所述语料模板和实体生成训练语料之后,还包括:
    统计训练服务器的剩余计算资源,当所述剩余计算资源大于预设资源阈值时,则将所述待训练模型的训练任务发送给所述训练服务器;
    所述将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,得到目标模型,包括:
    通过所述训练服务器将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,得到目标模型。
  6. 根据权利要求5所述的方法,其特征在于,所述统计训练服务器的剩余计算资源,包括:
    获取影响所述训练服务器剩余计算资源的指标,所述指标包括:内存的占用、CPU使用率、GPU使用率,GPU的显存中的至少一种;
    获取正在训练的模型数量和所述训练语料的大小;
    根据所述指标、所述正在训练的模型数量和所述训练语料的大小计算得到所述训练服务器的剩余计算资源。
  7. 根据权利要求1或3所述的方法,其特征在于,所述方法还包括:
    获取所述目标模型对应的模型标识,当所述模型标识对应的目标模型为第一次训练时,则将所述目标模型加载到预测服务器的内存,当加载完成后,启动所述目标模型提供意图预测服务;
    当所述模型标识对应的目标模型不是第一次训练时,则将所述目标模型加载到预测服务器的内存,当加载完成后,将预测调用从所述模型标识对应的旧的模型切换到所述目标模型。
  8. 一种模型的自训练装置,其特征在于,所述装置包括:
    接收模块,用于接收自定义的语料模板和实体;
    生成模块,用于根据所述语料模板和实体生成训练语料,所述训练语料中包括训练文本和对应的文本标注,所述训练文本是根据所述语料模板和实体进行组合生成的;
    训练模块,用于将所述训练文本作为待训练模型的输入,将对应的文本标注作为所述待训练模型的期望输出对所述待训练模型进行训练,训练完成得到目标模型。
  9. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如权利要求1至7中任一项所述方法的步骤。
  10. 一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如权利要求1至7中任一项所述方法的步骤。
PCT/CN2018/124032 2018-12-26 2018-12-26 模型的自训练方法、装置、计算机设备及存储介质 WO2020132985A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124032 WO2020132985A1 (zh) 2018-12-26 2018-12-26 模型的自训练方法、装置、计算机设备及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124032 WO2020132985A1 (zh) 2018-12-26 2018-12-26 模型的自训练方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020132985A1 true WO2020132985A1 (zh) 2020-07-02

Family

ID=71126146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124032 WO2020132985A1 (zh) 2018-12-26 2018-12-26 模型的自训练方法、装置、计算机设备及存储介质

Country Status (1)

Country Link
WO (1) WO2020132985A1 (zh)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140358539A1 (en) * 2013-05-29 2014-12-04 Tencent Technology (Shenzhen) Company Limited Method and apparatus for building a language model
CN104615589A (zh) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 训练命名实体识别模型的方法、命名实体识别方法及装置
CN106815193A (zh) * 2015-11-27 2017-06-09 北京国双科技有限公司 模型训练方法及装置和错别字识别方法及装置
CN108446286A (zh) * 2017-02-16 2018-08-24 阿里巴巴集团控股有限公司 一种自然语言问句答案的生成方法、装置及服务器

Similar Documents

Publication Publication Date Title
US11170055B2 (en) Artificial intelligence augmented document capture and processing systems and methods
CN111753198B (zh) 信息推荐方法和装置、以及电子设备和可读存储介质
US9514417B2 (en) Cloud-based plagiarism detection system performing predicting based on classified feature vectors
US10713306B2 (en) Content pattern based automatic document classification
US20190180098A1 (en) Content based transformation for digital documents
US10402486B2 (en) Document conversion, annotation, and data capturing system
US20220084524A1 (en) Generating summary text compositions
WO2020140639A1 (zh) 基于机器学习的报表生成方法、装置和计算机设备
CN113569115A (zh) 数据分类方法、装置、设备及计算机可读存储介质
US11868714B2 (en) Facilitating generation of fillable document templates
CN111435449B (zh) 模型的自训练方法、装置、计算机设备及存储介质
US20230351121A1 (en) Method and system for generating conversation flows
US20230325601A1 (en) System and method for intelligent generation of privilege logs
CN103403713B (zh) 文件系统中的文件变体
WO2020132985A1 (zh) 模型的自训练方法、装置、计算机设备及存储介质
WO2023179038A1 (zh) 数据标注的方法、ai开发平台、计算设备集群和存储介质
CN112445905A (zh) 一种信息处理方法和装置
CN112579149B (zh) 模型训练程序镜像的生成方法、装置、设备及存储介质
US20220237234A1 (en) Document sampling using prefetching and precomputing
CN114490578A (zh) 数据模型的管理方法、装置及设备
CN115712719A (zh) 数据处理方法、装置、计算机可读存储介质和计算机设备
US20150186794A1 (en) Template regularization for generalization of learning systems
KR102671573B1 (ko) 질의 의도 분류를 통한 딥러닝 모델 리소스 할당 시스템
KR102671574B1 (ko) 사용자 질의 의도에 따른 클라우드 기반 딥러닝 모델 자원 할당 방법
US11783123B1 (en) Generating a dynamic template for transforming source data

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18945092

Country of ref document: EP

Kind code of ref document: A1