CN113204961A - Language model construction method, device, equipment and medium for NLP task - Google Patents


Info

Publication number
CN113204961A
CN113204961A (application CN202110602682.3A)
Authority
CN
China
Prior art keywords
target
word vector
word
model
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110602682.3A
Other languages
Chinese (zh)
Other versions
CN113204961B (en)
Inventor
于凤英
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110602682.3A priority Critical patent/CN113204961B/en
Publication of CN113204961A publication Critical patent/CN113204961A/en
Application granted granted Critical
Publication of CN113204961B publication Critical patent/CN113204961B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and discloses a language model construction method, device, equipment and medium for NLP tasks, wherein the method comprises the following steps: acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained based on Word2vec training; acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model obtained by adopting sample data in an unlimited field for training; performing intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data; performing fitting unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field. After this structural modification, NLP tasks in the target field can be processed, the hardware cost is reduced, and the time spent is reduced.

Description

Language model construction method, device, equipment and medium for NLP task
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing a language model for NLP tasks.
Background
A pre-trained language model usually needs to be applied to another domain to handle NLP (natural language processing) tasks. The conventional method is to perform unsupervised pre-training of the pre-trained language model with text of the target field so that NLP tasks in the target field can be processed. For example, for a text mining task in the biomedical field, BioBERT (a pre-trained language representation model for biomedical text mining) is initialized with the weights of a Bert model (language model) trained on the general field, and the weight-initialized BioBERT is then pre-trained with a biomedical corpus. This training method achieves good results, but it requires enormous hardware cost and a large amount of training time, thereby delaying the development of NLP tasks in emerging fields.
Disclosure of Invention
The language model construction method, device, equipment and medium for the NLP task provided by the present application aim to solve the technical problem that, in the prior art, processing NLP tasks in a target field by performing unsupervised pre-training of a pre-trained language model with target-field text requires enormous hardware cost and a large amount of training time.
In order to achieve the above object, the present application proposes a language model construction method for NLP task, the method comprising:
acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained based on Word2vec training;
acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model obtained by adopting sample data in an unlimited field for training;
performing intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data;
performing fitting unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector;
and constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field.
Further, before the step of obtaining the first dictionary of the target word vector generation model in the target domain, the method further includes:
acquiring a training sample set of the target field;
and training a word vector generation initial model by adopting the training sample set, and taking the trained word vector generation initial model as the target word vector generation model.
Further, the step of performing intersection acquisition according to the first dictionary and the second dictionary to obtain the target dictionary intersection data includes:
performing intersection acquisition according to the first dictionary and the second dictionary to obtain dictionary intersection data to be denoised;
removing noise characters from the dictionary intersection data to be denoised to obtain the target dictionary intersection data, wherein the noise characters comprise: emoticons, punctuation, and null characters.
Further, the simulation matrix vector is expressed as W and is calculated by the following formula:

$$W = \arg\min_{\hat{W}} \sum_{x \in L_{W2v} \cap L_{LM}} \left\| \hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x) \right\|_2^2$$

wherein W is the simulation matrix vector for aligning a first word vector and a second word vector, the first word vector being the word vector output when a target word is input into the target word vector generation model, the second word vector being the word vector output when the target word is input into the initial language model, and the target word being a word in the target dictionary intersection data; $\varepsilon_{w2v}(x)$ is the first word vector output when a word x in the target dictionary intersection data is input into the target word vector generation model; $\varepsilon_{LM}(x)$ is the second word vector output when the word x in the target dictionary intersection data is input into the initial language model; $\arg\min_{\hat{W}}$ denotes taking the $\hat{W}$ that makes the summation reach its minimum value; $L_{W2v} \cap L_{LM}$ is the target dictionary intersection data, $L_{W2v}$ being the first dictionary and $L_{LM}$ being the second dictionary; and $\left\|\cdot\right\|_2$ denotes taking the square of $\hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x)$ and then the root, i.e. the Euclidean norm.
Further, the step of constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field includes:
generating a vector generating unit according to the initial language model and the target word vector generating model to obtain a target word vector generating unit;
generating a word embedding unit according to the target word vector generating unit, the initial language model and the simulation matrix vector to obtain a target word embedding unit;
removing the structures before the encoder from the initial language model to obtain a target word vector processing unit;
and sequentially connecting the target word vector generating unit, the target word embedding unit and the target word vector processing unit to obtain the target language model corresponding to the target field.
Further, the step of generating a vector generation unit according to the initial language model and the target word vector generation model to obtain a target word vector generation unit includes:
taking the target word vector generation model as a first word vector generation subunit;
taking the word segmenter and the word vector generator of the initial language model as a second word vector generating subunit;
and the first word vector generating subunit and the second word vector generating subunit are arranged in parallel to obtain the target word vector generating unit.
Further, the step of generating a word embedding unit according to the target word vector generating unit, the initial language model and the simulation matrix vector to obtain a target word embedding unit includes:
a word vector source Bert judgment subunit is constructed according to the target word vector generation unit, wherein the word vector source Bert judgment subunit is used for judging whether a word vector generated by the second word vector generation subunit exists for each word in the target text data input into the target word vector generation unit, so as to obtain a word vector source Bert judgment result corresponding to each word in the target text data;
constructing a word vector alignment subunit according to the target word vector generation unit, the simulation matrix vector and the word vector source Bert judgment subunit, wherein the word vector alignment subunit is configured to, when the word vector source Bert judgment result indicates that no word vector source Bert exists, use all words corresponding to the word vector source Bert judgment result that no word vector source Bert exists as a word set to be aligned, obtain a word vector output by the first word vector generation subunit according to each word in the word set to be aligned, obtain a word vector set to be aligned, and multiply each word vector in the word vector set to be aligned with the simulation matrix vector to obtain an aligned word vector set;
constructing a word vector combination subunit according to the target word vector generation unit, the word vector source Bert judgment subunit and the word vector alignment subunit, where the word vector combination subunit is configured to, when the word vector source Bert judgment result indicates that a word vector source Bert exists, use all words corresponding to the word vector source Bert judgment result as word sets that do not need to be aligned, obtain word vectors output by the second word vector generation subunit according to each word in the word sets that do not need to be aligned, obtain a word vector set that does not need to be aligned, and splice the aligned word vector set and the word vector set that does not need to be aligned according to the word sequence of the target text data to obtain target word vector data;
taking the word embedding layer of the initial language model as a word embedding subunit;
and performing word embedding unit generation according to the word vector source Bert judgment subunit, the word vector alignment subunit, the word vector combination subunit and the word embedding subunit to obtain the target word embedding unit.
The application also provides a language model building device for NLP task, the device comprises:
the first dictionary determining module is used for obtaining a first dictionary of a target Word vector generating model of the target field, wherein the target Word vector generating model is a model obtained based on Word2vec training;
the second dictionary determining module is used for acquiring a second dictionary of the initial language model, wherein the initial language model is a Bert model obtained by adopting sample data in an unlimited field for training;
the target dictionary intersection data determining module is used for performing intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data;
the simulation matrix vector determining module is used for performing fitting unconstrained linear transformation on the intersection data of the target dictionary by adopting a least square method to obtain a simulation matrix vector;
and the target language model determining module is used for constructing a language model according to the initial language model, the target word vector generating model and the simulation matrix vector to obtain a target language model corresponding to the target field.
The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the language model construction method, device, equipment and medium for the NLP task, a first dictionary of a target Word vector generation model of a target field is acquired, the target Word vector generation model being a model obtained based on Word2vec training; a second dictionary of an initial language model is acquired, the initial language model being a Bert model obtained by adopting sample data in an unlimited field for training; intersection acquisition is performed according to the first dictionary and the second dictionary to obtain target dictionary intersection data; fitting unconstrained linear transformation is performed on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and a language model is constructed according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field. In this way, after the Bert model obtained by training with unlimited-field sample data is structurally modified with the Word2vec-based model trained on the target field, NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost is reduced, the time spent is reduced, and the development of NLP tasks in emerging fields is facilitated.
Drawings
Fig. 1 is a schematic flowchart of a language model construction method for NLP task according to an embodiment of the present application;
FIG. 2 is a block diagram schematically illustrating the structure of a language model building apparatus for NLP task according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In order to solve the technical problem that, in the prior art, processing NLP tasks in a target field by performing unsupervised pre-training of a pre-trained language model with target-field text requires enormous hardware cost and a large amount of training time, the present application provides a language model construction method for NLP tasks, which is applied to the technical field of artificial intelligence and, further, to natural language processing within artificial intelligence. According to the language model construction method for the NLP task, the Word2vec-based model trained on the target field is adopted to structurally modify the Bert model obtained by training with unlimited-field sample data, after which NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost is reduced, the time spent is reduced, and the development of NLP tasks in emerging fields is facilitated.
Referring to fig. 1, in an embodiment of the present application, a method for constructing a language model for an NLP task is provided, where the method includes:
S1: acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained based on Word2vec training;
S2: acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model obtained by adopting sample data in an unlimited field for training;
S3: performing intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data;
S4: performing fitting unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector;
S5: constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field.
In this embodiment, a first dictionary of a target Word vector generation model of a target field is acquired, the target Word vector generation model being a model obtained based on Word2vec training; a second dictionary of an initial language model is acquired, the initial language model being a Bert model obtained by adopting sample data in an unlimited field for training; intersection acquisition is performed according to the first dictionary and the second dictionary to obtain target dictionary intersection data; fitting unconstrained linear transformation is performed on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and a language model is constructed according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field. In this way, after the Bert model obtained by training with unlimited-field sample data is structurally modified with the Word2vec-based model trained on the target field, NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost is reduced, the time spent is reduced, and the development of NLP tasks in emerging fields is facilitated.
For S1, the first dictionary of the target word vector generation model of the target field may be obtained from a database, obtained from a third-party application system, or obtained as input from the user.
Target areas include, but are not limited to: biomedical field, financial field.
The first dictionary is the dictionary of the target word vector generation model.
The target Word vector generation model is obtained by training a Word2vec-based model with training samples of the target field. That is, the target word vector generation model may be used for word vector generation on text data of the target field. Word2vec is a group of related models used to generate word vectors.
For S2, the second dictionary of the initial language model may be obtained from a database, obtained from a third-party application system, or obtained as input from the user.
The second dictionary is the dictionary of the initial language model.
The initial language model is a Bert (Bidirectional Encoder Representations from Transformers) model obtained by training with sample data of an unlimited field; that is, the initial language model is a Bert model trained on the general field.
For S3, the same words are found from the first dictionary and the second dictionary, and all the found same words are used as target dictionary intersection data. That is, the words in the target dictionary intersection data are words that exist in both the first dictionary and the second dictionary.
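As an illustrative sketch (not the mandated implementation), S3 can be realized with a gensim Word2vec model and a HuggingFace Bert tokenizer; the model path and tokenizer name below are assumptions for the example:

```python
from gensim.models import Word2Vec
from transformers import BertTokenizer

# Load the target-field Word2vec model (source of the first dictionary)
# and a general-field Bert tokenizer (source of the second dictionary).
w2v_model = Word2Vec.load("target_domain_word2vec.model")  # illustrative path
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

first_dictionary = set(w2v_model.wv.key_to_index)    # words known to Word2vec
second_dictionary = set(bert_tokenizer.get_vocab())  # wordpieces known to Bert

# Target dictionary intersection data: words present in both dictionaries.
target_intersection = first_dictionary & second_dictionary
```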
For S4, fitting unconstrained linear transformation is performed on the target dictionary intersection data by a least square method, a parameter matrix is obtained by calculation, and the obtained parameter matrix is used as the simulation matrix vector.
It is understood that each element in the simulation matrix vector is a value between 0 and 1, inclusive of 0 and 1.
The simulation matrix vector is used to align the word vector obtained by inputting a word of the target dictionary intersection data into the target word vector generation model with the word vector obtained by inputting it into the initial language model. That is, multiplying the word vector obtained by inputting a word of the target dictionary intersection data into the target word vector generation model by the simulation matrix vector yields a result substantially consistent with the word vector obtained by inputting the word into the initial language model. Therefore, through the simulation matrix vector, the word vectors output by the target word vector generation model can simulate the word vectors output by the initial language model.
For S5, the structure before the encoder of the initial language model is adjusted according to the target word vector generation model and the simulation matrix vector, and the adjusted network structure is used as the target language model corresponding to the target field.
In an embodiment, before the step of obtaining the first dictionary of the target word vector generation model in the target domain, the method further includes:
S11: acquiring a training sample set of the target field;
S12: training a word vector generation initial model by adopting the training sample set, and taking the trained word vector generation initial model as the target word vector generation model.
In this embodiment, the word vector generation initial model is trained with the training sample set of the target field to obtain the target word vector generation model, which provides support for word vector generation in the target field.
For S11, the training sample set of the target field may be obtained from a database, obtained from a third-party application system, or obtained as input from the user.
The training sample set of the target field means that sample data in the training sample set comes from the target field.
For S12, the specific steps of training the word vector generation initial model by using the training sample set are not described herein again.
Word2vec is used as the word vector generation initial model.
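A minimal training sketch for S11 and S12, assuming gensim's Word2Vec implementation; the sample sentences and hyperparameters are purely illustrative:

```python
from gensim.models import Word2Vec

# Training sample set of the target field: an iterable of token lists
# (two toy sentences stand in for a real target-field corpus).
sentences = [["glucose", "metabolism", "pathway"],
             ["insulin", "receptor", "signaling"]]

# vector_size is illustrative; the simulation matrix W derived later maps
# the Word2vec space into the Bert space, so the dimensions may differ.
w2v_model = Word2Vec(sentences, vector_size=768, window=5,
                     min_count=1, sg=1, epochs=10)
w2v_model.save("target_domain_word2vec.model")  # path matches the sketch above
```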
In an embodiment, the step of performing intersection acquisition according to the first dictionary and the second dictionary to obtain the target dictionary intersection data includes:
S31: performing intersection acquisition according to the first dictionary and the second dictionary to obtain dictionary intersection data to be denoised;
S32: removing noise characters from the dictionary intersection data to be denoised to obtain the target dictionary intersection data, wherein the noise characters include: emoticons, punctuation, and null characters.
In this embodiment, the target dictionary intersection data is used only after noise characters are removed from the dictionary intersection data to be denoised, which reduces the influence of noise characters on the accuracy of determining the simulation matrix vector and thereby improves the accuracy of the target language model.
For S31, the same words are found from the first dictionary and the second dictionary, and all the found words are used as the dictionary intersection data to be denoised. It will be appreciated that each noise character is treated as a word in the dictionary intersection data to be denoised.
For S32, a preset noise character library is acquired; each word in the dictionary intersection data to be denoised is searched for in the preset noise character library, all words successfully found in the preset noise character library are deleted from the dictionary intersection data to be denoised, and the dictionary intersection data to be denoised after deletion is used as the target dictionary intersection data.
The noise characters include, but are not limited to: emoticons, punctuation, and null characters.
Because the Bert model already generates accurate word vectors for the noise characters in the preset noise character library, the word vectors generated by the Bert model are adopted directly for these characters and no Word2vec word vectors are needed. Therefore, the noise characters in the preset noise character library are deleted from the dictionary intersection data to be denoised, which avoids their influence on the accuracy of determining the simulation matrix vector.
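A hedged sketch of S31 and S32; the preset noise character library is not specified in this application, so the example approximates it with Unicode category checks (punctuation and symbol codepoints, plus empty strings), reusing the dictionaries from the earlier sketch:

```python
import unicodedata

def is_noise(word: str) -> bool:
    """Heuristic stand-in for the preset noise character library: flags
    null/whitespace-only strings and words made up entirely of punctuation
    (Unicode P*) or symbols (Unicode S*, which covers most emoticons)."""
    if not word or word.isspace():
        return True
    return all(unicodedata.category(ch)[0] in ("P", "S") for ch in word)

# Dictionary intersection data to be denoised, as in the earlier sketch.
intersection_to_denoise = first_dictionary & second_dictionary
# Target dictionary intersection data after noise characters are removed.
target_intersection = {w for w in intersection_to_denoise if not is_noise(w)}
```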
In one embodiment, the above-mentioned simulation matrix vector is represented as W and is calculated by the following formula:

$$W = \arg\min_{\hat{W}} \sum_{x \in L_{W2v} \cap L_{LM}} \left\| \hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x) \right\|_2^2$$

wherein W is the simulation matrix vector for aligning a first word vector and a second word vector, the first word vector being the word vector output when a target word is input into the target word vector generation model, the second word vector being the word vector output when the target word is input into the initial language model, and the target word being a word in the target dictionary intersection data; $\varepsilon_{w2v}(x)$ is the first word vector output when a word x in the target dictionary intersection data is input into the target word vector generation model; $\varepsilon_{LM}(x)$ is the second word vector output when the word x in the target dictionary intersection data is input into the initial language model; $L_{W2v} \cap L_{LM}$ is the target dictionary intersection data, $L_{W2v}$ being the first dictionary and $L_{LM}$ being the second dictionary; and $\left\|\cdot\right\|_2$ denotes taking the square of $\hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x)$ and then the root, i.e. the Euclidean norm.
In this embodiment, a least square method is adopted to perform fitting unconstrained linear transformation on the target dictionary intersection data, determining a simulation matrix vector that aligns the word vector output by inputting a word of the target dictionary intersection data into the target word vector generation model with the word vector output by inputting it into the initial language model. This provides support for subsequently adopting the Word2vec-based model trained on the target field to structurally modify the Bert model obtained by training with unlimited-field sample data, so that NLP tasks in the target field can be processed.
Wherein the $\hat{W}$ at which $\sum_{x \in L_{W2v} \cap L_{LM}} \left\| \hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x) \right\|_2^2$ reaches its minimum value is the simulation matrix vector W.
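As a concrete illustration of the formula, W has a standard ordinary-least-squares solution. The sketch below assumes numpy plus the objects from the earlier sketches; `bert_word_embedding` is a hypothetical helper that reads a word's row from the initial language model's token embedding table:

```python
import numpy as np
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-chinese")  # initial language model

def bert_word_embedding(word: str) -> np.ndarray:
    # Hypothetical helper: epsilon_LM(word), looked up directly in Bert's
    # token embedding table (assumes the word is a single wordpiece, which
    # holds for words in the target dictionary intersection data).
    token_id = bert_tokenizer.convert_tokens_to_ids(word)
    return bert.embeddings.word_embeddings.weight[token_id].detach().cpu().numpy()

words = sorted(target_intersection)
X = np.stack([w2v_model.wv[w] for w in words])         # epsilon_w2v(x), (n, d_w2v)
Y = np.stack([bert_word_embedding(w) for w in words])  # epsilon_LM(x),  (n, d_LM)

# Least squares: solve X @ B ~= Y for B, then W = B.T so that
# W @ epsilon_w2v(x) approximates epsilon_LM(x).
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
W = B.T  # simulation matrix vector, shape (d_LM, d_w2v)
```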
In an embodiment, the step of constructing a language model according to the initial language model, the target word vector generation model, and the simulation matrix vector to obtain a target language model corresponding to the target field includes:
S51: generating a vector generating unit according to the initial language model and the target word vector generating model to obtain a target word vector generating unit;
S52: generating a word embedding unit according to the target word vector generating unit, the initial language model and the simulation matrix vector to obtain a target word embedding unit;
S53: removing the structures before the encoder from the initial language model to obtain a target word vector processing unit;
S54: sequentially connecting the target word vector generating unit, the target word embedding unit and the target word vector processing unit to obtain the target language model corresponding to the target field.
In this embodiment, the initial language model is adjusted according to the target Word vector generation model and the simulation matrix vector to obtain the target language model corresponding to the target field, so that the Bert model obtained by training with unlimited-field sample data is structurally modified with the Word2vec-based model trained on the target field, after which NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost and the time spent are reduced.
For S51, the word segmenter and the word vector generator of the initial language model, together with the target word vector generation model, are used as word vector generation subunits connected in parallel to obtain the target word vector generation unit.
Optionally, a union word segmenter is used as a first word segmenter, the word segmenter of the initial language model is used as a second word segmenter, the word vector generator of the target word vector generation model is used as a first word vector generator, and the word vector generator of the initial language model is used as a second word vector generator; the first word segmenter and the first word vector generator are connected in sequence to obtain a Word2vec word vector generating subunit, the second word segmenter and the second word vector generator are connected in sequence to obtain a Bert word vector generating subunit, and the Word2vec word vector generating subunit and the Bert word vector generating subunit are arranged in parallel to obtain the target word vector generation unit. The union word segmenter is a word segmenter capable of segmenting text data of both the general field and the target field.
For S52, the word embedding layer of the initial language model is redefined according to the target word vector generating unit and the simulation matrix vector to obtain the target word embedding unit.
For S53, the structures before the encoder of the initial language model are removed, and the remaining structures in the initial language model are used as the target word vector processing unit.
For S54, the output end of the target word vector generating unit is connected to the input end of the target word embedding unit, the output end of the target word embedding unit is connected to the input end of the target word vector processing unit, and the target word vector generating unit, the target word embedding unit, and the target word vector processing unit that have completed the connection are used as the target language model corresponding to the target domain. That is, the target language model is a model obtained by adjusting the previous structure of the encoder of the initial language model.
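One possible realization of S53 and S54, sketched under the assumption that the HuggingFace `inputs_embeds` argument stands in for removing the structure before the encoder (Bert's position and segment embeddings are then still applied internally, approximating the word embedding subunit); `build_target_word_vectors` is a hypothetical function implementing the units described above:

```python
import torch

def run_target_language_model(text: str) -> torch.Tensor:
    # Hypothetical: the target word vector generating unit and target word
    # embedding unit described above, returning target word vector data of
    # shape (1, seq_len, hidden_size).
    word_vectors = build_target_word_vectors(text)
    # Passing inputs_embeds bypasses Bert's own token-embedding lookup, so
    # the remaining Bert stack (the target word vector processing unit)
    # consumes the externally constructed word vectors.
    outputs = bert(inputs_embeds=word_vectors)
    return outputs.last_hidden_state
```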
In an embodiment, the step of generating the vector generating unit according to the initial language model and the target word vector generating model to obtain the target word vector generating unit includes:
S511: taking the target word vector generation model as a first word vector generation subunit;
S512: taking the word segmenter and the word vector generator of the initial language model as a second word vector generating subunit;
S513: arranging the first word vector generating subunit and the second word vector generating subunit in parallel to obtain the target word vector generating unit.
In this embodiment, the word segmenter and word vector generator of the initial language model, together with the target word vector generation model, are used as parallel word vector generation subunits, providing support for subsequently adopting the Word2vec-based model trained on the target field to structurally modify the Bert model obtained by training with unlimited-field sample data so that NLP tasks in the target field can be processed.
For S511, the target word vector generation model is directly used as a first word vector generation subunit.
The first word vector generation subunit may perform word segmentation and word vector generation on the input text data.
For S512, the word segmenter of the initial language model and the word vector generator of the initial language model are connected in sequence, and the connected word segmenter and word vector generator serve as the second word vector generation subunit.
The word segmentation device of the initial language model is used for segmenting words of input text data, and the word vector generator of the initial language model is used for generating word vectors of the input text data.
For S513, the first word vector generating subunit and the second word vector generating subunit are arranged in parallel, that is, the target text data input to the target word vector generating unit is input to the first word vector generating subunit and the second word vector generating subunit at the same time, the first word vector generating subunit outputs data to the target word embedding unit, and the second word vector generating subunit outputs data to the target word embedding unit.
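A minimal sketch of the two parallel subunits for S511 through S513, reusing the objects from the earlier sketches (`bert_word_embedding` remains a hypothetical helper); each subunit returns one vector per word, or None where its dictionary lacks the word:

```python
def first_subunit_vectors(words):
    """First word vector generation subunit: Word2vec lookup."""
    return [w2v_model.wv[w] if w in w2v_model.wv else None for w in words]

def second_subunit_vectors(words):
    """Second word vector generation subunit: Bert-side lookup,
    None when the word is not in the initial language model's dictionary."""
    vocab = bert_tokenizer.get_vocab()
    return [bert_word_embedding(w) if w in vocab else None for w in words]
```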
In an embodiment, the step of generating a word embedding unit according to the target word vector generating unit, the initial language model, and the simulation matrix vector to obtain a target word embedding unit includes:
S521: constructing a word vector source Bert judgment subunit according to the target word vector generation unit, wherein the word vector source Bert judgment subunit is used for judging, for each word in the target text data input into the target word vector generation unit, whether a word vector generated by the second word vector generation subunit exists, so as to obtain a word vector source Bert judgment result corresponding to each word in the target text data;
S522: constructing a word vector alignment subunit according to the target word vector generation unit, the simulation matrix vector and the word vector source Bert judgment subunit, wherein the word vector alignment subunit is configured to, when the word vector source Bert judgment result indicates that no word vector source Bert exists, use all words corresponding to the word vector source Bert judgment result that no word vector source Bert exists as a word set to be aligned, obtain the word vectors output by the first word vector generation subunit for each word in the word set to be aligned to obtain a word vector set to be aligned, and multiply each word vector in the word vector set to be aligned by the simulation matrix vector to obtain an aligned word vector set;
S523: constructing a word vector combination subunit according to the target word vector generation unit, the word vector source Bert judgment subunit and the word vector alignment subunit, wherein the word vector combination subunit is configured to, when the word vector source Bert judgment result indicates that a word vector source Bert exists, use all words corresponding to that judgment result as a word set that does not need to be aligned, obtain the word vectors output by the second word vector generation subunit for each word in the word set that does not need to be aligned to obtain a word vector set that does not need to be aligned, and splice the aligned word vector set and the word vector set that does not need to be aligned according to the word order of the target text data to obtain target word vector data;
S524: taking the word embedding layer of the initial language model as a word embedding subunit;
S525: performing word embedding unit generation according to the word vector source Bert judgment subunit, the word vector alignment subunit, the word vector combination subunit and the word embedding subunit to obtain the target word embedding unit.
In this embodiment, the word embedding layer of the initial language model is redefined according to the target word vector generating unit and the simulation matrix vector, providing support for subsequently adopting the Word2vec-based model trained on the target field to structurally modify the Bert model obtained by training with unlimited-field sample data so that NLP tasks in the target field can be processed.
And for S521, constructing a word vector source Bert judgment subunit, and connecting the input end of the word vector source Bert judgment subunit with the output end of the second word vector generation subunit.
The working principle of the word vector source Bert judgment subunit is as follows: and judging whether the word vector generated by the second word vector generation subunit exists or not according to each word in the target text data input into the target word vector generation unit to obtain a word vector source Bert judgment result corresponding to each word in the target text data.
That is to say, for each word in the target text data input to the target word vector generation unit, determining whether the second word vector generation subunit successfully generates a word vector, determining that the word vector source Bert determination result corresponding to the word for which the second word vector generation subunit unsuccessfully generates a word vector is absent, and determining that the word vector source Bert determination result corresponding to the word for which the second word vector generation subunit successfully generates a word vector is present.
For S522, a word vector alignment subunit is constructed, and the input end of the word vector alignment subunit is connected to the output end of the first word vector generation subunit and the output end of the word vector source Bert judgment subunit, respectively.
The working principle of the word vector alignment subunit is as follows: and when the judgment result of the word vector source Bert is that the word vector source Bert does not exist, taking all words corresponding to the word vector source Bert as a word set to be aligned, respectively acquiring the word vectors output by the first word vector generation subunit according to each word in the word set to be aligned to obtain a word vector set to be aligned, and respectively multiplying each word vector in the word vector set to be aligned with the simulation matrix vector to obtain an aligned word vector set.
That is to say, when the word vector source Bert judgment result indicates that no word vector source Bert exists, it means that the corresponding word cannot be correctly identified by the second word vector generation subunit to generate a word vector, and therefore all words whose word vector source Bert judgment result is that no word vector source Bert exists are taken as the word set to be aligned; the word vectors output by the first word vector generation subunit are respectively acquired for each word in the word set to be aligned to obtain the word vector set to be aligned, thereby obtaining the word vectors generated by Word2vec; and each word vector in the word vector set to be aligned is multiplied by the simulation matrix vector, each multiplied result is taken as an aligned word vector, and all aligned word vectors are taken as the aligned word vector set, thereby realizing that a word vector generated by Word2vec is multiplied by the simulation matrix vector to simulate the word vector generated by the second word vector generation subunit.
For S523, a word vector combination subunit is constructed, and an input end of the word vector combination subunit is respectively connected to an output end of the target word vector generation unit, an output end of the word vector source Bert judgment subunit, and an output end of the word vector alignment subunit.
The working principle of the word vector combination subunit is as follows: when the judgment result of the word vector source Bert is that the word vector source Bert exists, taking all words corresponding to the word vector source Bert as word sets which do not need to be aligned according to the judgment result of the word vector source Bert, respectively obtaining word vectors output by the second word vector generation subunit according to each word in the word sets which do not need to be aligned, obtaining word vector sets which do not need to be aligned, and splicing the aligned word vector sets and the word vector sets which do not need to be aligned according to the character sequence of the target text data to obtain target word vector data.
That is to say, when the word vector source Bert exists, it means that the word corresponding to the word vector source Bert exists in the word vector source Bert determination result, which is the word vector source Bert, can be correctly identified by the second word vector generation subunit to generate the word vector, and the word corresponding to the word vector source Bert does not need to be aligned in the word vector source Bert determination result, so that all the words corresponding to the word vector source Bert in the word vector source Bert determination result are used as a word set that does not need to be aligned; respectively acquiring word vectors output by the second word vector generation subunit according to each word in the word set which does not need to be aligned to obtain a word vector set which does not need to be aligned, thereby obtaining word vectors generated by Bert; and splicing the aligned word vector set and the word vector set which does not need to be aligned according to the character sequence of the target text data, and taking the spliced data as target word vector data.
For S524, directly using the word embedding layer of the initial language model as a word embedding subunit, and connecting an input end of the word embedding subunit with an output end of the word vector combination subunit.
For S525, the word vector source Bert determining subunit, the word vector aligning subunit, the word vector combining subunit, and the word embedding subunit, which are connected, are used as word embedding units, and the obtained word embedding unit is used as the target word embedding unit.
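Putting the subunits together, a hedged sketch of the target word embedding unit's working principle: the Bert-generated vector is used where it exists (no alignment needed), otherwise the Word2vec vector is aligned with the simulation matrix vector W; the function names follow the sketches above and are illustrative:

```python
import numpy as np

def target_word_embedding_unit(words):
    """Judge each word's vector source, align where needed, and splice the
    results back together in the original word order."""
    bert_vecs = second_subunit_vectors(words)  # word vector source Bert judgment
    w2v_vecs = first_subunit_vectors(words)
    combined = []
    for bert_vec, w2v_vec in zip(bert_vecs, w2v_vecs):
        if bert_vec is not None:
            combined.append(bert_vec)       # word vector that needs no alignment
        else:
            combined.append(W @ w2v_vec)    # align Word2vec vector into Bert space
    # Assumes every word is covered by at least one subunit.
    return np.stack(combined)               # target word vector data
```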
Referring to fig. 2, the present application also proposes a language model construction apparatus for NLP task, the apparatus comprising:
the first dictionary determining module 100 is configured to obtain a first dictionary of a target Word vector generation model in a target field, where the target Word vector generation model is a model obtained based on Word2vec training;
the second dictionary determining module 200 is configured to obtain a second dictionary of an initial language model, where the initial language model is a Bert model trained by sample data in an unlimited field;
the target dictionary intersection data determining module 300 is configured to perform intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data;
the simulation matrix vector determining module 400 is configured to perform fitting unconstrained linear transformation on the intersection data of the target dictionary by using a least square method to obtain a simulation matrix vector;
and the target language model determining module 500 is configured to perform language model construction according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field.
In this embodiment, a first dictionary of a target Word vector generation model of a target field is acquired, the target Word vector generation model being a model obtained based on Word2vec training; a second dictionary of an initial language model is acquired, the initial language model being a Bert model obtained by adopting sample data in an unlimited field for training; intersection acquisition is performed according to the first dictionary and the second dictionary to obtain target dictionary intersection data; fitting unconstrained linear transformation is performed on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and a language model is constructed according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field. In this way, after the Bert model obtained by training with unlimited-field sample data is structurally modified with the Word2vec-based model trained on the target field, NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost is reduced, the time spent is reduced, and the development of NLP tasks in emerging fields is facilitated.
Referring to fig. 3, a computer device, which may be a server and whose internal structure may be as shown in fig. 3, is also provided in the embodiment of the present application. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data involved in the language model construction method for NLP tasks. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a language model construction method for NLP tasks. The language model construction method for the NLP task comprises the following steps: acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained based on Word2vec training; acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model obtained by adopting sample data in an unlimited field for training; performing intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data; performing fitting unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field.
In this embodiment, a first dictionary of a target Word vector generation model of a target field is acquired, the target Word vector generation model being a model obtained based on Word2vec training; a second dictionary of an initial language model is acquired, the initial language model being a Bert model obtained by adopting sample data in an unlimited field for training; intersection acquisition is performed according to the first dictionary and the second dictionary to obtain target dictionary intersection data; fitting unconstrained linear transformation is performed on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and a language model is constructed according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field. In this way, after the Bert model obtained by training with unlimited-field sample data is structurally modified with the Word2vec-based model trained on the target field, NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost is reduced, the time spent is reduced, and the development of NLP tasks in emerging fields is facilitated.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a language model construction method for NLP task, including the steps of: acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained based on Word2vec training; acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model obtained by adopting sample data in an unlimited field for training; performing intersection acquisition according to the first dictionary and the second dictionary to obtain intersection data of the target dictionaries; fitting unconstrained linear transformation is carried out on the intersection data of the target dictionaries by adopting a least square method to obtain a simulation matrix vector; and constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field.
The executed language model construction method for the NLP task acquires a first dictionary of a target Word vector generation model of a target field, the target Word vector generation model being a model obtained based on Word2vec training; acquires a second dictionary of an initial language model, the initial language model being a Bert model obtained by adopting sample data in an unlimited field for training; performs intersection acquisition according to the first dictionary and the second dictionary to obtain target dictionary intersection data; performs fitting unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector; and constructs a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field. In this way, after the Bert model obtained by training with unlimited-field sample data is structurally modified with the Word2vec-based model trained on the target field, NLP tasks in the target field can be processed. Because training the Word2vec-based model on the target field requires less hardware cost and less time than unsupervised pre-training of the language model with target-field text, the hardware cost is reduced, the time spent is reduced, and the development of NLP tasks in emerging fields is facilitated.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by instructing relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the program can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application; all equivalent structural or process modifications made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of the present application.

Claims (10)

1. A method of language model construction for NLP tasks, the method comprising:
acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained through Word2vec-based training;
acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model trained on sample data from an unrestricted field;
performing intersection acquisition on the first dictionary and the second dictionary to obtain target dictionary intersection data;
performing a fitted unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector;
and constructing a language model according to the initial language model, the target word vector generation model and the simulation matrix vector to obtain a target language model corresponding to the target field.
2. The method of claim 1, wherein, before the step of acquiring the first dictionary of the target word vector generation model of the target field, the method further comprises:
acquiring a training sample set of the target field;
and training an initial word vector generation model with the training sample set, and taking the trained initial word vector generation model as the target word vector generation model.
3. The method of claim 1, wherein the step of performing intersection acquisition on the first dictionary and the second dictionary to obtain the target dictionary intersection data comprises:
performing intersection acquisition on the first dictionary and the second dictionary to obtain dictionary intersection data to be denoised;
and removing noise characters from the dictionary intersection data to be denoised to obtain the target dictionary intersection data, wherein the noise characters comprise emoticons, punctuation, and null characters.
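A minimal sketch of this denoising step, assuming the dictionary intersection is a set of token strings; the punctuation set and the emoji range check are illustrative heuristics, not the claimed character lists.

```python
import string

# ASCII punctuation plus common CJK punctuation; illustrative only.
NOISE_PUNCTUATION = set(string.punctuation) | set("，。！？；：、「」【】（）《》")

def denoise_intersection(tokens):
    """Drop emoticon, punctuation, and null-character tokens from dictionary
    intersection data, returning the target dictionary intersection data."""
    cleaned = set()
    for tok in tokens:
        if not tok or tok.isspace():                           # null / whitespace-only tokens
            continue
        if all(ch in NOISE_PUNCTUATION for ch in tok):         # pure punctuation tokens
            continue
        if any(0x1F300 <= ord(ch) <= 0x1FAFF for ch in tok):   # rough emoji/emoticon block check
            continue
        cleaned.add(tok)
    return cleaned
```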
4. The method of claim 1, wherein the simulation matrix vector is denoted by W and is calculated by the following formula:

W = \arg\min_{\hat{W}} \sum_{x \in L_{LM} \cap L_{W2v}} \left\| \hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x) \right\|^2

wherein W is the simulation matrix vector used to align a first word vector with a second word vector, the first word vector being the word vector output by the target word vector generation model for a target word, the second word vector being the word vector output by the initial language model for the target word, and the target word being a word in the target dictionary intersection data; \varepsilon_{w2v}(x) is the first word vector output by the target word vector generation model for a word x in the target dictionary intersection data, and \varepsilon_{LM}(x) is the second word vector output by the initial language model for the word x; \arg\min_{\hat{W}} denotes the matrix that minimizes the summed expression; L_{LM} \cap L_{W2v} is the target dictionary intersection data, with L_{W2v} being the first dictionary and L_{LM} being the second dictionary; and \left\| \hat{W}\,\varepsilon_{w2v}(x) - \varepsilon_{LM}(x) \right\| denotes taking the square of \hat{W}\varepsilon_{w2v}(x) - \varepsilon_{LM}(x) and then the square root, i.e., its Euclidean norm.
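Numerically, this unconstrained least-square fit can be computed directly; the sketch below uses random illustrative data and the row-wise convention, so the matrix it solves for is the transpose of the W written above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_w2v, d_lm = 500, 100, 768              # illustrative dimensions
E_w2v = rng.normal(size=(n, d_w2v))         # rows play the role of eps_w2v(x)
E_lm = rng.normal(size=(n, d_lm))           # rows play the role of eps_LM(x)

# Solve min_W ||E_w2v @ W - E_lm||^2, the row-wise form of the claim-4 objective.
W, residuals, rank, _ = np.linalg.lstsq(E_w2v, E_lm, rcond=None)

# The same solution via the normal equations: W = (X^T X)^{-1} X^T Y.
W_normal = np.linalg.solve(E_w2v.T @ E_w2v, E_w2v.T @ E_lm)
assert np.allclose(W, W_normal)
```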
5. The method of claim 1, wherein the step of constructing a language model according to the initial language model, the target word vector generation model, and the simulation matrix vector to obtain the target language model corresponding to the target field comprises:
constructing a target word vector generation unit from the initial language model and the target word vector generation model;
constructing a target word embedding unit from the target word vector generation unit, the initial language model, and the simulation matrix vector;
removing the structure preceding the encoder from the initial language model to obtain a target word vector processing unit;
and connecting the target word vector generation unit, the target word embedding unit, and the target word vector processing unit in sequence to obtain the target language model corresponding to the target field.
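A minimal PyTorch-style sketch of the sequential connection described above, assuming the target word vector generation unit and target word embedding unit exist as callables with the simple interfaces shown, and that `bert` follows the Hugging Face `BertModel` layout in which `bert.encoder` is the encoder stack; all interfaces here are illustrative assumptions.

```python
import torch.nn as nn

class TargetLanguageModel(nn.Module):
    """Sequential connection of the three units in claim 5 (illustrative interfaces)."""

    def __init__(self, vector_unit, embedding_unit, bert):
        super().__init__()
        self.vector_unit = vector_unit        # claim 6: parallel Word2vec / Bert subunits
        self.embedding_unit = embedding_unit  # claim 7: alignment, combination, word embedding
        self.encoder = bert.encoder           # Bert with the structure before the encoder removed

    def forward(self, words):
        word_vectors, from_bert = self.vector_unit(words)      # vectors plus source flags
        hidden = self.embedding_unit(word_vectors, from_bert)  # target word vector data
        return self.encoder(hidden).last_hidden_state
```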
6. The method of claim 5, wherein the step of constructing the target word vector generation unit from the initial language model and the target word vector generation model comprises:
taking the target word vector generation model as a first word vector generation subunit;
taking the word segmenter and the word vector generator of the initial language model as a second word vector generation subunit;
and arranging the first word vector generation subunit and the second word vector generation subunit in parallel to obtain the target word vector generation unit.
7. The method of claim 6, wherein the step of constructing the target word embedding unit from the target word vector generation unit, the initial language model, and the simulation matrix vector comprises:
constructing a word vector source Bert judgment subunit from the target word vector generation unit, wherein the word vector source Bert judgment subunit is configured to judge, for each word in the target text data input into the target word vector generation unit, whether a word vector generated by the second word vector generation subunit exists, so as to obtain a word vector source Bert judgment result for each word in the target text data;
constructing a word vector alignment subunit from the target word vector generation unit, the simulation matrix vector, and the word vector source Bert judgment subunit, wherein the word vector alignment subunit is configured to take all words whose word vector source Bert judgment result indicates that no Bert-sourced word vector exists as a word set to be aligned, obtain the word vector output by the first word vector generation subunit for each word in the word set to be aligned so as to obtain a word vector set to be aligned, and multiply each word vector in the word vector set to be aligned by the simulation matrix vector to obtain an aligned word vector set;
constructing a word vector combination subunit from the target word vector generation unit, the word vector source Bert judgment subunit, and the word vector alignment subunit, wherein the word vector combination subunit is configured to take all words whose word vector source Bert judgment result indicates that a Bert-sourced word vector exists as a word set that does not need to be aligned, obtain the word vector output by the second word vector generation subunit for each word in that set so as to obtain a word vector set that does not need to be aligned, and splice the aligned word vector set and the word vector set that does not need to be aligned according to the word order of the target text data to obtain target word vector data;
taking the word embedding layer of the initial language model as a word embedding subunit;
and constructing the target word embedding unit from the word vector source Bert judgment subunit, the word vector alignment subunit, the word vector combination subunit, and the word embedding subunit.
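The per-word routing performed by these subunits can be sketched as follows, assuming simple lookup callables for both models; every name in this sketch is an illustrative assumption.

```python
import numpy as np

def embed_words(words, bert_vocab, bert_lookup, w2v_lookup, W):
    """Word vector source Bert judgment plus alignment: words found in the Bert
    vocabulary keep their Bert word vector; the rest have their Word2vec vector
    multiplied by the simulation matrix, and all vectors are spliced back in the
    original word order of the target text data."""
    vectors = []
    for word in words:
        if word in bert_vocab:                   # a Bert-sourced word vector exists
            vectors.append(bert_lookup(word))
        else:                                    # no Bert source: align the Word2vec vector
            vectors.append(w2v_lookup(word) @ W)
    return np.stack(vectors)
```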
8. A language model construction apparatus for an NLP task, the apparatus comprising:
a first dictionary determining module, used for acquiring a first dictionary of a target Word vector generation model of a target field, wherein the target Word vector generation model is a model obtained through Word2vec-based training;
a second dictionary determining module, used for acquiring a second dictionary of an initial language model, wherein the initial language model is a Bert model trained on sample data from an unrestricted field;
a target dictionary intersection data determining module, used for performing intersection acquisition on the first dictionary and the second dictionary to obtain target dictionary intersection data;
a simulation matrix vector determining module, used for performing a fitted unconstrained linear transformation on the target dictionary intersection data by a least square method to obtain a simulation matrix vector;
and a target language model determining module, used for constructing a language model according to the initial language model, the target word vector generation model, and the simulation matrix vector to obtain a target language model corresponding to the target field.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110602682.3A 2021-05-31 2021-05-31 Language model construction method, device, equipment and medium for NLP task Active CN113204961B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110602682.3A CN113204961B (en) 2021-05-31 2021-05-31 Language model construction method, device, equipment and medium for NLP task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110602682.3A CN113204961B (en) 2021-05-31 2021-05-31 Language model construction method, device, equipment and medium for NLP task

Publications (2)

Publication Number Publication Date
CN113204961A 2021-08-03
CN113204961B CN113204961B (en) 2023-12-19

Family

ID=77024355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110602682.3A Active CN113204961B (en) 2021-05-31 2021-05-31 Language model construction method, device, equipment and medium for NLP task

Country Status (1)

Country Link
CN (1) CN113204961B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887216A (en) * 2021-10-20 2022-01-04 美的集团(上海)有限公司 Word vector increment method, electronic device and computer storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460838A (en) * 2020-04-23 2020-07-28 腾讯科技(深圳)有限公司 Pre-training method and device of intelligent translation model and storage medium
CN111737996A (en) * 2020-05-29 2020-10-02 北京百度网讯科技有限公司 Method, device and equipment for obtaining word vector based on language model and storage medium
CN111737994A (en) * 2020-05-29 2020-10-02 北京百度网讯科技有限公司 Method, device and equipment for obtaining word vector based on language model and storage medium
US20200334416A1 (en) * 2019-04-16 2020-10-22 Covera Health Computer-implemented natural language understanding of medical reports
CN112365003A (en) * 2020-11-16 2021-02-12 浙江百应科技有限公司 Method for adjusting NLP model capacity based on big data
CN112528037A (en) * 2020-12-04 2021-03-19 北京百度网讯科技有限公司 Edge relation prediction method, device, equipment and storage medium based on knowledge graph
CN112541343A (en) * 2020-12-03 2021-03-23 昆明理工大学 Semi-supervised counterstudy cross-language abstract generation method based on word alignment
CN112749557A (en) * 2020-08-06 2021-05-04 腾讯科技(深圳)有限公司 Text processing model construction method and text processing method


Also Published As

Publication number Publication date
CN113204961B (en) 2023-12-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant