CN116050433A - Scene adaptation method, device, equipment and medium of natural language processing model - Google Patents
- Publication number: CN116050433A
- Application number: CN202310136543.5A
- Authority
- CN
- China
- Prior art keywords
- target
- voice data
- natural language
- language processing
- processing model
- Prior art date
- Legal status: Granted (status as listed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F40/00—Handling natural language data; G06F40/40—Processing or translation of natural language
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/08—Learning methods
Abstract
The disclosure provides a scene adaptation method, device, equipment and medium for a natural language processing model, relating to the technical field of artificial intelligence and in particular to natural language processing. The scheme is as follows: acquire a target voice data set in a target scene; determine an initial natural language processing model from the target voice data set, and determine a target voice data input vector and at least one reference voice data input vector from that model; sequentially input the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model, obtaining a target topology network; and input each reference voice data input vector into the target topology network, obtaining a target natural language processing model adapted to the target scene. With this scheme, the natural language processing model closely matches the scene, and its precision is improved.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical field of natural language processing, and specifically relates to a scene adaptation method, device, equipment and medium of a natural language processing model.
Background
With the continuous development and wide application of artificial intelligence, natural language processing technology is used across many devices and scenarios, such as smart speakers, in-vehicle voice assistants, and intelligent robots. A natural language processing model can improve both the speed and the precision of natural language processing.
Disclosure of Invention
The present disclosure provides a scene adaptation method, device, equipment and medium for a natural language processing model.
According to an aspect of the present disclosure, there is provided a scene adaptation method of a natural language processing model, including:
acquiring a target voice data set in a target scene; the target voice data set comprises a plurality of voice data, and the size description parameter of each voice data meets the preset condition;
determining an initial natural language processing model according to the target voice data set, and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model;
Sequentially inputting the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model to obtain a target topology network corresponding to the initial natural language processing model;
and respectively inputting each reference voice data input vector into the target topological network to obtain a target natural language processing model which is adapted to the target scene.
According to another aspect of the present disclosure, there is provided an apparatus for scene adaptation of a natural language processing model, including:
the target voice data set acquisition module is used for acquiring a target voice data set in a target scene; the target voice data set comprises a plurality of voice data, and the size description parameter of each voice data meets the preset condition;
the initial natural language processing model determining module is used for determining an initial natural language processing model according to the target voice data set and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model;
the target topology network determining module is used for sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model to obtain a target topology network corresponding to the initial natural language processing model;
And the target natural language processing model determining module is used for respectively inputting each reference voice data input vector into the target topological network to obtain a target natural language processing model which is adapted to the target scene.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a scene adaptation method of a natural language processing model provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a scene adaptation method of another natural language processing model provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a scene adaptation method of yet another natural language processing model provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a scene adaptation method of yet another natural language processing model provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a scene adaptation method of yet another natural language processing model provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an initial topology network provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a target topology network provided in accordance with an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a scenario adaptation device of a natural language processing model according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a scene adaptation method of a natural language processing model in accordance with an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings; various details of the embodiments are included to facilitate understanding and should be considered merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 is a schematic diagram of a scene adaptation method of a natural language processing model according to an embodiment of the present disclosure. This embodiment is applicable to the situation of obtaining, from an acquired target voice data set in a target scene, a natural language processing model adapted to that scene. The method may be performed by a scene adaptation device of the natural language processing model; the device may be implemented in software and/or hardware and integrated in an electronic device, which may be a computer, a server, a tablet computer, a smart speaker, an in-vehicle voice assistant, an intelligent robot, or the like.
Specifically, referring to fig. 1, the method specifically includes the following steps:
s110, acquiring a target voice data set in a target scene.
The target voice data set may include a plurality of voice data, where the size description parameter of each voice data satisfies a preset condition. In this embodiment, the size description parameter of the voice data may be its shape, that is, the shape of the vector (tensor) corresponding to the voice data, which may include the size, the number of dimensions, or the extent of each dimension of the voice data vector; this embodiment does not limit it.
The preset condition may be that the size description parameters of the voice data are different, or the number of voice data with the same size description parameters is smaller than a set threshold; wherein the set threshold is related to the amount of voice data, for example, the set threshold may be one tenth or one fifth of the total amount of voice data, or the like; therefore, the shape of the vector corresponding to each voice data in the target voice data set can be ensured to be dynamically changed instead of being static, and a basis is provided for obtaining a natural language processing model which is adapted to different target scenes subsequently.
In this embodiment, the target scene may be an intelligent question-and-answer scene or a voice broadcast scene; for example, it may be a one-on-one question-and-answer between a user and a smart speaker, or a scene where an in-vehicle voice assistant plays navigation voice in real time; this embodiment does not limit it.
In an optional implementation manner of this embodiment, voice data generated in the intelligent question-and-answer scene may be acquired in real time, the size description parameters of each voice data determined, and the acquired voice data screened according to the size description parameters to obtain the target voice data set. For example, after determining the size description parameters of each voice data, it may be determined whether they are all the same; if they differ, the acquired voice data may be directly taken as the target voice data set. If the size description parameters are the same, or the number of identical size parameters is close to the total amount of voice data, voice data may continue to be acquired until the size description parameters of the acquired voice data meet the preset condition.
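The acquisition-and-screening step above can be sketched as follows. The helper names (`shape_of`, `preset_ok`) and the one-fifth threshold are illustrative assumptions, not the patent's exact procedure; voice clips are represented as nested lists standing in for audio feature tensors.

```python
from collections import Counter

def shape_of(sample):
    """Size description parameter: the shape of the tensor for one voice clip."""
    dims = []
    x = sample
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def preset_ok(samples, max_fraction=0.2):
    """Preset condition (assumed reading): no single shape accounts for more
    than max_fraction of the set, so shapes vary dynamically rather than
    being static."""
    counts = Counter(shape_of(s) for s in samples)
    limit = max(1, int(len(samples) * max_fraction))
    return all(c <= limit for c in counts.values())

def collect_target_set(stream, max_fraction=0.2):
    """Keep acquiring voice data until the preset condition is satisfied."""
    dataset = []
    for sample in stream:
        dataset.append(sample)
        if len(dataset) >= 5 and preset_ok(dataset, max_fraction):
            break
    return dataset
```

With five clips of five distinct shapes, `preset_ok` holds immediately; with five clips of one identical shape, acquisition would continue.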
S120, determining an initial natural language processing model according to the target voice data set, and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model.
In an optional implementation manner of this embodiment, after the target voice data set in the target scene is acquired, an initial natural language processing model corresponding to it may be determined; for example, the initial natural language processing model may be determined according to the data features of the target voice data set or the scene features of the target scene from which it was acquired. The initial natural language processing model may be an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory) model, a Word2Vec (word-to-vector) model, or the like; this embodiment does not limit it.
In another optional implementation manner of this embodiment, after the target voice data set in the target scene is acquired, automatic networking may be further performed according to the data feature of the target voice data set, the scene feature of the target scene in which the target voice data set is acquired, or the natural language processing task requirement, to obtain the initial natural language processing model, so that a topology network may be obtained by networking, where the topology network includes a plurality of nodes.
Optionally, in this embodiment, after the initial natural language processing model is obtained, the target speech data input vector and the at least one reference speech data input vector may be further determined according to the natural language processing model.
For example, in this embodiment, after the initial natural language processing model is obtained, it may be run in a preset environment: each voice data in the target voice data set may be input into the model on a configured cloud server or a local server, that is, each voice data is processed by the initial natural language processing model, and size description information within a preset range is collected during processing, i.e., the shape values of each qualifying input voice data vector. Each shape value can then be stored in a corresponding file to obtain an initialization file; all shape values in the initialization file are counted, and the occurrence frequency of each shape value is determined. Further, the voice data vector corresponding to the shape value with the highest occurrence frequency can be determined as the target voice data input vector, and the voice data vectors corresponding to shape values whose occurrence frequency is greater than a set threshold (for example, 10 or 15; this embodiment does not limit it) are determined as the reference voice data input vectors.
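The initialization-file bookkeeping described above can be sketched as follows; the file format (one comma-separated shape per line) and function names are assumptions for illustration, not the patent's specification.

```python
from collections import Counter

def record_shapes(shape_log, path):
    """Write one observed shape per line to an initialization file
    (assumed format: comma-separated dimensions)."""
    with open(path, "w") as f:
        for shp in shape_log:
            f.write(",".join(map(str, shp)) + "\n")

def select_input_vectors(path, ref_threshold=2):
    """Pick the most frequent shape as the target input vector, and every
    other shape seen more often than ref_threshold as a reference input
    vector (one reading of the selection rule)."""
    with open(path) as f:
        shapes = [tuple(int(d) for d in line.split(","))
                  for line in f if line.strip()]
    counts = Counter(shapes)
    target = counts.most_common(1)[0][0]
    references = [s for s, c in counts.items()
                  if c > ref_threshold and s != target]
    return target, references
```

For a log in which shape (1, 16) appears five times, (1, 32) three times, and (1, 8) once, the target is (1, 16) and (1, 32) is the only reference shape.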
S130, sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model to obtain the target topology network corresponding to the initial natural language processing model.
In an optional implementation manner of this embodiment, after the target voice data input vector is obtained, it may be input sequentially into each network node of the initial topology network corresponding to the initial natural language processing model, so as to obtain an optimized target topology network corresponding to that model.
In this embodiment, the target voice data input vector may be input to a first network node of the initial topology network to obtain at least one first processing result; optimizing operators in the first network node according to each first processing result to obtain an optimized first network node; continuously inputting each first processing result into a second network node of the initial topology network to obtain at least one second processing result; optimizing operators in the second network node according to each second processing result to obtain an optimized second network node; and continuously executing the operation of inputting the obtained second processing results into the next network node of the initial topological network until the last network node of the initial topological network, and sequentially optimizing the network nodes to obtain an optimized topological network, namely the target topological network related in the embodiment.
It should be noted that, in this embodiment, the target voice data input vector is not unique, and may include a plurality of vectors, that is, in this embodiment, the initial topology network may be optimized multiple times to obtain the final target topology network.
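The node-by-node optimization described above can be sketched as follows, assuming (as the embodiment suggests elsewhere) that the screening criterion is the shortest processing duration; operators are modeled as plain callables, which is a simplification.

```python
import time

def pick_fastest_operator(node_ops, x):
    """Run every candidate operator of one network node on the input vector
    and keep the one with the shortest processing duration."""
    best_op, best_t, best_out = None, float("inf"), None
    for op in node_ops:
        t0 = time.perf_counter()
        out = op(x)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best_op, best_t, best_out = op, dt, out
    return best_op, best_out

def optimize_topology(initial_network, target_vec):
    """Sequentially feed the target input through each node, replacing each
    node's candidate operators with its single fastest one (assumed
    criterion); the surviving operators form the target topology network."""
    optimized, x = [], target_vec
    for node_ops in initial_network:
        op, x = pick_fastest_operator(node_ops, x)
        optimized.append(op)
    return optimized
```

Each node's output feeds the next node, mirroring the sequential pass through the initial topology network.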
S140, respectively inputting each reference voice data input vector into a target topological network to obtain a target natural language processing model adapted to a target scene.
In an optional implementation manner of this embodiment, after obtaining each reference voice data input vector and optimizing the initial topology network through the target voice data input vector to obtain an optimized target topology network, each reference voice data input vector may be further input into the optimized target topology network, and the target topology network may be further adjusted according to each output result, so as to obtain a final target natural language processing model that may be adapted to a target scene corresponding to the target voice data set.
It may be appreciated that each network node of the target topology network may include a plurality of operators, and each operator may process the input voice data vector with different processing kernels, yielding different processing results, for example different processing durations. In this embodiment, by inputting different reference voice data input vectors into the target topology network, the reference inputs are processed by the different kernels within each node; the kernels are screened according to a processing result (for example, processing duration), finally yielding a target natural language processing model adapted to the target scene corresponding to the target voice data set.
According to the scheme of this embodiment, a target voice data set in a target scene is acquired; an initial natural language processing model is determined according to the target voice data set, and a target voice data input vector and at least one reference voice data input vector are determined according to that model; the target voice data input vector is sequentially input into each network node of the initial topology network corresponding to the initial natural language processing model, obtaining a target topology network; and each reference voice data input vector is input into the target topology network, obtaining a target natural language processing model adapted to the target scene. By determining the initial natural language processing model from the acquired target voice data set and optimizing both the structure of its topology network and the operators within the network, a target natural language processing model for the corresponding target scene is obtained; the natural language processing model is thereby adapted to the specific scene, and its precision is improved.
Fig. 2 is a schematic diagram of a scene adaptation method of another natural language processing model according to an embodiment of the disclosure, which is a further refinement of the foregoing technical solutions, where the technical solutions in the present embodiment may be combined with each of the alternatives in one or more embodiments described above. As shown in fig. 2, the scene adaptation method of the natural language processing model includes the following:
s210, acquiring voice data in real time in a target scene, and determining size description parameters of each voice data; and screening the voice data according to the size description parameters to obtain a target voice data set.
The target scene may include: intelligent question-answering scenes or voice broadcasting scenes.
In an optional implementation manner of this embodiment, voice data generated in the target scene may be acquired, and the size description parameters of each voice data determined; further, the acquired voice data may be screened according to the size description parameters to obtain the target voice data set.
In this embodiment, the acquired voice data may be converted into voice data vectors and the shape of each vector determined; each voice data is then screened by the shape of its vector, for example filtering out voice data whose shape exceeds an upper threshold or falls below a lower threshold; this embodiment does not limit the screening rule.
The advantage of this arrangement is that it ensures the acquired target voice data set matches the target scene, and that the voice data are consistent in all variables other than the dynamically changing size description parameters.
S220, networking according to the attribute characteristics of the target voice data set to obtain an initial natural language processing model.
Wherein the attribute features of the target speech dataset include: the type of the voice data or the size description parameter of each voice data; the voice data can be of integer type or floating point type; the size description parameter of the voice data may be a shape of a voice data vector corresponding to the voice data.
In an optional implementation manner of this embodiment, after the target voice data set is acquired, attribute features of each voice data in the target voice data set are acquired, for example, a data type of each voice data and a shape of a voice data vector corresponding to each voice data may be acquired; furthermore, the initial natural language processing model can be obtained by networking according to the attribute characteristics of the target voice data set.
In this embodiment, there is no need to specify the shape of each operator's input and output vectors in the networking stage; when the shape of an input or output vector must participate in an operation, the framework's Shape operator can be used in the networking instead, so that the initial natural language processing model can be obtained quickly through networking.
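Networking without fixed shapes can be illustrated with the minimal sketch below: the `ShapeOp` class is a hypothetical stand-in for a framework Shape operator that resolves the actual input shape only at run time, so the graph built at networking time commits to no particular shape.

```python
class ShapeOp:
    """Placeholder Shape operator (illustrative): resolves the actual shape
    of its input at run time rather than at networking time."""
    def __call__(self, x):
        dims, y = [], x
        while isinstance(y, list):
            dims.append(len(y))
            y = y[0]
        return tuple(dims)

def build_network():
    """Networking stage: declare operators without committing to any input
    shape; a ShapeOp supplies the shape wherever an operation needs it."""
    shape_op = ShapeOp()
    def flatten_op(x):
        # uses the runtime shape instead of a shape fixed during networking
        rows, cols = shape_op(x)
        return [x[i][j] for i in range(rows) for j in range(cols)]
    return flatten_op

flatten = build_network()
```

The same `flatten` operator then works for 3x2, 2x4, or any other two-dimensional input without re-networking.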
It will be appreciated that the networking obtains an initial natural language processing model, i.e., determines an initial topology network corresponding to the initial natural language processing model.
By networking according to the attribute characteristics of the target voice data set, the initial natural language processing model and its corresponding initial topology network can be determined quickly, providing a basis for the subsequent optimization of the topology network structure.
S230, determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model.
In an alternative implementation of this embodiment, determining the target voice data input vector and the at least one reference voice data input vector according to the initial natural language processing model may include: running the initial natural language processing model in a target operating environment, and acquiring, during its operation, the size description parameters of voice data whose dimensionality is within a preset range; writing each size description parameter into a target file to obtain an initialization file; traversing each size description parameter in the initialization file and determining the occurrence frequency of each; determining voice data corresponding to size description parameters whose occurrence frequency is greater than a first set threshold as the target voice data input vector; and determining voice data corresponding to size description parameters whose occurrence frequency is greater than a second set threshold as the reference voice data input vectors; wherein the first set threshold is greater than the second set threshold.
The preset range may be 1-5, 1-6, or 2-7, which is not limited in this embodiment; illustratively, in this embodiment, a shape value of the voice data input vector with a size of 1-7 may be obtained; storing the obtained shape values into a target file, thereby obtaining an initialization file; further, each shape value in the initialization file can be traversed, and the occurrence frequency of each shape value is determined; determining target voice data corresponding to a shape value with the occurrence frequency larger than a first set threshold value as a target voice data input vector, for example, target voice data corresponding to a shape value with the largest occurrence frequency can be determined as a target voice data input vector; target voice data corresponding to a shape value having a frequency of occurrence greater than the second set threshold value is determined as each reference voice data input vector, and for example, target voice data corresponding to a shape value having a frequency of occurrence greater than 2 and less than the maximum frequency of occurrence may be determined as each reference voice data input vector.
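The two-threshold rule of S230 can be sketched as below. Treating the reference set as the band between the two thresholds (rather than everything above the second threshold) is one reading of the passage above, flagged here as an assumption.

```python
from collections import Counter

def select_by_thresholds(shape_log, first_threshold, second_threshold):
    """Two-threshold rule (assumed reading): shapes seen more often than the
    first threshold give the target input vector(s); shapes whose frequency
    lies between the two thresholds give the reference input vectors.
    The first threshold must exceed the second."""
    assert first_threshold > second_threshold
    counts = Counter(shape_log)
    targets = [s for s, c in counts.items() if c > first_threshold]
    references = [s for s, c in counts.items()
                  if second_threshold < c <= first_threshold]
    return targets, references
```

For a log where (1, 16) occurs six times, (1, 32) three times, and (1, 8) once, thresholds of 5 and 2 select (1, 16) as the target and (1, 32) as the reference.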
In this embodiment, by performing an initialization process on the initial natural language processing model, an initialization file can be quickly obtained, and a target voice data input vector and each reference voice data input vector can be quickly obtained according to the initialization file, so as to provide a basis for optimizing a subsequent natural language processing model.
S240, sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model to obtain the target topology network corresponding to the initial natural language processing model.
In an optional implementation manner of this embodiment, after determining the target voice data input vector, the target voice data input vector may be further input into each network node of the initial topology network corresponding to the initial natural language processing model, so as to obtain the target topology network corresponding to the initial natural language processing model.
Optionally, sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model to obtain the target topology network corresponding to the initial natural language processing model, which may include: sequentially inputting the target voice data input vector into each network node of the initial topological network corresponding to the initial natural language processing model, and respectively obtaining at least one processing result corresponding to each network node; and screening the processing results corresponding to the network nodes to obtain target operators corresponding to the network nodes, and forming a target topological network corresponding to the initial natural language processing model.
In an optional implementation manner of this embodiment, after the target voice data input vector is obtained, the target voice data input vector may be input into a first network node of the initial topology network, and the target voice data input vector is processed by the first network node (in this process, each operator in the first network node processes the target voice data input vector), so as to obtain a plurality of first processing results; further, each processing result can be screened to obtain a target operator with the shortest processing duration, and other operators except the target operator in the first network node are deleted to obtain the optimized first network node.
Further, each processing result can be input to the next network node of the initial topology network to obtain a plurality of processing results, and each processing result is screened to obtain an optimized network node; and finally forming the target topological network corresponding to the initial natural language processing model until the last network node of the initial topological network is processed.
The method has the advantages that the initial topology network can be optimized rapidly, and a basis is provided for improving the precision of the natural language processing model.
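The node-by-node screening described above can be sketched as follows (a hypothetical illustration, not the patent's implementation: each node is modeled as a list of operator names, and `run_operator` stands for whatever facility runs an operator and times it):

```python
def build_target_topology(initial_topology, target_vector, run_operator):
    """Process the target input vector through each node of the initial
    topology in order; at every node, time all candidate operators and
    keep only the fastest one as that node's target operator."""
    target_topology = []
    x = target_vector
    for node_operators in initial_topology:  # nodes in topological order
        outputs, timings = {}, {}
        for name in node_operators:
            outputs[name], timings[name] = run_operator(name, x)
        best = min(timings, key=timings.get)  # shortest processing duration
        target_topology.append(best)  # other operators are deleted
        x = outputs[best]  # feed the retained operator's output onward
    return target_topology
```

The other operators of each node are implicitly "deleted" by simply not being recorded in the resulting target topology.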
S250, respectively inputting each reference voice data input vector into a target topological network to obtain a target natural language processing model adapted to a target scene.
In an optional implementation manner of this embodiment, after the optimized target topology network is obtained by inference, each reference voice data input vector determined through S230 may be further input into the target topology network, so as to obtain a target natural language processing model adapted to the target scene.
Optionally, in this embodiment, each reference voice data input vector is input into the target topology network to obtain a target natural language processing model adapted to the target scene, which may include: sequentially inputting each reference voice data input vector into a first network node in a target topological network, sequentially processing the reference voice data input vector through each processing core in a target operator of the first network node, and determining a first target processing core corresponding to the target operator of the first network node according to a processing result; continuing to execute the operation of inputting the reference vector into the next network node in the target topology network until the last network node in the target topology network obtains a target processing core corresponding to a target operator of the last network node; and obtaining a target natural language processing model adapted to the target scene according to the target processing cores corresponding to the network nodes and the target topology network.
It will be appreciated that a plurality of network nodes may be included in the target topology network, each network node may include an operator, each operator may correspond to a plurality of processing cores; that is, when the voice data input vector is processed by the first network node, the voice data input vector can be processed by each processing core of the operator of the first network node to obtain a plurality of processing results, and the processing cores with the best processing effect can be obtained by screening each processing result.
Optionally, in this embodiment, after the optimized target topology network is obtained, a reference voice data input vector may be input into the target topology network, and the reference voice data input vector is sequentially processed by processing cores in each network node in the target topology network, and an optimal processing core corresponding to each network node is determined according to a processing result.
Further, the next reference voice data input vector can be input into the target topology network, that reference voice data input vector is again processed sequentially by the processing cores in each network node of the target topology network, and the optimal processing core corresponding to each network node is determined according to the processing result. After all the reference voice data input vectors are processed, the optimal kernels can be sorted and the other kernels filtered out, ensuring that the processing core corresponding to each network node is relatively optimal for different voice data input vectors, so as to obtain a final natural language processing model matched with the target scene.
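A minimal sketch of this per-node kernel selection (hypothetical names throughout; `time_core` stands for whatever facility times one processing core on one input vector):

```python
def select_target_cores(target_topology, reference_vectors, time_core):
    """For each network node, time every candidate processing core on every
    reference input vector and keep the core whose total time over all
    reference vectors is smallest, so the retained core is relatively
    optimal across the different input shapes."""
    target_cores = {}
    for node, cores in target_topology.items():
        totals = {core: sum(time_core(node, core, v) for v in reference_vectors)
                  for core in cores}
        target_cores[node] = min(totals, key=totals.get)
    return target_cores
```

Summing over all reference vectors before taking the minimum is one way to realize "relatively optimal for different voice data input vectors"; other aggregation rules (e.g. per-shape dictionaries) are equally compatible with the description above.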
In the scheme of the embodiment, after the reference voice data input vector is obtained, the target voice data input vector can be sequentially input into each network node of the initial topology network corresponding to the initial natural language processing model, so as to respectively obtain at least one processing result corresponding to each network node; and screening the processing results corresponding to the network nodes to obtain target operators corresponding to the network nodes, forming a target topological network corresponding to the initial natural language processing model, optimizing the initial topological network, and providing a basis for improving the precision of the natural language processing model.
Fig. 3 is a schematic diagram of a scene adaptation method of a natural language processing model according to still another embodiment of the disclosure, which is a further refinement of the foregoing technical solutions, where the technical solutions in this embodiment may be combined with each of the alternatives in one or more embodiments described above. As shown in fig. 3, the scene adaptation method of the natural language processing model includes the following:
S310, acquiring a target voice data set in a target scene.
S320, determining an initial natural language processing model according to the target voice data set, and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model.
S330, sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model, and respectively obtaining at least one processing result corresponding to each network node.
In an optional implementation manner of this embodiment, after the target voice data input vector is acquired, the target voice data input vector may be further sequentially input into each network node of the initial topology network corresponding to the initial natural language processing model, so as to obtain at least one processing result corresponding to each network node.
Optionally, in this embodiment, sequentially inputting the target voice data input vector to each network node of the initial topology network corresponding to the initial natural language processing model, to obtain at least one processing result corresponding to each network node, respectively, may include: inputting the target voice data input vector into a first network node of an initial topological network corresponding to the initial natural language processing model to obtain at least one first output result, and screening each first output result to obtain at least one target first output result; and continuously executing the operation of inputting the target voice data input vector to the next network node of the initial topological network corresponding to the initial natural language processing model until the last network node of the initial topological network corresponding to the initial natural language processing model, and obtaining at least one screened target output result.
The first output result and each target output result may include: the data type, the data form, the processing time length, and the like, which are not limited in this embodiment.
In an optional implementation manner of this embodiment, the screening of each first output result to obtain at least one target first output result may include: under the condition that the data types of the first output results are the same, filtering the first output results which are the same in data form and have processing time length meeting the preset time length rule; or filtering the first output results with the same data type and data form and the processing duration meeting the preset duration rule under the condition that the data types of the first output results are different.
The preset duration rule may be a duration shortest rule; for example, if the data types and the data forms of the data in the first output results are the same, the first output result with the shortest processing duration may be reserved, and other first output results may be filtered out.
In an example of this embodiment, if the first network node includes 4 first output results, the data types and data forms of the first output results are all floating point and NCHW, and the processing durations are 1, 2, 3, and 4, respectively, then the first network node may retain the operator corresponding to the first output result with the shortest processing duration (that is, 1), and filter out the operators corresponding to the other first output results.
The advantage of this arrangement is that each network node in the topology network can be optimized, providing basis for obtaining the optimal target topology network subsequently.
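The filtering rule described above (for each data type and data form, keep only the output with the shortest processing duration) can be sketched as:

```python
def deduplicate_outputs(outputs):
    """outputs: list of (operator, data_type, data_form, duration) tuples.
    For each (data_type, data_form) pair, keep only the operator whose
    processing duration is shortest; all other operators are filtered out."""
    best = {}
    for op, dtype, form, duration in outputs:
        key = (dtype, form)
        if key not in best or duration < best[key][1]:
            best[key] = (op, duration)
    return best
```

Applied to the Op1 outputs of the Fig. 6 example (Op1_1 to Op1_4 producing Float/NCHW, Float/NCHW, Float/NHWC, Float/NHWC with durations 1 to 4), this keeps Op1_1 and Op1_3.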
S340, screening the processing results corresponding to the network nodes to obtain target operators corresponding to the network nodes, and forming a target topological network corresponding to the initial natural language processing model.
In an optional implementation manner of this embodiment, after obtaining a processing result corresponding to any network node, screening processing may be further performed on each processing result to obtain a target operator corresponding to each network node, and finally, a target topology network corresponding to the initial natural language processing model is formed.
Optionally, screening the processing results corresponding to each network node to obtain the target operator corresponding to each network node, and forming the target topology network corresponding to the initial natural language processing model, may include: determining the processing duration of each target first output result, sorting the target first output results in a set order according to the processing durations, obtaining the operator corresponding to the first output result ranked first or last in the sequence, and determining that operator as the first target operator corresponding to the first network node; determining the data form of each target output result, and, when the target data form of a first target output result differs from the data form of the target voice data input vector, converting the target data form of the first target output result into the data form of the target voice data input vector, acquiring the conversion duration, and adding the conversion duration to the processing duration of the first target output result to obtain the target duration corresponding to the first target output result; sorting the target output results in the set order according to the target durations, obtaining the operator corresponding to the target output result ranked first or last in the sequence, and determining that operator as the target operator corresponding to the last network node; and forming the target topology network corresponding to the initial natural language processing model according to the target operators; wherein each target operator contains at least one processing core.
In this embodiment, the processing core may be a kernel. Note that the processing of the first output result in this embodiment is also applicable to the processing of the output results of the other network nodes; it is described here merely to explain the present embodiment and does not limit it.
In an optional implementation manner of this embodiment, the number of target first output results may be one or more. When there are multiple first output results, the output results may be sorted according to the processing duration of each first output result, for example, from small to large, so that the first output result ranked first has the shortest processing duration; at this time, the operator corresponding to that first output result may be acquired and determined as the first target operator corresponding to the first network node.
In another optional implementation manner of this embodiment, when the target data form of the first target output result is different from the data form of the target voice data input vector, the target data form of the first target output result is converted into the data form of the target voice data input vector, the conversion duration is acquired, and the conversion duration is added to the processing duration of the first target output result to obtain the target duration corresponding to the first target output result. Further, the target durations can be sorted from small to large, the operator corresponding to the target output result ranked first in the sequence is obtained, and that operator is determined as the target operator corresponding to the last network node. Further, the target topology network corresponding to the initial natural language processing model can be formed according to the target operators.
The advantage of this arrangement is that each network node in the topology network can be further optimized, providing basis for obtaining the optimal target topology network subsequently.
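The conversion-aware ranking for the last node could be sketched as follows (hypothetical names; the conversion duration is an assumed, pre-measured input):

```python
def rank_with_conversion(outputs, input_form, conversion_duration):
    """outputs: list of (operator, data_form, processing_duration) tuples.
    If an output's data form differs from that of the target voice data
    input vector, the layout-conversion duration is added to its processing
    duration before ranking; the operator with the smallest resulting
    target duration is returned together with that duration."""
    totals = []
    for op, form, duration in outputs:
        extra = conversion_duration if form != input_form else 0.0
        totals.append((op, duration + extra))
    totals.sort(key=lambda t: t[1])  # ascending target duration
    return totals[0]
```

With the numbers of the Fig. 6/7 example (Op3_5 producing NCHW at cumulative cost 4, Op3_8 producing NHWC at cost 3, target layout NCHW, conversion cost 0.5), this selects Op3_8 with a target duration of 3.5.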
S350, respectively inputting each reference voice data input vector into a target topological network to obtain a target natural language processing model adapted to a target scene.
According to the scheme of the embodiment, the target voice data input vector is input into a first network node of an initial topological network corresponding to an initial natural language processing model to obtain at least one first output result, and each first output result is screened to obtain at least one target first output result; and continuously executing the operation of inputting the target voice data input vector to the next network node of the initial topological network corresponding to the initial natural language processing model until the last network node of the initial topological network corresponding to the initial natural language processing model, obtaining at least one screened target output result, and providing a basis for optimizing the initial topological network and obtaining an optimized target topological network.
Fig. 4 is a schematic diagram of a scene adaptation method of a further natural language processing model according to an embodiment of the disclosure, which is a further refinement of the foregoing technical solutions, where the technical solutions in the present embodiment may be combined with each of the alternatives in one or more embodiments described above. As shown in fig. 4, the scene adaptation method of the natural language processing model includes the following:
S410, acquiring a target voice data set in a target scene.
S420, determining an initial natural language processing model according to the target voice data set, and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model.
S430, sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model to obtain the target topology network corresponding to the initial natural language processing model.
S440, sequentially inputting each reference voice data input vector into the first network node in the target topology network, processing the reference voice data input vector sequentially through each processing core in the target operator of the first network node, and determining the first target processing core corresponding to the target operator of the first network node according to the processing result.
In an optional implementation manner of this embodiment, after the target topology network is obtained by optimization, each network node in the target topology network may be further optimized; in this embodiment, a reference voice data input vector may be input to a first network node of a target topology network, and the reference voice data input vector may be sequentially processed by each processing core in a target operator of the first network node, to obtain a plurality of processing results; further, a first target processing core corresponding to the target operator of the first network node may be determined according to each processing result.
Optionally, in this embodiment, determining, according to the processing result, the first target processing core corresponding to the target operator of the first network node may include: determining the required duration for each processing core in the target operator to process the reference voice data input vector; sorting the required durations in a set order, and determining the processing core corresponding to the required duration that meets a set threshold as the first target processing core.
The setting order may be from small to large or from large to small, and is not limited in this embodiment.
In an optional implementation manner of this embodiment, after the reference voice data input vector is processed by each processing core in the target operator of the first network node to obtain a plurality of processing results, each processing result may be ordered from small to large according to the processing duration; further, the processing core with the shortest processing duration may be determined as the first target processing core.
In this way, the optimal processing core corresponding to the operator of each network node can be rapidly determined, and a basis is provided for further optimization of the target network topology.
S450, the operation of inputting the reference vector into the next network node in the target topological network is continuously executed until the last network node in the target topological network obtains a target processing core corresponding to a target operator of the last network node.
S460, obtaining a target natural language processing model adapted to the target scene according to the target processing cores corresponding to the network nodes and the target topology network.
In an optional implementation manner of this embodiment, after determining the optimal processing core (target processing core) of each network node in the target topology network, the cores other than the optimal processing core may be further filtered out, so as to finally obtain a target natural language processing model adapted to the target scene of the target voice data set.
The method has the advantages that the final target natural language processing model which is adapted to the target scene can be obtained quickly, meanwhile, the structure of the target natural language processing model can be optimized, and parameters of the model are reduced.
After the target topology network is obtained through optimization, the scheme of the embodiment can sequentially input each reference voice data input vector into a first network node in the target topology network, sequentially process the reference voice data input vector through each processing core in a target operator of the first network node, and determine a first target processing core corresponding to the target operator of the first network node according to a processing result; continuing to execute the operation of inputting the reference vector into the next network node in the target topology network until the last network node in the target topology network obtains a target processing core corresponding to a target operator of the last network node; according to the target processing cores corresponding to the network nodes and the target topology network, a target natural language processing model adapted to the target scene is obtained, the target topology network can be further optimized, and a basis is provided for subsequently obtaining the target natural language processing model adapted to the target scene.
For a better understanding of the scenario adaptation method of the natural language processing model related to the present disclosure, fig. 5 is a schematic diagram of a scenario adaptation method of a further natural language processing model provided according to an embodiment of the present disclosure, and referring to fig. 5, the scenario adaptation method includes:
S510, dynamic networking is performed to obtain an initial dynamic shape model.
In this embodiment, for the input and output tensors of each operator in the model, tensors other than the weight tensors do not have to specify their shape sizes. If the shape of a tensor actually needs to participate in the computation, the Shape operator of Paddle can be used to participate in the networking, so that the network is more flexible and applicable to any input shape.
S520, carrying out non-optimization pre-operation on the initial dynamic shape model, and collecting shape information of each input vector in the operation process.
In this embodiment, the non-optimized pre-operation may be performed multiple times, and the more the number of operations, the closer the obtained shape information is to the real scene.
And S530, obtaining an initialization file according to the collected shape information.
In this embodiment, after the initialization file is obtained, some unchanged shape information may be used to optimize the network; for example, when the value of a shape element is fixed, that element may be deleted from the network. If the shape of the output tensor of an operator is unchanged, that node may be marked as not requiring shape inference (infer shape).
S540, operating the initial dynamic shape model according to the initialization file, and optimizing the initial dynamic shape model to obtain a final dynamic shape model.
In this embodiment, tuning the initial model may be divided into network topology tuning and single-operator tuning. The network topology tuning may proceed as follows: the network is first topologically sorted; starting from the first Op1 (network node), assuming Op1 has num1 kernels, there may be num1 outputs. These outputs are deduplicated according to their numerical type, keeping only the least time-consuming kernel for each type, which is recorded in state array state1 together with the total time consumption from the beginning of the network to Op1; the number of states at this point is assumed to be s1.
Further, continuing from Op1 to the kernels of Op2, assuming Op2 has num2 kernels, then for each output in state1 the num2 kernels are traversed, forming s1 × num2 possible outputs. These outputs are deduplicated according to their value type, keeping only the least time-consuming kernel for each type, whose time consumption is likewise recorded in state array state2; the number of states at this point is assumed to be s2.
This process continues in the same way until the output Op is encountered; the whole process then ends, the optimal topology structure of the network is determined, and the estimated total time consumption of the network is obtained.
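The state propagation above can be sketched as a small dynamic program over a chain of Ops (a hypothetical illustration; each Op is modeled as a dict mapping kernel name to (input layout, output layout, kernel cost), and states are deduplicated per output layout exactly as described):

```python
def tune_topology(ops, input_layout):
    """Propagate states through the Op chain, keeping for each output
    layout only the cheapest cumulative (total cost, kernel path) state."""
    states = {input_layout: (0.0, [])}
    for op in ops:
        new_states = {}
        for kernel, (in_layout, out_layout, cost) in op.items():
            if in_layout not in states:
                continue  # no surviving state feeds this kernel
            total, path = states[in_layout]
            candidate = (total + cost, path + [kernel])
            # Deduplicate: keep the least time-consuming state per layout.
            if out_layout not in new_states or candidate[0] < new_states[out_layout][0]:
                new_states[out_layout] = candidate
        states = new_states
    return states
```

Run on the Fig. 6 example below, this reproduces the deduplicated tables: after Op3 the surviving states are NCHW at total cost 4 and NHWC at total cost 3 (for ties, such as Op2_1 vs. Op2_2, either kernel may be kept).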
Fig. 6 is a schematic diagram of an initial topology network provided according to an embodiment of the present disclosure, which mainly includes three network nodes (Op1, Op2, and Op3); the data type of the input vector is float and its data form is NCHW, and the data type of the output vector is float and its data form is NCHW. Op1 may include 4 operators, and its processing results for the input vector may be as follows:
| Implementation of Op1 | Input state of Op1 | Output state of Op1 | Time consumption |
| --- | --- | --- | --- |
| Op1_1 | Float, NCHW | Float, NCHW | 1 |
| Op1_2 | Float, NCHW | Float, NCHW | 2 |
| Op1_3 | Float, NCHW | Float, NHWC | 3 |
| Op1_4 | Float, NCHW | Float, NHWC | 4 |
Deduplication optimization is performed on the data in the above table, obtaining:

| Implementation of Op1 | Output state of Op1 | Time consumption |
| --- | --- | --- |
| Op1_1 | Float, NCHW | 1 |
| Op1_3 | Float, NHWC | 3 |
Further, for Op2, the processing result of the input vector may be as follows:
| Implementation of Op2 | Output state of Op1 | Output state of Op2 | Time consumption |
| --- | --- | --- | --- |
| Op2_1 | Float, NCHW | Float, NCHW | 1+2 |
| Op2_2 | Float, NCHW | Float, NCHW | 1+2 |
| Op2_3 | Float, NCHW | Float, NHWC | 1+3 |
| Op2_4 | Float, NCHW | Float, NHWC | 1+1 |
| Op2_5 | Float, NHWC | Float, NCHW | 3+2 |
| Op2_6 | Float, NHWC | Float, NCHW | 3+3 |
| Op2_7 | Float, NHWC | Float, NHWC | 3+4 |
| Op2_8 | Float, NHWC | Float, NHWC | 3+1 |
Deduplication optimization is performed on the data in the above table, obtaining:

| Implementation of Op2 | Output state of Op2 | Time consumption |
| --- | --- | --- |
| Op2_2 | Float, NCHW | 3 |
| Op2_4 | Float, NHWC | 2 |
Further, for Op3, the processing result of the input vector may be as follows:
| Implementation of Op3 | Input state of Op3 | Output state of Op3 | Time consumption |
| --- | --- | --- | --- |
| Op3_1 | Float, NCHW | Float, NCHW | 3+2 |
| Op3_2 | Float, NCHW | Float, NCHW | 3+2 |
| Op3_3 | Float, NCHW | Float, NHWC | 3+3 |
| Op3_4 | Float, NCHW | Float, NHWC | 3+1 |
| Op3_5 | Float, NHWC | Float, NCHW | 2+2 |
| Op3_6 | Float, NHWC | Float, NCHW | 2+3 |
| Op3_7 | Float, NHWC | Float, NHWC | 2+4 |
| Op3_8 | Float, NHWC | Float, NHWC | 2+1 |
Deduplication optimization is performed on the data in the above table, obtaining:

| Implementation of Op3 | Output state of Op3 | Time consumption |
| --- | --- | --- |
| Op3_5 | Float, NCHW | 4 |
| Op3_8 | Float, NHWC | 3 |
Finally, converting to the target layout: assuming that an NHWC -> NCHW conversion consumes 0.5, the total cost of the network is min(4+0, 3+0.5) = 3.5, and the optimal structure of the network is shown in Fig. 7; that is, Fig. 7 is a schematic diagram of a target topology network provided according to an embodiment of the present disclosure.
Furthermore, the kernel in Op1_1 can be optimized for different shapes, that is, single-operator optimization is realized. Optionally, for each shape of each Op, each kernel is run a fixed number of times by means of pre-operation to determine the time consumption and diff of each kernel, finally obtaining the optimal kernel of each Op under each shape and recording it in a dictionary, thereby ensuring that the optimal kernel implementation can be obtained quickly for different shapes when the model performs online dynamic-shape inference.
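The single-operator tuning step could be sketched as follows (hypothetical names; `time_kernel` stands for whatever pre-operation timing facility is available, and the best observed time over the fixed number of runs is used):

```python
def build_optimal_kernel_dict(ops, shapes, time_kernel, runs=10):
    """Run every kernel of every Op a fixed number of times for each input
    shape, keep the best observed time per kernel, and record the fastest
    kernel per (op, shape) pair in a dictionary so that online
    dynamic-shape inference can look up the optimal kernel directly."""
    optimal = {}
    for op, kernels in ops.items():
        for shape in shapes:
            timings = {
                kernel: min(time_kernel(op, kernel, shape) for _ in range(runs))
                for kernel in kernels
            }
            optimal[(op, shape)] = min(timings, key=timings.get)
    return optimal
```

At inference time, the model would consult this dictionary with the current Op and input shape instead of re-timing kernels online.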
According to the scheme, the network topology and the single operator of the dynamic shape model obtained through networking are optimized, so that the optimized dynamic shape model can be obtained, and the reasoning speed and the precision of the dynamic shape model are improved.
FIG. 8 is a schematic structural diagram of a scene adaptation device of a natural language processing model according to an embodiment of the disclosure, which may perform the scene adaptation method of the natural language processing model according to any of the embodiments of the disclosure; referring to fig. 8, a scene adaptation apparatus 800 of a natural language processing model includes: the target speech dataset acquisition module 810, the initial natural language processing model determination module 820, the target topology network determination module 830, and the target natural language processing model determination module 840.
A target voice data set acquisition module 810, configured to acquire a target voice data set in a target scene; the target voice data set comprises a plurality of voice data, and the size description parameter of each voice data meets the preset condition;
an initial natural language processing model determination module 820 for determining an initial natural language processing model from the target speech data set and determining a target speech data input vector and at least one reference speech data input vector from the initial natural language processing model;
the target topology network determining module 830 is configured to sequentially input the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model, so as to obtain a target topology network corresponding to the initial natural language processing model;
the target natural language processing model determining module 840 is configured to input each of the reference voice data input vectors into the target topology network, respectively, to obtain a target natural language processing model adapted to the target scene.
According to the scheme of the embodiment, a target voice data set under a target scene is acquired through a target voice data set acquisition module; determining an initial natural language processing model according to the target voice data set by an initial natural language processing model determining module, and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model; sequentially inputting the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model through a target topology network determining module to obtain a target topology network corresponding to the initial natural language processing model; the target natural language processing model determining module is used for respectively inputting each reference voice data input vector into the target topological network to obtain a target natural language processing model which is adapted to the target scene, so that the natural language processing model is closely adapted to the scene, and the precision of the natural language processing model is improved.
In an optional implementation manner of this embodiment, the target voice data set obtaining module 810 is specifically configured to obtain voice data in real time in the target scene, and determine a size description parameter of each voice data;
screening the voice data according to the size description parameters to obtain the target voice data set;
the target scene comprises: intelligent question-answering scenes or voice broadcasting scenes.
In an optional implementation manner of this embodiment, the initial natural language processing model determining module 820 is specifically configured to network to obtain the initial natural language processing model according to the attribute features of the target voice dataset;
the attribute features of the target speech dataset include: the type of the voice data or the size description parameter of each of the voice data.
In an optional implementation manner of this embodiment, the initial natural language processing model determining module 820 is further configured to operate the initial natural language processing model in a target operating environment, and obtain a size description parameter of the voice data with a dimension within a preset range during an operation process of the initial natural language processing model;
Writing each size description parameter into a target file to obtain an initialization file;
traversing each size description parameter in the initialization file, and determining the occurrence frequency of each size description parameter;
determining target voice data corresponding to the target size description parameters with the occurrence frequency larger than a first set threshold value as target voice data input vectors;
determining target voice data corresponding to the target size description parameters with the occurrence frequency larger than a second set threshold value as each reference voice data input vector;
wherein the first set threshold is greater than the second set threshold.
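The frequency-thresholding step above can be sketched as follows, under assumed names and sample data; the scheme only requires that the first set threshold be greater than the second, so the target input vectors are a subset of the reference input vectors.

```python
from collections import Counter

def select_input_vectors(size_params, first_threshold, second_threshold):
    """Pick target and reference input vectors from the occurrence
    frequency of each size description parameter in the initialization file."""
    assert first_threshold > second_threshold  # required by the scheme
    freq = Counter(size_params)
    target = sorted(p for p, n in freq.items() if n > first_threshold)
    reference = sorted(p for p, n in freq.items() if n > second_threshold)
    return target, reference

# Example: parameter 128 occurs 4 times, 64 twice, 32 once.
params = [128, 128, 128, 128, 64, 64, 32]
target, reference = select_input_vectors(params, first_threshold=3, second_threshold=1)
# target selects only the most frequent parameter; reference is a superset
```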
In an optional implementation manner of this embodiment, the target topology network determining module 830 is specifically configured to sequentially input the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model, to obtain at least one processing result corresponding to each network node;
and screening the processing results corresponding to the network nodes respectively to obtain target operators corresponding to the network nodes, and forming a target topological network corresponding to the initial natural language processing model.
In an optional implementation manner of this embodiment, the target topology network determining module 830 is further specifically configured to input the target voice data input vector into a first network node of an initial topology network corresponding to the initial natural language processing model, obtain at least one first output result, and filter each first output result to obtain at least one target first output result;
and continuing to input the target voice data input vector to the next network node of the initial topological network corresponding to the initial natural language processing model until the last network node of the initial topological network corresponding to the initial natural language processing model obtains at least one screened target output result.
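The node-by-node pass just described can be sketched as a greedy loop: the input vector is run through every candidate operator of a node, the outputs are screened, and only the surviving candidates are kept before moving to the next node. The data model (one list of candidate operators per node, a pluggable screening rule) is an illustrative assumption.

```python
def build_target_topology(initial_topology, input_vector, screen):
    """Greedy node-by-node pass: evaluate every candidate operator of each
    network node on the input vector, screen the outputs, and keep the
    surviving operators per node. `screen` stands in for the preset
    duration/data-form rule of the scheme."""
    target_topology = []
    for node_candidates in initial_topology:  # one candidate list per node
        results = [(op, op(input_vector)) for op in node_candidates]
        survivors = [op for op, out in results if screen(out)]
        target_topology.append(survivors)
    return target_topology

# Toy operators that just report a processing duration; keep durations < 3.0.
initial = [[lambda v: 1.0, lambda v: 5.0], [lambda v: 2.0]]
surviving = build_target_topology(initial, input_vector=None,
                                  screen=lambda out: out < 3.0)
```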
In an optional implementation manner of this embodiment, each of the first output results includes a data type, a data form, and a processing duration;
the target topology network determining module 830 is further specifically configured to, when the data types of the first output results are the same, screen the first output results that share the same data form and whose processing duration satisfies the preset duration rule;
or, when the data types of the first output results are different, screen the first output results that share the same data type and data form and whose processing duration satisfies the preset duration rule.
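Both branches of this screening rule amount to grouping the candidate outputs by their (data type, data form) signature and, inside each group, keeping only those whose processing duration satisfies the preset rule. A hypothetical sketch, with the duration rule supplied as a predicate:

```python
from collections import defaultdict

def screen_first_outputs(outputs, duration_rule):
    """Group candidate first output results by (data_type, data_form) and,
    inside each group, keep only those whose processing duration satisfies
    the preset duration rule. Covers both branches of the scheme
    (same or different data types)."""
    groups = defaultdict(list)
    for out in outputs:
        groups[(out["data_type"], out["data_form"])].append(out)
    kept = []
    for members in groups.values():
        kept.extend(m for m in members if duration_rule(m["duration"]))
    return kept

outputs = [
    {"data_type": "float32", "data_form": "NCHW", "duration": 1.2},
    {"data_type": "float32", "data_form": "NCHW", "duration": 9.0},
    {"data_type": "int8",    "data_form": "NHWC", "duration": 0.4},
]
kept = screen_first_outputs(outputs, duration_rule=lambda d: d < 2.0)
```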
In an optional implementation manner of this embodiment, the target topology network determining module 830 is further specifically configured to determine the processing duration of each target first output result, sort the target first output results by processing duration in a set order, obtain the operator corresponding to the first output result ranked first or last in the sequence, and determine that operator as the first target operator corresponding to the first network node;
or determining a data form of each target output result, converting the target data form of a first target output result into the data form of the target voice data input vector under the condition that the target data form of the first target output result is different from the data form of the target voice data input vector, acquiring a conversion duration, and adding the processing duration of the first target output result to the conversion duration to obtain the target duration corresponding to the first target output result;
Sequencing each target output result according to the target duration and a set sequence, obtaining operators corresponding to the target output result ranked at the first or last position of the sequence, and determining the operators as target operators corresponding to the last network node;
forming a target topology network corresponding to the initial natural language processing model according to each target operator;
wherein each of said target operators comprises at least one processing core.
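The operator-selection rule above (rank by total duration, charging a data-form conversion penalty when the operator's output form differs from the input vector's form) can be sketched as follows; the fixed conversion cost and operator names are illustrative assumptions.

```python
def pick_target_operator(candidates, input_form):
    """For each candidate (operator name, processing duration, output data
    form), add a conversion duration when the output form differs from the
    input vector's form, then pick the operator with the smallest target
    duration. The conversion-cost function is an illustrative stand-in."""
    def conversion_cost(form):
        return 0.0 if form == input_form else 0.5  # assumed fixed penalty

    scored = [(name, duration + conversion_cost(form))
              for name, duration, form in candidates]
    scored.sort(key=lambda t: t[1])  # set order: ascending target duration
    return scored[0][0]              # operator ranked first in the sequence

candidates = [("matmul_a", 1.0, "NCHW"), ("matmul_b", 0.8, "NHWC")]
best = pick_target_operator(candidates, input_form="NCHW")
# matmul_b's raw duration is smaller, but the conversion penalty flips the ranking
```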
In an optional implementation manner of this embodiment, the target natural language processing model determining module 840 is specifically configured to sequentially input each of the reference voice data input vectors to a first network node in the target topology network, sequentially process the reference voice data input vectors through each of processing kernels in a target operator of the first network node, and determine a first target processing kernel corresponding to the target operator of the first network node according to a processing result;
continuing to execute the operation of inputting the reference vector into the next network node in the target topology network until the last network node in the target topology network obtains a target processing core corresponding to a target operator of the last network node;
And obtaining a target natural language processing model adapted to the target scene according to the target processing cores corresponding to the network nodes and the target topology network.
In an optional implementation manner of this embodiment, the target natural language processing model determining module 840 is further specifically configured to determine a required duration for each processing core in the target operator to process the reference speech data input vector;
and sorting the required durations in a set order, and determining the processing core corresponding to the required duration that meets the set threshold as the first target processing core.
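The kernel-selection step can be sketched by sorting the per-kernel required durations in ascending order and returning the first kernel that meets the set threshold. Kernel names and the threshold rule are hypothetical.

```python
def pick_target_kernel(kernel_durations, threshold):
    """Sort the required duration of each processing core in a set
    (ascending) order and return the first kernel whose duration meets
    the set threshold; None if no kernel qualifies."""
    for name, duration in sorted(kernel_durations.items(), key=lambda kv: kv[1]):
        if duration <= threshold:
            return name
    return None

kernels = {"kernel_x86": 2.4, "kernel_avx2": 0.9, "kernel_generic": 3.1}
target_kernel = pick_target_kernel(kernels, threshold=1.0)
```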
In an optional implementation manner of this embodiment, the target natural language processing model determining module 840 is further specifically configured to filter out other reference processing kernels in the target topology network, except for each target processing kernel, to obtain a target natural language processing model adapted to the target scene.
The scene adaptation device of the natural language processing model can execute the scene adaptation method of the natural language processing model provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the executed method. Technical details not described in detail in this embodiment may be found in the scene adaptation method of a natural language processing model provided in any embodiment of the present disclosure.
In the technical scheme of the present disclosure, the acquisition, storage, application, and the like of the user personal information involved all conform to the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a scene adaptation method of a natural language processing model. For example, in some embodiments, the scene adaptation method of the natural language processing model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the scene adaptation method of the natural language processing model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the scene adaptation method of the natural language processing model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (25)
1. A scene adaptation method of a natural language processing model, comprising:
acquiring a target voice data set in a target scene; the target voice data set comprises a plurality of voice data, and the size description parameter of each voice data meets the preset condition;
determining an initial natural language processing model according to the target voice data set, and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model;
Sequentially inputting the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model to obtain a target topology network corresponding to the initial natural language processing model;
and respectively inputting each reference voice data input vector into the target topological network to obtain a target natural language processing model which is adapted to the target scene.
2. The method of claim 1, wherein the acquiring the target speech data set in the target scene comprises:
acquiring voice data in real time in the target scene, and determining size description parameters of each voice data;
screening the voice data according to the size description parameters to obtain the target voice data set;
the target scene comprises: an intelligent question-answering scene or a voice broadcasting scene.
3. The method of claim 1, wherein said determining an initial natural language processing model from said target speech dataset comprises:
networking to obtain the initial natural language processing model according to the attribute characteristics of the target voice data set;
The attribute features of the target speech dataset include: the type of the voice data or the size description parameter of each of the voice data.
4. The method of claim 3, wherein said determining a target speech data input vector and at least one reference speech data input vector from said initial natural language processing model comprises:
operating the initial natural language processing model in a target operating environment, and acquiring a size description parameter of voice data with a dimension within a preset range in the operation process of the initial natural language processing model;
writing each size description parameter into a target file to obtain an initialization file;
traversing each size description parameter in the initialization file, and determining the occurrence frequency of each size description parameter;
determining target voice data corresponding to the target size description parameters with the occurrence frequency larger than a first set threshold value as target voice data input vectors;
determining target voice data corresponding to the target size description parameters with the occurrence frequency larger than a second set threshold value as each reference voice data input vector;
wherein the first set threshold is greater than the second set threshold.
5. The method of claim 1, wherein the sequentially inputting the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model, to obtain a target topology network corresponding to the initial natural language processing model, comprises:
sequentially inputting the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model to respectively obtain at least one processing result corresponding to each network node;
and screening the processing results corresponding to the network nodes respectively to obtain target operators corresponding to the network nodes, and forming a target topological network corresponding to the initial natural language processing model.
6. The method of claim 5, wherein the sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model, respectively, obtains at least one processing result corresponding to each network node, respectively, includes:
inputting the target voice data input vector into a first network node of an initial topological network corresponding to the initial natural language processing model to obtain at least one first output result, and screening each first output result to obtain at least one target first output result;
And continuing to input the target voice data input vector to the next network node of the initial topological network corresponding to the initial natural language processing model until the last network node of the initial topological network corresponding to the initial natural language processing model obtains at least one screened target output result.
7. The method of claim 6, wherein each of the first output results includes a data type, a data form, and a processing duration;
the step of screening each first output result to obtain at least one target first output result includes:
under the condition that the data types of the first output results are the same, filtering the first output results which are the same in data form and have processing time length meeting the preset time length rule;
or filtering the first output results with the same data type and data form and the processing duration meeting the preset duration rule under the condition that the data types of the first output results are different.
8. The method of claim 5, wherein the screening the processing results corresponding to the network nodes to obtain target operators corresponding to the network nodes, respectively, forms a target topology network corresponding to the initial natural language processing model, and includes:
Determining the processing time length of each target first output result, sorting each target first output result according to the processing time length and a set sequence, obtaining an operator corresponding to the first output result arranged at the first position or the last position of the sequence, and determining the operator as a first target operator corresponding to a first network node;
or determining a data form of each target output result, converting the target data form of a first target output result into the data form of the target voice data input vector under the condition that the target data form of the first target output result is different from the data form of the target voice data input vector, acquiring a conversion duration, and adding the processing duration of the first target output result to the conversion duration to obtain the target duration corresponding to the first target output result;
sequencing each target output result according to the target duration and a set sequence, obtaining operators corresponding to the target output result ranked at the first or last position of the sequence, and determining the operators as target operators corresponding to the last network node;
forming a target topology network corresponding to the initial natural language processing model according to each target operator;
Wherein each of said target operators comprises at least one processing core.
9. The method of claim 1, wherein said respectively inputting each of the reference speech data input vectors into the target topology network results in a target natural language processing model adapted to the target scene, comprising:
sequentially inputting each reference voice data input vector into a first network node in the target topology network, sequentially processing the reference voice data input vector through each processing core in a target operator of the first network node, and determining a first target processing core corresponding to the target operator of the first network node according to a processing result;
continuing to execute the operation of inputting the reference vector into the next network node in the target topology network until the last network node in the target topology network obtains a target processing core corresponding to a target operator of the last network node;
and obtaining a target natural language processing model adapted to the target scene according to the target processing cores corresponding to the network nodes and the target topology network.
10. The method of claim 9, wherein the determining, according to the processing result, a first target processing core corresponding to a target operator of the first network node comprises:
determining the required time length for each processing core in the target operator to process the reference voice data input vector;
and sorting the required durations in a set order, and determining the processing core corresponding to the required duration that meets the set threshold as the first target processing core.
11. The method of claim 9, wherein the obtaining a target natural language processing model adapted to the target scene according to the target processing core corresponding to each network node and a target topology network includes:
and filtering out other reference processing cores except for each target processing core in the target topology network to obtain a target natural language processing model which is adapted to the target scene.
12. An apparatus for scene adaptation of a natural language processing model, comprising:
the target voice data set acquisition module is used for acquiring a target voice data set in a target scene; the target voice data set comprises a plurality of voice data, and the size description parameter of each voice data meets the preset condition;
The initial natural language processing model determining module is used for determining an initial natural language processing model according to the target voice data set and determining a target voice data input vector and at least one reference voice data input vector according to the initial natural language processing model;
the target topology network determining module is used for sequentially inputting the target voice data input vector into each network node of the initial topology network corresponding to the initial natural language processing model to obtain a target topology network corresponding to the initial natural language processing model;
and the target natural language processing model determining module is used for respectively inputting each reference voice data input vector into the target topological network to obtain a target natural language processing model which is adapted to the target scene.
13. The apparatus according to claim 12, wherein the target speech data set acquisition module is specifically configured to
Acquiring voice data in real time in the target scene, and determining size description parameters of each voice data;
screening the voice data according to the size description parameters to obtain the target voice data set;
The target scene comprises: intelligent question-answering scenes or voice broadcasting scenes.
14. The apparatus of claim 12, wherein the initial natural language processing model determination module is configured to
Networking to obtain the initial natural language processing model according to the attribute characteristics of the target voice data set;
the attribute features of the target speech dataset include: the type of the voice data or the size description parameter of each of the voice data.
15. The apparatus of claim 14, wherein the initial natural language processing model determination module is further configured to
Operating the initial natural language processing model in a target operating environment, and acquiring a size description parameter of voice data with a dimension within a preset range in the operation process of the initial natural language processing model;
writing each size description parameter into a target file to obtain an initialization file;
traversing each size description parameter in the initialization file, and determining the occurrence frequency of each size description parameter;
determining target voice data corresponding to the target size description parameters with the occurrence frequency larger than a first set threshold value as target voice data input vectors;
Determining target voice data corresponding to the target size description parameters with the occurrence frequency larger than a second set threshold value as each reference voice data input vector;
wherein the first set threshold is greater than the second set threshold.
16. The apparatus of claim 12, wherein the target topology network determination module is configured to
Sequentially inputting the target voice data input vector into each network node of an initial topology network corresponding to the initial natural language processing model to respectively obtain at least one processing result corresponding to each network node;
and screening the processing results corresponding to the network nodes respectively to obtain target operators corresponding to the network nodes, and forming a target topological network corresponding to the initial natural language processing model.
17. The apparatus of claim 16, wherein the target topology network determination module is further specifically configured to
Inputting the target voice data input vector into a first network node of an initial topological network corresponding to the initial natural language processing model to obtain at least one first output result, and screening each first output result to obtain at least one target first output result;
And continuing to input the target voice data input vector to the next network node of the initial topological network corresponding to the initial natural language processing model until the last network node of the initial topological network corresponding to the initial natural language processing model obtains at least one screened target output result.
18. The apparatus of claim 17, wherein each of the first output results comprises a data type, a data form, and a processing duration;
the target topology network determining module is further specifically configured to filter out the first output results with the same data form and a processing duration meeting a preset duration rule when the data types of the first output results are the same;
or filtering the first output results with the same data type and data form and the processing duration meeting the preset duration rule under the condition that the data types of the first output results are different.
19. The apparatus of claim 16, wherein the target topology network determination module is further specifically configured to
Determining the processing time length of each target first output result, sorting each target first output result according to the processing time length and a set sequence, obtaining an operator corresponding to the first output result arranged at the first position or the last position of the sequence, and determining the operator as a first target operator corresponding to a first network node;
Or determining a data form of each target output result, converting the target data form of a first target output result into the data form of the target voice data input vector under the condition that the target data form of the first target output result is different from the data form of the target voice data input vector, acquiring a conversion duration, and adding the processing duration of the first target output result to the conversion duration to obtain the target duration corresponding to the first target output result;
sequencing each target output result according to the target duration and a set sequence, obtaining operators corresponding to the target output result ranked at the first or last position of the sequence, and determining the operators as target operators corresponding to the last network node;
forming a target topology network corresponding to the initial natural language processing model according to each target operator;
wherein each of said target operators comprises at least one processing core.
20. The apparatus of claim 12, wherein the target natural language processing model determination module is specifically configured to
Sequentially inputting each reference voice data input vector into a first network node in the target topology network, sequentially processing the reference voice data input vector through each processing core in a target operator of the first network node, and determining a first target processing core corresponding to the target operator of the first network node according to a processing result;
Continuing to execute the operation of inputting the reference voice data input vector into the next network node in the target topology network until the last network node in the target topology network obtains a target processing core corresponding to a target operator of the last network node;
and obtaining a target natural language processing model adapted to the target scene according to the target processing cores corresponding to the network nodes and the target topology network.
21. The apparatus of claim 20, wherein the target natural language processing model determination module is further specifically configured to
Determining the required time length for each processing core in the target operator to process the reference voice data input vector;
and sorting the required time lengths according to a set order, and determining the processing core corresponding to the required time length meeting a set threshold as the first target processing core.
22. The apparatus of claim 20, wherein the target natural language processing model determination module is further specifically configured to
And filtering out other reference processing cores except for each target processing core in the target topology network to obtain a target natural language processing model which is adapted to the target scene.
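The filtering step of claim 22 amounts to dropping every non-target processing core from each network node; a minimal sketch, with the node/core data structure being an assumption:

```python
def prune_to_target_model(topology, target_cores):
    """Filter out the reference processing cores other than the target
    processing core of each network node.

    `topology` is a list of network nodes, each a dict mapping core names
    to core objects; `target_cores` lists the selected core name per node.
    Returns the pruned topology, i.e. the scene-adapted model structure.
    """
    return [{name: node[name]} for node, name in zip(topology, target_cores)]
```

For instance, pruning `[{"a": 1, "b": 2}, {"c": 3, "d": 4}]` with targets `["b", "c"]` leaves `[{"b": 2}, {"c": 3}]`.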
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310136543.5A CN116050433B (en) | 2023-02-13 | 2023-02-13 | Scene adaptation method, device, equipment and medium of natural language processing model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116050433A true CN116050433A (en) | 2023-05-02 |
CN116050433B CN116050433B (en) | 2024-03-26 |
Family
ID=86133467
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310136543.5A Active CN116050433B (en) | 2023-02-13 | 2023-02-13 | Scene adaptation method, device, equipment and medium of natural language processing model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116050433B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458598A (en) * | 2019-07-04 | 2019-11-15 | 阿里巴巴集团控股有限公司 | Scene adaptation method, device and electronic equipment |
US20200311476A1 (en) * | 2019-01-31 | 2020-10-01 | Beijing Sensetime Technology Development Co., Ltd. | Target object processing method and apparatus, electronic device, and storage medium |
CN112233653A (en) * | 2020-12-10 | 2021-01-15 | 北京远鉴信息技术有限公司 | Method, device and equipment for training multi-dialect accent mandarin speech recognition model |
US20220019736A1 (en) * | 2020-07-20 | 2022-01-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for training natural language processing model, device and storage medium |
CN114399998A (en) * | 2021-12-03 | 2022-04-26 | 北京百度网讯科技有限公司 | Voice processing method, device, equipment, storage medium and program product |
JP2022089798A (en) * | 2020-12-04 | 2022-06-16 | 株式会社Nttドコモ | Training device, method, apparatus, and computer readable medium |
CN115101061A (en) * | 2022-07-14 | 2022-09-23 | 京东科技信息技术有限公司 | Training method and device of voice recognition model, storage medium and electronic equipment |
US20220383876A1 (en) * | 2021-08-09 | 2022-12-01 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method of converting speech, electronic device, and readable storage medium |
WO2023279921A1 (en) * | 2021-07-08 | 2023-01-12 | 华为技术有限公司 | Neural network model training method, data processing method, and apparatuses |
Non-Patent Citations (2)
Title |
---|
NING LU ET AL.: "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition", arXiv:1910.02562v3, pages 1-34 *
XIONG Hongkai et al.: "Interpretable, Structured and Multimodal Deep Neural Networks", Pattern Recognition and Artificial Intelligence, no. 01, pages 1-11 *
Also Published As
Publication number | Publication date |
---|---|
CN116050433B (en) | 2024-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2022018095A (en) | Multi-modal pre-training model acquisition method, apparatus, electronic device and storage medium | |
US20220147877A1 (en) | System and method for automatic building of learning machines using learning machines | |
CN113360711B (en) | Model training and executing method, device, equipment and medium for video understanding task | |
CN112509566B (en) | Speech recognition method, device, equipment, storage medium and program product | |
CN114091128B (en) | Method and device for determining layout scheme and electronic equipment | |
CN112749300B (en) | Method, apparatus, device, storage medium and program product for video classification | |
CN113657483A (en) | Model training method, target detection method, device, equipment and storage medium | |
CN116580223A (en) | Data processing and model fine tuning method and device, electronic equipment and storage medium | |
CN112560461A (en) | News clue generation method and device, electronic equipment and storage medium | |
CN117032938A (en) | Operator parallel scheduling method and device, electronic equipment and storage medium | |
CN114756680A (en) | Text classification method, system, electronic equipment and storage medium | |
CN114581261A (en) | Fault diagnosis method, system, equipment and storage medium based on quick graph calculation | |
CN116050433B (en) | Scene adaptation method, device, equipment and medium of natural language processing model | |
CN116257611B (en) | Question-answering model training method, question-answering processing device and storage medium | |
CN115410048B (en) | Training of image classification model, image classification method, device, equipment and medium | |
CN116992000A (en) | Interactive information processing method, device, electronic equipment and computer readable medium | |
CN113408702B (en) | Music neural network model pre-training method, electronic device and storage medium | |
CN113012682B (en) | False wake-up rate determination method, device, apparatus, storage medium, and program product | |
CN113570067B (en) | Synchronization method and device of distributed system | |
CN114970666A (en) | Spoken language processing method and device, electronic equipment and storage medium | |
CN115730681B (en) | Model training method, device, equipment and storage medium | |
CN117609870B (en) | Structure recognition model training, model structure recognition method, device and medium | |
CN116151215B (en) | Text processing method, deep learning model training method, device and equipment | |
CN116244413B (en) | New intention determining method, apparatus and storage medium | |
CN116827411B (en) | Load data analysis method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||