CN115408498A - Data dynamic identification method based on natural language - Google Patents

Publication number
CN115408498A
Authority
CN
China
Prior art keywords
preset
data
sample
splicing
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211359030.2A
Other languages
Chinese (zh)
Other versions
CN115408498B (en)
Inventor
杨介
崔昆俞
赵鸿
伍之洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongfu Safety Technology Co Ltd
Original Assignee
Zhongfu Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongfu Safety Technology Co Ltd
Priority to CN202211359030.2A
Publication of CN115408498A
Application granted
Publication of CN115408498B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a dynamic data identification method based on natural language, relates mainly to the technical field of dynamic data identification, and addresses the problems of generality and high uncertainty in the performance of existing models. The method comprises the following steps: determining semantic tag data corresponding to the sample data; generating an experiment set; splitting the experiment set into a training data set and a verification data set; importing sample data in the training data set into a preset encoder and splicing the results into sample splicing data; importing semantic label data in the training data set into a preset generator and splicing the results into label splicing data; determining a distance cost value between the sample splicing data and the label splicing data to obtain a trained preset discriminator; obtaining a trained preset encoder and a trained preset generator; obtaining verification sample splicing data; obtaining verification label splicing data; and completing the matching of the data. By this method, the fit between the model and the data is improved, and accuracy is increased.

Description

Data dynamic identification method based on natural language
Technical Field
The application relates to the technical field of dynamic data identification, in particular to a dynamic data identification method based on natural language.
Background
Text classification, a task in the field of natural language processing, is widely applied across fields and industries. Its application scenarios are broad, and the resources available during development and implementation (service platforms, hardware, technical frameworks, data, and the like) are varied.
Existing methods for dynamically identifying data adopt a general structure: the [CLS] layer of BERT or a similar model is used as the input of a classifier, and the classifier is trained on the classification task. Under normal conditions this meets certain industrial requirements, is simple to implement, and has a short development period.
However, as data asset security management becomes increasingly standardized, the general architecture hits a performance bottleneck in real service scenarios, and the generality and uncertainty of the corresponding model performance become significant problems.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, the present invention provides a dynamic data identification method based on natural language, so as to solve the above-mentioned technical problems.
The application provides a dynamic data identification method based on natural language, which comprises the following steps: acquiring a sample set, and determining semantic tag data corresponding to each sample data in the sample set; generating an experiment set based on the sample data, the semantic label data and the mapping relation between the sample data and the semantic label data; splitting the experiment set into a training data set and a verification data set; importing sample data in the training data set into a preset encoder; acquiring a plurality of sample sub-splicing data from a hidden layer of the preset encoder based on a preset sample splicing data acquisition position, and splicing the sample sub-splicing data into sample splicing data; importing semantic label data and preset reference dimension data in the training data set into a preset generator; acquiring a plurality of label sub-splicing data from a hidden layer of the preset generator based on a preset label splicing data acquisition position, and splicing the label sub-splicing data into label splicing data; determining a distance cost value between the sample splicing data and the label splicing data based on a preset inter-distribution distance equation; importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into a preset optimizer to complete weight updating of the preset discriminator so as to obtain a trained preset discriminator; importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to finish updating of the preset encoder weight value; importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to finish updating of the preset generator weight value; so as to obtain a trained preset encoder and a trained preset generator;
obtaining verification sample splicing data corresponding to sample data in a verification data set based on a trained preset encoder; obtaining verification label splicing data based on the trained preset generator or label splicing data; and matching the verification sample splicing data with the verification label splicing data based on a trained preset discriminator or a preset matching degree calculation formula.
Further, the hidden layer of the preset encoder and the hidden layer of the preset generator both comprise a text semantic coding network layer and a label semantic coding network layer.
Further, determining semantic tag data corresponding to each sample data in the sample set specifically includes: acquiring a semantic tag set through a preset semantic tag interface; wherein the semantic tag set comprises semantic tag data; or, semantic tag data corresponding to each sample data is obtained through a preset keyword/subject term extraction algorithm; or analyzing the part of speech of the sample data through a preset sample part of speech analysis algorithm to obtain preset attribute words corresponding to the sample data, and splicing the preset attribute words into semantic tag data; or when the preset associated data set corresponding to the sample set is obtained, extracting the keywords/subject terms corresponding to the preset associated data set through a keyword/subject term extraction algorithm and a preset sample part-of-speech analysis algorithm to obtain semantic tag data.
Further, acquiring a sample set specifically includes: acquiring real service data, substitute open-source service data, or artificial sample data as the sample set through a preset sample uploading process.
Further, before determining the distance cost value between the sample splicing data and the label splicing data based on the preset inter-distribution distance equation, the method further includes: in the Wasserstein distance method, replacing the joint distribution with the encoder, the marginal distribution with the generator, and the sampling with the sample splicing data and the label splicing data, so as to obtain a preset distance cost value calculation formula: [formula image 001, not reproduced in this text]; wherein D() is the output of the preset discriminator, [symbol image 002] is the sample splicing data, and [symbol image 003] is the label splicing data.
Further, importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into the preset optimizer to complete weight updating of the preset discriminator specifically includes: updating the weight value of the preset discriminator through a preset discriminator weight update formula: [formula image 004, not reproduced in this text]; wherein [symbol image 005] is the weight value of the preset encoder generated in the updating process, [symbol image 006] is the distance cost value, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are preset smoothing constants; and when the weight value is larger than c or smaller than -c, performing gradient clipping on the preset discriminator weight value through a preset clipping formula: [formula image 010, not reproduced in this text]; wherein c is the clipping threshold.
Further, importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete updating of the preset encoder weight value specifically includes: updating the weight value of the preset encoder through a preset encoder weight update formula: [formula image 011, not reproduced in this text]; wherein [symbol image 012] is the weight value of the preset encoder generated in the updating process, [symbol image 013] is the sample splicing data, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are preset smoothing constants.
Further, importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete updating of the preset generator weight value specifically includes: updating the weight value of the preset generator through a preset generator weight update formula: [formula image 014, not reproduced in this text]; wherein [symbol image 015] is the weight value of the preset generator generated in the updating process, [symbol image 003] is the label splicing data, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are preset smoothing constants.
Further, after obtaining the trained preset encoder and preset generator, the method further comprises: modifying the semantic label data through a preset semantic label modification interface.
As can be appreciated by those skilled in the art, the present invention has at least the following beneficial effects:
(1) By presetting the sample splicing data acquisition position and the label splicing data acquisition position, sample splicing data and label splicing data spliced from hidden layers at different depths of the relevant structures are obtained. Because the hidden layers corresponding to the acquisition positions are preset, a person skilled in the art can select suitable hidden-layer input data or output data for flexible splicing according to the characteristics of the network structures used (the preset encoder, the preset generator, and the like).
(2) When effective real service scene data cannot be obtained for model training, for reasons such as data asset privacy and security, relevant open-source data can be obtained through a preset interface, together with semantic tag data relevant to the requirements. This achieves simulated training that is decoupled from real service scene data, avoids risks such as data privacy disclosure, and improves the effect to a certain extent compared with the traditional scheme.
(3) The range and definition of the semantic tag data can be modified through the preset semantic tag modification interface, so that the classification function can be adjusted dynamically to a certain degree, more diverse requirements are met, and user customization is better supported.
Drawings
Some embodiments of the disclosure are described below with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of a dynamic data identification method based on natural language according to an embodiment of the present application.
Detailed Description
It should be understood by those skilled in the art that the embodiments described below are only preferred embodiments of the present disclosure, and do not mean that the present disclosure can be implemented only by the preferred embodiments, which are merely for explaining the technical principles of the present disclosure and are not intended to limit the scope of the present disclosure. All other embodiments that can be derived by one of ordinary skill in the art from the preferred embodiments provided by the disclosure and that fall within the scope of the disclosure are intended to be encompassed by the present disclosure without any inventive step.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the dynamic data identification method based on natural language provided in the embodiment of the present application. As shown in fig. 1, the method mainly includes the following steps:
Step 110, acquiring a sample set, and determining semantic tag data corresponding to each sample data in the sample set.
It should be noted that the sample set may be real service data. When real service data is unavailable, or real service scene data cannot directly participate in training for privacy and security reasons, the sample set may instead be substitute open-source service data with similar semantics. The substitute open-source service data may be supplemented by near-synonym replacement or by other open-source data sets. If no suitable substitute open-source service data can be found, approximately 50 samples (artificial sample data) can be created manually for each tag as the sample set. The specific contents of the real service data, the substitute open-source service data and the artificial sample data can be determined by those skilled in the art according to actual conditions.
The sample set may be obtained through a preset sample uploading process.
The semantic tag data corresponding to each sample data in the sample set may be determined in any of the following ways: (1) acquiring a semantic tag set uploaded by an operator through a preset semantic tag interface, wherein the semantic tag set comprises the semantic tag data; (2) importing the sample set into a preset keyword/subject term extraction algorithm to obtain the semantic tag data corresponding to each sample data, where the preset keyword/subject term extraction algorithm can be any algorithm with a keyword/subject term extraction function, such as the TF-IDF algorithm; (3) analyzing the sentence backbone through syntactic analysis and building the semantic tag data directly from the sentence backbone, which yields more general labels and suits individualized requirements; or, when the original data is unlabeled, searching for data related to the samples by combining the company's other business data, and obtaining the semantic tag data by associating them through business logic and algorithms.
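As an illustration of option (2), the following is a minimal sketch of TF-IDF keyword extraction using only the Python standard library. The function name `tfidf_keywords`, the whitespace tokenization, the `top_k` parameter and the toy samples are all assumptions made for this example; the patent only names TF-IDF as one possible extraction algorithm and does not specify an implementation.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results

# Hypothetical sample data, tokenized on whitespace.
samples = [
    "invoice payment bank account invoice".split(),
    "patient diagnosis hospital patient record".split(),
    "bank account balance record".split(),
]
labels = tfidf_keywords(samples)
```

Each entry of `labels` could then serve as candidate semantic tag data for the corresponding sample, subject to review by an operator.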
Thus, the acquisition of the sample set, the acquisition of the semantic tag data and the mapping of the sample data and the semantic tag data are completed.
Step 120, generating an experiment set based on the sample data, the semantic tag data and the mapping relation between the sample data and the semantic tag data; and splitting the experiment set into a training data set and a verification data set.
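Step 120 can be sketched as pairing each sample with its semantic tag data and then splitting the pairs. This is a hedged illustration: the function names, the 20% verification ratio and the fixed seed are assumptions, since the patent does not state how the split is performed.

```python
import random

def build_experiment_set(samples, labels):
    """Pair each sample with its semantic label data (the mapping relation)."""
    if len(samples) != len(labels):
        raise ValueError("each sample needs corresponding semantic label data")
    return list(zip(samples, labels))

def split_experiment_set(experiment_set, validation_ratio=0.2, seed=0):
    """Shuffle a copy of the experiment set and split it into training and verification parts."""
    shuffled = list(experiment_set)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one item for verification (assumed policy).
    n_val = max(1, int(len(shuffled) * validation_ratio))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical data: ten samples mapped onto three labels.
experiment_set = build_experiment_set(
    [f"sample text {i}" for i in range(10)],
    [f"label_{i % 3}" for i in range(10)],
)
train_set, val_set = split_experiment_set(experiment_set)
```

Seeding the shuffle keeps the split reproducible between runs, which makes later comparison of verification results easier.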
Step 130, importing sample data in the training data set into a preset encoder; and acquiring a plurality of sample sub-splicing data from a hidden layer of a preset encoder based on a preset sample splicing data acquisition position, and splicing the sample sub-splicing data into sample splicing data.
It should be noted that the hidden layer of the preset encoder at least includes a text semantic coding network and a label semantic coding network. The preset sample splicing data acquisition position defines from which hidden layers of the text semantic coding network and the label semantic coding network the hidden-layer input data or output data are extracted as sample sub-splicing data. All the obtained sample sub-splicing data are then spliced into the sample splicing data. For example, [symbol image 016, not reproduced in this text] is the output of the text semantic coding network in step 1, [symbol image 017] is the output of the label semantic coding network in step 1, and [symbol image 018] is the sample splicing data.
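The splicing described above can be pictured as concatenating the vectors read from the selected hidden layers. The sketch below uses plain Python lists; the layer names, the vector values and the helper `splice_hidden_outputs` are hypothetical, and splicing by simple concatenation is an assumption, since the patent's own splicing expression is an image that is not reproduced here.

```python
def splice_hidden_outputs(hidden_outputs, positions):
    """Concatenate the hidden-layer vectors selected by `positions` into one vector."""
    spliced = []
    for name in positions:
        spliced.extend(hidden_outputs[name])
    return spliced

# Hypothetical hidden-layer outputs of the preset encoder for one sample.
encoder_hidden = {
    "text_semantic_layer_1": [0.1, 0.2, 0.3],
    "label_semantic_layer_1": [0.7, 0.8],
    "text_semantic_layer_2": [0.4, 0.5, 0.6],
}
# Preset sample splicing data acquisition position: which layers to read, in order.
sample_positions = ["text_semantic_layer_1", "label_semantic_layer_1"]
sample_spliced = splice_hidden_outputs(encoder_hidden, sample_positions)
```

Changing `sample_positions` changes which depths contribute to the spliced vector without touching the encoder itself, which is the flexibility the description attributes to preset acquisition positions.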
Step 140, importing semantic label data and preset reference dimension data in the training data set into a preset generator; and acquiring a plurality of label sub-splicing data from the hidden layer of the preset generator based on the preset label splicing data acquisition position, and splicing the label sub-splicing data into label splicing data.
It should be noted that the hidden layer of the preset generator at least includes a text semantic coding network and a label semantic coding network. The preset label splicing data acquisition position defines from which hidden layers of the text semantic coding network and the label semantic coding network the hidden-layer input data or output data are extracted as label sub-splicing data. All the obtained label sub-splicing data are then spliced into the label splicing data. For example, [symbol image 019, not reproduced in this text] is the output of the label semantic generation network in step 1, [symbol image 020] is the output of the text semantic generation network in step 1, and [symbol image 021] is the label splicing data. In addition, the preset reference dimension data may be a null value or supervisory information related to the tag, and its specific content may be determined by those skilled in the art.
In addition, since one sample data may correspond to a plurality of semantic tags, a piece of semantic tag data may contain one or more semantic tags. The tags may be separated in any feasible way, for example with the separator "|".
Based on step 130 and step 140, those skilled in the art can understand that, by presetting the sample splicing data acquisition position and the label splicing data acquisition position, sample splicing data and label splicing data spliced from hidden layers at different depths of the relevant structures are obtained. Because the hidden layers corresponding to the acquisition positions are preset, a person skilled in the art can select suitable hidden-layer input data or output data for flexible splicing according to the characteristics of the network structures used (the preset encoder, the preset generator, and the like).
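A small sketch of the "|" separator convention mentioned above. The helper names and the choice to strip whitespace and drop empty parts are assumptions made for illustration.

```python
SEPARATOR = "|"

def split_labels(semantic_label_data):
    """Split one semantic label string into its individual labels, dropping blanks."""
    return [part.strip() for part in semantic_label_data.split(SEPARATOR) if part.strip()]

def join_labels(labels):
    """Join individual labels back into one semantic label string."""
    return SEPARATOR.join(labels)
```

For instance, `split_labels("finance| invoice |")` yields the two labels `finance` and `invoice`, and `join_labels` reverses the operation for clean input.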
Thus, the acquisition of the sample splicing data and the label splicing data is completed.
Step 150, determining a distance cost value between the sample splicing data and the label splicing data based on a preset inter-distribution distance equation; and importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into a preset optimizer, and finishing weight updating of the preset discriminator to obtain the trained preset discriminator.
It should be noted that the preset inter-distribution distance equation is any feasible measurement formula capable of calculating the difference between the two output distributions.
The preset inter-distribution distance equation may be obtained as follows: the Wasserstein distance formula is adopted as the target equation, the joint distribution and the marginal distribution correspond to the encoder and the generator, and the sampling (X, Y) corresponds to the sample splicing data and the label splicing data.
The preset distance cost value is obtained according to the formula: [formula image 001, not reproduced in this text]; wherein D() is the output of the preset discriminator, [symbol image 002] is the sample splicing data, and [symbol image 003] is the label splicing data.
In addition, the Wasserstein distance method above may be replaced by any theory capable of calculating the difference between two output distributions, such as Kullback-Leibler divergence or Jensen-Shannon divergence.
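Because the cost formula itself is an image that did not survive in this text, the following is only a sketch of one common Wasserstein-style estimate: the mean discriminator output over the sample splicing data minus the mean over the label splicing data. That exact form, the linear stand-in discriminator and all names are assumptions, not the patent's formula.

```python
def linear_discriminator(weights):
    """Return D(v) = sum(w_i * v_i), a stand-in for the preset discriminator."""
    def D(vector):
        return sum(w * v for w, v in zip(weights, vector))
    return D

def distance_cost(D, sample_spliced_batch, label_spliced_batch):
    """Assumed estimate: mean of D over sample data minus mean of D over label data."""
    mean_sample = sum(D(x) for x in sample_spliced_batch) / len(sample_spliced_batch)
    mean_label = sum(D(y) for y in label_spliced_batch) / len(label_spliced_batch)
    return mean_sample - mean_label

D = linear_discriminator([1.0, -1.0])
X = [[2.0, 0.0], [4.0, 0.0]]   # hypothetical sample splicing data
Y = [[0.0, 1.0], [0.0, 3.0]]   # hypothetical label splicing data
cost = distance_cost(D, X, Y)
```

A larger absolute cost means the discriminator separates the two sets of spliced vectors more strongly; a cost near zero means it cannot tell them apart under this estimate.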
Importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into the preset optimizer to complete weight updating of the preset discriminator may specifically be: updating the weight value of the preset discriminator through a preset discriminator weight update formula: [formula image 004, not reproduced in this text]; wherein [symbol image 005] is the weight value of the preset encoder generated in the updating process, [symbol image 006] is the distance cost value, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are preset smoothing constants; and when the weight value is larger than c or smaller than -c, performing gradient clipping on the preset discriminator weight value through a preset clipping formula: [formula image 010, not reproduced in this text]; where c is the clipping threshold.
Thus, the training of the preset discriminator is completed.
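The clipping condition stated above (a weight value larger than c or smaller than -c) can be sketched as clamping each weight into the interval [-c, c]. The clipping formula itself is not reproduced in this text, so this clamp is an assumption consistent with the stated condition; the optimizer update that precedes it is omitted because its formula is also unavailable.

```python
def clip_weights(weights, c):
    """Clamp each weight into [-c, c]; values already inside the range are unchanged."""
    if c <= 0:
        raise ValueError("clipping threshold c must be positive")
    return [max(-c, min(c, w)) for w in weights]

# Hypothetical discriminator weights after an update step, clipped with c = 0.01.
clipped = clip_weights([0.5, -0.02, 0.003, -1.2], c=0.01)
```

Only weights outside the threshold are changed, so repeated clipping of already-clipped weights is a no-op.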
Step 160, importing a preset learning rate, a preset smoothing constant, an initial encoder weight value and a trained preset discriminator into a preset optimizer to complete updating of the preset encoder weight value; importing a preset learning rate, a preset smoothing constant, an initial encoder weight value and a trained preset discriminator into a preset optimizer to finish updating of a preset generator weight value; to obtain a trained preset encoder and preset generator.
Importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete updating of the preset encoder weight value may specifically be: updating the weight value of the preset encoder through a preset encoder weight update formula: [formula image 011, not reproduced in this text]; wherein [symbol image 012] is the weight value of the preset encoder generated in the updating process, [symbol image 013] is the sample splicing data, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are preset smoothing constants.
Importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete updating of the preset generator weight value may specifically be: updating the weight value of the preset generator through a preset generator weight update formula: [formula image 014, not reproduced in this text]; wherein [symbol image 015] is the weight value of the preset generator generated in the updating process, [symbol image 003] is the label splicing data, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are preset smoothing constants.
Thus, the training of the preset encoder and the preset generator is completed.
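The order of updates in steps 150 and 160 (discriminator first, then encoder and generator against the updated discriminator) can be illustrated with a deliberately tiny toy. Everything here is an assumption made for illustration: the encoder, generator and discriminator are single scalar weights `a`, `b` and `w`, the cost is `w * (mean(X) - mean(Y))`, and plain gradient steps stand in for the patent's preset optimizer, whose update formulas with smoothing constants are not reproduced in this text.

```python
def train_alternating(samples, labels, steps=200, lr=0.05, c=0.1):
    """Toy alternating loop: update the discriminator weight w, then encoder a and generator b."""
    a, b, w = 2.0, 0.5, 0.0          # initial encoder, generator, discriminator weights
    mean_s = sum(samples) / len(samples)
    mean_t = sum(labels) / len(labels)
    for _ in range(steps):
        gap = a * mean_s - b * mean_t          # mean(X) - mean(Y) for X = a*s, Y = b*t
        # Discriminator step: ascend cost = w * gap, then clip w into [-c, c].
        w = max(-c, min(c, w + lr * gap))
        # Encoder and generator steps: descend the same cost with w held fixed.
        a -= lr * w * mean_s
        b += lr * w * mean_t
    return a, b, w

samples = [1.0, 2.0, 3.0]   # hypothetical scalar sample features
labels = [2.0, 2.0, 2.0]    # hypothetical scalar label features
a, b, w = train_alternating(samples, labels)
initial_gap = abs(2.0 * 2.0 - 0.5 * 2.0)
final_gap = abs(a * 2.0 - b * 2.0)
```

In this toy the gap between the two spliced means shrinks from its initial value and then oscillates around zero rather than settling exactly, which is typical of adversarial updates with plain gradient steps; a real implementation would use the preset optimizer and full network weights.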
In addition, after the trained preset encoder and preset generator are obtained, the semantic tag data can further be modified through a preset semantic tag modification interface. The classification function can thereby be adjusted dynamically to a certain extent, meeting more diverse requirements and better supporting user customization.
Step 170, obtaining verification sample splicing data corresponding to sample data in the verification data set based on the trained preset encoder; obtaining verification label splicing data based on the trained preset generator or label splicing data; and matching the verification sample splicing data with the verification label splicing data based on a trained preset discriminator or a preset matching degree calculation formula.
It should be noted that obtaining the verification label splicing data based on the label splicing data means directly using the label splicing data as the verification label splicing data. The preset matching degree calculation formula is any existing formula capable of calculating the matching degree between the verification sample splicing data and the verification label splicing data.
The verification label splicing data may also be generated online by the generator. When a new label appears and the output data needs to be modified, the model does not need to be retrained, and semantic tag data with a higher degree of customization can be provided online.
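The patent leaves the matching degree formula open ("any existing formula"). As one possible choice, the sketch below uses cosine similarity and picks the label whose spliced vector is closest to the sample's spliced vector. The function names and the vectors are hypothetical, and cosine similarity is an assumption rather than the patent's stated formula.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_label(sample_spliced, label_spliced_by_name):
    """Return the label whose spliced vector is most similar to the sample's."""
    return max(
        label_spliced_by_name,
        key=lambda name: cosine_similarity(sample_spliced, label_spliced_by_name[name]),
    )

# Hypothetical verification label splicing data, keyed by label name.
label_vectors = {
    "finance": [1.0, 0.0, 0.2],
    "medical": [0.0, 1.0, 0.1],
}
best = match_label([0.9, 0.1, 0.3], label_vectors)
```

Under this scheme a new label only requires adding one more entry to `label_vectors` (for example, a vector produced online by the trained generator), which mirrors the point above that new labels do not require retraining.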
So far, the technical solutions of the present disclosure have been described in connection with the foregoing embodiments, but it is easily understood by those skilled in the art that the scope of the present disclosure is not limited to only these specific embodiments. The technical solutions in the above embodiments can be split and combined, and equivalent changes or substitutions can be made on related technical features by those skilled in the art without departing from the technical principles of the present disclosure, and any changes, equivalents, improvements, and the like made within the technical concept and/or technical principles of the present disclosure will fall within the protection scope of the present disclosure.

Claims (9)

1. A dynamic data identification method based on natural language is characterized in that the method comprises the following steps:
acquiring a sample set, and determining semantic tag data corresponding to each sample data in the sample set;
generating an experiment set based on the sample data, the semantic tag data and the mapping relation between the sample data and the semantic tag data; splitting the experiment set into a training data set and a verification data set;
importing sample data in a training data set into a preset encoder; acquiring a plurality of sample sub-splicing data from a hidden layer of a preset encoder based on a preset sample splicing data acquisition position, and splicing the sample sub-splicing data into sample splicing data;
importing semantic label data and preset reference dimension data in a training data set into a preset generator; acquiring a plurality of label sub-splicing data from a hidden layer of a preset generator based on a preset label splicing data acquisition position, and splicing the label sub-splicing data into label splicing data;
determining a distance cost value between the sample splicing data and the label splicing data based on a preset inter-distribution distance equation; importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into a preset optimizer to complete weight updating of the preset discriminator so as to obtain a trained preset discriminator;
importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to finish updating of the preset encoder weight value; importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to finish updating of the weight value of the preset generator; so as to obtain a trained preset encoder and a preset generator;
obtaining verification sample splicing data corresponding to sample data in a verification data set based on a trained preset encoder; obtaining verification label splicing data based on the trained preset generator or label splicing data; and matching the verification sample splicing data with the verification label splicing data based on a trained preset discriminator or a preset matching degree calculation formula.
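The final matching step of claim 1 can be illustrated with a minimal sketch. The claim does not disclose the "preset matching degree calculation formula", so cosine similarity is used here purely as an assumed stand-in. The names `cosine_similarity` and `match_label`, and the toy vectors, are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed matching formula)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def match_label(sample_vec, label_vecs):
    """Return (best_label, score) for the label vector closest to sample_vec."""
    best_label, best_score = None, float("-inf")
    for label, vec in label_vecs.items():
        score = cosine_similarity(sample_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy verification label splicing data (made-up values).
label_vecs = {"finance": [1.0, 0.0, 0.0], "medical": [0.0, 1.0, 0.0]}
best, score = match_label([0.9, 0.1, 0.0], label_vecs)
```

Under the claim's other alternative, the trained preset discriminator would score each sample/label pair instead of a fixed formula; that variant is not shown here.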
2. The dynamic data identification method based on natural language according to claim 1, wherein
the hidden layer of the preset encoder and the hidden layer of the preset generator both comprise a text semantic coding network layer and a label semantic coding network layer.
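The splicing of sub-splicing data read from hidden layers might look like the following minimal sketch. It assumes each hidden layer yields one flat vector and that the "preset splicing data acquisition position" is a list of layer indices. The function name `splice_hidden_states` and the toy values are hypothetical.

```python
def splice_hidden_states(hidden_states, positions):
    """Concatenate the hidden-layer outputs selected by `positions`.

    hidden_states: list of per-layer output vectors (lists of floats).
    positions: layer indices to read, standing in for the
               "preset splicing data acquisition position" (an assumption).
    """
    spliced = []
    for pos in positions:
        spliced.extend(hidden_states[pos])  # one "sub-splicing data" item
    return spliced

# Toy hidden states for a three-layer network (values are made up).
hidden_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
sample_splicing_data = splice_hidden_states(hidden_states, positions=[0, 2])
```

The same helper would apply to the generator side, reading from its hidden layers at the preset label splicing data acquisition position.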
3. The method according to claim 1, wherein determining semantic tag data corresponding to each sample data in the sample set specifically comprises:
acquiring a semantic tag set through a preset semantic tag interface, wherein the semantic tag set comprises the semantic tag data;
or obtaining the semantic tag data corresponding to each sample data through a preset keyword/subject term extraction algorithm;
or analyzing the part of speech of the sample data through a preset sample part-of-speech analysis algorithm to obtain preset attribute words corresponding to the sample data, and splicing the preset attribute words into the semantic tag data;
or, when a preset associated data set corresponding to the sample set is obtained, extracting the keywords/subject terms corresponding to the preset associated data set through the keyword/subject term extraction algorithm and the preset sample part-of-speech analysis algorithm to obtain the semantic tag data.
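As an illustration of the keyword-based alternative, the sketch below derives a tag string from raw text by simple term frequency. The claim does not name the extraction algorithm, so frequency counting, the `STOPWORDS` list, and the function names are all assumptions chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def extract_keywords(text, top_k=3):
    """Return the top_k most frequent non-stop-word tokens in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending count, then alphabetically, for deterministic output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]

def build_semantic_tag(text, top_k=3):
    """Splice the extracted keywords into one semantic tag string."""
    return " ".join(extract_keywords(text, top_k))

tag = build_semantic_tag(
    "The contract covers payment terms and payment schedule of the contract"
)
```

For Chinese text, tokenization would require a word segmenter rather than the regular expression used here.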
4. The dynamic data identification method based on natural language according to claim 1, wherein the obtaining of the sample set specifically includes:
acquiring, through a preset sample uploading process, real service data, or substitute open-source service data, or artificial sample data, as the sample set.
5. The dynamic data identification method based on natural language according to claim 1, wherein before determining the distance cost value between the sample splicing data and the label splicing data based on the preset inter-distribution distance equation, the method further comprises:
replacing the joint distribution in the Wasserstein distance method with the encoder, replacing the marginal distribution with the generator, and replacing the sampling with the sample splicing data and the label splicing data;
obtaining a preset distance cost value calculation formula:
[formula image 001 not reproduced]
wherein D() is the output result of the preset discriminator, [symbol image 002] is the sample splicing data, and [symbol image 003] is the label splicing data.
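Because the formula image is not reproduced, the exact expression is unknown. The sketch below shows only the usual empirical form of a Wasserstein critic estimate, the mean discriminator output on one batch minus the mean on the other. The sign convention, the function `distance_cost`, and the linear `toy_discriminator` are assumptions, not the patent's formula.

```python
def distance_cost(discriminator, sample_batch, label_batch):
    """Mean D(sample splicing data) minus mean D(label splicing data).

    This is the standard empirical Wasserstein critic estimate; whether the
    patent uses this exact form and sign is an assumption.
    """
    sample_scores = [discriminator(x) for x in sample_batch]
    label_scores = [discriminator(z) for z in label_batch]
    mean_sample = sum(sample_scores) / len(sample_scores)
    mean_label = sum(label_scores) / len(label_scores)
    return mean_sample - mean_label

def toy_discriminator(vec):
    """Hypothetical linear discriminator D() with fixed weights."""
    weights = [0.5, -0.25]
    return sum(w * v for w, v in zip(weights, vec))

cost = distance_cost(
    toy_discriminator,
    sample_batch=[[1.0, 0.0], [3.0, 0.0]],  # made-up sample splicing data
    label_batch=[[0.0, 2.0], [0.0, 2.0]],   # made-up label splicing data
)
```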
6. The dynamic data identification method based on natural language according to claim 1, wherein importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into the preset optimizer to complete the weight update of the preset discriminator specifically comprises:
updating the weight value of the preset discriminator through a preset discriminator weight update formula:
[formula image 004 not reproduced]
wherein [symbol image 005] is the weight value of the preset encoder generated in the updating process, [symbol image 006] is the distance cost value, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are the preset smoothing constants;
when the weight value is greater than c or less than -c, performing gradient clipping on the weight value of the preset discriminator through a preset clipping formula:
[formula image 010 not reproduced]
wherein c is the clipping threshold.
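Neither the update formula nor the clipping formula is reproduced, so the following is only a sketch under stated assumptions. A learning rate with two smoothing constants is consistent with an Adam-style optimizer, and clipping to the interval [-c, c] is the weight clipping used in Wasserstein GAN training. The functions `adam_step` and `clip_weights`, the descent direction, and all numeric values are hypothetical.

```python
import math

def adam_step(weights, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update; beta1 and beta2 play the role of the two
    smoothing constants (an assumption about the preset optimizer)."""
    state["t"] += 1
    t = state["t"]
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # first moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # second moment
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_weights.append(w - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_weights

def clip_weights(weights, c):
    """Clamp every weight into [-c, c], where c is the clipping threshold."""
    return [max(-c, min(c, w)) for w in weights]

weights = [0.5, -0.02, 0.009]
state = {"t": 0, "m": [0.0] * 3, "v": [0.0] * 3}
weights = adam_step(weights, grads=[1.0, -1.0, 0.0], state=state, lr=0.001)
weights = clip_weights(weights, c=0.01)
```

Clipping after every update keeps the discriminator weights bounded, which is the usual way a Wasserstein critic is held approximately Lipschitz.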
7. The dynamic data identification method based on natural language according to claim 1, wherein importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete the update of the preset encoder weight value specifically comprises:
updating the weight value of the preset encoder through a preset encoder weight update formula:
[formula image 011 not reproduced]
wherein [symbol image 012] is the weight value of the preset encoder generated in the updating process, [symbol image 013] is the sample splicing data, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are the preset smoothing constants.
8. The dynamic data identification method based on natural language according to claim 1, wherein importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete the update of the preset generator weight value specifically comprises:
updating the weight value of the preset generator through a preset generator weight update formula:
[formula image 014 not reproduced]
wherein [symbol image 015] is the weight value of the preset generator generated in the updating process, [symbol image 003] is the label splicing data, [symbol image 007] is the preset learning rate, and [symbol image 008] and [symbol image 009] are the preset smoothing constants.
9. The dynamic data identification method based on natural language according to claim 1, wherein after obtaining the trained preset encoder and preset generator, the method further comprises:
modifying the semantic label data through a preset semantic label modification interface.
CN202211359030.2A 2022-11-02 2022-11-02 Data dynamic identification method based on natural language Active CN115408498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211359030.2A CN115408498B (en) 2022-11-02 2022-11-02 Data dynamic identification method based on natural language

Publications (2)

Publication Number Publication Date
CN115408498A true CN115408498A (en) 2022-11-29
CN115408498B CN115408498B (en) 2023-03-24

Family

ID=84169251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211359030.2A Active CN115408498B (en) 2022-11-02 2022-11-02 Data dynamic identification method based on natural language

Country Status (1)

Country Link
CN (1) CN115408498B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582175A (en) * 2020-05-09 2020-08-25 中南大学 High-resolution remote sensing image semantic segmentation method sharing multi-scale countermeasure characteristics
CN111583276A (en) * 2020-05-06 2020-08-25 西安电子科技大学 CGAN-based space target ISAR image component segmentation method
CN113936217A (en) * 2021-10-25 2022-01-14 华中师范大学 Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method
WO2022160902A1 (en) * 2021-01-28 2022-08-04 广西大学 Anomaly detection method for large-scale multivariate time series data in cloud environment
CN114973062A (en) * 2022-04-25 2022-08-30 西安电子科技大学 Multi-modal emotion analysis method based on Transformer
CN115035418A (en) * 2022-06-15 2022-09-09 杭州电子科技大学 Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network
CN115049936A (en) * 2022-08-12 2022-09-13 武汉大学 High-resolution remote sensing image-oriented boundary enhancement type semantic segmentation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
费豪 等: "基于动态句法剪枝机制的中文语义角色标注", 《计算机学报》 *

Also Published As

Publication number Publication date
CN115408498B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant