CN115408498B - Data dynamic identification method based on natural language - Google Patents

Data dynamic identification method based on natural language

Info

Publication number
CN115408498B
Authority
CN
China
Prior art keywords
preset
data
sample
splicing
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211359030.2A
Other languages
Chinese (zh)
Other versions
CN115408498A (en)
Inventor
杨介
崔昆俞
赵鸿
伍之洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongfu Safety Technology Co Ltd
Original Assignee
Zhongfu Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongfu Safety Technology Co Ltd filed Critical Zhongfu Safety Technology Co Ltd
Priority to CN202211359030.2A priority Critical patent/CN115408498B/en
Publication of CN115408498A publication Critical patent/CN115408498A/en
Application granted granted Critical
Publication of CN115408498B publication Critical patent/CN115408498B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a data dynamic identification method based on natural language, mainly relating to the technical field of dynamic data identification and intended to address the limited general applicability and high uncertainty of existing model performance. The method comprises the following steps: determining semantic tag data corresponding to the sample data; generating an experiment set; splitting it into a training data set and a verification data set; importing the sample data of the training data set into a preset encoder and splicing hidden-layer data into sample splicing data; importing the semantic tag data of the training data set into a preset generator and splicing hidden-layer data into label splicing data; determining a distance cost value between the sample splicing data and the label splicing data to obtain a trained preset discriminator; obtaining a trained preset encoder and preset generator; obtaining verification sample splicing data; obtaining verification label splicing data; and completing the matching of the data. The method improves the fit between the model and the data and improves accuracy.

Description

Data dynamic identification method based on natural language
Technical Field
The application relates to the technical field of dynamic data identification, in particular to a dynamic data identification method based on natural language.
Background
Text classification, a core task in natural language processing, is widely applied across fields and industries. Its application scenarios are broad, and the resources available during development and implementation (service platforms, hardware, technical frameworks, data, and the like) are diverse.
An existing method for dynamic data identification adopts a generic structure: the [CLS] representation of BERT or a similar model is taken as the input of a classifier, and classification training is carried out on top of it. Under normal conditions this meets certain industrial requirements, is simple to implement, and has a short development period.
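For reference, a minimal sketch of this conventional [CLS]-based baseline, using the Hugging Face transformers library; the model name, the number of labels and the linear classifier head are illustrative assumptions rather than details taken from this application.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class ClsClassifier(nn.Module):
    """Conventional baseline: feed the BERT [CLS] vector into a linear classifier."""
    def __init__(self, num_labels: int, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = outputs.last_hidden_state[:, 0]  # representation of the [CLS] token
        return self.classifier(cls_vec)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = ClsClassifier(num_labels=10)
batch = tokenizer(["示例文本"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```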
However, as data asset security management becomes increasingly standardized, such a generic architecture runs into a performance bottleneck in real service scenarios, and the resulting model performance suffers from limited general applicability and high uncertainty.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, the present invention provides a dynamic data identification method based on natural language, so as to solve the above-mentioned technical problems.
The application provides a data dynamic identification method based on natural language, which comprises the following steps: acquiring a sample set, and determining semantic tag data corresponding to each sample data in the sample set; generating an experiment set based on the sample data, the semantic tag data and the mapping relation between them; dividing the experiment set into a training data set and a verification data set; importing sample data of the training data set into a preset encoder; acquiring a plurality of sample sub-splicing data from hidden layers of the preset encoder based on a preset sample splicing data acquisition position, and splicing the sample sub-splicing data into sample splicing data; importing semantic tag data of the training data set and preset reference dimension data into a preset generator; acquiring a plurality of label sub-splicing data from hidden layers of the preset generator based on a preset label splicing data acquisition position, and splicing the label sub-splicing data into label splicing data; determining a distance cost value between the sample splicing data and the label splicing data based on a preset inter-distribution distance equation; importing the distance cost value, a preset learning rate, a preset smoothing constant and an initial discriminator weight value into a preset optimizer to complete the weight update of the preset discriminator and obtain a trained preset discriminator; importing the preset learning rate, the preset smoothing constant, an initial encoder weight value and the trained preset discriminator into the preset optimizer to complete the update of the preset encoder weight value; importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete the update of the preset generator weight value, thereby obtaining a trained preset encoder and a trained preset generator; obtaining verification sample splicing data corresponding to sample data of the verification data set based on the trained preset encoder; obtaining verification label splicing data based on the trained preset generator or the label splicing data; and matching the verification sample splicing data with the verification label splicing data based on the trained preset discriminator or a preset matching degree calculation formula.
Further, the hidden layer of the preset encoder and the hidden layer of the preset generator both comprise a text semantic coding network layer and a label semantic coding network layer.
Further, determining semantic tag data corresponding to each sample data in the sample set specifically includes: acquiring a semantic tag set through a preset semantic tag interface; the semantic tag set comprises semantic tag data; or, semantic tag data corresponding to each sample data is obtained through a preset keyword/subject term extraction algorithm; or analyzing the part of speech of the sample data through a preset sample part of speech analysis algorithm to obtain preset attribute words corresponding to the sample data, and splicing the preset attribute words into semantic tag data; or when the preset associated data set corresponding to the sample set is obtained, extracting the keywords/subject terms corresponding to the preset associated data set through a keyword/subject term extraction algorithm and a preset sample part-of-speech analysis algorithm to obtain semantic tag data.
Further, acquiring a sample set specifically includes: acquiring real service data, substitute open-source service data, or artificial sample data as the sample set through a preset sample uploading process.
Further, before determining the distance cost value between the sample splicing data and the label splicing data based on the preset inter-distribution distance equation, the method further includes: replacing the joint distribution in the Wasserstein-distance method with the encoder, replacing the marginal distribution with the generator, and replacing the sampling with the sample splicing data and the label splicing data, to obtain a preset distance cost value calculation formula (given in the original only as an image and not reproduced here), wherein D() is the output result of the preset discriminator and the remaining two quantities are the sample splicing data and the label splicing data.
Further, importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into the preset optimizer and completing the weight update of the preset discriminator specifically includes: updating the weight value of the preset discriminator through a preset discriminator weight update formula (given in the original only as an image), whose quantities are the weight value of the preset encoder generated in the updating process, the distance cost value, the preset learning rate, and two preset smoothing constants; when a weight value is larger than c or smaller than -c, gradient clipping is performed on the weight value of the preset discriminator through a preset clipping formula, i.e. the weight value is clamped to the interval [-c, c], where c is the clipping threshold.
Further, importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer and completing the update of the preset encoder weight value specifically includes: updating the weight value of the preset encoder through a preset encoder weight update formula (given in the original only as an image), whose quantities are the weight value of the preset encoder generated in the updating process, the sample splicing data, the preset learning rate, and two preset smoothing constants.
Further, importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer and completing the update of the preset generator weight value specifically includes: updating the weight value of the preset generator through a preset generator weight update formula (given in the original only as an image), whose quantities are the weight value of the preset generator generated in the updating process, the label splicing data, the preset learning rate, and two preset smoothing constants.
Further, after obtaining the trained preset encoder and preset generator, the method further comprises: modifying the semantic tag data through a preset semantic tag modification interface.
As can be appreciated by those skilled in the art, the present invention has at least the following beneficial effects:
(1) By presetting the sample splicing data acquisition position and the label splicing data acquisition position, sample splicing data and label splicing data spliced from hidden layers of different depths of the relevant structures are acquired. Because the hidden layers corresponding to the acquisition positions are preset, technicians can select suitable hidden-layer input data or output data for flexible splicing according to the network structure characteristics of their own components (the preset encoder, the preset generator, and the like).
(2) For reasons such as data asset privacy and security, effective real service scene data may not be obtainable during model training. In that case, relevant open-source data can be obtained through a preset interface, together with semantic tag data meeting the association requirements, achieving a simulated training effect without the real service scene data. This avoids the risks caused by data privacy disclosure and, compared with the traditional scheme, improves the effect to a certain extent.
(3) Finally, the range and definition of the semantic tag data can be modified through the preset semantic tag modification interface, so that the classification function can be dynamically adjusted to a certain degree, meeting more diversified requirements and providing better support for user customization.
Drawings
Some embodiments of the disclosure are described below with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of a dynamic data identification method based on natural language according to an embodiment of the present application.
Detailed Description
It should be understood by those skilled in the art that the embodiments described below are only preferred embodiments of the present disclosure and do not mean that the disclosure can be implemented only through them; they are intended merely to explain the technical principles of the disclosure, not to limit its scope. All other embodiments obtained by a person of ordinary skill in the art from the preferred embodiments of the present disclosure without inventive effort shall fall within the protection scope of the present disclosure.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
An embodiment of the present application further provides a data dynamic identification method based on natural language. As shown in Fig. 1, the method mainly includes the following steps:
and 110, acquiring a sample set, and determining semantic tag data corresponding to each sample data in the sample set.
It should be noted that the sample set may be real service data. Where real service data is lacking, or where real service scene data cannot directly participate in training for privacy and security reasons, the sample set may instead be substitute open-source service data rich in similar semantics; such data may replace the real data with semantically similar content or be supplemented by other open-source data sets. If no suitable substitute open-source service data can be found, approximately 50 samples per tag can be created manually (artificial sample data) and used as the sample set. The specific contents of the real service data, the substitute open-source service data and the artificial sample data can be determined by those skilled in the art according to the actual situation.
The sample set may be obtained through a preset sample uploading process.
The method for determining semantic tag data corresponding to each sample data in the sample set may specifically be: (1) acquiring a semantic tag set uploaded by an operator through a preset semantic tag interface, where the semantic tag set comprises semantic tag data; (2) importing the sample set into a preset keyword/subject-term extraction algorithm to obtain semantic tag data corresponding to each sample data, where the extraction algorithm can be any algorithm with a keyword/subject-term extraction function, such as the TF-IDF algorithm; (3) analyzing the part of speech of the sample data through a preset sample part-of-speech analysis algorithm to obtain preset attribute words corresponding to the sample data and splicing them into semantic tag data, which yields more generalized labels suited to individual requirements; or (4) when the original data are unlabeled, searching for data related to the samples in combination with the company's other business data, and obtaining all the semantic tag data through business logic and algorithm association.
Thus, the acquisition of the sample set, the acquisition of the semantic tag data and the mapping of the sample data and the semantic tag data are completed.
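As an illustration of route (2) above, a minimal sketch of TF-IDF-based keyword extraction with scikit-learn; the jieba tokenizer, the example sentences and the top-k cut-off are illustrative assumptions, since the application does not fix a particular extraction algorithm beyond naming TF-IDF as one option.

```python
import jieba  # assumed Chinese tokenizer; any tokenizer can be substituted
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_tags(samples, top_k=3):
    """Return the top_k highest-TF-IDF terms of each sample as its semantic tag data."""
    vectorizer = TfidfVectorizer(tokenizer=lambda s: jieba.lcut(s), token_pattern=None)
    tfidf = vectorizer.fit_transform(samples)          # shape: (n_samples, n_terms)
    terms = vectorizer.get_feature_names_out()
    tags = []
    for row in tfidf.toarray():
        top = row.argsort()[::-1][:top_k]
        # Join the selected terms with "|", matching the tag separator mentioned later in step 140.
        tags.append("|".join(terms[i] for i in top if row[i] > 0))
    return tags

samples = ["客户身份证号码与联系方式需要脱敏处理", "季度财务报表包含营业收入与净利润"]
print(extract_tags(samples))
```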
Step 120, generating an experiment set based on the sample data, the semantic tag data and the mapping relation between the sample data and the semantic tag data; and splitting the experiment set into a training data set and a verification data set.
Step 130, importing sample data in the training data set into a preset encoder; and acquiring a plurality of sample sub-splicing data from a hidden layer of a preset encoder based on a preset sample splicing data acquisition position, and splicing the sample sub-splicing data into sample splicing data.
It should be noted that the hidden layer of the preset encoder at least includes a text semantic coding network and a label semantic coding network. The preset sample splicing data acquisition position defines which hidden layers of the text semantic coding network and the label semantic coding network have their input data or output data extracted as sample sub-splicing data. All the obtained sample sub-splicing data are then spliced into the sample splicing data. For example (the symbols in the original are given only as images), the output of the text semantic coding network at step 1 and the output of the label semantic coding network at step 1 may be concatenated to form the sample splicing data.
Step 140, importing semantic label data and preset reference dimension data in the training data set into a preset generator; and acquiring a plurality of label sub-splicing data from the hidden layer of the preset generator based on the preset label splicing data acquisition position, and splicing the label sub-splicing data into label splicing data.
It should be noted that the hidden layer of the preset generator at least includes a text semantic coding network and a label semantic coding network. The preset label splicing data acquisition position defines which hidden layers of the text semantic coding network and the label semantic coding network have their input data or output data extracted as label sub-splicing data. All the obtained label sub-splicing data are then spliced into the label splicing data. For example (the symbols in the original are given only as images), the output of the label semantic generation network at step 1 and the output of the text semantic generation network at step 1 may be concatenated to form the label splicing data. In addition, the preset reference dimension data may be a null value or supervisory information related to the label, and its specific content may be determined by those skilled in the art.
In addition, since one sample datum may correspond to a plurality of semantic tags, a single item of semantic tag data may internally contain one or more semantic tags. The semantic tags may be separated by any feasible method, for example by the separator "|".
Based on step 130 and step 140, those skilled in the art can understand that, by presetting the sample splicing data acquisition position and the label splicing data acquisition position, sample splicing data and label splicing data spliced from hidden layers of different depths of the relevant structures are obtained. Because the hidden layers corresponding to the acquisition positions are preset, technicians can select suitable hidden-layer input data or output data for flexible splicing according to the network structure characteristics of their own components (the preset encoder, the preset generator, and the like).
Thus, the acquisition of the sample splicing data and the label splicing data is completed.
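A minimal PyTorch-style sketch of the hidden-layer splicing described in steps 130 and 140; the two-branch toy network, the layer sizes and the default acquisition positions are illustrative assumptions, since the application leaves the concrete network structure and acquisition positions to the implementer.

```python
import torch
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    """Toy stand-in for the preset encoder/generator: a text-semantic branch and a
    label-semantic branch, each a small stack of hidden layers."""
    def __init__(self, in_dim=128, hid_dim=128, depth=3):
        super().__init__()
        self.text_branch = nn.ModuleList(
            nn.Linear(in_dim if i == 0 else hid_dim, hid_dim) for i in range(depth))
        self.label_branch = nn.ModuleList(
            nn.Linear(in_dim if i == 0 else hid_dim, hid_dim) for i in range(depth))

    def forward(self, text_x, label_x, acquisition_positions=((0, "text"), (0, "label"))):
        # Run both branches, remembering every hidden-layer output.
        text_hidden, label_hidden = [], []
        h = text_x
        for layer in self.text_branch:
            h = torch.relu(layer(h))
            text_hidden.append(h)
        h = label_x
        for layer in self.label_branch:
            h = torch.relu(layer(h))
            label_hidden.append(h)
        # Splice the sub-splicing data taken at the preset acquisition positions.
        parts = [(text_hidden if branch == "text" else label_hidden)[idx]
                 for idx, branch in acquisition_positions]
        return torch.cat(parts, dim=-1)   # sample splicing data / label splicing data

enc = TwoBranchEncoder()
spliced = enc(torch.randn(4, 128), torch.randn(4, 128))
print(spliced.shape)   # (4, 256): two first-layer outputs concatenated
```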
Step 150, determining a distance cost value between the sample splicing data and the label splicing data based on a preset inter-distribution distance equation; and importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into a preset optimizer, and finishing weight updating of the preset discriminator to obtain the trained preset discriminator.
It should be noted that the preset inter-distribution distance equation is any feasible measurement formula capable of calculating the difference between the two output distributions.
The preset inter-distribution distance equation may specifically be obtained as follows: the Wasserstein-distance formula is adopted as the objective equation, the joint distribution and the marginal distribution correspond to the encoder and the generator, and the samples (X, Y) correspond to the sample splicing data and the label splicing data.
The preset distance cost value is then obtained according to the formula (given in the original only as an image), wherein D() is the preset discriminator output result and the remaining two quantities are the sample splicing data and the label splicing data.
In addition, the above method based on the Wasserstein distance may be replaced by any theory capable of calculating the difference between two output distributions, such as the Kullback-Leibler divergence or the Jensen-Shannon divergence.
Importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into the preset optimizer to complete the weight update of the preset discriminator may specifically be carried out through a preset discriminator weight update formula (given in the original only as an image), whose quantities are the weight value of the preset encoder generated in the updating process, the distance cost value, the preset learning rate, and two preset smoothing constants. When a weight value is larger than c or smaller than -c, gradient clipping is performed on the preset discriminator weight value through a preset clipping formula, i.e. the weight value is clamped to the interval [-c, c], where c is the clipping threshold.
Thus, the training of the preset discriminator is completed.
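For concreteness, a sketch of one discriminator update consistent with the description above: a Wasserstein-style cost (mean discriminator score on the sample splicing data minus mean score on the label splicing data), an RMSprop optimizer parameterized by a learning rate and smoothing constants, and weight clipping at the threshold c. The exact formulas in the application are given only as images, so the loss form, the choice of RMSprop and the constant values here are assumptions.

```python
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))  # assumed preset discriminator
opt_d = torch.optim.RMSprop(disc.parameters(), lr=5e-5, alpha=0.99, eps=1e-8)  # learning rate / smoothing constants
c = 0.01  # clipping threshold

def discriminator_step(sample_splice, label_splice):
    """One weight update of the preset discriminator.
    Assumed Wasserstein-style cost: mean D(sample splicing data) - mean D(label splicing data)."""
    cost = disc(sample_splice).mean() - disc(label_splice).mean()
    opt_d.zero_grad()
    (-cost).backward()   # the discriminator maximises the cost, i.e. minimises its negative
    opt_d.step()
    # Weight clipping: clamp every discriminator weight into [-c, c].
    with torch.no_grad():
        for p in disc.parameters():
            p.clamp_(-c, c)
    return cost.item()

cost = discriminator_step(torch.randn(4, 256), torch.randn(4, 256))
```

Clipping every discriminator weight to [-c, c] is what keeps the Wasserstein estimate well behaved, which is why the clipping threshold c appears alongside the learning rate and smoothing constants in the description.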
Step 160, importing a preset learning rate, a preset smoothing constant, an initial encoder weight value and a trained preset discriminator into a preset optimizer to complete updating of the preset encoder weight value; importing a preset learning rate, a preset smoothing constant, an initial encoder weight value and a trained preset discriminator into a preset optimizer to finish updating of a preset generator weight value; to obtain a trained preset encoder and preset generator.
Importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete the update of the preset encoder weight value may specifically be carried out through a preset encoder weight update formula (given in the original only as an image), whose quantities are the weight value of the preset encoder generated in the updating process, the sample splicing data, the preset learning rate, and two preset smoothing constants.
Importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to complete the update of the preset generator weight value may specifically be carried out through a preset generator weight update formula (given in the original only as an image), whose quantities are the weight value of the preset generator generated in the updating process, the label splicing data, the preset learning rate, and two preset smoothing constants.
Thus, the training of the preset encoder and the preset generator is completed.
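Continuing the two previous sketches (reusing TwoBranchEncoder and disc), a sketch of the step-160 updates of the preset encoder and preset generator; the sign of the objective and the optimizer settings are assumptions, since the original update formulas are available only as images.

```python
import torch

encoder = TwoBranchEncoder()      # from the step-130/140 sketch
generator = TwoBranchEncoder()    # toy stand-in for the preset generator
opt_e = torch.optim.RMSprop(encoder.parameters(), lr=5e-5, alpha=0.99, eps=1e-8)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5, alpha=0.99, eps=1e-8)

def encoder_generator_step(text_x, label_x, ref_dim):
    """One update of the preset encoder and preset generator against the trained discriminator.
    Assumed objective: jointly shrink the estimated distance between the sample splicing data
    and the label splicing data under D."""
    sample_splice = encoder(text_x, label_x)
    label_splice = generator(label_x, ref_dim)
    distance = disc(sample_splice).mean() - disc(label_splice).mean()
    opt_e.zero_grad()
    opt_g.zero_grad()
    distance.backward()   # gradients reach the encoder and the generator through D
    opt_e.step()
    opt_g.step()
    return distance.item()

# The preset reference dimension data may be a null value; zeros are used here.
encoder_generator_step(torch.randn(4, 128), torch.randn(4, 128), torch.zeros(4, 128))
```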
In addition, after obtaining the trained preset encoder and preset generator, the application can further modify the semantic tag data through a preset semantic tag modification interface. The classification function can thereby be dynamically adjusted to a certain extent, meeting more diversified requirements and providing better support for user customization.
Step 170, obtaining verification sample splicing data corresponding to sample data in the verification data set based on the trained preset encoder; obtaining verification label splicing data based on the trained preset generator or label splicing data; and completing the matching of the verification sample splicing data and the verification label splicing data based on a trained preset discriminator or a preset matching degree calculation formula.
It should be noted that obtaining the verification label splicing data based on the label splicing data means directly using the label splicing data as the verification label splicing data. The preset matching degree calculation formula is any existing formula capable of calculating the matching degree between the verification sample splicing data and the verification label splicing data.
Alternatively, the verification label splicing data can be generated online with the trained preset generator. When a new label appears and the output needs to be modified, the model does not need to be retrained, and semantic tag data with a higher degree of customization can be provided online.
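A minimal sketch of the matching in step 170 when a matching degree formula is used instead of the trained discriminator; cosine similarity, the arg-max assignment and the example tag names are illustrative assumptions, as the application does not fix a particular matching degree formula.

```python
import torch
import torch.nn.functional as F

def match(verify_sample_splice, verify_label_splice, tag_names):
    """Assign each verification sample to the semantic tag whose splicing data it matches best.
    verify_sample_splice: (n_samples, d); verify_label_splice: (n_tags, d)."""
    sim = F.cosine_similarity(verify_sample_splice.unsqueeze(1),
                              verify_label_splice.unsqueeze(0), dim=-1)   # (n_samples, n_tags)
    best = sim.argmax(dim=1)
    return [tag_names[i] for i in best]

tags = ["个人身份信息", "财务数据"]
print(match(torch.randn(3, 256), torch.randn(2, 256), tags))
```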
So far, the technical solutions of the present disclosure have been described in connection with the foregoing embodiments, but it is easily understood by those skilled in the art that the scope of the present disclosure is not limited to only these specific embodiments. The technical solutions in the above embodiments can be split and combined, and equivalent changes or substitutions can be made on related technical features by those skilled in the art without departing from the technical principles of the present disclosure, and any changes, equivalents, improvements, and the like made within the technical concept and/or technical principles of the present disclosure will fall within the protection scope of the present disclosure.

Claims (9)

1. A dynamic data identification method based on natural language is characterized in that the method comprises the following steps:
acquiring a sample set, and determining semantic tag data corresponding to each sample data in the sample set;
generating an experiment set based on the sample data, the semantic tag data and the mapping relation between the sample data and the semantic tag data; splitting the experiment set into a training data set and a verification data set;
importing sample data in a training data set into a preset encoder; acquiring a plurality of sample sub-splicing data from a hidden layer of a preset encoder based on a preset sample splicing data acquisition position, and splicing the sample sub-splicing data into sample splicing data;
importing semantic label data and preset reference dimension data in a training data set into a preset generator; acquiring a plurality of label sub-splicing data from a hidden layer of a preset generator based on a preset label splicing data acquisition position, and splicing the label sub-splicing data into label splicing data;
determining a distance cost value between the sample splicing data and the label splicing data based on a preset inter-distribution distance equation; importing the distance cost value, the preset learning rate, the preset smoothing constant and the initial discriminator weight value into a preset optimizer to complete weight updating of the preset discriminator so as to obtain a trained preset discriminator;
importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to finish the updating of the preset encoder weight value; importing the preset learning rate, the preset smoothing constant, the initial encoder weight value and the trained preset discriminator into the preset optimizer to finish updating of the weight value of the preset generator; so as to obtain a trained preset encoder and a preset generator;
obtaining verification sample splicing data corresponding to sample data in a verification data set based on a trained preset encoder; obtaining verification label splicing data based on the trained preset generator or label splicing data; and matching the verification sample splicing data with the verification label splicing data based on a trained preset discriminator or a preset matching degree calculation formula.
2. The dynamic natural language based data recognition method of claim 1,
the hidden layer of the preset encoder and the hidden layer of the preset generator both comprise a text semantic coding network layer and a label semantic coding network layer.
3. The dynamic data identification method based on natural language according to claim 1, wherein determining semantic tag data corresponding to each sample data in the sample set specifically includes:
acquiring a semantic tag set through a preset semantic tag interface; wherein the semantic tag set comprises semantic tag data; or,
obtaining semantic tag data corresponding to each sample data through a preset keyword/subject term extraction algorithm;
or analyzing the part of speech of the sample data through a preset sample part of speech analysis algorithm to obtain preset attribute words corresponding to the sample data, and splicing the preset attribute words into semantic tag data;
or when a preset associated data set corresponding to the sample set is obtained, extracting keywords/subject terms corresponding to the preset associated data set through a keyword/subject term extraction algorithm and a preset sample part-of-speech analysis algorithm to obtain semantic tag data.
4. The dynamic data identification method based on natural language according to claim 1, wherein the obtaining of the sample set specifically includes:
and acquiring real service data or replacing open source service data or artificial sample data as a sample set through a preset sample uploading process.
5. The dynamic natural language-based data recognition method of claim 1, wherein before determining the distance cost value between the sample splicing data and the label splicing data based on the preset inter-distribution distance equation, the method further comprises:
replacing the joint distribution in the Wasserstein-distance method with the encoder, replacing the marginal distribution with the generator, and replacing the sampling with the sample splicing data and the label splicing data;
obtaining a preset distance cost value calculation formula (given in the original only as an image), wherein D() is the output result of the preset discriminator and the remaining two quantities are the sample splicing data and the label splicing data.
6. The dynamic data recognition method according to claim 1, wherein importing the distance cost value, the preset learning rate, the preset smoothing constant, and the initial discriminator weight value into the preset optimizer to complete the weight update of the preset discriminator specifically includes:
updating the weight value of the preset discriminator through a preset discriminator weight update formula (given in the original only as an image), whose quantities are the weight value of the preset encoder generated in the updating process, the distance cost value, the preset learning rate, and two preset smoothing constants;
when a weight value is larger than c or smaller than -c, performing gradient clipping on the weight value of the preset discriminator through a preset clipping formula, i.e. clamping the weight value to the interval [-c, c], where c is the clipping threshold.
7. The dynamic data recognition method according to claim 1, wherein importing the preset learning rate, the preset smoothing constant, the initial encoder weight value, and the trained preset discriminator into the preset optimizer to complete the update of the preset encoder weight value specifically includes:
updating the weight value of the preset encoder through a preset encoder weight update formula (given in the original only as an image), whose quantities are the weight value of the preset encoder generated in the updating process, the sample splicing data, the preset learning rate, and two preset smoothing constants, where D() is the preset discriminator output result and j takes values from 1 to m.
8. The dynamic data recognition method according to claim 1, wherein importing the preset learning rate, the preset smoothing constant, the initial encoder weight value, and the trained preset discriminator into the preset optimizer to complete the update of the preset generator weight value specifically includes:
updating the weight value of the preset generator through a preset generator weight update formula (given in the original only as an image), whose quantities are the weight value of the preset generator generated in the updating process, the label splicing data, the preset learning rate, and two preset smoothing constants, where D() is the preset discriminator output result and m represents the number of computations of D().
9. The dynamic natural language based data recognition method of claim 1, wherein after obtaining the trained preset encoder and preset generator, the method further comprises:
and modifying the semantic label data by presetting a semantic label modification interface.
CN202211359030.2A 2022-11-02 2022-11-02 Data dynamic identification method based on natural language Active CN115408498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211359030.2A CN115408498B (en) 2022-11-02 2022-11-02 Data dynamic identification method based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211359030.2A CN115408498B (en) 2022-11-02 2022-11-02 Data dynamic identification method based on natural language

Publications (2)

Publication Number Publication Date
CN115408498A CN115408498A (en) 2022-11-29
CN115408498B true CN115408498B (en) 2023-03-24

Family

ID=84169251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211359030.2A Active CN115408498B (en) 2022-11-02 2022-11-02 Data dynamic identification method based on natural language

Country Status (1)

Country Link
CN (1) CN115408498B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936217A (en) * 2021-10-25 2022-01-14 华中师范大学 Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method
CN115049936A (en) * 2022-08-12 2022-09-13 武汉大学 High-resolution remote sensing image-oriented boundary enhancement type semantic segmentation method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583276B (en) * 2020-05-06 2022-04-19 西安电子科技大学 CGAN-based space target ISAR image component segmentation method
CN111582175B (en) * 2020-05-09 2023-07-21 中南大学 High-resolution remote sensing image semantic segmentation method for sharing multi-scale countermeasure features
CN112784965B (en) * 2021-01-28 2022-07-29 广西大学 Large-scale multi-element time series data anomaly detection method oriented to cloud environment
CN114973062B (en) * 2022-04-25 2024-08-20 西安电子科技大学 Multimode emotion analysis method based on Transformer
CN115035418A (en) * 2022-06-15 2022-09-09 杭州电子科技大学 Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936217A (en) * 2021-10-25 2022-01-14 华中师范大学 Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method
CN115049936A (en) * 2022-08-12 2022-09-13 武汉大学 High-resolution remote sensing image-oriented boundary enhancement type semantic segmentation method

Also Published As

Publication number Publication date
CN115408498A (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN110008349B (en) Computer-implemented method and apparatus for event risk assessment
CN110147445A (en) Intension recognizing method, device, equipment and storage medium based on text classification
CN110489555A (en) A kind of language model pre-training method of combination class word information
CN104503998B (en) For the kind identification method and device of user query sentence
CN111428504B (en) Event extraction method and device
CN111091009B (en) Document association auditing method based on semantic analysis
CN104462064A (en) Method and system for prompting content input in information communication of mobile terminals
CN113986660A (en) Matching method, device, equipment and storage medium of system adjustment strategy
CN111177402A (en) Evaluation method and device based on word segmentation processing, computer equipment and storage medium
CN112671985A (en) Agent quality inspection method, device, equipment and storage medium based on deep learning
CN112257425A (en) Power data analysis method and system based on data classification model
CN110110087A (en) A kind of Feature Engineering method for Law Text classification based on two classifiers
CN114661951A (en) Video processing method and device, computer equipment and storage medium
CN114491034A (en) Text classification method and intelligent device
CN113868422A (en) Multi-label inspection work order problem traceability identification method and device
CN114064893A (en) Abnormal data auditing method, device, equipment and storage medium
CN115408498B (en) Data dynamic identification method based on natural language
CN117933253A (en) Data processing method, device, electronic equipment and computer readable medium
CN110852082B (en) Synonym determination method and device
CN105653619B (en) The update method and device in correct log library in intelligent Answer System
CN116663547A (en) Sample generation method and device
CN110851572A (en) Session labeling method and device, storage medium and electronic equipment
CN114429140A (en) Case cause identification method and system for causal inference based on related graph information
CN113449506A (en) Data detection method, device and equipment and readable storage medium
Dong et al. End-to-end topic classification without asr

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant