CN112084752A - Sentence annotation method, device, equipment and storage medium based on natural language

Sentence annotation method, device, equipment and storage medium based on natural language

Info

Publication number
CN112084752A
Authority
CN
China
Prior art keywords
sentence
target
model
marking
annotation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010936367.XA
Other languages
Chinese (zh)
Other versions
CN112084752B (en)
Inventor
陈夏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010936367.XA priority Critical patent/CN112084752B/en
Publication of CN112084752A publication Critical patent/CN112084752A/en
Application granted granted Critical
Publication of CN112084752B publication Critical patent/CN112084752B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a natural language-based sentence annotation method, apparatus, device and storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: receiving a target sentence sequence input by a user and a sentence annotation instruction for the target sentence sequence; calling a preset sentence-pattern annotation model in response to the sentence annotation instruction; inputting the target sentence sequence into the sentence-pattern annotation model, which encodes the target sentence sequence and converts it into a target sentence vector; and calculating the loss values of the target sentence vector under different labeling results based on a preset weighted loss function, and outputting the target annotation sequence under the labeling result corresponding to the lowest loss value. By assigning different weights to different samples through the weighted loss function, the method effectively reduces the influence of label imbalance, further improves the annotation result, and improves the effect of model annotation.

Description

Sentence annotation method, device, equipment and storage medium based on natural language
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a natural language-based sentence annotation method, apparatus, device, and storage medium.
Background
Sequence labeling is among the most common problems in natural language processing; it includes word segmentation, part-of-speech tagging, named entity recognition, keyword extraction, semantic role labeling, and the like. Question sentence-pattern annotation maps the words in a sentence to business concepts, forming an abstract expression of the sentence that exposes its grammatical semantics; it corresponds to the named entity recognition task in sequence labeling. In the sentence annotation scenario, there are more label types than in a general named entity recognition task, and the number of instances per category varies greatly, so annotating sentence concepts under many, imbalanced categories is a difficult point of the sequence labeling task.
A common method is to fine-tune a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model combined with a conditional random field layer. However, in current sentence annotation schemes, the loss function applies the same weight to samples of different difficulty, which may cause the annotation model to blindly reduce the loss value and neglect fitting the minority labels; moreover, for entity synonyms that do not appear in the training set, the semantics expressed by the annotation result are not accurate enough.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present application is to provide a sentence annotation method, device, equipment and storage medium based on natural language, so as to improve the annotation effect of the model, and enable the annotation result to express the user semantics more accurately.
In order to solve the above technical problem, an embodiment of the present application provides a sentence annotation method based on natural language, which adopts the following technical solutions:
a sentence annotation method based on natural language comprises the following steps:
receiving a target sentence sequence input by a user and a sentence marking instruction for the target sentence sequence;
calling a preset sentence pattern marking model in response to the sentence marking instruction;
inputting the target sentence sequence into the sentence pattern tagging model, coding and converting the target sentence sequence through the sentence pattern tagging model, and converting the target sentence sequence into a target sentence vector;
and calculating loss values of the target sentence vector under different marking results based on a preset weighted loss function, and outputting a target marking sequence under the marking result corresponding to the lowest loss value.
In order to solve the above technical problem, an embodiment of the present application further provides a sentence annotation device based on natural language, which adopts the following technical solutions:
a natural language based sentence annotation apparatus comprising:
a data receiving module, which is used for receiving a target sentence sequence input by a user and a sentence marking instruction for the target sentence sequence;
the model calling module is used for calling a preset sentence pattern marking model in response to the sentence marking instruction;
the first model processing module is used for inputting the target sentence sequence into the sentence pattern marking model, coding and converting the target sentence sequence through the sentence pattern marking model and converting the target sentence sequence into a target sentence vector;
and the second model processing module is used for calculating loss values of the target sentence vector under different marking results based on a preset weighted loss function and outputting a target marking sequence under a marking result corresponding to the lowest loss value.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the sentence annotation method based on natural language according to any one of the above technical solutions when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the natural language based sentence annotation method according to any one of the above technical solutions.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the embodiment of the application discloses a statement annotation method, a device, equipment and a storage medium based on natural language, and the statement annotation method based on natural language comprises the steps of firstly receiving a target statement sequence input by a user and a statement annotation instruction for the target statement sequence; then, responding to the statement marking instruction and calling a preset sentence marking model; inputting the target sentence sequence into a sentence pattern marking model, coding and converting the target sentence sequence through the sentence pattern marking model, and converting the target sentence sequence into a target sentence vector; and then further calculating loss values of the target sentence vector under different labeling results based on a preset weighted loss function, and outputting a target labeling sequence under the labeling result corresponding to the lowest loss value. According to the method, different weights are given to different samples through the weighting loss function, so that the influence caused by label imbalance can be effectively reduced, the labeling result can be further improved, and the effect of model labeling is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for the embodiments will be briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a diagram of an exemplary system architecture to which embodiments of the present application may be applied;
FIG. 2 is a flowchart of an embodiment of a method for natural language based sentence annotation according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a natural language-based sentence annotation device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an embodiment of a computer device in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It is noted that the terms "comprises," "comprising," and "having" and any variations thereof in the description and claims of this application and the drawings described above are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. In the claims, the description and the drawings of the specification of the present application, relational terms such as "first" and "second", and the like, may be used solely to distinguish one entity/action/object from another entity/action/object without necessarily requiring or implying any actual such relationship or order between such entities/actions/objects.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the relevant drawings in the embodiments of the present application.
As shown in fig. 1, the system architecture 100 may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium of communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the first terminal device 101, the second terminal device 102 and the third terminal device 103 to interact with the server 105 through the network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The first terminal device 101, the second terminal device 102 and the third terminal device 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
It should be noted that, the sentence annotation method based on natural language provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the sentence annotation apparatus based on natural language is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flowchart of one embodiment of the natural language-based sentence annotation method described in the embodiments of the present application is shown. The sentence annotation method based on natural language comprises the following steps:
Step 201: receiving a target sentence sequence input by a user and a sentence annotation instruction for the target sentence sequence.
When a user sends a request to a server for sentence annotation, the user needs to input an object requested to be annotated, namely a sentence sequence represented by a natural language, to the server, and simultaneously sends an annotation command for the sentence sequence to the server by editing a corresponding sentence annotation instruction.
In the embodiment of the present application, an electronic device (for example, the server/terminal device shown in fig. 1) on which the natural language-based sentence annotation method operates may receive the target sentence sequence and the sentence annotation instruction through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connections now known or developed in the future.
Step 202: and calling a preset sentence pattern marking model in response to the sentence marking instruction.
In this application, the sentence sequence is annotated mainly through a preset sentence-pattern annotation model. The sentence-pattern annotation model mainly comprises an input layer, a BERT layer, and a CRF (Conditional Random Field) layer. The BERT layer encodes the input sentence sequence and converts it into a sentence vector representing its semantics; the CRF layer serves as the output layer and outputs the finally annotated sentence sequence.
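As an illustrative sketch only (the patent provides no code), the three-layer structure described above could be assembled roughly as follows; the class name, the bert-base-chinese checkpoint, and the use of the pytorch-crf package are assumptions, not details from the patent:

```python
# Hypothetical sketch of the input / BERT / CRF structure described above.
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # from the pytorch-crf package (assumed dependency)

class SentencePatternTagger(nn.Module):
    def __init__(self, num_labels, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)   # BERT encoding layer
        self.emissions = nn.Linear(self.bert.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)        # CRF output layer

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)            # per-token label scores
        mask = attention_mask.bool()
        if labels is not None:
            # loss value: negative log-likelihood of this labeling result
            return -self.crf(scores, labels, mask=mask, reduction="mean")
        return self.crf.decode(scores, mask=mask)  # best-scoring label sequence
```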
In some embodiments of the present application, before step 202, the method for sentence annotation based on natural language further comprises:
confirming the type of the target scene and the initial labeling model;
acquiring a target training set which is matched with the target scene type and has an initial label;
and training the initial labeling model based on the target training set, and adjusting the initial labeling model into the sentence pattern labeling model.
For a sentence sequence expressed in natural language, the categories of its annotation labels are often closely tied to the specific scene type involved. Scene types may include part-of-speech tagging, named entity recognition, keyword tagging, role labeling, and the like. More specifically, some named entity recognition scenarios can be further divided by specific service type to identify entities more precisely.
After the scene type of the current sentence annotation task is determined, the selected initial annotation model is trained on a training set that corresponds to that scene type and carries labels appropriate to it, so that the initial annotation model is adapted into a model suitable for sentence annotation under that scene type.
Further, before the step of training the initial labeling model based on the target training set, the sentence labeling method based on natural language further includes:
dividing the target training set into k sub-training sets, wherein k is greater than or equal to 2 and k belongs to N;
training the initial annotation model on the k sub-training sets respectively to generate k sub-annotation models;
performing annotation prediction on the target training set through the k sub-annotation models respectively to obtain k target prediction results;
and comparing the initial labels with the k target prediction results, adding a first annotation label that appears in all of the k target prediction results into the initial labels, and deleting a second annotation label from the initial labels when the second annotation label appears in none of the k target prediction results.
To reduce labeling errors, data distillation can be performed on the data set used for model training before the model is trained.
In one embodiment of the present application, data distillation is performed on the target training set by k-fold cross validation. The target training set is first divided evenly into k sub-training sets, where k is a positive integer greater than or equal to 2. The initial model is then trained on the k sub-training sets respectively to obtain k sub-annotation models, and each sub-annotation model predicts labels for the whole target training set. In the k target prediction results, an annotation label shared by all k results can be regarded as an accurate label for the target training set: if it already exists in the initial labels, it is ignored; if not, it is added to the initial labels as an update. Conversely, if an annotation label in the initial labels appears in none of the k target prediction results, it can be regarded as an erroneous label and must be deleted from the initial labels.
Performing data distillation on the training set before model training improves the quality of the training set and reduces the influence of missing and erroneous labels.
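A minimal sketch of this distillation step, with train_model and predict as hypothetical stand-ins for the actual training and inference routines (only the label bookkeeping follows the text), might read:

```python
# Sketch of the k-fold data-distillation procedure described above.
def distill_labels(samples, initial_labels, train_model, predict, k=5):
    folds = [samples[i::k] for i in range(k)]           # k sub-training sets
    sub_models = [train_model(fold) for fold in folds]  # k sub-annotation models
    # each sub-model predicts labels for the whole target training set
    preds = [predict(m, samples) for m in sub_models]   # preds[j][i]: labels of sample i

    refined = []
    for i, labels in enumerate(initial_labels):
        # labels all k sub-models agree on are treated as reliable
        common = set.intersection(*[set(p[i]) for p in preds])
        kept = [l for l in labels if any(l in p[i] for p in preds)]  # drop labels seen nowhere
        added = [l for l in common if l not in labels]               # add unanimous labels
        refined.append(kept + added)
    return refined
```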
Step 203: and inputting the target sentence sequence into the sentence pattern marking model, coding and converting the target sentence sequence through the sentence pattern marking model, and converting the target sentence sequence into a target sentence vector.
The obtained target sentence sequence is annotated by the invoked sentence-pattern annotation model: the sequence enters the model through its input layer for processing and is then encoded by the BERT layer, which converts it into a sentence vector capable of representing its semantics.
Step 204: and calculating loss values of the target sentence vector under different marking results based on a preset weighted loss function, and outputting a target marking sequence under the marking result corresponding to the lowest loss value.
The sentence vector produced by the BERT layer's encoding is finally processed by the CRF layer, which selects the optimal labeling result from all possible labeling results and outputs it. When selecting the labeling result, the CRF layer screens candidates by the loss value computed for each labeling result: the smaller the loss value, the better the labeling result.
In this application, the CRF layer calculates the loss values of the sentence vector corresponding to the target sentence sequence under different labeling results according to a preconfigured weighted loss function. When computing the loss, the weighted loss function assigns different weights to the different entity samples recognized in the sentence vector during annotation, which avoids the poor annotation performance caused by giving all samples the same weight when the samples are imbalanced.
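Continuing the hypothetical tagger sketch above, selecting the labeling result with the lowest loss value could be written as follows. Note that this enumeration is for illustration only: in practice a CRF layer finds the optimal label sequence directly via Viterbi decoding (crf.decode in the earlier sketch) rather than scoring candidates one by one.

```python
# Sketch only: score candidate labelings with the CRF loss and keep the
# one whose loss value is lowest, mirroring the selection described above.
def best_labeling(model, input_ids, attention_mask, candidate_labelings):
    losses = [
        model(input_ids, attention_mask, labels=labels).item()
        for labels in candidate_labelings       # each: (batch, seq_len) label ids
    ]
    best = min(range(len(losses)), key=losses.__getitem__)
    return candidate_labelings[best], losses[best]
```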
In some embodiments of the present application, before step 204, the method for natural language based sentence annotation further comprises:
configuring a weighted loss function and acquiring a function verification set;
updating parameter values of the weighted loss function based on the gradient on the function verification set.
In a specific implementation manner of the embodiment of the present application, the weighted loss function is configured as

W = (1/N) · Σ_{i=1..N} w_i · f_i(θ)

where W is the loss value, N is the size of the training set used by the model, w_i is the weight of sample i, and f_i(θ) is the loss of the model with parameters θ on sample i (the samples here being the recognized entity samples). After the model has been trained on the training set, its parameters are further adjusted and its capability preliminarily evaluated on a validation set, and the value of w_i is updated by the gradient on the validation set. Adjusting and optimizing the weights through a small, clean verification set effectively reduces the influence caused by label imbalance.
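As a minimal sketch (not the patent's implementation), the weighted loss and a gradient-based weight update on the validation set could look like the following. The learning rate, the clamping step, and the concrete update rule are all assumptions; the patent only states that w_i is updated via the validation-set gradient.

```python
# Sketch of W = (1/N) * sum_i w_i * f_i(theta) and of updating the sample
# weights by the gradient of the loss on a small, clean validation set.
import torch

def weighted_loss(per_sample_losses, weights):
    # per_sample_losses[i] = f_i(theta), weights[i] = w_i
    return (weights * per_sample_losses).sum() / per_sample_losses.numel()

n_train = 1000                                    # assumed training-set size
weights = torch.ones(n_train, requires_grad=True)
weight_optimizer = torch.optim.SGD([weights], lr=1e-2)

def update_weights(validation_loss):
    # validation_loss must be computed through a model update that depends
    # on `weights`, so that the gradient can flow back to them
    weight_optimizer.zero_grad()
    validation_loss.backward()
    weight_optimizer.step()
    with torch.no_grad():
        weights.clamp_(min=0.0)                   # keep sample weights non-negative
```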
In some embodiments of the present application, after step 204, the method for natural language based sentence annotation further comprises:
calling a preset entity knowledge base;
judging whether the target sentence sequence contains a first entity that is absent from the entity knowledge base;
and if so, calculating a second entity in the entity knowledge base with the highest matching degree to the first entity, and replacing the first entity in the target annotation sequence with the second entity, or adding the second entity into the target annotation sequence as a new annotation label for the first entity.
After the final output of the CRF layer of the sentence annotation model, the input sentence sequence has been converted into an annotation sequence in which each entity is explained by an annotation label. However, owing to the diversity of natural language, some identified entities may carry different entity names in different texts, and the same entity name may refer to different entities in different contexts.
It is further understood that an existing knowledge base may not contain the entity name identified by the labels in the target sentence sequence, while the same entity may exist in the knowledge base under another name. Therefore, in some embodiments, entity disambiguation is also performed on the target annotation sequence to find, in the existing knowledge base, an unambiguous entity name representing the entity in the target sentence sequence. Adding this entity disambiguation operation further improves the annotation result and provides capability guarantees for downstream tasks such as sentence-pattern analysis and question-answer retrieval.
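As a rough illustration of this disambiguation step (the embed similarity function, the 0.8 threshold, and the knowledge-base format are all assumptions for illustration):

```python
# Sketch of the entity-disambiguation step: for each labeled entity that
# is absent from the knowledge base, find the knowledge-base entry with
# the highest matching degree and substitute (or keep) it.
def disambiguate(tagged_entities, knowledge_base, embed, threshold=0.8):
    resolved = []
    for ent in tagged_entities:
        if ent in knowledge_base:
            resolved.append(ent)                        # name already unambiguous
            continue
        best, best_sim = None, 0.0
        for candidate in knowledge_base:
            sim = float(embed(ent) @ embed(candidate))  # e.g. cosine on unit vectors
            if sim > best_sim:
                best, best_sim = candidate, sim
        # replace with the best match only if it is confident enough
        resolved.append(best if best is not None and best_sim >= threshold else ent)
    return resolved
```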
The natural language-based sentence annotation method provided in the embodiments of the present application assigns different weights to different samples through the weighted loss function, which effectively reduces the influence of label imbalance, further improves the annotation result, and improves the effect of model annotation.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential: they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to fig. 3, fig. 3 is a schematic structural diagram illustrating an embodiment of a natural language based sentence marking apparatus according to an embodiment of the present application. As an implementation of the method shown in fig. 2, the present application provides an embodiment of a natural language-based sentence tagging apparatus, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the sentence annotation device based on natural language according to the present embodiment includes:
a data receiving module 301; the sentence marking system is used for receiving a target sentence sequence input by a user and a sentence marking instruction for the target sentence sequence.
A model calling module 302; and the sentence marking module is used for responding to the sentence marking instruction and calling a preset sentence marking model.
A model first processing module 303; and the sentence pattern annotation model is used for inputting the target sentence sequence into the sentence pattern annotation model, coding and converting the target sentence sequence through the sentence pattern annotation model, and converting the target sentence sequence into a target sentence vector.
A model second processing module 304; and the system is used for calculating the loss values of the target sentence vector under different marking results based on a preset weighted loss function and outputting a target marking sequence under the marking result corresponding to the lowest loss value.
In some embodiments of the present application, the natural language-based sentence annotation apparatus further comprises a model training module. Before the model calling module 302 calls the preset sentence-pattern annotation model in response to the sentence annotation instruction, the model training module is configured to: confirm the target scene type and the initial annotation model; acquire a target training set that matches the target scene type and carries initial labels; and train the initial annotation model based on the target training set, adjusting it into the sentence-pattern annotation model.
Further, the model training module further comprises a data distillation submodule. Before the model training module trains the initial annotation model based on the target training set, the data distillation submodule is configured to: divide the target training set into k sub-training sets, wherein k is greater than or equal to 2 and k belongs to N; train the initial annotation model on the k sub-training sets respectively to generate k sub-annotation models; perform annotation prediction on the target training set through the k sub-annotation models respectively to obtain k target prediction results; and compare the initial labels with the k target prediction results, adding a first annotation label that appears in all of the k target prediction results into the initial labels, and deleting a second annotation label from the initial labels when the second annotation label appears in none of the k target prediction results.
In some embodiments of the present application, the natural language-based sentence annotation apparatus further comprises a function setting module. Before the second model processing module 304 calculates the loss values of the target sentence vector under different labeling results based on the preset weighted loss function, the function setting module is configured to configure the weighted loss function, acquire a function verification set, and update the parameter values of the weighted loss function based on the gradient on the function verification set.
In some embodiments of the present application, the natural language-based sentence annotation apparatus further comprises an entity disambiguation module. After the second model processing module 304 outputs the target annotation sequence under the labeling result corresponding to the lowest loss value, the entity disambiguation module is configured to: call a preset entity knowledge base; judge whether the target sentence sequence contains a first entity that is absent from the entity knowledge base; and if so, calculate a second entity in the entity knowledge base with the highest matching degree to the first entity, and replace the first entity in the target annotation sequence with the second entity, or add the second entity into the target annotation sequence as a new annotation label for the first entity.
The natural language-based sentence annotation apparatus provided in the embodiments of the present application assigns different weights to different samples through the weighted loss function, which effectively reduces the influence of label imbalance, further improves the annotation result, and improves the effect of model annotation.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62, and a network interface 63 that are communicatively connected to each other via a system bus. It is noted that only a computer device 6 with components 61-63 is shown, but it should be understood that not all of the shown components are required; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as program codes of a sentence annotation method based on natural language. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the program code stored in the memory 61 or process data, for example, execute the program code of the natural language-based statement annotation method.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The embodiment of the application discloses a computer device. When the processor executes the computer program stored in the memory, the device carries out the natural language-based sentence annotation method described above and can satisfy large-batch sentence annotation requirements: by assigning different weights to different samples through the weighted loss function, it effectively reduces the influence of label imbalance, improves the annotation result, and improves the effect of model annotation.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing a natural language based sentence annotation program, wherein the natural language based sentence annotation program is executable by at least one processor to cause the at least one processor to execute the steps of the natural language based sentence annotation method as described above.
It is emphasized that, to further ensure the privacy and security of the data involved, the data may also be stored in a node of a blockchain.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
In the above embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The modules or components may or may not be physically separate, and the components shown as modules or components may or may not be physical modules, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules or components can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The present application is not limited to the above-mentioned embodiments, the above-mentioned embodiments are preferred embodiments of the present application, and the present application is only used for illustrating the present application and not for limiting the scope of the present application, it should be noted that, for a person skilled in the art, it is still possible to make several improvements and modifications to the technical solutions described in the foregoing embodiments or to make equivalent substitutions for some technical features without departing from the principle of the present application. All equivalent structures made by using the contents of the specification and the drawings of the present application can be directly or indirectly applied to other related technical fields, and the same should be considered to be included in the protection scope of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not restrictive, of the broad invention, and that the appended drawings illustrate preferred embodiments of the invention and do not limit the scope of the invention. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All other embodiments that can be obtained by a person skilled in the art based on the embodiments in this application without any creative effort and all equivalent structures made by using the contents of the specification and the drawings of this application can be directly or indirectly applied to other related technical fields and are within the scope of protection of the present application.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

Claims (10)

1. A sentence annotation method based on natural language is characterized by comprising the following steps:
receiving a target sentence sequence input by a user and a sentence marking instruction for the target sentence sequence;
calling a preset sentence pattern marking model in response to the sentence marking instruction;
inputting the target sentence sequence into the sentence pattern tagging model, coding and converting the target sentence sequence through the sentence pattern tagging model, and converting the target sentence sequence into a target sentence vector;
and calculating loss values of the target sentence vector under different marking results based on a preset weighted loss function, and outputting a target marking sequence under the marking result corresponding to the lowest loss value.
2. The natural language based sentence annotation method of claim 1, wherein prior to the step of invoking a preset sentence annotation model in response to the sentence annotation command, the method further comprises:
confirming the type of the target scene and the initial labeling model;
acquiring a target training set which is matched with the target scene type and has an initial label;
and training the initial labeling model based on the target training set, and adjusting the initial labeling model into the sentence pattern labeling model.
3. The natural language based sentence annotation process of claim 2 wherein prior to the step of training the initial annotation model based on the target training set, the process further comprises:
dividing the target training set into k sub-training sets, wherein k is greater than or equal to 2 and k belongs to N;
training the initial labeling model through k sub-training sets respectively to generate k sub-labeling models;
respectively carrying out annotation prediction on the target training set through k sub-annotation models to obtain k target prediction results;
and comparing the initial labels with the k target prediction results, adding a first labeling label that appears in all of the k target prediction results into the initial labels, and deleting a second labeling label from the initial labels when the second labeling label appears in none of the k target prediction results.
4. The method for sentence annotation based on natural language according to claim 1, wherein before the step of calculating the loss value of the target sentence vector under different annotation results based on the preset weighted loss function, the method further comprises:
configuring a weighted loss function and acquiring a function verification set;
updating parameter values of the weighted loss function based on the gradient on the function verification set.
5. The natural language based sentence annotation method of claim 1, wherein after the step of outputting the target annotation sequence under the annotation result corresponding to the lowest loss value among them, the method further comprises:
calling a preset entity knowledge base;
judging whether the target sentence sequence contains a first entity that is absent from the entity knowledge base;
and if so, calculating a second entity in the entity knowledge base with the highest matching degree to the first entity, and replacing the first entity in the target labeling sequence with the second entity, or adding the second entity into the target labeling sequence as a new labeling label for the first entity.
6. A sentence annotation apparatus based on natural language, comprising:
a data receiving module, which is used for receiving a target sentence sequence input by a user and a sentence marking instruction for the target sentence sequence;
the model calling module is used for calling a preset sentence pattern marking model in response to the sentence marking instruction;
the first model processing module is used for inputting the target sentence sequence into the sentence pattern marking model, coding and converting the target sentence sequence through the sentence pattern marking model and converting the target sentence sequence into a target sentence vector;
and the second model processing module is used for calculating loss values of the target sentence vector under different marking results based on a preset weighted loss function and outputting a target marking sequence under a marking result corresponding to the lowest loss value.
7. The natural language based sentence annotation device of claim 6, wherein said natural language based sentence annotation device further comprises: a model training module; the model training module is configured to:
confirming the type of the target scene and the initial labeling model;
acquiring a target training set which is matched with the target scene type and has an initial label;
and training the initial labeling model based on the target training set, and adjusting the initial labeling model into the sentence pattern labeling model.
8. The natural language based sentence annotation device of claim 7, wherein the model training module further comprises: a data distillation submodule; the data distillation submodule is used for:
dividing the target training set into k sub-training sets, wherein k is greater than or equal to 2 and k belongs to N;
training the initial labeling model through k sub-training sets respectively to generate k sub-labeling models;
respectively carrying out annotation prediction on the target training set through k sub-annotation models to obtain k target prediction results;
and comparing the initial labels with the k target prediction results, adding a first labeling label that appears in all of the k target prediction results into the initial labels, and deleting a second labeling label from the initial labels when the second labeling label appears in none of the k target prediction results.
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the natural language based sentence annotation method according to any one of claims 1-5 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the natural language based sentence annotation method according to any one of claims 1-5.
CN202010936367.XA 2020-09-08 2020-09-08 Sentence marking method, device, equipment and storage medium based on natural language Active CN112084752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010936367.XA CN112084752B (en) 2020-09-08 2020-09-08 Sentence marking method, device, equipment and storage medium based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010936367.XA CN112084752B (en) 2020-09-08 2020-09-08 Sentence marking method, device, equipment and storage medium based on natural language

Publications (2)

Publication Number Publication Date
CN112084752A (en) 2020-12-15
CN112084752B CN112084752B (en) 2023-07-21

Family

ID=73732099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010936367.XA Active CN112084752B (en) 2020-09-08 2020-09-08 Sentence marking method, device, equipment and storage medium based on natural language

Country Status (1)

Country Link
CN (1) CN112084752B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860919A (en) * 2021-02-20 2021-05-28 平安科技(深圳)有限公司 Data labeling method, device and equipment based on generative model and storage medium
CN112966477A (en) * 2021-03-05 2021-06-15 浪潮云信息技术股份公司 Method and system for stating words and sentences based on sequence annotation
CN113283222A (en) * 2021-06-11 2021-08-20 平安科技(深圳)有限公司 Automatic report generation method and device, computer equipment and storage medium
CN114398492A (en) * 2021-12-24 2022-04-26 森纵艾数(北京)科技有限公司 Knowledge graph construction method, terminal and medium in digital field

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100250497A1 (en) * 2007-01-05 2010-09-30 Redlich Ron M Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor
CN109726291A (en) * 2018-12-29 2019-05-07 中科鼎富(北京)科技发展有限公司 Loss function optimization method, device and the sample classification method of disaggregated model
CN110390095A (en) * 2018-04-20 2019-10-29 株式会社Ntt都科摩 Sentence mask method and sentence annotation equipment
CN110619112A (en) * 2019-08-08 2019-12-27 北京金山安全软件有限公司 Pronunciation marking method and device for Chinese characters, electronic equipment and storage medium
CN110738041A (en) * 2019-10-16 2020-01-31 天津市爱贝叶斯信息技术有限公司 statement labeling method, device, server and storage medium
CN111144120A (en) * 2019-12-27 2020-05-12 北京知道创宇信息技术股份有限公司 Training sentence acquisition method and device, storage medium and electronic equipment
CN111597376A (en) * 2020-07-09 2020-08-28 腾讯科技(深圳)有限公司 Image data processing method and device and computer readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100250497A1 (en) * 2007-01-05 2010-09-30 Redlich Ron M Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor
CN110390095A (en) * 2018-04-20 2019-10-29 株式会社Ntt都科摩 Sentence mask method and sentence annotation equipment
CN109726291A (en) * 2018-12-29 2019-05-07 中科鼎富(北京)科技发展有限公司 Loss function optimization method, device and the sample classification method of disaggregated model
CN110619112A (en) * 2019-08-08 2019-12-27 北京金山安全软件有限公司 Pronunciation marking method and device for Chinese characters, electronic equipment and storage medium
CN110738041A (en) * 2019-10-16 2020-01-31 天津市爱贝叶斯信息技术有限公司 statement labeling method, device, server and storage medium
CN111144120A (en) * 2019-12-27 2020-05-12 北京知道创宇信息技术股份有限公司 Training sentence acquisition method and device, storage medium and electronic equipment
CN111597376A (en) * 2020-07-09 2020-08-28 腾讯科技(深圳)有限公司 Image data processing method and device and computer readable storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860919A (en) * 2021-02-20 2021-05-28 平安科技(深圳)有限公司 Data labeling method, device and equipment based on generative model and storage medium
WO2022174496A1 (en) * 2021-02-20 2022-08-25 平安科技(深圳)有限公司 Data annotation method and apparatus based on generative model, and device and storage medium
CN112966477A (en) * 2021-03-05 2021-06-15 浪潮云信息技术股份公司 Method and system for stating words and sentences based on sequence annotation
CN112966477B (en) * 2021-03-05 2023-08-29 浪潮云信息技术股份公司 Method and system for stating words and sentences based on sequence annotation
CN113283222A (en) * 2021-06-11 2021-08-20 平安科技(深圳)有限公司 Automatic report generation method and device, computer equipment and storage medium
CN113283222B (en) * 2021-06-11 2021-10-08 平安科技(深圳)有限公司 Automatic report generation method and device, computer equipment and storage medium
CN114398492A (en) * 2021-12-24 2022-04-26 森纵艾数(北京)科技有限公司 Knowledge graph construction method, terminal and medium in digital field

Also Published As

Publication number Publication date
CN112084752B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN112084752B (en) Sentence marking method, device, equipment and storage medium based on natural language
CN112328761B (en) Method and device for setting intention label, computer equipment and storage medium
CN114780727A (en) Text classification method and device based on reinforcement learning, computer equipment and medium
CN112466314A (en) Emotion voice data conversion method and device, computer equipment and storage medium
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN112528029A (en) Text classification model processing method and device, computer equipment and storage medium
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN113887237A (en) Slot position prediction method and device for multi-intention text and computer equipment
CN115687934A (en) Intention recognition method and device, computer equipment and storage medium
CN112084779A (en) Entity acquisition method, device, equipment and storage medium for semantic recognition
CN113220828B (en) Method, device, computer equipment and storage medium for processing intention recognition model
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN112949320B (en) Sequence labeling method, device, equipment and medium based on conditional random field
CN113723077A (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN112686053A (en) Data enhancement method and device, computer equipment and storage medium
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN115730603A (en) Information extraction method, device, equipment and storage medium based on artificial intelligence
CN114637831A (en) Data query method based on semantic analysis and related equipment thereof
CN114218356A (en) Semantic recognition method, device, equipment and storage medium based on artificial intelligence
CN113420869A (en) Translation method based on omnidirectional attention and related equipment thereof
CN112417886A (en) Intention entity information extraction method and device, computer equipment and storage medium
CN113255292B (en) End-to-end text generation method based on pre-training model and related equipment
CN112949317B (en) Text semantic recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant