CN112651513A - Information extraction method and system based on zero sample learning - Google Patents

Information extraction method and system based on zero sample learning

Info

Publication number
CN112651513A
Authority
CN
China
Prior art keywords
information extraction
server
information
training
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011527869.3A
Other languages
Chinese (zh)
Inventor
洪万福
钱智毅
黄海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yuanting Information Technology Co ltd
Original Assignee
Xiamen Yuanting Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yuanting Information Technology Co ltd filed Critical Xiamen Yuanting Information Technology Co ltd
Priority to CN202011527869.3A priority Critical patent/CN112651513A/en
Publication of CN112651513A publication Critical patent/CN112651513A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to an information extraction method and system based on zero sample learning, wherein the method comprises the following steps: S1: a user packages information extraction related resources through a client to generate an information extraction request, and sends the information extraction request to a server; S2: after receiving the information extraction request, the server performs simulated training according to the information extraction related resources in the request, and returns state information for each stage of the training process; S3: the server evaluates the model obtained from the simulated training; S4: the user sends a state query request to the server through the client, and the server queries the training state according to the machine learning unique ID in the received state query request; S5: the user sends an automatic stop request to the server through the client, and the server stops the corresponding simulated training according to the machine learning unique ID. With the invention, a user does not need to manually label new types of information, and zero-sample information extraction can be performed simply by importing existing information resources.

Description

Information extraction method and system based on zero sample learning
Technical Field
The invention relates to the field of information extraction, in particular to an information extraction method and system based on zero sample learning.
Background
With the arrival of a new wave of artificial intelligence in recent years, machine learning and deep learning technologies have been applied across industries and fields. Because data grows rapidly and comes in diverse forms, the problem of information overload is becoming increasingly serious, so quickly and accurately acquiring the required key information has become a central challenge. Information extraction technology extracts the important information contained in a text by extracting specified types of factual information, such as entities, relationships and events, from natural language text. Current information extraction methods are mainly divided into supervised and unsupervised learning methods; although both can complete the information extraction task, their greatest drawback is that they require a large amount of manually labeled training data and therefore consume considerable labor cost. Learning from only a few examples, or even from zero samples, so that the features of an object are derived solely from its semantic description, remains a key challenge for information extraction. Despite recent advances in important areas such as vision and language, standard supervised deep learning does not provide a satisfactory solution for rapidly learning new concepts from zero or a small number of samples.
To address these drawbacks, the industry has developed technologies that reduce the number of training samples and improve training efficiency, such as machine reasoning and pattern learning, but such technologies still require a certain number of training samples to train class-specific information into a model before classification, prediction and extraction can be performed on test samples.
Disclosure of Invention
In order to solve the above problems, the present invention provides an information extraction method and system based on zero sample learning.
The specific scheme is as follows:
An information extraction method based on zero sample learning comprises the following steps:
S1: a user packages information extraction related resources through a client to generate an information extraction request, and sends the information extraction request to a server;
S2: after receiving the information extraction request sent by the client, the server performs simulated training according to the information extraction related resources in the request, and returns state information for each stage of the training process;
S3: the server evaluates the model obtained from the simulated training;
S4: the user sends a state query request to the server through the client, and the server queries the training state according to the machine learning unique ID in the received state query request; if training is finished, the trained model is returned; if training is not finished, the current training status is returned;
S5: the user sends an automatic stop request to the server through the client, and the server stops the simulated training corresponding to the machine learning unique ID in the received automatic stop request.
Further, the information extraction related resources comprise, on the one hand, the data set, model, algorithm and parameters used in the machine learning process and, on the other hand, meta-information of the information extraction, the meta-information including the machine learning unique ID.
Further, the process of simulated training comprises the following steps:
S201: preprocessing the data set in the information extraction related resources;
S202: performing vectorized encoding on the preprocessed data set;
S203: performing zero sample learning on the vectorized data set through a learning engine, specifically: when no test data set is specified in the data set, the learning engine automatically generates a corresponding test data set by splitting the data in the data set; and a suitable algorithm model is selected from a plurality of algorithm models built into the learning engine for simulated training according to three factors: the size of the test data set, the distribution characteristics of the test data set, and the load balance of the server.
Further, the evaluation compares two types of indicators: algorithm indicators, including accuracy, recall, F1 value, AUC and confusion matrix; and performance indicators, including total time, training time, CPU usage, GPU usage, memory consumption, hard disk IO and network IO.
Further, the results and intermediate information generated during simulated training can be retrieved and sent to a display interface for display.
An information extraction system based on zero sample learning comprises a client and a server, wherein the client and the server each comprise a processor, a memory and a computer program stored in the memory and runnable on the processor, and the processor implements the steps of the method of the embodiments of the invention when executing the computer program.
By adopting the above technical scheme, the invention realizes a general information extraction scheme: zero-sample information extraction can be achieved simply by importing existing information resources, without manually labeling new types of information.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
The embodiment of the invention provides an information extraction method based on zero sample learning for extracting information in a specific field. As shown in fig. 1, the method comprises the following steps:
S1: the user packages the information extraction related resources through the client to generate an information extraction request, and sends the information extraction request to the server.
The information extraction related resources are the resources related to the machine learning process within the information extraction process and include data, models, algorithms, parameters and the like. In this embodiment they specifically comprise two parts: the first is the meta-information of the information extraction, which includes the machine learning unique ID, name, description, creator, creation time, permissions and the like; the second is the information extraction resources themselves, including the data set, an evaluator (optional), parameter options (optional) and the like. In this embodiment the data set is text data from the user's field of interest.
An example of the data format of the information extraction request is as follows:
(The request data format example appears as images in the original publication: Figure BDA0002851180980000041 and Figure BDA0002851180980000051.)
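The images are not reproduced here. As a rough, hedged illustration only, the following Python sketch shows how a client might package the meta-information and resources described above into a request; every field name (task_id, dataset, parameter_options and so on) is an assumption for illustration, not the format used in the patent.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical request payload; the field names are illustrative assumptions,
# not the actual format shown in the original publication.
extraction_request = {
    "meta": {
        "task_id": str(uuid.uuid4()),            # machine learning unique ID
        "name": "contract-entity-extraction",
        "description": "Extract entities from contract texts",
        "creator": "analyst01",
        "created_at": datetime.now(timezone.utc).isoformat(),
        "permissions": ["read", "train"],
    },
    "resources": {
        "dataset": {"type": "text", "uri": "datasets/contracts.jsonl"},
        "evaluator": None,                        # optional
        "parameter_options": {"max_epochs": 20},  # optional
    },
}

# The client serializes the packaged resources and sends them to the server.
payload = json.dumps(extraction_request, ensure_ascii=False)
print(payload[:120], "...")
```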
s2: after receiving an information extraction request sent by a client, the server extracts related resources according to the information in the information extraction request to perform simulated training, and returns state information of each stage in the training process. The state information includes the machine learning unique ID and whether the simulated training process was successfully started (in this embodiment, whether the learning engine was successfully started).
The process of simulated training comprises the following steps:
S201: preprocess the data set in the information extraction related resources.
The preprocessing in this embodiment includes data cleaning, data integration, data reduction, data transformation, and the like.
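As a minimal sketch of what such preprocessing could look like for a text data set, assuming a pandas DataFrame with a hypothetical text column (the patent does not prescribe concrete operations):

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning, reduction and transformation steps (integration omitted)."""
    df = df.copy()
    # Data cleaning: strip whitespace, drop empty and duplicated samples.
    df["text"] = df["text"].astype(str).str.strip()
    df = df[df["text"] != ""].drop_duplicates(subset="text")
    # Data reduction: keep only the columns needed for training.
    df = df[["text", "label"]] if "label" in df.columns else df[["text"]]
    # Data transformation: normalize internal whitespace.
    df["text"] = df["text"].str.replace(r"\s+", " ", regex=True)
    return df.reset_index(drop=True)

sample = pd.DataFrame({"text": ["  Acme Corp  signed a deal. ", "", "Acme Corp  signed a deal."]})
print(preprocess(sample))
```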
S202: vectorizing coding is carried out on the preprocessed data set.
Vectorization coding comprises word vector coding, semantic coding, entity coding and the like on text information, feature coding and the like on picture video audio data.
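A minimal sketch of word-level vectorized encoding, using a plain TF-IDF representation as a stand-in; the patent does not name a specific encoder, so this is only one possible choice:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "Acme Corp signed a supply agreement with Beta Ltd",
    "Beta Ltd appointed a new chief executive",
]

# Word-vector style encoding of the preprocessed texts; semantic or entity
# encoding (e.g. contextual embeddings) could replace this step.
vectorizer = TfidfVectorizer()
encoded = vectorizer.fit_transform(texts)   # sparse matrix, one row per text
print(encoded.shape, len(vectorizer.get_feature_names_out()))
```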
S203: and performing zero sample learning on the data set subjected to the vector quantization coding through a learning engine.
The zero sample learning process includes that when a test data set is not determined in the data set, a learning engine automatically generates a corresponding test data set in a mode of splitting data in the data set; selecting a proper algorithm model from a plurality of algorithm models built in the learning engine to perform simulation training according to three factors of the size of the test data set, the distribution characteristics of the test data set and the load balance of the server; and storing the trained model after training.
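For the first part of this process, automatically generating a test data set by splitting, a minimal scikit-learn sketch follows; the split ratio, stratification and random seed are assumptions, since the patent does not specify them:

```python
from sklearn.model_selection import train_test_split

def ensure_test_split(samples, labels, test_ratio=0.2, seed=42):
    """If no test data set was supplied, carve one out of the provided data."""
    return train_test_split(
        samples, labels, test_size=test_ratio, random_state=seed, stratify=labels
    )

X_train, X_test, y_train, y_test = ensure_test_split(
    samples=["doc a", "doc b", "doc c", "doc d", "doc e", "doc f"],
    labels=["ORG", "PER", "ORG", "PER", "ORG", "PER"],
)
print(len(X_train), len(X_test))
```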
The learning engine is based on Adaptive Learning theory and has a plurality of built-in algorithm models, including Bi-LSTM (Bidirectional Long Short-Term Memory), IDCNN (Iterated Dilated Convolutional Neural Networks), BERT-LSTM (a BERT, Bidirectional Encoder Representations from Transformers, language model connected to a Long Short-Term Memory network) and the like. Bi-LSTM is suited to large data volumes with balanced class labels and places low demands on the server; IDCNN, compared with Bi-LSTM, is suited to ordinary data volumes; BERT-LSTM can handle data sets with all kinds of label distributions even at extremely low data volumes, but places high demands on the server.
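The patent states only the three selection factors, so the following sketch is a purely hypothetical illustration of how the engine's choice among the built-in models might be expressed; the thresholds, the balance rule and the load scale are all invented for this example.

```python
from collections import Counter

def select_model(test_set_size, labels, server_load, small=500, large=50_000):
    """Heuristic sketch: pick a built-in model from data size, label
    distribution and server load. All thresholds are illustrative assumptions."""
    counts = Counter(labels)
    balanced = max(counts.values()) <= 2 * min(counts.values())
    if test_set_size >= large and balanced:
        return "Bi-LSTM"    # large, balanced data; low demands on the server
    if test_set_size <= small and server_load < 0.5:
        return "BERT-LSTM"  # very little data; needs a lightly loaded server
    return "IDCNN"          # ordinary data volumes

print(select_model(200, ["ORG"] * 120 + ["PER"] * 80, server_load=0.3))  # BERT-LSTM
```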
The results and intermediate information generated during simulated training can be retrieved and sent to a display interface for display.
S3: and the server evaluates the model obtained after the simulation training.
The evaluation in this embodiment includes evaluation of the effectiveness and performance of the application of the model. Specifically, a comparative analysis mode is adopted, and comparison is mainly performed based on two types of indexes: the first is algorithm indexes including accuracy, recall, F1 value, AUC, confusion matrix and the like; the second is performance index, which includes total time consumption, training time consumption, CPU utilization, GPU utilization, memory consumption, hard disk IO, network IO, and the like. The cross comparison results of the indexes can be displayed in real time through a visual interface.
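A minimal sketch of how the two kinds of indicators could be computed, using scikit-learn for the algorithm indicators and the standard library plus psutil for a few of the performance figures; GPU usage and IO counters are omitted, and the toy labels below are not from the patent:

```python
import time
import psutil
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1]                 # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1]                 # toy model predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7]    # toy predicted probabilities

start = time.perf_counter()
# ... training would run here ...
training_seconds = time.perf_counter() - start

algorithm_indicators = {
    "accuracy": accuracy_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "auc": roc_auc_score(y_true, y_score),
    "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
}
performance_indicators = {
    "training_seconds": training_seconds,
    "cpu_percent": psutil.cpu_percent(interval=0.1),
    "memory_percent": psutil.virtual_memory().percent,
}
print(algorithm_indicators, performance_indicators)
```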
S4: a user sends a state query request to a server through a client, the server queries a training state according to a machine learning uniqueness ID in the received state query request, and if the training state is finished, a trained model is returned; if the training status is not complete (obtained by training progress bar percentage), then the training situation is returned, such as the training progress percentage.
An example data format for the status query request is as follows:
(The status query data format example appears as an image in the original publication: Figure BDA0002851180980000071.)
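The image is not reproduced here; as an assumed illustration only, a status query keyed on the machine learning unique ID and the two possible server replies might look like the following (all field names are hypothetical):

```python
import json

# Hypothetical status query keyed on the machine learning unique ID.
status_query = {"task_id": "9f1c2c4e-0000-0000-0000-000000000000",
                "action": "query_status"}

# Two possible server replies: training still running, or finished with a model handle.
in_progress_reply = {"task_id": status_query["task_id"],
                     "finished": False, "progress_percent": 64}
finished_reply = {"task_id": status_query["task_id"],
                  "finished": True, "model_uri": "models/9f1c2c4e.bin"}

print(json.dumps(in_progress_reply))
```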
s5: the user sends an automatic stop request to the server through the client, and the server stops the simulation training corresponding to the machine learning unique ID according to the machine learning unique ID in the received automatic stop request.
An example of the data format of the autostop request is as follows:
(The automatic stop request data format example appears as an image in the original publication: Figure BDA0002851180980000072.)
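Again as an assumed illustration only, an automatic stop request carrying the same hypothetical unique ID might look like this:

```python
import json

# Hypothetical stop request; the server matches the ID to the running
# simulated-training job and shuts it down.
stop_request = {"task_id": "9f1c2c4e-0000-0000-0000-000000000000",
                "action": "stop_training"}
print(json.dumps(stop_request))
```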
Compared with the prior art, this embodiment has the following advantages:
First, optimization across machine learning platforms can be realized, giving a wider range of application;
Second, adaptive learning iteration can be carried out rapidly when a user uploads information data resources from a new industry field;
Third, the method has high availability and high extensibility: for large-scale application, only information resources from the relevant industry fields need to be uploaded, and the client side does not need to be adjusted;
Fourth, state-of-the-art information extraction algorithms are built in, so the method can be put directly into production use.
Embodiment two:
The invention also provides an information extraction system based on zero sample learning, comprising a client and a server. The client and the server each comprise a memory, a processor and a computer program stored in the memory and runnable on the processor, and the steps of the method of embodiment one are implemented when the processor executes the computer program.
Further, as an executable scheme, the client may be a mobile phone, a desktop computer, a notebook computer or another computing device, and the server may be a desktop computer, a notebook computer, a cloud server or another computing device. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the client and the server, connecting the various parts of the entire client or server through various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the client and the server by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created during use of the client or server, and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An information extraction method based on zero sample learning is characterized by comprising the following steps:
S1: a user packages information extraction related resources through a client to generate an information extraction request, and sends the information extraction request to a server;
S2: after receiving the information extraction request sent by the client, the server performs simulated training according to the information extraction related resources in the request, and returns state information for each stage of the training process;
S3: the server evaluates the model obtained from the simulated training;
S4: the user sends a state query request to the server through the client, and the server queries the training state according to the machine learning unique ID in the received state query request; if training is finished, the trained model is returned; if training is not finished, the current training status is returned;
S5: the user sends an automatic stop request to the server through the client, and the server stops the simulated training corresponding to the machine learning unique ID in the received automatic stop request.
2. The information extraction method based on zero sample learning according to claim 1, characterized in that: the information extraction related resources comprise, on the one hand, the data set, model, algorithm and parameters used in the machine learning process and, on the other hand, meta-information of the information extraction, the meta-information including the machine learning unique ID.
3. The information extraction method based on zero sample learning according to claim 1, characterized in that: the process of simulated training comprises the following steps:
S201: preprocessing the data set in the information extraction related resources;
S202: performing vectorized encoding on the preprocessed data set;
S203: performing zero sample learning on the vectorized data set through a learning engine, specifically: when no test data set is specified in the data set, the learning engine automatically generates a corresponding test data set by splitting the data in the data set; and a suitable algorithm model is selected from a plurality of algorithm models built into the learning engine for simulated training according to three factors: the size of the test data set, the distribution characteristics of the test data set, and the load balance of the server.
4. The information extraction method based on zero sample learning according to claim 1, characterized in that: the evaluation compares two types of indicators: algorithm indicators, including accuracy, recall, F1 value, AUC and confusion matrix; and performance indicators, including total time, training time, CPU utilization, GPU utilization, memory consumption, hard disk IO and network IO.
5. The information extraction method based on zero sample learning according to claim 1, characterized in that: the results and intermediate information generated during simulated training can be retrieved and sent to a display interface for display.
6. An information extraction system based on zero sample learning is characterized in that: comprising a client and a server, each comprising a processor, a memory and a computer program stored in said memory and running on said processor, said processor implementing the steps of the method according to any one of claims 1 to 5 when executing said computer program.
CN202011527869.3A 2020-12-22 2020-12-22 Information extraction method and system based on zero sample learning Pending CN112651513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011527869.3A CN112651513A (en) 2020-12-22 2020-12-22 Information extraction method and system based on zero sample learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011527869.3A CN112651513A (en) 2020-12-22 2020-12-22 Information extraction method and system based on zero sample learning

Publications (1)

Publication Number Publication Date
CN112651513A true CN112651513A (en) 2021-04-13

Family

ID=75359175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011527869.3A Pending CN112651513A (en) 2020-12-22 2020-12-22 Information extraction method and system based on zero sample learning

Country Status (1)

Country Link
CN (1) CN112651513A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447277A (en) * 2018-10-19 2019-03-08 厦门渊亭信息科技有限公司 A kind of general machine learning is super to join black box optimization method and system
CN111913563A (en) * 2019-05-07 2020-11-10 广东小天才科技有限公司 Man-machine interaction method and device based on semi-supervised learning
CN111274814A (en) * 2019-12-26 2020-06-12 浙江大学 Novel semi-supervised text entity information extraction method
WO2020191282A2 (en) * 2020-03-20 2020-09-24 Futurewei Technologies, Inc. System and method for multi-task lifelong learning on personal device with improved user experience

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张鲁宁 等: "零样本学习研究进展" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840644A (en) * 2022-05-17 2022-08-02 抖音视界(北京)有限公司 Method, apparatus, device and medium for managing machine learning system

Similar Documents

Publication Publication Date Title
CN109564575B (en) Classifying images using machine learning models
CN113610239B (en) Feature processing method and feature processing system for machine learning
CN109492772B (en) Method and device for generating information
CN108140143A (en) Regularization machine learning model
EP3743827A1 (en) Training image and text embedding models
WO2018086401A1 (en) Cluster processing method and device for questions in automatic question and answering system
CN111667022A (en) User data processing method and device, computer equipment and storage medium
CN105144164A (en) Scoring concept terms using a deep network
WO2022048363A1 (en) Website classification method and apparatus, computer device, and storage medium
CN111061881A (en) Text classification method, equipment and storage medium
US11373117B1 (en) Artificial intelligence service for scalable classification using features of unlabeled data and class descriptors
CN107818491A (en) Electronic installation, Products Show method and storage medium based on user's Internet data
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN113934851A (en) Data enhancement method and device for text classification and electronic equipment
CN110069558A (en) Data analysing method and terminal device based on deep learning
CN112651513A (en) Information extraction method and system based on zero sample learning
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
CN117251761A (en) Data object classification method and device, storage medium and electronic device
CN116484105A (en) Service processing method, device, computer equipment, storage medium and program product
CN115391589A (en) Training method and device for content recall model, electronic equipment and storage medium
CN114417944B (en) Recognition model training method and device, and user abnormal behavior recognition method and device
CN111091198A (en) Data processing method and device
CN114298118B (en) Data processing method based on deep learning, related equipment and storage medium
CN117093717B (en) Similar text aggregation method, device, equipment and storage medium thereof
CN113886547B (en) Client real-time dialogue switching method and device based on artificial intelligence and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination