CN113095423B - Stream data classification method based on online anti-deduction learning and realization device thereof - Google Patents

Info

Publication number
CN113095423B
Authority
CN
China
Prior art keywords
pseudo
facts
knowledge base
data
deduction
Prior art date
Legal status
Active
Application number
CN202110430304.1A
Other languages
Chinese (zh)
Other versions
CN113095423A (en)
Inventor
李宇峰
周志华
黄宇轩
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
2021-04-21
Filing date
2021-04-21
Publication date
2024-05-28
Application filed by Nanjing University
Priority to CN202110430304.1A
Publication of CN113095423A
Application granted
Publication of CN113095423B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a stream data classification method based on online abductive learning (translated in the title as "anti-deduction learning") and a device implementing it. Unlabeled (or weakly labeled) streaming data is fed into the current learner to obtain predicted pseudo-labels; abductive reasoning is then performed on the predicted pseudo-labels using the knowledge base (and any weak supervision labels) to obtain revised pseudo-labels; finally, the learner is updated with the revised pseudo-labels. This process runs continuously as streaming data arrives. On the one hand, the invention can exploit domain knowledge expressed in first-order logic, and the online abductive learning method is stated to outperform traditional online learning methods; on the other hand, it can process large volumes of streaming data quickly, make use of unlabeled or weakly labeled data, and handle new categories that may appear in the data.

Description

Stream data classification method based on online anti-deduction learning and realization device thereof
Technical Field
The invention relates to a stream data classification method based on online abductive learning and a device implementing it, and belongs to the technical field of artificial intelligence, specifically pattern recognition tasks on large-scale data.
Background
Online learning is a mainstream family of machine learning algorithms and has achieved notable results in classification tasks on streaming and large-scale data. It mainly targets settings in which large amounts of labeled data arrive continuously while device storage is limited, so the current model is updated with each newly arrived training sample. Most existing online learning techniques rely on data-driven machine learning models, which have several drawbacks: they need large amounts of annotated data, they struggle to use weakly annotated data, and they struggle to incorporate domain knowledge.
Disclosure of Invention
Objective: to address the problems and shortcomings of the prior art, the invention provides a stream data classification method based on online abductive learning and a device implementing it.
Technical solution: a stream data classification method based on online abductive learning receives streaming data and feeds it into the current learner to obtain pseudo-labels predicted for the current samples. The predicted pseudo-labels are converted into pseudo-facts, and abductive reasoning is performed using the knowledge base and any weakly labeled data to obtain revised pseudo-facts. Finally, the revised pseudo-facts are converted back into pseudo-labels and used to update the learner. This process runs continuously as streaming data arrives. The method thereby classifies weakly annotated or unannotated data, via online abductive learning, in scenarios where streaming training data and a knowledge base coexist.
The streaming data is unlabeled or carries weak supervision labels.
The online abductive learning classification process consists of three parts, executed continuously as data arrives:
(1) Pseudo-label prediction: take one batch of streaming data, feed all of its samples into the learner, and output the pseudo-labels of the corresponding samples.
(2) Abductive reasoning labeling: convert the pseudo-labels into pseudo-facts, input them into the knowledge base, and use a logic algorithm to check whether the pseudo-facts are consistent with the knowledge base. If they are consistent, the pseudo-labels are left unchanged; if not, the method attempts to revise the pseudo-facts according to the principle of minimizing inconsistency so that the revised pseudo-facts agree with the knowledge base, then converts them back into pseudo-labels and returns them to the learner.
(3) Learner update: the pseudo-labels obtained by abductive reasoning are treated as true labels and used, together with the samples of the current batch, to update the learner.
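The three stages above can be sketched as a single loop over arriving batches. This is a minimal Python illustration under stated assumptions, not the patented implementation: `online_abductive_learning` and its `predict`, `abduce`, and `update` callables are hypothetical names, and the demo's memorizing learner and parity-rule "knowledge base" are toy stand-ins chosen only to make the loop runnable.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

X = TypeVar("X")  # one raw input sample
Y = TypeVar("Y")  # one (pseudo-)label


def online_abductive_learning(
    stream: Iterable[Sequence[X]],
    predict: Callable[[X], Y],
    abduce: Callable[[X, Y], Y],
    update: Callable[[Sequence[X], List[Y]], None],
) -> int:
    """Run predict -> abduce -> update once per arriving batch.

    Returns the number of batches processed.
    """
    n_batches = 0
    for batch in stream:
        # (1) Pseudo-label prediction with the current learner.
        pseudo = [predict(x) for x in batch]
        # (2) Abductive revision of each pseudo-label against the knowledge base.
        revised = [abduce(x, y) for x, y in zip(batch, pseudo)]
        # (3) Treat the revised pseudo-labels as true labels and update the learner.
        update(batch, revised)
        n_batches += 1
    return n_batches


# Toy demo (hypothetical): a memorizing learner and a parity rule as the "knowledge base".
memory = {}
n = online_abductive_learning(
    [[1, 2], [3]],  # two arriving batches of integer samples
    predict=lambda x: memory.get(x, 0),  # unseen samples default to label 0
    abduce=lambda x, y: y if (y - x) % 2 == 0 else y + 1,  # rule: label parity matches input
    update=lambda xs, ys: memory.update(zip(xs, ys)),
)
print(n, memory)  # 2 {1: 1, 2: 0, 3: 1}
```

Keeping the three stages as injected callables mirrors the text's point that the learner can be any model (neural network, decision tree, etc.) and the knowledge base any reasoning procedure.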
Locating the erroneous labels. The principle of minimizing inconsistency is applied; in other words, the fewest possible pseudo-facts are modified so that the revised facts are as consistent as possible with the knowledge base. When the number of labels exceeds a preset number, the search can use a non-gradient (derivative-free) optimization method; when it is below the preset number, an exhaustive search can be performed directly. Specifically, the method first tries to find the fact corresponding to a single pseudo-label, marks it as abducible, and performs abduction to obtain a revised pseudo-fact consistent with the knowledge base. If no such fact exists, that is, no single modified fact can be made consistent with the knowledge base, the method tries the facts corresponding to two labels, marks them as abducible, and attempts abduction to obtain pseudo-labels consistent with the knowledge base. If the result is still inconsistent, the number of modifiable labels keeps increasing until a set of facts is found that can be modified to be consistent with the knowledge base.
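For the small-n case, the "one fact, then two, then more" search can be sketched as an exhaustive enumeration by increasing number of modified positions. This is a hypothetical illustration: `min_revision`, its parameters, and the toy rule "the digits must sum to the weak label" are assumptions for demonstration, and the knowledge base is reduced to a Python predicate rather than a logic engine.

```python
from itertools import combinations, product
from typing import Callable, List, Optional, Sequence


def min_revision(
    facts: Sequence[int],
    values: Sequence[int],
    consistent: Callable[[List[int]], bool],
) -> Optional[List[int]]:
    """Revise as few pseudo-facts as possible so that `consistent` holds.

    Tries k = 0, 1, 2, ... modified positions in turn, so the first hit changes
    the minimum number of facts. Cost grows as C(n, k) * len(values) ** k,
    which is why this only suits small n.
    """
    n = len(facts)
    for k in range(n + 1):
        for positions in combinations(range(n), k):  # which facts are abducible
            for new_vals in product(values, repeat=k):  # candidate values for them
                candidate = list(facts)
                for p, v in zip(positions, new_vals):
                    candidate[p] = v
                if consistent(candidate):
                    return candidate
    return None  # no revision of any size satisfies the knowledge base


# Hypothetical KB: the digits must sum to the weak label 7.
print(min_revision([1, 2, 3], range(10), lambda f: sum(f) == 7))  # [2, 2, 3]
```

The `k = 0` pass simply re-checks the unmodified facts, so an already consistent pseudo-label is returned untouched.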
A device implementing the stream data classification method based on online abductive learning comprises a processor and a memory coupled to the processor. The memory stores a domain knowledge base and instructions that, when executed by the processor, cause the processor to perform the online abductive learning stream data classification method described above.
Drawings
FIG. 1 is a flow chart of the classification process of the method of the present invention;
FIG. 2 is a flow chart of the pseudo-label prediction of the method of the present invention;
FIG. 3 is a flow chart of the abductive reasoning labeling process of the method of the present invention;
FIG. 4 is a block diagram of the device of the present invention.
Detailed Description
The present application is further illustrated below with specific embodiments. These embodiments are illustrative only and do not limit the scope of the application; after reading this application, equivalent modifications made by those skilled in the art fall within the scope defined by the appended claims.
The stream data classification method based on online abductive learning performs online learning on the input streaming data using a knowledge base and a learner to be trained. The learner may be any model suited to the task, such as a neural network or a decision tree; it may be pre-trained with supervision before online learning begins, or used without pre-training. The knowledge base may contain domain knowledge rules expressed in first-order logic, or other language expressions and programs that can be used for reasoning and computation.
The stream data classification method based on online abductive learning can be executed by an electronic device, such as a terminal device or a server device; in other words, it may be implemented in software or hardware installed on such a device. Server devices include, but are not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. Terminal devices include, but are not limited to, any intelligent terminal such as a smartphone, personal computer, notebook computer, tablet computer, e-reader, network television, or wearable device.
As shown in FIG. 1, for continuously arriving streaming data, a batch of data is first taken and the current knowledge base is updated. Then pseudo-label prediction (the flow shown in FIG. 2), abduction of the pseudo-labels, and the learner update are performed in sequence. These three steps are repeated until the proportion of samples in the batch that are consistent with the knowledge base is greater than a threshold r. After one batch has been learned, the next batch of streaming data is taken and the process repeats. Because learning is performed online over the streaming samples, the time cost is low and training is fast. In addition, because only weakly annotated or unannotated data is needed, the method places lower demands on data annotation than traditional online learning methods.
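The per-batch stopping rule can be sketched as follows. This is a minimal Python illustration, assuming that "consistent" means the learner's current prediction for a sample already satisfies the knowledge base; `learn_batch`, its parameters, and the `max_rounds` safety cap are hypothetical additions (the text gives no round limit), and the parity rule in the demo is a toy knowledge base.

```python
from typing import Any, Callable, List, Sequence


def learn_batch(
    batch: Sequence[Any],
    predict: Callable[[Any], Any],
    is_consistent: Callable[[Any, Any], bool],
    abduce: Callable[[Any, Any], Any],
    update: Callable[[Sequence[Any], List[Any]], None],
    r: float,
    max_rounds: int = 100,  # hypothetical safety cap; the text gives no round limit
) -> float:
    """Repeat predict -> abduce -> update on one batch until more than a
    fraction r of its samples get KB-consistent predictions.

    Returns the last measured consistency ratio.
    """
    if not batch:
        raise ValueError("batch must be non-empty")
    ratio = 0.0
    for _ in range(max_rounds):
        pseudo = [predict(x) for x in batch]
        ratio = sum(is_consistent(x, y) for x, y in zip(batch, pseudo)) / len(batch)
        if ratio > r:
            break  # enough of the batch agrees with the knowledge base
        update(batch, [abduce(x, y) for x, y in zip(batch, pseudo)])
    return ratio


# Toy demo (hypothetical): memorizing learner, parity rule as the knowledge base.
memory = {}
ratio = learn_batch(
    [1, 2, 3],
    predict=lambda x: memory.get(x, 0),
    is_consistent=lambda x, y: (y - x) % 2 == 0,
    abduce=lambda x, y: y if (y - x) % 2 == 0 else y + 1,
    update=lambda xs, ys: memory.update(zip(xs, ys)),
    r=0.9,
)
print(ratio, memory)  # 1.0 {1: 1, 2: 0, 3: 1}
```

In the demo, the first round finds only 1 of 3 predictions consistent, the update fixes the other two, and the second round stops with a ratio of 1.0.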
Abductive reasoning labeling process of the online abductive learning method
The abductive reasoning labeling process of the online abductive learning method consists of the following three sub-steps:
1. Consistency check against the knowledge base. First, the learner predicts for a sample a pseudo-label $y_{\text{pseudo}}$ composed of n sub-labels, i.e. $y_{\text{pseudo}} = [y_1, y_2, \ldots, y_n]$. The pseudo-label is converted into pseudo-facts $z_{\text{pseudo}} = [z_1, z_2, \ldots, z_n]$; the pseudo-facts, together with any weak supervision label attached to the sample, are then input into the knowledge base, and a logic algorithm verifies whether the pseudo-facts are consistent with the knowledge base. If they are consistent, the pseudo-label is not modified. Otherwise, sub-steps 2 and 3 below are performed to attempt abduction on the pseudo-facts.
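Sub-step 1 can be sketched with a toy knowledge base. This is a hypothetical Python illustration: the fact encoding `("digit", i, y_i)`, the helper names `to_facts` and `kb_consistent`, and both rules are assumptions. The text allows first-order logic rules or any program usable for reasoning, and here the knowledge base is simply a Python predicate rather than a logic engine.

```python
from typing import List, Optional, Set, Tuple

Fact = Tuple[str, int, int]  # (predicate, position i, value), e.g. ("digit", 0, 3)


def to_facts(pseudo_label: List[int]) -> Set[Fact]:
    """Convert y_pseudo = [y_1, ..., y_n] into pseudo-facts z_i = digit(i, y_i)."""
    return {("digit", i, y) for i, y in enumerate(pseudo_label)}


def kb_consistent(facts: Set[Fact], weak_label: Optional[int] = None) -> bool:
    """Hypothetical knowledge base with two rules:
    1. every digit(i, v) fact has 0 <= v <= 9;
    2. if a weak supervision label is attached, the digits sum to it.
    """
    values = [v for pred, _, v in facts if pred == "digit"]
    if any(not 0 <= v <= 9 for v in values):
        return False
    if weak_label is not None and sum(values) != weak_label:
        return False
    return True


facts = to_facts([3, 4])
print(kb_consistent(facts, weak_label=7), kb_consistent(facts, weak_label=8))  # True False
```

Rule 2 shows how a weak supervision label (here, only the sum) constrains the sub-labels without revealing any of them directly.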
2. Locating the erroneous pseudo-facts. The principle of minimizing inconsistency is used; in other words, the fewest possible facts are modified so that the revised facts agree with the knowledge base. When n is relatively large (greater than a preset value), the search can use a non-gradient optimization method; when n is relatively small (less than the preset value), an exhaustive search can be performed directly. Specifically, the method may first try to find a single pseudo-fact $z_i$, mark it as abducible, and pass it to sub-step 3 for abductive reasoning to obtain a revised pseudo-fact consistent with the knowledge base. If no such $z_i$ exists, that is, modifying any single fact $z_i$ cannot yield facts consistent with the knowledge base, the method tries to find two facts $z_i$ and $z_j$, marks both as abducible, and attempts abduction to obtain revised facts consistent with the knowledge base. If this still fails, the number of modifiable positions keeps increasing until a set of positions is found whose modification makes the pseudo-facts consistent with the knowledge base.
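For large n, the text calls for a non-gradient optimization method but does not name one. The sketch below swaps in plain random search, one of the simplest derivative-free methods, purely as an assumed stand-in; `random_search_revision`, `budget_per_k`, and `seed` are hypothetical names. It keeps the preference for fewer modified positions but, unlike exhaustive search, no longer guarantees the minimum.

```python
import random
from typing import Callable, List, Optional, Sequence


def random_search_revision(
    facts: Sequence[int],
    values: Sequence[int],
    consistent: Callable[[List[int]], bool],
    budget_per_k: int = 1000,
    seed: int = 0,
) -> Optional[List[int]]:
    """Derivative-free alternative to exhaustive search for large n.

    For k = 1, 2, ... it samples up to `budget_per_k` random choices of k
    positions and replacement values, returning the first consistent revision.
    Smaller k is tried first, but minimality is not guaranteed.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    if consistent(list(facts)):
        return list(facts)  # already consistent: no revision needed
    n = len(facts)
    for k in range(1, n + 1):
        for _ in range(budget_per_k):
            candidate = list(facts)
            for p in rng.sample(range(n), k):  # k distinct abducible positions
                candidate[p] = rng.choice(values)
            if consistent(candidate):
                return candidate
    return None  # budget exhausted at every k


facts = [0] * 50  # n = 50 pseudo-facts, too many for comfortable exhaustive search
revised = random_search_revision(facts, range(10), lambda f: sum(f) == 9)
print(revised is not None and sum(revised) == 9)
```

Other derivative-free optimizers (e.g. simulated annealing or evolutionary search) could be plugged in the same place; which one the inventors use is not stated.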
3. Abductive reasoning yields the revised pseudo-label. Sub-step 2 provides the abducible positions of the facts. These positions are marked as abducible, and the facts (with the weak supervision labels, if any) are passed to the knowledge base for abduction so that the values at these positions are revised to be consistent with the knowledge base; the revised facts are finally converted back into a pseudo-label.
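Sub-step 3, in which the positions are already fixed, can be sketched as filling only the abducible slots while all other facts stay untouched. This is a hypothetical Python illustration: `abduce_at` and its parameters are assumed names, the toy knowledge base is again "the digits sum to the weak label", and facts and pseudo-labels coincide in this toy encoding, so no separate back-conversion is shown.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional, Sequence


def abduce_at(
    facts: Sequence[int],
    abducible: Iterable[int],
    values: Sequence[int],
    consistent: Callable[[List[int]], bool],
) -> Optional[List[int]]:
    """Fill the positions marked abducible with values that make the facts
    consistent with the knowledge base; all other positions stay fixed."""
    positions = list(abducible)
    for assignment in product(values, repeat=len(positions)):
        candidate = list(facts)
        for p, v in zip(positions, assignment):
            candidate[p] = v
        if consistent(candidate):
            return candidate
    return None  # these positions alone cannot restore consistency


# Hypothetical KB: digits must sum to the weak label 9; only position 1 is abducible.
print(abduce_at([2, 5, 4], abducible=[1], values=range(10), consistent=lambda f: sum(f) == 9))  # [2, 3, 4]
```

A `None` result corresponds to the case in sub-step 2 where the chosen positions fail and the search moves on to other or more positions.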
The abductive reasoning labeling process of online abductive learning, built from sub-steps 1, 2, and 3, is shown in FIG. 3. Specifically, for an input sample and its pseudo-label, sub-step 1 is applied first: at 310 and 320, the pseudo-facts converted from the pseudo-label are checked against the knowledge base, and if they are consistent, the input pseudo-label is returned directly at 390. Next, sub-step 2 searches at 330, 340, 350, and 385 for the positions of the erroneous pseudo-facts; this procedure calls sub-step 3 (360, 370, 380) to perform abductive reasoning and obtain the revised pseudo-facts. Finally, the revised pseudo-facts are converted back into a pseudo-label at 390. Because the method tries the revisions with the fewest modifications first, the returned revised pseudo-label satisfies the principle of minimizing inconsistency.
FIG. 4 shows a schematic diagram of an online abductive learning device according to an embodiment of the invention. As shown in FIG. 4, the online abductive learning device 400 may include at least one processor 410, a memory 420, a storage 430 (e.g., a non-volatile memory), and a communication interface 440, which are connected together via a bus 450.
Bus 450 provides a communication channel between the components of the device 400. The at least one processor 410 may control the device 400 and may execute an operating system, firmware, and the like to drive it. The at least one processor 410 executes at least one computer-readable instruction (i.e., the elements described above as implemented in software) stored or encoded in memory. Memory 420 may serve as working memory for processor 410 and may include volatile memory, such as static random access memory (SRAM) or dynamic random access memory (DRAM), or non-volatile memory, such as phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (ReRAM), or ferroelectric RAM (FRAM). Storage 430 may store data generated by the at least one processor 410, as well as operating system or firmware code executed by the at least one processor 410 and the domain knowledge base. Storage 430 may include non-volatile memory (such as NAND flash memory, PRAM, MRAM, ReRAM, or FRAM). Communication interface 440 may include a network communication interface and a user input interface (such as a mouse, keyboard, microphone, or camera) for receiving information such as streaming data.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 410 to: feed the input unlabeled (or weakly labeled) streaming data into the current learner to obtain pseudo-labels predicted for the current samples; perform abductive reasoning on the predicted pseudo-labels using the knowledge base (and the weakly labeled data) to obtain revised pseudo-labels; and finally update the learner in the online abductive learning device with the revised pseudo-labels.
When executed, the computer-executable instructions stored in the memory cause the processor 410 to perform the various operations and functions described above in connection with FIGS. 1-3 in the various embodiments of the invention.
According to one embodiment, a program product such as a machine-readable medium (e.g., a non-transitory machine-readable medium) is provided. The machine-readable medium may carry instructions (i.e., the elements described above as implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIGS. 1-3 in the various embodiments of the invention. Specifically, a system or apparatus may be provided with a readable storage medium storing software program code that implements the functions of any of the above embodiments, and a computer or processor of that system or apparatus may read out and execute the instructions stored in the medium.
In this case, the program code itself read from the readable medium may implement the functions of any of the above embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present specification.
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-Rs, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud over a communications network.

Claims (2)

1. A stream data classification method based on online abductive learning, characterized in that: streaming data is received and fed into the current learner to obtain pseudo-labels predicted for the current samples; the predicted pseudo-labels are converted into pseudo-facts, and abductive reasoning is performed using a knowledge base and weakly labeled data to obtain revised pseudo-facts; finally, the revised pseudo-facts are converted into pseudo-labels and the learner is updated; the above process is executed continuously as streaming data arrives; and weakly annotated or unannotated data is classified by the online abductive learning method in scenarios where streaming training data and a knowledge base coexist;
the streaming data is unlabeled or carries weak supervision labels;
the stream data classification method based on online abductive learning comprises three parts, executed continuously as data arrives:
(1) pseudo-label prediction: taking one batch of streaming data, feeding all input samples into the learner, and obtaining the pseudo-labels of the corresponding samples as output;
(2) abductive reasoning labeling: converting the pseudo-labels into pseudo-facts, inputting the pseudo-facts into the knowledge base, and verifying by logical computation whether the pseudo-facts are consistent with the knowledge base; if they are consistent, the pseudo-labels are not modified; if they are inconsistent, attempting to modify the pseudo-facts according to the principle of minimizing inconsistency so that the revised pseudo-facts are consistent with the knowledge base, converting the revised pseudo-facts into pseudo-labels, and returning them to the learner;
(3) learner update: taking the pseudo-labels obtained by abductive reasoning as true labels and using them, together with the samples of the current batch, to update the learner;
the fewest pseudo-facts are modified so that the revised facts are as consistent as possible with the knowledge base; when the number of labels is larger than a preset number, the search uses a non-gradient optimization method, and when it is smaller than the preset number, an exhaustive search is performed directly; the process of locating the erroneous labels is: first, trying to find the fact corresponding to a single pseudo-label, marking it as abducible, and performing abduction to obtain a revised pseudo-fact consistent with the knowledge base; if no such fact exists, that is, no single pseudo-fact can be made consistent with the knowledge base by modification, searching for the pseudo-facts corresponding to two labels, marking them as abducible, and attempting abduction to obtain pseudo-labels consistent with the knowledge base; if the result is still inconsistent with the knowledge base, continuing to increase the number of modifiable labels until facts are found that can be modified to be consistent with the knowledge base.
2. A device implementing a stream data classification method based on online abductive learning, characterized by comprising: a processor, and a memory coupled to the processor; the memory stores a domain knowledge base and instructions that, when executed by the processor, cause the processor to perform the online abductive learning stream data classification method described above.
CN202110430304.1A 2021-04-21 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof Active CN113095423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110430304.1A CN113095423B (en) 2021-04-21 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110430304.1A CN113095423B (en) 2021-04-21 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof

Publications (2)

Publication Number Publication Date
CN113095423A CN113095423A (en) 2021-07-09
CN113095423B true CN113095423B (en) 2024-05-28

Family

ID=76679033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110430304.1A Active CN113095423B (en) 2021-04-21 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof

Country Status (1)

Country Link
CN (1) CN113095423B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101116072A (en) * 2005-02-03 2008-01-30 英国电讯有限公司 Method and system for categorized presentation of search results
CN111222648A (en) * 2020-01-15 2020-06-02 深圳前海微众银行股份有限公司 Semi-supervised machine learning optimization method, device, equipment and storage medium
WO2020140597A1 (en) * 2018-12-31 2020-07-09 华南理工大学 Online active learning method applicable to unlabeled unbalanced data stream

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yu-Xuan Huang et al., "Semi-Supervised Abductive Learning and Its Application to Theft Judicial Sentencing", 2020 IEEE International Conference on Data Mining (ICDM), 2020, pp. 1072-1073. *
戴望州 (Wang-Zhou Dai), "一阶逻辑领域知识与机器学习的结合研究" (Research on Combining First-Order Logic Domain Knowledge with Machine Learning), doctoral dissertation (electronic journal), pp. 74-85. *

Also Published As

Publication number Publication date
CN113095423A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN106845530B (en) character detection method and device
CN109086873B (en) Training method, recognition method and device of recurrent neural network and processing equipment
CN111985229A (en) Sequence labeling method and device and computer equipment
CN107797989A (en) Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN113010683B (en) Entity relationship identification method and system based on improved graph attention network
US20220244937A1 (en) Utilizing machine learning models for automated software code modification
CN113989549A (en) Semi-supervised learning image classification optimization method and system based on pseudo labels
CN114896395A (en) Language model fine-tuning method, text classification method, device and equipment
CN115563627B (en) Binary program vulnerability static analysis method based on man-machine cooperation
CN113283414A (en) Pedestrian attribute identification method, related equipment and computer readable storage medium
CN111914159A (en) Information recommendation method and terminal
CN113779988A (en) Method for extracting process knowledge events in communication field
CN112966088A (en) Unknown intention recognition method, device, equipment and storage medium
CN113254649B (en) Training method of sensitive content recognition model, text recognition method and related device
CN113095423B (en) Stream data classification method based on online anti-deduction learning and realization device thereof
CN113869609A (en) Method and system for predicting confidence of frequent subgraph of root cause analysis
CN115879450B (en) Gradual text generation method, system, computer equipment and storage medium
CN115019029B (en) RPA element intelligent positioning method based on neural automaton
CN117218408A (en) Open world target detection method and device based on causal correction learning
EP4064078A1 (en) Utilizing a neural network model to generate a reference image based on a combination of images
CN112052649B (en) Text generation method, device, electronic equipment and storage medium
CN112364649A (en) Named entity identification method and device, computer equipment and storage medium
CN113283598A (en) Model training method and device, storage medium and electronic equipment
CN115130364A (en) Groove filling model training method and device, computing equipment and training system
CN111727108A (en) Method, apparatus, system, and program for controlling robot, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant