CN113095423A - Stream data classification method based on online inverse deductive learning and implementation device thereof


Info

Publication number
CN113095423A
Authority
CN
China
Legal status
Granted
Application number
CN202110430304.1A
Other languages
Chinese (zh)
Other versions
CN113095423B (en)
Inventor
李宇峰 (Yu-Feng Li)
周志华 (Zhi-Hua Zhou)
黄宇轩 (Yu-Xuan Huang)
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
2021-04-21
Filing date
2021-04-21
Publication date
2021-07-09
Application filed by Nanjing University
Priority to CN202110430304.1A
Publication of CN113095423A
Application granted
Publication of CN113095423B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a stream data classification method based on online inverse deductive (abductive) learning, and a device implementing it. Unlabeled (or weakly supervised) stream data are fed into the current learner to obtain predicted pseudo-labels for the current data; an inverse deductive reasoning operation is then performed on the predicted pseudo-labels using a knowledge base (and any weakly supervised labels) to obtain revised pseudo-labels; finally, the learner is updated with the revised pseudo-labels. This process repeats continuously as stream data arrive. On the one hand, the invention can exploit domain knowledge expressed in first-order logic, so that online inverse deductive learning can surpass the performance of traditional online learning methods; on the other hand, it can process large volumes of stream data quickly, make use of unlabeled or weakly labeled data, and handle new classes that may appear in the data.

Description

Technical Field
The invention relates to a stream data classification method based on online inverse deductive learning and a device implementing it, and belongs to the technical field of artificial intelligence, specifically pattern recognition tasks on large-scale data.
Background
Online learning is a mainstream machine learning paradigm and is effective for classification tasks on stream data, large-scale data, and the like. It mainly addresses settings in which large amounts of labeled data arrive continuously while device storage is limited, so the current model is updated with each newly arrived training sample. Most existing online learning techniques rely on data-driven machine learning models and have several drawbacks: they require large amounts of labeled data, and they have difficulty exploiting weakly labeled data and domain knowledge.
Disclosure of Invention
Purpose of the invention: to address the problems and shortcomings of the prior art, the invention provides a stream data classification method based on online inverse deductive learning and a device implementing it.
Technical solution: a stream data classification method based on online inverse deductive learning receives stream data and feeds it into the current learner to obtain pseudo-labels predicted for the current samples; converts the predicted pseudo-labels into pseudo-facts and performs an inverse deductive reasoning operation using a knowledge base and weakly labeled data to obtain revised pseudo-facts; and finally converts the revised pseudo-facts back into pseudo-labels and updates the learner. This process is executed continuously as stream data arrive. In this way, when streaming training data and a knowledge base are both available, the online inverse deductive learning method classifies weakly labeled or unlabeled data.
The stream data are either unlabeled or carry weakly supervised labels.
The method consists mainly of three parts, which are executed continuously as data arrive:
(1) Pseudo-label prediction: take a batch of stream data, feed all input samples into the learner, and obtain the pseudo-labels of the corresponding samples as output.
(2) Inverse deductive reasoning labeling: convert the pseudo-labels into pseudo-facts, input them into the knowledge base, and use logical computation to verify whether they are consistent with the knowledge base. If they are consistent, the pseudo-labels are not modified; if not, the pseudo-facts are modified according to the principle of minimizing inconsistency, so that the modified pseudo-facts are consistent with the knowledge base, and are then converted back into pseudo-labels and returned to the learner.
(3) Learner update: the pseudo-labels obtained by inverse deductive reasoning are treated as true labels and used, together with the samples of the current batch, to update the learner.
Locating erroneous labels. The principle of minimizing inconsistency is applied; in other words, the fewest possible pseudo-facts are modified so that the modified facts are consistent with the knowledge base. When the number of labels is larger than a preset number, the search can use a non-gradient optimization method; when it is smaller, exhaustive search can be performed directly. Specifically, the method first tries to find a single pseudo-fact, corresponding to one pseudo-label, which, once marked as abducible (revisable by inverse deductive reasoning) and revised, yields pseudo-facts consistent with the knowledge base. If no such fact exists, that is, modifying any single pseudo-fact cannot achieve consistency, the method tries pairs of pseudo-facts, marking both as abducible and attempting inverse deductive reasoning to obtain pseudo-labels consistent with the knowledge base. If consistency is still not achieved, the number of modifiable labels keeps increasing until a modification consistent with the knowledge base is found.
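The preset threshold on the number of labels can be motivated by counting. If at most k of n pseudo-labels may change, each to any of |Y| values (including its current one), exhaustive search examines up to $\sum_{j=0}^{k} \binom{n}{j} |Y|^{j}$ candidates; with |Y| = 10 and k = 3 this is 1,331 candidates for n = 3 but 4,103,801 for n = 30. The sketch below computes that count and, for large n, substitutes plain random sampling as one possible non-gradient search. The patent does not name a specific optimizer, so `random_revision`, its budget `tries_per_k`, and the checksum knowledge base `checksum_kb` are hypothetical illustrations, not the inventors' implementation.

```python
import random
from math import comb


def candidates_up_to(n, num_labels, k):
    """Worst-case number of candidates an exhaustive search examines when at
    most k of n pseudo-labels may change, each to any of num_labels values."""
    return sum(comb(n, j) * num_labels ** j for j in range(k + 1))


def checksum_kb(labels):
    """Hypothetical knowledge base: the last label equals the sum of the
    other labels modulo 10."""
    return sum(labels[:-1]) % 10 == labels[-1]


def random_revision(pseudo, kb_consistent, label_space, max_k, tries_per_k=1000, seed=0):
    """Random-sampling stand-in for an unspecified non-gradient search.

    For k = 1, 2, ..., max_k it samples k positions and random replacement
    labels, returning the first candidate accepted by kb_consistent together
    with the sampled positions. Returns (None, []) if the budget runs out.
    """
    if kb_consistent(pseudo):
        return list(pseudo), []  # already consistent: nothing to revise
    rng = random.Random(seed)
    n = len(pseudo)
    for k in range(1, min(max_k, n) + 1):  # fewest modified positions first
        for _ in range(tries_per_k):
            positions = sorted(rng.sample(range(n), k))
            candidate = list(pseudo)
            for i in positions:
                candidate[i] = rng.choice(label_space)
            if kb_consistent(candidate):
                return candidate, positions
    return None, []
```

Unlike exhaustive search, a failed random search does not prove that no k-position revision exists, so the minimality guarantee only holds for the exhaustive branch used when the number of labels is small.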
A device implementing the stream data classification method based on online inverse deductive learning comprises a processor and a memory coupled to the processor; the memory stores a domain knowledge base and instructions that, when executed by the processor, cause the processor to perform the stream data classification method based on online inverse deductive learning described above.
Drawings
FIG. 1 is a flow chart of a classification process of the method of the present invention;
FIG. 2 is a flow chart of pseudo-label prediction in the method of the present invention;
FIG. 3 is a flow chart of the inverse deductive reasoning labeling process of the method of the present invention;
FIG. 4 is a block diagram of the device of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are purely exemplary and do not limit the scope of the invention; after reading this disclosure, various equivalent modifications of the invention made by those skilled in the art fall within the scope of the appended claims.
The stream data classification method based on online inverse deductive learning performs online learning on input stream data using a knowledge base and a learner to be trained. The learner can be any learner suited to the task, such as a neural network or a decision tree. It need not be trained before learning starts, although it may also be given supervised pre-training. The knowledge base can contain domain knowledge rules expressed in first-order logic, or programs written in other languages that perform reasoning and computation.
The stream data classification method based on online inverse deductive learning can be executed by an electronic device, such as a terminal device or a server device; in other words, it may be performed by software or hardware installed on the terminal device or server device. Server devices include, but are not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. Terminal devices include, but are not limited to, any intelligent terminal such as a smartphone, personal computer, notebook computer, tablet computer, e-reader, network television, or wearable device.
The overall flow is shown in FIG. 1. For continuously arriving stream data, a batch of data is taken first and the current knowledge base is updated. Pseudo-label prediction (flow shown in FIG. 2), inverse deductive revision of the pseudo-labels, and learner updating are then performed in sequence. These three steps are repeated until the proportion of samples in the batch that are consistent with the knowledge base exceeds a threshold r. Once learning on one batch is complete, the next batch of the stream is taken and the process repeats. Because learning is performed online over the stream samples, the time overhead is small and training is fast. In addition, because only weakly labeled or unlabeled data are needed, the labeling requirements are lower than those of traditional online learning methods.
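The per-batch loop described above can be sketched as follows. The learner and the inverse deductive step are passed in as plain functions (`predict`, `abduce`, `update`, and `consistent` are hypothetical names), so any learner and any knowledge base can be plugged in. The `max_rounds` cap is an added assumption to guarantee termination, and the per-batch knowledge-base update mentioned in the text is not modeled.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Sequence

Sample = Sequence[Hashable]  # one sample = a sequence of n sub-inputs
Label = List[int]            # its pseudo-label [y1, ..., yn]


def learn_batch(
    batch: List[Sample],
    predict: Callable[[Hashable], int],
    abduce: Callable[[Label], Optional[Label]],
    update: Callable[[Sample, Label], None],
    consistent: Callable[[Label], bool],
    r: float = 0.9,
    max_rounds: int = 10,
) -> int:
    """Repeat predict -> abduce -> update on one batch until the share of
    samples whose predicted pseudo-labels are KB-consistent exceeds r.
    Returns the number of update rounds performed."""
    for rounds in range(max_rounds):
        # (1) pseudo-label prediction for every sample in the batch
        pseudo = [[predict(x) for x in sample] for sample in batch]
        ratio = sum(consistent(p) for p in pseudo) / len(batch)
        if ratio > r:
            return rounds
        # (2) inverse deductive revision of each pseudo-label
        revised = [abduce(p) for p in pseudo]
        # (3) revised pseudo-labels are treated as true labels for the update
        for sample, labels in zip(batch, revised):
            if labels is not None:
                update(sample, labels)
    return max_rounds


def run_stream(batches: Iterable[List[Sample]], **kwargs) -> List[int]:
    """Process batches in arrival order; returns the rounds used per batch."""
    return [learn_batch(batch, **kwargs) for batch in batches]
```

For example, with a toy rule y1 + y2 = y3 and a memorizing learner that initially predicts [1, 2, 4] for a sample, a single update with the revision [1, 2, 3] is enough for that batch to pass the threshold.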
Inverse deductive reasoning labeling process of the online inverse deductive learning method
The inverse deductive reasoning labeling process of the online inverse deductive learning method comprises the following three sub-steps:
1. Consistency check against the knowledge base. First, consider a pseudo-label $y_{\text{pseudo}}$ predicted by the learner for a sample, composed of n sub-labels, i.e., $y_{\text{pseudo}} = [y_1, y_2, \dots, y_n]$. The pseudo-label is converted into pseudo-facts $z_{\text{pseudo}} = [z_1, z_2, \dots, z_n]$; these pseudo-facts, together with any weakly supervised label attached to the sample, are input into the knowledge base, and logical computation is used to verify whether they are consistent with it. If they are consistent, the pseudo-label is not modified. Otherwise, sub-steps 2 and 3 below are performed, attempting inverse deductive reasoning on the pseudo-facts.
2. Locating erroneous pseudo-facts. Following the principle of minimizing inconsistency, the pseudo-facts are brought into agreement with the knowledge base by modifying as few of them as possible. When n is larger than a preset value, the search can use a non-gradient optimization method; when n is smaller than the preset value, exhaustive search can be performed directly. Specifically, the method first tries to find a single pseudo-fact $z_i$, marks it as abducible (revisable by inverse deductive reasoning), and passes it to sub-step 3 for inverse deductive reasoning to obtain modified pseudo-facts consistent with the knowledge base. If no such $z_i$ exists, in other words, modifying any single pseudo-fact $z_i$ cannot yield pseudo-facts consistent with the knowledge base, the method tries pairs of pseudo-facts $z_i$ and $z_j$, marks both as abducible, and attempts inverse deductive reasoning to obtain modified pseudo-facts consistent with the knowledge base. If consistency is still not achieved, the number of modifiable pseudo-facts keeps increasing until positions are found whose modification makes the pseudo-facts consistent with the knowledge base.
3. Inverse deductive reasoning to obtain revised pseudo-labels. The pseudo-fact positions found in sub-step 2 are set as abducible; these pseudo-facts (and weakly supervised labels, if any) then undergo inverse deductive reasoning against the knowledge base, so that the pseudo-facts at these positions are modified to be consistent with it; finally, they are converted back into pseudo-labels.
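Sub-steps 1 and 3 can be illustrated on a toy task. In this sketch, each sample has three sub-labels, each pseudo-fact is encoded as a tuple `("label", position, value)`, the knowledge base is the single rule z1 + z2 = z3, and a weakly supervised label is a dictionary that pins some positions to known values. All of these encodings, and the names `to_facts`, `to_label`, `kb_consistent`, and `abduce_at`, are assumptions made for illustration; the text allows the knowledge base to be first-order logic rules or any reasoning program.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Fact = Tuple[str, int, int]  # hypothetical encoding: ("label", position, value)


def to_facts(pseudo_label: Sequence[int]) -> List[Fact]:
    """Sub-step 1: y_pseudo = [y1..yn] -> z_pseudo = [z1..zn]."""
    return [("label", i, y) for i, y in enumerate(pseudo_label)]


def to_label(facts: List[Fact]) -> List[int]:
    """Inverse conversion, ordered by position."""
    return [value for _, _, value in sorted(facts, key=lambda f: f[1])]


def kb_consistent(facts: List[Fact], weak: Optional[Dict[int, int]] = None) -> bool:
    """Hypothetical knowledge base for 3-part samples: z1 + z2 = z3.
    A weakly supervised label, if given, must also agree with the facts."""
    values = {pos: value for _, pos, value in facts}
    if weak and any(values.get(p) != v for p, v in weak.items()):
        return False
    return values[0] + values[1] == values[2]


def abduce_at(facts, positions, weak=None, labels=range(10)) -> Optional[List[Fact]]:
    """Sub-step 3: treat the facts at `positions` as abducible and search their
    values so the revised facts are consistent; all other facts stay fixed."""
    for values in product(labels, repeat=len(positions)):
        candidate = list(facts)
        for pos, value in zip(positions, values):
            candidate[pos] = ("label", pos, value)
        if kb_consistent(candidate, weak):
            return candidate
    return None  # these positions alone cannot restore consistency
```

For instance, marking only the third fact of [1, 2, 4] as abducible yields [1, 2, 3], whereas a weak label fixing the third value to 4 makes that revision impossible, so the search in sub-step 2 has to mark other positions.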
FIG. 3 shows the inverse deductive reasoning labeling process built from sub-steps 1 to 3. First, following sub-step 1, steps 310 and 320 check whether the pseudo-facts converted from the input sample's pseudo-label are consistent with the knowledge base; if they are, the input pseudo-label is returned directly at step 390. Otherwise, following sub-step 2, steps 330, 340, 350, and 385 search for the positions of the erroneous pseudo-facts, and the process calls sub-step 3 (steps 360, 370, and 380) to perform inverse deductive reasoning and obtain the modified pseudo-facts. Finally, at step 390, the modified pseudo-facts are converted back into pseudo-labels. Because the method tries the smallest number of modifications first, the returned revised pseudo-label satisfies the principle of minimizing inconsistency.
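The FIG. 3 flow can be sketched end to end with exhaustive search, trying one abducible position, then two, and so on. The comments map code regions to the step numbers given in the text; that grouping is approximate, since the figure itself is not reproduced here. The function name `revise`, the list-based label encoding, the weak-label dictionary, and first-found tie-breaking among equally small revisions are assumptions.

```python
from itertools import combinations, product


def revise(pseudo, kb_consistent, labels, weak=None):
    """Sketch of the FIG. 3 flow with exhaustive search.

    Returns (revised pseudo-label, modified positions), or (None, ()) if no
    revision is consistent with the knowledge base and the weak label.
    """
    weak = weak or {}

    def ok(candidate):
        # consistency with the weakly supervised label and the knowledge base
        return all(candidate[p] == v for p, v in weak.items()) and kb_consistent(candidate)

    # Steps 310/320: already consistent -> return the input unchanged (390).
    if ok(pseudo):
        return list(pseudo), ()
    n = len(pseudo)
    # Steps 330/340/350/385: mark 1, 2, ... positions as abducible, fewest first.
    for k in range(1, n + 1):
        for positions in combinations(range(n), k):
            # Steps 360/370/380: abduce values for the marked positions only.
            for values in product(labels, repeat=k):
                candidate = list(pseudo)
                for i, v in zip(positions, values):
                    candidate[i] = v
                if ok(candidate):
                    return candidate, positions  # step 390: back to a pseudo-label
    return None, ()
```

Because `k` grows from 1 upward, the first consistent candidate found modifies as few positions as possible, which is the minimality property the text claims; the cost is the exponential enumeration that motivates the non-gradient alternative for large n.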
FIG. 4 is a schematic diagram of an online inverse deductive learning device according to an embodiment of the invention. As shown in FIG. 4, the online inverse deductive learning device 400 may include at least one processor 410, a memory 420, a storage 430 (e.g., non-volatile storage), and a communication interface 440, which are connected together via a bus 450.
The bus 450 provides a communication channel between the components of the online inverse deductive learning device 400. The at least one processor 410 may control the device 400, for example by executing an operating system, firmware, and the like to drive it. The at least one processor 410 executes at least one computer-readable instruction (i.e., the elements described above as implemented in software) stored or encoded in memory. The memory 420 may serve as working memory for the processor 410 and may include volatile memory (such as static random access memory (SRAM) or dynamic random access memory (DRAM)) or non-volatile memory (such as phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (ReRAM), or ferroelectric RAM (FRAM)). The storage 430 may store data generated by the at least one processor 410, operating system or firmware code executed by the at least one processor 410, and the domain knowledge base. The storage 430 may include non-volatile memory (such as NAND flash memory, PRAM, MRAM, ReRAM, or FRAM). The communication interface 440 may include a network communication interface and a user input interface (such as a mouse, keyboard, microphone, or camera) for receiving information such as stream data.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 410 to: obtain pseudo-labels predicted for the current samples by feeding input unlabeled (or weakly labeled) stream data into the current learner; perform an inverse deductive reasoning operation on the predicted pseudo-labels using the knowledge base (and weakly labeled data) to obtain revised pseudo-labels; and finally update the learner in the online inverse deductive learning device with the revised pseudo-labels.
When executed, the computer-executable instructions stored in the memory cause the processor 410 to perform the various operations and functions described above in connection with FIGS. 1-3 in the various embodiments of the present invention.
According to one embodiment, a program product such as a machine-readable medium (e.g., a non-transitory machine-readable medium) is provided. The machine-readable medium may carry instructions (i.e., the elements described above as implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIGS. 1-3 in the various embodiments of the invention. Specifically, a system or apparatus may be provided with a readable storage medium storing software program code that implements the functions of any of the above embodiments, and a computer or processor of the system or apparatus reads out and executes the instructions stored in the readable storage medium.
In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of this specification.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tapes, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.

Claims (7)

1. A stream data classification method based on online inverse deductive learning, characterized in that: stream data are received and fed into the current learner to obtain pseudo-labels predicted for the current samples; the predicted pseudo-labels are converted into pseudo-facts, and an inverse deductive reasoning operation is performed using a knowledge base and weakly labeled data to obtain revised pseudo-facts; finally, the revised pseudo-facts are converted into pseudo-labels and the learner is updated; the above process is executed continuously as stream data arrive; and, when streaming training data and a knowledge base are both available, weakly labeled or unlabeled data are classified by the online inverse deductive learning method.
2. The method of claim 1, wherein the stream data are unlabeled or carry weakly supervised labels.
3. The method of claim 1, wherein the pseudo-label prediction process is: taking a batch of stream data, feeding all input samples into the learner, and obtaining the pseudo-labels of the corresponding samples as output.
4. The method of claim 1, wherein the inverse deductive reasoning labeling process is: converting the pseudo-labels into pseudo-facts, inputting them into a knowledge base, and using logical computation to verify whether the pseudo-facts are consistent with the knowledge base; if they are consistent, not modifying the pseudo-labels; and if they are inconsistent, modifying the pseudo-facts according to the principle of minimizing inconsistency so that the modified pseudo-facts are consistent with the knowledge base, converting them into pseudo-labels, and returning them to the learner.
5. The method of claim 1, wherein the learner update process is: using the pseudo-labels obtained by inverse deductive reasoning as true labels and using them, together with the samples of the current batch, to update the learner.
6. The method of claim 1, wherein the modified facts are made consistent with the knowledge base by modifying as few pseudo-facts as possible; when the number of labels is larger than a preset number, a non-gradient optimization method is used for the search, and when it is smaller than the preset number, exhaustive search is performed directly; the process of locating erroneous labels is: first, trying to find a single pseudo-fact corresponding to one pseudo-label, marking it as abducible, and performing inverse deductive reasoning to obtain pseudo-facts that are consistent with the knowledge base after modification; if no such fact exists, in other words, no single pseudo-fact can be made consistent with the knowledge base by modification, searching pairs of pseudo-facts corresponding to two labels, marking them as abducible, and attempting inverse deductive reasoning to obtain pseudo-labels consistent with the knowledge base; and if consistency is still not achieved, continuing to increase the number of modifiable labels until a modification consistent with the knowledge base is found.
7. A device implementing a stream data classification method based on online inverse deductive learning, characterized by comprising: a processor, and a memory coupled to the processor; the memory stores a domain knowledge base and instructions that, when executed by the processor, cause the processor to perform the above-described stream data classification method based on online inverse deductive learning.
CN202110430304.1A 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof Active CN113095423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110430304.1A CN113095423B (en) 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110430304.1A CN113095423B (en) 2021-04-21 Stream data classification method based on online anti-deduction learning and realization device thereof

Publications (2)

Publication Number Publication Date
CN113095423A 2021-07-09
CN113095423B 2024-05-28


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101116072A (en) * 2005-02-03 2008-01-30 British Telecommunications plc Method and system for categorized presentation of search results
CN111222648A (en) * 2020-01-15 2020-06-02 Shenzhen Qianhai WeBank Co., Ltd. Semi-supervised machine learning optimization method, device, equipment and storage medium
WO2020140597A1 (en) * 2018-12-31 2020-07-09 South China University of Technology Online active learning method applicable to unlabeled unbalanced data stream

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu-Xuan Huang et al., "Semi-Supervised Abductive Learning and Its Application to Theft Judicial Sentencing", 2020 IEEE International Conference on Data Mining (ICDM), pp. 1072-1073 *
戴望州 (Wang-Zhou Dai), "Research on Combining First-Order Logic Domain Knowledge with Machine Learning", doctoral dissertation (electronic journal), pp. 74-85 *

Similar Documents

Publication Publication Date Title
CN106845530B (en) character detection method and device
CN107797989A (en) Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN111985229A (en) Sequence labeling method and device and computer equipment
US20220244937A1 (en) Utilizing machine learning models for automated software code modification
WO2023116111A1 (en) Disk fault prediction method and apparatus
CN110162972B (en) UAF vulnerability detection method based on statement joint coding deep neural network
CN113010683B (en) Entity relationship identification method and system based on improved graph attention network
CN113283414A (en) Pedestrian attribute identification method, related equipment and computer readable storage medium
CN112069799A (en) Dependency syntax based data enhancement method, apparatus and readable storage medium
CN114169389A (en) Class-expanded target detection model training method and storage device
CN110795736B (en) Malicious android software detection method based on SVM decision tree
CN115131604A (en) Multi-label image classification method and device, electronic equipment and storage medium
CN114896395A (en) Language model fine-tuning method, text classification method, device and equipment
CN115563627A (en) Binary program vulnerability static analysis method based on man-machine cooperation
CN113254649B (en) Training method of sensitive content recognition model, text recognition method and related device
CN113869609A (en) Method and system for predicting confidence of frequent subgraph of root cause analysis
US20220075953A1 (en) Utilizing machine learning models and in-domain and out-of-domain data distribution to predict a causality relationship between events expressed in natural language text
CN113095423B (en) Stream data classification method based on online anti-deduction learning and realization device thereof
CN115879450B (en) Gradual text generation method, system, computer equipment and storage medium
CN113095423A (en) Stream data classification method based on-line inverse deductive learning and implementation device thereof
CN117218408A (en) Open world target detection method and device based on causal correction learning
EP4064078A1 (en) Utilizing a neural network model to generate a reference image based on a combination of images
US11961099B2 (en) Utilizing machine learning for optimization of planning and value realization for private networks
CN115576789A (en) Method and system for identifying lost user
CN112132269B (en) Model processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant