EA201991625A1 - METHOD AND SYSTEM FOR DATA CLASSIFICATION FOR DETECTING CONFIDENTIAL INFORMATION - Google Patents

Info

Publication number
EA201991625A1
Authority
EA
Eurasian Patent Office
Prior art keywords
data
confidential information
tags
classifying
processing
Prior art date
2019-07-05
Application number
EA201991625A
Other languages
Russian (ru)
Other versions
EA038259B1 (en)
Inventor
Алексей Алексеевич ТЕРЕНИН
Дмитрий Владимирович СМИРНОВ
Дмитрий Константинович СТРУКОВ
Денис Александрович КОРЯКОВСКИЙ
Original Assignee
Public Joint Stock Company "Sberbank of Russia" (PJSC Sberbank)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-05
Filing date
2019-07-31
Publication date
2021-01-29
Application filed by Public Joint Stock Company "Sberbank of Russia" (PJSC Sberbank) on 2019-07-31
Publication of EA201991625A1 on 2021-01-29
Publication of EA038259B1 on 2021-07-30

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Abstract

The present invention relates generally to the field of computational data processing, and in particular to methods of classifying data to identify confidential information.
A computer-implemented method of classifying data to identify confidential information is performed by at least one processor and comprises the following steps. Data presented in tabular format is obtained. The obtained data is processed by an ensemble of neural networks; during this processing, the data in each table cell is assigned a tag corresponding to a predefined type of confidential information. For each neural network, a classification matrix is formed, and an F-measure is calculated from it for each data type. The obtained data is also processed by check-digit algorithms to identify table cells containing data that has a check digit. A final tagged table is then formed from the tagged tables produced by each neural network and the matrix of F-measures corresponding to the neural networks, taking into account the data that has a check digit. Finally, the data of the final table is classified into confidentiality classes by comparing the tags assigned in the final table with predefined tags of confidential information.
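The abstract says that a classification matrix is formed for each neural network and an F-measure is computed from it for each data type, but it gives no formula. The sketch below makes two assumptions. First, the classification matrix is treated as an ordinary confusion matrix (rows are the true data type, columns are the assigned tag). Second, the balanced F1-measure, 2PR/(P+R), is used. The function names and tag labels are hypothetical.

```python
from typing import Dict, Mapping, Sequence

Matrix = Sequence[Sequence[int]]


def per_type_f_measure(matrix: Matrix, labels: Sequence[str]) -> Dict[str, float]:
    """F1-measure per data type from one network's classification matrix.

    Assumption: matrix[i][j] counts table cells whose true type is labels[i]
    and which the network tagged as labels[j] (a standard confusion matrix).
    """
    n = len(labels)
    scores: Dict[str, float] = {}
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # tagged k, true type differs
        fn = sum(matrix[k][j] for j in range(n)) - tp  # true type k, tagged otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[labels[k]] = 2 * precision * recall / total if total else 0.0
    return scores


def f_measure_matrix(
    confusions: Mapping[str, Matrix], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Hypothetical 'matrix of F-measures': rows are networks, columns are data types."""
    return {net: per_type_f_measure(m, labels) for net, m in confusions.items()}


if __name__ == "__main__":
    labels = ["card_number", "inn", "other"]
    confusion = [[8, 1, 1], [0, 9, 1], [2, 0, 18]]
    print(f_measure_matrix({"net_a": confusion}, labels))
```

A per-type score, rather than one accuracy figure per network, is what lets a later fusion step trust each network only on the data types it handles well.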
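The abstract refers to "algorithms for determining check digits" without naming them. As an illustrative assumption only, this sketch uses two well-known check-digit schemes for data a bank would treat as confidential:

- the Luhn (mod 10) algorithm for payment card numbers;
- the weighted mod-11 checksum of the Russian taxpayer identification number (INN).

The tag names, the assumed 13-to-19-digit card length, and the table-scanning helper are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: the abstract does not name its check-digit algorithms;
# Luhn and the INN checksum are illustrative choices.

# Weight vectors of the INN checksum (10-digit and 12-digit forms).
_INN10 = (2, 4, 10, 3, 5, 9, 4, 6, 8)
_INN12_FIRST = (7, 2, 4, 10, 3, 5, 9, 4, 6, 8)
_INN12_SECOND = (3, 7, 2, 4, 10, 3, 5, 9, 4, 6, 8)


def luhn_valid(value: str) -> bool:
    """Luhn (mod 10) check; spaces and hyphens are ignored.

    Assumption: card numbers are 13 to 19 digits long.
    """
    digits = value.replace(" ", "").replace("-", "")
    if not re.fullmatch(r"[0-9]{13,19}", digits):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _inn_check(digits: List[int], weights: Tuple[int, ...]) -> int:
    # Weighted sum over the leading digits, then mod 11, then mod 10.
    return sum(d * w for d, w in zip(digits, weights)) % 11 % 10


def inn_valid(value: str) -> bool:
    """Checksum of a 10-digit (organisation) or 12-digit (individual) INN."""
    if not re.fullmatch(r"[0-9]{10}(?:[0-9]{2})?", value):
        return False
    d = [int(ch) for ch in value]
    if len(d) == 10:
        return _inn_check(d, _INN10) == d[9]
    return _inn_check(d, _INN12_FIRST) == d[10] and _inn_check(d, _INN12_SECOND) == d[11]


# Hypothetical tag names mapped to their validators (checked in this order).
CHECKERS: Dict[str, Callable[[str], bool]] = {"card_number": luhn_valid, "inn": inn_valid}


def find_check_digit_cells(table: List[List[str]]) -> Dict[Tuple[int, int], str]:
    """Return {(row, col): tag} for cells whose value passes a check-digit test."""
    found: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            for tag, check in CHECKERS.items():
                if check(str(cell).strip()):
                    found[(r, c)] = tag
                    break
    return found
```

A valid check digit is strong evidence of a real identifier, because a random digit string of the right length passes a single mod-10 check only about one time in ten.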
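The abstract states that the final tagged table is built from each network's tagged table and the networks' F-measure matrix, "taking into account" cells that have a check digit, but it does not specify the combination rule. One plausible reading is an F-measure-weighted vote per cell, sketched here as an assumption rather than as the patented method:

- each network's tag adds that network's F-measure for that tag;
- the highest-scoring tag wins;
- a cell confirmed by a check-digit algorithm keeps the check-digit tag regardless of the vote.

Network names, tags, and scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

Table = List[List[str]]

# Assumed fusion rule (not specified in the abstract): F-measure-weighted
# voting per cell, with check-digit results taking precedence.


def fuse_tag_tables(
    tagged: Mapping[str, Table],
    f_matrix: Mapping[str, Mapping[str, float]],
    check_cells: Mapping[Tuple[int, int], str],
) -> Table:
    """Combine per-network tag tables into one final tag table.

    tagged:      network name -> table of tags (all tables share one shape)
    f_matrix:    network name -> {tag: F-measure of that network for the tag}
    check_cells: (row, col) -> tag confirmed by a check-digit algorithm
    """
    reference = next(iter(tagged.values()))
    final: Table = []
    for r, row in enumerate(reference):
        final_row: List[str] = []
        for c in range(len(row)):
            if (r, c) in check_cells:  # a valid check digit overrides the vote
                final_row.append(check_cells[(r, c)])
                continue
            scores: Dict[str, float] = defaultdict(float)
            for net, table in tagged.items():
                tag = table[r][c]
                scores[tag] += f_matrix[net].get(tag, 0.0)
            # Highest weighted score wins; ties go to the tag seen first.
            final_row.append(max(scores, key=lambda t: scores[t]))
        final.append(final_row)
    return final
```

Weighting by per-type F-measure means a network that is strong on, say, card numbers but weak on names dominates only the card-number decisions. Plain majority voting or a learned stacking model would be equally valid readings of the abstract.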
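For the last step, the abstract says only that the final table's data is classified into confidentiality classes by comparing its tags with predefined tags of confidential information. This sketch rests on three assumptions:

- there is an ordered set of hypothetical classes;
- there is a hypothetical mapping from tags to classes;
- the whole table receives the most restrictive class among its tags.

Tags not listed as confidential leave the table in the least restrictive class.

```python
from typing import Dict, List, Sequence

# Hypothetical confidentiality classes, from least to most restrictive.
CLASSES: List[str] = ["public", "internal", "confidential", "strictly_confidential"]

# Hypothetical predefined tags of confidential information and their classes.
CLASS_BY_TAG: Dict[str, str] = {
    "name": "internal",
    "inn": "confidential",
    "card_number": "strictly_confidential",
}


def classify_table(final_tags: Sequence[Sequence[str]]) -> str:
    """Return the most restrictive class whose tag appears in the table."""
    rank = 0  # index into CLASSES; 0 means no confidential tag was found
    for row in final_tags:
        for tag in row:
            if tag in CLASS_BY_TAG:
                rank = max(rank, CLASSES.index(CLASS_BY_TAG[tag]))
    return CLASSES[rank]
```

Classifying per column instead of per table would be a small change (iterate over columns and return one class each), if a finer-grained result is needed.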

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
RU2019121020A RU2759786C1 (en) 2019-07-05 2019-07-05 Method and system for classifying data for identifying confidential information

Publications (2)

Publication Number Publication Date
EA201991625A1 (en) 2021-01-29
EA038259B1 (en) 2021-07-30

Family

ID=74114915

Family Applications (1)

Application Number Title Priority Date Filing Date
EA201991625A EA038259B1 (en) 2019-07-05 2019-07-31 Method and system for classifying data in order to detect confidential information

Country Status (3)

Country Link
EA (1) EA038259B1 (en)
RU (1) RU2759786C1 (en)
WO (1) WO2021006755A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918577B (en) * 2021-12-15 2022-03-11 北京新唐思创教育科技有限公司 Data table identification method and device, electronic equipment and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US7480640B1 (en) * 2003-12-16 2009-01-20 Quantum Leap Research, Inc. Automated method and system for generating models from data
WO2006138502A2 (en) * 2005-06-16 2006-12-28 The Board Of Trustees Operating Michigan State University Methods for data classification
US8490194B2 (en) * 2006-01-31 2013-07-16 Robert Moskovitch Method and system for detecting malicious behavioral patterns in a computer, using machine learning
US8752181B2 (en) * 2006-11-09 2014-06-10 Touchnet Information Systems, Inc. System and method for providing identity theft security
US9082080B2 (en) * 2008-03-05 2015-07-14 Kofax, Inc. Systems and methods for organizing data sets
US8286255B2 (en) * 2008-08-07 2012-10-09 Sophos Plc Computer file control through file tagging
FR2956541B1 (en) * 2010-02-18 2012-03-23 Centre Nat Rech Scient CRYPTOGRAPHIC METHOD FOR COMMUNICATING CONFIDENTIAL INFORMATION.
US10169715B2 (en) * 2014-06-30 2019-01-01 Amazon Technologies, Inc. Feature processing tradeoff management
US10535017B2 (en) * 2015-10-27 2020-01-14 Legility Data Solutions, Llc Apparatus and method of implementing enhanced batch-mode active learning for technology-assisted review of documents
RU2647640C2 (en) * 2015-12-07 2018-03-16 Federal State Public Military Educational Institution of Higher Education "Krasnodar Higher Military School named after General of the Army S.M. Shtemenko" of the Ministry of Defense of the Russian Federation Method of automatic classification of confidential formalized documents in electronic document management system
WO2019035765A1 (en) * 2017-08-14 2019-02-21 Dathena Science Pte. Ltd. Methods, machine learning engines and file management platform systems for content and context aware data classification and security anomaly detection

Also Published As

Publication number Publication date
RU2759786C1 (en) 2021-11-17
WO2021006755A1 (en) 2021-01-14
EA038259B1 (en) 2021-07-30
