MX2022007872A - Privacy preservation in a queryable database built from unstructured texts. - Google Patents

Privacy preservation in a queryable database built from unstructured texts.

Info

Publication number
MX2022007872A
Authority
MX
Mexico
Prior art keywords
named entities
queryable database
free text
queryable
abstract
Prior art date
Application number
MX2022007872A
Other languages
Spanish (es)
Inventor
Sancho Sara Lumbreras
Guijarro Jorge Tello
García Javier Fernández
Stephanie Marchesseau
Original Assignee
Medsavana S L
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Medsavana S L
Publication of MX2022007872A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language

Abstract

A computer-implemented method of generating a queryable database (109). The method receives a corpus of free text documents (120) containing confidential data, the free text documents being related to the same domain. A trained Natural Language Processing (NLP) system (104) assigns one or more abstract named entities to each free text document in the corpus. The abstract named entities of each free text document are stored in a queryable database configured to provide aggregated information regarding the named entities. The NLP system is configured such that the abstract named entities are recognised and disambiguated with a precision between 0.75 and less than 1 and a recall between 0.75 and less than 1, and such that the ratio of precision and recall is between 0.7 and 1.3; wherein the queryable database is free from the addition of artificial noise by an artificial noise generation algorithm.
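
The accuracy conditions stated in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the `NerCounts` container and the function names are hypothetical, the lower bounds are assumed to be inclusive, and the "ratio of precision and recall" is assumed to mean precision divided by recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NerCounts:
    """Hypothetical container for entity-level counts against a gold standard."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: NerCounts) -> float:
    """Fraction of assigned entities that are correct."""
    assigned = c.true_positives + c.false_positives
    if assigned == 0:
        raise ValueError("no entities were assigned")
    return c.true_positives / assigned


def recall(c: NerCounts) -> float:
    """Fraction of gold-standard entities that were found."""
    expected = c.true_positives + c.false_negatives
    if expected == 0:
        raise ValueError("no gold-standard entities")
    return c.true_positives / expected


def within_abstract_bounds(c: NerCounts) -> bool:
    """Check the three conditions from the abstract.

    Assumptions: 0.75 is inclusive, 1 is exclusive, the ratio is
    precision / recall, and the ratio bounds 0.7 and 1.3 are inclusive.
    """
    p, r = precision(c), recall(c)
    if r == 0:
        return False
    return 0.75 <= p < 1.0 and 0.75 <= r < 1.0 and 0.7 <= p / r <= 1.3


if __name__ == "__main__":
    balanced = NerCounts(true_positives=90, false_positives=10, false_negatives=10)
    perfect_precision = NerCounts(true_positives=100, false_positives=0, false_negatives=10)
    skewed = NerCounts(true_positives=300, false_positives=3, false_negatives=99)
    for counts in (balanced, perfect_precision, skewed):
        print(counts, within_abstract_bounds(counts))
```

Under these assumptions, the third case shows why the ratio condition is not redundant: precision and recall both fall inside their individual ranges, yet their ratio exceeds 1.3, so the check fails.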
MX2022007872A 2019-12-23 2020-12-23 Privacy preservation in a queryable database built from unstructured texts. MX2022007872A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP19383189 2019-12-23
PCT/EP2020/087816 WO2021130337A1 (en) 2019-12-23 2020-12-23 Privacy preservation in a queryable database built from unstructured texts

Publications (1)

Publication Number | Publication Date
MX2022007872A | 2022-07-19

Family

ID=69174292

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
MX2022007872A | Privacy preservation in a queryable database built from unstructured texts. | 2019-12-23 | 2020-12-23

Country Status (8)

Country | Link
US (1) | US20230032536A1 (en)
EP (1) | EP4081924B1 (en)
AU (1) | AU2020412315A1 (en)
BR (1) | BR112022012424A2 (en)
CA (1) | CA3163953A1 (en)
CO (1) | CO2022010085A2 (en)
MX (1) | MX2022007872A (en)
WO (1) | WO2021130337A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114386424B (en) * | 2022-03-24 | 2022-06-10 | 上海帜讯信息技术股份有限公司 | Industry professional text automatic labeling method, industry professional text automatic labeling device, industry professional text automatic labeling terminal and industry professional text automatic labeling storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8473451B1 (en) * | 2004-07-30 | 2013-06-25 | At&T Intellectual Property I, L.P. | Preserving privacy in natural language databases
US9437207B2 (en) * | 2013-03-12 | 2016-09-06 | Pullstring, Inc. | Feature extraction for anonymized speech recognition
US9996526B2 (en) * | 2016-10-19 | 2018-06-12 | International Business Machines Corporation | System and method for supplementing a question answering system with mixed-language source documents
US11194967B2 (en) * | 2018-03-15 | 2021-12-07 | International Business Machines Corporation | Unsupervised on-the-fly named entity resolution in dynamic corpora

Also Published As

Publication number | Publication date
CA3163953A1 (en) | 2021-07-01
EP4081924C0 (en) | 2023-10-18
AU2020412315A1 (en) | 2022-06-23
CO2022010085A2 (en) | 2022-08-09
EP4081924A1 (en) | 2022-11-02
EP4081924B1 (en) | 2023-10-18
BR112022012424A2 (en) | 2022-09-06
WO2021130337A1 (en) | 2021-07-01
US20230032536A1 (en) | 2023-02-02

Similar Documents

Publication Publication Date Title
Gamal et al. Twitter benchmark dataset for Arabic sentiment analysis
US20160307114A1 (en) Performing sentiment analysis
Çelikkaya et al. Named entity recognition on real data: a preliminary investigation for Turkish
KR20180084580A (en) Device and method to generate abstractive summaries from large multi-paragraph texts, recording medium for performing the method
CN106610955A (en) Dictionary-based multi-dimensional emotion analysis method
Munro et al. Short message communications: users, topics, and in-language processing
Maylawati et al. Set of frequent word item sets as feature representation for text with Indonesian slang
Adelani et al. The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation
Matsumoto et al. Emotions expressed by leaders in videos predict political aggression
Hawkins et al. Communicative interaction in spontaneous music and speech
MX2022007872A (en) Privacy preservation in a queryable database built from unstructured texts.
PH12019000353A1 (en) Natural language processing based sign language generation
Kwak et al. Keywords and topic analysis of social issues on twitter based on text mining and topic modeling
PH12020551961A1 (en) Information providing system and data structure
Won et al. Embedding for out of vocabulary words considering contextual and morphosyntactic information
Saggion et al. Can text summaries help predict ratings? a case study of movie reviews
Nand et al. A HMM POS tagger for micro-blogging type texts
Welekar et al. Emotion Categorization Using Twitter
Rudge The nominal group in British Sign Language: A preliminary description
Hmood A Discourse Analysis of Grammatical Cohesion in Some Selected Presidential Texts
Lohani et al. Comparison of Sequence Models for Text Narration from Tabular Data
Gordeev Automatic verbal aggression detection for Russian and American imageboards
Khan et al. Achieving success through effective business communication
Hmood Grammatical Cohesion in Some Selected Political Texts
Ahmed et al. Speech Source Separation Using a Multi-Pitch Harmonic Product Spectrum-Based Algorithm