MX2022007872A - Privacy preservation in a queryable database built from unstructured texts.
- Publication number
- MX2022007872A
- Authority
- MX
- Mexico
- Prior art keywords
- named entities
- queryable database
- free text
- queryable
- abstract
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
Abstract
A computer-implemented method of generating a queryable database (109). The method receives a corpus of free text documents (120) containing confidential data, the free text documents being related to the same domain. A trained Natural Language Processing (NLP) system (104) assigns one or more abstract named entities to each free text document in the corpus. The abstract named entities of each free text document are stored in a queryable database configured to provide aggregated information regarding the named entities. The NLP system is configured such that the abstract named entities are recognised and disambiguated with a precision between 0.75 and less than 1 and a recall between 0.75 and less than 1, and such that the ratio of precision and recall is between 0.7 and 1.3; wherein the queryable database is free from the addition of artificial noise by an artificial noise generation algorithm.
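The numeric conditions in the abstract can be illustrated with a short check. This is only a sketch under stated assumptions, not the patented method: the `NerCounts` class, the function names, and the idea of measuring the NLP system on a labelled evaluation sample are hypothetical, and the lower bounds (0.75, 0.7) and the upper ratio bound (1.3) are assumed to be inclusive because the abstract does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NerCounts:
    """Hypothetical entity-level counts from a labelled evaluation sample."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: NerCounts) -> float:
    # Share of assigned entities that are correct.
    return c.true_positives / (c.true_positives + c.false_positives)


def recall(c: NerCounts) -> float:
    # Share of entities actually present that were assigned.
    return c.true_positives / (c.true_positives + c.false_negatives)


def within_abstract_bounds(c: NerCounts) -> bool:
    """Check the three conditions stated in the abstract.

    Assumption: lower bounds and the 1.3 ratio bound are inclusive;
    the upper bound of 1 on precision and recall is strict ("less than 1").
    """
    p, r = precision(c), recall(c)
    return 0.75 <= p < 1 and 0.75 <= r < 1 and 0.7 <= p / r <= 1.3


# Made-up counts, for illustration only.
balanced = NerCounts(true_positives=850, false_positives=150, false_negatives=100)
perfect_precision = NerCounts(true_positives=100, false_positives=0, false_negatives=10)
skewed = NerCounts(true_positives=760, false_positives=7, false_negatives=240)

print(within_abstract_bounds(balanced))           # precision 0.85, recall about 0.89
print(within_abstract_bounds(perfect_precision))  # precision is exactly 1
print(within_abstract_bounds(skewed))             # precision/recall ratio exceeds 1.3
```

Under these assumptions, once precision and recall both lie in [0.75, 1) their ratio is always above 0.75, so only the upper ratio bound of 1.3 adds a further restriction, as the third example shows.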
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19383189 | 2019-12-23 | | |
PCT/EP2020/087816 WO2021130337A1 (en) | 2019-12-23 | 2020-12-23 | Privacy preservation in a queryable database built from unstructured texts |
Publications (1)
Publication Number | Publication Date |
---|---|
MX2022007872A (en) | 2022-07-19 |
Family
ID=69174292
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MX2022007872A (en) | 2019-12-23 | 2020-12-23 | Privacy preservation in a queryable database built from unstructured texts. |
Country Status (8)
Country | Link |
---|---|
US (1) | US20230032536A1 (en) |
EP (1) | EP4081924B1 (en) |
AU (1) | AU2020412315A1 (en) |
BR (1) | BR112022012424A2 (en) |
CA (1) | CA3163953A1 (en) |
CO (1) | CO2022010085A2 (en) |
MX (1) | MX2022007872A (en) |
WO (1) | WO2021130337A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114386424B (en) * | 2022-03-24 | 2022-06-10 | 上海帜讯信息技术股份有限公司 | Industry professional text automatic labeling method, industry professional text automatic labeling device, industry professional text automatic labeling terminal and industry professional text automatic labeling storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8473451B1 (en) * | 2004-07-30 | 2013-06-25 | At&T Intellectual Property I, L.P. | Preserving privacy in natural language databases |
US9437207B2 (en) * | 2013-03-12 | 2016-09-06 | Pullstring, Inc. | Feature extraction for anonymized speech recognition |
US9996526B2 (en) * | 2016-10-19 | 2018-06-12 | International Business Machines Corporation | System and method for supplementing a question answering system with mixed-language source documents |
US11194967B2 (en) * | 2018-03-15 | 2021-12-07 | International Business Machines Corporation | Unsupervised on-the-fly named entity resolution in dynamic corpora |
Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2020-12-23 | BR | BR112022012424A | BR112022012424A2 (en) | Not active: Application Discontinuation |
2020-12-23 | EP | EP20833918.4A | EP4081924B1 (en) | Active |
2020-12-23 | MX | MX2022007872A | MX2022007872A (en) | Unknown |
2020-12-23 | CA | CA3163953A | CA3163953A1 (en) | Pending |
2020-12-23 | US | US17/788,250 | US20230032536A1 (en) | Pending |
2020-12-23 | WO | PCT/EP2020/087816 | WO2021130337A1 (en) | Search and Examination |
2020-12-23 | AU | AU2020412315A | AU2020412315A1 (en) | Pending |
2022-07-18 | CO | CONC2022/0010085A | CO2022010085A2 (en) | Unknown |
Also Published As
Publication number | Publication date |
---|---|
CA3163953A1 (en) | 2021-07-01 |
EP4081924C0 (en) | 2023-10-18 |
AU2020412315A1 (en) | 2022-06-23 |
CO2022010085A2 (en) | 2022-08-09 |
EP4081924A1 (en) | 2022-11-02 |
EP4081924B1 (en) | 2023-10-18 |
BR112022012424A2 (en) | 2022-09-06 |
WO2021130337A1 (en) | 2021-07-01 |
US20230032536A1 (en) | 2023-02-02 |