CN106021237B - Language independent probabilistic content matching - Google Patents

Language independent probabilistic content matching

Info

Publication number
CN106021237B
Authority
CN
China
Prior art keywords
content
document
rule
matching
fractionation regimen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610389876.9A
Other versions
CN106021237A (zh)
Inventor
M. Gandhi
C. Lamanna
V. Sankaranarayanan
R. Pontes Filho
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-02-07
Filing date
2013-02-01
Publication date
2019-07-02
Application filed by Microsoft Technology Licensing LLC
Publication of CN106021237A
Application granted
Publication of CN106021237B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
        • G06 COMPUTING OR CALCULATING; COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F40/00 Handling natural language data
                    • G06F40/10 Text processing
                        • G06F40/166 Editing, e.g. inserting or deleting
                    • G06F40/20 Natural language analysis
                        • G06F40/279 Recognition of textual entities
                            • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
                • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
                    • G06F21/60 Protecting data
                        • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
                            • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
                                • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
                                    • G06F21/6263 Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L63/00 Network architectures or network communication protocols for network security
                    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Machine Translation (AREA)
CN201610389876.9A 2012-02-07 2013-02-01 Language independent probabilistic content matching Expired - Fee Related CN106021237B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/367,469 2012-02-07
US13/367,469 US9087039B2 (en) 2012-02-07 2012-02-07 Language independent probabilistic content matching
CN201380008426.5A CN104094250B (zh) 2012-02-07 2013-02-01 Language independent probabilistic content matching

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201380008426.5A Division CN104094250B (zh) 2012-02-07 2013-02-01 Language independent probabilistic content matching

Publications (2)

Publication Number Publication Date
CN106021237A CN106021237A (zh) 2016-10-12
CN106021237B true CN106021237B (zh) 2019-07-02

Family

ID=48903680

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610389876.9A Expired - Fee Related CN106021237B (zh) 2012-02-07 2013-02-01 Language independent probabilistic content matching
CN201380008426.5A Expired - Fee Related CN104094250B (zh) 2012-02-07 2013-02-01 Language independent probabilistic content matching

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201380008426.5A Expired - Fee Related CN104094250B (zh) 2012-02-07 2013-02-01 Language independent probabilistic content matching

Country Status (6)

Country Link
US (2) US9087039B2
EP (1) EP2812810A4
JP (1) JP6169620B2
KR (1) KR102064623B1
CN (2) CN106021237B
WO (1) WO2013119457A1

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8880989B2 (en) * 2012-01-30 2014-11-04 Microsoft Corporation Educating users and enforcing data dissemination policies
US9087039B2 (en) 2012-02-07 2015-07-21 Microsoft Technology Licensing, Llc Language independent probabilistic content matching
US10834027B2 (en) * 2015-06-27 2020-11-10 Mcafee, Llc Protection of sensitive chat data
US10218654B2 (en) 2015-09-29 2019-02-26 International Business Machines Corporation Confidence score-based smart email attachment saver
CN109313894A (zh) * 2016-06-21 2019-02-05 Sony Corporation Information processing device and information processing method
US10546154B2 (en) 2017-03-28 2020-01-28 Yodlee, Inc. Layered masking of content
US10915657B2 (en) * 2017-07-19 2021-02-09 AVAST Software s.r.o. Identifying and protecting personal sensitive documents
US12511477B2 (en) * 2021-12-30 2025-12-30 Huawei Technologies Co., Ltd. Methods and devices for generating sensitive text detectors
CN115658842A (zh) * 2022-08-29 2023-01-31 XFusion Digital Technologies Co., Ltd. Personal information identification method and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021852A (zh) * 2006-10-10 2007-08-22 Bao Dongshan Content-based video search scheduling system
CN101571921A (zh) * 2008-04-28 2009-11-04 Fujitsu Ltd Keyword recognition method and device
US20110040983A1 (en) * 2006-11-09 2011-02-17 Grzymala-Busse Withold J System and method for providing identity theft security

Family Cites Families (92)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6850252B1 (en) 1999-10-05 2005-02-01 Steven M. Hoffberg Intelligent electronic appliance system and method
CA2129075C (en) 1993-10-18 1999-04-20 Joseph J. Daniele Electronic copyright royalty accounting system using glyphs
US6006242A (en) 1996-04-05 1999-12-21 Bankers Systems, Inc. Apparatus and method for dynamically creating a document
US6308148B1 (en) 1996-05-28 2001-10-23 Cisco Technology, Inc. Network flow data export
US6014135A (en) 1997-04-04 2000-01-11 Netscape Communications Corp. Collaboration centric document processing environment using an information centric visual user interface and information presentation method
US5958005A (en) 1997-07-17 1999-09-28 Bell Atlantic Network Services, Inc. Electronic mail security
US6148297A (en) 1998-06-01 2000-11-14 Surgical Safety Products, Inc. Health care information and data tracking system and method
US6104990A (en) 1998-09-28 2000-08-15 Prompt Software, Inc. Language independent phrase extraction
JP2000181916A (ja) * 1998-12-17 2000-06-30 Fujitsu Ltd Document analysis apparatus and method, and computer-readable recording medium storing a document analysis program
US6968308B1 (en) * 1999-11-17 2005-11-22 Microsoft Corporation Method for segmenting non-segmented text using syntactic parse
US6629081B1 (en) 1999-12-22 2003-09-30 Accenture Llp Account settlement and financing in an e-commerce environment
US7610233B1 (en) 1999-12-22 2009-10-27 Accenture, Llp System, method and article of manufacture for initiation of bidding in a virtual trade financial environment
US6678409B1 (en) 2000-01-14 2004-01-13 Microsoft Corporation Parameterized word segmentation of unsegmented text
DE60015709T2 (de) 2000-01-19 2005-11-10 Hewlett-Packard Development Co., L.P., Houston Security policy applied to a community data security architecture
US6678698B2 (en) 2000-02-15 2004-01-13 Intralinks, Inc. Computerized method and system for communicating and managing information used in task-oriented projects
US6826609B1 (en) 2000-03-31 2004-11-30 Tumbleweed Communications Corp. Policy enforcement in a secure data file delivery system
AUPQ865700A0 (en) 2000-07-07 2000-08-03 Toneguzzo Group Pty Limited, The Content filtering and management
US6839707B2 (en) 2001-01-17 2005-01-04 General Electric Company Web-based system and method for managing legal information
US7181017B1 (en) 2001-03-23 2007-02-20 David Felsher System and method for secure three-party communications
US6990534B2 (en) 2001-07-20 2006-01-24 Flowfinity Wireless, Inc. Method for a proactive browser system for implementing background frame maintenance and asynchronous frame submissions
US20040205531A1 (en) 2001-08-17 2004-10-14 Innes Bruce Donald Method and application for developing a statement of work
US7725490B2 (en) 2001-11-16 2010-05-25 Crucian Global Services, Inc. Collaborative file access management system
US7260555B2 (en) 2001-12-12 2007-08-21 Guardian Data Storage, Llc Method and architecture for providing pervasive security to digital assets
US7113905B2 (en) 2001-12-20 2006-09-26 Microsoft Corporation Method and apparatus for determining unbounded dependencies during syntactic parsing
US7903549B2 (en) 2002-03-08 2011-03-08 Secure Computing Corporation Content-based policy compliance systems and methods
US9237514B2 (en) 2003-02-28 2016-01-12 Apple Inc. System and method for filtering access points presented to a user and locking onto an access point
US7809698B1 (en) 2002-12-24 2010-10-05 International Business Machines Corporation System and method remapping identifiers to secure files
US8020192B2 (en) 2003-02-28 2011-09-13 Michael Wright Administration of protection of data accessible by a mobile device
US9197668B2 (en) 2003-02-28 2015-11-24 Novell, Inc. Access control to files based on source information
US7493251B2 (en) 2003-05-30 2009-02-17 Microsoft Corporation Using source-channel models for word segmentation
JP4333229B2 (ja) * 2003-06-23 2009-09-16 Oki Electric Industry Co., Ltd. Evaluation apparatus and evaluation method for named-entity character strings
GB2405293B (en) 2003-08-18 2007-04-25 Clearswift Ltd Email policy manager
US20060008256A1 (en) 2003-10-01 2006-01-12 Khedouri Robert K Audio visual player apparatus and system and method of content distribution using the same
CA2564917C (en) 2004-04-30 2012-11-20 Research In Motion Limited Message service indication system and method
WO2005117466A2 (en) 2004-05-24 2005-12-08 Computer Associates Think, Inc. Wireless manager and method for managing wireless devices
US20060048224A1 (en) 2004-08-30 2006-03-02 Encryptx Corporation Method and apparatus for automatically detecting sensitive information, applying policies based on a structured taxonomy and dynamically enforcing and reporting on the protection of sensitive data through a software permission wrapper
US7454778B2 (en) 2004-09-30 2008-11-18 Microsoft Corporation Enforcing rights management through edge email servers
US7634735B2 (en) 2004-11-24 2009-12-15 Mccary David W Collaborative platform
JP4301513B2 (ja) 2004-11-26 2009-07-22 International Business Machines Corporation Method for determining the effect of access control using policies
US7533420B2 (en) 2004-12-09 2009-05-12 Microsoft Corporation System and method for restricting user access to a network document
JP4747591B2 (ja) * 2005-01-31 2011-08-17 NEC Corporation Confidential document search system, confidential document search method, and confidential document search program
EP1853976B1 (en) * 2005-02-14 2018-12-26 Symantec Corporation Method and apparatus for handling messages containing pre-selected data
US8140664B2 (en) 2005-05-09 2012-03-20 Trend Micro Incorporated Graphical user interface based sensitive information and internal information vulnerability management system
US7853472B2 (en) 2005-07-15 2010-12-14 Saudi Arabian Oil Company System, program product, and methods for managing contract procurement
US7925973B2 (en) 2005-08-12 2011-04-12 Brightcove, Inc. Distribution of content
JP4826265B2 (ja) 2006-01-25 2011-11-30 Fuji Xerox Co., Ltd. Security policy assignment apparatus, program, and method
US20070239600A1 (en) 2006-04-10 2007-10-11 Lundberg Steven W System and method for annuity processing
US20070261099A1 (en) 2006-05-02 2007-11-08 Broussard Scott J Confidential content reporting system and method with electronic mail verification functionality
US7984283B2 (en) 2006-05-22 2011-07-19 Hewlett-Packard Development Company, L.P. System and method for secure operating system boot
US7876335B1 (en) 2006-06-02 2011-01-25 Adobe Systems Incorporated Methods and apparatus for redacting content in a document
US20070294428A1 (en) 2006-06-19 2007-12-20 Ido Guy Method and System for Email Messaging
US8001130B2 (en) 2006-07-25 2011-08-16 Microsoft Corporation Web object retrieval based on a language model
GB2458087A (en) 2006-10-30 2009-09-09 Cryptometrics Canada Inc Computerized biometric passenger identification system and method
US8539349B1 (en) * 2006-10-31 2013-09-17 Hewlett-Packard Development Company, L.P. Methods and systems for splitting a chinese character sequence into word segments
JP4823022B2 (ja) * 2006-11-07 2011-11-24 Canon IT Solutions Inc. Information processing apparatus, information processing method, and computer program
US8256006B2 (en) * 2006-11-09 2012-08-28 Touchnet Information Systems, Inc. System and method for providing identity theft security
US7953614B1 (en) 2006-11-22 2011-05-31 Dr Systems, Inc. Smart placement rules
US8117022B2 (en) * 2006-12-07 2012-02-14 Linker Sheldon O Method and system for machine understanding, knowledge, and conversation
US7797010B1 (en) 2007-02-15 2010-09-14 Nextel Communications Inc. Systems and methods for talk group distribution
US7738900B1 (en) 2007-02-15 2010-06-15 Nextel Communications Inc. Systems and methods of group distribution for latency sensitive applications
US20080221882A1 (en) * 2007-03-06 2008-09-11 Bundock Donald S System for excluding unwanted data from a voice recording
JP2008269173A (ja) 2007-04-18 2008-11-06 Hitachi Ltd Computer system, storage system, and data management method
US8521511B2 (en) 2007-06-18 2013-08-27 International Business Machines Corporation Information extraction in a natural language understanding system
US20090019121A1 (en) 2007-07-10 2009-01-15 Messagelabs Limited Message processing
US8091138B2 (en) 2007-09-06 2012-01-03 International Business Machines Corporation Method and apparatus for controlling the presentation of confidential content
US8396838B2 (en) 2007-10-17 2013-03-12 Commvault Systems, Inc. Legal compliance, electronic discovery and electronic document handling of online and offline copies of data
US8161526B2 (en) 2007-10-22 2012-04-17 International Business Machines Corporation Protecting sensitive information on a publicly accessed data processing system
US20090119372A1 (en) 2007-11-02 2009-05-07 Sean Callanan System and method for providing email warnings
US8151200B2 (en) 2007-11-15 2012-04-03 Target Brands, Inc. Sensitive information handling on a collaboration system
WO2009070931A1 (en) 2007-12-06 2009-06-11 Google Inc. CJK name detection
US7913167B2 (en) 2007-12-19 2011-03-22 Microsoft Corporation Selective document redaction
US8707384B2 (en) 2008-02-11 2014-04-22 Oracle International Corporation Change recommendations for compliance policy enforcement
US8423483B2 (en) 2008-05-16 2013-04-16 Carnegie Mellon University User-controllable learning of policies
US8346532B2 (en) 2008-07-11 2013-01-01 International Business Machines Corporation Managing the creation, detection, and maintenance of sensitive information
US8271483B2 (en) 2008-09-10 2012-09-18 Palo Alto Research Center Incorporated Method and apparatus for detecting sensitive content in a document
JP4586913B2 (ja) 2008-09-19 2010-11-24 Fuji Xerox Co., Ltd. Document management system, document use management apparatus, and program
US8272028B2 (en) 2008-10-15 2012-09-18 Ricoh Company, Ltd. Approach for managing access to electronic documents on network devices using document retention policies and document security policies
WO2010059720A1 (en) 2008-11-19 2010-05-27 Scigen Technologies, S.A. Document creation system and methods
US8234693B2 (en) 2008-12-05 2012-07-31 Raytheon Company Secure document management
US9614924B2 (en) 2008-12-22 2017-04-04 Ctera Networks Ltd. Storage device and method thereof for integrating network attached storage with cloud storage services
US20100169771A1 (en) 2008-12-31 2010-07-01 Cerner Innovation, Inc. User Interface for Managing Patient Care Plans
JP4701292B2 (ja) * 2009-01-05 2011-06-15 International Business Machines Corporation Computer system for creating a term dictionary from named entities or technical terms contained in text data, and method and computer program therefor
US8131735B2 (en) 2009-07-02 2012-03-06 Battelle Memorial Institute Rapid automatic keyword extraction for information retrieval and analysis
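KR101621481B1 (ko) 2009-12-15 2016-05-16 SK Telecom Co., Ltd. Secure document management apparatus and method
CN101841684B (zh) 2009-12-18 2013-01-23 Shanlian Information Technology Engineering Center Co., Ltd. Display content encryption system and method, and apparatus for viewing display content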
US20110246965A1 (en) 2010-04-01 2011-10-06 International Business Machines Corporation Correcting document generation for policy compliance
CN101943955A (zh) * 2010-09-25 2011-01-12 Wu Baoguo Input method for intuitive Chinese characters with pinyin and semantic markers, and for multilingual text
US20120084868A1 (en) 2010-09-30 2012-04-05 International Business Machines Corporation Locating documents for providing data leakage prevention within an information security management system
US8806615B2 (en) * 2010-11-04 2014-08-12 Mcafee, Inc. System and method for protecting specified data combinations
WO2012109386A1 (en) 2011-02-08 2012-08-16 T-Mobile Usa, Inc. Dynamic binding of service on bearer
US8880989B2 (en) 2012-01-30 2014-11-04 Microsoft Corporation Educating users and enforcing data dissemination policies
US9087039B2 (en) 2012-02-07 2015-07-21 Microsoft Technology Licensing, Llc Language independent probabilistic content matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Development of a Multi-Classifier Approach for Multilingual Text Categorization; Chung-Hong Lee et al.; Conference on Data Mining (DIMI'06); 2006; pp. 73-77 *
A Survey of Multilingual Text Clustering Research; Zhang Chengzhi et al.; New Technology of Library and Information Service; 2009, no. 6; pp. 31-36 *

Also Published As

Publication number Publication date
US9633001B2 (en) 2017-04-25
WO2013119457A1 (en) 2013-08-15
JP2015511360A (ja) 2015-04-16
CN104094250B (zh) 2017-10-10
EP2812810A1 (en) 2014-12-17
CN106021237A (zh) 2016-10-12
JP6169620B2 (ja) 2017-07-26
CN104094250A (zh) 2014-10-08
KR20140133515A (ko) 2014-11-19
EP2812810A4 (en) 2015-12-02
US20160012037A1 (en) 2016-01-14
KR102064623B1 (ko) 2020-01-09
US9087039B2 (en) 2015-07-21
US20130204609A1 (en) 2013-08-08

Similar Documents

Publication Publication Date Title
CN106021237B (zh) Language independent probabilistic content matching
US11681960B2 (en) Extracting and surfacing user work attributes from data sources
US11386769B2 (en) Creation of reminders using activity state of an application
US12436668B2 (en) Systems, devices and methods for electronic determination and communication of location information
KR102318884B1 (ko) Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US11811711B2 (en) Method, apparatus, system, and non-transitory computer readable medium for controlling user access through content analysis of an application
US20110289407A1 (en) Font recommendation engine
KR20140039196A (ko) Context-aware input engine
US12591707B2 (en) Privacy preserving insights and distillation of large language model backed experiences
US10042840B2 (en) Hybrid grammatical and ungrammatical parsing
GB2522133A (en) Alternative unlocking patterns
KR20150035798A (ko) Method for generating a localized user interface
KR20090127936A (ko) Client input method
CN106326204A (zh) Content-based detection and handling of errors in messages
US20150350030A1 (en) Pattern-based validation, constraint and generation of hierarchical metadata
US12561045B2 (en) Content-based menus for tabbed user interface
JP2013206387A (ja) Data search system and data search method
WO2016176379A1 (en) Extracting and surfacing user work attributes from data sources
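CN116599924A (zh) Email sending method and apparatus, computing device, and storage medium
CN106295278A (zh) Method and terminal for sending verification information
CN113536158A (zh) Providing updated answers to queries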

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190702