EP2460093A2 - Geopositioning system based on linguistic analysis - Google Patents

Geopositioning system based on linguistic analysis

Info

Publication number
EP2460093A2
Authority
EP
European Patent Office
Prior art keywords
data
agents
actions
linguistic
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP10762962A
Other languages
English (en)
French (fr)
Inventor
Mathieu Guidere
Christian Fluhr
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geolsemantics
Original Assignee
Geolsemantics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geolsemantics
Publication of EP2460093A2

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/0009 Transmission of position information to remote stations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The invention relates to the multilingual localization of an agent in time and space from a linguistic analysis of digital data transiting over a communication network.
  • The digital data include linguistic data and metadata about the agent.
  • The agent may be at the origin of the digital data and/or be the subject of the linguistic data contained in the digital data.
  • Cross-language information retrieval, which consists of formulating a query in a source language and searching for relevant documents in target languages, and Multi Lingual Information Retrieval (MLIR), which consists of formulating a query in a source language and searching for relevant documents in all languages, both make it possible to enter queries and to obtain documents in different languages. Neither, however, makes it possible to know exactly where and when specific data contained in a particular document, or associated with specific information in the linguistic data, is situated.
  • An agent means a person or group of persons by whom an action is performed or will be performed.
  • Linguistic data consist of a word or groups of words that can be categorized into classes such as names of persons, names of organizations, institutions or enterprises, place names, quantities, distances, etc.
  • Metadata are data about the characteristics of digital data, such as the file type, the type of transmitter, the identity of the sender or of the transmitting equipment, the language of transmission, etc.
  • The invention relates more particularly to the metadata relating to the place of issue and the date and time of transmission of the data concerning the agent.
  • The invention meets this need with a geolinguistic localization technology that prioritizes the spatio-temporal information present in the linguistic data and the associated metadata contained in the digital data.
  • The invention relates to an analysis system receiving, as input, digital data comprising linguistic data that can be analyzed by linguistic processing, as well as metadata associated with their transmission, the system comprising: a linguistic analysis engine to analyze the linguistic data from a semantic point of view and to qualify them from the point of view of space, time, agents and actions; an extraction engine for extracting, from the analyzed linguistic data and the spatio-temporal metadata associated with the transmission, data relating to the space and time of the agents' actions; a determination engine for determining, from the extracted data, the spatio-temporal location of the agents' actions; and a representation engine to represent, on a geographical map, the spatio-temporal location of the agents' actions.
  • The linguistic analysis engine is a multilingual engine.
  • The system receives as input data from a communication network.
  • The digital data come from Internet sites, blogs, forums, RSS feeds, instant messengers and e-mail services.
  • The invention relates to a method for analyzing digital data that include linguistic data that can be analyzed by linguistic processing, as well as metadata associated with the transmission of these linguistic data, in which the following processing steps are carried out: the linguistic data are analyzed from a semantic point of view to qualify them from the point of view of space, time, agents and actions; from the analyzed linguistic data and the spatio-temporal metadata associated with the transmission, data relating to the space and time of the agents' actions are extracted; the spatio-temporal location of the agents' actions is determined from the extracted data; the spatio-temporal location of the agents' actions is represented on a geographical map.
  • The data relating to the space and time of the agents' actions are compared with one another to evaluate their authenticity; and/or the data relating to the space and time of the agents' actions are sorted to highlight, across all of these data, groupings of data according to at least one of their characteristics of time, space, action, agent; and/or
  • the data relating to the space and time of the agents' actions are selected on the basis of only one of their characteristics of time, space, action, agent;
  • the determination of the spatio-temporal location being made from the authenticated extracted data and/or from the groupings obtained and/or from the selection thus made.
  • The invention relates to a computer program comprising machine instructions for implementing the method according to the second aspect of the invention.
  • FIG. 1 illustrates an embodiment of an analysis system for geolinguistic localization.
  • FIG. 2 schematically illustrates an analysis method implemented using the analysis system for geolinguistic localization.
  • FIGS. 3 to 5 illustrate examples of analysis for geolinguistic localization.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Geolinguistic localization is a form of strategic intelligence that combines the principles of physical geolocation with the techniques of advanced linguistic analysis.
  • The basic principle is to trace the information sought back to its physical source by following the path used to propagate it, based on a cross-check of linguistic indices of this information in several languages and of metadata relating to its distribution medium.
  • A spatio-temporal reference is derived from the data and/or metadata, allowing the information to be located at a given moment and in a specific place.
  • This reference provides a representation, both spatial and temporal, of the actions and relationships that exist between the emitters, the receivers and the relays of the information.
  • The linguistic data and the metadata concerning the media carrying the actions and relationships can come from open sources freely accessible via the Internet or from lawful interception.
  • Open sources means freely accessible sites, agency sites, blogs, online videos, streaming television, social networks, RSS feeds, information accessed through search engines, and information obtained by querying a specialized site (invisible web).
  • The linguistic data can also be of a private type, in which case they are processed only in the context of lawful interception.
  • Private data refers to data coming from electronic mail, oral conversations over different channels, SMS and VoIP.
  • A user 1 transmits digital data D N over a communication network 3 by means of a communication device 2.
  • The digital data D N comprise linguistic data M L that can be analyzed by linguistic processing, as well as spatio-temporal metadata M 0. The digital data D N are received by the analysis system and are either open or private.
  • The analysis system consists of the following elements:
  • The engines 10, 20, 30, 40, 50 can be connected to each other by means of a wired or wireless connection. They may be located at the same place or in different locations and may be implemented as software stored on a digital medium such as a CD-ROM, a USB key or any other known type of storage medium.
  • A coordination module (not shown) makes it possible to update the transmitted data and to check that the results match the requests, to ensure the coherence and updating of the system.
  • The functions of the engines 10, 20, 30, 40, 50 are described below.
  • Engine 10 for acquiring digital data D N
  • The digital data acquisition engine 10 performs the acquisition E 0 of digital data D N, either via a search using a search engine accessible on the Internet or by listening to the communication network 3.
  • The communication network 3 is, for example, a cellular mobile communication network, the Internet, a corporate network of the intranet type or any other known type of network capable of conveying digital data D N.
  • The digital data D N are, for example, carried by an SMS sent from a mobile terminal 2 (as in the example illustrated in FIG. 1), an email or a voice message.
  • More generally, these digital data D N can be carried by any digital medium that can carry linguistic data M L.
  • The linguistic analysis engine 20 performs the analysis E 1 of the linguistic data D L from a semantic point of view and qualifies them from the point of view of the space S, the time T, the agents A and the actions A c.
  • The analysis E 1 performed by the analysis engine 20 is based on multilingual morphosyntactic analysis software which recognizes the agents A (names of persons, place names, names of organizations, date and time data, numerical amounts with their units) and the actions A c (agent of the action, action, object of the action, and the circumstances of time, place, instrument, manner, etc.).
  • The result is standardized (for example, for persons, the different spellings of their names are identified regardless of the language and the character set) and, during the analysis E 1, ambiguities concerning places are resolved according to the context and the metadata M 0, by detecting the language L used.
  • The system thus makes it possible to analyze, from the linguistic data, the agents concerned ("We") and the nature of the action ("see each other"), then, from the metadata, to know the identity of the agents (by their phone numbers) and the spatio-temporal coordinates of the appointment ("in an hour").
  • The engine 30 performs the extraction E 2, from the linguistic data D L analyzed by the analysis engine 20 and the spatio-temporal metadata M 0 associated with the transmission, of data relating to the space and time of the actions A c and the agents A.
  • This engine 30 makes it possible to locate in time T and in space S the data relating to the space and time of the actions A c and the agents A.
  • Access sites containing digital data are listed, and the acquisition of digital data via search engines is based on defined criteria.
  • In the example above, the message "We see each other in an hour at Montparnasse station" was sent from a mobile terminal.
  • The metadata M 0 identify the caller, the called party, and the date and time of the call.
  • The caller and the called party are the agents and can be identified by their telephone numbers, and the relationship between the two is the appointment set at a given location: the movement of at least one of the two agents to Montparnasse station.
  • The result of the extraction of the digital data D N is as follows.
  • The system thus makes it possible to extract, through the analysis of the linguistic data, the agents concerned ("We") and the nature of the action ("see each other"), then, thanks to the metadata, to know the identity of the agents (by their phone numbers) and to calculate the time of the appointment ("in an hour") by inference from the spatio-temporal metadata of transmission and reception.
  • The engine 40 performs the determination E 3, from the extracted data (S, T, A c, A), of the location of the actions A c of the agents A.
  • The determination E 3 implemented in this engine 40 may further include an authentication E 31, a profiling E 32 and a targeting E 33 of the data relating to the space S and the time T of the actions A c of the agents A.
  • The authentication E 31, the profiling E 32 and the targeting E 33 can be seen as a filtering of the data.
  • The authentication E 31, the profiling E 32 and the targeting E 33 are optional and are implemented after the extraction E 2. They can be implemented successively or alternatively.
  • The determination E 3 of the spatio-temporal location is performed from the authenticated extracted data and/or from the groupings obtained by the profiling and/or from the selection made by the targeting.
  • Authentication E 31: this involves comparing the data relating to the space S and the time T of the actions A c of the agents A.
  • an agent A is a natural person located in
  • Authentication consists of verifying the identity of the agent by exploring its attributes (time, relationships, actions, etc.) in the available linguistic data D L.
  • Authentication consists of searching the linguistic data D L, in several languages M L, for the actions A c and relationships associated with the agent.
  • Authentication consists of comparing the multilingual linguistic data M L and the metadata M 0 to ensure that there are no contradictions, such as an agent being in two different places at the same time.
  • The system searches for all the forms under which this name appears in several languages M L, then searches for all the documents or data relating to these forms, recovering, for each piece of data found, the positioning and location indices, both internal (places and dates indicated in the documents) and external (geo-positional data of the network that carries them). After this search phase, the data are cross-checked to extract those relating to the actions that the agent is about to perform or has already performed.
  • Profiling E 32: this involves sorting the data relating to the space S and the time T of the actions A c of the agents to highlight, across all of these data, groupings according to at least one of their characteristics of time T, space S, action A c, agent A.
  • Profiling starts from the set of multilingual linguistic data M L in several languages L and extracts, by means of linguistic rules, only the data relating to places (geographic, urban, territorial landmarks, etc.) and to agents A.
  • Profiling includes three levels of analysis.
  • the first level is that of the general exploration of multilingual linguistic data M L in several languages L, focusing on the actions A c in the multilingual linguistic data M L and on the geo-positional information in the metadata M D.
  • The second level of analysis is that of sorting the data according to metadata indices M 0 (place and time of transmission) and data indices (relations and actions of the agent).
  • The determination engine 40 works back from the available data to salient data, then reveals a phenomenon or an agent (unknown or little known), providing as many spatio-temporal indications as possible on its location from the multilingual linguistic data M L and the metadata M 0.
  • The input data is a set of documents that one seeks to organize.
  • The data obtained by profiling is a set of semantic relations between actions and agents having links between them.
  • Targeting E 33: this involves selecting the multilingual linguistic data M L relating to the space S and the time T of the actions A c of the agents A on the basis of only one of their characteristics of time T, space S, action A c, agent A.
  • Targeting consists of aiming, by linguistic analysis, at a particular type of data (action A c, relation R) and looking for this multilingual linguistic data item M L in several languages and on various types of media.
  • The input data is a type of action A c that is searched for as a priority.
  • The data obtained by targeting is a particular type of semantic relationship between the action sought and agents with links to each other.
  • The engine 50 performs the representation E 4, which displays on a geographical map 60 the spatio-temporal location of the actions of the agents.
  • This representation engine 50 makes it possible in particular to align the markers of a geographical map with the spatio-temporal data resulting from the analysis of the multilingual linguistic data M L, by locating the extracted data relative to one another and by visualizing their links in time and space.
  • Standardized locations are associated with geographic coordinates in longitude and latitude, which allows them to be represented on a geographical map 60 (for example, using Google Maps™ or any other tool of the same type).
  • Example 1 Geolinguistic Location from the Internet
  • The system makes it possible to detect this language and to extract the spatio-temporal indices that locate the threat, to cross-check these internal indices with the metadata and the external information of the connection (IP / DNS), the server and the managers, and finally to visualize on a map the places and the links between individuals who have spoken on or endorsed the content of said forum.
  • The system makes it possible to detect this language and to extract the spatio-temporal indices that locate the transmitting agent, to cross-check these internal indices with the metadata (for example the International Mobile Equipment Identity, IMEI) and the external information of the SMS in question (GSM, GPRS information), in order to visualize on a geographical map the places and the links between individuals who have received or shared the content of said SMS.
  • The system identifies the following message: "We see each other tomorrow in front of Montparnasse station". This message was sent on 2/07/2010 at 10:52.
  • The system retains one of the two SMS messages and identifies the associated metadata as well as the actions A c, the agents A and the space S.
  • The system makes it possible to detect this language and to extract the spatio-temporal indices that locate the threat, to cross-check these internal indices with the metadata and the external information of the email in question (network information), and finally to visualize on a map the places and the links between individuals who have received or shared the content of said email.
  • Two emails 51, 52 transit over a communication network and a multilingual search for an agent is carried out.
  • The system then keeps one of the two emails and identifies the associated metadata as well as the actions A c, the agents A and the space S.
  • Time T: "last Thursday, tomorrow".
  • Time T time coordinates (date and time) of transmission and reception of
  • The system thus makes it possible to locate, through the analysis of the linguistic data and the metadata, the agents concerned, to know the identity of the agents (by their telephone numbers) and to calculate the moment ("tomorrow") by inference from the spatio-temporal metadata of transmission and reception.
  • Example 4 Geolinguistic Location from a Telephone Communication (Including VoIP)
  • The system makes it possible to detect this language from the transcription that will be made and to extract the spatio-temporal indices that locate the threat, to cross-check these internal indices with the metadata and the external geopositioning information (VoIP, GSM, GPRS, GPS), and finally to visualize on a map the places and the links between individuals who have received or shared this communication.
  • The system makes it possible to detect the threat by linguistic analysis, to trace it back to the resource carrying the threat, and to locate the transmitter and the potential receivers in time and in space, starting from the language production contained in the relays.
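
The processing steps described above (extraction E 2, determination E 3 with authentication E 31, representation E 4) can be illustrated with a minimal Python sketch. It is not the implementation described in the application. The names `Observation`, `resolve_time`, `contradictions`, `locate` and `GAZETTEER` are hypothetical, the phone number is a placeholder, the coordinates given for Montparnasse station are approximate, and the date 2/07/2010 is assumed to be written day-first. The linguistic analysis E 1 is assumed to have already produced the agent, action, place and time expression.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical gazetteer: standardized place name -> approximate (lat, lon).
GAZETTEER = {"montparnasse station": (48.8409, 2.3200)}


@dataclass(frozen=True)
class Observation:
    """One extracted fact: an agent performs an action at a place and time."""
    agent: str      # identity taken from the metadata, e.g. a phone number
    action: str     # action found by the linguistic analysis
    place: str      # place name found in the linguistic data
    when: datetime  # absolute time inferred from the text plus the metadata


def resolve_time(expression: str, sent_at: datetime) -> datetime:
    """Turn a relative expression into an absolute time, using the
    transmission timestamp from the metadata (illustrative rules only)."""
    text = expression.lower().strip()
    if text == "tomorrow":
        return sent_at + timedelta(days=1)
    match = re.fullmatch(r"in (an|\d+) hours?", text)
    if match:
        hours = 1 if match.group(1) == "an" else int(match.group(1))
        return sent_at + timedelta(hours=hours)
    raise ValueError(f"unsupported time expression: {expression!r}")


def contradictions(observations):
    """Authentication-style check: return pairs of observations that place
    the same agent in two different places at the same time."""
    first_seen = {}
    conflicts = []
    for obs in observations:
        earlier = first_seen.setdefault((obs.agent, obs.when), obs)
        if earlier.place != obs.place:
            conflicts.append((earlier, obs))
    return conflicts


def locate(obs: Observation) -> dict:
    """Attach coordinates so the observation can be drawn on a map."""
    lat, lon = GAZETTEER[obs.place.lower()]
    return {"agent": obs.agent, "action": obs.action,
            "lat": lat, "lon": lon, "when": obs.when.isoformat()}


# SMS example: sent on 2/07/2010 at 10:52 (read here as 2 July 2010).
sent_at = datetime(2010, 7, 2, 10, 52)
meeting = Observation("+33-0000-0001", "meet", "Montparnasse station",
                      resolve_time("tomorrow", sent_at))
print(locate(meeting))
```

In this sketch "tomorrow" is resolved by adding one day to the transmission timestamp, so only the date part of the result is meaningful. A real system would need far richer rules for time expressions and a full gazetteer.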

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
EP10762962A 2009-07-30 2010-07-30 Auf linguistischer analyse basierendes geopositionierungssystem Ceased EP2460093A2 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0955355A FR2948791B1 (fr) 2009-07-30 2009-07-30 Systeme de geolocalisation par analyse linguistique
PCT/FR2010/051637 WO2011012834A2 (fr) 2009-07-30 2010-07-30 Systeme de geolocalisation par analyse linguistique

Publications (1)

Publication Number Publication Date
EP2460093A2 (de) 2012-06-06

Family

ID=42126443

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10762962A Ceased EP2460093A2 (de) 2009-07-30 2010-07-30 Auf linguistischer analyse basierendes geopositionierungssystem

Country Status (3)

Country Link
EP (1) EP2460093A2 (de)
FR (1) FR2948791B1 (de)
WO (1) WO2011012834A2 (de)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050108195A1 (en) * 2002-05-07 2005-05-19 Microsoft Corporation Method, system, and apparatus for processing information based on the discovery of semantically labeled strings
US7752210B2 (en) * 2003-11-13 2010-07-06 Yahoo! Inc. Method of determining geographical location from IP address information
US20070099626A1 (en) * 2005-10-31 2007-05-03 Honeywell International Inc. Tracking system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHINATSU AONE ET AL: "REES : a large-scale relation and event extraction system", PROCEEDINGS OF THE SIXTH CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING -, 1 January 2000 (2000-01-01), Morristown, NJ, USA, pages 76 - 83, XP055539811, DOI: 10.3115/974147.974158 *
HRISTO TANEV ET AL: "Real-time News Event Extraction for Global Monitoring Systems", 1 December 2008 (2008-12-01), XP055540095, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.8698&rep=rep1&type=pdf> [retrieved on 20190109] *
JAKUB PISKORSKI ET AL: "Cluster-Centric Approach to News Event Extraction", FRONTIERS IN ARTIFICIAL INTELLIGENCE AND APPLICATIONS, September 2008 (2008-09-01), pages 276 - 290, XP055496177 *

Also Published As

Publication number Publication date
WO2011012834A2 (fr) 2011-02-03
FR2948791B1 (fr) 2016-09-30
WO2011012834A3 (fr) 2011-04-07
FR2948791A1 (fr) 2011-02-04

Similar Documents

Publication Publication Date Title
US10270862B1 (en) Identifying non-search actions based on a search query
US9531649B2 (en) Identification of message recipients
US20190087072A1 (en) System and method for context enhanced mapping
US8055675B2 (en) System and method for context based query augmentation
US11122009B2 (en) Systems and methods for identifying geographic locations of social media content collected over social networks
US9563649B2 (en) Location stamping and logging of electronic events and habitat generation
US8386506B2 (en) System and method for context enhanced messaging
CN104270521B (zh) 对来电号码进行处理的方法和移动终端
US8452855B2 (en) System and method for presentation of media related to a context
JP6689515B2 (ja) ユーザ地理的ロケーションのタイプを識別するための方法および装置
US8527279B2 (en) Voice recognition grammar selection based on context
US8223932B2 (en) Appending content to a telephone communication
US9507836B1 (en) Associating an event attribute with a user based on a group of one or more electronic messages associated with the user
US20130297694A1 (en) Systems and methods for interactive presentation and analysis of social media content collection over social networks
CN104700835A (zh) 提供话音接口的方法和系统
CN102323933A (zh) 一种面向即时通信的信息嵌入和交互系统及方法
CN103457975A (zh) 获取地图兴趣点评价数据的方法和装置
WO2020186824A1 (zh) 应用程序唤醒控制方法、装置、计算机设备及存储介质
JP2017199225A (ja) 災害情報選択装置およびその方法
Devkota et al. An exploratory study on the generation and distribution of geotagged tweets in Nepal
US8566425B1 (en) Identifying social profiles of entities
WO2019200044A1 (en) System and method of ai assisted search based on events and location
KR101024165B1 (ko) 위치기반 영상인식을 활용한 콘텐츠 생성 및 제공 방법
EP2460093A2 (de) Auf linguistischer analyse basierendes geopositionierungssystem
WO2017064446A1 (fr) Procede de communication entre deux utilisateurs, systeme utilisant un tel procede

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20120228

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

RIN1 Information on inventor provided before grant (corrected)

Inventor name: FLUHR, CHRISTIAN

Inventor name: GUIDERE, MATHIEU

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20170216

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20190215