EP2460093A2 - Linguistic-analysis-based geopositioning system - Google Patents
Linguistic-analysis-based geopositioning system
- Publication number
- EP2460093A2 (application EP10762962A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- agents
- actions
- linguistic
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/0009—Transmission of position information to remote stations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the invention relates to the multilingual localization of an agent in time and space from a linguistic analysis of digital data transiting over a communication network.
- the digital data includes linguistic data and metadata about the agent.
- the agent may be at the origin of the digital data and / or be the subject of the linguistic data contained in the digital data.
- Cross-language information retrieval, which consists of formulating a query in a source language and searching for relevant documents in target languages, and Multi Lingual Information Retrieval (MLIR), which consists of formulating a query in a source language and searching for relevant documents in all languages, both make it possible to enter queries and to obtain documents in different languages. However, they do not make it possible to locate precisely, in space and time, specific data contained in a particular document or associated with specific information in the linguistic data.
- agent means a person or group of persons by whom an action is performed or will be performed
- Linguistic data consist of a word or groups of words that can be categorized in classes such as names of persons, names of organizations, institutions or enterprises, place names, quantities, distances, etc.
- Metadata are data about the characteristics of digital data, such as the file type, the type of transmitter, the identity of the sender or of the transmitting equipment, the language of transmission, etc.
- the invention relates more particularly to the metadata relating to the place of issue and the date and time of transmission of the data concerning the agent.
- the invention addresses this need with a geolinguistic localization technology that gives priority to the spatio-temporal information present in the linguistic data and in the associated metadata contained in the digital data.
- the invention relates to an analysis system receiving, as input, digital data comprising linguistic data that can be analyzed by linguistic processing, as well as metadata associated with their transmission, the system comprising: a linguistic analysis engine to analyze the linguistic data from a semantic point of view and to qualify them from the point of view of space, time, agents and actions; an extraction engine for extracting, from the analyzed linguistic data and the spatio-temporal metadata associated with the transmission, data relating to the space and time of the agents' actions; a determination engine for determining, from the extracted data, the spatio-temporal location of the agents' actions; a representation engine to represent, on a geographical map, the spatio-temporal location of the agents' actions.
- the language analysis engine is a multilingual engine
- the system receives as input data from a communication network
- the digital data come from Internet sites, blogs, forums, RSS feeds, instant messengers and e-mail services.
- the invention relates to a method for analyzing digital data that include linguistic data that can be analyzed by linguistic processing, as well as metadata associated with the transmission of these linguistic data, in which the following processing steps are carried out: the linguistic data are analyzed from a semantic point of view to qualify them from the point of view of space, time, agents and actions; from the analyzed linguistic data and the spatio-temporal metadata associated with the transmission, data relating to the space and time of the agents' actions are extracted; the spatio-temporal location of the agents' actions is determined from the extracted data; the spatio-temporal location of the agents' actions is represented on a geographical map.
- the data relating to the space and time of the agents' actions are compared with one another to evaluate their authenticity; and/or
- the data relating to the space and time of the agents' actions are sorted to highlight, across all of these data, groupings according to at least one of their characteristics of time, space, action, agent; and/or
- the data relating to the space and time of the agents' actions are selected on the basis of only one of their characteristics of time, space, action, agent;
- the spatio-temporal location is then determined from the authenticated extracted data and/or from the groupings obtained and/or from the selection thus made.
- the invention relates to a computer program comprising machine instructions for implementing the method according to the second aspect of the invention.
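The four processing steps of the method (semantic analysis, extraction, determination, representation) can be pictured as a simple chain of functions. The following is only a minimal sketch under stated assumptions: the names `DigitalData` and `run_pipeline`, and the idea of passing each step in as a callable, are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class DigitalData:
    """Hypothetical input unit: linguistic data plus transmission metadata."""
    linguistic_data: str
    metadata: Dict[str, Any]


def run_pipeline(
    data: DigitalData,
    analyze: Callable[[str], Any],
    extract: Callable[[Any, Dict[str, Any]], Any],
    determine: Callable[[Any], Any],
    represent: Callable[[Any], Any],
) -> Any:
    """Chain the four steps in the order given by the method."""
    qualified = analyze(data.linguistic_data)      # semantic analysis of the text
    extracted = extract(qualified, data.metadata)  # combine with spatio-temporal metadata
    location = determine(extracted)                # spatio-temporal location of the actions
    return represent(location)                     # e.g. a marker for a geographical map
```

Each callable stands in for one engine of the system; real engines would be far richer than any stub passed in here.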
- FIG. 1 illustrates an embodiment of an analysis system for geolinguistic localization.
- FIG. 2 schematically illustrates an analysis method implemented using the analysis system for geolinguistic localization
- FIGS. 3 to 5 illustrate examples of analysis for geolinguistic localization.
- DETAILED DESCRIPTION OF THE INVENTION
- Geolinguistic localization is a form of strategic intelligence that combines the principles of physical geolocation with the techniques of advanced linguistic analysis.
- the basic principle is to trace the information sought back to its physical source by following the path that was used to propagate it, based on a cross-check of linguistic clues about this information in several languages and of metadata relating to its medium of diffusion.
- a spatio-temporal reference is derived from the data and/or metadata, allowing the information to be located at a given moment and in a specific place.
- This reference makes it possible to have a representation, both spatial and temporal, of the actions and of the relations that exist between the emitters, the receivers and the relays of the information.
- the linguistic data and the metadata concerning the media that carry the actions and relations can come from open sources freely accessible on the Internet or from legal interception.
- Open source means freely accessible sites, agency sites, blogs, online videos, streaming television, social networks, RSS feeds, information accessed through search engines, and information obtained by querying a specialized site (invisible web).
- the linguistic data can be of private type and are processed only in the context of legal interceptions.
- Private data refers to data coming from electronic mail, oral conversations over different channels, SMS, VoIP.
- a user 1 transmits digital data DN on a communication network 3 by means of a communication device 2.
- the digital data DN comprise linguistic data ML that can be analyzed by linguistic processing, as well as spatio-temporal metadata MD. The digital data DN are received by the analysis system and are either open or private.
- the analysis system consists of the following elements:
- the engines 10, 20, 30, 40, 50 can be connected to each other by means of a wired or wireless connection. They may be located at the same place or in different locations and may be implemented as software stored on a CD-ROM type digital medium, a USB key or any other known type of storage medium.
- a coordination module (not shown) makes it possible to update the transmitted data and to check that the results match the requests, in order to ensure the coherence and updating of the system.
- The functions of the engines 10, 20, 30, 40, 50 are described below.
- Engine 10 for acquiring digital data DN
- the digital data acquisition engine 10 performs the acquisition E0 of digital data DN, either through a search using a search engine accessible on the Internet or by listening to the communication network 3.
- the communication network 3 is, for example, a cellular mobile communication network, the Internet, a corporate network of the intranet type, or any other known type of network capable of conveying digital data DN.
- the digital data DN are, for example, carried by an SMS sent from a mobile terminal 2 (as in the example illustrated in FIG. 1), an email or a voice message.
- these digital data DN can be carried by any digital medium that can carry linguistic data ML.
- the linguistic analysis engine 20 analyzes E1 the linguistic data DL from a semantic point of view and qualifies them from the point of view of space S, time T, agents A and actions Ac.
- the analysis E1 and the analysis engine 20 are based on multilingual morphosyntactic analysis software that recognizes the agents A (names of persons, place names, names of organizations, date and time data, numerical amounts with their units) and the actions Ac (agent of the action, action, object of the action, and circumstantial complements of time, place, instrument, manner, etc.).
- the whole is normalized (for example, for people, the different spellings of their names are identified regardless of the language and the character set) and, during the analysis E1, ambiguities concerning places are resolved according to the context and the metadata MD, by detecting the language L used.
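The normalization of name spellings described above can be approximated with Unicode normalization plus an alias table. This is a minimal sketch, not the patent's software: the functions `normalize_name` and `canonical_entity`, the `ALIASES` table and the person "Jean Dupont" are all hypothetical and serve only as an illustration.

```python
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Build a comparison key: accents removed, case folded, spaces collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


# Hypothetical alias table: variant spellings (in any script) of one person.
ALIASES = {
    normalize_name(variant): "Jean Dupont"
    for variant in ("Jean Dupont", "DUPONT Jean", "Жан Дюпон")
}


def canonical_entity(surface_form: str) -> Optional[str]:
    """Return the canonical name for a spelling, or None if it is unknown."""
    return ALIASES.get(normalize_name(surface_form))
```

A lookup table is the simplest possible design; handling transliteration between character sets in general would need dedicated rules or resources that are outside the scope of this sketch.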
- the system thus makes it possible to analyze, from the linguistic data, the agents concerned ("we") and the nature of the action ("see each other"), then, from the metadata, to know the identity of the agents (by their phone numbers) and the spatio-temporal coordinates of the appointment ("within an hour").
- the extraction engine 30 extracts E2, from the linguistic data DL analyzed by the analysis engine 20 and the spatio-temporal metadata MD associated with the transmission, data relating to the space and time of the actions Ac of the agents A.
- This engine 30 makes it possible to locate the actions Ac and the agents A in time T and in space S.
- the access sites containing digital data are listed, and the acquisition of digital data via search engines is based on defined criteria.
- in the example above, the message "We see each other in an hour at Montparnasse station" was sent using a mobile terminal.
- the metadata MD identify the caller, the called party, and the date and time of the call.
- the caller and the called party are the agents and can be identified by their telephone numbers; the relationship between the two is the appointment set at a given location, which implies that at least one of the two agents travels to Montparnasse station.
- the result of the extraction of the digital data DN is as follows.
- the system thus makes it possible to extract, through the analysis of the linguistic data, the agents concerned ("we") and the nature of the action ("see each other"), then, thanks to the metadata, to know the identity of the agents (by their phone numbers) and to calculate the time of the appointment ("in an hour") by inference from the spatio-temporal metadata of transmission and reception.
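The inference mentioned above, in which a relative expression such as "in an hour" is combined with the transmission timestamp from the metadata, can be sketched as follows. This is an illustration only: the function `resolve_relative_time`, its tiny English-only vocabulary and any timestamp passed to it are assumptions, not part of the patent.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical, deliberately small vocabulary of quantities.
_NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
_PATTERN = re.compile(r"(?:in|within) (\w+) (minute|hour|day)s?")


def resolve_relative_time(expression: str, sent_at: datetime) -> Optional[datetime]:
    """Turn a relative time expression into an absolute datetime.

    `sent_at` is the transmission time taken from the metadata.
    Returns None when the expression is not understood.
    """
    text = expression.casefold().strip()
    if text == "tomorrow":
        # Only the date is really known; the clock time of sending is kept.
        return sent_at + timedelta(days=1)
    match = _PATTERN.fullmatch(text)
    if match is None:
        return None
    quantity, unit = match.groups()
    count = _NUMBER_WORDS.get(quantity)
    if count is None and quantity.isdigit():
        count = int(quantity)
    if count is None:
        return None
    return sent_at + timedelta(**{unit + "s": count})
```

For a message sent at an assumed time of 10:52, `resolve_relative_time("in an hour", ...)` yields 11:52 on the same day.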
- the determination engine 40 determines E3, from the extracted data (S, T, Ac, A), the location of the actions Ac of the agents A.
- the determination E3 implemented in this engine 40 may further include an authentication E31, a profiling E32 and a targeting E33 of the data relating to the space S and the time T of the actions Ac of the agents A.
- the authentication E31, the profiling E32 and the targeting E33 can be seen as a filtering of the data.
- the authentication E31, the profiling E32 and the targeting E33 are optional and are implemented after the extraction E2. They can be implemented successively or alternatively.
- the determination E3 of the spatio-temporal location is performed from the authenticated extracted data and/or from the groupings obtained by the profiling and/or from the selection made by the targeting.
- Authentication E31: this consists of comparing with one another the data relating to the space S and the time T of the actions Ac of the agents A.
- an agent A is a natural person located in
- authentication consists of verifying the identity of the agent by exploring its attributes (time, relationships, actions, etc.) in the available linguistic data DL.
- the authentication consists of searching the linguistic data DL, in several languages ML, for the actions Ac and relationships associated with the agent.
- the authentication consists of comparing the multilingual linguistic data ML and the metadata MD to ensure that there are no contradictions, such as an agent being in two different places at the same time.
- the system searches for all the forms under which this name appears in several languages ML, then searches for all the documents or data relating to these forms, recovering, for each piece of data found, the positioning and location clues, both internal (places and dates indicated in the documents) and external (geo-positional data of the network that carries them). After this search phase, the data are cross-checked to extract those relating to the actions that the agent is about to perform or has already performed.
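The consistency check described above (an agent cannot be in two different places at the same time) can be sketched as a grouping of observations by agent and instant. This is a simplified illustration: the `Sighting` record and `find_contradictions` function are hypothetical names, and comparing exact timestamps is an assumption; a real check would need time windows and travel times.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Sighting:
    """Hypothetical record: one agent placed somewhere at some instant."""
    agent: str
    place: str
    when: datetime
    source: str  # "text" for internal clues, "metadata" for external ones


def find_contradictions(
    sightings: Iterable[Sighting],
) -> List[Tuple[str, datetime, List[str]]]:
    """List every (agent, instant) that is associated with more than one place."""
    places_by_key = defaultdict(set)
    for sighting in sightings:
        places_by_key[(sighting.agent, sighting.when)].add(sighting.place)
    return [
        (agent, when, sorted(places))
        for (agent, when), places in places_by_key.items()
        if len(places) > 1
    ]
```

An empty result means the internal clues (from the text) and the external clues (from the metadata) do not contradict each other under this simplified rule.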
- Profiling E32: this consists of sorting the data relating to the space S and the time T of the actions Ac of the agents A to highlight, across the whole set of these data, groupings according to at least one of their characteristics of time T, space S, action Ac, agent A.
- profiling starts from the set of multilingual linguistic data ML in several languages L and extracts, by means of linguistic rules, only the data relating to places (geographic, urban, territorial landmarks, etc.) and agents A.
- Profiling includes three levels of analysis.
- the first level is the general exploration of the multilingual linguistic data ML in several languages L, focusing on the actions Ac in the multilingual linguistic data ML and on the geo-positional information in the metadata MD.
- the second level of analysis is the sorting of the data according to metadata clues MD (place and time of transmission) and data clues (relations and actions of the agent).
- the determination engine 40 starts from the available data to work back to salient data, then to reveal a phenomenon or an agent (unknown or little known), providing as many spatio-temporal indications as possible about its location from the multilingual linguistic data ML and the metadata MD.
- the input data is a set of documents that one seeks to organize.
- the data obtained by profiling is a set of semantic relations between actions and agents having links between them.
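The grouping step of profiling can be illustrated by sorting extracted records into groups that share one characteristic. The sketch below is an assumption-laden illustration: the `profile` function and the dictionary layout of a record (keys `time`, `space`, `action`, `agent`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping

CHARACTERISTICS = ("time", "space", "action", "agent")


def profile(
    records: Iterable[Mapping[str, Hashable]], characteristic: str
) -> Dict[Hashable, List[Mapping[str, Hashable]]]:
    """Group records by one characteristic (time, space, action or agent)."""
    if characteristic not in CHARACTERISTICS:
        raise ValueError(f"unknown characteristic: {characteristic!r}")
    groups = defaultdict(list)
    for record in records:
        groups[record[characteristic]].append(record)
    return dict(groups)
```

Grouping by `space`, for example, brings together all agents and actions tied to the same place, which is one way to surface links between agents.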
- Targeting E33: this consists of selecting the multilingual linguistic data ML relating to the space S and the time T of the actions Ac of the agents A on the basis of only one of their characteristics of time T, space S, action Ac, agent A.
- targeting consists of aiming, by linguistic analysis, at a particular type of data (action Ac, relation R) and looking for this multilingual linguistic data item ML in several languages and on various types of media.
- the input data is a type of action Ac that is searched for as a priority.
- the data obtained by targeting is a particular type of semantic relationship between the action sought and agents with links to each other.
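Targeting a single type of action across several languages can be sketched with a small multilingual trigger lexicon. Everything here is an assumption made for illustration: the `ACTION_TRIGGERS` lexicon, the `target_action` function and the plain substring matching stand in for the morphosyntactic analysis described in the text.

```python
from typing import Dict, Iterable, List, Mapping

# Hypothetical lexicon: surface expressions that signal one action type, per language.
ACTION_TRIGGERS: Dict[str, Dict[str, List[str]]] = {
    "meeting": {
        "en": ["see each other", "meet"],
        "fr": ["on se voit", "rendez-vous"],
        "de": ["treffen"],
    },
}


def target_action(
    messages: Iterable[Mapping[str, str]], action: str
) -> List[Mapping[str, str]]:
    """Keep only the messages whose text contains a trigger of the given action."""
    triggers = [t for forms in ACTION_TRIGGERS[action].values() for t in forms]
    return [
        message
        for message in messages
        if any(trigger in message["text"].casefold() for trigger in triggers)
    ]
```

Substring matching is crude (it ignores inflection and word boundaries), so this only conveys the idea of searching for one action type in several languages at once.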
- the representation engine 50 performs the representation E4, displaying on a geographical map 60 the spatio-temporal location of the actions of the agents.
- This representation engine 50 makes it possible in particular to align the markers of a geographical map with the spatio-temporal data resulting from the analysis of the multilingual linguistic data ML, by locating the extracted data relative to each other and by visualizing their links in time and space.
- Standardized locations are associated with geographic coordinates in longitude and latitude, which allows them to be represented on a geographic map 60 (for example, using Googlemap™ or any other tool of the same type).
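Associating a normalized place name with coordinates and producing something a mapping tool can display can be sketched with a gazetteer lookup and a GeoJSON feature. This is a sketch under assumptions: the `GAZETTEER` table, its approximate coordinates and the `to_geojson_feature` function are hypothetical, and GeoJSON is a swapped-in output format, not one named in the patent.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical gazetteer: normalized place name -> approximate (latitude, longitude).
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Montparnasse station": (48.84, 2.32),
}


def to_geojson_feature(
    place: str, when_iso: str, agents: List[str], action: str
) -> Dict[str, Any]:
    """Build a GeoJSON point feature for one located action."""
    latitude, longitude = GAZETTEER[place]
    return {
        "type": "Feature",
        # GeoJSON stores positions as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {
            "place": place,
            "time": when_iso,
            "agents": agents,
            "action": action,
        },
    }


if __name__ == "__main__":
    feature = to_geojson_feature(
        "Montparnasse station", "2010-07-03T10:52:00", ["caller", "called"], "meeting"
    )
    print(json.dumps(feature, indent=2))
```

Keeping the time, agents and action in the feature properties lets a map viewer show the links between agents as well as the position.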
- Example 1 Geolinguistic Location from the Internet
- the system makes it possible to detect this language and to extract the spatio-temporal clues making it possible to locate the threat, to cross-check these internal clues with the metadata and external information about the connection (IP / DNS), the server and its managers, and finally to visualize on a map the places and links between individuals who have spoken on or endorsed the content of said forum.
- the system makes it possible to detect this language and to extract the spatio-temporal clues making it possible to locate the transmitting agent, to cross-check these internal clues with the metadata (for example the International Mobile Equipment Identity, IMEI) and the external information of the SMS in question (GSM, GPRS information), in order to visualize on a geographical map the places and links between individuals who have received or shared the content of said SMS.
- the system identifies the following message: "we see you tomorrow in front of Montparnasse station"; this message was sent on 2/07/2010 at 10h52.
- the system retains one of the two SMS messages and identifies the associated metadata as well as the actions Ac, the agents A and the space S.
- the system makes it possible to detect this language and to extract the spatio-temporal clues making it possible to locate the threat, to cross-check these internal clues with the metadata and external information of the email in question (network information), and finally to visualize on a map the places and the links between individuals who have received or shared the content of said email.
- two Emails 51, 52 transit over a communication network and a multilingual search of an agent is carried out.
- the system then keeps one of the two emails and identifies the associated metadata as well as the actions Ac, the agents A and the space S.
- Time T "last Thursday, tomorrow”.
- Time T time coordinates (date and time) of transmission and reception of
- the system thus makes it possible to locate, through the analysis of the linguistic data and the metadata, the agents concerned, to know the identity of the agents (by their telephone numbers) and to calculate the moment ("tomorrow"), thanks to an inference from the spatio-temporal metadata of transmission and reception.
- Example 4 Geolinguistic Location from a Telephone Communication (Including VoIP)
- the system makes it possible to detect this language from the transcription that will be made and to extract the spatio-temporal clues for locating the threat, to cross-check these internal clues with the metadata and external geopositioning information (VoIP, GSM, GPRS, GPS), and finally to visualize on a map the places and the links between individuals who have received or shared this communication.
- the system makes it possible to detect the threat by a linguistic analysis, to go back to the resource supporting the threat and to locate the transmitter and the potential receivers in time and in space, starting from the language production contained in the relays.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0955355A FR2948791B1 (en) | 2009-07-30 | 2009-07-30 | LINGUISTIC ANALYSIS GEOLOCATION SYSTEM |
PCT/FR2010/051637 WO2011012834A2 (en) | 2009-07-30 | 2010-07-30 | Linguistic-analysis-based geopositioning system |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2460093A2 (en) | 2012-06-06 |
Family
ID=42126443
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10762962A Ceased EP2460093A2 (en) | 2009-07-30 | 2010-07-30 | Linguistic-analysis-based geopositioning system |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP2460093A2 (en) |
FR (1) | FR2948791B1 (en) |
WO (1) | WO2011012834A2 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050108195A1 (en) * | 2002-05-07 | 2005-05-19 | Microsoft Corporation | Method, system, and apparatus for processing information based on the discovery of semantically labeled strings |
US7257570B2 (en) * | 2003-11-13 | 2007-08-14 | Yahoo! Inc. | Geographical location extraction |
US20070099626A1 (en) * | 2005-10-31 | 2007-05-03 | Honeywell International Inc. | Tracking system and method |
-
2009
- 2009-07-30 FR FR0955355A patent/FR2948791B1/en not_active Expired - Fee Related
-
2010
- 2010-07-30 EP EP10762962A patent/EP2460093A2/en not_active Ceased
- 2010-07-30 WO PCT/FR2010/051637 patent/WO2011012834A2/en active Application Filing
Non-Patent Citations (3)
Title |
---|
CHINATSU AONE ET AL: "REES : a large-scale relation and event extraction system", PROCEEDINGS OF THE SIXTH CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING -, 1 January 2000 (2000-01-01), Morristown, NJ, USA, pages 76 - 83, XP055539811, DOI: 10.3115/974147.974158 * |
HRISTO TANEV ET AL: "Real-time News Event Extraction for Global Monitoring Systems", 1 December 2008 (2008-12-01), XP055540095, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.8698&rep=rep1&type=pdf> [retrieved on 20190109] * |
JAKUB PISKORSKI ET AL: "Cluster-Centric Approach to News Event Extraction", FRONTIERS IN ARTIFICIAL INTELLIGENCE AND APPLICATIONS, September 2008 (2008-09-01), pages 276 - 290, XP055496177 * |
Also Published As
Publication number | Publication date |
---|---|
FR2948791B1 (en) | 2016-09-30 |
FR2948791A1 (en) | 2011-02-04 |
WO2011012834A3 (en) | 2011-04-07 |
WO2011012834A2 (en) | 2011-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10270862B1 (en) | Identifying non-search actions based on a search query | |
US9531649B2 (en) | Identification of message recipients | |
US8055675B2 (en) | System and method for context based query augmentation | |
US10133458B2 (en) | System and method for context enhanced mapping | |
US11122009B2 (en) | Systems and methods for identifying geographic locations of social media content collected over social networks | |
US9563649B2 (en) | Location stamping and logging of electronic events and habitat generation | |
US8386506B2 (en) | System and method for context enhanced messaging | |
JP6689515B2 (en) | Method and apparatus for identifying the type of user geographic location | |
US8223932B2 (en) | Appending content to a telephone communication | |
US9749274B1 (en) | Associating an event attribute with a user based on a group of one or more electronic messages associated with the user | |
US20100082427A1 (en) | System and Method for Context Enhanced Ad Creation | |
US20130297581A1 (en) | Systems and methods for customized filtering and analysis of social media content collected over social networks | |
US20130297652A1 (en) | System and method for presentation of media related to a context | |
US20140195234A1 (en) | Voice Recognition Grammar Selection Based on Content | |
US20130297694A1 (en) | Systems and methods for interactive presentation and analysis of social media content collection over social networks | |
CN104700835A (en) | Method and system for providing voice interface | |
CN107231485B (en) | Method and device for establishing event reminding | |
CN103457975A (en) | Method and device for acquiring map interest point evaluation data | |
JP2017199225A (en) | Device and method for selecting disaster information | |
Devkota et al. | An exploratory study on the generation and distribution of geotagged tweets in Nepal | |
US8566425B1 (en) | Identifying social profiles of entities | |
WO2019200044A1 (en) | System and method of ai assisted search based on events and location | |
KR101024165B1 (en) | Contents generating and providing method using image recognition based on location | |
WO2011012834A2 (en) | Linguistic-analysis-based geopositioning system | |
US11728025B2 (en) | Automatic tracking of probable consumed food items |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20120228 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: FLUHR, CHRISTIAN Inventor name: GUIDERE, MATHIEU |
|
DAX | Request for extension of the european patent (deleted) | ||
17Q | First examination report despatched |
Effective date: 20170216 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
18R | Application refused |
Effective date: 20190215 |