EP3283977A1 - Machine learning based search improvement - Google Patents

Machine learning based search improvement

Info

Publication number
EP3283977A1
EP3283977A1 EP16714678.6A EP16714678A EP3283977A1 EP 3283977 A1 EP3283977 A1 EP 3283977A1 EP 16714678 A EP16714678 A EP 16714678A EP 3283977 A1 EP3283977 A1 EP 3283977A1
Authority
EP
European Patent Office
Prior art keywords
client device
query
search engine
feature
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP16714678.6A
Other languages
English (en)
French (fr)
Inventor
John M. Hornkvist
Gaurav Kapoor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apple Inc
Original Assignee
Apple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/721,945 (US10885039B2)
Application filed by Apple Inc
Publication of EP3283977A1
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • a client device can generate a substantial amount of information that is private to a user.
  • the private information can be used to generate one or more local predictors that can be used to determine preferences of a user of the client device.
  • the private information can also be used to add one or more features to an existing local predictor.
  • at least some of the private information of the user of the client device can be anonymized and shared with a remote search engine. For example, the client device can generate a predictor that learns that this particular user frequently selects movies from query results when the user is at home on a weekend evening in the winter. The particular user's exact selection of which movies she watches, and the particular user's home address are private information and would not be sent to a remote search engine.
  • a user may have been searching for cafes open between 12:00pm and 2:00pm in Sunnyvale, California, near Murphy's Station, with a Yelp® price rating of "$$" and a service rating of 4 stars or more.
  • the search context for subsequent searches will be, "Cafes open between 12:00pm and 2:00pm in Sunnyvale, California, near Murphy's Station, with a Yelp® price rating of '$$' and a service rating of 4 stars or more."
  • the information in the session context is anonymized before sending the session context to the search engine.
  • a feature can also be obtained from information sources available to a client device, such as the current date, time, time zone, weather, or temperature.
  • a feature can further be a state, or combination of states, of a client device, such as which applications are open, how long the applications have been open, whether a user has issued a query that relates to an application that is open (such as a user query regarding music when iTunes® is open), calendar events in the user's calendar, or whether a user is on a call, writing a text, or answering an email.
  • Features can also be obtained from tags in results. For example, Yelp® may tag restaurant results with a price rating with a certain number of "$" signs or tag results with a service quality rating measured in a certain number of stars.
  • a feature can alternatively be expressed as a real value, e.g. "real:
  • a predictor treats each of these possible features as an individual feature (input) to the predictor.
  • a client device may maintain an aggregate, current state of a combination of features that a plurality of predictors can use as a single feature. For example, a user's location, the current day of the week, the current time of day, and a list of applications open on a device are features that may frequently appear together in predictors. A client device can maintain these features in a current state, as an aggregate feature (input) to any predictor that uses these features.
  • a feature can additionally be a distinction learned from analyzing results data.
  • Feature metadata can be represented in data structures passed from a search engine to a client device, or from a client device to a search engine.
  • Features can utilize this format with values in the fields.
  • a feature set can be represented as follows:
  • Some embodiments described herein include one or more application programming interfaces (APIs) in an environment with calling program code interacting with other program code being called through the one or more interfaces.
  • APIs: application programming interfaces
  • Various function calls, messages or other types of invocations, which further may include various kinds of parameters, can be transferred via the APIs between the calling program and the code being called.
  • an API may provide the calling program code the ability to use data types or classes defined in the API and implemented in the called program code.
  • Figure 6 illustrates, in block diagram form, a method of locally learning a query feature passed to a local device by a remote search engine in response to a query sent to the remote search engine.
  • Figure 7 illustrates, in block diagram form, a method of receiving or determining a new feature, locally training on the feature, and utilizing the feature.
  • Embodiments are described for using locally available information on a client device, and information obtained from a remote search engine, to learn features that improve both local and remote search results for a user of the client device, without disclosing private information about the user to a remote search engine.
  • Local database 111 indexes local information on the computing device 100 for searching using local search interface 110.
  • Local information is private to a computing device 100 and is not shared with remote search subsystem 135.
  • Local information can include data, metadata, and other information about applications 112 and data 113 on client device 100.
  • Remote search interface 120 can communicate with the remote query service 121 via interface 2.
  • Remote query service 121 can communicate with the network service 122 via interface 4.
  • Remote query service 121 can perform analogous functionality for the remote search engine 150.
  • local search interface 110 can pass the query to the remote query service 121, via communication interface 7, to obtain query results from remote search engine 150.
  • remote query service 121 can receive a query feature learned by local learning system 116 via communication interface 8. The feature can be used to extend the query and/or bias a query feature to the remote search engine 150.
  • remote query service 121 can pass a query feature, returned from the remote search engine 150, to the local learning system 116 for training on that feature via communication interface 8.
  • Local search and feedback history 115 can store the history of all search queries issued using the local query interface 110, including queries that are sent to the remote query service 121 via communication interface 7.
  • Local search and feedback history 115 can also store user feedback associated with both local and remote results returned from a query.
  • Feedback can include an indication of whether a user engaged with a result, e.g. by clicking through on the result, how much time the user spent viewing the result, whether the result was the first result that the user interacted with (or other ordinal value), whether the result was the only result that the user interacted with, and whether the user did not interact with a result, i.e. abandoned the result.
  • the user feedback can be encoded and stored in association with the query that generated the results for which the feedback was obtained.
  • the local search and feedback history 115 can store a reference to one or more of the results returned by the query. Information stored in the local search and feedback history 115 is deemed private user information and is not available to, or accessible by, the remote search subsystem 135. In one embodiment, the local search and feedback history 115 can be flushed. In an embodiment, local search and feedback history 115 can be aged-out. The age-out timing can be analyzed so that stable long term trends are kept longer than search and feedback history showing no stable trend.
  • An anonymization and location fuzzing service 117 (“anonymization service”) ensures that private information of the user that is stored in local database 111, local search and feedback history 115 and local learning 116 is kept private and is not sent to a remote search engine 150 without first anonymizing the data to be sent to the remote search engine 150.
  • anonymization and location fuzzing service 117 may substitute "at home" as a status of the user, instead of sending the user's home address, nearby cell tower identifiers, cell network IP address, WiFi IP address, or other information that could identify the user's location with a high degree of specificity.
  • anonymization service 117 may substitute "romantic comedy" as a genre that the user prefers, in place of exact information that identifies a particular movie that the user has previously selected for viewing, such as "Something About Mary."
  • Anonymization service 117 can further include a location "fuzzing" service.
  • the location fuzzing service ensures that the exact location of a user is kept private.
  • the location fuzzing service can take into account the population density of the current location of the user and obfuscate (or "fuzz") the user's location sufficiently to ensure privacy. For example, a user may currently be located in a highly dense city, looking for Italian restaurants having a price rating on Yelp® of "$$$$" and a dinner service rating of 4.5 stars on Columbus Ave. in San Francisco California.
  • the computing device 100 can also include a remote search subsystem 135 that includes a remote search interface 120 and a remote query service 121.
  • a remote search interface 120 can include a web browser such as Apple® Safari®, Mozilla®, or Firefox®.
  • a query service 121 can perform intermediary processing on a query prior to passing the query to network service 122 and on to remote search engine 150 via network 140.
  • Network service 122 can receive results back from remote search engine 150 for display on remote query interface 120 or on local search interface 110.
  • Remote query service 121 can be communicatively coupled to network service 122 via communication interface 4.
  • FIG. 3 illustrates, in block diagram form, a search system 300 in which a plurality of client devices 100 are coupled to search engine 150 and aggregator 152.
  • Laptop computer 101, tablet computer 102, and cell phone 103 are representative of client devices 100.
  • search engine 150 can generate a unique session identifier (session ID) for the client device 100 and can also start a session timer for the session.
  • session ID: unique session identifier
  • search engine 150 can store a history of the queries issued by the user, and can store an indication of which query results the user interacted with, and other user feedback data.
  • the query results and feedback data can be stored by search engine 150 in association with the session ID.
  • the stored queries and user interaction data represent a "user intent" or "query context" indicating what the user of the client device 100 has been querying about during the session. Since the stored queries and interaction data are private to the user of client device 100, the information can be retained on client device 100 even after the session timer has expired, thereby ending the session.
  • search engine 150 can generate a new session ID and can transmit the new session ID to the user. To preserve privacy of the user, the new session ID and the expired session ID are not associated with one another within the search engine 150.
  • Search engine 150 includes aggregator 152 and multiple search domains 160A-G.
  • aggregator 152 receives requests for query completions based on at least a partial input query ("input query prefix").
  • the aggregator 152 sends the input query prefix to each of the search domains 160A-G.
  • Each of the search domains 160A-G uses the input query prefix to determine possible query completions in that domain.
  • the map search domain 160A receives an input query prefix and searches this domain for possible query completions.
  • the aggregator 152 receives the query completions from each of the search domains 160A-G, and ranks the received query completions based on the relevance scores for each of the completions determined by the corresponding search domain and weights based on the query prefix context.
  • the query, feedback history 115, and association with a particular user can be used by the local learning 116 to generate a social graph for the particular user.
  • Local results 413 can be received from, e.g., a contacts application 112, and remote results 415 can be returned for, e.g., LinkedIn® profiles of persons named Bill and Steven, as well as other remote results 415.
  • a user may issue a query for "football scores."
  • Remote search engine 150 may return results for both football scores and soccer scores.
  • Remote search engine 150 may have determined that the computing device 100 that sent the query was located at an IP address that is in the United States. Therefore remote search engine 150 prioritized American football scores, such as the Dallas Cowboys, as being the most relevant results.
  • football means soccer.
  • Local learning system 116 can analyze the local search history and feedback history 115 to determine that the user did not interact with the higher-ranked American football scores. Local learning system 116 can then analyze the results and determine the feature that "football" has at least two meanings and that the user of this computing device 100 has a preference for soccer over American football.
  • search engine 150 can examine its own local cache and indices to obtain query results before, or in conjunction with, passing the received query to one or more search domains 160 via network 140, in operation 815.
  • search engine 150 can analyze results returned from search domains 160 with crowd-source user feedback data for the query to detect one or more new features for search engine 150 to train upon.
  • search engine 150 can optionally modify one or more predictors from set A or set B, whichever set produced better progress. Modifying a predictor can include adding a feature to a predictor (extending the predictor), deleting a feature from a predictor, or changing one or more values of a predictor. Values can be text, numbers, or both.
  • a user may have been searching for karaoke bars open at 8 p.m. this Friday night in San Rafael, California.
  • a search engine session timer may expire, ending the first session, or a user may leave the client device 100, return later to the client device 100, and resume searching for karaoke bars in a second, different session.
  • Providing context information from the first session to the search engine 150 for the second session allows a user to resume searching for karaoke bars open at 8 p.m. this Friday night in San Rafael, California, as though there had been only one session.
  • client device 100 can generate and store a context for the client device session with search engine 150.
  • Storing a context for a session can include tagging queries and user feedback data stored in local search and feedback history 115 with the session ID.
  • Storing context can further include storing information that identifies the session, such as session ID, current date, start time of the session, or the IP address or home page of search engine 150, or other information identifying search engine 150.
  • client device 100 can anonymize context information stored in local search and feedback history database 115. Anonymization can be performed by anonymization and location fuzzing service 117.
  • client device 100 can transmit the anonymized context information from one or more previous sessions to search engine 150.
  • client device 100 transmits the anonymized context information to search engine 150 in response to receiving a new session ID.
  • client device 100 transmits context information from one or more previous sessions to a different search engine 150 than the search engine 150 of the one or more previous sessions. This allows a user to search for the same information on a different search engine 150 without re-entering all previous queries of the previous one or more sessions.
  • the method continues at operation 1015, with the user interacting with remote search engine 150.
  • Figure 11 illustrates, in block form, a system 1100 for synchronizing predictors across multiple client devices 100 of a user.
  • predictors can be stored on each client device 100 within local learning system 116.
  • synchronizing predictors can include synchronizing local learning system 116 for each client device 100.
  • synchronizing predictors across multiple devices can also include synchronizing local search and feedback history 115 across multiple client devices 100. Preserving context between search sessions, as described with reference to Figure 10, above, can be performed by synchronizing local search and feedback history 115 across multiple client devices.
  • Synchronizing multiple client devices 100 can further include synchronizing local database 111 for multiple client devices 100.
  • local database 111 can include data, metadata, and other information about applications 112 and data 113 on a client device 100.
  • only selected portions of local database 111 are synchronized between multiple client devices. The selected portions correspond to applications 112 and data 113 which a user may have chosen to synchronize between multiple client devices 100.
  • the user data set can be chunked into chunked data portions and stored on the one or more contents servers 1170.
  • Metadata describing the user data set and metadata about the chunked portions of the user data set can be stored on the metadata server 1110 in a synchronization metadata database.
  • the metadata server 1110 and contents server 1170 can be managed using a synchronization management system 1160.
  • Synchronization system manager 1160 can provide software updates and patches to the metadata server 1110 to adapt the document for use with both version 1.0 and version 2.0 of the word processing software.
  • a predictor on a first client device 101 may utilize features that are not supported by a second client device 102, or have specifications that differ on the second client device 102.
  • client device 101 may have a predictor that utilizes a "location" feature based upon a GPS signal.
  • Client device 102 may only be able to detect location by cell tower approximation, area code, or IP address.
  • Synchronization system manager 1160 can modify the "location" feature in the predictor of client device 101 before synchronizing the predictor with client device 102.
  • synchronization system manager 1160 can remove a feature from a predictor that is not supported by a target client device before synchronizing predictors.
  • a modified predictor can be flagged or otherwise marked as to the change made to the predictor before synchronization.
  • Communication between one or more of the synchronization system manager 1160, metadata server 1110, and contents server(s) 1170 can be by sockets, messages, shared memory, an application program interface (API), interprocess communication, or other processing communication service.
  • API: application program interface
  • a client device 100 can include a desktop computer system, a laptop computer system such as client device 101, a tablet computer system such as client device 102, a cellular telephone such as client device 103, a personal digital assistant (PDA) including cellular-enabled PDAs, a set top box, an entertainment system, a gaming device, or other consumer electronic device.
  • PDA: personal digital assistant
  • Synchronization metadata can include a universally unique identifier (UUID) for a file or a directory that is unique across the client devices 100 of a user, and can further include ETAGS.
  • UUID: universally unique identifier
  • ETAGS can specify a specific version of the metadata for a document or a directory.
  • ETAGS can be generated by the synchronization system 1100 to manage the user data sets and resolve conflicts between differing generations of user data for a particular user data set. For example, an ETAG can be used to distinguish different generations of a word processing document of the resume of the user.
  • In FIG. 12 ("Software Stack"), an exemplary embodiment, applications can make calls to Services A or B using several Service APIs and to the Operating System (OS) using several OS APIs. Services A and B can make calls to the OS using several OS APIs.
  • OS: Operating System
  • Figure 13 is a block diagram of one embodiment of a computing system 1300.
  • the computing system illustrated in Figure 13 is intended to represent a range of computing systems (either wired or wireless) including, for example, desktop computer systems, laptop computer systems, cellular telephones, personal digital assistants (PDAs) including cellular-enabled PDAs, set top boxes, entertainment systems or other consumer electronic devices.
  • Alternative computing systems may include more, fewer and/or different components.
  • the computing system of Figure 13 may be used to provide the computing device and/or the server device.
  • Computing system 1300 includes bus 1305 or other communication device to communicate information, and processor 1310 coupled to bus 1305 that may process information.
  • computing system 1300 may include multiple processors and/or co-processors 1310.
  • Computing system 1300 further may include random access memory (RAM) or other dynamic storage device 1320 (referred to as main memory), coupled to bus 1305 and may store information and instructions that may be executed by processor(s) 1310.
  • Main memory 1320 may also be used to store temporary variables or other intermediate information during execution of instructions by processor 1310.
  • Computing system 1300 may also include read only memory (ROM) and/or other static storage device 1340 coupled to bus 1305 that may store static information and instructions for processor(s) 1310.
  • ROM: read only memory
  • Data storage device 1340 may be coupled to bus 1305 to store information and instructions.
  • Data storage device 1340 such as flash memory or a magnetic disk or optical disc and corresponding drive may be coupled to computing system 1300.
  • Computing system 1300 can also include an alphanumeric input device 1360, including alphanumeric and other keys, which may be coupled to bus 1305 to communicate information and command selections to processor(s) 1310.
  • cursor control 1370 such as a touchpad, a mouse, a trackball, or cursor direction keys to communicate direction information and command selections to processor(s) 1310 and to control cursor movement on display 1350.
  • Computing system 1300 further may include one or more network interface(s) 1380 to provide access to a network, such as a local area network.
  • Network interface(s) 1380 may include, for example, a wireless network interface having antenna 1385, which may represent one or more antenna(e).
  • Computing system 1300 can include multiple wireless network interfaces such as a combination of WiFi, Bluetooth® and cellular telephony interfaces.
  • Network interface(s) 1380 may also include, for example, a wired network interface to communicate with remote devices via network cable 1387, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.
  • network interface(s) 1380 may provide access to a local area network, for example, by conforming to IEEE 802.11b and/or IEEE 802.11g standards, and/or the wireless network interface may provide access to a personal area network, for example, by conforming to Bluetooth standards. Other wireless network interfaces and/or protocols can also be supported.
  • network interface(s) 1380 may provide wireless communications using, for example, Time Division Multiple Access (TDMA) protocols, Global System for Mobile Communications (GSM) protocols, Code Division Multiple Access (CDMA) protocols, and/or any other type of wireless communications protocol.
  • TDMA: Time Division Multiple Access
  • GSM: Global System for Mobile Communications
  • CDMA: Code Division Multiple Access
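
The anonymization and location fuzzing service 117 described in the list above might be sketched as follows. This is only an illustration under stated assumptions: the listing does not say how fuzzing is computed, so the grid-snapping approach, the density bands, the key names, and every function name here are hypothetical. The idea shown is that the coarseness of the reported location grows as population density falls, and that private fields are replaced by a coarse status such as "at home".

```python
import math

# Hypothetical density bands: (minimum people per square km, grid cell size in degrees).
# Denser areas can use smaller cells because more people share each cell.
DENSITY_BANDS = [
    (5000.0, 0.01),
    (500.0, 0.05),
    (0.0, 0.25),
]

# Hypothetical context keys that must never leave the device.
PRIVATE_KEYS = {"home_address", "cell_tower_id", "wifi_ip"}


def cell_size_for_density(people_per_km2):
    """Return the grid cell size (in degrees) for a population density."""
    for min_density, cell_deg in DENSITY_BANDS:
        if people_per_km2 >= min_density:
            return cell_deg
    return DENSITY_BANDS[-1][1]


def fuzz_location(lat, lon, people_per_km2):
    """Snap a coordinate to the center of its grid cell."""
    cell = cell_size_for_density(people_per_km2)
    fuzzed_lat = (math.floor(lat / cell) + 0.5) * cell
    fuzzed_lon = (math.floor(lon / cell) + 0.5) * cell
    return round(fuzzed_lat, 6), round(fuzzed_lon, 6)


def anonymize_context(context, is_at_home):
    """Drop private keys and substitute a coarse status for the location."""
    shared = {k: v for k, v in context.items() if k not in PRIVATE_KEYS}
    if is_at_home:
        shared["status"] = "at home"
    return shared
```

With this sketch, two users a block apart in a dense city report the same fuzzed coordinate, while a rural user is placed in a much larger cell.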
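
The ranking step performed by aggregator 152, which combines per-domain relevance scores with weights based on the query prefix context, could look roughly like the sketch below. The multiplicative combination of score and weight is an assumption, as are the `Completion` type, the `rank_completions` function, and the domain names; the listing only states that both quantities are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completion:
    text: str         # suggested query completion
    domain: str       # search domain that produced it (hypothetical names)
    relevance: float  # relevance score assigned by that domain


def rank_completions(per_domain, domain_weights, default_weight=1.0):
    """Merge completions from all domains and rank by weighted relevance.

    per_domain maps a domain name to its list of completions.
    domain_weights maps a domain name to a context-dependent weight
    (assumed to be multiplied with the relevance score).
    """
    merged = [c for completions in per_domain.values() for c in completions]
    return sorted(
        merged,
        key=lambda c: c.relevance * domain_weights.get(c.domain, default_weight),
        reverse=True,
    )
```

For example, if the prefix context suggests a location-oriented query, a higher weight on a maps domain can lift its completions above completions from a media domain that have a higher raw relevance score.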
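
The "football scores" example above, in which local learning system 116 infers from feedback history 115 that the user prefers soccer results, can be illustrated with a small sketch. It assumes that each result carries a tag and that feedback is reduced to an engaged or not-engaged flag; the smoothed engagement rate and the names `learn_tag_preferences` and `rerank` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def learn_tag_preferences(history):
    """Estimate a per-tag engagement rate from (tag, engaged) feedback pairs.

    Add-one smoothing keeps rarely seen tags close to a neutral 0.5.
    """
    shown = defaultdict(int)
    engaged = defaultdict(int)
    for tag, was_engaged in history:
        shown[tag] += 1
        if was_engaged:
            engaged[tag] += 1
    return {tag: (engaged[tag] + 1) / (shown[tag] + 2) for tag in shown}


def rerank(results, preferences, default=0.5):
    """Reorder (title, tag) results by learned preference.

    The sort is stable, so results with equal preference keep the
    order in which the remote search engine returned them.
    """
    return sorted(results, key=lambda r: preferences.get(r[1], default), reverse=True)
```

Because the learning and re-ranking run on the client, the raw feedback history never has to be sent to the remote search engine.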
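
The predictor adaptation performed by synchronization system manager 1160 before synchronizing between client devices (modifying or removing unsupported features and flagging the change) might be sketched as below. The `Predictor` structure, the feature names, and the substitution table are assumptions made for illustration; the listing does not define how a predictor is represented.

```python
from dataclasses import dataclass, field


@dataclass
class Predictor:
    name: str
    features: dict  # feature name -> learned weight (hypothetical representation)
    changes: list = field(default_factory=list)  # notes on pre-sync modifications


def adapt_for_target(predictor, supported, substitutes=None):
    """Return a copy of the predictor that only uses features the target supports.

    supported is the set of feature names available on the target device.
    substitutes optionally maps an unsupported feature to a replacement.
    Every modification is recorded so the adapted predictor is flagged.
    """
    substitutes = substitutes or {}
    adapted = Predictor(predictor.name, {}, list(predictor.changes))
    for feature, weight in predictor.features.items():
        if feature in supported:
            adapted.features[feature] = weight
        elif substitutes.get(feature) in supported:
            replacement = substitutes[feature]
            adapted.features[replacement] = weight
            adapted.changes.append(f"replaced {feature} with {replacement}")
        else:
            adapted.changes.append(f"removed {feature}")
    return adapted
```

In the GPS example, a predictor from client device 101 that uses a GPS-based location feature would be synchronized to client device 102 with a coarser location feature substituted, and the recorded change lets the receiving device know the predictor was altered.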
EP16714678.6A 2015-05-26 2016-03-18 Machine learning based search improvement Withdrawn EP3283977A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/721,945 US10885039B2 (en) 2014-05-30 2015-05-26 Machine learning based search improvement
PCT/US2016/023284 WO2016190947A1 (en) 2014-05-30 2016-03-18 Machine learning based search improvement

Publications (1)

Publication Number Publication Date
EP3283977A1 (de) 2018-02-21

Family

ID=60955415

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16714678.6A Withdrawn EP3283977A1 (de) Machine learning based search improvement

Country Status (1)

Country Link
EP (1) EP3283977A1 (de)

Similar Documents

Publication Publication Date Title
US10885039B2 (en) Machine learning based search improvement
US10311478B2 (en) Recommending content based on user profiles clustered by subscription data
US10223465B2 (en) Customizable, real time intelligence channel
US20220365939A1 (en) Methods and systems for client side search ranking improvements
US8856168B2 (en) Contextual application recommendations
US8055675B2 (en) System and method for context based query augmentation
US8495058B2 (en) Filtering social search results
US20160189225A1 (en) Generating Advertisements Using Functional Clusters
US9477720B1 (en) Social search endorsements
US20120023085A1 (en) Social graph search system
US9946799B2 (en) Federated search page construction based on machine learning
US8528053B2 (en) Disambiguating online identities
KR20120037383A (ko) Providing search results to a computing device
KR20160113741A (ko) Client-side modification of search results based on social network data
US20120295633A1 (en) Using user's social connection and information in web searching
US20120150833A1 (en) Using social-network data for identification and ranking of urls
US20170193059A1 (en) Searching For Applications Based On Application Usage
US9946794B2 (en) Accessing special purpose search systems
US9519683B1 (en) Inferring social affinity based on interactions with search results
US8825698B1 (en) Showing prominent users for information retrieval requests
US11941145B2 (en) User data system including user data fragments
US10445326B2 (en) Searching based on application usage
EP3283977A1 (de) Machine learning based search improvement
US20150046441A1 (en) Return of orthogonal dimensions in search to encourage user exploration

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20171114

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: APPLE INC.

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20191125

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20210614