RO131695A2

RO131695A2 - Customized prediction method concerning categories of internet pages to be visited by a user

Info

Publication number: RO131695A2
Application number: ROA201500607A
Authority: RO
Inventors: Costin-Gabriel Chiru; Constantin Ilaş
Original assignee: Iminent Technology S.R.L.
Priority date: 2015-08-21
Filing date: 2015-08-21
Publication date: 2017-02-28

Abstract

The invention relates to a customized prediction method, based on artificial intelligence algorithms, for the categories of Internet pages to be visited by a user. The method can be employed by Internet browsers and web service providers to suggest pages to the user, with the aim of accelerating and simplifying navigation, with benefits for navigation from personal computers and especially from mobile devices. According to the invention, the prediction method comprises the following stages: a first stage of pre-processing (3) in which, starting from the current navigation sequence of a user and from the user's navigation history, one or several acyclic vectors are created containing the categories of the visited pages, in visiting order; a second stage of processing (4), which predicts the category of the next Internet page to be accessed by the user by combining two different prediction techniques, one based on a recommendation process (12) and the other a statistical method based on comparing the access probabilities of the various page categories, and using three different factors for making predictions: the user's navigation history, the current moment in time and the user's current navigation pace; and a third stage in which the pre-processed information is added to the user's navigation history. The prediction method outputs the probable categories (O1) for the current moment, the probable categories (O2) for the current navigation pace, the probability (O3) of continuing the sequence and the predicted category (O4), obtained by combining the previous outputs (O1, O2, O3).

Description

The invention relates to a method, based on artificial intelligence algorithms, for predicting the probable categories of the web pages that a user is going to visit. The method can be used by Internet browsers and web service providers to offer page suggestions, in order to speed up and simplify navigation. The invention can also be used to provide complex web services based on information from multiple sources, obtained ahead of the moment it is requested, as well as to increase security by detecting an unauthorized user. The category of a web page means the domain to which the page belongs, for example: news, movies, music, science and technology, etc.

Various prediction methods used in different applications are known. Some of them are specialized for mobile devices, for tasks such as predicting the regions of documents that must be preloaded on mobile devices, or predicting the words a user intends to type starting from a few ambiguous characters or from the phone's functionalities, such as those presented in the patent documents Predicting and Retrieving Data for Preloading on Client Device, inventors Chang J, Yuan S, Wang B, Tsai E, Jiang A (US2013304798A1, published 14.11.2013), A Communication Terminal Having a Predictive Editor Application, inventor Haestrup Jan (EP1031914A2, published 30.08.2000), Messaging System, inventors Ezra D and Kaluaratchie D C (GB2470585A, published 01.12.2010).

Most methods, however, are either specialized for desktop devices or adapted for both types of devices, for tasks such as improving the results provided by search engines, predicting the next link to be visited by the user, improving web browsing by pre-rendering web pages or by building a server-side caching system, predicting multi-step navigation paths, predicting the websites a user wants to access using only the first characters typed into a browser, and evaluating the possible actions of users in IPTV systems, such as those presented in the patent documents Modelling, Detecting and Predicting User Behavior with Hidden Markov Models, inventor Heck Larry P. (US2009164395A1, published 25.06.2009), System and Method for Auto-sensed Search Help, inventor Watson Eric B. (US2006259861A1, published 16.11.2006), Predicting User Navigation Events Based on Chronological History Data, inventors Burkard T, Hamon D and Jain A (US8655819B1, published 18.02.2014), Accelerating User Interfaces by Predicting User Actions, inventors Fredricksen E R., Buchheit P and Rennie J G. (WO2006085314A2, published 30.06.2004), Access to Network Content, inventors Burkard T and Jain A (US2014095966A1, published 03.04.2014), Predicting User Navigation Events, inventors Hamon D, Burkard T and Jain A (US8566696B1, published 22.10.2013), Predicting User Navigation Events in an Internet Browser, inventor Hamon D (US8744988B1, published 03.06.2014), A Novel Book-like Internet Browser for Electronic Information, inventors Seet Chern H and Ho Seng B (WO0206917A2, published 24.01.2002), Dynamic Prefetching Method and System for Metadata, inventors Cetin G, Gumus U and Kupusoglu O (EP2194471A1, published 09.06.2010), Web Site Connecting System Using Keyword and Method Thereby, inventor Park Ho-nam (WO2007027002A1, published 08.03.2007), Enabling Predictive Web Browsing, inventors White C A. and Jones C D. (US2012137201A1, published 31.05.2012), User Intention Modeling for Web Navigation, inventors Zheng C, Xiaoming S and Liu W (US2003212760A1, published 13.11.2003), Dynamic Page Generation Acceleration Using Component-Level Caching, inventor Anindya D (US6622168B1, published 16.11.2003), Access to Network Content, inventors Burkard T, Jain A and Bentzel C (US2012324043A1, published 20.12.2012), Adaptive Prefetching for Computer Network and Web Browsing with a Graphic User Interface, inventors Jiang Z and Kleinrock L (US6385641B1, published 07.05.2002), Access to Network Content, inventor Hamon Dominic (US8954524B1, published 10.02.2015), Web Browser Graphics Management, inventors Eves D and Timms A (WO0019336A1, published 06.04.2000), Navigating Organizational Structures, inventors Forman G H and Suermondt H J (US2003139901A1, published 24.07.2003), Accelerating User Interfaces by Predicting User Actions, inventors Fredricksen E R., Buchheit P and Rennie J G. (WO2006012144A2, published 02.02.2006).

These methods have many disadvantages. One disadvantage is that, in all cases, the prediction relies on simplistic methods that use little information (bigrams or at most n-grams of page visits) and simplistic assumptions (Naive Bayes), from which only limited information can be derived (generally only the prediction of the next page, or the prediction of a sequence of pages but with progressively lower accuracy).

Most of the above methods (with the exception of US2003212760A1) treat only the current page as the browsing history, without referring to the earlier browsing history, and such a very short history does not allow good accuracy to be obtained. Another disadvantage is that these methods use the information about the current page to predict only the next page that will be accessed, and not a navigation path. Most of the time, the prediction is based on parsing the current page to extract the existing links to other web pages, and the prediction considers only these links as possible continuations of navigation; extracting the links from the page requires more processing time, which is a further disadvantage. The exceptions are US2003212760A1 and US6622168B1. Some of the methods mentioned use sensitive information about customers, which raises trust issues on their part and represents another weakness.

In addition, the method used in US2003212760A1, which appears as an exception in the list above, has the disadvantage that, in order to work, it needs a long list of information for which the history is kept (the URL, its access time, the time spent on each page, how a given URL was reached: by clicking a link, by searching with a search engine or by typing the URL directly into the browser, which search engine was used, the search terms, the results obtained, and which of the results returned by the search engine was clicked). This means that, when implemented on a mobile device, either all this extensive information is saved on the mobile device (occupying memory) and processing is then done locally (with an impact on battery life and device performance), or, if the information is transmitted to a server, a lot of data traffic is generated, with a negative impact on battery life and device performance, including browsing speed.

Another major disadvantage is that, in many cases, the prediction for a user is made on the basis of the previous actions of other users, so it is not individualized for each user but instead reflects the actions of the majority.

The technical problem solved by the invention is the acceleration and simplification of navigation by offering, at the right moment in time, relevant and easily accessible suggestions. This is beneficial regardless of the user and the platform used, but it is especially important when mobile devices (phones, tablets) are used. In that case, pages containing the desired information can be reached by using search engines, but this involves manually entering several terms and possibly repeating the operation a few times until the user finds what is being looked for. With the proposed method this step is eliminated, since the user can directly receive suggestions of pages belonging to the predicted category. In addition, because of the way it is designed, the method can easily be implemented in an optimized manner both on mobile devices (minimizing power consumption and adding only a small volume of data traffic) and on personal computers.

The prediction method, according to the invention, solves the stated technical problem and removes the disadvantages mentioned, in that it provides, with increased probability and in an efficient manner, the navigation intention of a user, represented by the categories of the web pages that the user is going to visit. It is intended for use on personal computers and mobile devices (phones, tablets), in order to speed up and simplify navigation by offering, at the right moment in time, relevant suggestions of web pages that are easy for the user to access. In this way, it is possible to preload a series of pages from the predicted category and to offer them as options for continuing the browsing session.

The prediction method, according to the invention, consists of the following phases:

a. Pre-processing, in which, starting from the user's current navigation sequence and from the user's navigation history, one or more acyclic vectors are created containing the categories of the visited pages, in the order of the visits;

b. Processing, in which the category of the next web page to be accessed by the user is predicted. This prediction is made by combining two different prediction techniques, one based on a recommendation process and the other a statistical method based on comparing the probabilities of the different page categories being accessed, and by using three different factors: the user's navigation history, the current moment in time and the user's current navigation pace;

c. Adding the pre-processed information to the user's navigation history.

The prediction method presented, according to the invention, has the following advantages:

- It has high accuracy, because:

o It is applied to each user individually, being based on processing the navigation information specific to that user and not on averaging information coming from a large group of users;

o It combines two different prediction techniques: one based on a recommendation process, and the other a statistical method based on comparing the probabilities of the different page categories being accessed;

o It combines three different factors to make the prediction: the navigation history, the current moment in time and the current navigation pace.

- It uses only non-intrusive, low-sensitivity information, namely the time of navigation, the web page accessed and its category, and not information about gender, age, income, hobbies, location, etc.

- It requires as input only the time of navigation, the URL of the page accessed and its category, so it can easily be implemented optimally on mobile devices: in that case the method is implemented on a server, which receives this information, without a significant impact on data traffic, battery life or the performance of the mobile device.

The invention is presented in connection with the following figures, which represent:

- Fig. 1, the main stages of the process covered by this patent;

- Fig. 2, the flow diagram of the processing involved in the process covered by this patent;

- Fig. 3, the flow diagram for determining the probable categories as a function of different moments in time;

- Fig. 4, the flow diagram for determining the probable categories at the current moment in time;

- Fig. 5, the flow diagram for determining the probable categories as a function of the navigation pace;

- Fig. 6, the flow diagram for determining the mean and deviation of the user's navigation pace;

- Fig. 7, the flow diagram for determining the probable categories according to the current navigation pace;

- Fig. 8, the flow diagram for determining the probabilities of continuing a sequence that started with different categories.

Fig. 1 shows the architecture of the proposed prediction method, according to the invention. The processing module 4 receives, for each individual user, the current navigation sequence 1, which contains the pages visited, the times of the visits and the categories of these pages. The times of the visits help us evaluate two of the factors mentioned above: the moment in time at which navigation takes place and the user's navigation pace, while the pages visited give us the user's history, the third factor mentioned.
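The input described above can be pictured with a small sketch. The `Visit` record, its field names and the two helper functions are illustrative assumptions, not names taken from the patent; the sketch only shows how the time-of-day factor and the navigation-pace factor can both be derived from the visit timestamps.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Visit:
    """One entry of the current navigation sequence (hypothetical layout)."""
    url: str
    category: str
    timestamp: datetime


def hour_of_day(visit: Visit) -> int:
    """Time factor: the hour at which the page was visited (assumed granularity)."""
    return visit.timestamp.hour


def navigation_pace(sequence: list[Visit]) -> list[float]:
    """Pace factor: seconds elapsed between consecutive visits."""
    return [
        (later.timestamp - earlier.timestamp).total_seconds()
        for earlier, later in zip(sequence, sequence[1:])
    ]


sequence = [
    Visit("https://example.com/a", "news", datetime(2015, 8, 21, 9, 0, 0)),
    Visit("https://example.com/b", "music", datetime(2015, 8, 21, 9, 0, 30)),
    Visit("https://example.com/c", "movies", datetime(2015, 8, 21, 9, 2, 0)),
]
```

With this sample sequence, `navigation_pace(sequence)` yields the two gaps between the three visits, and `hour_of_day` gives the hour used for the time factor.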

Because the web pages visited by users are extremely diverse, a factor that hinders learning, we reduce this diversity by replacing the web pages with the categories to which they belong, which is why this information is also retained for processing.

In addition, a typical navigation sequence may contain "cycles": a user may return to a topic (page category) viewed earlier in the same sequence. Learning, however, requires acyclic sequences (we try to learn the index of a given category within a sequence, and that index can therefore have only a single value), so the sequences are pre-processed 3 to eliminate cyclicity.
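One way to picture the elimination of cycles is sketched below. The text does not state the exact splitting rule, so this sketch assumes the simplest one: a new vector is started whenever a category that already appears in the current vector is visited again. The function name `split_acyclic` is hypothetical.

```python
def split_acyclic(categories: list[str]) -> list[list[str]]:
    """Split a category sequence into vectors in which no category repeats.

    Assumed rule (not specified in the source): cut the sequence just
    before any category that is already present in the current vector.
    """
    vectors: list[list[str]] = []
    current: list[str] = []
    seen: set[str] = set()
    for category in categories:
        if category in seen:
            # A cycle was detected: close the current vector and start a new one.
            vectors.append(current)
            current, seen = [], set()
        current.append(category)
        seen.add(category)
    if current:
        vectors.append(current)
    return vectors


vectors = split_acyclic(["news", "music", "news", "movies", "music"])
```

Under this assumed rule, each resulting vector gives every category at most one position, which is the property the learning step needs.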

In addition to the pre-processed information about the current navigation sequence, the processing module 4 also receives the user's history 2, which contains previous navigation sequences.

Based on this information, the method identifies the categories likely to be visited by the user given the current moment in time O1, the categories most likely to be visited according to the user's current navigation pace O2, the categories most likely to be visited according to the navigation history O3, and the prediction of the category of the next page to be visited by the user, taking into account all the factors considered in the analysis O4.

Next, a check is made as to whether the navigation sequence has ended 5. If the end of the navigation sequence has not been detected, the whole process is resumed in order to continue the analysis and prediction 10, 11, 12, 13. Otherwise, the sequence must be saved in the user's history. For this, the space needed to store the current sequence is first determined 6. Because the memory assigned to this process may be limited, before saving the sequence a check is made as to whether sufficient space is available 7.

If sufficient space is available, the sequence is added to the history 9 and the method ends its operation. If there is not enough space, a number of entries already existing in the history are first deleted 8 to create the necessary space, and the new sequence is then stored in the newly freed space 9. Deleting sequences serves, on the one hand, to keep the history 2 within reasonable limits in terms of the space it occupies and, on the other hand, to provide predictions that are as close as possible to the present behavior of the current user.
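The save procedure of steps 6 to 9 can be sketched as follows. The text does not say which entries are deleted or how space is measured; this sketch assumes that the oldest sequences are removed first (consistent with the stated goal of tracking present behavior) and that space is counted in stored category entries. The class name `BoundedHistory` is hypothetical.

```python
from collections import deque


class BoundedHistory:
    """User history with a fixed budget of stored category entries (assumed unit)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.sequences: deque[list[str]] = deque()
        self.used = 0

    def add(self, vector: list[str]) -> None:
        needed = len(vector)                        # step 6: space required
        if needed > self.capacity:
            raise ValueError("sequence exceeds the whole history budget")
        while self.capacity - self.used < needed:   # step 7: enough space?
            oldest = self.sequences.popleft()       # step 8: delete old entries
            self.used -= len(oldest)
        self.sequences.append(list(vector))         # step 9: store the sequence
        self.used += needed


history = BoundedHistory(capacity=5)
history.add(["news", "music"])
history.add(["movies", "news"])
history.add(["music", "news", "movies"])  # evicts the oldest sequence first
```

Evicting from the front of a `deque` keeps both deletion and insertion cheap, which matters if the history is maintained on a resource-constrained device.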

Given (for each user) the current navigation sequences 1, the processing 4 uses an independent vector and an independent matrix for each individual user (the vector representing the current navigation sequence 1, and the matrix the user's history 2). The two dimensions of the matrix containing the user's history 2 hold the following information:

- the sequences of web pages visited by that user in the past, and

- the possible categories for the different web pages.

The values stored in the matrix are the positions, within each sequence, of the visited page categories. In other words, the matrix captures the order in which the user visited the various categories of pages. Such a matrix can hold only one value per tuple (here, <sequence, category in the sequence>), whereas a typical sequence may contain "cycles": a user may return to a topic (page category) already viewed earlier in the same sequence. Each sequence in the matrix must therefore be acyclic. As a result, the initial navigation sequences (1) must be pre-processed so that all the information associated with them can be saved, even when cycles exist. The current navigation sequence (1) is thus pre-processed and transformed into one or more current navigation vectors. If several vectors are obtained, cycles were identified, and only the last vector is kept for processing (4). The other vectors are saved (9) in the matrix representing the user's history (2) if enough storage space is available (7); if not, a number of columns are first removed (8) from the history matrix (2) to make room for saving the current sequence (9). Once processing (4) of the current navigation vector is complete, the same happens to this vector: it is added (9) to the history matrix (2) if possible (7); otherwise, some columns are first removed (8) from the history (2) and the vector is then saved (9).
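
To make the one-value-per-tuple constraint concrete, the sketch below builds a single history-matrix column from an acyclic vector. The function name `to_column` and the use of 0 for a category that was not visited are assumptions of this illustration, not details given in the description.

```python
def to_column(vector, categories):
    """Return one history-matrix column for an acyclic vector.

    Entry k holds the 1-based position of categories[k] in the vector.
    Using 0 for "not visited" is an assumption of this sketch.
    """
    if len(set(vector)) != len(vector):
        raise ValueError("vector must be acyclic (no repeated category)")
    position = {cat: idx + 1 for idx, cat in enumerate(vector)}
    return [position.get(cat, 0) for cat in categories]


categories = ["news", "films", "music", "science"]
column = to_column(["music", "news", "science"], categories)
print(column)  # [2, 0, 1, 3]
```

A repeated category would need two positions in the same cell, which is why the function rejects cyclic input.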


The pre-processing algorithm (3) transforms, by known methods, the current navigation sequence into a series of acyclic vectors containing the categories of the visited pages (obtained from an existing service). Three fictitious categories (start, continue, stop) are also added in order to delimit the sub-sequences.
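
One possible way to perform this split is sketched below. The description refers only to "known methods", so the splitting rule (open a new vector as soon as a category repeats) and the placement of the start, continue and stop markers are assumptions of this illustration; `split_acyclic` is a hypothetical name.

```python
def split_acyclic(sequence):
    """Split a category sequence into acyclic vectors.

    Assumed rule: a new vector is opened whenever the next category
    already occurs in the vector being built. Assumed markers: the first
    vector opens with "start", later ones with "continue", and only the
    last vector closes with "stop".
    """
    vectors, current = [], ["start"]
    for category in sequence:
        if category in current:        # cycle detected: close this vector
            vectors.append(current)
            current = ["continue"]
        current.append(category)
    current.append("stop")
    vectors.append(current)
    return vectors


vectors = split_acyclic(["news", "music", "news", "films", "music"])
print(vectors)
# [['start', 'news', 'music'], ['continue', 'news', 'films', 'music', 'stop']]
```

Under these assumptions, the last vector is the one kept for processing (4), and the earlier ones are the vectors saved to the history.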

Pre-processing (3) yields the vector to be processed. The details of the processing stage (4) are shown in Fig. 2. Processing (4) involves, on the one hand, the information about the current navigation vector and, on the other, the information about the user's navigation history (2). The navigation history (2) is used to determine the categories likely to be visited by the user at various moments in time (14), and the categories likely to be visited according to the user's navigation rhythm (15). Once obtained, this information is combined with the current navigation vector to determine the probable categories at the current moment in time (10) and the probable categories according to the current navigation rhythm (11). The current navigation vector and the user's navigation history (2) are also used in a module that determines the probability of continuing the current sequence (12).

Once the results have been obtained from the modules that determine the probable categories at the current moment in time (10, output O1), the probable categories according to the current navigation rhythm (11, output O2), and the probability of continuing the sequence (12, output O3), they are combined to obtain a single set of values for each category (16). Finally, based on these values, the category of the next page to be visited by the user (O4) is predicted in the category prediction module (13).

Fig. 3 details the module for determining the probable categories according to different moments in time (14). First, a matrix (M) is built with the page categories on rows and the time slots on columns (17), and every entry of the matrix is initialized to 0 (18). In this context, a time slot may have various granularities, depending on the amount of data available and on the prediction needs. The user's history (2) is then analyzed: for each entry (19), the moment in time at which the respective category was accessed is determined (20), and that moment is assigned to one of the defined time slots (21). Based on this assignment, the matrix element $M_{ij}$ corresponding to category i and to the time slot j to which the access belongs is incremented (22). Once the user's entire navigation history (2) has been analyzed, the probability of each category i for each time slot j, p(i|j), can be determined by dividing $M_{ij}$ by $\sum_i M_{ij}$ (23), as can the probability of each time slot j, p(j), by dividing $\sum_i M_{ij}$ by $\sum_{i,j} M_{ij}$ (24).

$$p(\text{category } i \mid \text{slot } j) = \frac{M_{ij}}{\sum_{i} M_{ij}}, \qquad p(\text{slot } j) = \frac{\sum_{i} M_{ij}}{\sum_{i,j} M_{ij}}$$
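
The counting and normalization described for steps 17 to 24 can be sketched as follows. The flat list of (category, timestamp) entries, the four 6-hour slots and the names `slot_of` and `slot_probabilities` are assumptions made for illustration; the description leaves the slot granularity open.

```python
from datetime import datetime


def slot_of(timestamp, hours_per_slot=6):
    """Map a timestamp to a time-slot index (assumed: 6-hour slots of the day)."""
    return timestamp.hour // hours_per_slot


def slot_probabilities(history, categories, n_slots=4):
    """Return (p_cat_given_slot, p_slot) from (category, timestamp) entries."""
    # Steps 17-18: categories on rows, time slots on columns, all zeros.
    M = [[0] * n_slots for _ in categories]
    row = {cat: i for i, cat in enumerate(categories)}
    for category, timestamp in history:               # steps 19-22
        M[row[category]][slot_of(timestamp)] += 1
    col_sum = [sum(M[i][j] for i in range(len(categories))) for j in range(n_slots)]
    total = sum(col_sum)
    # Step 23: p(i|j) = M_ij / sum_i M_ij (0.0 for slots with no data).
    p_cat_given_slot = [
        [M[i][j] / col_sum[j] if col_sum[j] else 0.0 for j in range(n_slots)]
        for i in range(len(categories))
    ]
    # Step 24: p(j) = sum_i M_ij / sum_ij M_ij.
    p_slot = [c / total if total else 0.0 for c in col_sum]
    return p_cat_given_slot, p_slot


history = [
    ("news", datetime(2015, 8, 21, 8, 5)),
    ("news", datetime(2015, 8, 21, 9, 30)),
    ("music", datetime(2015, 8, 21, 10, 0)),
    ("films", datetime(2015, 8, 21, 21, 15)),
]
p_cat, p_slot = slot_probabilities(history, ["news", "music", "films"])
print(p_slot)  # [0.0, 0.75, 0.0, 0.25]
```

Returning 0.0 for slots without any recorded access is a choice of this sketch; the description does not say how empty slots are handled.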

Based on these probabilities, the categories most likely to be accessed at the current moment in time (O1) are then determined in the module for determining the probable categories at the current moment in time (10). The diagram of this process is shown in Fig. 4. Once the probabilities of each category i for each time slot j have been obtained (23), these values must be filtered so that only the values specific to the current slot remain. To this end, the moment in time of the last accessed category is first determined (25) from the current navigation vector, and the current sequence is then assigned to one of the defined time slots (26). With the current time slot identified, only those probabilities that belong to the current time slot are extracted (10) from the probabilities identified for each category i and each time slot j (23). These probabilities are provided as output O1 of the method.
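
The filtering step amounts to selecting one column of the p(i|j) table. A minimal sketch follows, with a hypothetical probability table and a hypothetical function name; sorting the result by probability is an added convenience, not something the description requires.

```python
def probable_categories_now(p_cat_given_slot, categories, current_slot):
    """Return output O1: p(category | current slot), most probable first."""
    column = {
        cat: p_cat_given_slot[i][current_slot] for i, cat in enumerate(categories)
    }
    return dict(sorted(column.items(), key=lambda kv: kv[1], reverse=True))


categories = ["news", "music", "films"]
# Hypothetical p(i|j) table: rows are categories, columns are 4 time slots.
p_cat_given_slot = [
    [0.5, 0.2, 0.1, 0.3],
    [0.3, 0.2, 0.6, 0.3],
    [0.2, 0.6, 0.3, 0.4],
]
current_slot = 2  # slot of the last accessed category (steps 25-26)
o1 = probable_categories_now(p_cat_given_slot, categories, current_slot)
print(o1)  # {'music': 0.6, 'films': 0.3, 'news': 0.1}
```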

Fig. 5 details the block that determines the categories likely to be visited according to the navigation rhythm (15). First, a matrix is built with the available page categories on rows and the identified navigation rhythms on columns. As an example, 4 different navigation rhythms were considered, using the mean and the standard deviation of the rhythm of the user's previous sequences, computed from the user's history (2). Depending on needs, however, any number of such rhythms can be defined, also using metrics other than the two suggested above. The 4 navigation rhythms considered are:

1. High mean time between categories, high deviation -> navigation with pauses (Type 1)

2. High mean time between categories, low deviation -> slow navigation (Type 2)

3. Low mean time between categories, high deviation -> chaotic navigation (Type 3)

4. Low mean time between categories, low deviation -> accelerated navigation (Type 4)
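
The four types map directly onto two threshold comparisons, as the sketch below shows. The function name and the choice of a strict "greater than" comparison are assumptions; the description only distinguishes high from low values.

```python
def rhythm_type(mean, deviation, mean_threshold, deviation_threshold):
    """Return the navigation rhythm type (1-4) for a mean/deviation pair.

    Treating values strictly above a threshold as "high" is an assumption.
    """
    high_mean = mean > mean_threshold
    high_dev = deviation > deviation_threshold
    if high_mean:
        return 1 if high_dev else 2   # with pauses / slow
    return 3 if high_dev else 4       # chaotic / accelerated


print(rhythm_type(40.0, 10.0, 15.0, 3.0))  # 1
print(rhythm_type(5.0, 1.0, 15.0, 3.0))    # 4
```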

Once the matrix with page categories on rows and the 4 navigation rhythms on columns has been built, it is initialized to 0 (28). Next, the mean and deviation of the user's navigation rhythm are determined for each entry of each sequence in the navigation history (29); this module is detailed in Fig. 6. Then, for each entry in the navigation history (30), the method determines whether the entry is the first in its sequence (31): the navigation rhythm cannot be determined for the first entry, because it has no previous entry and so no mean or deviation can be computed for it. If it is the first entry (31, Yes), the method moves on to the next entry without computing anything (32). Otherwise (31, No), the mean and deviation of the user's current navigation rhythm, already computed at step 29, are extracted (33). These values are then compared with 2 thresholds (one for the mean, 34, and one for the deviation, 35, 36). The values of the 2 thresholds can be determined heuristically, chosen pseudo-randomly, or determined from the user's history (2).

If the values are chosen pseudo-randomly, values recognized as valid for different navigation rhythms can be used (for example, a mean time spent on a web page, that is, the delay between 2 consecutive pages, of 15 seconds with a deviation of 3 seconds). The advantage of this method is that it requires little knowledge about the user or the visited pages and can be applied as soon as a new user has been registered. However, such a method cannot work efficiently for everyone and for all kinds of web pages; it must be adapted to the different situations encountered in practice.

If the thresholds are determined from the user's history (2), the mean and deviation of the navigation delay can be computed and then used as thresholds. The advantage of this method is that the values are very well adapted to the user.
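
A history-based choice of thresholds could look like the sketch below. It assumes that each stored sequence is available as a list of access times in seconds and that the sample standard deviation is the intended deviation; `thresholds_from_history` is a hypothetical name.

```python
from statistics import mean, stdev


def thresholds_from_history(sequences):
    """Return (mean, standard deviation) of delays between consecutive visits.

    `sequences` is a list of per-sequence access-time lists, in seconds.
    Delays are never computed across two different sequences.
    """
    delays = [
        later - earlier
        for times in sequences
        for earlier, later in zip(times, times[1:])
    ]
    if len(delays) < 2:
        raise ValueError("need at least two delays to estimate a deviation")
    return mean(delays), stdev(delays)


mean_threshold, deviation_threshold = thresholds_from_history(
    [[0, 10, 30], [100, 112, 130]]
)
print(mean_threshold, deviation_threshold)
```

For a newly registered user the history is too short for this estimate, which is the situation where the fixed values from the previous paragraph apply.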

Depending on the values of the 2 thresholds and on the comparisons of the mean (34) and deviation (35, 36) with them, the current navigation rhythm is assigned to one of the 4 types, and the matrix value corresponding to the current category and the current navigation rhythm is incremented (37, 38, 39, 40). Next, the method checks whether all entries have been processed (41). If entries remain (41, No), it moves to the next entry (32) and the process resumes. Otherwise (41, Yes), the probability of each category i for each navigation rhythm j, p(i|j), can be computed by dividing $M_{ij}$ by $\sum_i M_{ij}$ (42). The probability of each navigation rhythm j, p(j), can also be determined here by dividing $\sum_i M_{ij}$ by $\sum_{i,j} M_{ij}$.

$$p(\text{category } i \mid \text{navigation rhythm } j) = \frac{M_{ij}}{\sum_{i} M_{ij}}, \qquad p(\text{navigation rhythm } j) = \frac{\sum_{i} M_{ij}}{\sum_{i,j} M_{ij}}$$

As already mentioned, Fig. 6 describes how the mean and deviation of the user's current navigation rhythm are calculated (29). These calculations can consider the entire navigation sequence of the user or only its most recent part, since the user's navigation rhythm may change during a sequence depending on various factors, such as the type of information sought or the boredom that may set in after a certain time. Therefore, before the calculation starts, the amount of earlier navigation information used for it is established, that is, the window from which information about the user's previous visits in the current sequence is extracted. This window can be defined either as the user's entire current sequence (ignoring rhythm changes that may occur inside the sequence) or as the most recent part of that sequence. The latter can in turn be defined as a time interval (for example, visits in the last 30 seconds, the last 5 minutes or the last hour before the current moment), as a number of visited page categories (for example, the last 3, 5 or 10 categories visited in the current sequence), or as a combination of the two (a number of categories visited within a given time interval: for example, the last 10 categories visited in the last 5 minutes). Fig. 6 details the variant based on a fixed number of visited page categories (the window size, window_size), in order to highlight the problems that such an approach raises; the other variants are simple adaptations of it.

Thus, for the first sequence j (= 0) in the navigation history (43), once the window size (window_size) has been set, two vectors with window_size components each are created: t and delta (44). The vector t holds the access times of the different page categories, while the vector delta holds the time difference between the current access and the previous one. In the same step, two further vectors, m and dev, are built, each with as many components as the sequence has entries. The vector m holds the mean of the time differences preceding each entry of sequence j (taking into account the window size and the entries that fall inside the window), and dev holds the standard deviations of the time differences computed for the same entries. The vector dev is calculated from the vectors m and delta according to the formula:

$$dev_i = \sqrt{\frac{\sum_{k} (delta_k - m_i)^2}{window\_size - 1}}$$

where k represents the indices of the entries that belong to the chosen window and ranges from 0 to window_size - 1.

Besides these vectors, the algorithm for determining the mean and deviation of the user's current navigation rhythm (29) uses three variables: s, the sum of the time differences in the window, which is used to compute their mean, $m_i = s / window\_size$; i, the index of the element within the sequence, used to move from one entry to the next; and j, the index of the current sequence in the user's history (2).

After the four vectors (m, dev, t and delta) have been built (44), they are initialized to 0, as are s and i (45) (j was already initialized at step 43). Next, for each entry i (46), the moment at which the respective category was accessed is determined and stored in t[i % window_size] (47), where the % operator returns the remainder of the division. In other words, as long as i is smaller than the window size, the time moments are stored in the order in which they were read. Once i reaches the window size (i = window_size), the oldest value in the window must be discarded, because that entry is no longer part of the window: window indices range from 0 to window_size - 1, so when the entry with index window_size is reached, the most recent window_size entries are the sequence entries with indices 1 to window_size. The information for the current entry must therefore overwrite the data of the oldest entry in the window (the entry with index 0). The modulo operator (%) achieves this: the remainder of dividing window_size (the current index in the sequence) by window_size (the window size) is 0, so the information for the current entry overwrites the information already stored at position 0 of the vector t.
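
The overwrite behavior of t[i % window_size] is easy to see on a small example. The timestamps below are hypothetical values chosen for illustration.

```python
window_size = 3
t = [0] * window_size                      # access times inside the window

access_times = [100, 104, 109, 120, 126]   # hypothetical timestamps, in seconds
for i, moment in enumerate(access_times):
    t[i % window_size] = moment            # step 47: overwrite the oldest slot
    print(i, t)
# 0 [100, 0, 0]
# 1 [100, 104, 0]
# 2 [100, 104, 109]
# 3 [120, 104, 109]
# 4 [120, 126, 109]
```

At i = 3 the entry with index 0 leaves the window and its slot is reused, so the vector never grows beyond window_size components.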

Once the time moment of the current entry has been stored, the method checks whether this entry is the first in the sequence (48), because the required metrics cannot be determined for the first entry. If it is the first entry (48, Yes), the method moves to the next entry by incrementing the index i (53). Otherwise (48, No), it checks whether i is greater than window_size (49). This check serves to optimize the computation of s (the sum of the time differences in the window). If i is greater than window_size (49, Yes), some already stored values must be updated with the new values computed for the current entry. One of them is s: the oldest delay must be subtracted from it and the new delay computed for the current entry added instead. The old delay is removed using s = s - delta[i % window_size] (50). The new delay must then be added, but it first has to be computed and stored, using delta[i % window_size] = t[i % window_size] - t[(i - 1) % window_size] (51). The new delay is then added to the sum of the time differences in the window: s = s + delta[i % window_size] (51). At this point the mean and deviation of the current navigation rhythm can be computed using the formulas above. These values are stored at position i in the vectors m and dev.

If, instead, the value of i is not greater than window_size (49 No), no saved values need to be updated; the new information is simply added to the window. Step 50 is therefore skipped and processing goes directly to step 51, where the new delay is computed and saved as the time difference between the current entry and the previous one, after which s is computed and, from it, the mean and the deviation of the navigation pace at the current moment. Given that the values of i are smaller than window_size, the modulo operator (%) has no effect on the operations performed in step 51, so the cases where i is greater than window_size and where it is smaller can be handled uniformly and efficiently.
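The sliding-window update of steps 48 to 51 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name navigation_pace is hypothetical, and because the formulas for the mean and the deviation are given elsewhere in the description, the arithmetic mean and the population standard deviation of the delays in the window are assumed here. The sketch also assumes window_size is at least 2, since with a single slot the current and previous timestamps would share the same buffer position.

```python
from math import sqrt

def navigation_pace(timestamps, window_size):
    """Return per-entry mean and deviation of the delays between visits."""
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    t = [0.0] * window_size      # circular buffer of timestamps
    delta = [0.0] * window_size  # circular buffer of delays
    s = 0.0                      # running sum of the delays in the window
    m, dev = [], []
    for i, stamp in enumerate(timestamps):
        t[i % window_size] = stamp
        if i == 0:
            # First entry (48 Yes): no previous entry, so no metrics.
            m.append(None)
            dev.append(None)
            continue
        if i > window_size:
            # Step 50: remove the oldest delay from the running sum.
            s -= delta[i % window_size]
        # Step 51: compute and store the new delay, then add it to the sum.
        delta[i % window_size] = t[i % window_size] - t[(i - 1) % window_size]
        s += delta[i % window_size]
        # Assumed formulas: arithmetic mean and population standard deviation.
        window = delta[1:i + 1] if i < window_size else delta
        mean = s / len(window)
        variance = sum((d - mean) ** 2 for d in window) / len(window)
        m.append(mean)
        dev.append(sqrt(variance))
    return m, dev

m, dev = navigation_pace([0, 10, 30, 60, 100], window_size=2)
```

Keeping s as a running sum means each entry costs one subtraction and one addition for the mean, regardless of the window size, which is the optimization the check in step 49 is meant to enable.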

Once the current mean and deviation of the navigation pace have been computed, processing can move on to the next entry (52) and repeat. To this end, it is checked whether the current entry was the last one in the current sequence (53). If not (53 No), the process resumes from step 46.

Otherwise (53 Yes), processing of the current sequence is complete and an attempt is made to move on to the next sequence to be processed. In the next step (54) it is checked whether the sequence whose processing has just ended was the last one in the user's history (2). If so (54 Yes), processing ends, since the entire history of the user (2) has been processed. Otherwise, processing moves to the next sequence (55) and resumes from step 44.

Fig. 7 shows how the probabilities of each page category are determined so as to take the current navigation pace into account (O2). Once the probabilities of each category i have been determined for each navigation pace (42), they are filtered according to the current navigation pace (56), so that only the probabilities matching this pace are returned (11, O2). The current navigation pace (56) is determined with the help of the module that computes the mean and the deviation of the user's navigation pace (29), with the observation that in this case there is only a single sequence (so j = 0 always) and only the last values of the mean and the deviation are of interest.

Fig. 8 describes how the categories most likely to be visited according to the navigation history (O3) are computed within the module that determines the probability of continuing the sequence (12), using the user's navigation history (2). As already mentioned, the information about the user's navigation history is stored in a matrix so as to capture the order in which the user visited the various page categories. Consequently, the values stored in this matrix are the indices, within each sequence, of the page categories visited. Using this matrix (2) and the partial navigation sequence (1), the probability of each page category being visited next is determined by means of a recommendation process. In other words, a recommendation process is used which, based on a partial sequence available at a given moment and on the known history of the user (the user's previous navigation sequences), learns the order in which the user visits different page categories and then suggests possible continuations for new (partial) navigation sequences.

Recommendation processes rely on a user-object matrix, which usually stores the ratings given by each user to the objects used. This matrix is usually sparse, since the large number of objects and users makes it practically impossible to have a rating for every <user, object> tuple (which would require every user to have used and rated every object in the matrix). The classic variant of such a recommendation process has been adapted to the needs and the information available in the present invention. Instead of several users there are several navigation sequences, and the identified page categories play the role of objects; in place of the classic <user, object> tuples there are <sequence, category in the sequence> tuples. In other words, the matrix of the user's history (2) is used, whose two dimensions are the sequences of web pages visited by the user in the past and the possible categories of the web pages. With this matrix, what the recommendation process learns differs from the classic version (where it predicts a rating for a given object): it learns the order in which different page categories are visited, and can subsequently be used to suggest the order in which the categories will be visited in new, unfinished sequences.
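The <sequence, category> matrix described above can be illustrated with a short sketch. The function name build_history_matrix, the two sample sequences and the use of None for a category that does not occur in a sequence are assumptions made for illustration; indices are taken as 0-based with Start at position 0.

```python
CATEGORIES = ["Start", "Internet_and_Telecom", "Games", "Adult",
              "Finance", "Other", "Continue", "Stop"]

def build_history_matrix(sequences, categories=CATEGORIES):
    """One row per sequence, one column per category.

    Each cell holds the index at which the category was visited in that
    sequence, or None if it was not visited (the matrix is sparse).
    """
    matrix = []
    for sequence in sequences:
        # Sequences are acyclic, so each category occurs at most once.
        position = {category: index for index, category in enumerate(sequence)}
        matrix.append([position.get(category) for category in categories])
    return matrix

# Hypothetical history made of two acyclic sequences.
history = [
    ["Start", "Internet_and_Telecom", "Games", "Continue"],
    ["Start", "Games", "Other", "Stop"],
]
matrix = build_history_matrix(history)
```

A recommendation process applied to such a matrix fills in the empty cells of a new, partial row, which is how it comes to predict a position in the sequence rather than a rating.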

This module takes the current (incomplete) navigation vector and the user's navigation history (2) and passes them to a recommendation process (57). Any recommendation process can be used, whether classic or improved on the basis of new developments in the field, since the adaptations described above impose no restriction on its functionality. The recommendation process then provides a value for each category not yet visited (that is, not part of the partial navigation sequence) (58), including the Continue and Stop categories introduced additionally by the pre-processing algorithm. This value represents the probable position of each analysed category in the incomplete sequence submitted for evaluation.

Knowing how many categories have already been visited in the current incomplete sequence, one can determine, for each category not yet visited, the difference between the next index of the sequence (sequence length + 1), that is, the index at which a new category would be visited, and the value predicted by the recommendation process. This difference is turned into a visit probability (59) by dividing it by the total number of categories + 1 and subtracting the result from 1. The divisor follows from the fact that the longest sequence that can be built has at most the total number of categories + 2 elements: Start (the beginning of the sequence), all the categories (each only once, since otherwise there would be a cycle and the sequence would be split into two sub-sequences), and then either Continue (a cycle was found) or Stop (the navigation sequence ended); the maximum difference of indices in such a sequence is the total number of categories + 1. This methodology thus yields the probability that a page category not yet visited is the next one to be visited. The problem of estimating the probabilities for all page categories is not yet solved, however, because the categories already present in the sequence (visited) remain, and their probability of being revisited must also be estimated (60). Two pieces of information are used to evaluate these probabilities: the probability of the Continue category (the probability that a cycle is closed at the next step) and the probabilities of the already visited categories (except Start) of being in first position in the user's history (2).
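The conversion of a predicted index into a visit probability (59) can be sketched as below. The function name visit_probabilities is hypothetical, and the absolute value of the difference is an assumption, made so that a prediction falling before the next index is treated the same way as one falling after it. The input values are those of Example 1 in this description, where two categories have been visited after Start and 5 page categories are considered.

```python
def visit_probabilities(predicted_index, visited_count, total_categories):
    """Map each predicted sequence index to a probability of being next."""
    next_index = visited_count + 1  # index at which a new category would go
    return {
        category: 1 - abs(index - next_index) / (total_categories + 1)
        for category, index in predicted_index.items()
    }

# Indices predicted by the recommendation process in Example 1.
predicted = {"Adult": 5, "Finance": 4, "Other": 3, "Continue": 3, "Stop": 5}
probabilities = visit_probabilities(predicted, visited_count=2,
                                    total_categories=5)
```

A category predicted exactly at the next index receives probability 1, and the probability falls by 1/6 for each position of distance, which matches the values 0.83 and 0.66 listed in Example 1 up to truncation.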

In other words, the probability of closing the cycle at the next step is estimated first, and then it must be evaluated which of the already visited categories is most likely to close the cycle. Since the first piece of information is provided by the recommendation process, only the second remains to be evaluated. Owing to the pre-processing algorithm, every time a new cycle is identified in the navigation sequence, the sequence is split into two sub-sequences, the second of which starts with the category that created the cycle. Consequently, the probability of each category generating cycles can be evaluated by counting the situations in which this happened and dividing that count by the total number of cycles identified in the user's entire navigation history (2).

These probabilities are then filtered so that only those of the categories already visited in the sequence remain, and the resulting values are multiplied by the probability of closing the cycle (determined from the results of the recommendation process). The two probabilities are multiplied because both events (the occurrence of a cycle and the category that creates it) must occur together, so their probabilities of occurrence must be combined by multiplication. After this last step all the probabilities of continuing the current sequence (O3) are available and can be provided at the output of the module (61).
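The estimation of the revisit probabilities (60) can be sketched as follows. The function name revisit_probabilities and its input, a list holding the category that created each cycle found in the history, are assumptions made for illustration; the counts used below are hypothetical and were chosen so that the result matches the values 0.8 and 0.2 of Example 1.

```python
from collections import Counter

def revisit_probabilities(cycle_starters, visited, p_continue):
    """Probability that each visited category closes a cycle at the next step."""
    counts = Counter(cycle_starters)
    total_cycles = sum(counts.values())
    if total_cycles == 0:
        # No cycle was ever observed, so no category can be ranked.
        return {category: 0.0 for category in visited}
    # Keep only the visited categories and weight by the Continue probability.
    return {
        category: p_continue * counts[category] / total_cycles
        for category in visited
    }

# Hypothetical history: 5 cycles, 4 created by Internet_and_Telecom.
starters = ["Internet_and_Telecom"] * 4 + ["Games"]
o3_visited = revisit_probabilities(
    starters, visited=["Internet_and_Telecom", "Games"], p_continue=1.0)
```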

Once the probabilities of each category have been obtained for each of the factors considered (the current moment in time, module 10, output O1; the current navigation pace, module 11, output O2; and the user's history, module 12, output O3), these values must be combined so that the process provides a single final probability for each category (16). In practice, the process must provide the probability of each category i given the current moment in time, the current navigation pace and the user's history: p(category_i | current moment in time, current navigation pace, user history). The probabilities can be combined by directly multiplying the probabilities corresponding to the same category, by adding them, by adding their logarithms, or by using a weighted sum so that a different importance can be given to each of the three kinds of probability considered. If the three probabilities are multiplied directly, the three factors (the current moment in time, the current navigation pace and the user's history) are considered independent and the probabilities they imply factorize, giving:

p(category_i | current moment in time, current navigation pace, user history) = p(category_i | current moment in time) * p(category_i | current navigation pace) * p(category_i | user history).

Because the probabilities have values between 0 and 1, some of which may be extremely small, multiplying them can very quickly produce the value 0 (because the precision of the computer is exceeded). For this reason, in some cases a device can be used: taking the logarithm of the formula above beforehand. The product thus becomes a sum, so the logarithms of the probabilities can be added (instead of multiplying the probabilities directly) to avoid these problems. Sometimes, to simplify the computation further, the probabilities themselves are added rather than their logarithms, even though the results are less accurate. Moreover, when the sum of the probabilities (or of their logarithms) is used instead of their product, the importance of each factor can be changed by attaching an importance coefficient to it. These coefficients are not arbitrary: they account for possible interactions between the factors, which cannot be captured by the initial model, where the factors were assumed to be independent. Consequently, if there are influences between factors, the factors with greater influence receive larger coefficients in the sum. Likewise, if a factor is not to be taken into account, it can be given the coefficient 0, so that its prediction is ignored.
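The weighted sum of logarithms discussed above can be sketched as follows. The function name weighted_log_score is hypothetical, and returning minus infinity for a zero probability is an assumption about how such values could be handled, since the logarithm of 0 is undefined. The short demonstration shows the underflow that motivates the logarithm.

```python
import math

def weighted_log_score(probabilities, weights=None):
    """Sum of weight * log(probability) over the factors of one category."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    score = 0.0
    for p, w in zip(probabilities, weights):
        if w == 0:
            continue          # coefficient 0: the factor is ignored
        if p == 0:
            return -math.inf  # assumed convention for log(0)
        score += w * math.log(p)
    return score

tiny = [1e-200, 1e-200, 1e-200]
direct_product = math.prod(tiny)      # underflows to 0.0
log_score = weighted_log_score(tiny)  # remains a finite number
```

With all coefficients equal to 1 the score is the logarithm of the direct product, so ranking categories by score gives the same order as ranking them by product whenever the product does not underflow.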

The last step of the application, category prediction (13), determines and outputs the most probable category given the user's history, the factors taken into account (the current moment in time, the current navigation pace and the user's history), and any importance coefficients assigned to each of these factors. In this step the probabilities of all categories (obtained from the probability combination module 16) are compared, and the category with the maximum probability is provided at the output (O4).

Example 1 of applying the proposed method.

In a non-limiting example of applying the invention, the proposed method can be used when browsing takes place on a desktop system (personal computer). In this case the method is implemented on that computer and analyses the user's navigation history, making predictions about the next category or categories of pages the user will visit. To this end the method is implemented in the browser, for example as a browser extension (written in JavaScript).

The input data consists of a navigation sequence of the form:

2015-03-02 11:29:11 impbeacon {location:www.google.es, impId:07be8c3b-302f-457e-864f-ab4953100636, vertical:Internet_and_Telecom/Search_Engine, pk:0, appInstanceUid:3AA812EC-A599-438B-9664-B747296AB527} ES

2015-03-02 13:49:33 impbeacon {location:www.angrybirdsgames.com, impId:8ebad827-881a-4f8d-8104-38164fdd1689, vertical:Games, pk:0, appInstanceUid:3AA812EC-A599-438B-9664-B747296AB527} ES
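Entries of this form could be parsed as in the sketch below. The function name parse_beacon is hypothetical, and several points are assumptions inferred only from the two sample lines: that the fields inside the braces are comma-separated key:value pairs, that the page category is the part of the vertical field before the first "/", and that the trailing token (ES) is a country code.

```python
import re
from datetime import datetime

# Timestamp, event name, braced field list, trailing code.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \{(.*)\} (\w+)$")

def parse_beacon(line):
    """Extract the visit time, location and page category from one entry."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized entry: " + line)
    stamp, event, body, trailing_code = match.groups()
    fields = {}
    for part in body.split(","):
        key, value = part.split(":", 1)
        fields[key.strip()] = value.strip()
    return {
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
        "event": event,
        "location": fields["location"],
        # Assumption: the top-level vertical is the page category.
        "category": fields["vertical"].split("/")[0],
        "trailing_code": trailing_code,
    }

entry = parse_beacon(
    "2015-03-02 11:29:11 impbeacon {location:www.google.es, "
    "impId:07be8c3b-302f-457e-864f-ab4953100636, "
    "vertical:Internet_and_Telecom/Search_Engine, pk:0, "
    "appInstanceUid:3AA812EC-A599-438B-9664-B747296AB527} ES")
```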

The process determines the probabilities of each category for each time slot (14). For example, if 3 time slots are considered (morning, lunch, evening), the probabilities in Table 1 are obtained for the 5 page categories considered (Internet_and_Telecom, Games, Adult, Finance and Other) plus two additional ones (Continue and Stop).

Table 1. Probabilities of each category for each time slot

| Time slot | Internet_and_Telecom | Games | Adult | Finance | Other | Continue | Stop |
|-----------|----------------------|-------|-------|---------|-------|----------|------|
| Morning   | 0.46                 | 0.23  | 0     | 0.3     | 0     | 0.31     | 0.06 |
| Lunch     | 0.38                 | 0.08  | 0     | 0.15    | 0.38  | 0.44     | 0.23 |
| Evening   | 0.23                 | 0.3   | 0.15  | 0       | 0.3   | 0.25     | 0.69 |

Likewise, the probabilities of each category are determined for each browsing pace of the user (15), giving the values in Table 2.

Finally, given a partial sequence containing the categories Start, Internet_and_Telecom and Games, the recommendation process (12) determines the following indices for the remaining categories: Adult - 5, Finance - 4, Other - 3, Continue - 3, Stop - 5.


The probabilities obtained for these categories are Adult - 0.66, Finance - 0.83, Other - 1, Continue - 1, Stop - 0.66. For the 2 categories that can close a cycle, the probabilities obtained are Internet_and_Telecom - 0.8 and Games - 0.2. Output O3 is therefore: Internet_and_Telecom - 0.8, Games - 0.2, Adult - 0.66, Finance - 0.83, Other - 1, Continue - 1, Stop - 0.66.

Table 2. Probabilities of each category for each navigation pace

| Navigation pace | Internet_and_Telecom | Games | Adult | Finance | Other | Continue | Stop |
|-----------------|----------------------|-------|-------|---------|-------|----------|------|
| Type 1          | 0.17                 | 0.67  | 0     | 0       | 0.17  | 0.13     | 0.38 |
| Type 2          | 0.43                 | 0.14  | 0     | 0.14    | 0.29  | 0.31     | 0.31 |
| Type 3          | 0.31                 | 0.15  | 0     | 0.23    | 0.31  | 0.38     | 0.15 |
| Type 4          | 0.46                 | 0.08  | 0.15  | 0.15    | 0.15  | 0.19     | 0.15 |

Knowing that the current browsing takes place in the lunch slot, output O1 of the process consists of the values Internet_and_Telecom - 0.38, Games - 0.08, Adult - 0, Finance - 0.15, Other - 0.38, Continue - 0.44, Stop - 0.23. If the user's navigation pace is of type 2, output O2 of the process consists of the values Internet_and_Telecom - 0.43, Games - 0.14, Adult - 0, Finance - 0.14, Other - 0.29, Continue - 0.31, Stop - 0.31.

To obtain the final probabilities, the values of outputs O1, O2 and O3 must be multiplied and the results normalized. The final probabilities are therefore: Internet_and_Telecom - 0.294, Games - 0.005, Adult - 0, Finance - 0.039, Other - 0.248, Continue - 0.307, Stop - 0.106. From these values, the maximum is obtained for the Continue category (30.7%), which means that the user will most likely return to one of the two categories already visited. Of these two, the one with the higher probability (close to that of the Continue category) is Internet_and_Telecom (29.4%), so it is chosen as the category of the next page to be visited by the user; this is output O4.
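The arithmetic of this example can be reproduced with the short sketch below, which multiplies the three outputs per category, normalizes the products, and then applies the rule for the Continue category. The function names combine and predict are hypothetical; the input values are those given in the example.

```python
import math

CATEGORIES = ["Internet_and_Telecom", "Games", "Adult", "Finance",
              "Other", "Continue", "Stop"]
o1 = dict(zip(CATEGORIES, [0.38, 0.08, 0, 0.15, 0.38, 0.44, 0.23]))  # lunch slot
o2 = dict(zip(CATEGORIES, [0.43, 0.14, 0, 0.14, 0.29, 0.31, 0.31]))  # pace type 2
o3 = dict(zip(CATEGORIES, [0.8, 0.2, 0.66, 0.83, 1, 1, 0.66]))       # history

def combine(*outputs):
    """Multiply the outputs per category and normalize to sum to 1."""
    products = {c: math.prod(o[c] for o in outputs) for c in outputs[0]}
    total = sum(products.values())
    if total == 0:
        raise ValueError("all combined probabilities are zero")
    return {c: value / total for c, value in products.items()}

def predict(final, visited):
    """Pick the most probable category; resolve Continue to a visited one."""
    best = max(final, key=final.get)
    if best == "Continue":
        best = max(visited, key=final.get)
    return best

final = combine(o1, o2, o3)
o4 = predict(final, visited=["Internet_and_Telecom", "Games"])
```

Rounded to three decimals, the normalized values agree with those listed in the example, and the predicted category is Internet_and_Telecom.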

Example 2 of applying the proposed method. In another non-limiting embodiment of the invention, the application can be used on a mobile device (phone or tablet). Unlike the personal computer version, the data are no longer processed locally, because of the limited resources of the mobile device; instead they are sent to a server that performs the processing presented in the previous example. The browser on the mobile device collects the browsing information (the same as in the previous example) and sends it to a server, for example in JavaScript Object Notation (JSON) format. The server can be implemented on a single machine, in which case the transmission mechanism on the server uses, for example, the PHP language and a web server (for example, Tomcat), or it can be implemented in a cloud, if it is decided to retain more data for prediction (which implicitly requires more computing power to process them). In the latter case, the transmission module on the server is implemented, for example, in Scala. At the end of processing, the most probable category (for example, Internet_and_Telecom, as in Example 1) is returned to the mobile device, where it is used together with another module that pre-loads pages from the predicted category.
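The JSON exchange between the mobile browser and the server could look roughly like the following Python sketch. The patent does not specify a message schema, so every field name (`user_id`, `visits`, `category`, `visited_at`, `predicted_category`), the function names, and the sample timestamps are illustrative assumptions only.

```python
import json


def encode_navigation(user_id, visits):
    """Client side: serialize a navigation sequence to a JSON string.

    visits is a list of (category, ISO 8601 timestamp) pairs in visiting
    order. The schema is hypothetical, not taken from the patent.
    """
    payload = {
        "user_id": user_id,
        "visits": [{"category": c, "visited_at": t} for c, t in visits],
    }
    return json.dumps(payload)


def decode_navigation(raw):
    """Server side: parse the JSON string back into Python values."""
    data = json.loads(raw)
    visits = [(v["category"], v["visited_at"]) for v in data["visits"]]
    return data["user_id"], visits


def encode_prediction(category):
    """Server side: wrap the predicted category for the reply to the device."""
    return json.dumps({"predicted_category": category})


# Illustrative values only; the timestamps are made up for the example.
request_body = encode_navigation(
    "user-1",
    [("Internet_and_Telecom", "2015-08-21T12:05:00"),
     ("Games", "2015-08-21T12:07:30")],
)
user_id, visits = decode_navigation(request_body)
response_body = encode_prediction("Internet_and_Telecom")
print(response_body)
```

Sending timestamps alongside the categories would let the server derive both the time slot and the navigation rate without extra fields, but that choice is part of the sketch, not of the source.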

Claims

1. A method for the customized prediction of the categories of web pages to be visited by a user, intended for use on personal computers and mobile devices (phones, tablets) in order to speed up and simplify browsing by offering, at the right time, relevant web page suggestions that are easy for the user to access, characterized in that it comprises the following stages:

a. Pre-processing 3, in which, starting from the user's current navigation sequence and from the user's browsing history, one or more acyclic vectors are created with the categories of the pages visited, in the order of the visits;

b. Processing 4, in which the category of the next web page to be accessed by the user is predicted. This prediction is made by combining two different prediction techniques: one based on a recommendation procedure 12, and the other a statistical method based on comparing the access probabilities of the different categories of pages 10, 11; and by using three different factors to make the prediction: the user's browsing history, the current moment in time and the user's current navigation rate;

c. Adding pre-processed information to the user's browsing history.

2. The prediction method according to claim 1, characterized in that the user's current navigation rate is determined, within the processing module 4, by the method of determining the probable category according to the navigation rate 11, based on the mean and deviation of the navigation speed, as belonging to one of the following 4 categories: pause navigation, slow navigation, chaotic navigation, and accelerated navigation.

3. The prediction method based on a recommendation procedure, according to claim 1, characterized in that, within the processing module 4, the method of determining the probability of continuation of the sequence 12 takes into account the user's entire browsing history and can generate predictions for an entire browsing sequence (not just for the next category to be accessed), together with the method of pre-processing the data 3 so that the data can be used in such a recommendation procedure.

4. The prediction method according to claim 3, characterized in that the method of determining the probability of continuation of the sequence 12 transforms the results provided by the recommendation procedure into probabilities:

- for the unseen categories, by estimating the difference between the order provided by the recommendation procedure and the order number of the next category that should be visited within the partial sequence, and normalizing this difference by dividing it by the maximum difference that can occur (the total number of categories + 1);

- for the re-visited categories, which generate cycles, by multiplying the probability of the Continue category (determined using the methodology above) by the probability of each category already visited.