LT6685B

LT6685B - Message sequence analysis

Info

Publication number: LT6685B
Application number: LT2019021A
Authority: LT
Inventors: Aleksej Zaicevskij
Original assignee: Aleksej Zaicevskij
Priority date: 2019-05-02
Filing date: 2019-05-02
Publication date: 2019-12-27
Also published as: LT2019021A

Abstract

The invention is a technology for identifying statistical links in a sequence of news items, advertisements, or other messages. Incoming messages are classified according to several attributes. Selective reclassification is used to account for different interpretations of an attribute rating. The messages, converted into codes, form a rating matrix. To detect a pattern in a message sequence, matrix fragments that follow or precede messages with the same rating on one or more attributes are aligned on a time scale. The correlation between the aligned fragments is assessed using the same data filter for all of them. If the correlation for two or more matrix fragments is high, the data filter is narrowed. The search settings and results are stored in a database as a pattern. The examples found are rated by a person for significance. A new or repeated pattern search starts with settings that combine two or more known patterns with similar message codes. Patterns with a high significance rating are used more often to create combined search settings. The data filter is additionally widened using random values. Figuratively speaking, the pattern search criteria evolve by crossing, mutation, and selection. The predictive power of the analysis is expressed as an estimate of the probability that a new or hypothetical message fits a previously identified pattern. Past examples of message sequences show what typically happens under similar circumstances. (Fig. 10)

Description

The present invention relates to data processing systems and methods specifically adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes.

Patent US10181167 describes a method of predicting a politician's actions from unlinked historical data. In that example there is an obvious link between the subject and the circumstances. A search for links between events involving an unspecified number of subjects is not provided for.

The problem solved by the invention described below is the detection of statistical links in a sequence of news, advertising or other messages. A person gains an extension of cognitive abilities through the analysis of collective experience.

The analysis is performed by computer. The following system components are required: at least one memory module; at least one processor; at least one data input interface; at least one device for displaying or transmitting data. All data mentioned in this description are stored in the memory module. All digital processes are performed by the processor.

Messages are classified by several attributes. The attributes cover data about the message and its circumstances, each with a formal rating. At the initial stage the ratings are given by a human. A self-learning system uses the accumulated material to automate the classification process. In terms of content, the classifier is similar to the International Patent Classification.

The classifier may be interpreted imprecisely. Deviations in ratings are detected by selectively reclassifying messages. What matters to the algorithm is not consensus on how a rating is applied but the recurrence of a rating across different messages. Different ratings of the same message are used when searching for analogous messages. An increase or decrease in the number of differing ratings indicates the points at which a potential social or individual conflict emerges and disappears.

The messages are converted into codes and form a matrix of ratings. The matrix data make it possible to plot the following graphs on a time scale: the amount of attention attracted; the volume of data; the rate of change of the data volume; other derived functions. The graphs are formed by applying data filters to isolate the combination of data of interest. The filters are removal, inversion, conjunction, disjunction and the application of coefficients. A coefficient can be the number of views of a message or another significance rating. Average correlation values of graphs with different filters are calculated. A local deviation from the average reflects changes characteristic of a particular phenomenon. For example, changes characteristic of a day of rest or of a period of war can be distinguished.
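The filter operations named above can be illustrated with a minimal sketch. The message layout, the rating values ("war", "RU" and so on) and all function names are assumptions made for illustration; the description does not prescribe a data structure. Removal is shown simply as building the filter without one of its terms.

```python
from collections import defaultdict

# Hypothetical messages: one dict per message, keyed by attribute index.
messages = [
    {"day": 0, "E": "war", "J": "RU", "views": 120},
    {"day": 0, "E": "sport", "J": "LT", "views": 30},
    {"day": 1, "E": "war", "J": "LT", "views": 80},
    {"day": 2, "E": "economy", "J": "RU", "views": 50},
]

def has(index, value):
    """Elementary filter: the message has `value` under attribute `index`."""
    return lambda m: m.get(index) == value

def inversion(f):
    """Pass exactly the messages that `f` rejects."""
    return lambda m: not f(m)

def conjunction(*fs):
    """Pass messages accepted by every filter."""
    return lambda m: all(f(m) for f in fs)

def disjunction(*fs):
    """Pass messages accepted by at least one filter."""
    return lambda m: any(f(m) for f in fs)

def volume_graph(msgs, flt, coefficient=lambda m: 1):
    """Data volume per day for messages passing `flt`, weighted by `coefficient`."""
    graph = defaultdict(int)
    for m in msgs:
        if flt(m):
            graph[m["day"]] += coefficient(m)
    return dict(graph)

war_in_ru = conjunction(has("E", "war"), has("J", "RU"))
print(volume_graph(messages, war_in_ru))          # {0: 1}
print(volume_graph(messages, has("E", "war")))    # J term removed: {0: 1, 1: 1}
# Inversion combined with a coefficient (views as the weight): {0: 30, 2: 50}
print(volume_graph(messages, inversion(has("E", "war")),
                   coefficient=lambda m: m["views"]))
```

Representing each filter as a predicate keeps the operations composable, so a narrower or broader filter is just a different combination of the same elementary terms.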

The matrix is searched for analogs of an individual message in order to detect a recurring template made up of several messages. An analogous message has the same ratings on one or more attributes. The matrix fragments that follow or precede the analogous messages are aligned on the time scale. The correlation between the aligned matrix fragments is evaluated using the same data filter. If the correlation values are low, the search is repeated with different data filters. Two or more examples with a high correlation value must be found. The length of the aligned matrix fragments is varied to find a larger number of examples. Message ratings are removed from the data filter or inverted one at a time to find the maximum correlation value. The data filter becomes narrower, and the message codes that ensure high correlation are identified. The search settings and the examples found are written to the database as a "template".
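The analog search and fragment alignment described above might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the description does not name a correlation measure, so Pearson's coefficient is assumed; fragments are encoded as 1/0 by the data filter; only the "after the message" direction is shown; and the matrix contents and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def find_analogs(matrix, target, attributes):
    """Positions of messages sharing the target's rating on every listed attribute."""
    return [i for i, m in enumerate(matrix)
            if all(m[a] == target[a] for a in attributes)]

def fragment_series(matrix, start, length, data_filter):
    """Fragment following position `start`, encoded as 1/0 by the data filter."""
    frag = matrix[start + 1:start + 1 + length]
    return [1 if data_filter(m) else 0 for m in frag]

def correlated_pairs(matrix, target, attributes, length, data_filter, threshold):
    """Pairs of analog positions whose following fragments correlate at or above `threshold`."""
    series = {}
    for pos in find_analogs(matrix, target, attributes):
        s = fragment_series(matrix, pos, length, data_filter)
        if len(s) == length:      # skip fragments cut off by the end of the matrix
            series[pos] = s
    pairs = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if r >= threshold:
            pairs.append((a, b, r))
    return pairs

# Hypothetical matrix in time order; "x" at positions 0 and 4 are the analogs.
matrix = [
    {"E": "x", "F": "p"}, {"E": "y", "F": "q"}, {"E": "z", "F": "p"}, {"E": "y", "F": "p"},
    {"E": "x", "F": "q"}, {"E": "y", "F": "p"}, {"E": "z", "F": "q"}, {"E": "y", "F": "q"},
]
pairs = correlated_pairs(matrix, {"E": "x"}, ["E"], length=3,
                         data_filter=lambda m: m["E"] == "y", threshold=0.9)
print(pairs)   # one pair, positions 0 and 4, with a coefficient close to 1
```

Both fragments encode to the same series here, so the single pair passes the threshold; with real data the threshold and the fragment length would be search settings.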

Several templates with analogous message codes are used to combine search settings and obtain a broader data filter. The data filter is further widened using randomly chosen values. The broader data filter is used in a new template search procedure.

The analysis can be carried out for a planned or imagined message. To show the user what most often happens before or after similar circumstances, an example of a past message sequence is produced. An interesting example attracts people's attention. That attention is measured as the number of views or another rating given. People's attention serves as feedback to the self-learning algorithm, so that heuristic combinations are searched for automatically.

Description of the drawings:

Fig. 1 shows an example of the message classifier, index E. Marked positions: 1 - abbreviation of the formal rating; 2 - definition of the rating.

Fig. 2 shows an example of the message classifier, index F.

Fig. 3 shows an example of the message classifier, index G.

Fig. 4 shows an example of the message classifier, index H.

Fig. 5 shows an example of the message classifier, index I.

Fig. 6 shows an example of the message classifier, indexes J, K.

Fig. 7 shows an example of the message classifier, index L.

Fig. 8 shows an example of converting a message into a code. Marked positions: 3 - information about the message; 4 - message headline; 5 - code format by letter index; 6 - message code.

Fig. 9 shows an example of the matrix. Marked positions: 7 - attribute indexes; 8 - message number; 9 - classified data.

Fig. 10 shows the scheme for aligning matrix fragments. Marked positions: 10 - time scale; 11 - message codes; 12 - codes of analogous messages; 13 - aligned matrix fragments; 14 - evaluation of the correlation of the data sequences; 15 - examples with a higher correlation coefficient.

Fig. 11 shows the template search procedure. Marked positions: 16 - code of the message being analyzed; 17 - analyst's passport; 18 - matrix of message codes; 19 - template database; 20 - search for templates with similar codes; 21 - the data filter of these templates is removed from the search settings; 22 - templates whose data filters are merged; 23 - randomly selected ratings or attributes; 24 - combination of search settings; 25 - template search; 26 - data of the detected template.

Fig. 12 shows an example of template data. Marked positions: 27 - filter of analogous messages; 28 - number of analogous codes in the matrix at the time of the search; 29 - data filter at the start of the search; 30 - length of the matrix fragments; 31 - data filter at the end of the search; 32 - number of template examples found; 33 - numbers of the analogous messages; 34 - numbers of the statistically linked messages; 35 - name given to the template; 36 - significance rating.

Fig. 13 shows an example of data volume graphs. Marked positions: 37 - time scale; 38 - number of analogous messages per year; 39 - number of template examples per year; 40 - correlation coefficient of the graphs.

Below is an example of the analysis of Russian-language news reports from the twenty-first century. Russia is presumed to have been the source of a large number of misleading messages, so interest in the detected templates is correspondingly higher.

For each day, the four highest-ranked news reports are taken from a news archive. A person converts the messages into code (Fig. 8) using the classifier (Figs. 1-7). The message attributes are assigned letter indexes:

A - date of the message;

B - position of the message in the day's news list;

C - headline;

D - link to the message;

E - topic of the message;

F - how the action is described or perceived;

G - stage of completion of the action;

H - time interval between the event and the message;

I - type of information source;

J - place of the events or consequences;

K - place or subject where the cause of the message originated;

L - assessment of veracity according to other sources;

M - number and verbal marker of another message about the same phenomenon;

N - circumstances of the message;

O - computed message parameters;

P - analyst's passport number.
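As an illustration of how a message record with these indexes could be held and turned into a code, here is a minimal sketch. The actual code format is defined in Fig. 8, which is not reproduced in the text, so the "letter plus rating" string below, the class name, the example values and the omission of indexes M, N and O are all assumptions.

```python
from dataclasses import dataclass
from datetime import date

CLASSIFIED = "EFGHIJKL"   # indexes rated with the classifier (Figs. 1-7)

@dataclass
class Message:
    A: date          # date of the message
    B: int           # position in the day's news list
    C: str           # headline
    D: str           # link to the message
    ratings: dict    # index letter -> rating abbreviation, for E..L
    P: int           # analyst's passport number

    def code(self):
        """Join index letter and rating for every classified index, in index order."""
        return "-".join(f"{i}{self.ratings[i]}" for i in CLASSIFIED)

# Hypothetical message; the rating abbreviations are placeholders.
msg = Message(
    A=date(2001, 1, 1), B=1, C="Example headline", D="https://example.org/news/1",
    ratings={"E": "3", "F": "1", "G": "2", "H": "1", "I": "4", "J": "5", "K": "5", "L": "2"},
    P=7,
)
print(msg.code())   # E3-F1-G2-H1-I4-J5-K5-L2
```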

The number of combinations over the indexes E*F*G*H*I*J*K*L is 1,610,612,736.

Data from other information sources refine the circumstances of the message (index N). In this example, the price of oil is taken as the circumstance.

The information about the message (index O) is updated each time the algorithm processes the message data. The computed message parameters are: the number of times the code has been read over the whole history of the matrix; the list of templates in which the message is mentioned; the significance ratings that people have given to those templates. The list of computed parameters can be extended.

The message code (6) may contain an analyst's error or may differ because of another interpretation. Messages already in the matrix (9, 11) are selectively reclassified at a set interval. Data on differing interpretations are recorded in the analyst's passport (17) as a "paradox". The analyst's passport contains: the person's identifier; the number of messages processed; the amount of working time; the list of paradoxes; the dates on which the paradoxes were detected. If the number of paradoxes is large, all of the analyst's work must be checked by another analyst. The list of paradoxes supplies the alternative ratings that are used when searching for analogous messages (12). This keeps the classifier simple and makes it possible to employ a large number of analysts without special professional training.
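One possible way to keep paradox records and use them to widen the analog search is sketched below. The tuple layout, the function names and the rating values are hypothetical; the description only states that differing ratings of the same message are recorded and later used as alternatives.

```python
from collections import defaultdict

def record_paradoxes(classifications):
    """Collect ratings per (message, index) and keep the cases with more than one rating.

    `classifications` is an iterable of (message_id, analyst_id, index, rating).
    """
    seen = defaultdict(set)
    for msg_id, _analyst, index, rating in classifications:
        seen[(msg_id, index)].add(rating)
    return {key: ratings for key, ratings in seen.items() if len(ratings) > 1}

def rating_variants(msg_id, index, rating, paradoxes):
    """All known ratings for this message and index; just `rating` if no paradox exists."""
    return paradoxes.get((msg_id, index), {rating})

# Hypothetical input: two analysts agree on index E but disagree on index F.
classifications = [
    (101, "a1", "E", "politics"), (101, "a2", "E", "politics"),
    (101, "a1", "F", "threat"),   (101, "a2", "F", "warning"),
]
paradoxes = record_paradoxes(classifications)
print(rating_variants(101, "F", "threat", paradoxes))    # both ratings are used
print(rating_variants(101, "E", "politics", paradoxes))  # {'politics'}
```

An analog search can then accept a message if its rating matches any variant in the returned set, rather than a single value.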

A settings profile (24) is used for the template search. The settings profile contains the following information: the length of the aligned matrix fragments (30); the search direction, before or after the message; the data filter (29). The first settings profiles are created by a human.

Template detection procedure:

1. Search for analogous codes in the matrix (12). An analogous message code has the same rating on one or more attributes (27).

2. The matrix fragments (13) that precede or follow an analogous code are aligned on the time scale (10).

3. The correlation of the aligned data sequences is evaluated (14). A single data filter is used for all fragments (13).

4. Examples with a higher correlation coefficient (15) are singled out.

5. The data filter is narrowed (31) to find the codes that ensure a high correlation value (15). Message ratings are inverted or removed one at a time. The aligned matrix fragments are shortened.

6. The template search procedure is repeated with the narrower data filter for as long as the correlation value increases (15) and the number of examples does not decrease (32-34).
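Steps 5 and 6 can be read as a greedy narrowing loop. The sketch below is one such reading and rests on assumptions: the filter is modelled as a set of rating codes, only removal is shown (inversion and fragment shortening are left out), and the scoring function is supplied by the caller. The toy scorer and the code names are hypothetical.

```python
def narrow_filter(terms, evaluate):
    """Drop filter terms one at a time while correlation rises and examples do not fall.

    `terms` is a set of rating codes in the data filter; `evaluate(terms)` is a
    caller-supplied function returning (correlation, number_of_examples).
    """
    current = frozenset(terms)
    best_corr, best_count = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for term in sorted(current):
            candidate = current - {term}
            corr, count = evaluate(candidate)
            if corr > best_corr and count >= best_count:
                current, best_corr, best_count = candidate, corr, count
                improved = True
                break           # restart the scan from the narrower filter
    return current, best_corr, best_count

SIGNAL = {"E3", "F1"}   # hypothetical codes that actually carry the statistical link

def toy_evaluate(terms):
    """Stand-in scorer: share of signal codes in the filter, with a constant example count."""
    return len(SIGNAL & terms) / len(terms), 2

narrowed, corr, count = narrow_filter({"E3", "F1", "G2", "H4"}, toy_evaluate)
print(sorted(narrowed), corr, count)   # ['E3', 'F1'] 1.0 2
```

The loop stops as soon as no single removal raises the correlation without losing examples, which matches the stopping condition in step 6.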

Data on the detected templates (Fig. 12, Fig. 13) are published. A template receives a significance rating from people (36). The number of views, the number of comments or another indicator of interest is evaluated. An analyst can give a rating on a point scale (36) from "not interesting" to "interesting". A template may be given a name (35). On publication, templates are sorted by the features that attract more attention. For example, a template example containing a message rated "refuted" (Fig. 7) attracts more attention. The features that attract more attention are refined using the statistics of the template significance ratings (36). Templates with higher significance and performance ratings (22) are used more often in the message analysis process. Template performance is evaluated by the number of examples found (32), the length of the examples (30) and the correlation coefficient of the examples (15, 40).

A stored template (Fig. 12) has a narrow data filter (30, 31). A combination of settings from several templates (24) gives a broader data filter for a new template search (25). Procedure for forming the settings profile:

1. The message code (16) is compared with the paradox records (17). If there is more than one rating variant, all known variants are used.

2. Search for analogous codes in the template database (19). The codes by which the search was made (27) and the codes with a detected statistical link (31) are read. If the search is being repeated for the same message, the previously detected template is not read.

3. Data filters from two or more templates with analogous codes are merged (22). The number of filters merged is limited (22). The filters can be merged in sequence or in random order. Templates with a high significance rating (36) are used more often (22). This increases the likelihood of finding an example that is more complex and interesting to a human.

4. The settings of one or more templates with a low significance rating are removed from the data filter (21). This reduces the likelihood of finding an example that is uninteresting to a human.

5. The length of the aligned matrix fragments is extended to the time interval between the event and the message about it (indexes H, M). This operation may be performed selectively. The selection may be random.

6. Randomly selected ratings or attributes are used to widen the data filter (23).
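Steps 3, 4 and 6 of this procedure (merging, removal of low-rated settings, random widening) might look like the sketch below. It is an illustration only: the template records, the significance scale, the parameter names and the default limits are assumptions, and steps 1, 2 and 5 are not modelled.

```python
import random

def build_search_filter(templates, all_terms, rng, max_merged=2, n_random=1, low=2):
    """Form a broader data filter from stored templates.

    `templates` is a list of dicts with 'filter' (set of rating codes) and
    'significance' (positive number); `low` is the assumed cut-off below which
    a template counts as low-rated.
    """
    good = [t for t in templates if t["significance"] >= low]
    poor = [t for t in templates if t["significance"] < low]

    # Selection: draw templates without replacement, weighted by significance.
    chosen, pool = [], list(good)
    while pool and len(chosen) < max_merged:
        pick = rng.choices(pool, weights=[t["significance"] for t in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)

    # Combination: the union of the chosen filters is broader than any one of them.
    merged = set()
    for t in chosen:
        merged |= t["filter"]

    # Settings of low-rated templates are taken out of the merged filter.
    for t in poor:
        merged -= t["filter"]

    # Mutation: widen the filter with randomly chosen ratings.
    candidates = sorted(all_terms - merged)
    merged |= set(rng.sample(candidates, min(n_random, len(candidates))))
    return merged

# Hypothetical template database and rating vocabulary.
templates = [
    {"filter": {"E3", "F1"}, "significance": 5},
    {"filter": {"F1", "J2"}, "significance": 4},
    {"filter": {"G7"}, "significance": 1},
]
all_terms = {"E3", "F1", "J2", "G7", "H4", "L2"}
result = build_search_filter(templates, all_terms, random.Random(0))
print(sorted(result))   # E3, F1, J2 plus one randomly chosen code
```

Passing the random generator in as an argument keeps a search reproducible when the same seed is reused, which helps when comparing settings.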

Under this procedure, the template search settings evolve through combination (22), mutation (23) and selection (20). Each time, the matrix (18) is analyzed (25) with a modified filter (24). The analysis of a message can be repeated many times. At first, a large number of simple templates accumulates in the database (Fig. 12). Later, templates with a large number of messages in a single example appear.

The order of the algorithm's operations, the data priority and the threshold values are adjustable. The number of computing operations needed to detect template examples is counted. The processes are optimized by applying the settings that give higher performance. An example of a threshold value is the correlation coefficient (15) above which a statistical link between messages is considered detected. An example of data priority is the balance between random values (23) and the settings of stored templates (22). The order of operations and the data priority are changed by a human or are made dependent on computed values. The accumulated performance ratings of the algorithm are used to refine the parameters. For example, the number of templates from which a new template search settings profile is formed (22) is refined.

The invention is implemented as a general-purpose software product. The data input method and the classifier are changed depending on the task.

The first example of the use of this invention is a news rating agency. The rating is the probability that a particular message sequence is recurring or may recur.

The second example of the use of this invention is the reconstruction of links between the circumstances of a person's life and the content of the advertising addressed to that person. The matrix accumulates data about the advertising content and the advertising recipient. Data are collected from a large number of advertising recipients. The template search is performed across the matrices of all advertising recipients. In return for taking part in such a study, the advertising recipient receives detailed information about the templates found.

The third example of the use of this invention is the study of the circumstances of loyalty. For example, a detected template shows what distinguishes the users to whom special conditions are applied. Messages about conditions are collected from a large number of users.

The fourth example of the use of this invention is the evaluation of a message sender's characteristics. For example, an archive of correspondence from many users is analyzed. Templates associated with negative feedback are singled out. The user receives information about the sender's likely habit of manipulation.

The fifth example of the use of this invention is the evaluation of an analyst's characteristics. For example, a human or an artificial intelligence is given the task of classifying messages. Different people classify the same messages. One person's message ratings form a matrix fragment of fixed length, and the templates reflect groups of people with different views.

Claims

Definition

1. A method of analyzing a message sequence for detecting patterns, using formal estimates of message attributes and circumstances to form a value matrix that is stored in one or more memory modules, wherein the processor detects in the matrix analogs of each message according to one or more attributes; compares, on a time scale, fragments of the matrix that precede or follow the analog messages; and, if a high correlation between the aligned matrix fragments under a uniform data filter is detected, stores information about the performed alignment, called a template, in the memory module.

2. The method of analyzing a message sequence according to claim 1, characterized in that the classification of the event message is performed according to the following attributes: date of the message; evaluation of attracted attention; the subject of the report; location of events or consequences; the location or entity of the cause of the event; a description of the action; the stage of completion of the action; the time between the event and the message; type of information source; assessment of reality from other sources; circumstances of notification from other sources of information; communication with other messages about the same event.

3. A method for analyzing a message sequence according to claim 1, characterized in that the message is reclassified; different estimates of the same message are written to the memory module as a paradox; the processor performs a search for message analogs in the matrix using paradox records.

4. The method of analyzing a message sequence according to claim 1, wherein the processor calculates a correlation between aligned matrix fragments using one of the following graphs on a time scale: the amount of attention attracted; volume of data; rate of data volume change; other derivative functions.

5. The method of analyzing a message sequence according to claim 1, characterized in that the processor uses the following data filters: extraction, inversion, conjunction, disjunction, coefficient.

6. The method of analyzing a message sequence according to claim 1, wherein the length of the matrix fragments to be compared includes the time interval between the event and the event message.

7. The method of analyzing a message sequence according to claim 1, wherein the processor shortens matching matrix fragments until the correlation value increases.

8. A method of analyzing a message sequence according to claim 1, wherein the processor optionally removes the data filter or inverts the message estimates and writes a template with a higher correlation coefficient to the memory module.

9. The method of analyzing a message sequence according to claim 1, wherein the processor composes a wider data filter from two or more templates stored in the memory module.

10. A method for analyzing a message sequence according to claims 1 and 9, characterized in that the human-provided template significance estimate is written to the memory module; the processor is more likely to use templates with a higher significance rating to expand the data filter.

11. A method for analyzing a message sequence according to claim 1, wherein the processor extends the data filter using randomly selected estimates or attributes.

12. A method for analyzing a message sequence according to claim 1, characterized in that the processor changes the order of operations of the algorithm, the data priority and the thresholds; the amount of processor operations for detecting template samples is counted and written to the memory module.

13. A method of analyzing a message sequence according to claim 1, wherein the template search is performed in relation to a planned or hypothetical message.
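The core steps of claim 1 (finding analog messages, aligning the matrix fragments that follow them, and storing a template when the correlation is high) can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the matrix is modeled as a time-ordered list of dictionaries of numeric codes, the data filter is reduced to selecting a single attribute, and the names `find_templates`, `match_attr`, `signal_attr`, the fragment length and the threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant fragment gives no correlation signal
    return cov / (sx * sy)


def find_templates(matrix, match_attr, signal_attr, length=3, threshold=0.9):
    """Compare the fragments that follow analog messages.

    matrix: time-ordered list of dicts of numeric codes (assumed layout).
    match_attr: attribute whose equal values make two messages analogs.
    signal_attr: attribute kept by the (simplified) data filter.
    """
    # Rows that share a value of match_attr are treated as analogs.
    analogs = {}
    for index, row in enumerate(matrix):
        analogs.setdefault(row[match_attr], []).append(index)

    templates = []
    for value, indices in analogs.items():
        for i, j in combinations(indices, 2):
            # Fragments that follow each analog on the time scale.
            frag_i = [r[signal_attr] for r in matrix[i + 1:i + 1 + length]]
            frag_j = [r[signal_attr] for r in matrix[j + 1:j + 1 + length]]
            if len(frag_i) < length or len(frag_j) < length:
                continue  # not enough messages after one of the analogs
            r = pearson(frag_i, frag_j)
            if r >= threshold:
                # Record the alignment settings and result as a template.
                templates.append({
                    "match_attr": match_attr,
                    "value": value,
                    "anchors": (i, j),
                    "length": length,
                    "correlation": r,
                })
    return templates


# Hypothetical message codes: subject code and attracted attention.
subjects = [7, 1, 2, 3, 4, 7, 5, 6, 8, 9]
attention = [5, 1, 4, 2, 3, 5, 2, 8, 4, 0]
matrix = [{"subject": s, "attention": a} for s, a in zip(subjects, attention)]
templates = find_templates(matrix, "subject", "attention")
```

In this sample only the two messages with subject code 7 are analogs, and the attention values that follow them rise and fall in the same proportion, so a single template is recorded. Fragments preceding the analogs could be compared in the same way by slicing before the anchor indices.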