Summary of the invention
The technical problem to be solved by the present invention is to provide, through a novel use of existing network character terminal technology, a character terminal feature extraction method that is accurate, efficient, and widely applicable.
To solve this technical problem, the solution of the present invention is as follows:
The invention provides a character terminal feature extraction method based on behavior analysis. The data produced during interaction between the user's client and the host is obtained by network monitoring; the data is analyzed by terminal emulation to restore the interaction between the client and the host; and the required content is then extracted according to the behavioral rules of direct human-machine interaction during client use. The method is implemented on a character terminal feature extraction system.
The character terminal feature extraction system comprises a data monitoring module, a terminal emulation module, a behavior analysis module, and a data recording module. The data monitoring module is connected to the terminal emulation module and to the behavior analysis module, the terminal emulation module is connected to the behavior analysis module, and the behavior analysis module is connected to the data recording module; electrical signals are transferred between connected modules. The data monitoring module is deployed between the client and the host, monitors all communication data between the user and the host, and submits the acquired data to the terminal emulation module for processing. The terminal emulation module converts the instructions that the host sends to the client, which describe how to draw on the client's terminal display, into text that the user can read directly. The behavior analysis module analyzes and filters the collected data to improve the efficiency of character terminal feature extraction. The data recording module records the analyzed data and produces the corresponding log information.
Obtaining, by network monitoring, the data produced during interaction between the user's client and the host means that the data monitoring module records all data of the interaction between the client and the host, including both user requests and host responses.
Analyzing the data by terminal emulation to restore the interaction between the client and the host means that the terminal emulation module converts the instructions that the host sends to the client, which describe how to draw on the client's terminal display, into text that the user can read directly.
Through analysis of the host response data, the terminal's working mode is classified as either command mode or edit mode. Extracting the required content according to the behavioral rules of direct human-machine interaction during client use means that the behavior analysis module extracts the required data according to one of the following rules:
(1) The working mode of the client is determined by checking whether a cursor positioning sequence is present in the interaction data between the client and the host: if a cursor positioning sequence is present, the client is working in edit mode; if not, the client is working in command mode;
(2) Only the last line of the host response data is extracted and analyzed, and the last line on which the user is operating, that is, the command line executed by the user, is reconstructed from it;
(3) The input string of a user command is recorded and compared with the data returned by the host; the point at which the identical string appears is where this command begins to execute, which is also when the previous command has finished executing;
(4) The last line of the host response data is extracted and analyzed, and when the user types the execution key (the Enter key), the user's input is reconstructed as the content to be extracted;
(5) If only the command lines executed by the user need to be extracted, the input and output produced while the user is in edit mode are ignored, to reduce redundant work;
(6) When the user request is pasted multi-line content, the content is split into individual lines, each of which is processed with the single-line logic;
(7) A separator is specified (the Enter key, which is also the command execution key), and multi-line content is split at this separator, reducing the multi-line logic to the single-line logic.
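Rules (6) and (7) can be illustrated with a minimal sketch. The function name `split_pasted_input` and the choice of a carriage return (0x0d) as the byte sent by the Enter key are assumptions made for illustration; they are not specified in this document.

```python
from typing import List


def split_pasted_input(data: bytes, separator: bytes = b"\r") -> List[bytes]:
    """Split pasted multi-line user input into single-line commands.

    Assumption: the Enter key reaches the host as a carriage return (0x0d).
    Pass a different separator if the monitored client sends something else.
    """
    if separator == b"\r":
        # Treat a pasted CR LF pair as one separator so no empty command appears.
        data = data.replace(b"\r\n", b"\r")
    # Drop empty pieces, such as the one after the final separator.
    return [line for line in data.split(separator) if line]


commands = split_pasted_input(b"cd /tmp\rls -l\rpwd\r")
```

Each element of `commands` can then be handed to the same single-line logic that handles a command typed by hand.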
In the present invention, host response data comprises two kinds of content: printable content to be output on the terminal display (such as characters and symbols), and terminal sequences, which control where on the display the characters appear and describe character and terminal attributes.
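The two kinds of host response content can be separated with a small tokenizer. The following is only a sketch: the names `CSI_RE` and `tokenize` are hypothetical, and only sequences of the form ESC [ ... final-byte (CSI sequences) are recognized; other escape sequences and single control characters such as 0x08 are left inside the text tokens.

```python
import re
from typing import List, Tuple

# CSI sequence: ESC [ , parameter bytes 0x30-0x3f, intermediate bytes
# 0x20-0x2f, and one final byte 0x40-0x7e.
CSI_RE = re.compile(rb"\x1b\[[0-?]*[ -/]*[@-~]")


def tokenize(response: bytes) -> List[Tuple[str, bytes]]:
    """Split a host response into ("text", ...) and ("sequence", ...) tokens."""
    tokens: List[Tuple[str, bytes]] = []
    pos = 0
    for match in CSI_RE.finditer(response):
        if match.start() > pos:
            tokens.append(("text", response[pos:match.start()]))
        tokens.append(("sequence", match.group()))
        pos = match.end()
    if pos < len(response):
        tokens.append(("text", response[pos:]))
    return tokens


tokens = tokenize(b"ab\x1b[2;5Hcd\x1b[K")
```

Here `tokens` alternates between printable text and the sequences that position or erase it, which is the input a terminal emulation step would consume.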
Compared with the prior art, the beneficial effects of the present invention are:
The present invention solves the problem that automated tools analyzing the data streams of network character terminal devices cannot achieve accuracy, efficiency, and applicability at the same time, and it provides a flexible logic system that can be improved continuously, giving it broad room for application.
Embodiment
It should first be noted that the present invention relates to computer networking technology, information security technology, and data mining technology, and is an integrated application of computer technology in these fields. Implementing the invention may involve a number of software functional modules. The applicant believes that a person skilled in the art who has read the application documents carefully and accurately understood the implementation principle and purpose of the invention can, in combination with existing known technology, fully implement the invention using their software programming skills. The software functional modules include, but are not limited to:
Data monitoring module: this module is deployed between the user and the host. All communication data between the user and the host passes under its monitoring, and it submits this data to the subsequent modules for processing.
Terminal emulation module: as described above, this module converts the instructions that the host sends to the client, which describe how to draw on the client's terminal display, into text that an ordinary person can read.
Behavior analysis module: this module is a rule set derived from the behavior and experience of direct human-machine interaction when a character terminal is in use. The collected data is analyzed and filtered with these rules to improve the results.
Data recording module: this module records the data that needs to be recorded after analysis by the preceding modules, and produces the corresponding log information.
All software functional modules mentioned in this application fall into this category; the applicant does not enumerate them one by one.
The implementation principle of the present invention is as follows:
First, the data produced during interaction between the user and the host is obtained by network monitoring; it consists of two parts, user requests and host responses. The host responses are then analyzed by terminal emulation to restore the content that the user sees on the terminal display. This content is combined with the previously obtained user request data and analyzed further according to the behavior and experience of direct human-machine interaction during character terminal use (behavior analysis), yielding the key data of interest. Finally, the data is extracted and recorded.
Obtaining, by network monitoring, the data produced during interaction between the user's client and the host means that the data monitoring module records all data of the interaction between the client and the host, including both user requests and host responses.
The interaction is restored by analyzing the recorded data, in the following steps:
(1) Terminal emulation is used to analyze the host responses and restore the content that the user sees on the terminal display.
(2) The data of the user requests is obtained.
(3) The obtained data is analyzed as a whole, in light of the behavior and experience of direct human-machine interaction during character terminal use, and the data of interest is extracted (hereinafter "behavior analysis").
The "terminal emulation" mentioned above adds a new application idea to the traditional way of using terminal sequences. Traditionally, terminal sequences control what the terminal display shows: if the terminal display is regarded as a sheet of paper, the terminal sequences tell the computer how to draw on that paper, and the user learns what the host wants to convey by reading the resulting image. Terminal emulation instead converts this drawing description directly into text that an ordinary user can read. The two workflows compare as follows:
Traditional terminal analysis: terminal sequences → image on the terminal display → user reads the required text from the image.
Terminal emulation: terminal sequences → text → user reads directly.
The "behavior analysis" mentioned above is a rule set derived from the behavior and experience of direct human-machine interaction when a character terminal device is in use. Applying these rules to the collected data greatly improves the results of the invention. The rules are too numerous to summarize briefly; the following examples show how this function solves problems that are common in the similar products mentioned earlier:
(1) Rigid determination of the terminal working mode:
The terminal has two working modes, command mode and edit mode, and they must be distinguished to avoid producing large amounts of useless data. For example, to extract the command lines entered by the user (the command line works in command mode), operations in edit mode (such as the user writing a document in the vi editor) must be ignored; otherwise a large amount of useless redundant data is produced.
The more common approach relies on a few key terminal sequences (DECSET, DEC Private Mode Set, to enter edit mode; DECRST, DEC Private Mode Reset, to exit edit mode; and so on), and treats the appearance of these sequences as entering or exiting edit mode. This approach seems feasible, but as noted above, the terminal and the user interact directly, and how the user uses the terminal cannot be prescribed. In this example, if the user leaves edit mode by abnormal means (for example, by terminating the process directly after entering edit mode, such as with the kill command on Linux), these key sequences never appear. Edit mode has in fact been exited and command mode entered, yet all subsequent user activity is considered to take place in edit mode and is not recorded. Data that should have been recorded is lost.
The determination method of the present invention is as follows. The cursor positioning sequence (CUP: Cursor Position) is one kind of terminal sequence whose traditional role is to describe the position of the cursor. The present invention gives it a new use. Analysis of experience shows that whenever the terminal is in edit mode, cursor positioning sequences are necessarily used to position the cursor so that the terminal display knows where on the screen to show information, whereas cursor positioning sequences are not used in command mode. Checking whether a cursor positioning sequence is present in the interaction data between the user and the host therefore distinguishes the two working modes clearly at any time. The granularity of this distinction is as fine as each line of the screen display, because a line break in edit mode requires a cursor positioning sequence to move the cursor to the start of the next line. Consequently, neither abnormal process termination nor frequent switching between edit mode and command mode affects the results of the invention.
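The mode check described above can be sketched in a few lines. This is an illustration that takes the rule at face value, not the invention's actual implementation: the names `CUP_RE` and `working_mode` are hypothetical, and treating the `f` final byte (HVP) the same as CUP's `H` is an added assumption.

```python
import re

# CUP has the form ESC [ row ; col H, where both parameters are optional.
# Assumption of this sketch: the equivalent "f" form (HVP) counts as well.
CUP_RE = re.compile(rb"\x1b\[\d*(?:;\d*)?[Hf]")


def working_mode(host_chunk: bytes) -> str:
    """Classify one chunk of host response data as "edit" or "command"."""
    return "edit" if CUP_RE.search(host_chunk) else "command"


mode_in_editor = working_mode(b"\x1b[5;1Hsome text drawn by an editor")
mode_at_prompt = working_mode(b"ls\r\nfile.txt\r\n$ ")
```

Because the check is applied to each chunk of data independently, it does not depend on having seen an earlier "enter" or "exit" sequence, which is the property the paragraph above relies on.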
(2) Inefficient full scanning:
Similar products habitually analyze every terminal sequence they obtain. As noted above, terminal sequences describe how to render an image on the terminal display; much of this information has no value for parsing, and the user cares only about certain key text within it. This style of analysis wastes a large amount of hardware resources.
The processing method of the present invention divides the screen into regions and, guided by behavior analysis, analyzes only the regions of interest. Take the most common example: capturing a command line that the user enters in command mode. According to behavior analysis, the command being entered is always on the last line of the content currently displayed on the screen, so of all the terminal sequences obtained, the invention parses only the last line (that is, only the content after the last terminal line-feed sequence, "0x0d0a"). Only a very small proportion of the terminal sequences then needs to be analyzed to capture the required content, which greatly saves hardware resources.
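A minimal sketch of the last-line rule follows. The function name `last_line` is hypothetical; the separator is the 0x0d 0x0a pair named in the text.

```python
def last_line(host_data: bytes, newline: bytes = b"\r\n") -> bytes:
    """Return only the content after the last line-feed sequence.

    bytes.rpartition returns the whole buffer as the third element when the
    separator does not occur, so a single-line buffer is returned unchanged.
    """
    return host_data.rpartition(newline)[2]


# Sample data modeled on the login banner used in the embodiment.
banner = b"Last login: Sat Mar 16 12:05:02 2013 from 192.168.50.139\r\r\n[usercentos139 ~]$ "
prompt_line = last_line(banner)
```

Everything before the final separator is skipped without being parsed, so the cost of the later emulation step depends on the length of one line rather than on the size of the whole response.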
(3) Reliance on key strings:
Application scenario: the user copies several lines of commands and pastes them into the terminal for execution. The greatest difficulty here is determining the execution boundary of each command. For example, after the first command executes, the host returns a large amount of information to the screen, and the second command is entered and executed immediately after this output. How can it be determined that the first command has ended and the second has begun?
The traditional method relies on the command-line prompt, that is, the prompt string (such as "[usercentos139 ~]$") printed each time the host prompts the user to enter a command. In the example above, the first command executes after a prompt; when the prompt appears again, the second command is about to execute, which means the first command has finished.
This method seems feasible but carries a serious hidden risk: the command-line prompt of almost every character terminal can be customized by the user. Once the user changes the prompt string during use, or even defines the prompt as empty, this approach fails completely.
Solution of the present invention: according to behavior analysis, the host echoes each character that the user enters, in order. For example, if the user enters the characters "abcde" in order, the host necessarily echoes the string "abcde" in the same order. Returning to the scenario above, the invention records the input string of the second command and compares it with the content returned by the host; the point at which the identical string appears is where the second command begins to execute, which is also when the first command has finished executing.
This approach compares exactly what the user entered and is not tied to the application environment or to a time window, which greatly broadens the scope of application.
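The echo comparison can be sketched as a forward search through the host output. The function name `find_command_boundaries` and the sample session are hypothetical; the sketch also ignores the case where a command's own output happens to contain the text of the next command, which a fuller rule set would have to handle.

```python
from typing import List


def find_command_boundaries(host_output: bytes, commands: List[bytes]) -> List[int]:
    """Return the offset in host_output where each command's echo begins.

    Commands are searched in the order the user entered them, and each search
    starts after the previous echo. An offset of -1 means no echo was found.
    """
    offsets: List[int] = []
    search_from = 0
    for command in commands:
        index = host_output.find(command, search_from)
        offsets.append(index)
        if index != -1:
            search_from = index + len(command)
    return offsets


# Hypothetical session: two pasted commands and the host's replies.
session = b"$ echo one\r\none\r\n$ echo two\r\ntwo\r\n$ "
offsets = find_command_boundaries(session, [b"echo one", b"echo two"])
```

The second offset marks both the start of the second command and the end of the first command's output, and the result would be the same if the prompt were changed or empty, since the prompt string is never consulted.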
(4) Restrictions on the application environment:
Some similar products require the user to use a terminal customized by the vendor in order to perform the above functions at all, or to perform them well.
The present invention has no such restriction; any mainstream terminal program is within its scope of application. Terminal control sequences are the currently prevailing standard for terminal operation, and both customized terminals and traditional, classic terminal products are built on this standard. The invention fully supports standard terminal sequences, so its scope of application is broad.
(5) Restricted functions that impair use:
To cope with certain complex application scenarios, some similar products restrict the user's habits and disable commonly used functions. For example, the user cannot use the Tab key to auto-complete commands, cannot use the up and down arrow keys to recall history commands quickly, and cannot use the classic "reverse-i-search" function. Although this ensures that the user's activity can be recorded, restricting functions in this way greatly degrades the user experience.
The present invention imposes no such restrictions, and the user does not notice it at all during use.
The five examples above are only part of the behavior analysis module; many further, more detailed analysis methods are not listed here.
The "behavior analysis" function mentioned above is a module similar to a rule base. More analysis logic can be added to it, on the existing basis, from experience gained in actual use, further improving the accuracy, efficiency, and applicability of extraction.
The following is a typical embodiment of simple character terminal command-line extraction:
Suppose the user logs in to a typical Linux system through a network character terminal.
Suppose the user performs some operations on the terminal.
Suppose we need to extract the commands that the user typed.
Specific steps:
(1) The user logs in to the host system, and the host returns a large amount of information to the user, such as the system logo, the current time, and basic user information, for example the following characters:
Last login: Sat Mar 16 12:05:02 2013 from 192.168.50.139<0d><0d><0a>
[usercentos139 ~]$
(2) Behavior analysis shows that the command-line prompt string of interest is on the last line, so the content before the last <0d><0a> can be ignored. After terminal emulation, the following string is obtained:
[usercentos139 ~]$
(3) The user begins typing a command. For example, the user types the command "ls", which sends the two characters "l" and "s" to the host in succession.
(4) The host echoes the command typed by the user. We record the host response, append it to the previously recorded data, and after terminal emulation obtain the following string:
[usercentos139 ~]$ ls   (on Linux, the command "ls" lists non-hidden files)
(5) The user notices that the wrong command was entered and needs to modify the command line. Suppose the user presses the Backspace key; the host returns the following characters:
<08><1b>[K
(6) Terminal emulation analysis shows that the above string means the cursor moves left by one column and everything from the cursor position onward is erased (on the current line only). The string obtained is therefore:
[usercentos139 ~]$ l   (the character "s" has been erased)
(7) The user types the character ".", and the host returns the same character ".".
(8) After terminal emulation analysis, the new string is obtained:
[usercentos139 ~]$ l.   (on Linux, the command "l." lists hidden files)
(9) The user presses the Enter key to execute the command. The data monitoring module captures this action, which indicates that a command is about to be executed, and a record is produced containing the command-line string above; the command line has been captured successfully.
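Steps (1) to (9) can be replayed with a small line-oriented emulator. This is a sketch under stated assumptions rather than the invention's implementation: the class name `LineTracker` and its methods are hypothetical, only printable ASCII, backspace (0x08), carriage return, line feed, and the erase-to-end-of-line sequence ESC [ K are handled, and the Enter key is assumed to arrive as 0x0d.

```python
import re
from typing import List, Optional

# CSI sequence: ESC [ parameters final-byte. Only "K" is acted on below.
CSI_RE = re.compile(rb"\x1b\[([0-9;?]*)([@-~])")


class LineTracker:
    """Tracks the text of the line the cursor is on (the last line)."""

    def __init__(self) -> None:
        self.cells: List[str] = []  # characters of the current line
        self.col = 0                # cursor column within that line

    def feed_host(self, data: bytes) -> None:
        """Apply host response data to the current line."""
        i = 0
        while i < len(data):
            match = CSI_RE.match(data, i)
            if match:
                # ESC [ K or ESC [ 0 K: erase from the cursor to end of line.
                if match.group(2) == b"K" and match.group(1) in (b"", b"0"):
                    del self.cells[self.col:]
                i = match.end()  # every other CSI sequence is skipped
                continue
            byte = data[i]
            i += 1
            if byte == 0x08:              # backspace: cursor left one column
                self.col = max(0, self.col - 1)
            elif byte == 0x0d:            # carriage return: cursor to column 0
                self.col = 0
            elif byte == 0x0a:            # line feed: start a fresh last line
                self.cells, self.col = [], 0
            elif 0x20 <= byte < 0x7f:     # printable ASCII: write at the cursor
                if self.col < len(self.cells):
                    self.cells[self.col] = chr(byte)
                else:
                    self.cells.append(chr(byte))
                self.col += 1

    def feed_user(self, data: bytes) -> Optional[str]:
        """Return the current line if the user pressed Enter, otherwise None."""
        return "".join(self.cells) if b"\r" in data else None


tracker = LineTracker()
tracker.feed_host(b"Last login: Sat Mar 16 12:05:02 2013 from 192.168.50.139\r\r\n[usercentos139 ~]$ ")
tracker.feed_host(b"ls")            # host echo of the typed "l" and "s"
tracker.feed_host(b"\x08\x1b[K")    # host reply to the Backspace key
tracker.feed_host(b".")             # host echo of the typed "."
record = tracker.feed_user(b"\r")   # Enter key: capture the command line
```

After the replay, `record` holds the same string as step (8), prompt included. Removing the prompt, and handling the harder cases such as history search or Tab completion, would need further rules.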
The above is a very simple application example. Real application environments are much more complicated: for example, content that the user enters in an editor (such as the vi editor on Linux) does not need to be extracted, the user may enter a very long command and perform various editing operations on the command line (such as inserting or deleting characters), or the user may use the history command search function, command auto-completion, and so on.