CN101004737A - Individualized document processing system based on keywords - Google Patents

Individualized document processing system based on keywords

Info

Publication number
CN101004737A
Authority
CN
China
Prior art keywords
keyword
document
user
window
processing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200710200102
Other languages
Chinese (zh)
Inventor
李丹宁
李丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
YITE SOFTWARE CO Ltd GUIYANG
Original Assignee
YITE SOFTWARE CO Ltd GUIYANG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by YITE SOFTWARE CO Ltd GUIYANG
Priority to CN 200710200102
Publication of CN101004737A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A keyword-based individualized document processing system comprising a screen cooperative display device, an information organization device, a mouse track recognition device, a screen word-capture device, a clipboard word-capture device, an input-process word-capture device, a keyword recognition device, a keyword analysis device, a keyword semantics device, a focus position recognition device, an operation control recognition device, a subject term and evaluation term device, a remarks and comments device, a search engine interface device, and an auxiliary tool library interface device.

Description

Individualized document processing system based on keywords
Technical field
The present invention relates to a document processing system, and in particular to a keyword-based individualized document processing system that uses artificial intelligence to process documents and human-computer interaction intelligently.
Background art
With the spread of personal computers and the Internet, the volume of information far exceeds what a human user can handle and needs better organization and management. Hyperlink technology and portal websites have therefore spread across the Internet, and full-text search technology has developed rapidly. Together they offer at least two ways to find information: browsing a directory website organized by subject, or running a keyword search entered through the browser's user interface. When a directory website presents Internet information by subject, the categories usually have to be divided into many subdirectories and a large number of pages with summaries, so a user looking for a specific item must perform many operations. Moreover, because there is so much information on the Internet, a few hundred directories can present only a tiny part of it, so users often cannot find the specific information they want. For many users, subject-organized web pages serve mainly as a source of news and public information on the network. Keyword-based full-text search offers more convenient queries: in general, a keyword search does its best to find every website or web page that contains any information related to the specified keywords and phrases. The problem is that it returns too many junk pages, which bury the information the user wants, so the user has to read through the results to find what is really needed.
On personal computers, the number of documents downloaded from the Internet, exchanged over the internal LAN, or written and edited by users themselves keeps growing, and often reaches tens or hundreds of thousands. Subject-based document organization and full-text search have therefore begun to be applied on personal computers as a supplement to storing files in directories. Compared with a professional portal website, an individual user has far less capacity and energy for organizing the documents on a personal computer, and can hardly maintain a well-organized, continuously updated subject-based document system. A keyword-based full-text search engine can be maintained and updated automatically, but the interference of junk information must be reduced as far as possible. On a personal computer the user is usually looking for one definite piece of information rather than a batch of similar items, so personalized, intelligent, and automated document organization is a real need of personal computer users. Although this technology has made some progress, it remains a difficult problem.
Summary of the invention
The object of the present invention is to provide a keyword-based individualized document processing system. The system takes full-text search as its basic means, integrates the functions of multiple auxiliary tools, uses soft hyperlink technology with keywords as nodes, and organizes documents by intelligent methods.
Technical solution of the present invention. A keyword-based individualized document processing system comprises a computer that has at least one processor and a set of memories, and at least a screen for output and a keyboard and mouse for input, which provide the user interface for interaction between the user and programs. The computer is connected to an external memory and an internal LAN, or to an external memory and the external Internet, or to an external memory, an internal LAN, and the external Internet, and runs a multitasking, multi-window operating system. The system further comprises:
a screen cooperative display device for displaying the main document that the user is focusing on together with the auxiliary documents associated with that main document, or together with the auxiliary information associated with it, or together with both;
an information organization device for storing the information displayed by the screen cooperative display device and the relations among that information, for use by the other devices or for calls from external tools;
a mouse track recognition device for recognizing particular mouse tracks that the user draws deliberately by moving the mouse, and for invoking the corresponding operations;
a keyword generation device for determining the keywords the user is interested in;
a keyword processing device for analyzing and processing the keywords determined by the user;
and a tool calling device for calling external tools according to the keywords determined by the user.
In the above keyword-based individualized document processing system, the keyword generation device comprises a screen word-capture device, with which the user deliberately marks, by a particular mouse track, the start and end positions of a string of characters and symbols that is displayed on the screen and visible to the user, and which extracts that string as a keyword.
In the aforesaid keyword-based individualized document processing system, the keyword generation device comprises a clipboard word-capture device, which monitors the user's copy, paste, move, and even delete operations that use the clipboard provided by the operating system, inspects the clipboard content, judges whether it is a keyword, and if so extracts it as a keyword.
In the aforesaid keyword-based individualized document processing system, the keyword generation device comprises an input-process word-capture device, which monitors the strings of characters and symbols the user types, has the keyword recognition device automatically analyze them and judge whether they are keywords, and if so extracts them as keywords.
In the aforesaid keyword-based individualized document processing system, the keyword processing device comprises a keyword recognition device for judging whether a given string of characters and symbols is a keyword and whether it may be a new keyword, and a keyword analysis device for determining the keyword the user is interested in and the operation the user wishes to start through that keyword.
In the above keyword-based individualized document processing system, the keyword processing device further comprises a keyword semantics device for supplying the semantics of keywords. The keyword semantics device supplies the semantics of a keyword identified by the keyword recognition device, and either records them in the information organization device and displays them on the screen, or passes the keyword and its semantics to the keyword analysis device for further processing.
In the aforesaid keyword-based individualized document processing system, the tool calling device comprises a search engine interface device that calls an external search engine on the basis of keywords so as to improve the quality of information search.
In the aforesaid keyword-based individualized document processing system, the tool calling device comprises a set of auxiliary tool library interface devices that call external auxiliary tool libraries on the basis of keywords.
In the aforesaid keyword-based individualized document processing system, the auxiliary tool library includes, but is not limited to, at least one of: Chinese-foreign language dictionaries, Chinese dictionaries, encyclopedias, address books, yellow pages, calculators, maps, film clips, music clips, introductions to famous people, and links to documents with related content.
In the aforesaid keyword-based individualized document processing system, the system further comprises a focus position recognition device for extracting the information and characteristics of a position in the main document that the user is focusing on.
In the aforesaid keyword-based individualized document processing system, the system further comprises an operation control recognition device, which looks up the function or instruction corresponding to the track code obtained after the mouse track recognition device recognizes a track, calls that function or issues that instruction according to the state of the running environment and the relevant parameters, and, where this is not appropriate, cancels the function call or the issuing of the instruction.
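The lookup performed by the operation control recognition device can be pictured as a dispatch table from track codes to operations. The following is only a minimal sketch: the track codes `circle` and `check`, the two operations, and the single `environment_ok` flag are hypothetical stand-ins, since the source does not enumerate tracks, operations, or environment checks.

```python
from typing import Callable, Dict, Optional


# Hypothetical operations; the source does not list the real ones.
def capture_word() -> str:
    return "capture_word"


def open_subject_terms() -> str:
    return "open_subject_terms"


# Hypothetical mapping from recognized track codes to operations.
OPERATIONS: Dict[str, Callable[[], str]] = {
    "circle": capture_word,
    "check": open_subject_terms,
}


def dispatch(track_code: str, environment_ok: bool) -> Optional[str]:
    """Run the operation bound to a track code.

    Returns None (the call is cancelled) when the code is unknown or the
    running environment is judged unsuitable.
    """
    operation = OPERATIONS.get(track_code)
    if operation is None or not environment_ok:
        return None
    return operation()
```

Keeping the mapping in a table rather than in branching code would let a user rebind tracks without touching the dispatcher, which fits the personalization goal described here.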
In the aforesaid keyword-based individualized document processing system, the system further comprises a subject term and evaluation term device, which pops up a window that displays subject terms or evaluation terms in a tree structure for the user to select.
In the aforesaid keyword-based individualized document processing system, the system further comprises a remarks and comments device, which opens a text editor in which the user enters text in order to add remarks that supplement the context of the focus position in the document, or to comment on it, or both.
In the aforesaid keyword-based individualized document processing system, the screen cooperative display device comprises a main window, auxiliary windows, and a cooperative display control device. The main window is the display window of conventional software and displays the text and symbols on which the user carries out the main reading operation, or editing operation, or both. The cooperative display control device monitors the operation of the main window, or receives the messages the main window sends, or both. When the content displayed in the main window changes, it calculates the coordinate values and display zoom ratios of all selected or generated keywords shown in the main window and passes these values to all auxiliary windows, so that each auxiliary window can adjust its display position and state and change its display in step with the main window. Under the control of the cooperative display control device, an auxiliary window displays auxiliary information corresponding to the content displayed in the main window: help, remarks, comments, subject terms, and evaluation terms.
In the aforesaid keyword-based individualized document processing system, an auxiliary window is transparent both visually and for interactive operation. That is, when an auxiliary window lies over the main window, its frame and background are translucent to fully transparent, so the user can clearly see the main-window content beneath it, and the text or graphics shown in the auxiliary window appear to float over the main window. The user's keyboard and mouse operations pass through the auxiliary window and its displayed content and act on the main-window content beneath.
In the aforesaid keyword-based individualized document processing system, an auxiliary window refreshes its display according to the new position coordinates and zoom ratio of its keyword supplied by the cooperative display control device. There are four main refresh modes for the case where the keyword linked to the auxiliary window moves in the main window: mode one, the auxiliary window moves with the keyword; mode two, the auxiliary window stays where it is; mode three, the auxiliary window shrinks to a translucent icon the same size as the keyword's characters, attached behind the keyword, and moves with it; mode four, the auxiliary window closes.
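The four refresh modes can be sketched as a small state update applied whenever the linked keyword moves. This is an illustrative sketch only: the `AuxWindow` fields, the `RefreshMode` names, and the tuple form of the keyword coordinates are assumptions, and zoom-ratio handling and the exact icon placement are omitted.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Tuple


class RefreshMode(Enum):
    FOLLOW = 1  # mode one: move with the keyword
    STAY = 2    # mode two: stay in place
    ICON = 3    # mode three: shrink to an icon and move with the keyword
    CLOSE = 4   # mode four: close


@dataclass(frozen=True)
class AuxWindow:
    # Hypothetical window state; field names are not from the source.
    x: float
    y: float
    mode: RefreshMode
    iconified: bool = False
    is_open: bool = True


def refresh(win: AuxWindow,
            old_kw: Tuple[float, float],
            new_kw: Tuple[float, float]) -> AuxWindow:
    """Return the window state after its linked keyword has moved."""
    dx = new_kw[0] - old_kw[0]
    dy = new_kw[1] - old_kw[1]
    if win.mode is RefreshMode.FOLLOW:
        return replace(win, x=win.x + dx, y=win.y + dy)
    if win.mode is RefreshMode.STAY:
        return win
    if win.mode is RefreshMode.ICON:
        return replace(win, x=win.x + dx, y=win.y + dy, iconified=True)
    return replace(win, is_open=False)  # RefreshMode.CLOSE
```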
In the aforesaid keyword-based individualized document processing system, the information organization device comprises a work log device, a comprehensive inverted index device, a soft hyperlink device, a document function device, and a system configuration device.
In the aforesaid keyword-based individualized document processing system, the work log device saves the state and results of the user's use of the computer and of the operation of each device of the system. In particular, it saves the computer's current configuration, the recently used tools, the records of hyperlinks and soft hyperlinks, and the user's personal characteristics.
In the aforesaid keyword-based individualized document processing system, the information recorded by the work log device comprises at least a keyword generation table, a keyword application table, and a focus position operation table. The information in the keyword generation table is produced by the screen word-capture device, the clipboard word-capture device, or the input-process word-capture device, passed on after recognition by the keyword recognition device, and recorded by calling the work log device. The information in the keyword application table is recorded by calling the work log device after the search engine interface device has called an external search engine and successfully returned auxiliary documents, or after the auxiliary tool library interface device has called an auxiliary tool and successfully returned information. The information in the focus position operation table is recorded by calling the work log device after the subject term and evaluation term device or the remarks and comments device has run.
In the aforesaid keyword-based individualized document processing system, the comprehensive inverted index device uses the records about keywords, subject terms, and evaluation terms kept by the work log device to build an index table in which keywords, subject terms, and evaluation terms are the index terms and documents, remarks, comments, and auxiliary tools are the retrieved items. The vocabulary of the index table is dynamic: new keywords are continually added, and old keywords are continually deleted or forgotten. Every new keyword confirmed by the user is remembered and forgotten in three stages: immediate memory, short-term memory, and long-term memory. Newly appearing subject terms and evaluation terms enter the short-term memory buffer directly. A new keyword in immediate memory is kept in the immediate memory buffer of the comprehensive inverted index device, and its channel information is recorded according to the index master table; this buffer is sorted by frequency of use, and term frequencies are reduced by a fast forgetting algorithm. Once the new keyword is associated with a document or permanently associated with an auxiliary tool, or its immediate term frequency exceeds a certain threshold, it is stored in the short-term memory buffer of the comprehensive inverted index device, and the channel that produced it is still recorded according to the index master table. Its term frequency is then counted with the channel bonus algorithm, the buffer is sorted by term frequency, and term frequencies are reduced by a slow forgetting algorithm; terms with very low frequency are forgotten and removed from the buffer. When the required attributes of a new keyword in the short-term memory buffer have been supplied and its term frequency is higher than a certain threshold, the keyword is stored in the long-term memory buffer of the comprehensive inverted index device and becomes a new keyword of that buffer; its term frequency is still counted with the channel bonus algorithm according to the index master table and is reduced by a very slow forgetting algorithm.
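The promotion rules between the three memory buffers can be summarized in a short sketch. The threshold values, the `Entry` fields, and the stage names below are hypothetical; the source only says that "a certain threshold" applies at each step and does not state what happens to stale entries in the immediate buffer, so that case is left out.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source gives no concrete values.
IMMEDIATE_TO_SHORT = 3.0
SHORT_TO_LONG = 10.0
FORGET_BELOW = 0.5


@dataclass
class Entry:
    term: str
    freq: float
    stage: str = "immediate"           # "immediate", "short", "long", "forgotten"
    linked: bool = False               # associated with a document or fixed to a tool
    attributes_complete: bool = False  # required attributes have been supplied


def update_stage(entry: Entry) -> str:
    """Apply one promotion or forgetting step and return the new stage."""
    if entry.stage == "immediate":
        if entry.linked or entry.freq > IMMEDIATE_TO_SHORT:
            entry.stage = "short"
    elif entry.stage == "short":
        if entry.attributes_complete and entry.freq > SHORT_TO_LONG:
            entry.stage = "long"
        elif entry.freq < FORGET_BELOW:
            entry.stage = "forgotten"  # dropped from the short-term buffer
    return entry.stage
```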
In the aforesaid keyword-based individualized document processing system, the channel bonus algorithm for term frequency statistics imitates the combined action of the sensory channels in human memory and the influence of novelty on memory. Suppose a keyword is used for the $n$-th time through channel $H_i$, and $H_i$ has by then been used $n_i$ times in total. The term frequency of the word is then $Fc(n) = Fc(n-1) + 1 + Ft(H_i, n_i)$, where $Fc$ is an integer between 0 and 255, and $Fc(n) = Fc(n-1)$ when $Fc(n-1) = 255$. $Ft(H_i, n_i)$ is the bonus term frequency for the $n_i$-th use of channel $H_i$ and is calculated as:
$$Ft(H_i, n_i) = F_0 \cdot \min_j \left\{ Zd_i^{\,n_i} \left[ 1 - Q(H_i, H_j) \cdot \left( 1 - Zd_j^{\,n_j} \right) \right] \right\}$$
where $F_0$ is the term frequency bonus value, $Q$ is the channel similarity, with a value between 0 and 1, and $Zd$ is the novelty exponential factor of a channel, with a value greater than 0 and less than 1;
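As a worked illustration of the channel bonus, the sketch below evaluates $Ft$ and the capped update of $Fc$. Several points are assumptions rather than statements from the source: channels are numbered from 0, the minimum runs over every channel $j$ including $i$ itself, and the sum is rounded to an integer and capped at 255.

```python
from typing import Sequence


def channel_bonus(i: int,
                  counts: Sequence[int],
                  zd: Sequence[float],
                  q: Sequence[Sequence[float]],
                  f0: float) -> float:
    """Ft(H_i, n_i) = F0 * min_j { zd_i**n_i * [1 - Q(i, j) * (1 - zd_j**n_j)] }.

    counts[j] is n_j, zd[j] is the novelty factor of channel j (0 < zd < 1),
    and q[i][j] is the similarity between channels i and j (0 <= q <= 1).
    Taking the minimum over all j, including j == i, is an assumption.
    """
    own = zd[i] ** counts[i]
    return f0 * min(
        own * (1 - q[i][j] * (1 - zd[j] ** counts[j]))
        for j in range(len(counts))
    )


def update_frequency(fc_prev: int, bonus: float) -> int:
    """Fc(n) = Fc(n-1) + 1 + Ft, held at 255 once the ceiling is reached.

    Rounding to an integer and capping the sum at 255 are assumptions.
    """
    if fc_prev >= 255:
        return 255
    return min(255, round(fc_prev + 1 + bonus))
```

With two dissimilar channels, `counts = [1, 2]`, `zd = [0.5, 0.5]`, and `f0 = 4`, the bonus for channel 0 works out to 1.0, so a term at frequency 10 would move to 12.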
The forgetting algorithm models how the user forgets keywords with use, following the law shown by the Ebbinghaus forgetting curve. The curve is divided into three segments, the immediate memory stage, the short-term memory stage, and the long-term memory stage, each fitted with an exponential function. The term frequency $Fc$ of an index term in the comprehensive inverted index device serves as the measure of memory strength. If the proportion forgotten after a time $T$ is $Y$, with $0 < Y < 1$, then after a time $t$ the remaining memory is $Fc = Fc_0 \cdot (1 - Y)^{t/T}$, where $Fc_0$ is the term frequency at the start of the interval.
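The exponential decay above is simple to compute. In the sketch below, the per-stage forgetting ratios in `STAGE_RATIO` are hypothetical values chosen only so that the immediate stage forgets fastest and the long-term stage slowest; the source gives no numbers.

```python
# Hypothetical forgetting ratios Y per reference period T for each stage.
STAGE_RATIO = {"immediate": 0.9, "short": 0.5, "long": 0.1}


def remaining_frequency(fc0: float, y: float, period: float, elapsed: float) -> float:
    """Fc = Fc0 * (1 - Y) ** (t / T).

    fc0     term frequency at the start of the interval
    y       proportion forgotten after one period T (0 < y < 1)
    period  the reference time T
    elapsed the time t that has actually passed
    """
    if not 0 < y < 1:
        raise ValueError("y must be strictly between 0 and 1")
    if period <= 0:
        raise ValueError("period must be positive")
    return fc0 * (1 - y) ** (elapsed / period)
```

For example, a term at frequency 100 with half forgotten every 10 time units keeps a frequency of about 25 after 20 units.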
In the aforesaid keyword-based individualized document processing system, the index file is sorted by document attention $Gz$. The attention of a document depends on how the document is used: first on the frequency $Fw$ with which the document is used, then on the keywords, subject terms, and evaluation terms that the user has used with the document, on the score $Fs$ that the user assigns directly to the document's attention, and on the other documents and auxiliary tools linked to the document. The information about attention to a document is contained in the index terms most often used when the user works with the document. For every document, the term frequencies of the $k$ most important index terms associated with it are averaged, and the attention $Gz$ of the document is calculated as:
$$Gz = Rw \cdot Fw + Rc \cdot \frac{1}{k} \sum_{i=1}^{k} Fc_i + Rs \cdot Fs$$
where $Rw$ is the weight of the document's frequency of use, $Rc$ is the weight of the mean term frequency of the important index terms, and $Rs$ is the weight of the user's subjective score for the document's attention. Different weights may be chosen for different users, with $Rw + Rc + Rs = 1$.
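The attention formula can be checked with a few lines of code. The sketch assumes that the "most important" $k$ index terms are simply the $k$ with the highest term frequency, and that a document with fewer than $k$ terms averages over the terms it has; neither detail is specified in the source.

```python
from typing import Sequence


def document_attention(fw: float, term_freqs: Sequence[float], fs: float,
                       rw: float, rc: float, rs: float, k: int) -> float:
    """Gz = Rw*Fw + Rc*(1/k)*sum(Fc_1..Fc_k) + Rs*Fs, with Rw + Rc + Rs = 1."""
    if abs(rw + rc + rs - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Assumption: importance is approximated by term frequency.
    top = sorted(term_freqs, reverse=True)[:k]
    mean = sum(top) / len(top) if top else 0.0
    return rw * fw + rc * mean + rs * fs
```

Sorting an index file by attention would then amount to something like `sorted(docs, key=score, reverse=True)`, where `score` is a hypothetical function that wraps the call above for each document.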
In the aforesaid keyword-based individualized document processing system, the soft hyperlink device is centered on a document. From the records of the work log device it takes the keywords captured while the user used the document and their positions, the auxiliary tools called through those keywords and the information they returned, the auxiliary documents found by calling the search engine, the subject term and evaluation term marks applied to focus positions and their context, and the remarks and comments added, and arranges them by time. The most recent and most frequently used records serve as the soft hyperlinks of the document. When the document is opened and used again, these soft hyperlinks are opened automatically to retrieve the auxiliary information, which is displayed cooperatively on the screen, restoring the working state of the user's last few sessions with the document. The records of the work log device also yield statistics on the main ways in which the document is called or opened, so that certain backtracking operations can be carried out on the document.
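One way to picture how soft hyperlinks are derived from the work log is to rank a document's log records by how often and how recently each target was used. The record layout `(timestamp, document, target)` and the ranking rule (frequency first, recency as tie-breaker) are assumptions made for this sketch; the source only says that the most recent and most frequently used records are chosen.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical log record: (timestamp, document, target of the link).
LogRecord = Tuple[int, str, str]


def soft_hyperlinks(records: Sequence[LogRecord], document: str,
                    limit: int = 3) -> List[str]:
    """Return up to `limit` link targets for `document`, most used first."""
    own = [(ts, target) for ts, doc, target in records if doc == document]
    counts = Counter(target for _, target in own)
    latest: Dict[str, int] = {}
    for ts, target in own:
        latest[target] = max(ts, latest.get(target, ts))
    # Rank by use count, then by most recent use.
    ranked = sorted(counts, key=lambda t: (counts[t], latest[t]), reverse=True)
    return ranked[:limit]
```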
In the aforesaid keyword-based individualized document processing system, the document function device displays the titles and brief summaries of the documents the user uses most often, the documents used most recently, and new documents that may interest the user, according to an automatic classification by subject terms and important keywords carried out at the user's initiative.
In the aforesaid keyword-based individualized document processing system, the system configuration device is used by the user to set the operating parameters of the whole system and to enter the user's personal information.
In the present invention, a document is any independently accessible unit of information that contains text and can be read by a human user. Examples include TXT files, DOC files, PDF files, Excel spreadsheet files, web pages, and a record in a database. The content of a document can be displayed on the screen for the user to read; some documents cannot be edited (such as PDF documents, web pages, and e-books), while others can (such as Word documents). A document may be as small as a few words or so large that it takes dozens of screens to display; the key point is that a document must be stored and called as an independent unit of information. From the point of view of information search, a document can be searched for as a whole: every document has a unique storage address, whether it is stored on the personal computer, on a server in the internal LAN, or on a website on the external Internet. The programs that operate on documents differ. Although one program may read and edit several kinds of document, a kind of document usually corresponds to one particular program, for example DOC documents to Word and PDF documents to Acrobat Reader. Every document in the present invention is assumed to have corresponding software specified by the user.
A keyword is a nominal word or phrase that appears in a document and is significantly related to the essential meaning of the document or captures an important feature of it. Keywords include the important names in the document (personal names, place names, addresses, and organization names), the wording of important times, the wording and description of events, problems, and concepts, distinctive new words, technical terms, and so on.
A semantic word gives the semantics of a keyword and does not necessarily appear in the document. For a personal name, for example, it must be determined whether the person is the author, translator, or reviewer of the document, or a leading or supporting character described in it.
A subject term is a nominal word or phrase, not necessarily appearing in the document, that classifies the document or a part of it according to a generally recognized subject system (politics, economy, military affairs, industry, agriculture, science, nature, biology, geology, geography, physics, mathematics, chemistry, history, and so on), a business department's own classification system for official documents (management, production, marketing, purchasing, investment, finance, logistics, assets, brand, and so on), or a user-defined personal classification system (news, entertainment, schoolwork, work, hobbies, communication, family, emotions, and so on).
An evaluation term is an adjectival word or phrase that evaluates or comments on the document from the user's own point of view. Evaluation terms generally do not appear in the document. Examples are, for the viewpoint of a document, agree or oppose; for its style, graceful, overcautious, stiff, rigid, serious, rigorous, sloppy, casual, lively; and for its writing technique, acerbic, harsh, flattering, eulogizing, praising, critical, satirical, humorous.
A comprehensive inverted index is an inverted index built over documents with keywords, subject terms, and evaluation terms together as index terms. Keywords here include new keywords formed by combining a keyword with a semantic word; for example, "author Einstein" and "Einstein" are two different index terms, and every later mention of index terms includes the semantics of keywords. The present invention also adds to the index a usage frequency statistics field and a part-of-speech field for these terms. When the user marks a document or some parts of it with subject terms and evaluation terms, subject terms and evaluation terms that may not appear in the document become precisely associated with the document. Keywords, subject terms, and evaluation terms in fact reflect the features of the document and the characteristics of the user from three different angles, and classifying documents by all three together characterizes them more finely. In full-text search, subject terms and evaluation terms can be used together with keywords to build the inverted index of the documents; that is, in the inverted index, subject terms, evaluation terms, and keywords are all treated as index terms. Because subject terms, evaluation terms, and keywords are easily told apart, they remain relatively independent within a single inverted index, which is in effect the same as building three separate inverted indexes.
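The structure of the comprehensive inverted index can be sketched as a single posting table keyed by the kind of index term and the term itself, which keeps the three kinds independent while storing them together. The class name, the kind labels, and the way usage is counted are all illustrative assumptions; the part-of-speech field mentioned above is omitted.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# The three kinds of index term described in the text.
KINDS = ("keyword", "subject", "evaluation")


class ComprehensiveIndex:
    """Minimal sketch: one inverted index holding three kinds of index term."""

    def __init__(self) -> None:
        self.postings: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.usage: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def add(self, kind: str, term: str, document: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self.postings[(kind, term)].add(document)
        self.usage[(kind, term)] += 1  # simple usage frequency statistic

    def lookup(self, kind: str, term: str) -> List[str]:
        return sorted(self.postings.get((kind, term), ()))


index = ComprehensiveIndex()
# A keyword combined with a semantic word is a distinct index term.
index.add("keyword", "author Einstein", "paper1905.pdf")
index.add("keyword", "Einstein", "biography.doc")
index.add("subject", "physics", "paper1905.pdf")
index.add("evaluation", "rigorous", "paper1905.pdf")
```

Because the kind is part of the key, a lookup for the subject term "physics" never collides with a keyword spelled the same way, which mirrors the remark that one combined index behaves like three separate ones.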
The invention belongs to the intelligent housing category of operating system, relate to the interactive operation of various users and computing machine, it be in essence one be connected the in-house network that has content management service and have on the Internet of full-text search service with keyboard, mouse and windows display are under information terminal (such as being the personal computer of operating system with Windows or the Linux) environment of mutual system, according to user personality and custom, search is basic means in full, the function of comprehensive a plurality of aids, adopt intelligentized method, help the user to go deep into the backup system of document interior tissue and leading subscriber personal information.By helping the user,, grab speech and other auxiliary interactive interfaces by screen by the course of work of recording user, increase the interactive mode and the mutual amount of user and native system, thereby grasp user's characteristics better, and utilize the intelligence of user on participle, improve the accuracy of key word recognition.The interactive mode of the newly-increased mouse track identification of native system is different from information interchange and the control model between user and the original working procedure.
The present invention pays close attention to the keyword of document inside consciously by the user, makes the granularity refinement of organization and management of information.Along with the keyword difference of selecting, user's classification is different with evaluation, the remarks that adds is different with comment, the supplementary difference of search, the aid difference of calling through intelligent machine study, just can be held the personalized cognition of user to certain document better, and used all documents of organization and management user in view of the above, by the collaborative display mode of screen remarks comment and supplementary are showed the user in company with document on screen simultaneously.
When the present invention uses document on computers by the user, monitoring user and the reciprocal process of opening the program of the document, the keyword that identification and recording user are paid close attention to, sentence, paragraph etc., and pay close attention to descriptor (the comprising user-defined descriptor) classification that contents are carried out document by these, the see content of being write is carried out the comment evaluation of the comment system of user's foundation, or outside document, replenishing to some content of document, remarks and comment, and by this keyword search to information with call other information relevant that other instrument obtains and be associated with main document, from these relevant informations, extract personalized cognitive to the document of user's personalization features and user then.In order to allow the user be ready that consciously the document being paid close attention to information partly passes to native system, this system provides the aid of each side to make things convenient for the user to browse for the user and has read, classified and estimated, writes and edited remarks and comment.
For the user who uses Chinese, the Chinese-character intelligent input tool is necessary, English-Chinese and Chinese-English Dictionary, encyclopedia, to write intelligent aid such as secretary also be standing, Internet webpage individualized intelligent research tool, the establishment instrument of hypertext hyperlink is also with intellectuality, the present invention integrates these instruments, share the kernel of a Chinese information processing, according to the unified man-machine interaction mode of a cover, use under the situation of its groundwork software at interference user as few as possible, synthetically provide service for the user.
The present invention classifies keywords more finely: a keyword is classified according to the records in which it appears in various databases and according to the characteristics of each database, which amounts to establishing a hyperlink or a relation. A population database, legal entity database, commodity database, geocoding database, address book, encyclopedia entry database, document classification database, picture library, software registry, and so on can all be related to keywords and can supply relations among them. For example, a name in the address book can be linked to the name in the population database, an address to the geocoding database, an employer to the legal entity database, and so on. The abbreviation or alias of an organization should be linkable to the organization's formal name. A personal name is associated with information about people in the address book, celebrity database, expert database, employee database, and so on. An organization has a structure of subunits, department names, department heads, main lines of business, major customers, main suppliers, and so on. The personal names, organization names, and proper terms that appear in the user's e-mail are closely related to the user. Through the auxiliary tool library interface device and the search engine interface device, the present invention calls tools and information from other sources in a unified way and uses soft hyperlink technology to establish relations between keywords and this information.
Description of drawings
Fig. 1 is a schematic diagram of the composition and control relationships of the present invention;
Fig. 2 is a schematic diagram of the composition and control relationships of the screen collaborative display device;
Fig. 3 is a schematic diagram of the composition and control relationships of the information organization device;
Fig. 4 is a schematic diagram of the running environment of the present invention.
Embodiment
An embodiment of the invention. The keyword-based personalized document processing system of the present invention is composed as shown in Fig. 1 and comprises: a screen collaborative display device, an information organization device, a mouse track recognition device, a clipboard word-capture device, a screen word-capture device, an input-process word-capture device, a keyword recognition device, a keyword analysis device, a search engine interface device, an auxiliary tool library interface device, an attention position identification device, an operation control recognition device, a keyword semantics device, a descriptor and comment device, and a remarks and comment device. The function and composition of each device are described below:
One, the screen collaborative display device:
The screen collaborative display device displays the main document the user is attending to together with the auxiliary documents associated with that main document, the auxiliary information associated with it, or both. As shown in Fig. 2, it comprises a main window, auxiliary windows, and a collaborative display control unit. The main window is the display window of conventional software, such as the display window of Word under Windows. It displays the text and symbol information on which the user performs the main reading operations, editing operations, or both, and it is controlled by an independent external main program. There may also be several auxiliary main windows that display auxiliary documents. The collaborative display control unit monitors the operation of the main window, accepts messages sent by the main window, or does both. When the content displayed in the main window changes, it computes the coordinates and display scale of every selected or generated keyword shown in the main window and passes these values to all auxiliary windows, so that each auxiliary window can adjust its display position and state and change in step with the main window. Each auxiliary window refreshes its display according to the new keyword coordinates and scale supplied by the collaborative display control unit. There are four main refresh modes (the modes are not limited to these four), which apply when the keyword linked to an auxiliary window moves in the main window: mode one, the auxiliary window follows the keyword; mode two, the auxiliary window stays where it is; mode three, the auxiliary window turns into a translucent icon the same size as the keyword's text, attaches behind the keyword, and follows it; mode four, the auxiliary window closes. The refresh mode is set from the default when the auxiliary window is opened, and the user can change it at any time.
The auxiliary window is a window type unique to the present invention. It is transparent both visually and with respect to interactive operations. It is displayed on screen in a plain style, without a border and slightly transparent, so that its content is unobtrusive yet readable, similar to the explanation window that "Kingsoft Powerword" shows on screen. Unlike that window, it is "transparent" to input devices such as the mouse, keyboard, and stylus: when the cursor moves within the area of an auxiliary window, mouse, keyboard, and stylus operations act on the main window beneath it. When an auxiliary window is placed over the main window, its border and background range from translucent to fully transparent, so the user can clearly see the main window content beneath it, and the text or graphics shown in the auxiliary window appear to float above the main window. The user's keyboard and mouse operations pass through the auxiliary window and its content and act on the main window content underneath. An auxiliary window can be converted into a main window: a small conversion mark in its upper-left or upper-right corner accepts a left mouse click, which converts the auxiliary window into a main window, and clicking the mark again converts it back into an auxiliary window. Besides the left click, the conversion can also be triggered by recognizing a particular mouse track, such as drawing a "Z" inside the auxiliary window. Under the control of the collaborative display control unit, the auxiliary window can conveniently display the auxiliary information (help, remarks, comments, descriptors, evaluations, and so on) that corresponds to the content shown in the main window, and it changes accordingly as the main window's display scale changes and as the displayed content moves. The characteristics and implementation technique of the auxiliary window are the same as those of layers in the ArcGIS geographic information system. When using a personal computer, users read far more than they write, they read much faster than they write, and they do not want the content they are about to read to be covered. During reading, therefore, the auxiliary window generally appears to the upper left of the user's "focus" so as to interfere with reading as little as possible. During writing, because users often refer to what they have just written, the auxiliary window generally appears in the blank area to the lower right of the user's "focus".
Two, information organization device:
The information organization device stores the user's personalized features obtained from operations on keywords, descriptors, comments, searches, auxiliary tools, and auxiliary windows, and makes them available to the other devices or to external tools. It is the core of the present invention. As shown in Fig. 3, it comprises a work log device, a system configuration device, a document operation device, a comprehensive inverted index device, and a soft hyperlink device.
The work log device records the user's use of the computer, with emphasis on the computer's current configuration, recently used tools, records of hyperlinks and soft hyperlinks, and the user's personalization features. A personalized, information-oriented keyword set is an important aspect of those features. Each user's individuality is reflected in the keyword set he or she uses and in the documents and pages those keywords link to, and a given user uses a distinctive keyword set when handling different information. When the user works with different tools, reads or writes particular articles, or uses different auxiliary tools, the user's characteristics differ and so does the keyword set. A keyword set may be an unstructured set in the mathematical sense, or it may be an ontology with structure and interrelations; for example, a name in the address book is related to an organization name (employer), a profession, hobbies, a telephone number, and so on. The work log device also records the running state and results of each device of the present invention. Only recent log records are kept in memory; older records are saved to external storage (the local hard disk or a server hard disk on the internal network). The user can view these records but cannot modify them. After statistics and sorting, they serve as the main data source for the comprehensive inverted index device, the document operation device, and the soft hyperlink device. The information recorded by the work log device includes at least a keyword generation table, a keyword application table, and an attention position operation table.
1. The information in the keyword generation table is produced by the screen word-capture device, the clipboard word-capture device, or the input-process word-capture device. After it has been identified by the keyword recognition device and passed on, the work log device is called to record it. The keyword generation table contains at least the following fields:
Keyword | Time | Source channel code | Document code | Position | Semantic word set | Exact/fuzzy
Here the time must be filled in for a new word; an empty time indicates an old word. The document code records which document the keyword was captured from; for a keyword produced during input, the document code is empty if the input position is neither a document nor an auxiliary tool in the auxiliary tool library. The position is the keyword's position in the document. Exact/fuzzy indicates whether the screen word capture or input-process word capture was exact or fuzzy.
2. The information in the keyword application table is recorded by calling the work log device after the search engine interface device has called an external search engine and an auxiliary document has been returned successfully, or after the auxiliary tool library interface device has called an auxiliary tool and information has been returned successfully. The keyword application table contains at least the following fields:
Keyword | Time | Document code | Returned information | Display mode
Here the document code identifies the document found by the search, or the auxiliary tool. The returned information includes the auxiliary tool category and the help information found in the auxiliary tool. The display mode indicates main window or auxiliary window.
3. The information in the attention position operation table is recorded by calling the work log device after the descriptor and comment device or the remarks and comment device has run. The attention position operation table contains at least the following fields:
Position category | Position coordinates | Time | Document code | Context | Associated content
Here the position category is one of the screen position, window position, document page position, and document content position used by the attention position identification device. The position coordinates are the numerical expression of the position. The time is when the operation finished. The document code identifies the main document operated on. The context gives the context in the main document that corresponds to the attention position. The associated content records what was done within that context: descriptor marking, evaluation comments, icon or symbol marking, supplementary remarks, text comments, and so on.
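The three work log tables can be pictured as plain record types. The following Python sketch is only an illustration: the class names, field names, and field types are assumptions chosen to mirror the column lists above, and the original text does not prescribe any storage format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional, Tuple


@dataclass
class KeywordGenerationRecord:
    keyword: str
    time: Optional[datetime]        # empty (None) marks an old word
    source_channel: str             # e.g. "H11" for exact screen word capture
    document_code: Optional[str]    # None if captured outside any document or tool
    position: Optional[int]         # position of the keyword in the document
    semantic_words: FrozenSet[str] = frozenset()
    exact: bool = True              # exact (True) or fuzzy (False) capture

    @property
    def is_new_word(self) -> bool:
        # The text states that the time must be filled in for a new word.
        return self.time is not None


@dataclass
class KeywordApplicationRecord:
    keyword: str
    time: datetime
    document_code: str              # found document, or auxiliary tool code
    returned_info: str
    display_mode: str               # "main" or "auxiliary" (assumed encoding)


@dataclass
class AttentionPositionRecord:
    position_category: str          # screen, window, page, or content position
    coordinates: Tuple[int, ...]
    time: datetime
    document_code: str
    context: str
    associated_content: str
```

For example, a keyword captured from a document with no time stamp is treated as an old word, while one typed into a program outside the auxiliary tool library has no document code.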
The comprehensive inverted index device uses the records about keywords, descriptors, and comments kept by the work log device to build an index table in which keywords, descriptors, and comments are the index terms and documents, remarks, comments, and auxiliary tools are the retrieval items. The keywords here are adjusted dynamically. Dynamic means not only that word frequencies are adjusted but also that the vocabulary of the index changes: new words are constantly added (memorized) and old words are constantly deleted (forgotten). Forgetting an old word has two meanings. The first is that, as time passes, the word frequency of every word decreases automatically (relative time is combined with absolute time; if the whole index goes unused for a long period, the relative time elapsed is zero). The second is the physical deletion of entries: when a word's frequency drops to zero, it becomes an old word that may be physically deleted, but it stays in storage and is physically deleted only when new words need the space. The total number of entries in the index is governed by the zero-frequency entries. For example, with a standard of keeping zero-frequency entries at 3% of all entries, the device automatically increases the number of entries when zero-frequency entries fall below 3% and automatically reduces the total number of entries when there are too many zero-frequency entries.
Any "new word" used in a tool integrated with the present invention and confirmed by the user in some form can be added to the index. All "new words" are memorized and forgotten in three stages: immediate memory, short-term memory, and long-term memory. A "new word A" in immediate memory is kept in the immediate memory buffer of the comprehensive inverted index device, together with information about the channel that produced it (screen word capture, input-process word capture, or clipboard word capture, which auxiliary tool called the word, and what result was returned). Because entries are ordered by usage frequency and word frequency is reduced by a fast forgetting algorithm, it is essentially the most recently used words that are kept, which weeds out erroneous strings, produced for various reasons, that are not words. When "new word A" becomes associated with a document or permanently associated with an auxiliary tool, or when its instantaneous word frequency exceeds a threshold, the word is moved into the short-term memory buffer of the comprehensive inverted index device and becomes "new word B"; the channel that produced it (the auxiliary tool that generated the word) is still recorded. The word frequency of "new word B" is counted with a new-channel reward: normally each use of "new word B" increases its word frequency by 1, but the first time a new channel uses the word, the word frequency increases by a larger reward (for example 20), and different channels may have different rewards. In the short-term memory buffer, "new word B" entries are sorted by word frequency and their word frequency is reduced by a slower forgetting algorithm; words with very low word frequency are "forgotten" (removed from the buffer) so that the index adapts to changes in the user's interests and personality over time. When the necessary attributes of "new word B" have been filled in (for example, once the employer, home address, and telephone number of a personal name have been entered) and its word frequency is above a threshold, the word is moved into the long-term memory buffer of the comprehensive inverted index device and becomes "new word C". Words in this buffer are still subject to word frequency statistics, and their word frequency is reduced by a very slow forgetting algorithm to adapt to shifts in the user's interests; here word frequency mainly expresses the importance of the word.
A word can be defined in many ways: by morphological and syntactic rules, by statistical significance, by semantics, and so on. The present invention combines these definitions, relying mainly on the semantic definition while admitting words in the other senses, and in form it is not limited to words made up purely of Chinese characters. For example, "C++ language", "11th Five-Year Plan", and "110 emergency call" are typical words. Some long names, such as the full name of a conference, the title of a technical paper, or the title of a film (which sometimes include punctuation marks, but not a full stop), are also treated as words. An index term is therefore defined as a character string that combines Chinese characters, letters, digits, and symbols and represents a certain meaning. The word frequency of an index term is computed from the channels through which it has been used, the number of uses, and the time of first use, using the channel reward algorithm and the forgetting algorithm, and the word frequency (denoted Fc) serves as the measure of the index term's importance. One index term can correspond to several retrieval items. Each retrieval item is given a parameter that expresses its importance, the attention degree (denoted Gz), and the retrieval items are sorted by attention degree. One example implementation of the data structure of the comprehensive inverted index is as follows:
Index master table
Index term | Word frequency Fc | Channel code set | Access time | Document code sequence
Document attention table
Document code | Attention degree Gz | Document usage frequency Fw | Important index term set Sg
Channel reward value table
Channel code | Word frequency reward value Fj
"Access time" in the tables is the time at which the index term was added to the index master table. The channel code set is the set of channels through which the index term has been used. (Index terms here are keywords, including new keywords formed by combination with semantic words; for example, "author Einstein" and "Einstein" are two different index terms. Wherever index terms are mentioned below, the keyword's semantics are included.) The document code sequence is the union of the document codes associated with the index term in the keyword generation table, the keyword application table, and the attention position operation table of the work log device, sorted by attention degree Gz. A channel is the concept the present invention uses to describe which of the user's senses are engaged when the user uses a keyword; it reflects how strongly the user experiences that keyword. The channel reward algorithm for word frequency statistics imitates the way the combined action of the sensory channels and the sense of freshness affect memory in human memorization. When a new channel is used for the first time, it promotes memory the most and the reward is Fo; on the second use the promotion is smaller and the reward is 0.5*Fo; on the third use the reward is 0.5*0.5*Fo; eventually the channel becomes an old channel, and each further use of the keyword simply increases the word frequency by 1. In other words, the freshness of a channel decays exponentially and can be expressed with an exponential factor Zd (0<Zd<1): the smaller Zd is, the faster freshness decays, and Zd=0.5 in the example above. The present invention defines the freshness Xx of a channel as Xx = Zd^n (n = 0, 1, ...). In addition, to judge whether a channel used for the first time is entirely new when other channels have already been used, one must consider how similar the two channels are, expressed as Q (0≤Q≤1). When they are not similar at all (Q=0), the channel is entirely new; when they are partly similar (for example Q=0.5), only the dissimilar part is new; when they are completely similar (Q=1), nothing is new. When channel Hj has been used m times and channel Hi has been used n times (so that n_i = n and n_j = m in the formula), the relative freshness Xd(Hi, Hj) of channel Hi is defined as:
$$Xd(H_i, H_j) = Zd_i^{\,n_i} \left( 1 - Q(H_i, H_j) \cdot \left( 1 - Zd_j^{\,n_j} \right) \right)$$
When several channels have been used, the relative freshness Xd of channel Hi with respect to each of the other channels can be computed separately, and the minimum is taken as the overall relative freshness Xz in that situation, that is:
$$Xz(H_i, n) = \min_j Xd(H_i, H_j)$$
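As a worked illustration of these two definitions, assume (purely for the sake of the example) that both channels share the decay factor Zd = 0.5, that Hi is being used for the first time (n_i = 0), that Hj has already been used twice (n_j = 2), and that Q(Hi, Hj) = 0.8:

$$Xd(H_i, H_j) = 0.5^{0} \left( 1 - 0.8 \cdot \left( 1 - 0.5^{2} \right) \right) = 1 - 0.8 \cdot 0.75 = 0.4$$

If a third channel Hk with Q(Hi, Hk) = 0 has also been used, then Xd(Hi, Hk) = 1 regardless of how often Hk was used, and the overall relative freshness is

$$Xz(H_i, 0) = \min(0.4,\ 1) = 0.4$$

so the most similar previously used channel determines how much freshness Hi retains.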
The channel similarity table below gives similarity values as a simplified example:
Old channel | New channel | Similarity Q(Hi, Hj)
H11 (exact screen word capture) | H12 (exact input-process word capture) | 0.0
H11 | H13 (fuzzy screen word capture) | 0.8
H11 | H14 (fuzzy input-process word capture) | 0.0
H11 | H15 (clipboard word capture) | 0.8
H13 (fuzzy screen word capture) | H21 (used for search engine) | 0.0
H12 (exact input-process word capture) | H24 (used for address book) | 0.0
H24 (used for address book) | H25 (used for electronic map) | 0.2
H21 (used for search engine) | H25 (used for electronic map) | 0.3
H22 (used for English-Chinese and Chinese-English dictionaries) | H29 (used for calculator) | 0.0
As the table shows, keyword generation channels and usage channels are dissimilar to each other, whereas generation channels may be similar among themselves, as may usage channels. The exact screen word capture channel H11 and the clipboard word capture channel H15 look different but are in fact quite similar, with Q=0.8. With the channel H, the word frequency reward value Fo, the channel similarity Q, the channel freshness Xx, the relative freshness Xd, and the overall relative freshness Xz defined, the word frequency channel reward algorithm can be stated. The reward word frequency value Ft(Hi, n_i) for using channel Hi at use count n_i is computed as:
$$Ft(H_i, n_i) = F_o \cdot \min_j \left\{ Zd_i^{\,n_i} \left[ 1 - Q(H_i, H_j) \cdot \left( 1 - Zd_j^{\,n_j} \right) \right] \right\} \qquad (1)$$
With these definitions, suppose a keyword is used for the n-th time through channel Hi, and Hi has by then been used n_i times in total. The word frequency of the word is then Fc(n) = Fc(n-1) + 1 + Ft(Hi, n_i), with 0 ≤ Fc ≤ 255; when Fc(n-1) = 255, Fc(n) = Fc(n-1). This is the channel reward word frequency statistics algorithm of the present invention.
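The update rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: all function and variable names are hypothetical, a single decay factor zd is used for every channel (formula (1) allows a separate Zd per channel), the use count n_i is taken to be the number of earlier uses of the channel so that the first use earns the full reward Fo, the minimum runs only over other channels that have already been used, and similarity is treated as symmetric.

```python
from typing import Callable, Dict

FC_MAX = 255  # upper bound on word frequency given in the text


def relative_freshness(zd: float, n_i: int, n_j: int, q: float) -> float:
    """Xd(Hi, Hj) = Zd^n_i * (1 - Q(Hi, Hj) * (1 - Zd^n_j))."""
    return zd ** n_i * (1.0 - q * (1.0 - zd ** n_j))


def channel_reward(fo: float, zd: float, channel: str,
                   uses: Dict[str, int],
                   similarity: Callable[[str, str], float]) -> float:
    """Ft(Hi, n_i) as in formula (1), minimised over other used channels."""
    n_i = uses.get(channel, 0)  # assumption: earlier uses of this channel
    others = [h for h, n in uses.items() if h != channel and n > 0]
    if not others:
        return fo * zd ** n_i  # no other channel can reduce the freshness
    return fo * min(
        relative_freshness(zd, n_i, uses[h], similarity(channel, h))
        for h in others
    )


def use_keyword(fc: float, fo: float, zd: float, channel: str,
                uses: Dict[str, int],
                similarity: Callable[[str, str], float]) -> float:
    """Apply Fc(n) = Fc(n-1) + 1 + Ft(Hi, n_i), capped at FC_MAX."""
    if fc < FC_MAX:
        fc = min(FC_MAX, fc + 1 + channel_reward(fo, zd, channel, uses, similarity))
    uses[channel] = uses.get(channel, 0) + 1  # record this use of the channel
    return fc


# Two similarity values taken from the example table; other pairs default to 0.
Q_TABLE = {frozenset(("H11", "H13")): 0.8, frozenset(("H11", "H15")): 0.8}


def similarity(a: str, b: str) -> float:
    return Q_TABLE.get(frozenset((a, b)), 0.0)


if __name__ == "__main__":
    uses: Dict[str, int] = {}
    fc = 0.0
    for ch in ("H11", "H15", "H12"):
        fc = use_keyword(fc, fo=20, zd=0.5, channel=ch, uses=uses, similarity=similarity)
        print(ch, fc)
```

With Fo = 20 and Zd = 0.5, the clipboard channel H15 earns a smaller reward than H11 did because it is 0.8 similar to the already used H11, while the dissimilar channel H12 still earns the full reward.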
The forgetting algorithm of the present invention is based on how users forget the keywords they use and follows the memory law revealed by the Ebbinghaus forgetting curve. The curve is divided into three segments, the immediate memory stage, the short-term memory stage, and the long-term memory stage, and each segment is simulated with an exponential function. In the comprehensive inverted index device, the word frequency Fc of an index term serves as the measure of memory strength. Let the proportion forgotten after a time T be Y (0<Y<1). The forgetting algorithm then states that after an elapsed time t the memory residual is
$$Fc = Fc_0 \cdot (1 - Y)^{t/T}$$
Y and T are linked: once one of the two parameters has been fixed, adjusting the other sets the forgetting speed.
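The exponential forgetting rule can be written as a one-line function. The sketch below is only illustrative; the function name and argument names are hypothetical.

```python
def memory_residual(fc0: float, y: float, t: float, period: float) -> float:
    """Return Fc = Fc0 * (1 - Y) ** (t / T) for elapsed time t.

    y is the proportion forgotten per period T; t and period share one time unit.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("Y must lie strictly between 0 and 1")
    if period <= 0 or t < 0:
        raise ValueError("T must be positive and t non-negative")
    return fc0 * (1.0 - y) ** (t / period)
```

The coupling between Y and T shows up directly: forgetting 50% every 2 time units and forgetting 75% every 4 time units describe the same decay, so only one of the two parameters needs tuning once the other is fixed.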
The main operations of the comprehensive inverted index device are the operations it performs automatically on the comprehensive inverted index (adding new index terms, word frequency statistics, word frequency forgetting, deleting index terms, updating the document code sequence, and updating the channel code set) and queries executed according to a given query statement. These operations are implemented as follows:
1. Adding a new index term: new keywords identified by the keyword recognition device, and new descriptors and comments entered through the descriptor and comment device, are recorded by calling the work log device, which at the same time triggers the comprehensive inverted index device to add a new index term. Index terms are memorized and forgotten in three stages: immediate memory, short-term memory, and long-term memory. A new keyword A is always added to the immediate memory buffer of the comprehensive inverted index, and its channel information is recorded as laid out in the index master table so that the word frequency channel reward algorithm can be applied. Entries here are sorted by word frequency Fc, and word frequency is reduced by a forgetting algorithm with a very fast forgetting speed, so a new word is retained only if it is used several times shortly after being memorized, and especially if it is used through different channels. Once its word frequency exceeds a threshold (for example 100), the word is moved to the short-term memory buffer. The purpose is to weed out erroneous text strings, produced for various reasons, that are not words. New descriptors and comments are produced by the user editing in the descriptor and comment device, so they are already established words by the time they are used, and they do not have as many usage channels as keywords. They therefore skip the immediate memory stage and go directly into the short-term memory buffer; their different channels are the different documents and the different parts of the same document in which they are used.
2. Word frequency statistics and word frequency forgetting: each use of an existing keyword, descriptor, or comment triggers the comprehensive inverted index device to find the corresponding index term in the index master table, update its word frequency, and check whether a forgetting refresh is due.
If the index term is in the long-term memory area, its word frequency is increased by 1, that is, Fc = Fc + 1, and Fc stays unchanged if Fc = 255. The reason is that index terms in the long-term memory area have all been used many times, so none of their usage channels retains any freshness and the word frequency reward of every channel is 0. Forgetting in the long-term memory area is refreshed once a day: when the system is started for the first time each day, or when more than one day has passed since the last refresh, the word frequency of every index term in the area is set to Fc = Int(Fc*(1-Yc)), where Int() truncates to an integer and Yc is on the order of 0.01. For the maximum word frequency Fc = 255, if the word is no longer used (its word frequency no longer increases) and forgetting follows the formula above with Yc = 0.01, Fc falls to 0 after about 170 days. If the system is not started for several consecutive days, no refresh is performed for those days; in other words, refreshing is driven by the relative time the equipment has been running. Of the index terms whose word frequency is 0, the long-term memory area keeps only a very small proportion of its total index terms (for example 3%). These index terms are ranked jointly by the number of documents they correspond to and by the time they were last used, with the oldest terms and the terms with the fewest documents at the end, where they are replaced by index terms newly entering the area. The purpose of forgetting here is to adapt to changes in the user's interests and personality over time, so that the words the user currently cares about most keep the highest word frequency.
If the index term is found in the short-term memory buffer, then Fc(n) = Fc(n-1) + 1 + Ft(Hi, n_i), with 0 ≤ Fc ≤ 255 and Fc(n) = Fc(n-1) when Fc(n-1) = 255. Ft(Hi, n_i) is computed by formula (1); the effect of the channel reward must be taken into account here. The forgetting refresh interval is shortened to one hour: each time a word frequency is increased, the device checks whether more than one hour has passed since the last refresh and, if so, applies Fc = Int(Fc*(1-Yd)), where Int() truncates to an integer and Yd is on the order of 0.01 and independent of Yc. An index term is transferred to the long-term memory area when its word frequency Fc exceeds a threshold (for example 150) and its part of speech has been established as an important personal name, place name, organization name, product name, or the like (as indicated by the channel code set in the index master table), or when the number of documents corresponding to the word exceeds a threshold (for example 100). Index terms whose word frequency Fc is 0 are handled in the same way as in the long-term memory area. The purpose of the short-term memory buffer is to accommodate the user's temporary interests that are not part of his or her lasting individuality.
If the index term is found in the immediate memory buffer, then Fc(n) = Fc(n-1) + 1 + Ft(Hi, n_i), with 0 ≤ Fc ≤ 255 and Fc(n) = Fc(n-1) when Fc(n-1) = 255. Ft(Hi, n_i) is computed by formula (1); the channel reward is especially important here. Forgetting in this buffer is computed by absolute time: each time an index term is used and its word frequency needs to be increased, the new memory residual ratio Rs = (1-Ys)^(t/T) is computed from the time t elapsed since the previous refresh (here T = 100 seconds and Ys is on the order of 0.01). The immediate memory buffer is refreshed only when Rs < 0.95, so that, for example, Fc = 20 becomes Fc*Rs = 19 and the refresh has a real effect. Because forgetting in this buffer is fast, the word frequency of all words generally falls to 0 after one day, but the index terms can remain in the buffer. To make later word frequency statistics easier, when a keyword's word frequency falls to 0 its channel usage set can also be cleared so that the channel reward algorithm takes effect again. Once a word's frequency exceeds a threshold (for example 100), the word is moved to the short-term memory buffer. The purpose of the immediate memory buffer is to filter out erroneous text strings, produced for various reasons, that are not words, since most errors are not repeated.
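The daily refresh of the long-term memory area is easy to simulate. The Python sketch below applies Fc = Int(Fc*(1-Yc)) repeatedly to an unused word and counts the refreshes until Fc reaches 0. The function names are hypothetical, and Yc is passed as an exact fraction (an implementation choice made here to avoid floating-point rounding at the truncation step).

```python
from fractions import Fraction


def daily_refresh(fc: int, yc: Fraction) -> int:
    """One long-term forgetting refresh: Fc = Int(Fc * (1 - Yc))."""
    return int(fc * (1 - yc))  # int() truncates toward zero, like Int()


def days_to_zero(fc: int, yc: Fraction) -> int:
    """Count daily refreshes until an unused word's frequency reaches 0."""
    if not 0 < yc < 1:
        raise ValueError("Yc must lie strictly between 0 and 1")
    days = 0
    while fc > 0:
        fc = daily_refresh(fc, yc)  # strictly decreases because of truncation
        days += 1
    return days


if __name__ == "__main__":
    print(days_to_zero(255, Fraction(1, 100)))
```

Starting from Fc = 255 with Yc = 0.01, this simulation needs 168 refreshes, which agrees with the figure of about 170 days given above. Truncation is what lets the value reach exactly 0; a purely exponential decay never would.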
3. Deleting index terms: when a new index term enters a buffer or memory area, the index entry ranked last by corresponding document count or by usage time is deleted. The total number of index entries in each area is governed by the zero-frequency entries. For example, with a standard of keeping zero-frequency entries at 3% of all entries, the comprehensive inverted index device automatically increases the number of index entries when zero-frequency entries fall below 3%, and automatically reduces the total number of entries and deletes the surplus zero-frequency entries when there are too many of them.
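The capacity rule can be sketched as a small decision function. This Python sketch rests on assumptions: the function name is hypothetical, and the upper bound `high` is an invented placeholder, because the text gives 3% as the lower standard but does not quantify "too many" zero-frequency entries.

```python
def capacity_action(total_entries: int, zero_entries: int,
                    low: float = 0.03, high: float = 0.10) -> str:
    """Decide how an index area should adjust its size.

    low  -- share of zero-frequency entries below which the area grows (3% in the text)
    high -- hypothetical share above which the area shrinks (not given in the text)
    """
    if total_entries <= 0:
        return "grow"  # an empty area has no spare slots for new words
    share = zero_entries / total_entries
    if share < low:
        return "grow"
    if share > high:
        return "shrink"  # surplus zero-frequency entries would be deleted
    return "keep"
```

For an area of 1000 entries, 10 zero-frequency entries would trigger growth, 50 would leave the size unchanged, and 300 would trigger shrinking under these placeholder thresholds.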
4, Document code-name sequence update operation: when a keyword is used through some channel, it is usually associated with a certain document; likewise, when a subject word or comment is selected, it is associated with a certain document. When the word-frequency statistics operation is carried out for this index term, it is therefore necessary to check whether the associated document is already in the document code-name sequence of the index term. If it is not, the attention degree of the document is looked up in the document attention table and the document's code name is inserted at the position corresponding to that attention value. Because the attention degree Gz of a document depends on the frequency Fw with which the document is used and on the word frequencies Fc_i of the important index terms associated with it, the attention degree of the document changes whenever the word frequency of any such index term changes, which in turn affects the ordering of the document code-name sequences. This update is quite expensive to compute, so it cannot be carried out in real time. In practice, small changes in the word frequency of a few index terms do not greatly change the attention degree of the related documents, so their effect on the document code-name sequences is not noticeable. The present invention therefore updates periodically: for example, the document code-name sequences are updated at the same time as the forgetting refresh of index-term word frequencies in the long-term memory storage area.
The attention degree Gz of a document depends on how the document is used: first on the number of times the document is used, i.e. its frequency; also on all the keywords, subject words and comments the user has used with the document; and of course on the score Fs that the user gives directly to the document's attention degree (Fs ranges from -100 to 100, default 0). It may also depend on other documents and auxiliary tools associated with the document. In the present invention, the usage frequency Fw of a document is limited to the range [0, 255) so that it can be compared with word frequencies. What the user cares about in a document is contained in the index terms used most often while working with it, so the set of important index terms is used to characterize the document. To keep comparisons between documents fair, every document takes the k most important index terms associated with it (k between 3 and 7), i.e. the k index terms with the highest word frequency, and averages their word frequencies; if a document has fewer than k index terms, the word frequency of the missing terms is taken as 0. The present invention therefore calculates the attention degree Gz of a document with the following formula:
$$Gz = Rw \cdot Fw + Rc \cdot \frac{1}{k} \sum_{i=1}^{k} Fc_i + Rs \cdot Fs$$
where Rw (0≤Rw≤1) is the weight of the document's usage frequency, Rc is the weight of the mean word frequency of the important index terms, and Rs is the weight of the user's subjective score for the document's attention degree, with Rw+Rc+Rs=1. When Rw=1, only the frequency with which the document itself is used serves as the measure of its attention degree. Typical values are Rw=0.3, Rc=0.5, Rs=0.2; different weights can be chosen for different users.
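The attention formula can be sketched in a few lines of Python. The function name, the argument names and the default k=5 are assumptions chosen for this example (the text only requires k between 3 and 7); the default weights are the typical values given above.

```python
def attention_degree(fw, index_term_freqs, fs, k=5, rw=0.3, rc=0.5, rs=0.2):
    """Gz = Rw*Fw + Rc*(1/k)*sum of the k largest Fc_i + Rs*Fs."""
    if abs(rw + rc + rs - 1.0) > 1e-9:
        raise ValueError("weights must satisfy Rw + Rc + Rs = 1")
    # Keep the k index terms with the highest word frequency.
    top = sorted(index_term_freqs, reverse=True)[:k]
    # Dividing by k (not len(top)) treats missing terms as word frequency 0.
    mean_fc = sum(top) / k
    return rw * fw + rc * mean_fc + rs * fs
```

For instance, a document used with frequency 100, two associated index terms of word frequency 200 and 100, no subjective score and k=4 gets Gz = 0.3*100 + 0.5*75 = 67.5.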
5, Channel code-name set update operation: when a keyword is used through some channel, the use is recorded in the work log device, and at the same time the new-word addition, word-frequency statistics or word-frequency forgetting operation of the comprehensive inverted index device is triggered. At this point the code name of the channel used is added directly to the channel code-name set in the index master table. The elements of this set are pairs of the form (channel code name, number of uses): when a channel code name is not yet in the set, it is added with its number of uses set to 1; when the code name is already in the set, the corresponding number of uses is increased by 1. When the instant memory buffer performs a word-frequency forgetting operation and the word frequency of an index term becomes 0, the channel code-name set must be cleared, which is done by setting all the use counts in the set to 0.
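A minimal sketch of the channel code-name set, assuming a plain dictionary as the storage for the (channel code name, number of uses) pairs. The class and method names, and the channel names used in the example, are hypothetical.

```python
class ChannelSet:
    """Set of (channel code name, number of uses) pairs for one index term."""

    def __init__(self):
        self.uses = {}  # channel code name -> number of uses

    def record_use(self, channel):
        # A new channel starts at 1; a known channel is incremented by 1.
        self.uses[channel] = self.uses.get(channel, 0) + 1

    def reset(self):
        # "Clearing" the set as described: every use count goes back to 0.
        for channel in self.uses:
            self.uses[channel] = 0

    def active_channels(self):
        # Number of channels with at least one use since the last reset.
        return sum(1 for count in self.uses.values() if count > 0)
```

Usage: calling `record_use("screen")` twice and `record_use("clipboard")` once leaves `uses == {"screen": 2, "clipboard": 1}`; after `reset()` both counts are 0 and `active_channels()` returns 0.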
6, Query statement query operation: a query statement is a logical expression that combines several index terms with the logical operators "and" and "or". The result of a query statement is a document code-name sequence, obtained by combining the document code-name sequences of the individual index terms with the "and" and "or" operations for document code-name sequences, following the positions of the "and" and "or" operators in the query statement. The "and" and "or" operations between document code-name sequences therefore need to be defined. For convenience, the symbol "*" stands for "and" and the symbol "+" for "or".
Definition 1: Let the word frequency of index term S be Fc and the corresponding document code-name sequence found for it be L. The operational attention degree of a document Wi in L is then Gzyi=Fc*Gzi, where Gzi is the attention degree of document Wi, and Ly is the operational document code-name sequence sorted by operational attention degree.
Definition 2: Let the operational document code-name sequence of index term S1 be L1 and that of index term S2 be L2. Then L1+L2 is the union of the set of all documents W1i in L1 and the set of all documents W2j in L2, as a document code-name sequence sorted by operational attention degree. A document W that appears in both sequences takes the larger of its two operational attention degrees, i.e. Max(Gzy1i, Gzy2i), and the operational attention degrees of the other documents are unchanged.
Definition 3: Let the operational document code-name sequence of index term S1 be L1 and that of index term S2 be L2. Then L1*L2 is the union of the set of all documents W1i in L1 and the set of all documents W2j in L2, as a document code-name sequence sorted by operational attention degree. A document W that appears in both sequences takes the sum of its two operational attention degrees, i.e. Gzy1i+Gzy2i, and the operational attention degrees of the other documents are unchanged.
By these definitions, L1*L2 and L1+L2 are again operational document code-name sequences, so they can be combined further with other operational document code-name sequences L by "and" and "or" operations. The purpose of the "or" operation is to widen the search range; the purpose of the "and" operation is to rank documents that can be found by both index term S1 and index term S2 nearer the front. The above describes the implementation of some basic operations of the comprehensive inverted index device; other useful operations can be added.
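Definitions 1 to 3 can be sketched as follows. Representing a sequence as a dictionary from document code name to operational attention degree, and the function names, are assumptions of this example. Note that under Definition 3 the "and" operation keeps every document from both sequences and only boosts the shared ones.

```python
def operational_sequence(fc, doc_attention):
    """Definition 1: Gzy_i = Fc * Gz_i for each document of one index term."""
    return {doc: fc * gz for doc, gz in doc_attention.items()}


def _merge(l1, l2, combine):
    merged = dict(l1)
    for doc, gzy in l2.items():
        # Documents in both sequences are combined; the others are unchanged.
        merged[doc] = combine(merged[doc], gzy) if doc in merged else gzy
    return merged


def seq_or(l1, l2):
    """Definition 2 (L1 + L2): shared documents take the larger value."""
    return _merge(l1, l2, max)


def seq_and(l1, l2):
    """Definition 3 (L1 * L2): shared documents take the sum of both values."""
    return _merge(l1, l2, lambda a, b: a + b)


def ranked(seq):
    """Document code names sorted by descending operational attention degree."""
    return sorted(seq, key=lambda doc: (-seq[doc], doc))


l1 = operational_sequence(10, {"A": 3, "B": 1})   # A: 30, B: 10
l2 = operational_sequence(5, {"B": 5, "C": 4})    # B: 25, C: 20
print(ranked(seq_or(l1, l2)))    # ['A', 'B', 'C']
print(ranked(seq_and(l1, l2)))   # ['B', 'A', 'C']
```

In the example, document B is found by both index terms, so the "and" operation moves it ahead of A, while the "or" operation only widens the result set.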
Hyperlinking is a basic technique of document organization. So that hyperlinks can also be added to documents without editing or modifying them, a soft hyperlink technique needs to be developed; that is, a hyperlink is started by consulting an operation log that records a keyword at a certain position in a certain document being linked to certain specific information. The soft hyperlink device of the present invention is centered on the document. From the records of the work log device, it takes the keywords grabbed while the user was using the document and their positions, the auxiliary tools called by those keywords and the information they returned, the auxiliary documents found by calling a search engine, the subject-word and comment marks made at attention positions and their contexts, and the added remarks and comments; it arranges these records by time and takes the Nr most recent and most frequently used records as the soft hyperlinks of the document. When the document is opened and used again, these soft hyperlinks are opened automatically (coordinated according to rules set by the user, such as priority for important keywords, unique keywords and local documents, or high-frequency priority) to bring up auxiliary information, which is displayed cooperatively on the screen, restoring the working state of the user's last few uses of the document. From the records of the work log device it is also possible to determine the main ways in which the document is called or opened, for example whether it was opened from the document operation device, found and opened through some index term, opened from which documents through soft hyperlinks, or even opened from some document through a hard hyperlink, so that a degree of backtracking is possible for the document.
The document operation device is equivalent to the home page of the system. It displays the titles and brief abstracts of the documents the user uses most often, the documents used most recently, and new documents likely to interest the user, classified automatically, under the user's guidance, by subject words and important keywords. Comments serve only as auxiliary classification information. The device also has an interface for entering query statements, and lists the most important index terms and the index terms just used as candidates. The device automatically collects new documents that contain many important index terms from the buffer of documents the user has searched, from newly arrived e-mail and from the shared information area of the internal LAN, and offers them to the user. Within this device, the user's opening of documents, adjustment of classification columns, and addition or deletion of documents in a column are all recorded by the work log device, and the automatic classification part of the document operation device learns from them to improve later classification accuracy. For example, suppose the user has marked a paragraph of a long document with a subject word. The device may place the document in the category of that subject word, but the user may think the document as a whole does not belong there and move it to the category of another subject word. The work log device then records this process and at the same time associates the new subject word with the document as a whole, as an association of higher importance. In this device, operations that the user performs on a document without opening it, and the related subject words, comments and keywords, apply to the document as a whole.
The system configuration device is used by the user to set the operating parameters of the whole system and to enter the user's personal information. All operating parameters have default values, which the user can change to suit his or her own personality and preferences. The personal information includes name, sex, age, education, specialty, occupation, hobbies, friends, address, telephone number, etc.
Three, screen word-taking device:
The main function of the screen word-taking device is to determine, from a particular trajectory that the user traces with the mouse, the start and end positions of a string of text and symbols that is displayed on the screen and that the user can see and has consciously indicated, and then to extract this string as a keyword. The text here includes mixed text of various languages, and the symbols include digits. The conventional uses of the mouse (one left click to position the cursor, holding the left button while moving to select an object, a double left click to execute, one right click to call up shortcut functions, etc.) are already taken by the main program. So as not to disturb the operation of the main program, the present invention, which runs alongside it, uses the mouse trajectory recognition device, the attention position recognition device and the screen word-taking device to implement a new mode of mouse interaction. The screen word-taking device takes the screen range passed on by the mouse trajectory recognition device, extracts the character codes for that range from the display buffer of the main window, and assembles them into a text string as the keyword.
The content is passed to the system, and the auxiliary window called up as a result appears, only when the user has really focused attention on a word or character, not when the cursor merely happens to pass over it. This requires a mouse trajectory with a special meaning: a combination of special strokes that the user's unconscious mouse movements are unlikely to match by chance. It can be realized as follows. The user rests the mouse briefly (for example 0.05-0.2 seconds) at the start of a keyword and then paces around the keyword, i.e. moves left and right over the word (when the text is horizontal), moves up and down (when the text is vertical), or circles the word, pausing briefly at the end position of the keyword and then stopping where the pacing or circling began. At that point the user's focus is taken to be clearly expressed, and the user has also selected the keyword of interest.
Recognition of the focus can be graded, set manually, or adjusted intelligently according to how often the user uses the system. For a user who uses the system frequently, a single dwell of the cursor is enough; for an ordinary user, one pass of pacing or one circle; for a user who seldom uses the system, two passes of pacing or two circles. Circling lets the user select an object exactly (mainly keywords in documents; in graphics software such as GIS and CAD, drawn objects can be selected). The user can consciously fix the exact start and end positions of the keyword and pass this position information to the mouse trajectory recognition device with a few distinctive mouse trajectories. A keyword within a single line is selected by pacing or circling. For a keyword displayed over several lines, the mouse rests briefly (for example 0.05-0.1 seconds) below the text at the start position of the keyword and then traces a particular stroke (for example moving above the text, resting briefly, and returning below to rest again), and then traces the same stroke after a brief rest at the end position of the keyword.
When the keyword grabbed from the screen is given exactly, the keyword recognition device only has to judge whether the keyword is a new word; if it is not a new word, it supplies the part of speech and category of the keyword and then calls the keyword analysis device for the next step. For the user's convenience, however, keywords that have been grabbed and used before and are unlikely to be ambiguous, or new words that are easily judged by rule (names, addresses, times, etc.), can be selected with a fuzzy screen-range technique (for example resting briefly anywhere on the keyword and then drawing one circle without stopping). In that case the keyword recognition device must be called to recognize the keyword before the next step can proceed.
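One small part of the "pacing" gesture described in this section can be sketched as counting direction reversals in the horizontal mouse coordinates while ignoring small jitter. This is only an illustration under stated assumptions: the function name, the pixel threshold and the idea of using reversal counts are not taken from the patent, and a real recognizer would also have to check dwell times and the position of the keyword.

```python
def count_reversals(xs, min_travel=5.0):
    """Count left/right direction reversals in a list of x coordinates.

    Movements shorter than min_travel pixels against the current direction
    are treated as jitter and ignored (the threshold is an assumed value).
    """
    if not xs:
        return 0
    reversals = 0
    direction = 0          # 0 = not yet moving, +1 = right, -1 = left
    anchor = xs[0]         # furthest point reached in the current stroke
    for x in xs[1:]:
        delta = x - anchor
        if direction == 0:
            if abs(delta) >= min_travel:
                direction = 1 if delta > 0 else -1
                anchor = x
        elif delta * direction > 0:
            anchor = x                     # still moving the same way
        elif abs(delta) >= min_travel:
            reversals += 1                 # a real change of direction
            direction = -direction
            anchor = x
    return reversals
```

A trajectory that sweeps right, back left and right again over a word (x going 0 to 30, back to 0, then to 30) yields two reversals, whereas a single sweep with a few pixels of jitter yields none. How many reversals count as "one pass" for a given class of user would be a configuration choice.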
Four, clipboard word-taking device:
The clipboard word-taking device monitors the user's copy, paste, move and even delete operations on the clipboard provided by the operating system, and watches the clipboard contents. If the content is a text string, the string is taken out, and the accompanying mouse trajectory is used to judge whether it is a keyword the user selected exactly; if so, it is labeled with the channel code name for exact word-taking. If there is no mouse trajectory, or the wrong one, when content is cut or pasted, it is labeled with the channel code name for fuzzy word-taking. This information is then passed to the keyword recognition device for word segmentation, recognition and extraction of keyword attributes. If the string is a keyword, it is passed on to the keyword analysis device for analysis, and a suitable auxiliary tool is then called through the auxiliary tool library interface device to serve the user. In particular, when the user pastes the clipboard contents directly into a keyword-driven auxiliary tool or a search engine, the clipboard word-taking device stores the clipboard content as the keyword, together with the auxiliary tool it was pasted into, directly in the work log device of the information organization device.
Five, input-process word-taking device:
The input-process word-taking device monitors the text and symbol strings the user enters; the keyword recognition device automatically analyzes and judges whether an entered string is a keyword, and if so the string is taken out as a keyword. Input here includes keyboard, voice and handwriting input. The device is also integrated with the intelligent Chinese character input module of the operating system, so that it can be activated, and interact with the user, through the same hot key that activates Chinese input (such as Ctrl+Space), and can use simplified input to carry out word segmentation. Taking the intelligent pinyin Chinese input method as an example, for common basic words, words of three or more characters can be entered in simplified form by initials, i.e. by typing only the initial consonant of each character; for the user's particular keywords, the method can also be used for two-character words. The intelligent pinyin input module should also be able to record several user-specified multi-character keywords for one pinyin string, rather than recording only one keyword with the aim of eliminating duplicate codes. Technical terms and abbreviations among the keywords are recorded and stored with their definitions. A definition can be entered directly by the user, selected from a more complete definition library inside the enterprise, or extracted automatically from web pages found on the Internet and then edited and approved by the user.
If the user uses a keyword from several tools, that keyword is very important to this user, and the scoring formula for keywords should reflect this. For example, a keyword is grabbed from the screen and used to search web pages on the Internet; several pages are clicked; the user stays a long time on one of them, which shows the user is reading it, or saves the page, or generates an abstract with the automatic abstract tool, or extracts the definition of some keyword on the page into the definition library with the definition generation tool. New keywords recorded in the comprehensive inverted index device in this way are treated by the intelligent pinyin input module as proper words or phrases the user has already used, and next time can be entered directly by pinyin. If a character in a proper word has more than one pronunciation, all possible pinyin combinations are remembered, and the correct pronunciation is fixed when the user enters the characters again. The intelligent Chinese input module integrated with the input-process word-taking device needs some modification so that the user, while entering Chinese, can consciously type a specific keyword delimiter (such as "space", "'" or another special key) to fix a keyword exactly. The keyword recognition device, following certain rules, identifies new words in the Chinese input (i.e. words not in the comprehensive inverted index device of the information organization device). A new word is fed back to the input-process word-taking device so that the user can confirm it or enter supplementary information; a user who does not wish to be disturbed need only go on entering Chinese or continue working in the main window.
Six, keyword recognition device:
The main function of the keyword recognition device is to identify keywords by automatic Chinese word segmentation, then look up the attributes and category of each word in the comprehensive inverted index device, call the work log device to record the word and the channel that produced it, and trigger the comprehensive inverted index device to carry out word-frequency statistics. If the keyword is a new word, it triggers the comprehensive inverted index device to add a new index term. Existing word segmentation algorithms fall into three broad classes: methods based on string matching, methods based on understanding, and methods based on statistics.
1, Segmentation based on string matching, also called mechanical segmentation, matches the Chinese character string under analysis against the entries of a "sufficiently large" machine dictionary according to some strategy; if a string is found in the dictionary, the match succeeds (a word is identified). By scanning direction, string-matching methods divide into forward and reverse matching; by which length is matched first, into maximum (longest) and minimum (shortest) matching; and by whether segmentation is combined with part-of-speech tagging, into pure segmentation methods and integrated methods that combine segmentation with tagging. Commonly used mechanical methods include forward maximum matching (left to right), reverse maximum matching (right to left) and minimum segmentation (minimizing the number of words cut from each sentence). These string-matching methods are the basic methods of automatic word segmentation in the keyword recognition device.
2, Segmentation based on statistics. Formally, a word is a stable combination of characters, so the more often adjacent characters occur together in context, the more likely they are to form a word. The frequency or probability with which characters co-occur adjacently therefore reflects quite well how credible they are as a word. The frequencies of adjacent character combinations in a corpus can be counted and their mutual information computed; when it exceeds a certain threshold, the character group can be considered a possible word. The keyword recognition device uses statistical methods to identify some new words and to resolve some ambiguities.
3, Segmentation based on understanding achieves word recognition by having the computer simulate a person's understanding of the sentence. Because Chinese linguistic knowledge is broad and complex, it is hard to organize the various kinds of linguistic information into a form a machine can read directly, so understanding-based segmentation systems are still at the experimental stage. The keyword recognition device may adopt some verified, reliable methods of this kind as a supplement for ambiguity resolution.
Automatic segmentation by computer always has some defects, so more use should be made of the keywords captured exactly during Chinese input and screen word-taking, tagged with part of speech and word category and then applied in other auxiliary tools. As a keyword is used in more auxiliary tools, its annotations become richer and the meaning of the new word gradually becomes clear. For example, when the name "Li Changjiang" is entered, the intelligent processing of the "intelligent pinyin Chinese input method" lets the user assemble the name from candidate characters and words; a possible combination is "Li" + "Changjiang" (the Yangtze River). To let the system know that "Li Changjiang" is a proper name, the intelligent pinyin input method can be modified so that the syllable separator "'" doubles as the delimiter for exact word-taking; "Li Changjiang" can then be given unambiguously as "'" + "Li" + "Chang" + "Jiang" + "'". When this "Li Changjiang" is used to call the address book, the keyword is marked as the name of an important contact of the user and can enter the long-term memory storage area of the comprehensive inverted index device, where it is not easily eliminated. When the user later uses "Li Changjiang" as a keyword for a full-text search and finds the desired document, its importance increases a further step. Using the same keyword once each in two different auxiliary tools may give the word much greater importance than using it twice in one tool, and may be equivalent to using it 20 times in the same tool; this is the essence of the word-frequency channel bonus algorithm of the present invention. The exact word-taking provided by the screen word-taking device and the input-process word-taking device supplies training examples for the intelligent automatic segmentation of the keyword recognition device.
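Forward maximum matching, the basic mechanical method named in this section, can be sketched as below. The function name and the tiny dictionaries are assumptions for illustration, and the Chinese strings are the presumed originals of the "Li Changjiang" example (李长江, with 长江 meaning the Yangtze River). The sketch shows why an exactly captured keyword helps: until the full name is in the dictionary, the longest-match rule splits it.

```python
def forward_max_match(text, dictionary):
    """Segment text left to right, always taking the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character
    tokens.
    """
    max_len = max(1, max((len(word) for word in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


print(forward_max_match("李长江", {"长江"}))            # ['李', '长江']
print(forward_max_match("李长江", {"长江", "李长江"}))  # ['李长江']
```

Once the user has marked the name as a keyword and it has been added as an index term, the same routine returns it as a single word.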
Seven, keyword semantic device:
The keyword semantic device supplies the semantics of the keywords identified by the keyword recognition device, records them in the work log device of the information organization device and shows them on the screen, or passes the keyword and its semantics to the keyword analysis device for further processing. It is a semi-automatic tool: in interaction with the main program of the main document it extracts semantic information automatically and helps the user mark the semantics of a keyword within the document. If the keyword is a person's name, for example, it must be determined whether the name is that of the author, translator or reviewer of the document, or of a main or supporting character described in it. This differs from the way the address book handles names: the address book is concerned with the relation between the name and the user, whereas here the concern is the relation between the name and the document. If the keyword is an address, its semantics may be its relation to the document: the address of the author, of the publisher, of a main character in the document, or a business address, etc. If the keyword is a number, its semantics may be the price of the document, the author's telephone number, the document name, the page number of the document, etc. The device sets up a semantic system for describing and classifying documents, similar in architecture to the subject words and comments. It also provides definitions of keywords; a definition can be entered directly by the user, selected from a more complete definition library inside the enterprise, or extracted automatically from web pages found on the Internet and then edited and approved by the user. Definition and semantics are related: a definition is general, while semantics is specific. The definition of "telephone number", for instance, is that it is used for making telephone calls, whereas its semantics is that this "telephone number" is the telephone number of a particular person. The semantics of a keyword is very valuable in search. When a person's name is the keyword, stating whether it is an author or a main character makes a great difference to the precision of the search: searching for "Einstein" as author and as main character gives entirely different results. In effect the keyword semantic device performs a degree of semi-structuring on plain text. If the main document is itself semi-structured, such as an XML document, a web page or a Word document with document formatting, the keyword semantic device simply extracts the structured information of the document and supplements and integrates it with the device's own structural information.
Eight, keyword analysis device:
The function of the keyword analysis device is to call different tools or search engines to serve the user, through the auxiliary tool library interface device and the search engine interface device, according to the attributes and category of the keyword supplied by the keyword recognition device and the semantics supplied by the keyword semantic device. For a grabbed keyword there may be several auxiliary tools and search engines available; in that case a preferred tool can be recommended according to the user's habits, with the other tools listed for the user to choose from. When taking words from the screen by circling, the user usually marks out the extent of the keyword, i.e. performs segmentation by hand. If what is marked out is a new word, the work log device records it, and part-of-speech and category information is added to the keyword according to the tools the user applies to it. For example, if the user looks up an English word in a Chinese-English dictionary, the Chinese part of speech of the keyword can be determined from the part of speech of the English word chosen; if the user links it to the address book, the keyword is probably a person's name or a place name; if the user searches for it, it is a keyword of a category that matters to the user; if the user looks up an address on a map with it, it is an address or the name of an organization. If it is a number, it may be a telephone number, an identity card number, a product code, etc., in which case what the user wants is information on the subject the number represents; or it may be a pure number, in which case the user probably wants to do a calculation, and the keyword analysis device calls the calculator.
A keyword passed on by the keyword analysis device may call a search engine. In that case the search engine interface device processes the keyword in a personalized and intelligent way, searching first in the comprehensive inverted index device and then in external search engines; the search results are filtered and selected intelligently before being returned to the user. A keyword passed on by the keyword analysis device may instead call an auxiliary tool. In that case the auxiliary tool library interface device handles the calling and display of all tools in the auxiliary tool library in a unified way, preventing conflicts between mouse-trajectory call operations and screen display.
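The recommendation step of the keyword analysis device (one preferred tool, the rest offered as alternatives) might look like the sketch below. The category names, tool names and the use of per-tool usage counts as a stand-in for "user habits" are all hypothetical; the patent does not specify them.

```python
# Hypothetical mapping from keyword category to candidate tools, in default order.
TOOLS_BY_CATEGORY = {
    "person_name": ["address_book", "search_engine"],
    "address": ["map", "search_engine"],
    "english_word": ["dictionary", "search_engine"],
    "number": ["calculator", "search_engine"],
}


def recommend_tools(category, usage_counts):
    """Return (preferred_tool, other_tools) for a keyword category.

    usage_counts maps tool name to how often this user has chosen it; ties
    keep the default order because sorted() is stable.
    """
    candidates = TOOLS_BY_CATEGORY.get(category, ["search_engine"])
    ordered = sorted(candidates, key=lambda tool: -usage_counts.get(tool, 0))
    return ordered[0], ordered[1:]
```

With no usage history, `recommend_tools("number", {})` returns `("calculator", ["search_engine"])`; for a user who usually searches numbers, `recommend_tools("number", {"search_engine": 3})` returns `("search_engine", ["calculator"])`.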
Nine, search engine interface device:
The search engine interface device uses the word-frequency characteristics held in the comprehensive inverted index device, together with the sense of the keyword that the user most commonly uses in the index, to transform the search terms with synonyms, near-synonyms, hypernyms, hyponyms and associated keywords. It then passes them to an external search engine, which improves the quality of information search. The search engine interface device is, in effect, a general search engine with personalized and intelligent search tools added on top. Improving search quality usually requires the help of personalized and intelligent search techniques. Personalization means capturing, as far as possible, the characteristics of how the user uses the computer: the features of the user's interaction with the computer, the characteristics of the information the user cares about, and the user's different needs on different occasions (for example, at work versus at leisure). These characteristics can distinguish different users, the same user on different occasions, and furthermore the same user handling different tasks, so personalization has a broader meaning in the search engine interface device of the present invention. Qualifying the keyword according to the user's current state is an effective way to improve query accuracy. For example, the subject under which the user is using the keyword can be added as a descriptor; other relevant keywords from the main document the user is currently working on can be added to the query; and the user's comments on the main document, which reveal the writing style and types the user likes, can also be used as constraints. The importance ranking of a general search engine does not consider the needs of a specific user, so it is also important to re-rank the retrieved entries according to the user's personal characteristics, to order web pages dynamically, and to filter them by a dynamic ordering of the user's degree of interest. During filtering, classifying the retrieved documents by subject, comment, semantics and so on greatly helps the user to select search results. The document finally selected by the user is displayed in the main window of a standard Web browser. It may be an auxiliary document of an existing main document, and as the user's attention to it grows it may become another main document.
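The two personalization steps described above, qualifying the query and re-ranking the results, can be sketched in Python as follows. The function names, the result dictionary layout (`topics`, `engine_rank`) and the `topic_interest` table are assumptions made for illustration; the patent does not define them.

```python
def expand_query(keyword, subject_terms, context_keywords, limit=3):
    """Append up to `limit` qualifying terms to the keyword, without duplicates."""
    terms = [keyword]
    for term in list(subject_terms) + list(context_keywords):
        if term not in terms:
            terms.append(term)
    return " ".join(terms[: 1 + limit])


def rerank(results, topic_interest):
    """Order results by the user's interest in their topics.

    Ties fall back to the external engine's own rank (lower is better).
    """
    def interest(result):
        return sum(topic_interest.get(topic, 0) for topic in result["topics"])

    return sorted(results, key=lambda r: (-interest(r), r["engine_rank"]))
```

Keeping the external engine's rank as the tie-breaker means that results on topics the user has never rated stay in their original order.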
Ten, auxiliary tool library interface device:
The auxiliary tool library interface device is an integrated interface to various external auxiliary tools. These tools are classified by how they use keywords, so that they belong to different "channels"; once they are correctly called, this classification matters for the "new channel reward" word-frequency statistics of keywords. The auxiliary tool library includes one or more of the following, without being limited to them: Chinese-foreign language dictionaries, Chinese dictionaries and encyclopedias, address books, yellow pages, calculators, maps, film clips, music clips, introductions to famous people, and links to documents with related content. (Documents obtained by intelligent local and Web searches on keywords or sentences captured from the main document can be linked to it with hyperlinks; if the main document is read-only, the work log device records the event and a soft hyperlink is established.) The tools are displayed mainly in auxiliary windows and need the focus position information, the keyword and its context. Some auxiliary tools can only be displayed in a main window under their own control. The screen collaborative display device cannot make such tools act in concert with the main window of the main document, and once the main window of the main document is activated, the main window of such a tool is placed in the background. Any auxiliary tool from any software developer that conforms to certain standards and protocols can be registered as part of the auxiliary tool library. It is then managed uniformly by the auxiliary tool library interface device, shares the functions provided by the mouse track recognition device, the screen word capture device and the screen collaborative display device, and can combine them with its own rules and inference engine to provide further intelligent information processing for the user. The auxiliary tool library interface device stores the name, function, category and display properties of each auxiliary tool. After a tool is called with a particular keyword, the category of the tool is assigned to the keyword: for example, the address book gives the keyword the category of a person's name, the map gives it the category of an address, and the encyclopedia gives it a semantic definition. After being organized by the auxiliary tool library interface device, this information is stored in the work log device as important information for maintaining the comprehensive inverted index device.
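As a rough illustration of the registration and category assignment described above, the following Python sketch keeps a registry of tools and logs the category that each call gives to a keyword. The class names, field names and the in-memory `work_log` list are hypothetical stand-ins for the auxiliary tool library interface device and the work log device.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    function: str
    category: str  # category given to any keyword that calls this tool
    display: str = "auxiliary_window"


@dataclass
class ToolRegistry:
    tools: dict = field(default_factory=dict)
    work_log: list = field(default_factory=list)  # stand-in for the work log device

    def register(self, tool: Tool) -> None:
        """Add a tool to the library under its name."""
        self.tools[tool.name] = tool

    def call(self, tool_name: str, keyword: str) -> dict:
        """Record that `keyword` called the tool and inherited its category."""
        tool = self.tools[tool_name]
        entry = {"keyword": keyword, "tool": tool.name, "category": tool.category}
        self.work_log.append(entry)
        return entry
```

A keyword looked up in an address book registered with category `"person_name"` would be logged with that category, which is the information later used to maintain the index.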
11, mouse track recognition device:
The mouse track recognition device identifies certain specific mouse tracks that the user deliberately draws by moving the mouse. According to the characteristics of the identified track, it determines whether the user is selecting a keyword or content, fixing a position in the document, or issuing an operation command, and then calls the screen word capture device, the focus position identification device or the operation control recognition device respectively for the next step of processing.
12, focus position identification device:
The focus position identification device extracts the information and characteristics of a position in the main document that the user is paying attention to. Position information is of four kinds: screen position, window position, document page position and document content position. The screen position is the absolute position on the screen and does not change with the displayed content. The window position is expressed in coordinates relative to the main window; moving or resizing the window may change it. The document page position is the coordinate relative to the page background of the document in the main window, and it changes accordingly when the page is scrolled or the display scale changes. The document content position is the coordinate relative to the text of the document, and it changes accordingly when the text scrolls, the scale changes or the font changes. When the document cannot be edited, as with a PDF document, the document page position and the document content position are equivalent. Some tracks identified by the mouse track recognition device are not meant to capture a keyword but a sentence or paragraph that should not be treated as a keyword. The user may wish to comment on or evaluate this content, express a reaction to it, or mark it with descriptors, or may simply want to fix a position in the document. In these cases the focus position identification device handles the position information and determines the context, which defaults to the natural paragraph containing the position unless the user explicitly changes it to the current page, section, chapter or the full text.
13, operation control recognition device:
The operation control recognition device looks up the function or instruction that corresponds to the track code obtained when the mouse track recognition device identifies a track. It calls the function or issues the instruction according to the running environment and the relevant parameters, and cancels the function call or instruction when the situation is unsuitable. For example, on receiving the track code that calls the descriptor and comment device, the operation control recognition device calls that device only if it has also received the position parameter supplied by the focus position identification device. Because the position passed by the focus position identification device carries no further information to analyze, and analyzing the document context at that position is too difficult, the descriptor and comment device and the remarks and comment device are generally called by the user's direct selection through a mouse track, with no intelligent analysis comparable to that of the keyword analysis device.
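The lookup-and-cancel behaviour described above can be sketched in Python as a small dispatch table. The track codes, the handler table layout and the use of `None` to signal a cancelled call are assumptions made for illustration only.

```python
def make_controller(handlers):
    """Build a dispatcher from a table of track codes.

    handlers maps a track code to (function, needs_position).
    """
    def handle(track_code, position=None):
        if track_code not in handlers:
            return None  # unknown track: nothing to call
        function, needs_position = handlers[track_code]
        if needs_position and position is None:
            return None  # unsuitable situation: cancel the call
        return function(position) if needs_position else function()

    return handle
```

A handler that needs a focus position, such as the one that opens the descriptor and comment window, is only invoked when a position has been supplied; otherwise the call is dropped.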
14, descriptor and comment device:
The descriptor and comment device pops up a window that shows descriptors (subject terms) or comments (short evaluative phrases) in a tree structure for the user to choose from. The selected descriptors and comments are listed in a selection box, where the user can reorder them or delete individual entries. After the user confirms, the descriptors and comments are displayed in association with the context or the selected paragraph at the corresponding position of the main document, and the work log device records the time, the main document, the focus position, and its association with the selected descriptors and comments. The user may want some descriptors and comments to be associated with the document but not displayed; a display flag can be set to treat these differently. Sometimes the user prefers to mark the content at the focus position with an icon or symbol rather than a textual comment, for example a single underline in the auxiliary window for "note", a double underline for "very important", or a cross drawn over a sentence or paragraph for "sheer nonsense". The descriptor and comment device maps these icons and symbols one-to-one or many-to-one onto textual comments, which are used for internal processing and storage, while the icons and symbols are used for display. The descriptor and comment device also lets the user directly give a combined score for the importance, reference value and likeability of the document, that is, the attention score (Fs) of the document. Fs ranges from -100 to 100 with a default of 0: -100 means the document is unimportant, has no reference value and is disliked; 0 means average; 100 means it is very important, has great reference value and is well liked. Comments with a strong emotional tone amount to a subjective evaluation of the document by the user: a comment such as "extraordinary argument" shows that the user likes the article, "extremely valuable reference" shows that it is an important document, and "rubbish" shows that the document is meaningless. Some descriptors indicate a very high degree of user interest; for a certain user the "football" subject might score 100. A score can be assigned to descriptors and comments of this kind, and using one then revises the user's direct score. The user's view can change: a news item the user wants to read may have an importance of 100 before it is read, but 20 or 0 afterwards, and if it turns out to be a false report its importance may even drop to -100. The attention score of a document is therefore determined by the latest direct score or by the correction from the latest descriptors and comments. The descriptor and comment device also provides entry and maintenance functions for descriptors and comments and can express the conceptual hierarchy among descriptors, including hypernym and hyponym relations; this function is implemented with OWL, the ontology language. Descriptors and comments reveal the user's views, understanding and feelings, and are an important basis for organizing and managing documents in a personalized way. Because descriptors and comments often do not appear in the document itself, they are useful information that keywords cannot replace.
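A minimal Python sketch of the attention score Fs follows: it defaults to 0, is limited to the range -100 to 100, and is overwritten by whichever came last, a direct score or a scored descriptor or comment. The class name and the score table are hypothetical; only the "football" value of 100 comes from the text, and the -100 for "rubbish" is an assumed value.

```python
# Hypothetical score table for scored descriptors and comments.
SCORED_TERMS = {"football": 100, "rubbish": -100}


class AttentionScore:
    """Attention score Fs of one document."""

    def __init__(self):
        self.value = 0  # default Fs

    def set_direct(self, score):
        """Store the user's direct score, limited to -100..100."""
        self.value = max(-100, min(100, int(score)))

    def apply_term(self, term):
        """Let a scored descriptor or comment revise the score; others are ignored."""
        if term in SCORED_TERMS:
            self.set_direct(SCORED_TERMS[term])
```

Because each call simply replaces the stored value, the most recent direct score or scored term always determines Fs.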
15, remarks and comment device:
The remarks and comment device opens a text editor in which the user enters text to add remarks that supplement the context of the focus position in the document, to comment on it, or both. The text is displayed in an auxiliary window, in page-position mode, in a blank area of the page in the main document. It may partly float over the document text, and it may extend beyond the page or the window, provided that it stays within the screen and does not hinder reading. This display mode aims to reproduce the effect of a reader writing remarks and comments on paper. Remarks can be regarded as a supplement to the document, and the keywords that appear in them add to the description of the main document. A commentary is a more detailed form of comment; the descriptors and comments that appear in it can serve as an evaluation of the main document, and after automatic extraction they are passed to the descriptor and comment device and the work log device is called to record them.
The installation and running environment of the present invention is shown in Figure 4. The user runs a main program on a personal computer, and the main document being processed is displayed in the main window. The user operates on the main document through input devices (keyboard, mouse, microphone, stylus and so on) in the interaction mode defined by the main program, and the results are displayed in the main window. The present invention monitors the user's operations on the main document: it extracts the text the user enters through input devices (keyboard, microphone, stylus and so on), extracts the text content when the user uses the clipboard, recognizes the tracks of the user's mouse and stylus, and captures the text designated by tracks that have a specific meaning. By analyzing the extracted text it obtains the keywords, sentences or paragraphs the user is paying attention to and their positions in the document. The captured part of the document can then be marked with descriptors and comments and supplemented with remarks and commentary; alternatively, the captured keyword can be used to find other documents through a search engine, or a tool in the auxiliary tool library can be called to process the captured information. The user then arranges this auxiliary information on the screen, where it is displayed in coordination with the main document. The auxiliary tool library is stored on the local hard disk and on servers of the internal network; the search engine searches documents on the local hard disk and on internal network servers as well as web pages on the Internet. The information produced by the present invention itself is stored directly on the local hard disk or on a server of the internal local area network.
The system can become an operating system shell that is hosted on an intranet or Internet server and provides its main services across platforms (PC, PAD, mobile phone and other terminals with network access). The computer, together with the intranet and the external networks (mainly the Internet) connected to it, provides the user with a large amount of information services, and the user's operations and work on the computer in turn add new information to this huge information system.

Claims (25)

1. An individualized document processing system based on keywords, comprising a computer that has at least one processor and a set of memories, that has at least a screen for output and a keyboard and mouse for input to provide a user interface for interaction between the user and programs, and that is connected to external storage, an internal local area network and/or the external Internet, the computer running an operating system with multitasking and multiple windows, characterized in that the system further comprises:
a screen collaborative display device for displaying the main document that the user is paying attention to and the auxiliary documents and/or auxiliary information associated with the main document;
an information organization device for storing the information displayed by the screen collaborative display device and the relations among that information, for use by the other devices or for calls from external tools;
a mouse track recognition device for identifying certain specific mouse tracks that the user deliberately draws by moving the mouse and for calling the corresponding operations;
a keyword generation device for determining the keywords the user is interested in;
a keyword processing device for analyzing and processing the keywords determined by the user;
and a tool calling device for calling external tools according to the keywords determined by the user.
2. The individualized document processing system based on keywords according to claim 1, characterized in that: the keyword generation device comprises a screen word capture device, with which the user deliberately uses a specific track of mouse movement to indicate the start and end positions of a string of text and symbols that is displayed on the screen and visible to the user, and which extracts that string as a keyword.
3. The individualized document processing system based on keywords according to claim 1, characterized in that: the keyword generation device comprises a clipboard word capture device, which monitors the user's copy, paste, move and even delete operations that use the clipboard provided by the operating system, inspects the content of the clipboard, judges whether it is a keyword, and if so extracts it as a keyword.
4. The individualized document processing system based on keywords according to claim 1, characterized in that: the keyword generation device comprises an input process word capture device, which monitors the text and symbol strings entered by the user, has the keyword recognition device automatically analyze and judge whether an entered string is a keyword, and if so extracts it as a keyword.
5. The individualized document processing system based on keywords according to claim 1, characterized in that: the keyword processing device comprises a keyword recognition device for judging whether a given string of text and symbols is a keyword and whether it may be a new keyword, and a keyword analysis device for determining the keyword the user is interested in and the operation the user wishes to start with that keyword.
6. The individualized document processing system based on keywords according to claim 5, characterized in that: the keyword processing device further comprises a keyword semantic device for providing the semantics of keywords; the keyword semantic device provides the semantics of a keyword identified by the keyword recognition device, records them in the information organization device and displays them on the screen, or sends the keyword and its semantics to the keyword analysis device for further processing.
7. The individualized document processing system based on keywords according to claim 1, characterized in that: the tool calling device comprises a search engine interface device that calls external search engines based on keywords so as to improve the quality of information search.
8. The individualized document processing system based on keywords according to claim 1, characterized in that: the tool calling device comprises a group of auxiliary tool library interface devices that call an external auxiliary tool library based on keywords.
9. The individualized document processing system based on keywords according to claim 8, characterized in that: the auxiliary tool library includes, but is not limited to, at least one of Chinese-foreign language dictionaries, Chinese dictionaries, encyclopedias, address books, yellow pages, calculators, maps, film clips, music clips, introductions to famous people, and links to documents with related content.
10. The individualized document processing system based on keywords according to claim 1, characterized in that: the system further comprises a focus position identification device for extracting the information and characteristics of a position in the main document that the user is paying attention to.
11. The individualized document processing system based on keywords according to claim 1, characterized in that: the system further comprises an operation control recognition device for looking up the function or instruction corresponding to the track code obtained when the mouse track recognition device identifies a track, calling the function or issuing the instruction according to the running environment and the relevant parameters, and cancelling the function call or instruction when the situation is unsuitable.
12. The individualized document processing system based on keywords according to claim 10 or 11, characterized in that: the system further comprises a descriptor and comment device for popping up a window that shows descriptors or comments in a tree structure for the user to choose from.
13. The individualized document processing system based on keywords according to claim 10 or 11, characterized in that: the system further comprises a remarks and comment device for opening a text editor in which the user enters text to supplement the context of the focus position in the document with remarks and/or to comment on it.
14. The individualized document processing system based on keywords according to claim 1, characterized in that: the screen collaborative display device comprises a main window, auxiliary windows and a collaborative display control unit; the main window is the display window of conventional software and displays the information, including text and symbols, that the user mainly reads and/or edits; the collaborative display control unit monitors the operation of the main window and/or receives the messages sent by the main window, and when the content displayed in the main window changes, it calculates the coordinate values of all selected or generated keywords shown in the main window and the display scaling value, and passes these values to all auxiliary windows so that they can adjust their display position and state and change in coordination with the content displayed in the main window; under the control of the collaborative display control unit, the auxiliary windows display the help, remarks, commentary, descriptor and comment auxiliary information that corresponds to the content displayed in the main window.
15. The individualized document processing system based on keywords according to claim 14, characterized in that: the auxiliary window is transparent both visually and for interactive operation, that is: when an auxiliary window is placed over the main window, its frame and background are translucent to fully transparent, so that the user can clearly see the content of the main window beneath it, and the text or graphics shown in the auxiliary window float over the main window; the user's keyboard and mouse operations can pass through the auxiliary window and its displayed content and act on the content displayed in the main window beneath it.
16. The individualized document processing system based on keywords according to claim 14, characterized in that: the auxiliary window can refresh its display according to the new position coordinates of the keyword and the scaling value provided by the collaborative display control unit, in four main modes that apply when the keyword linked to the auxiliary window moves in the main window: mode one, the auxiliary window follows the keyword; mode two, the auxiliary window stays where it is; mode three, the auxiliary window turns into a translucent icon the same size as the keyword text, attaches behind the keyword and follows it; mode four, the auxiliary window closes.
17. The individualized document processing system based on keywords according to claim 1, characterized in that: the information organization device comprises a work log device, a comprehensive inverted index device, a soft hyperlink device, a document function device and a system configuration device.
18. The individualized document processing system based on keywords according to claim 17, characterized in that: the work log device stores the process of the user's computer use and the state and results of the operation of each device of the system, with emphasis on the current configuration of the computer, the recently used tools, the records of hyperlinks and soft hyperlinks, and the user's personalized characteristics.
19. The individualized document processing system based on keywords according to claim 18, characterized in that: the information recorded by the work log device comprises at least a keyword generation table, a keyword application table and a focus position operation table; the information in the keyword generation table is produced by the screen word capture device, the clipboard word capture device or the input process word capture device and, after identification by the keyword recognition device, is passed on and recorded by a call to the work log device; the information in the keyword application table is recorded by a call to the work log device after the search engine interface device has called an external search engine and an auxiliary document has been successfully returned, or after the auxiliary tool library interface device has called an auxiliary tool and information has been successfully returned; the information in the focus position operation table is recorded by a call to the work log device after the descriptor and comment device or the remarks and comment device has run.
20. The individualized document processing system based on keywords according to claim 17, characterized in that: the comprehensive inverted index device uses the records about keywords, descriptors and comments kept by the work log device to build an index table in which keywords, descriptors and comments are the index terms and documents, remarks, commentary and auxiliary tools are the retrieved items; the vocabulary of the index table is dynamic: new keywords are continually added and old keywords are continually deleted or forgotten; every new keyword confirmed by the user is remembered and forgotten in three stages, immediate memory, short-term memory and long-term memory, while newly appearing descriptors and comments enter the short-term memory buffer directly; a new keyword in immediate memory is kept in the immediate memory buffer of the comprehensive inverted index device, with its channel information recorded in the index master table; this buffer is sorted by frequency of use, and word frequency is reduced with a fast forgetting algorithm; when the new keyword becomes associated with a document or permanently associated with an auxiliary tool, or when its instantaneous word frequency exceeds a certain threshold, it is stored in the short-term memory buffer of the comprehensive inverted index device, with the channel that produced it still recorded in the index master table; its word frequency is then counted with the new channel reward algorithm, the buffer is sorted by word frequency, word frequency is reduced with a slow forgetting algorithm, and words whose frequency becomes very low are forgotten and removed from the buffer; when the necessary attributes of a new keyword in the short-term memory buffer have been filled in and its word frequency is above a certain threshold, it is stored in the long-term memory buffer of the comprehensive inverted index device and becomes a new keyword of that area, where word frequency is still counted with the channel reward algorithm according to the index master table and reduced with a very slow forgetting algorithm.
21. The individualized document processing system based on keywords according to claim 20, characterized in that: the channel reward algorithm for word-frequency statistics imitates the combined action of the sensory channels in human memory and the influence of novelty on memory; when a keyword is used for the n-th time through channel $H_i$, and $H_i$ has by then been used $n_i$ times in total, the word frequency of the keyword is $Fc(n) = Fc(n-1) + 1 + Ft(H_i, n_i)$, where $Fc$ is an integer between 0 and 255, and when $Fc(n-1) = 255$, $Fc(n) = Fc(n-1)$; $Ft(H_i, n_i)$ is the reward word-frequency value for the $n_i$-th use of channel $H_i$, computed as:
$$Ft(H_i, n_i) = F_0 \cdot \min_j \left\{ Zd_i^{\,n_i} \left[ 1 - Q(H_i, H_j) \cdot \left( 1 - Zd_j^{\,n_j} \right) \right] \right\}$$
where $F_0$ is the word-frequency reward value, $Q$ is the channel similarity, with a value between 0 and 1, and $Zd$ is the novelty exponential factor of a channel, with a value greater than 0 and less than 1;
the forgetting algorithm models how the user forgets keywords in use, with reference to the Ebbinghaus forgetting curve; the curve is divided into three segments, the immediate memory stage, the short-term memory stage and the long-term memory stage, each simulated by an exponential function; the word frequency $Fc$ of an index term in the comprehensive inverted index device serves as the measure of memory strength; if the proportion forgotten after a time $T$ is $Y$, with $0 < Y < 1$, then after a time $t$ the remaining memory is $Fc = Fc_0 \cdot (1 - Y)^{t/T}$.
22. The individualized document processing system based on keywords according to claim 20, characterized in that: the index file is sorted by document attention $Gz$; the document attention $Gz$ depends on how the document is used: first on the frequency $Fw$ with which the document is used, also on the keywords, descriptors and comments that users have used with the document, on the score $Fs$ that the user directly gives to the document's attention, and on the other documents and auxiliary tools linked to the document; the information about attention to a document is contained in the index terms most often used when the user uses the document, so for every document the word frequencies of the $k$ most important index terms associated with it are averaged, and the document attention $Gz$ is computed as:
$$Gz = Rw \cdot Fw + Rc \cdot \frac{1}{k} \sum_{i=1}^{k} Fc_i + Rs \cdot Fs$$
In the formula, $Rw$ is the weight of the document's frequency of use, $Rc$ is the weight of the average word frequency of the important index terms, and $Rs$ is the weight of the user's subjective score for the document's attention; different weight values can be chosen for different users, with $Rw + Rc + Rs = 1$.
23. The individualized document processing system based on keywords according to claim 17, characterized in that: the soft hyperlink device takes the document as its center and draws on the records of the work log device. These records cover the keywords captured while the user was using the document and their positions, the auxiliary tools invoked through those keywords and the information they returned, the auxiliary documents found by calling the search engine, the subject words and comment marks made at positions of interest and their contexts, and the remarks and review records that were added. The records are arranged by time, and the most recently and most frequently used records serve as the soft hyperlinks of the document. When the document is opened and used again, these soft hyperlinks are opened automatically to retrieve the auxiliary information, which is displayed cooperatively on the screen, restoring the working state of the user's most recent sessions with the document. The work log records also make it possible to count the main ways in which the document was called or opened, so that a degree of backtracking can be performed on the document.
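The selection of soft hyperlinks from the work log can be sketched as follows. The claim only says that the most recently and most frequently used records become the document's soft hyperlinks; how the two criteria are combined is not specified. The sketch below assumes, purely for illustration, that records are ranked by use count first and by latest use second. The `LogRecord` type, its fields and the function `soft_hyperlinks` are hypothetical names, not part of the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LogRecord:
    document: str     # document the user was working on
    target: str       # auxiliary resource used alongside it (tool result, note, search hit)
    timestamp: float  # time of use, in any monotonically increasing unit


def soft_hyperlinks(log: Iterable[LogRecord], document: str,
                    limit: int = 3) -> List[str]:
    """Return up to `limit` targets for `document`, most used first.

    Ties in use count are broken by the most recent use (an assumption).
    """
    counts: Dict[str, int] = defaultdict(int)
    last_used: Dict[str, float] = {}
    for rec in log:
        if rec.document != document:
            continue
        counts[rec.target] += 1
        last_used[rec.target] = max(last_used.get(rec.target, rec.timestamp),
                                    rec.timestamp)
    ranked = sorted(counts, key=lambda t: (counts[t], last_used[t]), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    log = [
        LogRecord("a.doc", "dict:foo", 1.0),
        LogRecord("a.doc", "search:bar", 2.0),
        LogRecord("a.doc", "dict:foo", 3.0),
        LogRecord("b.doc", "note:x", 4.0),
        LogRecord("a.doc", "note:y", 5.0),
    ]
    print(soft_hyperlinks(log, "a.doc", limit=2))
```

On reopening a document, a system along these lines would open the returned targets next to it; a different weighting of recency against frequency would fit the claim equally well.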
24. The individualized document processing system based on keywords according to claim 17, characterized in that: the document function device is used to display the titles and brief abstracts of the documents the user uses most often, the documents used most recently, and new documents of interest to the user, the latter selected by automatic classification, driven by the user, according to subject words and important keywords.
25. The individualized document processing system based on keywords according to claim 17, characterized in that: the system configuration device is used by the user to set the operating parameters of the whole system and to enter the user's personal information.
CN 200710200102 2007-01-24 2007-01-24 Individualized document processing system based on keywords Pending CN101004737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710200102 CN101004737A (en) 2007-01-24 2007-01-24 Individualized document processing system based on keywords

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710200102 CN101004737A (en) 2007-01-24 2007-01-24 Individualized document processing system based on keywords

Publications (1)

Publication Number Publication Date
CN101004737A true CN101004737A (en) 2007-07-25

Family

ID=38703883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710200102 Pending CN101004737A (en) 2007-01-24 2007-01-24 Individualized document processing system based on keywords

Country Status (1)

Country Link
CN (1) CN101004737A (en)

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004776A (en) * 2010-11-22 2011-04-06 何吴迪 Window expression oriented cloud computing window system constructing method
CN102193936A (en) * 2010-03-09 2011-09-21 阿里巴巴集团控股有限公司 Data classification method and device
CN102541901A (en) * 2010-12-26 2012-07-04 上海量明科技发展有限公司 Method and system for identifying and outputting information during document reading
CN101739393B (en) * 2008-11-20 2012-07-04 苗玉水 Chinese text intelligent participle method
CN102609189A (en) * 2012-01-13 2012-07-25 百度在线网络技术(北京)有限公司 Method and client side for processing content of messages of mobile terminal
CN102750279A (en) * 2011-04-18 2012-10-24 北京圣涛平试验工程技术研究院有限责任公司 Classification association networked sharing method and system for electronic document
CN103019814A (en) * 2012-11-21 2013-04-03 北京荣之联科技股份有限公司 System and method for managing shear plate
CN103038764A (en) * 2010-04-14 2013-04-10 惠普发展公司,有限责任合伙企业 Method for keyword extraction
CN103377232A (en) * 2012-04-25 2013-10-30 阿里巴巴集团控股有限公司 Headline keyword recommendation method and system
CN103513972A (en) * 2012-06-25 2014-01-15 联想(北京)有限公司 Display method and electronic equipment
CN103678379A (en) * 2012-09-17 2014-03-26 腾讯科技(深圳)有限公司 Method and device for pushing media information in real time on basis of user focus information
CN104077011A (en) * 2013-03-26 2014-10-01 北京三星通信技术研究有限公司 Method for associating documents in same type and terminal equipment
CN104462558A (en) * 2014-12-26 2015-03-25 浙江宇视科技有限公司 Method and device for modifying words in Lucene index file
CN104462056A (en) * 2013-09-17 2015-03-25 国际商业机器公司 Active knowledge guidance based on deep document analysis
CN104537040A (en) * 2014-12-23 2015-04-22 小米科技有限责任公司 Method and device for capturing webpage content and electronic device
CN104850608A (en) * 2015-05-07 2015-08-19 深圳市世强先进科技有限公司 Method for searching keywords on information exhibiting page
CN104866545A (en) * 2015-05-07 2015-08-26 深圳市世强先进科技有限公司 Method for searching keywords on information display page
CN104922906A (en) * 2015-07-15 2015-09-23 网易(杭州)网络有限公司 Action executing method and device
WO2016004584A1 (en) * 2014-07-08 2016-01-14 Yahoo! Inc. Method and system for providing a personalized snippet
CN105488038A (en) * 2014-09-15 2016-04-13 阿里巴巴集团控股有限公司 Communication application personalized information matching method and device
CN105574162A (en) * 2015-12-16 2016-05-11 南京鼎岩信息科技有限公司 Automatic hyperlink method of keyword
CN105706046A (en) * 2013-08-02 2016-06-22 谷歌公司 Surfacing user-specific data records in search
CN105808520A (en) * 2014-12-30 2016-07-27 联想(北京)有限公司 Electronic equipment and sentence processing method thereof
CN106202146A (en) * 2012-07-16 2016-12-07 刘二中 A kind of search engine terminal use inputs the processing method of reference paper Search Hints information
WO2017028407A1 (en) * 2015-08-20 2017-02-23 百度在线网络技术(北京)有限公司 Method and device for extracting text digest
CN107092588A (en) * 2016-02-18 2017-08-25 腾讯科技(深圳)有限公司 A kind of text message processing method, device and system
CN107451168A (en) * 2016-05-30 2017-12-08 中华电信股份有限公司 File Classification System and Method Based on Vocabulary Statistics
CN107545039A (en) * 2017-07-31 2018-01-05 腾讯科技(深圳)有限公司 The index acquisition methods and device of keyword, computer equipment and storage medium
CN107784027A (en) * 2016-08-31 2018-03-09 北京国双科技有限公司 A kind of reminding method and device of judgement document's search key
CN108399213A (en) * 2018-02-05 2018-08-14 中国科学院信息工程研究所 A kind of clustering method and system of user oriented personal document
CN108509585A (en) * 2018-03-29 2018-09-07 重庆大学 A kind of isomeric data real-time, interactive optimized treatment method
CN109284352A (en) * 2018-09-30 2019-01-29 哈尔滨工业大学 A kind of querying method of the assessment class document random length words and phrases based on inverted index
CN109388806A (en) * 2018-10-26 2019-02-26 北京布本智能科技有限公司 A kind of Chinese word cutting method based on deep learning and forgetting algorithm
CN109615001A (en) * 2018-12-05 2019-04-12 上海恺英网络科技有限公司 A kind of method and apparatus identifying similar article
CN109800303A (en) * 2018-12-28 2019-05-24 深圳市世强元件网络有限公司 A kind of document information extracting method, storage medium and terminal
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device
CN109857301A (en) * 2018-12-27 2019-06-07 维沃移动通信有限公司 Show the method and terminal device of information
CN109933782A (en) * 2018-12-03 2019-06-25 阿里巴巴集团控股有限公司 User emotion prediction technique and device
CN110019771A (en) * 2017-07-28 2019-07-16 北京国双科技有限公司 The method and device of text-processing
CN110019590A (en) * 2017-09-13 2019-07-16 北京嘀嘀无限科技发展有限公司 Method, apparatus, electronic equipment and the storage medium of map are shown in the page
CN110603545A (en) * 2017-04-26 2019-12-20 谷歌有限责任公司 Organizing messages exchanged in a human-machine conversation with an automated assistant
CN110764668A (en) * 2019-10-30 2020-02-07 维沃移动通信有限公司 Comment information acquisition method and electronic equipment
CN110968246A (en) * 2018-09-28 2020-04-07 北京搜狗科技发展有限公司 Intelligent Chinese handwriting input recognition method and device
CN111046252A (en) * 2019-11-20 2020-04-21 北京字节跳动网络技术有限公司 Information processing method, device, medium, electronic equipment and system
US10769225B2 (en) 2016-08-15 2020-09-08 Richard S. Brown Processor-implemented method, computing system and computer program for invoking a search
CN111966816A (en) * 2020-07-09 2020-11-20 福建亿榕信息技术有限公司 Intelligent association method and system for official documents
CN112384903A (en) * 2018-09-26 2021-02-19 多玩国株式会社 Server system, application distribution server, terminal for viewing, content viewing method, application, distribution method, and application distribution method
CN112381519A (en) * 2020-11-20 2021-02-19 北京云族佳科技有限公司 Method and device for processing work logs and readable storage medium
TWI733581B (en) * 2020-09-04 2021-07-11 南開科技大學 Online e-book translation and teaching match immediately system and method
CN113204579A (en) * 2021-04-29 2021-08-03 北京金山数字娱乐科技有限公司 Content association method, system, device, electronic equipment and storage medium
CN113448918A (en) * 2021-08-31 2021-09-28 中国建筑第五工程局有限公司 Enterprise scientific research result management method, management platform, equipment and storage medium
CN113743054A (en) * 2021-08-17 2021-12-03 上海明略人工智能(集团)有限公司 Alphabet vector learning method, system, storage medium and electronic device
CN114995691A (en) * 2021-03-01 2022-09-02 北京字跳网络技术有限公司 Document processing method, device, equipment and medium
CN115204123A (en) * 2022-07-29 2022-10-18 北京知元创通信息技术有限公司 Analysis method, analysis device and storage medium for collaborative editing of document
CN117556782A (en) * 2024-01-11 2024-02-13 深圳市度申科技有限公司 Text formatting method, electronic equipment and computer readable storage medium
CN117762889A (en) * 2024-02-20 2024-03-26 成都融见软件科技有限公司 Same-file multi-window state synchronization method, electronic equipment and medium

Cited By (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739393B (en) * 2008-11-20 2012-07-04 苗玉水 Chinese text intelligent participle method
CN102193936A (en) * 2010-03-09 2011-09-21 阿里巴巴集团控股有限公司 Data classification method and device
CN102193936B (en) * 2010-03-09 2013-09-18 阿里巴巴集团控股有限公司 Data classification method and device
CN103038764A (en) * 2010-04-14 2013-04-10 惠普发展公司,有限责任合伙企业 Method for keyword extraction
CN102004776A (en) * 2010-11-22 2011-04-06 何吴迪 Window expression oriented cloud computing window system constructing method
CN102541901A (en) * 2010-12-26 2012-07-04 上海量明科技发展有限公司 Method and system for identifying and outputting information during document reading
CN102750279A (en) * 2011-04-18 2012-10-24 北京圣涛平试验工程技术研究院有限责任公司 Classification association networked sharing method and system for electronic document
CN102750279B (en) * 2011-04-18 2015-11-25 北京圣涛平试验工程技术研究院有限责任公司 The classification associated networking service system method and system of electronic document
CN102609189A (en) * 2012-01-13 2012-07-25 百度在线网络技术(北京)有限公司 Method and client side for processing content of messages of mobile terminal
CN103377232A (en) * 2012-04-25 2013-10-30 阿里巴巴集团控股有限公司 Headline keyword recommendation method and system
CN103513972B (en) * 2012-06-25 2017-06-27 联想(北京)有限公司 Display methods and electronic equipment
CN103513972A (en) * 2012-06-25 2014-01-15 联想(北京)有限公司 Display method and electronic equipment
CN106202146A (en) * 2012-07-16 2016-12-07 刘二中 A kind of search engine terminal use inputs the processing method of reference paper Search Hints information
CN106202146B (en) * 2012-07-16 2019-04-16 刘二中 A kind of search engine terminal user inputs the processing method of reference paper Search Hints information
CN103678379A (en) * 2012-09-17 2014-03-26 腾讯科技(深圳)有限公司 Method and device for pushing media information in real time on basis of user focus information
CN103678379B (en) * 2012-09-17 2019-01-29 腾讯科技(深圳)有限公司 The method and apparatus of the real-time push media information of information are absorbed in based on user
CN103019814A (en) * 2012-11-21 2013-04-03 北京荣之联科技股份有限公司 System and method for managing shear plate
CN103019814B (en) * 2012-11-21 2016-03-30 北京荣之联科技股份有限公司 A kind of shear plate management system and method
CN104077011B (en) * 2013-03-26 2017-08-11 北京三星通信技术研究有限公司 Correlating method and terminal device between a kind of same type document
CN104077011A (en) * 2013-03-26 2014-10-01 北京三星通信技术研究有限公司 Method for associating documents in same type and terminal equipment
CN105706046A (en) * 2013-08-02 2016-06-22 谷歌公司 Surfacing user-specific data records in search
CN104462056B (en) * 2013-09-17 2018-02-09 国际商业机器公司 For the method and information handling systems of knouledge-based information to be presented
US10698956B2 (en) 2013-09-17 2020-06-30 International Business Machines Corporation Active knowledge guidance based on deep document analysis
CN104462056A (en) * 2013-09-17 2015-03-25 国际商业机器公司 Active knowledge guidance based on deep document analysis
US10621220B2 (en) 2014-07-08 2020-04-14 Oath Inc. Method and system for providing a personalized snippet
WO2016004584A1 (en) * 2014-07-08 2016-01-14 Yahoo! Inc. Method and system for providing a personalized snippet
CN105488038A (en) * 2014-09-15 2016-04-13 阿里巴巴集团控股有限公司 Communication application personalized information matching method and device
CN104537040A (en) * 2014-12-23 2015-04-22 小米科技有限责任公司 Method and device for capturing webpage content and electronic device
CN104462558B (en) * 2014-12-26 2017-12-08 浙江宇视科技有限公司 The method and device of word in a kind of modification Lucene index files
US10769105B2 (en) 2014-12-26 2020-09-08 Zhejiang Uniview Technologies Co., Ltd. Modifying Lucene index file
WO2016101915A1 (en) * 2014-12-26 2016-06-30 浙江宇视科技有限公司 Lucene index file modification
CN104462558A (en) * 2014-12-26 2015-03-25 浙江宇视科技有限公司 Method and device for modifying words in Lucene index file
CN105808520A (en) * 2014-12-30 2016-07-27 联想(北京)有限公司 Electronic equipment and sentence processing method thereof
CN105808520B (en) * 2014-12-30 2018-12-14 联想(北京)有限公司 Electronic equipment and its sentence processing method
CN104850608A (en) * 2015-05-07 2015-08-19 深圳市世强先进科技有限公司 Method for searching keywords on information exhibiting page
CN104866545A (en) * 2015-05-07 2015-08-26 深圳市世强先进科技有限公司 Method for searching keywords on information display page
CN104922906A (en) * 2015-07-15 2015-09-23 网易(杭州)网络有限公司 Action executing method and device
CN104922906B (en) * 2015-07-15 2018-09-04 网易(杭州)网络有限公司 Action executes method and apparatus
WO2017028407A1 (en) * 2015-08-20 2017-02-23 百度在线网络技术(北京)有限公司 Method and device for extracting text digest
CN105574162A (en) * 2015-12-16 2016-05-11 南京鼎岩信息科技有限公司 Automatic hyperlink method of keyword
CN105574162B (en) * 2015-12-16 2019-05-03 南京鼎岩信息科技有限公司 The method of the automatic hyperlink of keyword
CN107092588A (en) * 2016-02-18 2017-08-25 腾讯科技(深圳)有限公司 A kind of text message processing method, device and system
CN107092588B (en) * 2016-02-18 2022-09-09 腾讯科技(深圳)有限公司 Text information processing method, device and system
CN107451168A (en) * 2016-05-30 2017-12-08 中华电信股份有限公司 File Classification System and Method Based on Vocabulary Statistics
CN107451168B (en) * 2016-05-30 2023-08-04 台湾中华电信股份有限公司 File classification system and method based on vocabulary statistics
US10769225B2 (en) 2016-08-15 2020-09-08 Richard S. Brown Processor-implemented method, computing system and computer program for invoking a search
CN107784027A (en) * 2016-08-31 2018-03-09 北京国双科技有限公司 A kind of reminding method and device of judgement document's search key
CN110603545A (en) * 2017-04-26 2019-12-20 谷歌有限责任公司 Organizing messages exchanged in a human-machine conversation with an automated assistant
CN110603545B (en) * 2017-04-26 2024-03-12 谷歌有限责任公司 Method, system and non-transitory computer readable medium for organizing messages
CN110019771A (en) * 2017-07-28 2019-07-16 北京国双科技有限公司 The method and device of text-processing
CN107545039B (en) * 2017-07-31 2021-05-18 腾讯科技(深圳)有限公司 Keyword index acquisition method and device, computer equipment and storage medium
CN107545039A (en) * 2017-07-31 2018-01-05 腾讯科技(深圳)有限公司 The index acquisition methods and device of keyword, computer equipment and storage medium
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device
CN110019590A (en) * 2017-09-13 2019-07-16 北京嘀嘀无限科技发展有限公司 Method, apparatus, electronic equipment and the storage medium of map are shown in the page
CN110019590B (en) * 2017-09-13 2021-10-12 北京嘀嘀无限科技发展有限公司 Method and device for displaying map in page, electronic equipment and storage medium
CN108399213B (en) * 2018-02-05 2022-04-01 中国科学院信息工程研究所 User-oriented personal file clustering method and system
CN108399213A (en) * 2018-02-05 2018-08-14 中国科学院信息工程研究所 A kind of clustering method and system of user oriented personal document
CN108509585A (en) * 2018-03-29 2018-09-07 重庆大学 A kind of isomeric data real-time, interactive optimized treatment method
CN112384903B (en) * 2018-09-26 2022-06-24 多玩国株式会社 Server system, application distribution server, and reading terminal
US11936939B2 (en) 2018-09-26 2024-03-19 Dwango Co., Ltd. Server system, application program distribution server, viewing terminal, content viewing method, application program, distribution method, and application program distribution method
CN112384903A (en) * 2018-09-26 2021-02-19 多玩国株式会社 Server system, application distribution server, terminal for viewing, content viewing method, application, distribution method, and application distribution method
CN110968246A (en) * 2018-09-28 2020-04-07 北京搜狗科技发展有限公司 Intelligent Chinese handwriting input recognition method and device
CN109284352A (en) * 2018-09-30 2019-01-29 哈尔滨工业大学 A kind of querying method of the assessment class document random length words and phrases based on inverted index
CN109284352B (en) * 2018-09-30 2022-02-08 哈尔滨工业大学 Query method for evaluating indefinite-length words and sentences of class documents based on inverted index
CN109388806A (en) * 2018-10-26 2019-02-26 北京布本智能科技有限公司 A kind of Chinese word cutting method based on deep learning and forgetting algorithm
CN109933782B (en) * 2018-12-03 2023-11-28 创新先进技术有限公司 User emotion prediction method and device
CN109933782A (en) * 2018-12-03 2019-06-25 阿里巴巴集团控股有限公司 User emotion prediction technique and device
CN109615001A (en) * 2018-12-05 2019-04-12 上海恺英网络科技有限公司 A kind of method and apparatus identifying similar article
CN109857301A (en) * 2018-12-27 2019-06-07 维沃移动通信有限公司 Show the method and terminal device of information
CN109800303A (en) * 2018-12-28 2019-05-24 深圳市世强元件网络有限公司 A kind of document information extracting method, storage medium and terminal
CN110764668B (en) * 2019-10-30 2021-04-16 维沃移动通信有限公司 Comment information acquisition method and electronic equipment
CN110764668A (en) * 2019-10-30 2020-02-07 维沃移动通信有限公司 Comment information acquisition method and electronic equipment
CN111046252A (en) * 2019-11-20 2020-04-21 北京字节跳动网络技术有限公司 Information processing method, device, medium, electronic equipment and system
CN111966816B (en) * 2020-07-09 2022-07-12 福建亿榕信息技术有限公司 Intelligent association method and system for official documents
CN111966816A (en) * 2020-07-09 2020-11-20 福建亿榕信息技术有限公司 Intelligent association method and system for official documents
TWI733581B (en) * 2020-09-04 2021-07-11 南開科技大學 Online e-book translation and teaching match immediately system and method
CN112381519A (en) * 2020-11-20 2021-02-19 北京云族佳科技有限公司 Method and device for processing work logs and readable storage medium
CN114995691B (en) * 2021-03-01 2024-03-08 北京字跳网络技术有限公司 Document processing method, device, equipment and medium
CN114995691A (en) * 2021-03-01 2022-09-02 北京字跳网络技术有限公司 Document processing method, device, equipment and medium
CN113204579A (en) * 2021-04-29 2021-08-03 北京金山数字娱乐科技有限公司 Content association method, system, device, electronic equipment and storage medium
CN113743054A (en) * 2021-08-17 2021-12-03 上海明略人工智能(集团)有限公司 Alphabet vector learning method, system, storage medium and electronic device
CN113448918B (en) * 2021-08-31 2021-11-12 中国建筑第五工程局有限公司 Enterprise scientific research result management method, management platform, equipment and storage medium
CN113448918A (en) * 2021-08-31 2021-09-28 中国建筑第五工程局有限公司 Enterprise scientific research result management method, management platform, equipment and storage medium
CN115204123A (en) * 2022-07-29 2022-10-18 北京知元创通信息技术有限公司 Analysis method, analysis device and storage medium for collaborative editing of document
CN115204123B (en) * 2022-07-29 2023-02-17 北京知元创通信息技术有限公司 Collaborative editing document analysis method, analysis device, and storage medium
CN117556782A (en) * 2024-01-11 2024-02-13 深圳市度申科技有限公司 Text formatting method, electronic equipment and computer readable storage medium
CN117762889A (en) * 2024-02-20 2024-03-26 成都融见软件科技有限公司 Same-file multi-window state synchronization method, electronic equipment and medium
CN117762889B (en) * 2024-02-20 2024-04-19 成都融见软件科技有限公司 Same-file multi-window state synchronization method, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN101004737A (en) Individualized document processing system based on keywords
Da The computational case against computational literary studies
Balog Entity-oriented search
Schönfelder CAQDAS and qualitative syllogism logic—NVivo 8 and MAXQDA 10 compared
Loia et al. A fuzzy-oriented sentic analysis to capture the human emotion in Web-based content
JP5607164B2 (en) Semantic Trading Floor
CN1670733B (en) Rendering tables with natural language commands
US20160364377A1 (en) Language Processing And Knowledge Building System
US20140280072A1 (en) Method and Apparatus for Human-Machine Interaction
US20140280314A1 (en) Dimensional Articulation and Cognium Organization for Information Retrieval Systems
Maynard et al. Ontology-based information extraction for market monitoring and technology watch
Light From words to networks and back: Digital text, computational social science, and the case of presidential inaugural addresses
CN111339284A (en) Product intelligent matching method, device, equipment and readable storage medium
WO2015084404A1 (en) Matching of an input document to documents in a document collection
US20240104405A1 (en) Schema augmentation system for exploratory research
Alexander et al. Metaphor, popular science, and semantic tagging: Distant reading with the Historical Thesaurus of English
Rubinstein Historical corpora meet the digital humanities: the Jerusalem corpus of emergent modern Hebrew
Völkel Personal knowledge models with semantic technologies
Phan et al. Applying skip-gram word estimation and SVM-based classification for opinion mining Vietnamese food places text reviews
Bakalov et al. A hybrid approach to identifying user interests in web portals
Della Volpe et al. Semantic predicates in the business language
Kasmuri et al. Building a Malay-English code-switching subjectivity corpus for sentiment analysis
Fritzner Automated information extraction in natural language
Hassanian-esfahani et al. A survey on web news retrieval and mining
Woldeyohannis et al. Usable Amharic text corpus for natural language processing applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20070725