CN103383697B - Method and equipment for determining object representation information of object header - Google Patents
- Publication number: CN103383697B (application number CN201310260162.4A)
- Authority: CN (China)
- Prior art keywords: information, title, titles, determines, equipment
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention aims to provide a method and equipment for determining object representation information of an object header (title) with respect to the corresponding target object. The method comprises: obtaining a plurality of training headers; creating or updating a corresponding labeling pattern dictionary from the labeling pattern information in the training headers; obtaining the object header of a target object to be processed; filtering the object header according to the labeling pattern dictionary; and determining the object representation information of the object header with respect to the target object according to related word information of the header words in the filtered object header. Compared with the prior art, by filtering the object header of a target object with the labeling pattern dictionary and determining its object representation information from the related word information of the remaining header words, the method effectively identifies low-quality object headers, improves the efficiency with which users obtain information, and enhances the users' information-sharing experience.
Description
Technical field
The present invention relates to the field of Internet technology, and more particularly to a technique for determining object characterization information of object titles.
Background
With the development of Internet technology and the growing penetration of Internet applications into people's study, work and life, people increasingly obtain information through the network and share the information they own over it, for example by uploading their content to network platforms such as Baidu Wenku, Docin and personal spaces. However, the titles of the target objects that users upload, such as documents, videos and pictures, vary widely in quality, and a low-quality object title usually cannot reflect the actual content of the corresponding target object. The prior art cannot effectively identify low-quality object titles and consequently cannot provide optimization hints that prompt users to improve them. This not only lowers the efficiency with which users obtain information but also degrades their information-sharing experience.
Summary of the invention
An object of the invention is to provide a method and an apparatus for determining object characterization information of an object title with respect to the corresponding target object.
According to one aspect of the invention, a method is provided for determining object characterization information of an object title with respect to the corresponding target object, the method comprising the following steps:
X. obtaining a plurality of training titles;
Y. establishing or updating a corresponding label pattern dictionary according to the label pattern information in the plurality of training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information;
wherein the method further comprises:
A. obtaining the object title of a target object to be processed;
B. filtering the object title according to the label pattern dictionary;
C. determining the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title.
According to another aspect of the invention, an information determining apparatus is provided for determining object characterization information of an object title with respect to the corresponding target object, the information determining apparatus comprising:
a training acquisition device for obtaining a plurality of training titles;
a dictionary establishing device for establishing or updating a corresponding label pattern dictionary according to the label pattern information in the plurality of training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information;
wherein the information determining apparatus further comprises:
a title acquisition device for obtaining the object title of a target object to be processed;
a filtering device for filtering the object title according to the label pattern dictionary;
a characterization determining device for determining the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title.
According to a further aspect of the invention, a computer device is provided, comprising the information determining apparatus described above for determining object characterization information of an object title with respect to the corresponding target object.
According to a further aspect of the invention, a browser is provided, comprising the information determining apparatus described above for determining object characterization information of an object title with respect to the corresponding target object.
According to a further aspect of the invention, a browser plug-in is provided, comprising the information determining apparatus described above for determining object characterization information of an object title with respect to the corresponding target object.
Compared with the prior art, the invention filters the object title of the obtained target object according to the established or updated label pattern dictionary, and determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Low-quality object titles are thus identified effectively, which increases the value of information sharing and the efficiency with which users obtain information, and improves the users' information-sharing experience. Furthermore, when the object characterization information is lower than predetermined characterization threshold information, the invention may determine optimization hint information for the object title and provide it to the user corresponding to the target object, further increasing the value of information sharing and the efficiency of obtaining information, and further improving the users' information-sharing experience. In addition, when the object language type of the target object is inconsistent with the title language type of the object title, the invention may include in the optimization hint information a reference title for the object title in the object language type, again increasing the value of information sharing and the efficiency of obtaining information, and improving the users' information-sharing experience.
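The threshold check and the language-mismatch case described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the numeric score, the hypothetical `OptimizationHint` structure, the hint wording and the caller-supplied `translate` function are all invented for the example, and the sketch assumes the reference title is added only once a low score has already triggered a hint.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OptimizationHint:
    """Hypothetical container for the hint shown to the uploading user."""
    message: str
    reference_title: Optional[str] = None  # title rendered in the object's language


def build_optimization_hint(
    score: float,
    threshold: float,
    title: str,
    title_lang: str,
    object_lang: str,
    translate: Callable[[str, str], str],
) -> Optional[OptimizationHint]:
    """Return a hint when the title's characterization score is below threshold.

    `translate(text, target_lang)` is an assumed, caller-supplied function;
    no particular translation service is implied.
    """
    if score >= threshold:
        return None  # title judged good enough; no hint needed
    hint = OptimizationHint(
        message="The title may not describe the content well; consider revising it."
    )
    if title_lang != object_lang:
        # Language mismatch: suggest a reference title in the object's language.
        hint.reference_title = translate(title, object_lang)
    return hint


fake_translate = lambda text, lang: f"[{lang}] {text}"  # stand-in for a real translator
hint = build_optimization_hint(0.3, 0.5, "Chapter 5 notes", "en", "zh", fake_translate)
print(hint.reference_title)  # [zh] Chapter 5 notes
```

Passing the translator in as a function keeps the decision logic independent of any particular language-detection or translation backend.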
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of an apparatus for determining object characterization information of an object title with respect to the corresponding target object, according to one aspect of the invention;
Fig. 2 shows a schematic diagram of an apparatus for determining object characterization information of an object title with respect to the corresponding target object, according to a preferred embodiment of the invention;
Fig. 3 shows a flowchart of a method for determining object characterization information of an object title with respect to the corresponding target object, according to another aspect of the invention;
Fig. 4 shows a flowchart of a method for determining object characterization information of an object title with respect to the corresponding target object, according to a preferred embodiment of the invention.
In the drawings, the same or similar reference numerals denote the same or similar parts.
Detailed description of embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows an information determining apparatus 1 for determining object characterization information of an object title with respect to the corresponding target object, according to one aspect of the invention. The information determining apparatus 1 comprises a training acquisition device 11, a dictionary establishing device 12, a title acquisition device 13, a filtering device 14 and a characterization determining device 15. Specifically, the training acquisition device 11 obtains a plurality of training titles; the dictionary establishing device 12 establishes or updates a corresponding label pattern dictionary according to the label pattern information in the training titles, the dictionary including one or more label patterns and their frequency information; the title acquisition device 13 obtains the object title of a target object to be processed; the filtering device 14 filters the object title according to the label pattern dictionary; and the characterization determining device 15 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title.
The information determining apparatus 1 includes, but is not limited to: 1) a network platform or terminal platform that provides storage space for its logged-in users so that they can upload target objects such as documents, videos and pictures for sharing, and that also lets users read, download and exchange online the target objects shared by other users, such as Baidu Wenku, Docin, Sina iAsk or Doc88, where the terminal platform includes, but is not limited to, user equipment such as mobile terminals and PCs; 2) a network platform or terminal platform that provides its logged-in users with information access, sharing, publishing or synchronization, such as third-party websites including social networking sites, forums, personal spaces, blogs and microblogs.
The information determining apparatus 1 may be network equipment, user equipment, or a device formed by integrating network equipment and user equipment through a network. The network equipment includes, but is not limited to, implementations as a network host, a single network server, a cluster of network servers, or a cloud-computing-based set of computers; alternatively, the apparatus can be implemented by user equipment. Here, a cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The user equipment can be any electronic product capable of human-machine interaction with the user through a keyboard, mouse, touchpad, touch screen, handwriting device or similar input, such as a computer, mobile phone, PDA, Pocket PC (PPC) or tablet. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs and wireless ad hoc networks. Those skilled in the art will understand that the information determining apparatus 1 above is only an example; other existing or future network equipment or user equipment, if applicable to the invention, also falls within the scope of protection of the invention and is incorporated herein by reference. Both the network equipment and the user equipment are electronic devices that automatically perform numerical computation and information processing according to preset or stored instructions; their hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs) and embedded devices.
Specifically, the training acquisition device 11 obtains a plurality of training titles through an application programming interface (API) provided by a third-party device such as a browser or search engine. Alternatively, it calls an API, provided by a third-party device such as a search engine or browser, for fetching user upload logs, obtains a number of user upload logs, and then extracts the training titles from those logs. For example, through a log-fetching API provided by a browser, the training acquisition device 11 obtains user upload logs recording, for instance, which documents, videos and pictures users uploaded within a certain period, and then extracts from these logs a plurality of training titles, such as the following training titles I to VIII:
I "Chapter 6 Serial Interface 2010 Spring"
II "Algorithms for Page Ranking based on Segment"
III "Chapter 8 Application Layer"
IV "5-5_Minimum-Cost Maximum-Flow Problem-xfj"
V "3-6 Angular Momentum of a Particle and the Angular Momentum Theorem-1"
VI "2011-12 Ground Knot"
VII "Experiment Seven: Network Sniffing"
VIII "WEB Page Block Segmentation Algorithm for Mobile Devices"
...
Those skilled in the art will understand that the above ways of obtaining training titles are only examples; other existing or future ways of obtaining training titles, if applicable to the invention, also fall within the scope of protection of the invention and are incorporated herein by reference.
The dictionary establishing device 12 establishes or updates a corresponding label pattern dictionary according to the label pattern information in the training titles, the dictionary including one or more label patterns and their frequency information. Specifically, the dictionary establishing device 12 may first normalize the training titles; then perform label processing on the label pattern information in the normalized training titles to determine the one or more label patterns corresponding to those titles; and finally perform statistical processing on these label patterns to obtain the label pattern dictionary. The normalization includes, but is not limited to, at least one of the following: 1) case normalization, which unifies upper- and lower-case letters in a training title; 2) full-width/half-width normalization of the characters in a training title. Label pattern information denotes the parts of a training title that mark, for example, the chapter the title belongs to or the time it covers, and that carry no essential meaning about the content, such as "Chapter 6", "Section 2.1", "Experiment Seven", "3-6" and "2011-12". Those skilled in the art will understand that the above label pattern information and normalization methods are only examples; other existing or future label pattern information or normalization methods, if applicable to the invention, also fall within the scope of protection of the invention and are incorporated herein by reference.
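The two normalization steps listed above can be sketched in Python. The sketch assumes that full-width/half-width normalization means mapping the full-width ASCII block (U+FF01 to U+FF5E) and the ideographic space (U+3000) to their half-width counterparts, and that case normalization means lowercasing; the function names are illustrative only.

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the ideographic
    space (U+3000) to their half-width equivalents; leave other characters."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            chars.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            chars.append(chr(code - 0xFEE0))  # fixed offset between the two blocks
        else:
            chars.append(ch)
    return "".join(chars)


def normalize_title(title: str) -> str:
    """Full-width/half-width normalization followed by lowercasing."""
    return to_half_width(title).lower()


print(normalize_title("ＭＣＳ－５１ Interface"))  # mcs-51 interface
```

Python's `unicodedata.normalize("NFKC", ...)` would also fold full-width forms, but it applies many other compatibility mappings as well, so the explicit offset keeps the step limited to what the text describes.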
Continuing the example, after normalizing training titles I to VIII obtained by the training acquisition device 11, the dictionary establishing device 12 performs label processing on the label pattern information in the normalized titles, for example replacing each numeral with the character "_", to determine the label patterns corresponding to the training titles. Training titles II and VIII contain no label pattern, while training titles I and III to VII contain the label patterns "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _" respectively. The dictionary establishing device 12 then performs statistical processing on these label patterns and stores each label pattern together with its frequency information in the label pattern dictionary, obtaining for example the label pattern dictionary shown in Table 1 below. The label pattern dictionary can be updated in a given manner, for example periodically at a predetermined interval, at scheduled times, or immediately:
Label pattern | Frequency information |
Chapter _ | 449291 |
____-__-__ | 144205 |
____-__ | 49938 |
Experiment _ | 90522 |
Chapter __ | 80418 |
(_) | 57856 |
Table 1
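The text does not specify how label spans are located inside a title before their numerals are replaced with "_". The sketch below assumes a small, hypothetical set of regular-expression templates (chapter, section and experiment labels, parenthesized numbers, and hyphenated number runs such as "3-6" or "2011-12") applied to the English renderings of training titles I to VIII. It masks every digit, and every listed number word, with "_" and counts the resulting patterns with `collections.Counter`; lowercasing stands in for case normalization. The counts from eight titles naturally bear no relation to the corpus-scale frequencies in Table 1.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical number words treated like digits (cf. "Experiment Seven").
NUMBER_WORDS = ("one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten")
_NUM = r"(?:\d+|" + "|".join(NUMBER_WORDS) + r")"
_NUM_WORD_RE = re.compile(r"\b(?:" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)

# Hypothetical label templates; the patent does not define these.
LABEL_RE = re.compile(
    rf"\bchapter {_NUM}\b"
    r"|\bsection \d+(?:\.\d+)*\b"
    rf"|\bexperiment {_NUM}\b"
    r"|\(\d+\)"
    r"|(?<![\w-])\d+(?:-\d+)+(?!\d)",  # "3-6", "2011-12", but not "MCS-51"
    re.IGNORECASE,
)


def mask_numerals(span: str) -> str:
    """Replace every digit, and every listed number word, with '_'."""
    span = re.sub(r"\d", "_", span)
    return _NUM_WORD_RE.sub("_", span)


def extract_label_patterns(title: str) -> List[str]:
    """Return the masked, lowercased label patterns found in one title."""
    return [mask_numerals(m.group(0)).lower() for m in LABEL_RE.finditer(title)]


def count_label_patterns(titles: Iterable[str]) -> Counter:
    """Statistical processing: label pattern -> frequency."""
    counts: Counter = Counter()
    for title in titles:
        counts.update(extract_label_patterns(title))
    return counts


TRAINING_TITLES = [
    "Chapter 6 Serial Interface 2010 Spring",
    "Algorithms for Page Ranking based on Segment",
    "Chapter 8 Application Layer",
    "5-5_Minimum-Cost Maximum-Flow Problem-xfj",
    "3-6 Angular Momentum of a Particle and the Angular Momentum Theorem-1",
    "2011-12 Ground Knot",
    "Experiment Seven: Network Sniffing",
    "WEB Page Block Segmentation Algorithm for Mobile Devices",
]
pattern_counts = count_label_patterns(TRAINING_TITLES)
print(pattern_counts.most_common())
# [('chapter _', 2), ('_-_', 2), ('____-__', 1), ('experiment _', 1)]
```

Note that "2010" in title I is not treated as a label here because none of the assumed templates covers a bare year, which matches the example's statement that title I yields only "Chapter _".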
Preferably, the dictionary establishing device 12 may instead first perform label processing on training titles I to VIII obtained by the training acquisition device 11 to determine the one or more label patterns corresponding to the training titles; then perform statistical processing on these label patterns to obtain a corresponding initial label pattern dictionary, which contains the label patterns found in the training titles and their corresponding frequency information; and finally screen the label patterns in the initial label pattern dictionary according to their frequency information to obtain the label pattern dictionary. Continuing the example, the dictionary establishing device 12 first performs label processing on the training titles, for example replacing each numeral with the character "_", and finds that training titles II and VIII contain no label pattern while training titles I and III to VII contain "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _" respectively. It then performs statistical processing on these label patterns to obtain an initial label pattern dictionary containing the label patterns and their frequency information, such as the one shown in Table 1 above. Finally, it screens the label patterns in the initial label pattern dictionary according to the frequency information, for example removing the label patterns whose frequency is below a predetermined threshold such as 50000, and obtains the label pattern dictionary shown in Table 2:
Label pattern | Frequency information |
Chapter _ | 449291 |
____-__-__ | 144205 |
Experiment _ | 90522 |
Chapter __ | 80418 |
(_) | 57856 |
Table 2
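The screening step that turns Table 1 into Table 2 amounts to a frequency cutoff. A minimal sketch, assuming the dictionary is a plain mapping from pattern to frequency and that a frequency exactly equal to the threshold is kept (the text only says patterns "below" the threshold are removed):

```python
from typing import Dict

# Initial label pattern dictionary, copied from Table 1 (pattern -> frequency).
INITIAL_DICTIONARY: Dict[str, int] = {
    "Chapter _": 449291,
    "____-__-__": 144205,
    "____-__": 49938,
    "Experiment _": 90522,
    "Chapter __": 80418,
    "(_)": 57856,
}


def screen_patterns(initial: Dict[str, int], min_frequency: int) -> Dict[str, int]:
    """Keep only label patterns whose frequency is at least `min_frequency`."""
    return {pattern: freq for pattern, freq in initial.items() if freq >= min_frequency}


label_dictionary = screen_patterns(INITIAL_DICTIONARY, 50000)
print(len(label_dictionary))          # 5
print("____-__" in label_dictionary)  # False
```

Only "____-__" (49938) falls under the 50000 cutoff, which reproduces the five rows of Table 2.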
Those skilled in the art will understand that the above ways of establishing or updating the label pattern dictionary are only examples; other existing or future ways of establishing or updating the label pattern dictionary, if applicable to the invention, also fall within the scope of protection of the invention and are incorporated herein by reference.
The title acquisition device 13 obtains the object title of the target object to be processed through an API provided by a third-party device such as a browser or search engine. Alternatively, it uses dynamic web page technologies such as ASP or JSP to obtain the object title of a target object that a user uploads from user equipment such as a PC, and takes it as the object title of the target object to be processed. The target object includes, but is not limited to, information for sharing that a user uploads in media formats such as documents, videos, pictures and logs, or any combination of them. For example, suppose that after logging in to Baidu Wenku at http://wenku.baidu.com/, user A uploads two PDF documents: document1, titled title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", and document2, titled title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". Through the API provided by Baidu Wenku, the title acquisition device 13 can then obtain the object titles "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" and "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of the target objects that user A uploaded from the PC.
Those skilled in the art will understand that the above ways of obtaining the object title of a target object to be processed are only examples; other existing or future ways of obtaining it, if applicable to the invention, also fall within the scope of protection of the invention and are incorporated herein by reference.
The filtering device 14 filters the object title according to the label pattern dictionary, for example by removing from the object title any label pattern information that corresponds to a label pattern in the dictionary. Continuing the example, the filtering device 14 uses the label pattern dictionary established by the dictionary establishing device 12 to filter the object titles that the title acquisition device 13 obtained for the two documents uploaded by user A: title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" of document1 and title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2. From title2 it removes the label pattern information "Chapter 5". Title1 contains no label pattern information corresponding to any label pattern in the dictionary, so the filtering device 14 leaves title1 unfiltered.
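A minimal sketch of this filtering step, assuming each "_" in a dictionary pattern stands for exactly one digit (consistent with patterns such as "____-__-__") and that matching is case-insensitive to mirror case normalization. The helper names are hypothetical, and spelled-out numerals such as "Seven" are not handled here.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a label pattern such as 'Chapter _' into a regex in which each
    '_' stands for one digit; the lookarounds stop 'Chapter _' from matching
    the first digit of 'Chapter 12'."""
    body = "".join(r"\d" if ch == "_" else re.escape(ch) for ch in pattern)
    return re.compile(r"(?<!\d)" + body + r"(?!\d)", re.IGNORECASE)


def filter_title(title: str, label_patterns: Iterable[str]) -> str:
    """Remove every span of `title` that matches a dictionary label pattern."""
    for pattern in label_patterns:
        title = pattern_to_regex(pattern).sub(" ", title)
    return " ".join(title.split())  # collapse leftover whitespace


LABEL_PATTERNS = ["Chapter _", "____-__-__", "Experiment _", "Chapter __", "(_)"]  # Table 2
title1 = "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel"
title2 = "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller"
print(filter_title(title1, LABEL_PATTERNS))  # unchanged: no label pattern present
print(filter_title(title2, LABEL_PATTERNS))
# On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller
```

Because the lookarounds anchor each pattern to whole digit runs, "MCS-51" survives the filter even though it contains digits.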
Those skilled in the art will understand that the above ways of filtering the object title are only examples; other existing or future ways of filtering the object title, if applicable to the invention, also fall within the scope of protection of the invention and are incorporated herein by reference.
The characterization determining device 15 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Specifically, it first performs word segmentation on the filtered object title to obtain its title words, and then determines the object characterization information from the word-related information of those title words. The word-related information includes, but is not limited to, at least one of the following: 1) the word frequency of each title word in the object title, which can be obtained by querying a word frequency database; the database can be preset, or built by counting the title words in a plurality of training titles; 2) the number of title words in the object title; 3) the number of characters in the object title. The object characterization information represents the quality of the object title: it reflects how well the object title characterizes the content of the target object and serves as a measure of whether the title describes that content well. It can be expressed quantitatively, for example as a numerical value, or qualitatively, for example as high or low.
For example, after the filtering device 14 filters title2 of document2, "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller", it obtains the filtered object title title2' "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". The characterization determining device 15 first performs word segmentation on title2' and obtains the bag-of-words information "MCS-51 / interface / microcontroller / interrupt", i.e., the title words of title2'. It then determines the object characterization information of the object title with respect to the target object from the word-related information of these title words. Suppose the word frequencies of the title words "MCS-51", "interface", "microcontroller" and "interrupt" are 9486, 503200, 664560 and 432598 respectively. The title words "interface", "microcontroller" and "interrupt" then exceed a predetermined threshold such as 400000, so the characterization determining device 15 can determine that the object characterization information of title2 with respect to document2 is high. As another example, suppose the word frequencies of the four title words are 9486, 303200, 264560 and 392598 respectively. None of them exceeds the threshold of 400000, but the number of title words satisfies the condition of being greater than or equal to a predetermined threshold of 4, so the characterization determining device 15 can still determine that the object characterization information of title2 with respect to document2 is high. As a further example, if none of the title words had a word frequency above the predetermined threshold such as 400000 and the number of title words also fell short of the predetermined threshold of 4, the characterization determining device 15 would determine that the object characterization information of title2 with respect to document2 is low. The word frequency database can be located in the information determining apparatus 1, or in network equipment connected to the information determining apparatus 1 through a network.
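The decision rule in these three examples can be sketched as follows, under stated assumptions. The rule is read off the examples: a title is "high" if any title word's frequency exceeds the frequency threshold, or if the number of title words reaches the count threshold; otherwise it is "low". The character-count criterion is omitted. Word segmentation is assumed to have happened already; Chinese titles would need a real word segmenter, and `str.split` is used only to build a toy frequency table. All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Mapping


def build_word_frequencies(titles: Iterable[str],
                           segment: Callable[[str], List[str]]) -> Counter:
    """Count title words over training titles, one way to fill the word
    frequency database mentioned above."""
    freq: Counter = Counter()
    for title in titles:
        freq.update(segment(title))
    return freq


def characterize(title_words: List[str],
                 word_freq: Mapping[str, int],
                 freq_threshold: int = 400_000,
                 count_threshold: int = 4) -> str:
    """Return 'high' if any title word is frequent enough or the title has
    enough words; otherwise return 'low'."""
    has_frequent_word = any(word_freq.get(w, 0) > freq_threshold for w in title_words)
    has_enough_words = len(title_words) >= count_threshold
    return "high" if has_frequent_word or has_enough_words else "low"


words = ["MCS-51", "interface", "microcontroller", "interrupt"]  # segmented title2'
freq_a = dict(zip(words, [9486, 503200, 664560, 432598]))
freq_b = dict(zip(words, [9486, 303200, 264560, 392598]))
print(characterize(words, freq_a))      # high (three words exceed 400000)
print(characterize(words, freq_b))      # high (four words >= 4)
print(characterize(words[:2], freq_b))  # low (no frequent word, only two words)
print(build_word_frequencies(["a b", "b c"], str.split)["b"])  # 2
```

Words missing from the frequency table are treated as frequency 0, so an unknown word never makes a title "high" by itself.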
Because the filtered object title presents content that reflects the title's real quality more closely, determining the object characterization information from the word-related information of the title words in the filtered title achieves a low-quality title identification rate of 93% and a recognition accuracy of 91%.
Those skilled in the art will understand that the above ways of determining the object characterization information of the object title with respect to the target object are only examples; other existing or future ways of determining it, if applicable to the invention, also fall within the scope of protection of the invention and are incorporated herein by reference.
The devices of the information determining device 1 work continuously. Specifically, the training acquisition device 11 continuously obtains multiple training titles; the dictionary establishing device 12 continuously establishes or updates the corresponding label pattern dictionary according to the label pattern information in the training titles, where the label pattern dictionary includes one or more label patterns and their frequency information; the title acquisition device 13 continuously obtains the object title of a target object to be processed; the filtering processing device 14 continuously filters the object title according to the label pattern dictionary; and the characterization determining device 15 continuously determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Those skilled in the art will understand that "continuously" means that the devices of the information determining device 1 keep obtaining training titles, establishing or updating the label pattern dictionary, obtaining object titles, filtering them, and determining object characterization information, until the information determining device 1 stops obtaining object titles for an extended period.
Preferably, the information determining device 1 further includes a preprocessing device (not shown). The preprocessing device preprocesses the filtered object title to obtain a preprocessed object title, and the characterization determining device 15 then determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the preprocessed object title.
Specifically, the preprocessing device preprocesses the filtered object title to obtain the preprocessed object title. The preprocessing includes, but is not limited to, at least one of the following:
1) punctuation denoising, i.e., removing the punctuation marks from the filtered object title;
2) ASCII symbol removal, while retaining, according to a predetermined foreign-language dictionary, any foreign words in the filtered object title that appear in that dictionary. The predetermined foreign-language dictionary may be preset, for example an existing English dictionary that collects English vocabulary in a certain order, with explanations, for people to consult; it may also be obtained by statistical analysis of the title words in multiple English training titles.
For example, for the object title title1 "LTE Physical Downlink Control Channel Blind Detection Process Study" of document document1, the filtering processing device 14 filters title1 and obtains the filtered object title title1' "LTE Physical Downlink Control Channel Blind Detection Process Study". The preprocessing device then preprocesses title1'; assuming that the English word "LTE" in title1' is present in the predetermined foreign-language dictionary, the preprocessed object title title1'' is "LTE Physical Downlink Control Channel Blind Detection Process Study". As another example, for the object title title2 "Chapter 5: On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2, the filtering processing device 14 filters title2 and obtains the filtered object title title2' "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". The preprocessing device then preprocesses title2'; assuming that the English word "MCS-51" in title2' is not present in the predetermined foreign-language dictionary, the preprocessed object title title2'' is "On-chip Interfaces and Interrupts of the Series Microcontroller".
Those skilled in the art will understand that the above manner of preprocessing the filtered object title is only an example. Other existing or future manners of preprocessing the filtered object title, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The characterization determining device 15 then determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the preprocessed object title. This is done in the same or a similar way as the determination described above, which is based on the word-related information of the title words in the filtered object title; for brevity, it is not repeated here and is incorporated herein by reference.
In another preferred embodiment, the above information determining device 1 for determining the object characterization information of an object title with respect to the corresponding target object may be combined with an existing browser to form a new browser. Existing browsers include, for example, Internet Explorer from Microsoft, Netscape from Netscape, Firefox from Mozilla, Chrome from Google, Maxthon from Maxthon, Opera from Opera, 360 Browser from Qihoo 360, Sogou Browser from Sohu, and Tencent TT Browser from Tencent.
In another preferred embodiment, the above information determining device 1 for determining the object characterization information of an object title with respect to the corresponding target object may be combined with an existing browser plug-in to form a new browser plug-in. Existing browser plug-ins include, for example, the Flash plug-in, the RealPlayer plug-in, the MMS plug-in, the MIDI staff-notation plug-in, and ActiveX plug-ins.
Fig. 2 shows a schematic diagram of a device for determining the object characterization information of an object title with respect to a corresponding target object according to a preferred embodiment of the present invention. The information determining device 1 includes a training acquisition device 11', a dictionary establishing device 12', a title acquisition device 13', a filtering processing device 14', a characterization determining device 15', an optimization determining device 16', and a providing device 17'. Specifically, the training acquisition device 11' obtains multiple training titles; the dictionary establishing device 12' establishes or updates the corresponding label pattern dictionary according to the label pattern information in the training titles, where the label pattern dictionary includes one or more label patterns and their frequency information; the title acquisition device 13' obtains the object title of a target object to be processed; the filtering processing device 14' filters the object title according to the label pattern dictionary; the characterization determining device 15' determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title; when the object characterization information is lower than predetermined characterization threshold information, the optimization determining device 16' determines optimization indication information for the object title; and the providing device 17' provides the optimization indication information to the user corresponding to the target object. The training acquisition device 11', dictionary establishing device 12', title acquisition device 13', filtering processing device 14', and characterization determining device 15' are the same as or similar to the corresponding devices in the embodiment of Fig. 1; for brevity, they are not described again here and are incorporated herein by reference.
Specifically, when the object characterization information is lower than the predetermined characterization threshold information, the optimization determining device 16' determines the optimization indication information for the object title. The optimization indication information is information that instructs the user how to modify and optimize the object title to obtain a high-quality title, such as revision suggestions for the object title. The optimization determining device 16' determines the optimization indication information in at least one of the following ways, including but not limited to:
1) Determining the optimization indication information according to the summary information of the target object. Specifically, the optimization determining device 16' may first perform semantic analysis on the summary information of the target object to obtain one or more summary keywords, then run a matching query against a title term library with those keywords, and determine the optimization indication information according to the query result. For example, assume that the title acquisition device 13' obtains the following target object object-document to be processed:
Title (title): Experiment 7: Network Sniffing
Abstract (abstract): Sniff data packets with the Ethereal Sniffer software, and judge the network condition from the sniffed packets.
Body content (content): [Experimental principle] Network monitoring is a common passive network attack method. It can help an intruder easily obtain information that is hard to obtain by other methods, including user passwords, accounts, sensitive data, IP addresses, routing information, TCP socket numbers, etc. ...
Assume that the characterization determining device 15' determines that the object characterization information of the title title with respect to the target object object-document is lower than the predetermined characterization threshold information. The optimization determining device 16' may then first perform semantic analysis on the abstract of object-document to obtain one or more summary keywords, such as "Ethereal sniff packet network condition", and then query the title term library with these keywords, determining the optimization indication information according to the query result. If a title term matching the summary keywords "Ethereal sniff packet network condition" is found in the title term library, and/or the number of summary keywords that match title terms in the library, as a proportion of the total number of summary keywords, reaches a predetermined threshold such as 0.8, the optimization indication information determined by the optimization determining device 16' includes "The object title can be optimized in combination with the summary information"; otherwise, it includes "It is suggested that the object title be optimized". The title term library may be located in the information determining device 1 or in a network device connected to the information determining device 1 over a network.
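Method 1) can be sketched as a keyword-ratio check. The sketch assumes that semantic analysis has already produced the summary keywords, that the title term library is a set of strings matched exactly, and that only the ratio test (threshold 0.8) decides the outcome; the text also allows a match-existence test ("and/or"), which this sketch omits. The function names and the sample library are hypothetical; the message strings mirror the two indications quoted above.

```python
from typing import AbstractSet, Sequence

CAN_OPTIMIZE_WITH_SUMMARY = "The object title can be optimized in combination with the summary information"
SUGGEST_OPTIMIZING = "It is suggested that the object title be optimized"


def keyword_match_ratio(keywords: Sequence[str], title_lexicon: AbstractSet[str]) -> float:
    """Fraction of keywords that appear in the title term library (0.0 if there are no keywords)."""
    if not keywords:
        return 0.0
    matched = sum(1 for kw in keywords if kw in title_lexicon)
    return matched / len(keywords)


def summary_indication(
    summary_keywords: Sequence[str],
    title_lexicon: AbstractSet[str],
    ratio_threshold: float = 0.8,
) -> str:
    """Optimization indication derived from the summary keywords (method 1)."""
    if keyword_match_ratio(summary_keywords, title_lexicon) >= ratio_threshold:
        return CAN_OPTIMIZE_WITH_SUMMARY
    return SUGGEST_OPTIMIZING


# Hypothetical library containing 4 of the 5 summary keywords, so the ratio is 0.8.
keywords = ["Ethereal", "sniff", "packet", "network", "condition"]
library = {"Ethereal", "sniff", "packet", "network", "protocol"}
print(summary_indication(keywords, library))
```

With 4 of 5 keywords matched, the ratio meets the 0.8 threshold and the sketch returns the "can be optimized in combination with the summary information" message.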
2) Determining the optimization indication information according to the degree of relevance between the object title and the body content information of the target object, combined with the number of body words in that body content information. Specifically, the optimization determining device 16' may first determine the degree of relevance between the object title and the body content information of the target object, for example from the number of title words of the object title that match the body words of the body content information, or from the degree of matching between the object title and the body content information. The optimization determining device 16' then determines the optimization indication information according to this degree of relevance, combined with the number of body words in the body content information. Continuing the example above, the optimization determining device 16' first performs semantic analysis on the object title and on the body content information of the target object, obtaining the title word information "Network Sniffing" for the object title and the body word information "network monitoring sniff packet network-card experiment service configuration" for the body content. It then determines the degree of relevance between the object title and the body content information from the number of title words that match the body word information, for example by taking as the degree of relevance the ratio of the number of matching title words to the total number of title words. Assuming that this ratio is 100% for the title word information "Network Sniffing" of the title title, the optimization determining device 16' determines that the degree of relevance between the object title and the body content information of the target object is 1. Then, according to this degree of relevance of 1, combined with the number of body words in the body content content of object-document (assume, for example, that there are 20 body words), the optimization determining device 16' determines the optimization indication information, for example "The object title can be optimized in combination with the body content information"; otherwise, the optimization indication information it determines includes "It is suggested that the object title be optimized".
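Method 2) can be sketched the same way. The relevance is computed as in the example, as matched title words over all title words, but with exact string matching in place of semantic matching. The text does not say how the body-word count is combined with the relevance, so the minimum relevance and minimum body-word count below are hypothetical parameters, as are the function names and sample words.

```python
from typing import Sequence

CAN_OPTIMIZE_WITH_BODY = "The object title can be optimized in combination with the body content information"
SUGGEST_OPTIMIZING = "It is suggested that the object title be optimized"


def title_body_relevance(title_words: Sequence[str], body_words: Sequence[str]) -> float:
    """Share of title words that also occur among the body words (0.0 for an empty title)."""
    if not title_words:
        return 0.0
    body = set(body_words)
    return sum(1 for w in title_words if w in body) / len(title_words)


def body_indication(
    title_words: Sequence[str],
    body_words: Sequence[str],
    min_relevance: float = 0.5,  # hypothetical threshold
    min_body_words: int = 10,  # hypothetical threshold
) -> str:
    """Optimization indication from title/body relevance plus body length (method 2)."""
    relevance = title_body_relevance(title_words, body_words)
    if relevance >= min_relevance and len(body_words) >= min_body_words:
        return CAN_OPTIMIZE_WITH_BODY
    return SUGGEST_OPTIMIZING


title = ["network", "sniffing"]
# 5 illustrative body words plus 15 placeholders: 20 body words in total.
body = ["network", "monitoring", "sniffing", "packet", "network-card"] + [f"word{i}" for i in range(15)]
print(title_body_relevance(title, body), len(body))  # 1.0 20
print(body_indication(title, body))
```

Requiring a minimum body length guards against recommending body-based optimization when the body is too short to supply useful title words; that rationale is our assumption.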
By combining the summary information and/or the body content information of the target object, the present invention achieves an accuracy of 100% for the determined optimization indication information.
Those skilled in the art will understand that the above manner of determining the optimization indication information for the object title is only an example. Other existing or future manners of determining the optimization indication information for the object title, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, when the object language type information of the target object is inconsistent with the title language type information of the object title, the optimization determining device 16' may also include in the optimization indication information the reference title information corresponding to the object title in the object language type. For example, if the object language type of the target object object-document is English while the title language type of the object title title is Chinese, the optimization determining device 16' may include the English reference title corresponding to title in the optimization indication information.
The providing device 17' provides the optimization indication information to the user corresponding to the target object, for example to that user's user equipment for reading and browsing, using dynamic web page technologies such as ASP, JSP, or PHP, or communication protocols such as HTTP or HTTPS.
Preferably, the optimization determining device 16' includes a relevance determining unit (not shown) and an optimization determining unit (not shown). Specifically, when the object characterization information is lower than the predetermined characterization threshold information, the relevance determining unit determines the degree of relevance between the body content information of the target object and the title term library, and the optimization determining unit determines the optimization indication information according to that degree of relevance.
Specifically, when the object characterization information is lower than the predetermined characterization threshold information, the relevance determining unit determines the degree of relevance between the body content information of the target object and the title term library according to the number of content keywords of the body content information that match title terms in the title term library; for example, it takes as the degree of relevance the ratio of the number of matching content keywords to the total number of content keywords. For example, assume that the characterization determining device 15' determines that the object characterization information of the title title with respect to the target object object-document is lower than the predetermined characterization threshold information. The relevance determining unit first performs semantic analysis on the body content content of object-document to obtain its content keywords, such as "network monitoring sniff packet network-card experiment service configuration", and then computes the ratio of the number of content keywords that match title terms in the title term library to the total number of content keywords. If the matching content keywords account for 92% of all content keywords, the relevance determining unit determines that the degree of relevance between the body content content of object-document and the title term library is 0.92.
Those skilled in the art will understand that the above manner of determining the degree of relevance between the body content information of the target object and the title term library is only an example. Other existing or future manners of determining this degree of relevance, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The optimization determining unit then determines the optimization indication information according to the degree of relevance: when the degree of relevance is greater than a predetermined threshold, the optimization indication information includes "The object title can be optimized in combination with the body content information"; otherwise, it includes "It is suggested that the object title be optimized". Continuing the example, the relevance determining unit determines that the degree of relevance between the body content content of object-document and the title term library is 0.92, which is greater than a predetermined threshold such as 0.85. The optimization determining unit therefore determines, according to this degree of relevance of 0.92, that the optimization indication information is "The object title can be optimized in combination with the body content information"; otherwise, the optimization indication information it determines would include "It is suggested that the object title be optimized".
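The relevance determining unit and the optimization determining unit can be sketched together as below. Exact string matching stands in for the semantic analysis, 0.85 is the example threshold from the text, and the placeholder keyword list and library are made up so that 23 of 25 content keywords (92%) match; the function names are hypothetical.

```python
from typing import AbstractSet, Sequence

CAN_OPTIMIZE_WITH_BODY = "The object title can be optimized in combination with the body content information"
SUGGEST_OPTIMIZING = "It is suggested that the object title be optimized"


def content_lexicon_relevance(content_keywords: Sequence[str], title_lexicon: AbstractSet[str]) -> float:
    """Relevance determining unit: matched content keywords / all content keywords."""
    if not content_keywords:
        return 0.0
    matched = sum(1 for kw in content_keywords if kw in title_lexicon)
    return matched / len(content_keywords)


def content_indication(
    content_keywords: Sequence[str],
    title_lexicon: AbstractSet[str],
    threshold: float = 0.85,
) -> str:
    """Optimization determining unit: compare the relevance with the threshold."""
    if content_lexicon_relevance(content_keywords, title_lexicon) > threshold:
        return CAN_OPTIMIZE_WITH_BODY
    return SUGGEST_OPTIMIZING


# Placeholder keywords: 23 of the 25 are present in the hypothetical library.
keywords = [f"kw{i}" for i in range(25)]
library = set(keywords[:23])
print(round(content_lexicon_relevance(keywords, library), 2))  # 0.92
print(content_indication(keywords, library))
```

Note that the text uses a strict "greater than" comparison here, whereas method 1 uses "meets" (greater than or equal); the sketch follows each wording.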
Fig. 3 shows a flowchart of a method for determining the object characterization information of an object title with respect to a corresponding target object according to another aspect of the present invention.
Specifically, in step S1, the information determining device 1 obtains multiple training titles; in step S2, the information determining device 1 establishes or updates the corresponding label pattern dictionary according to the label pattern information in the training titles, where the label pattern dictionary includes one or more label patterns and their frequency information; in step S3, the information determining device 1 obtains the object title of a target object to be processed; in step S4, the information determining device 1 filters the object title according to the label pattern dictionary; and in step S5, the information determining device 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Here,
the information determining device 1 includes, but is not limited to: 1) a network platform or terminal platform that not only provides storage space to its logged-in users so that they can upload and share target objects such as documents, videos, and pictures, but also lets users read online, download, and exchange target objects shared by other users, such as Baidu Wenku, Docin, Sina iAsk, and Doc88, where the terminal platform includes, but is not limited to, user equipment such as mobile terminals and PCs; 2) a network platform or terminal platform that provides its logged-in users with information access, information sharing, information publishing, or synchronization, such as third-party websites including social networking sites, forums, personal spaces, blogs, and microblogs. The information determining device 1 includes, but is not limited to, a network device, a user equipment, or a device formed by integrating a network device and a user equipment through a network. The network device may be implemented by, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers; alternatively, the information determining device 1 may be implemented by a user equipment. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The user equipment may be any electronic product capable of human-computer interaction with a user through a keyboard, mouse, touchpad, touchscreen, handwriting device, or the like, such as a computer, mobile phone, PDA, palmtop computer (PPC), or tablet. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, and wireless ad hoc networks. Those skilled in the art will understand that the above information determining device 1 is only an example; other existing or future network devices or user equipment, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference. Both the network device and the user equipment are electronic devices that automatically perform numerical computation and information processing according to preset or stored instructions; their hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), and embedded devices.
Specifically, in step S1, the information determining device 1 obtains multiple training titles through application programming interfaces (APIs) provided by third-party devices such as browsers and search engines; alternatively, it first obtains multiple user upload logs through an API for obtaining user upload logs provided by a third-party device such as a search engine or browser, and then extracts multiple training titles from these logs. For example, in step S1, the information determining device 1 obtains multiple user upload logs, recording for instance which documents, videos, or pictures users uploaded within a certain period, through an upload-log API provided by a browser; then, in step S1, it extracts from these logs multiple training titles, such as the following training titles I to VIII:
I "Chapter 6 Serial Interface Spring 2010"
II "Page Ranking Algorithm Based on Segment"
III "Chapter 8 Application Layer"
IV "5-5_Minimum-Cost Maximum-Flow Problem-xfj"
V "3-6 Angular Momentum of a Particle and the Angular Momentum Theorem-1"
VI "2011-12 ground knot"
VII "Experiment 7 Network Sniffing"
VIII "WEB Page Block Algorithm for Mobile Devices"
...
Those skilled in the art will understand that the above manner of obtaining multiple training titles is only an example. Other existing or future manners of obtaining multiple training titles, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step s 2, information determine equipment 1 according to the plurality of training title in label pattern information, set up or more
Newly corresponding label pattern dictionary, wherein, described label pattern dictionary includes one or more label patterns and its frequency information.
Specifically, in step S2, information determination device 1 may first normalize the multiple training titles. It then performs label processing on the label pattern information in the normalized training titles to determine the one or more label patterns corresponding to them, and finally performs statistical processing on these label patterns to obtain the label pattern dictionary. Here, the normalization includes, but is not limited to, at least one of the following: 1) letter-case normalization, i.e., converting the letters in the training titles to a uniform case; 2) full-width/half-width normalization of the characters in the training titles. The label pattern information denotes the parts of a training title that carry no essential meaning, such as a mark of the chapter or section to which the training title belongs, or a time contained in the training title, for example "Chapter 6", "Section 2.1", "Experiment 7", "3-6" or "2011-12". Those skilled in the art will understand that the above label pattern information and normalization methods are merely examples. Other existing or future label pattern information or normalization methods, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
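The two normalizations above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes lowercase as the unified letter case and uses Unicode NFKC normalization for the full-width/half-width step. Note that NFKC also folds some other compatibility characters, such as superscript digits. The helper name `normalize_title` is hypothetical.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Normalize a training title (hypothetical helper).

    Full-width/half-width: NFKC maps full-width forms such as 'Ｃ', '６'
    and the ideographic space U+3000 to their half-width equivalents.
    Letter case: lower() converts every letter to one uniform case.
    """
    half_width = unicodedata.normalize("NFKC", title)
    return half_width.lower()


if __name__ == "__main__":
    print(normalize_title("Ｃｈａｐｔｅｒ　６ ＬＴＥ"))  # prints "chapter 6 lte"
```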
For example, continuing the above example, in step S2, information determination device 1 normalizes the training titles I to VIII obtained in step S1. It then performs label processing on the label pattern information in the normalized training titles I to VIII, for example by replacing numerals with the character "_", to determine the one or more label patterns corresponding to these training titles. Suppose it finds that training titles II and VIII contain no label pattern, while training titles I and III to VII contain the label patterns "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _", respectively. Then, in step S2, information determination device 1 performs statistical processing on these label patterns and stores each label pattern together with its frequency information in the label pattern dictionary, obtaining, for example, the label pattern dictionary shown in Table 3 below. The label pattern dictionary includes one or more label patterns and their frequency information. It may be updated in a certain manner, for example periodically at a predetermined cycle, at scheduled times, or immediately:
Table 3
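The label processing and statistics in this example can be sketched as follows. The source does not say how the label portion of a title is located. This sketch therefore assumes that the label segments (such as "Chapter 6" or "2011-12") have already been extracted, with an empty string for titles that have none. Each numeral character is replaced by one "_", consistent with Table 4 listing one-digit and two-digit chapter patterns separately, and the resulting patterns are then counted. Treating Chinese numerals as numerals, the segment values, and the names `NUMERALS`, `to_label_pattern` and `build_label_pattern_dictionary` are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable

# Assumed numeral set: ASCII digits plus common Chinese numerals.
NUMERALS = set("0123456789零〇一二三四五六七八九十")


def to_label_pattern(segment: str) -> str:
    """Replace each numeral character with '_', e.g. '2011-12' -> '____-__'."""
    return "".join("_" if ch in NUMERALS else ch for ch in segment)


def build_label_pattern_dictionary(segments: Iterable[str]) -> Counter:
    """Count how often each label pattern occurs among the label segments.

    Empty strings stand for training titles without a label segment
    and are skipped.
    """
    return Counter(to_label_pattern(s) for s in segments if s)


if __name__ == "__main__":
    # Illustrative label segments of training titles I to VIII
    # (II and VIII have none).
    segments = ["Chapter 6", "", "Chapter 3", "3-6", "2-1",
                "2011-12", "Experiment 7", ""]
    print(build_label_pattern_dictionary(segments))
```

Because each digit maps to its own "_", "Chapter 6" and "Chapter 12" produce different patterns. This choice keeps the filter in step S4 from confusing them.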
Preferably, in step S2, information determination device 1 may instead first perform label processing on the training titles I to VIII obtained in step S1 to determine the one or more label patterns corresponding to them. It then performs statistical processing on these label patterns to obtain a corresponding initial label pattern dictionary, which includes the label patterns contained in the training titles and their frequency information. Finally, it screens the label patterns in the initial label pattern dictionary according to the frequency information to obtain the label pattern dictionary. For example, continuing the above example, in step S2, information determination device 1 first performs label processing on the training titles, for example replacing numerals with the character "_", to determine the corresponding label patterns. Training titles II and VIII contain no label pattern, while training titles I and III to VII contain "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _", respectively. Next, it performs statistical processing on these label patterns to obtain the corresponding initial label pattern dictionary, which includes the label patterns contained in the training titles and their frequency information, such as the one shown in Table 3 above. It then screens the label patterns in the initial label pattern dictionary according to the frequency information, for example removing label patterns whose frequency is below a predetermined threshold such as 50000. This yields the label pattern dictionary shown in Table 4:
Label pattern | Frequency information |
Chapter _ | 449291 |
____-__-__ | 144205 |
Experiment _ | 90522 |
Chapter __ | 80418 |
(_) | 57856 |
Table 4
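The screening step can be sketched as a simple frequency filter. This sketch assumes the initial dictionary is a plain mapping from pattern to count. Because the source removes patterns whose frequency is below the threshold, a pattern exactly at the threshold is kept. The function name is hypothetical. In the usage lines, the first two counts come from Table 4, while the third entry and its count are made up to show a removal.

```python
from typing import Dict


def screen_label_patterns(initial: Dict[str, int],
                          min_frequency: int = 50000) -> Dict[str, int]:
    """Drop label patterns whose frequency is below min_frequency."""
    return {pattern: freq for pattern, freq in initial.items()
            if freq >= min_frequency}


if __name__ == "__main__":
    initial = {
        "Chapter _": 449291,    # from Table 4
        "Experiment _": 90522,  # from Table 4
        "Section _._": 1200,    # made-up low-frequency entry
    }
    print(screen_label_patterns(initial))
    # prints {'Chapter _': 449291, 'Experiment _': 90522}
```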
Those skilled in the art will understand that the above ways of creating or updating the corresponding label pattern dictionary are merely examples. Other existing or future ways of creating or updating the corresponding label pattern dictionary, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S3, information determination device 1 obtains the object title of a target object to be processed. It can do so through an application programming interface (API) provided by a third-party device such as a browser or search engine. Alternatively, it can use dynamic web page technologies such as ASP or JSP to obtain the object title of a target object that a user uploaded from the user's device, such as a PC, and take it as the object title of the target object to be processed. Here, the target object includes, but is not limited to, information for sharing that a user uploads in a media format such as a document, video, picture or log, or in a combination of one or more of these. For example, suppose user A logs in to Baidu Wenku at http://wenku.baidu.com/ and uploads two PDF documents. The first, document1, has the title title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel". The second, document2, has the title title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". Then, in step S3, information determination device 1 can use the API provided by Baidu Wenku to obtain these two object titles of the target objects that user A uploaded from the PC.
Those skilled in the art will understand that the above ways of obtaining the object title of a target object to be processed are merely examples. Other existing or future ways of obtaining the object title of a target object to be processed, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S4, information determination device 1 filters the object title according to the label pattern dictionary, for example by removing any label pattern information in the object title that matches a label pattern in the dictionary. For example, continuing the above example, in step S4, information determination device 1 uses the label pattern dictionary established in step S2 to filter the two object titles that user A uploaded and that were obtained in step S3. These are title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" of document1 and title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2. From each title, it removes the label pattern information that matches a label pattern in the dictionary. For instance, it removes the label pattern information "Chapter 5" from title2. Title1 contains no label pattern information matching any label pattern in the dictionary, so in step S4 information determination device 1 does not filter title1.
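One way to realize the filtering of step S4 is to compile each label pattern into a regular expression in which every "_" matches exactly one numeral, and then delete the matches from the object title. This is a sketch under assumptions the source does not state: the numeral set, deleting every match, and collapsing leftover whitespace are choices made here. The names `pattern_to_regex` and `filter_title` are hypothetical. The lookarounds prevent "Chapter _" from matching only the first digit of "Chapter 12", which keeps one-digit and two-digit patterns distinct.

```python
import re
from typing import Iterable

# Assumed numeral set, matching the one used when building the patterns.
NUMERAL_CLASS = "[0-9零〇一二三四五六七八九十]"


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a label pattern such as 'Chapter _' into a regex.

    Each '_' matches exactly one numeral; the lookbehind/lookahead stop a
    match from starting or ending inside a longer run of numerals.
    """
    body = "".join(NUMERAL_CLASS if ch == "_" else re.escape(ch)
                   for ch in pattern)
    return re.compile(f"(?<!{NUMERAL_CLASS}){body}(?!{NUMERAL_CLASS})")


def filter_title(title: str, patterns: Iterable[str]) -> str:
    """Remove label pattern information from the title, then tidy spaces."""
    for pattern in patterns:
        title = pattern_to_regex(pattern).sub("", title)
    return " ".join(title.split())


if __name__ == "__main__":
    patterns = ["Chapter _", "____-__-__", "Experiment _", "Chapter __", "(_)"]
    title2 = ("Chapter 5 On-chip Interfaces and Interrupts of the "
              "MCS-51 Series Microcontroller")
    print(filter_title(title2, patterns))
    # prints "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller"
```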
Those skilled in the art will understand that the above ways of filtering the object title are merely examples. Other existing or future ways of filtering the object title, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S5, information determination device 1 determines the object characterization information of the object title with respect to the target object, according to the word-related information of the title words in the filtered object title. Specifically, in step S5, information determination device 1 first performs word segmentation on the filtered object title to obtain its title words. It then uses the word-related information of these title words to determine the object characterization information of the object title with respect to the target object. Here, the word-related information includes, but is not limited to, at least one of the following:

1) the word frequency information of the title words in the object title, which can be obtained by querying a word frequency database; the database may be preset, or may be obtained by collecting statistics on the title words in multiple training titles;
2) the number of title words in the object title;
3) the number of characters in the object title.

The object characterization information represents the quality of the object title. It reflects both the object title's ability to characterize the content of the target object and the degree to which the object title characterizes that content well. It can be expressed quantitatively, for example as a numerical value, or qualitatively, for example as high or low.

For example, in step S4, information determination device 1 filters title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2 and obtains the filtered object title title2' "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". Then, in step S5, information determination device 1 first performs word segmentation on title2'. This yields the bag-of-words information "MCS-51 interface microcontroller interrupt", i.e., the title words corresponding to title2'. It then determines the object characterization information according to the word-related information of these title words.

Three cases illustrate the decision:
- Suppose the four title words have word frequencies of 9486, 503200, 664560 and 432598. The title words "interface", "microcontroller" and "interrupt" then exceed a predetermined threshold such as 400000. In step S5, information determination device 1 can determine that the object characterization information of title2 with respect to the target object document2 is high.
- Suppose instead that the word frequencies are 9486, 303200, 264560 and 392598. No title word exceeds the threshold of 400000, but the number of title words meets the condition of being greater than or equal to a predetermined threshold of 4. In step S5, information determination device 1 can still determine that the object characterization information of title2 with respect to document2 is high.
- If no title word had a word frequency exceeding the predetermined threshold such as 400000, and the number of title words also failed to reach the predetermined threshold of 4, then in step S5 information determination device 1 could determine that the object characterization information of title2 with respect to document2 is low.

The word frequency database may be located in information determination device 1, or in a network device connected to information determination device 1 via a network.
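The decision rule illustrated by the three cases above can be sketched as follows. The sketch assumes that the title has already been segmented into words (the segmenter itself is not shown) and that the word frequency database is available as a mapping. The thresholds 400000 and 4 are the example values from the text. Treating a word missing from the database as frequency 0, and the name `characterize_title`, are assumptions.

```python
from typing import Dict, List


def characterize_title(title_words: List[str],
                       word_freq: Dict[str, int],
                       freq_threshold: int = 400000,
                       min_word_count: int = 4) -> str:
    """Return 'high' or 'low' object characterization for a segmented title.

    'high' if any title word's frequency exceeds freq_threshold, or if the
    title has at least min_word_count words; otherwise 'low'.
    Words missing from word_freq count as frequency 0 (an assumption).
    """
    if any(word_freq.get(w, 0) > freq_threshold for w in title_words):
        return "high"
    if len(title_words) >= min_word_count:
        return "high"
    return "low"


if __name__ == "__main__":
    words = ["MCS-51", "interface", "microcontroller", "interrupt"]
    freqs = dict(zip(words, [9486, 303200, 264560, 392598]))
    print(characterize_title(words, freqs))      # prints "high" (4 words)
    print(characterize_title(words[:3], freqs))  # prints "low"
```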
Here, the present invention determines the object characterization information from the word-related information of the title words in the filtered object title. Because the filtered object title provides title content that is closer to the title's real quality, the invention achieves a low-quality title recognition rate of 93% and a recognition accuracy of 91%.
Those skilled in the art will understand that the above ways of determining the object characterization information of the object title with respect to the target object are merely examples. Other existing or future ways of making this determination, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
The steps performed by information determination device 1 work continuously. Specifically:
- in step S1, information determination device 1 continuously obtains multiple training titles;
- in step S2, it continuously creates or updates the corresponding label pattern dictionary according to the label pattern information in the training titles, the dictionary including one or more label patterns and their frequency information;
- in step S3, it continuously obtains the object titles of target objects to be processed;
- in step S4, it continuously filters the object titles according to the label pattern dictionary;
- in step S5, it continuously determines the object characterization information of the object titles with respect to the target objects, according to the word-related information of the title words in the filtered object titles.

Here, those skilled in the art will understand that "continuously" means the steps of information determination device 1 keep performing their work: obtaining training titles, creating or updating the label pattern dictionary, obtaining object titles, filtering object titles, and determining object characterization information. This continues until information determination device 1 stops obtaining object titles for a relatively long time.
Preferably, the method further includes step S8 (not shown). In step S8, information determination device 1 preprocesses the filtered object title to obtain a preprocessed object title. In step S5, information determination device 1 then determines the object characterization information of the object title with respect to the target object, according to the word-related information of the title words in the preprocessed object title.
Specifically, in step S8, information determination device 1 preprocesses the filtered object title to obtain the preprocessed object title. The preprocessing includes, but is not limited to, at least one of the following:

1) punctuation denoising of the filtered object title, i.e., removing punctuation marks from it;
2) ASCII symbol removal on the filtered object title, while retaining any foreign words in the title that appear in a predetermined foreign-language dictionary.

The predetermined foreign-language dictionary may be preset, for example an existing English dictionary that collects English vocabulary, arranges it in a certain order, and explains it for reference. It may also be obtained by collecting statistics on the title words in multiple English training titles.
For example, take title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" of document1. In step S4, information determination device 1 filters title1 and obtains the filtered object title title1', which is unchanged. In step S8, it then preprocesses title1'. Assuming the English word "LTE" in title1' exists in the predetermined foreign-language dictionary, the preprocessed object title obtained in step S8 is title1'' "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel". As another example, take title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2. In step S4, information determination device 1 filters title2 and obtains the filtered object title title2' "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". In step S8, it then preprocesses title2'. Assuming the English word "MCS-51" in title2' does not exist in the predetermined foreign-language dictionary, the preprocessed object title obtained in step S8 is title2'' "On-chip Interfaces and Interrupts of the Series Microcontroller". The original titles are in Chinese, so "LTE" and "MCS-51" are the only ASCII text that the ASCII removal affects.
Those skilled in the art will understand that the above ways of preprocessing the filtered object title are merely examples. Other existing or future ways of preprocessing the filtered object title, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S5, information determination device 1 determines the object characterization information of the object title with respect to the target object, according to the word-related information of the title words in the preprocessed object title. This is done in the same or a similar way as the determination described above for step S5 based on the filtered object title. For brevity, it is not repeated here and is incorporated herein by reference.
Fig. 4 shows a flowchart of a method, according to a preferred embodiment of the present invention, for determining the object characterization information of an object title with respect to a corresponding target object.
The method comprises steps S1', S2', S3', S4', S5', S6' and S7'. Specifically:
- in step S1', information determination device 1 obtains multiple training titles;
- in step S2', it creates or updates a corresponding label pattern dictionary according to the label pattern information in the training titles, the dictionary including one or more label patterns and their frequency information;
- in step S3', it obtains the object title of a target object to be processed;
- in step S4', it filters the object title according to the label pattern dictionary;
- in step S5', it determines the object characterization information of the object title with respect to the target object, according to the word-related information of the title words in the filtered object title;
- in step S6', when the object characterization information is below predetermined characterization threshold information, it determines optimization indication information for the object title;
- in step S7', it provides the optimization indication information to the user corresponding to the target object.

Steps S1' to S5' are the same as or similar to the corresponding steps in the embodiment of Fig. 3. For brevity, they are not repeated here and are incorporated herein by reference.
Specifically, when the object characterization information is below the predetermined characterization threshold information, in step S6' information determination device 1 determines optimization indication information for the object title. The optimization indication information includes information that tells the user how to modify and optimize the object title to obtain a high-quality object title, such as suggestions for revising it. In step S6', the ways in which information determination device 1 determines the optimization indication information include, but are not limited to, at least one of the following:
1) Determining the optimization indication information according to the summary information of the target object. Specifically, in step S6', information determination device 1 may first perform semantic analysis on the summary information of the target object to obtain one or more summary keywords. It then queries a title lexicon for matches to these keywords and determines the optimization indication information according to the query result. For example, suppose that in step S3', information determination device 1 obtains the following target object object-document to be processed:
Title (title): Experiment 7 Network Sniffing
Summary (abstract): Sniff data packets with the Ethereal Sniffer software, and judge network conditions from the sniffed packets.
Body content (content): [Experimental principle] Network monitoring is a common passive network attack method. It can help an intruder easily obtain information that is hard to obtain by other means, including user passwords, accounts, sensitive data, IP addresses, routing information, TCP socket numbers, etc. ......
Suppose that in step S5', information determination device 1 determines that the object characterization information of the title with respect to object-document is below the predetermined characterization threshold information. Then, in step S6', it may first perform semantic analysis on the summary of object-document to obtain one or more summary keywords, such as "Ethereal sniff packet network condition". Next, in step S6', it queries the title lexicon for matches to these summary keywords and determines the optimization indication information according to the query result. Suppose title terms matching the summary keywords "Ethereal sniff packet network condition" are found in the title lexicon, and/or the ratio of matched keywords to the total number of keywords meets a predetermined threshold such as 0.8. In that case, the optimization indication information determined in step S6' includes "The object title can be optimized with reference to the summary". Otherwise, it includes "Optimization of the object title is suggested". The title lexicon may be located in information determination device 1, or in a network device connected to information determination device 1 via a network.
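The keyword-matching rule of way 1) can be sketched as follows. The sketch assumes that the summary keywords have already been extracted by semantic analysis and that "matching" means exact membership in the title lexicon. The source's "and/or" condition is simplified here to the ratio test alone, which, with a positive threshold, also implies at least one match. The function name, the message constants and the toy lexicon in the usage lines are hypothetical.

```python
from typing import AbstractSet, List

SUGGEST_FROM_SUMMARY = "The object title can be optimized with reference to the summary"
SUGGEST_GENERIC = "Optimization of the object title is suggested"


def indication_from_summary(keywords: List[str],
                            title_lexicon: AbstractSet[str],
                            min_ratio: float = 0.8) -> str:
    """Choose an optimization indication from summary-keyword coverage.

    ratio = (keywords found in the title lexicon) / (all keywords)
    """
    if not keywords:
        return SUGGEST_GENERIC
    matched = sum(1 for kw in keywords if kw in title_lexicon)
    if matched / len(keywords) >= min_ratio:
        return SUGGEST_FROM_SUMMARY
    return SUGGEST_GENERIC


if __name__ == "__main__":
    keywords = ["Ethereal", "sniff", "packet", "network", "condition"]
    toy_lexicon = {"Ethereal", "sniff", "packet", "network"}  # 4 of 5 match
    print(indication_from_summary(keywords, toy_lexicon))
```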
2) Determining the optimization indication information according to the relevance between the object title and the body content of the target object, combined with the number of body words in that body content. Specifically, in step S6', information determination device 1 may first determine the relevance between the object title and the body content of the target object. It can do this from the number of title words of the object title that match the body words of the body content, or from the degree of matching between the object title and the body content. In step S6', it then determines the optimization indication information from this relevance together with the number of body words in the body content.

For example, continuing the above example, in step S6', information determination device 1 first performs semantic analysis on the object title and on the body content of the target object. This yields the title words "network sniffing" of the object title and the body words "network monitoring sniff packet network-card experiment service configuration" of the body content. Next, in step S6', it determines the relevance between the object title and the body content from the number of title words that match body words. For example, it takes as the relevance the ratio of the number of matched title words to the total number of title words. Then, in step S6', it determines the optimization indication information from this relevance together with the number of body words. For example, suppose 100% of the title words "network sniffing" match body words of the body content. Information determination device 1 then determines in step S6' that the relevance between the object title and the body content is 1. Next, from this relevance of 1 and the number of body words in the body content of object-document, say 20, it determines optimization indication information such as "The object title can be optimized with reference to the body content". Otherwise, the optimization indication information determined in step S6' includes "Optimization of the object title is suggested".
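Way 2) can be sketched in the same style. Relevance is the share of title words that also occur among the body words. The body-based suggestion is given only when both the relevance and the body length are large enough. The source gives no thresholds for this rule, so `min_relevance=0.8` and `min_body_words=20` are hypothetical defaults chosen to fit the example (relevance 1, 20 body words). Word extraction is assumed to have been done already, and the function and constant names are hypothetical.

```python
from typing import List

SUGGEST_FROM_BODY = "The object title can be optimized with reference to the body content"
SUGGEST_GENERIC = "Optimization of the object title is suggested"


def title_body_relevance(title_words: List[str], body_words: List[str]) -> float:
    """Ratio of title words that also occur among the body words."""
    if not title_words:
        return 0.0
    body = set(body_words)
    return sum(1 for word in title_words if word in body) / len(title_words)


def indication_from_body(title_words: List[str],
                         body_words: List[str],
                         min_relevance: float = 0.8,  # hypothetical threshold
                         min_body_words: int = 20) -> str:  # hypothetical
    """Suggest body-based optimization when relevance and body size suffice."""
    relevance = title_body_relevance(title_words, body_words)
    if relevance >= min_relevance and len(body_words) >= min_body_words:
        return SUGGEST_FROM_BODY
    return SUGGEST_GENERIC


if __name__ == "__main__":
    title = ["network", "sniffing"]
    # Toy body of 20 words that contains both title words.
    body = ["network", "monitoring", "sniffing", "packet"] + ["filler"] * 16
    print(title_body_relevance(title, body))  # prints 1.0
    print(indication_from_body(title, body))
```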
Here, by drawing on the summary information and/or the body content of the target object, the present invention achieves an accuracy of 100% for the determined optimization indication information.
Those skilled in the art will understand that the above ways of determining the optimization indication information for the object title are merely examples. Other existing or future ways of determining the optimization indication information for the object title, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, when the object language type information of the target object is inconsistent with the title language type information of the object title, in step S6', the information determining device 1 may also include in the optimization indication information the reference title information that corresponds to the object title under the object language type. For example, assume that the object language type of the target object (object-document) is English and the title language type of the object title (title) is Chinese. In step S6', the information determining device 1 may then include in the optimization indication information the reference title information corresponding to the object title (title) under that object language type. In other words, the English reference title corresponding to the object title (title) is included in the optimization indication information.
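The language-mismatch rule above can be sketched as follows. Both the language detector and the translator are assumptions, because the description names neither. guess_language is a crude heuristic that treats any CJK Unified Ideograph (U+4E00 to U+9FFF) as Chinese. translate is a caller-supplied function, and in the usage example it is only a hypothetical dictionary lookup. This is an illustrative sketch, not the patent's method.

```python
from typing import Callable, List


def guess_language(text: str) -> str:
    """Crude heuristic: any CJK Unified Ideograph means 'zh', otherwise 'en'."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"


def add_reference_title(hints: List[str], title: str, body: str,
                        translate: Callable[[str, str], str]) -> List[str]:
    """Return the hints plus a reference title in the body's language if languages differ."""
    object_lang = guess_language(body)   # language type of the target object
    if guess_language(title) == object_lang:
        return list(hints)               # languages agree: nothing to add
    reference = translate(title, object_lang)
    return list(hints) + [f"reference title ({object_lang}): {reference}"]


# Usage with a hypothetical lookup-table translator.
TABLE = {("网络嗅探", "en"): "Network sniffing"}
hints = add_reference_title(["it is suggested that the object title be optimized"],
                            "网络嗅探", "This article explains packet sniffing.",
                            lambda text, lang: TABLE[(text, lang)])
```

The function returns a new list rather than mutating its input, so the caller keeps the original hints unchanged.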
In step S7', the information determining device 1 provides the optimization indication information to the user corresponding to the target object, for example to that user's user equipment, for the user to read and browse. Delivery uses dynamic web page technologies such as ASP, JSP or PHP, or other communication protocols such as HTTP or HTTPS.
Preferably, step S6' includes step S61' (not shown) and step S62' (not shown). Specifically, when the object characterization information is lower than the predetermined characterization threshold information, in step S61', the information determining device 1 determines the correlation between the body content information of the target object and the title lexicon. In step S62', the information determining device 1 determines the optimization indication information according to that correlation.
Specifically, when the object characterization information is lower than the predetermined characterization threshold information, in step S61', the information determining device 1 determines the correlation between the body content information of the target object and the title lexicon. It does so from the number of content keywords, derived from the body content information, that match title terms in the title lexicon. For example, the ratio of the number of content keywords matching title terms in the title lexicon to the total number of content keywords is taken as the correlation.

For example, assume that in step S5', the information determining device 1 determines that the object characterization information of the object title (title) with respect to the target object (object-document) is lower than the predetermined characterization threshold information. The correlation determining unit first performs semantic analysis on the body content information (content) of the target object object-document. This yields the content keywords corresponding to content: "network monitoring sniff packet network interface card lab-gown business configuration". Then, in step S61', the information determining device 1 counts how many of these content keywords match title terms in the title lexicon. It takes the ratio of that number to the total number of content keywords as the correlation. Assume, for instance, that the content keywords of content matching title terms in the title lexicon account for 92% of all content keywords. In step S61', the information determining device 1 can then determine that the correlation between the body content information content of the target object object-document and the title lexicon is 0.92.
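The keyword-matching correlation of step S61' reduces to a simple ratio, sketched below under stated assumptions. The semantic analysis that extracts the content keywords is not specified, so the keywords are taken as an already-extracted list. The title lexicon is assumed to be a plain set of title terms. Keywords are counted with repetition, which is one possible reading of "total number of content keywords".

```python
from typing import Iterable, Set


def lexicon_correlation(content_keywords: Iterable[str], title_lexicon: Set[str]) -> float:
    """Matched content keywords / total content keywords, as described for step S61'."""
    keywords = list(content_keywords)
    if not keywords:
        return 0.0  # no body keywords: treat the correlation as zero (an assumption)
    matched = sum(1 for keyword in keywords if keyword in title_lexicon)
    return matched / len(keywords)


# Hypothetical data shaped like the example: 23 of 25 keywords hit the lexicon -> 0.92.
keywords = [f"kw{i}" for i in range(25)]
lexicon = set(keywords[:23])
print(lexicon_correlation(keywords, lexicon))
```

Using a set for the lexicon keeps each membership test constant-time on average, so the cost grows linearly with the number of content keywords.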
Those skilled in the art will understand that the above manner of determining the correlation between the body content information of the target object and the title lexicon is merely an example. Other existing or future manners of determining that correlation, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S62', the information determining device 1 determines the optimization indication information according to the correlation. For example, when the correlation is greater than a predetermined threshold, the optimization indication information includes "the object title can be optimized in combination with the body content information". Otherwise, it includes "it is suggested that the object title be optimized".

Continuing the example above, in step S61' the information determining device 1 determined that the correlation between the body content information content of the target object object-document and the title lexicon is 0.92. This exceeds a predetermined threshold of, for example, 0.85. In step S62', the information determining device 1 therefore determines, according to this correlation of 0.92, optimization indication information such as "the object title can be optimized in combination with the body content information". Otherwise, the optimization indication information determined by the optimization determining unit includes "it is suggested that the object title be optimized".
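Steps S61' and S62' can be combined into one decision function, sketched here. The 0.85 correlation threshold comes from the example above. The value of the predetermined characterization threshold is not given, so characterization_threshold is a hypothetical parameter the caller must supply. The comparison is strictly "greater than", matching the wording of step S62'.

```python
from typing import Optional

HINT_USE_BODY = "the object title can be optimized in combination with the body content information"
HINT_SUGGEST = "it is suggested that the object title be optimized"


def optimization_indication(characterization: float, correlation: float,
                            characterization_threshold: float,
                            correlation_threshold: float = 0.85) -> Optional[str]:
    """Only titles below the characterization threshold receive a hint (steps S61'/S62')."""
    if characterization >= characterization_threshold:
        return None  # title already characterizes the object well enough
    if correlation > correlation_threshold:
        return HINT_USE_BODY  # body content is close to the title vocabulary: reuse it
    return HINT_SUGGEST
```

With the numbers from the example, a low-scoring title and a correlation of 0.92 yield the "combine with the body content" hint, while 0.80 yields the generic suggestion.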
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware. For example, it can be implemented with an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored on a computer-readable recording medium, such as RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention can be implemented in hardware, for example as circuitry that cooperates with a processor to execute each step or function.
In addition, part of the present invention can be embodied as a computer program product, such as computer program instructions. When executed by a computer, these instructions can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored on a fixed or removable recording medium, transmitted as a data stream over broadcast or other signal-bearing media, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing them. When the processor executes the computer program instructions, the apparatus is triggered to run the methods and/or technical solutions of the foregoing embodiments of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments. The present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced by the present invention. Any reference sign in the claims should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or means recited in a device claim may also be implemented by a single unit or means through software or hardware. Words such as "first" and "second" denote names and do not denote any particular order.
Claims (19)
1. A method for determining object characterization information of an object title with respect to a corresponding target object, wherein the method comprises the following steps:
x. obtaining a plurality of training titles;
y. establishing or updating a corresponding label pattern dictionary according to label pattern information in the plurality of training titles, wherein the label pattern dictionary comprises one or more label patterns and their frequency information, and step y comprises:
- performing label processing on the plurality of training titles to determine one or more label patterns corresponding to the plurality of training titles;
- performing statistical processing on the label patterns to obtain a corresponding initial label pattern dictionary, wherein the initial label pattern dictionary comprises the label patterns contained in the plurality of training titles and their corresponding frequency information;
- screening the label patterns in the initial label pattern dictionary according to the frequency information to obtain the label pattern dictionary;
wherein the method further comprises:
a. obtaining an object title of a target object to be processed;
b. filtering the object title according to the label pattern dictionary;
c. determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the filtered object title.
2. The method according to claim 1, wherein the method further comprises:
- preprocessing the filtered object title to obtain a preprocessed object title;
wherein step c comprises:
- determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the preprocessed object title.
3. The method according to claim 1 or 2, wherein the method further comprises:
m. when the object characterization information is lower than predetermined characterization threshold information, determining optimization indication information for the object title;
- providing the optimization indication information to the user corresponding to the target object.
4. The method according to claim 3, wherein step m comprises:
- when the object characterization information is lower than the predetermined characterization threshold information, determining the optimization indication information according to summary information of the target object.
5. The method according to claim 4, wherein step m comprises:
- when the object characterization information is lower than the predetermined characterization threshold information, performing semantic analysis on the summary information of the target object to obtain one or more summary keywords;
- performing a matching query in a title lexicon according to the one or more summary keywords, and determining the optimization indication information according to the query result.
6. The method according to claim 3, wherein step m comprises:
m1. when the object characterization information is lower than the predetermined characterization threshold information, determining a correlation between the body content information of the target object and the title lexicon;
- determining the optimization indication information according to the correlation.
7. The method according to claim 6, wherein step m1 comprises:
- when the object characterization information is lower than the predetermined characterization threshold information, determining the correlation according to the number of content keywords, corresponding to the body content information, that match title terms in the title lexicon.
8. The method according to claim 3, wherein step m further comprises:
- when the object language type information of the target object is inconsistent with the title language type information of the object title, including in the optimization indication information the reference title information corresponding to the object title under the object language type.
9. An information determining device for determining object characterization information of an object title with respect to a corresponding target object, wherein the information determining device comprises:
A training acquisition means, for obtaining a plurality of training titles;
A dictionary establishing means, for establishing or updating a corresponding label pattern dictionary according to label pattern information in the plurality of training titles, wherein the label pattern dictionary comprises one or more label patterns and their frequency information, and the dictionary establishing means is configured to:
- perform label processing on the plurality of training titles to determine one or more label patterns corresponding to the plurality of training titles;
- perform statistical processing on the label patterns to obtain a corresponding initial label pattern dictionary, wherein the initial label pattern dictionary comprises the label patterns contained in the plurality of training titles and their corresponding frequency information;
- screen the label patterns in the initial label pattern dictionary according to the frequency information to obtain the label pattern dictionary;
wherein the information determining device further comprises:
A title acquisition means, for obtaining an object title of a target object to be processed;
A filtering means, for filtering the object title according to the label pattern dictionary;
A characterization determining means, for determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the filtered object title.
10. The information determining device according to claim 9, wherein the information determining device further comprises:
A preprocessing means, for preprocessing the filtered object title to obtain a preprocessed object title;
wherein the characterization determining means is configured to:
- determine the object characterization information of the object title with respect to the target object according to word-related information of the title words in the preprocessed object title.
11. The information determining device according to claim 9 or 10, wherein the information determining device further comprises:
An optimization determining means, for determining optimization indication information for the object title when the object characterization information is lower than predetermined characterization threshold information;
A providing means, for providing the optimization indication information to the user corresponding to the target object.
12. The information determining device according to claim 11, wherein the optimization determining means is configured to:
- when the object characterization information is lower than the predetermined characterization threshold information, determine the optimization indication information according to summary information of the target object.
13. The information determining device according to claim 12, wherein the optimization determining means is configured to:
- when the object characterization information is lower than the predetermined characterization threshold information, perform semantic analysis on the summary information of the target object to obtain one or more summary keywords;
- perform a matching query in a title lexicon according to the one or more summary keywords, and determine the optimization indication information according to the query result.
14. The information determining device according to claim 11, wherein the optimization determining means comprises:
A correlation determining unit, for determining a correlation between the body content information of the target object and the title lexicon when the object characterization information is lower than the predetermined characterization threshold information;
An optimization determining unit, for determining the optimization indication information according to the correlation.
15. The information determining device according to claim 14, wherein the correlation determining unit is configured to:
- when the object characterization information is lower than the predetermined characterization threshold information, determine the correlation according to the number of content keywords, corresponding to the body content information, that match title terms in the title lexicon.
16. The information determining device according to claim 11, wherein the optimization determining means is further configured to:
- when the object language type information of the target object is inconsistent with the title language type information of the object title, include in the optimization indication information the reference title information corresponding to the object title under the object language type.
17. A computer device, comprising the information determining device according to any one of claims 9 to 16.
18. A browser, comprising the information determining device according to any one of claims 9 to 16.
19. A browser plug-in, comprising the information determining device according to any one of claims 9 to 16.
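Steps x, y, a and b of claim 1 can be illustrated with a small sketch. The claims do not define what a label pattern looks like. This sketch assumes a label pattern is a bracketed tag such as 【转载】, [原创] or (图), which is a hypothetical reading. It also assumes that "screening by frequency" means keeping patterns that occur at least min_freq times, where min_freq is a hypothetical parameter. Step c is not shown because the claims leave the word-related information unspecified.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumption: a "label pattern" is a bracketed tag in one of these four bracket styles.
LABEL_RE = re.compile(r"【[^】]*】|\[[^\]]*\]|（[^）]*）|\([^)]*\)")


def label_patterns(title: str) -> List[str]:
    """Label processing: extract every bracketed tag from one title."""
    return LABEL_RE.findall(title)


def build_label_dictionary(training_titles: Iterable[str], min_freq: int = 2) -> Dict[str, int]:
    """Count patterns (initial dictionary), then keep only frequent ones (screening)."""
    initial = Counter()
    for title in training_titles:
        initial.update(label_patterns(title))
    return {pattern: freq for pattern, freq in initial.items() if freq >= min_freq}


def filter_title(title: str, dictionary: Dict[str, int]) -> str:
    """Filtering (step b): strip dictionary patterns, then normalize whitespace."""
    for pattern in dictionary:
        title = title.replace(pattern, "")
    return " ".join(title.split())


training = ["[转载] 网络嗅探入门", "[转载] 网卡配置", "[原创] 数据包分析", "(图) 网络监控"]
dictionary = build_label_dictionary(training)  # only "[转载]" occurs at least twice
```

Frequency screening keeps tags that recur across many titles, which are the ones most likely to be boilerplate rather than meaningful title words. Rare tags such as [原创] survive filtering in this sketch.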
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310260162.4A CN103383697B (en) | 2013-06-26 | 2013-06-26 | Method and equipment for determining object representation information of object header |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103383697A CN103383697A (en) | 2013-11-06 |
CN103383697B true CN103383697B (en) | 2017-02-15 |
Family
ID=49491487
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310260162.4A Active CN103383697B (en) | 2013-06-26 | 2013-06-26 | Method and equipment for determining object representation information of object header |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103383697B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105630909A (en) * | 2015-12-21 | 2016-06-01 | 北京奇虎科技有限公司 | Method and device for displaying normalized header information |
CN109740130B (en) * | 2018-11-22 | 2022-12-09 | 厦门市美亚柏科信息股份有限公司 | Method and device for generating file |
CN109729348B (en) * | 2019-03-07 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Method, device and equipment for determining video quality |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315624A (en) * | 2007-05-29 | 2008-12-03 | 阿里巴巴集团控股有限公司 | Text subject recommending method and device |
CN102737017A (en) * | 2011-03-31 | 2012-10-17 | 北京百度网讯科技有限公司 | Method and apparatus for extracting page theme |
EP2546760A1 (en) * | 2011-07-11 | 2013-01-16 | Accenture Global Services Limited | Provision of user input in systems for jointly discovering topics and sentiment |
CN103136352A (en) * | 2013-02-27 | 2013-06-05 | 华中师范大学 | Full-text retrieval system based on two-level semantic analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103294781B (en) | | Method and apparatus for processing page data |
US10776885B2 (en) | | Mutually reinforcing ranking of social media accounts and contents |
US20110167053A1 (en) | | Visual and multi-dimensional search |
US20080005091A1 (en) | | Visual and multi-dimensional search |
CN107784092A (en) | | Method, server and computer-readable medium for recommending hot words |
CN107220386A (en) | | Information pushing method and device |
US8949227B2 (en) | | System and method for matching entities and synonym group organizer used therein |
JP2017157192A (en) | | Method of matching between image and content item based on keyword |
CN107609152A (en) | | Method and apparatus for expanding a query |
CN107346326A (en) | | Method and system for generating a neural network model |
CN103514191A (en) | | Method and device for determining keyword matching mode of target promotion information |
CN111813905B (en) | | Corpus generation method, corpus generation device, computer equipment and storage medium |
CN104035972B (en) | | Knowledge recommendation method and system based on microblogs |
CN103544178A (en) | | Method and equipment for providing reconstruction page corresponding to target page |
CN103399862B (en) | | Method and apparatus for determining search index information corresponding to a target query sequence |
CN105677931A (en) | | Information search method and device |
CN109947952A (en) | | Search method, device, equipment and storage medium based on English knowledge graph |
CN107766234A (en) | | Method, apparatus and system for assessing web page health based on a mobile device |
CN109918648B (en) | | Rumor depth detection method based on dynamic sliding window feature score |
CN109033282A (en) | | Web page text extraction method and device based on extraction template |
US10127322B2 (en) | | Efficient retrieval of fresh internet content |
CN110096681A (en) | | Contract terms analysis method, device, equipment and readable storage medium |
JP2017157193A (en) | | Method of selecting image that matches with content based on metadata of image and content |
CN103383697B (en) | | Method and equipment for determining object representation information of object header |
CN103257975A (en) | | Search method, search device and search system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |