CN103383697B - Method and equipment for determining object representation information of object header - Google Patents


Info

Publication number
CN103383697B
Authority
CN
China
Prior art keywords
information
title
titles
determines
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310260162.4A
Other languages
Chinese (zh)
Other versions
CN103383697A (en)
Inventor
徐兴军
潘昕婷
李成洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310260162.4A
Publication of CN103383697A
Application granted
Publication of CN103383697B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention aims to provide a method and equipment for determining object representation information of an object header corresponding to a target object. The method specifically comprises the steps as follows: obtaining a plurality of training headers, creating or updating corresponding labeling pattern dictionaries via labeling pattern information in the plurality of training headers, obtaining an object header of a to-be-processed target object, performing filtering treatment on the object header via the labeling pattern dictionaries, and determining the object representation information of the object header corresponding to the target object via related word information of header words in the object header after filtering treatment. Compared with the prior art, the method performs filtering treatment on an object header of a target object via labeling pattern dictionaries, and determines the object representation information of the object header corresponding to the target object according to the related word information of header words in the object header after filtering treatment, so as to effectively identify low-quality object headers, improve the efficiency of obtaining information by users, and enhance the information sharing experience of the users.

Description

Method and equipment for determining object characterization information of an object title
Technical field
The present invention relates to the field of Internet technology, and more particularly to a technique for determining object characterization information of an object title.
Background
With the development of Internet technology and the spread of Internet applications into users' study, work and daily life, people increasingly obtain information over the network and share the information they hold over the network, for example by uploading their own content to network platforms such as Baidu Wenku, Docin and space platforms. However, the object titles of the target objects that users upload, such as documents, videos and pictures, vary widely in quality, and a low-quality object title usually fails to reflect the actual content of its target object. The prior art cannot effectively identify low-quality object titles and therefore cannot provide optimization prompt information that would lead users to improve them. This not only reduces the efficiency with which users obtain information but also degrades their information-sharing experience.
Summary of the invention
An object of the present invention is to provide a method and equipment for determining object characterization information of an object title with respect to its corresponding target object.
According to one aspect of the present invention, a method is provided for determining object characterization information of an object title with respect to its corresponding target object, wherein the method comprises the following steps:
X. obtaining multiple training titles;
Y. establishing or updating a corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information;
wherein the method further comprises:
A. obtaining the object title of a target object to be processed;
B. filtering the object title according to the label pattern dictionary;
C. determining the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title.
According to another aspect of the present invention, information determination equipment is also provided for determining object characterization information of an object title with respect to its corresponding target object, wherein the information determination equipment comprises:
a training acquisition device for obtaining multiple training titles;
a dictionary establishing device for establishing or updating a corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information;
wherein the information determination equipment further comprises:
a title acquisition device for obtaining the object title of a target object to be processed;
a filtering device for filtering the object title according to the label pattern dictionary;
a characterization determining device for determining the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title.
According to a further aspect of the present invention, computer equipment is also provided, comprising the aforementioned information determination equipment for determining object characterization information of an object title with respect to its corresponding target object according to another aspect of the present invention.
According to a further aspect of the present invention, a browser is also provided, comprising the aforementioned information determination equipment for determining object characterization information of an object title with respect to its corresponding target object according to another aspect of the present invention.
According to a further aspect of the present invention, a browser plug-in is also provided, comprising the aforementioned information determination equipment for determining object characterization information of an object title with respect to its corresponding target object according to another aspect of the present invention.
Compared with the prior art, the present invention filters the obtained object title of a target object according to the established or updated label pattern dictionary, and determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Low-quality object titles can thus be identified effectively, which increases the value of information sharing and the efficiency with which users obtain information, and improves users' information-sharing experience. Moreover, when the object characterization information is below predetermined characterization threshold information, the present invention may further determine optimization prompt information for the object title and provide it to the user corresponding to the target object, which further increases the value of information sharing and the efficiency with which users obtain information. In addition, when the object language type information of the target object is inconsistent with the title language type information of the object title, the present invention may also include in the optimization prompt information the reference title information that corresponds to the object title under the object language type information, which again improves the value of information sharing, the efficiency with which users obtain information, and their information-sharing experience.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of equipment for determining object characterization information of an object title with respect to its corresponding target object according to one aspect of the present invention;
Fig. 2 shows a schematic diagram of equipment for determining object characterization information of an object title with respect to its corresponding target object according to a preferred embodiment of the present invention;
Fig. 3 shows a flow chart of a method for determining object characterization information of an object title with respect to its corresponding target object according to another aspect of the present invention;
Fig. 4 shows a flow chart of a method for determining object characterization information of an object title with respect to its corresponding target object according to a preferred embodiment of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description of embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows information determination equipment 1 for determining object characterization information of an object title with respect to its corresponding target object according to one aspect of the present invention, wherein information determination equipment 1 comprises training acquisition device 11, dictionary establishing device 12, title acquisition device 13, filtering device 14 and characterization determining device 15. Specifically, training acquisition device 11 obtains multiple training titles; dictionary establishing device 12 establishes or updates a corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information; title acquisition device 13 obtains the object title of a target object to be processed; filtering device 14 filters the object title according to the label pattern dictionary; and characterization determining device 15 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title.
Here, information determination equipment 1 includes but is not limited to: 1) a network platform or terminal platform that not only provides information storage space for its logged-in users, so that they can share uploaded target objects such as documents, videos and pictures, but also lets users read online, download and exchange target objects shared by other users, such as Baidu Wenku, Docin, Sina iAsk and Doc88, wherein the terminal platform includes but is not limited to user equipment such as mobile terminals and PCs; 2) a network platform or terminal platform that provides information access, information sharing, information publishing or synchronization for its logged-in users, such as third-party websites including social networking sites, forums, spaces, blogs and microblogs. Here, information determination equipment 1 includes but is not limited to a network device, user equipment, or a device formed by integrating a network device and user equipment through a network. The network device may be implemented as, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers; information determination equipment 1 may also be implemented by user equipment. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The user equipment may be any electronic product capable of human-computer interaction with a user by means of a keyboard, mouse, touch pad, touch screen or handwriting device, such as a computer, mobile phone, PDA, pocket PC (PPC) or tablet computer. The network includes but is not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like.
Those skilled in the art will understand that the above information determination equipment 1 is merely an example; other existing or future network devices or user equipment, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference. Here, the network device and the user equipment each include an electronic device that can automatically perform numerical calculation and information processing according to preset or stored instructions, the hardware of which includes but is not limited to a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
Specifically, training acquisition device 11 obtains multiple training titles through an application programming interface (API) provided by third-party devices such as browsers or search engines; or it obtains multiple user upload logs through an API, provided by third-party devices such as search engines or browsers, for obtaining user upload logs, and then obtains multiple training titles from those user upload logs. For example, training acquisition device 11 obtains multiple user upload logs through such an API provided by a browser, the logs recording, for instance, which documents, videos and pictures users uploaded within a certain period; training acquisition device 11 then obtains multiple training titles from the user upload logs, such as training titles I to VIII below:
I "Chapter 6 Serial Interface 2010 Spring"
II "Algorithms for Page Ranking based on Segment"
III "Chapter 8 Application Layer"
IV "5-5_ Minimum Cost Maximum Flow Problem-xfj"
V "3-6 Angular Momentum of a Particle and the Angular Momentum Theorem -1"
VI "2011-12 ground knot"
VII "Experiment Seven Network Sniffing"
VIII "Web Page Segmentation Algorithm for Mobile Devices"
............
Those skilled in the art will understand that the above manner of obtaining multiple training titles is merely an example; other existing or future manners of obtaining multiple training titles, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Dictionary establishing device 12 establishes or updates a corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information. Specifically, dictionary establishing device 12 may first normalize the multiple training titles; then perform label processing on the label pattern information in the normalized training titles to determine the one or more label patterns corresponding to the training titles; and then perform statistical processing on the one or more label patterns to obtain the label pattern dictionary. Here, the normalization includes but is not limited to at least one of the following: 1) normalizing letter case in the training titles, that is, unifying the case of the letters in the training titles; 2) normalizing the characters in the training titles between full-width and half-width forms. Here, the label pattern information denotes the parts of a training title that carry no substantive meaning, such as markers of the chapter or section to which the training title belongs or time markers contained in the training title, for example "Chapter 6", "Section 2.1", "Experiment Seven", "3-6" and "2011-12". Those skilled in the art will understand that the above label pattern information and normalization manners are merely examples; other existing or future label pattern information or normalization manners, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
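The two normalization steps named above can be sketched in Python as follows. This is only an illustrative sketch: the function name normalize_title is hypothetical, and the choices of Unicode NFKC folding for full-width/half-width unification and of lower-casing for case unification are assumptions, since the text does not say which concrete operations are used.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Unify full-width/half-width forms and letter case of a training title."""
    # NFKC compatibility folding maps full-width Latin letters, digits and
    # punctuation to their half-width (ASCII) equivalents.
    half_width = unicodedata.normalize("NFKC", title)
    # Lower-casing is one possible way to unify letter case (an assumption).
    return half_width.lower()


if __name__ == "__main__":
    print(normalize_title("Ｃｈａｐｔｅｒ　６　ＷＥＢ"))
```

Normalizing before label processing means that a full-width numeral and its half-width counterpart end up contributing to the same label pattern.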
For example, continuing the example above, dictionary establishing device 12 normalizes training titles I to VIII obtained by training acquisition device 11 and then performs label processing on the label pattern information in the normalized training titles, for instance by replacing each numeral with the character "_", to determine the one or more label patterns corresponding to the training titles. Here, training titles II and VIII contain no label pattern, while training titles I and III to VII contain the label patterns "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _" respectively. Dictionary establishing device 12 then performs statistical processing on these label patterns and stores each label pattern with its corresponding frequency information in the label pattern dictionary, yielding, for example, the label pattern dictionary shown in Table 1 below. The label pattern dictionary includes one or more label patterns and their frequency information, and it may be updated in various ways, for example periodically, at scheduled times, or immediately:
Label pattern     Frequency information
Chapter _         449291
____-__-__        144205
____-__           49938
Experiment _      90522
Chapter __        80418
(_)               57856
Table 1
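The label processing and counting that produce a dictionary like Table 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the locator rules in LABEL_SPANS, the function names, and the restriction to Arabic digits are all assumptions, because the text does not specify how label spans are located within a title.

```python
import re
from collections import Counter

# Hypothetical locator rules for label-like spans (assumed, not from the source).
LABEL_SPANS = [
    re.compile(r"Chapter \d+"),     # e.g. "Chapter 6"
    re.compile(r"Experiment \d+"),  # e.g. "Experiment 7"
    re.compile(r"\d+(?:-\d+)+"),    # e.g. "3-6", "2011-12"
    re.compile(r"\(\d+\)"),         # e.g. "(2)"
]


def extract_label_patterns(title):
    """Return the label patterns found in one title, one "_" per digit."""
    patterns = []
    for rule in LABEL_SPANS:
        for match in rule.finditer(title):
            patterns.append(re.sub(r"\d", "_", match.group()))
    return patterns


def build_label_pattern_dictionary(titles):
    """Count how often each label pattern occurs across the training titles."""
    counts = Counter()
    for title in titles:
        counts.update(extract_label_patterns(title))
    return dict(counts)


if __name__ == "__main__":
    sample_titles = [
        "Chapter 6 Serial Interface 2010 Spring",
        "Algorithms for Page Ranking based on Segment",
        "Chapter 8 Application Layer",
        "2011-12 ground knot",
    ]
    print(build_label_pattern_dictionary(sample_titles))
```

Replacing every digit with one underscore keeps "Chapter _" and "Chapter __" as distinct patterns, which matches the separate rows for them in Table 1.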
Preferably, dictionary establishing device 12 may also first perform label processing on training titles I to VIII obtained by training acquisition device 11, to determine the one or more label patterns corresponding to the training titles; then perform statistical processing on the label patterns to obtain a corresponding initial label pattern dictionary, wherein the initial label pattern dictionary includes the label patterns contained in the training titles and their corresponding frequency information; and then screen the label patterns in the initial label pattern dictionary according to the frequency information, to obtain the label pattern dictionary. For example, continuing the example above, dictionary establishing device 12 first performs label processing on the training titles, for instance by replacing each numeral with the character "_", to determine the one or more label patterns corresponding to the training titles: training titles II and VIII contain no label pattern, while training titles I and III to VII contain the label patterns "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _" respectively. It then performs statistical processing on the label patterns to obtain the corresponding initial label pattern dictionary, for example the one shown in Table 1 above. Finally, it screens the label patterns in the initial label pattern dictionary according to the frequency information, for example by removing the label patterns whose frequency information is below a predetermined threshold such as 50000, and thereby obtains the label pattern dictionary, for example the one shown in Table 2:
Label pattern     Frequency information
Chapter _         449291
____-__-__        144205
Experiment _      90522
Chapter __        80418
(_)               57856
Table 2
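The screening step that turns Table 1 into Table 2 amounts to a frequency cut-off. The sketch below reproduces it with the figures from Table 1 and the example threshold of 50000; the function and variable names are hypothetical.

```python
def screen_label_patterns(initial_dictionary, min_frequency=50000):
    """Keep only the label patterns whose frequency is not below the threshold."""
    return {
        pattern: frequency
        for pattern, frequency in initial_dictionary.items()
        if frequency >= min_frequency
    }


# Initial label pattern dictionary with the values listed in Table 1.
TABLE_1 = {
    "Chapter _": 449291,
    "____-__-__": 144205,
    "____-__": 49938,
    "Experiment _": 90522,
    "Chapter __": 80418,
    "(_)": 57856,
}

if __name__ == "__main__":
    for pattern, frequency in screen_label_patterns(TABLE_1).items():
        print(pattern, frequency)
```

With these inputs only "____-__" (frequency 49938) falls below the threshold, so the result contains the five patterns listed in Table 2.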
Those skilled in the art will understand that the above manners of establishing or updating the corresponding label pattern dictionary are merely examples; other existing or future manners of establishing or updating the corresponding label pattern dictionary, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Title acquisition device 13 obtains the object title of a target object to be processed through an application programming interface (API) provided by third-party devices such as browsers or search engines; or it uses dynamic web page technologies such as ASP or JSP to obtain the object title of a target object that a user uploads through his or her user equipment (PC), and takes it as the object title of the target object to be processed. Here, the target object includes but is not limited to information uploaded by a user for sharing in media formats such as documents, videos, pictures and logs, or in a combination of one or more of them. For example, suppose that user A logs in to Baidu Wenku at http://wenku.baidu.com/ and uploads two PDF documents: document1, whose title title1 is "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", and document2, whose title title2 is "Chapter 5 On-Chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". Through the API provided by Baidu Wenku, title acquisition device 13 can then obtain the object titles of the target objects that user A uploaded through the user equipment, namely "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" and "Chapter 5 On-Chip Interfaces and Interrupts of the MCS-51 Series Microcontroller".
Those skilled in the art will understand that the above manner of obtaining the object title of a target object to be processed is merely an example; other existing or future manners of obtaining the object title of a target object to be processed, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Filtering device 14 filters the object title according to the label pattern dictionary, for example by filtering out of the object title the label pattern information that matches a label pattern in the label pattern dictionary. For example, continuing the example above, filtering device 14 uses the label pattern dictionary established by dictionary establishing device 12 to filter the object titles obtained by title acquisition device 13 for the documents uploaded by user A, namely title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" of document1 and title2 "Chapter 5 On-Chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2. From each object title it filters out the label pattern information that matches a label pattern in the label pattern dictionary: from title2 of document2 it filters out the label pattern information "Chapter 5", whereas title1 of document1 contains no label pattern information matching a label pattern in the label pattern dictionary, so filtering device 14 leaves title1 unchanged.
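The filtering described above can be sketched by turning each dictionary entry into a regular expression in which every "_" stands for one digit. This is an assumption-laden illustration: the names pattern_to_regex and filter_title are hypothetical, and the digit-boundary checks and longest-pattern-first ordering are design choices of this sketch rather than details given in the text.

```python
import re


def pattern_to_regex(label_pattern):
    """Compile a label pattern such as "Chapter _" into a regular expression."""
    body = "".join(
        r"\d" if ch == "_" else re.escape(ch) for ch in label_pattern
    )
    # The look-arounds stop "Chapter _" from matching inside "Chapter 12".
    return re.compile(r"(?<!\d)" + body + r"(?!\d)")


def filter_title(title, label_pattern_dictionary):
    """Remove every span of the title that matches a label pattern."""
    filtered = title
    # Try longer patterns first so that they win over their shorter prefixes.
    for label_pattern in sorted(label_pattern_dictionary, key=len, reverse=True):
        filtered = pattern_to_regex(label_pattern).sub(" ", filtered)
    return " ".join(filtered.split())


if __name__ == "__main__":
    dictionary = {"Chapter _": 449291, "Chapter __": 80418, "(_)": 57856}
    title2 = ("Chapter 5 On-Chip Interfaces and Interrupts of the "
              "MCS-51 Series Microcontroller")
    print(filter_title(title2, dictionary))
```

With this dictionary the "51" in "MCS-51" is left alone, because no entry consists of a bare hyphenated number; only the leading "Chapter 5" is removed.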
Those skilled in the art will understand that the above manner of filtering the object title is merely an example; other existing or future manners of filtering the object title, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Characterization determining device 15 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Specifically, characterization determining device 15 first performs word segmentation on the filtered object title to obtain the title words in the object title, and then determines the object characterization information of the object title with respect to the target object according to the word-related information of those title words. Here, the word-related information includes but is not limited to at least one of the following: 1) the word frequency information of the title words in the object title, which can be obtained by querying a word frequency database, wherein the word frequency database may be preset or may be obtained by statistics over the title words in multiple training titles; 2) the number of title words in the object title; 3) the number of characters in the object title. Here, the object characterization information represents the quality of the object title: it reflects the ability of the object title to characterize the content of the target object and serves as a measure of whether the object title characterizes that content well. It may be expressed quantitatively, for example as a numerical value, or qualitatively, for example as high or low.
For example, after filtering device 14 filters title2 of document2, "Chapter 5 On-Chip Interfaces and Interrupts of the MCS-51 Series Microcontroller", it obtains the filtered object title title2', "On-Chip Interfaces and Interrupts of the MCS-51 Series Microcontroller". Characterization determining device 15 first performs word segmentation on title2' to obtain the bag of words "MCS-51", "microcontroller", "interface", "interrupt", that is, the title words corresponding to title2'. It then determines the object characterization information of the object title with respect to the target object according to the word-related information of these title words. Suppose the word frequencies of the four title words are 9486, 503200, 664560 and 432598 respectively. The title words "microcontroller", "interface" and "interrupt" then have word frequencies above a predetermined threshold such as 400000, so characterization determining device 15 can determine that the object characterization information of title2 with respect to target object document2 is high. As another example, suppose the word frequencies of the four title words are 9486, 303200, 264560 and 392598 respectively. No title word then has a word frequency above the predetermined threshold of 400000, but the number of title words is greater than or equal to a predetermined threshold of 4, so characterization determining device 15 can still determine that the object characterization information of title2 with respect to document2 is high. As a further example, if no title word has a word frequency above the predetermined threshold of 400000 and/or the number of title words does not reach the predetermined threshold of 4, characterization determining device 15 can determine that the object characterization information of title2 with respect to document2 is low. Here, the word frequency database may be located in information determination equipment 1, or in a network device connected to information determination equipment 1 through a network.
Here, because the filtered object title reflects the actual quality of the title content more closely, determining the object characterization information from the word-related information of the title words in the filtered object title achieves the beneficial effect that the recognition rate and the recognition accuracy for low-quality titles reach 93% and 91% respectively.
Those skilled in the art will understand that the above manner of determining the object characterization information of the object title with respect to the target object is only an example; other existing or future manners of determining the object characterization information of the object title with respect to the target object, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The devices of information determining equipment 1 work continuously with one another. Specifically, training acquisition device 11 continuously obtains multiple training titles; dictionary building device 12 continuously builds or updates the corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information; title acquisition device 13 continuously obtains the object title of a target object to be processed; filter processing device 14 continuously filters the object title according to the label pattern dictionary; characterization determining device 15 continuously determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Here, those skilled in the art will understand that "continuously" means that the devices of information determining equipment 1 respectively keep performing the acquisition of training titles, the building or updating of the label pattern dictionary, the acquisition of object titles, the filtering of object titles and the determination of object characterization information, until information determining equipment 1 stops acquiring object titles for a long period of time.
Preferably, information determining equipment 1 further includes a preprocessing device (not shown). Specifically, the preprocessing device preprocesses the filtered object title to obtain the preprocessed object title; characterization determining device 15 then determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the preprocessed object title.
Specifically, the preprocessing device preprocesses the filtered object title to obtain the preprocessed object title. Here, the preprocessing includes, but is not limited to, at least one of the following: 1) punctuation denoising of the filtered object title, that is, removing the punctuation marks in the filtered object title; 2) ASCII symbol removal on the filtered object title while, according to a predetermined foreign language dictionary, retaining the foreign words of the filtered object title that appear in that dictionary. The predetermined foreign language dictionary may be preset, for example an existing English dictionary that collects English words, arranges them in a certain order and explains them for reference; it may also be obtained by collecting statistics on the title words in multiple English training titles.
For example, for the object title title1 of document document1, "Research on the blind detection process of the LTE physical downlink control channel", filter processing device 14 filters title1 and obtains the filtered object title title1' "Research on the blind detection process of the LTE physical downlink control channel". The preprocessing device then preprocesses the filtered object title title1'. Assuming the English word "LTE" in title1' exists in the predetermined foreign language dictionary, the preprocessing device obtains the preprocessed object title title1'' "Research on the blind detection process of the LTE physical downlink control channel". As another example, for the object title title2 of document2, "Chapter 5 On-chip interfaces and interrupts of the MCS-51 series microcontroller", filter processing device 14 filters title2 and obtains the filtered object title title2' "On-chip interfaces and interrupts of the MCS-51 series microcontroller". The preprocessing device then preprocesses the filtered object title title2'. Assuming the English word "MCS-51" in title2' does not exist in the predetermined foreign language dictionary, the preprocessing device obtains the preprocessed object title title2'' "On-chip interfaces and interrupts of the series microcontroller".
Those skilled in the art will understand that the above manner of preprocessing the filtered object title is only an example; other existing or future manners of preprocessing the filtered object title, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, characterization determining device 15 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the preprocessed object title. Here, the manner in which characterization determining device 15 does so is the same as or similar to the manner, described above, in which characterization determining device 15 determines the object characterization information from the word-related information of the title words in the filtered object title; for brevity it is not repeated here and is incorporated herein by reference.
In another preferred embodiment, the above information determining equipment 1 for determining the object characterization information of an object title with respect to a corresponding target object may be combined with an existing browser to form a new browser. Existing browsers include, for example, the IE browser of Microsoft, the Netscape browser of Netscape, the Firefox browser of Mozilla, the Chrome browser of Google, the Maxthon browser of Maxthon, the Opera browser of Opera, the 360 browser of 360, the Sogou browser of Sohu.com Inc., and the Tencent TT browser of Tencent.
In another preferred embodiment, the above information determining equipment 1 for determining the object characterization information of an object title with respect to a corresponding target object may be combined with an existing browser plug-in to form a new browser plug-in. Existing browser plug-ins include, for example, the Flash plug-in, the RealPlayer plug-in, the MMS plug-in, the MIDI staff notation plug-in and the ActiveX plug-in.
Fig. 2 shows a schematic diagram of equipment for determining the object characterization information of an object title with respect to a corresponding target object according to a preferred embodiment of the present invention, wherein information determining equipment 1 includes training acquisition device 11', dictionary building device 12', title acquisition device 13', filter processing device 14', characterization determining device 15', optimization determining device 16' and providing device 17'. Specifically, training acquisition device 11' obtains multiple training titles; dictionary building device 12' builds or updates the corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information; title acquisition device 13' obtains the object title of a target object to be processed; filter processing device 14' filters the object title according to the label pattern dictionary; characterization determining device 15' determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title; when the object characterization information is lower than predetermined characterization threshold information, optimization determining device 16' determines optimization indication information about the object title; providing device 17' provides the optimization indication information to the user corresponding to the target object. Here, training acquisition device 11', dictionary building device 12', title acquisition device 13', filter processing device 14' and characterization determining device 15' are the same as or similar in content to the corresponding devices in the embodiment of Fig. 1; for brevity they are not described again here and are incorporated herein by reference.
Specifically, when the object characterization information is lower than the predetermined characterization threshold information, optimization determining device 16' determines optimization indication information about the object title. Here, the optimization indication information includes information instructing the user how to modify and optimize the object title so as to obtain a high-quality object title, such as modification suggestions for the object title. The manners in which optimization determining device 16' determines the optimization indication information about the object title include, but are not limited to, at least one of the following:
1) Determining the optimization indication information according to the summary information of the target object. Specifically, optimization determining device 16' may first perform semantic analysis on the summary information of the target object to obtain one or more summary keywords; then, according to the one or more summary keywords, it performs a matching query in the title vocabulary and determines the optimization indication information according to the query result. For example, suppose title acquisition device 13' obtains the following target object object-document to be processed:
Title title: Experiment 7 Network Sniffing
Summary information abstract: Capture data packets with the Ethereal sniffer software, and judge the network condition from the captured packets.
Body content information content: 【Experimental principle】 Network monitoring is a commonly used passive network attack method that can help an intruder easily obtain information that is hard to obtain by other means, including user passwords, accounts, sensitive data, IP addresses, routing information, TCP socket numbers, etc. ......
Suppose characterization determining device 15' determines that the object characterization information of the title title with respect to target object object-document is lower than the predetermined characterization threshold information. Optimization determining device 16' may then first perform semantic analysis on the summary information abstract of target object object-document to obtain one or more summary keywords, such as "Ethereal sniff packet network condition". Then, optimization determining device 16' performs a matching query in the title vocabulary according to these summary keywords and determines the optimization indication information according to the query result. For example, when the matching query finds title terms in the title vocabulary that match the summary keywords, and/or the ratio of the number of summary keywords matching title terms in the title vocabulary to the total number of summary keywords meets a predetermined threshold such as 0.8, the optimization indication information determined by optimization determining device 16' includes "The object title can be optimized in combination with the summary information"; otherwise, the optimization indication information determined by optimization determining device 16' includes "It is suggested to optimize the object title". Here, the title vocabulary may be located in information determining equipment 1, or in a network device connected to information determining equipment 1 through a network.
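The ratio-based branch of manner 1) can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `hint_from_summary`, exact string matching against the vocabulary, the reading of "meets the threshold" as greater than or equal to 0.8, and the sample keyword segmentation and vocabulary contents are all hypothetical.

```python
CAN_OPTIMIZE = "The object title can be optimized in combination with the summary information"
SUGGEST = "It is suggested to optimize the object title"


def hint_from_summary(summary_keywords, title_vocabulary, ratio_threshold=0.8):
    """Pick an optimization hint from the share of summary keywords
    that match a term in the title vocabulary."""
    if not summary_keywords:
        return SUGGEST
    matched = sum(1 for keyword in summary_keywords if keyword in title_vocabulary)
    ratio = matched / len(summary_keywords)
    return CAN_OPTIMIZE if ratio >= ratio_threshold else SUGGEST


# Hypothetical segmentation of the summary keywords and vocabulary contents.
summary_keywords = ["ethereal", "sniff", "packet", "network"]
rich_vocabulary = {"ethereal", "sniff", "packet", "network", "protocol"}
poor_vocabulary = {"network"}

print(hint_from_summary(summary_keywords, rich_vocabulary))  # matches 4 of 4 keywords
print(hint_from_summary(summary_keywords, poor_vocabulary))  # matches 1 of 4 keywords
```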
2) Determining the optimization indication information according to the degree of correlation between the object title and the body content information of the target object, in combination with the number of body words in the body content information of the target object. Specifically, optimization determining device 16' may first determine the degree of correlation between the object title and the body content information of the target object, for example from the number of title words of the object title that match the body words of the body content information, or from the degree of matching between the object title and the body content information; then, optimization determining device 16' determines the optimization indication information according to this degree of correlation in combination with the number of body words in the body content information of the target object. For example, continuing the example above, optimization determining device 16' first performs semantic analysis on the object title and on the body content information of the target object, obtaining the title word information "network sniffing" corresponding to the object title and the body word information "network monitoring sniffing packet network-card experiment service configuration" corresponding to the body content information of the target object. Optimization determining device 16' then determines the degree of correlation between the object title and the body content information from the number of title words that match the body words, for example by taking the ratio of the number of matching title words to the total number of title words as the degree of correlation. Suppose that, for the title words "network sniffing" of the title title, the ratio of the number of title words matching body words to the total number of title words is 100%; optimization determining device 16' then determines that the degree of correlation between the object title and the body content information of the target object is 1. Optimization determining device 16' then combines this degree of correlation of 1 with the number of body words of the body content information content of target object object-document, for example 20 body words, and determines the optimization indication information, such as "The object title can be optimized in combination with the body content information"; otherwise, the optimization indication information determined by optimization determining device 16' includes "It is suggested to optimize the object title".
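Manner 2) can be sketched in Python as follows. The ratio of matching title words to all title words follows the text; everything else is an assumption of this sketch: the function names, the English stand-ins for the segmented words, and in particular the two thresholds `min_relevance` and `min_body_words`, which the text does not state.

```python
CAN_OPTIMIZE = "The object title can be optimized in combination with the body content information"
SUGGEST = "It is suggested to optimize the object title"


def title_body_relevance(title_words, body_words):
    """Share of title words that also occur among the body words."""
    if not title_words:
        return 0.0
    body = set(body_words)
    matched = sum(1 for word in title_words if word in body)
    return matched / len(title_words)


def hint_from_body(relevance, body_word_count, min_relevance=0.8, min_body_words=20):
    """Combine relevance with the body word count.

    Both thresholds are hypothetical; the text gives only the example
    values relevance = 1 and 20 body words.
    """
    if relevance >= min_relevance and body_word_count >= min_body_words:
        return CAN_OPTIMIZE
    return SUGGEST


title_words = ["network", "sniffing"]
body_words = ["network", "monitoring", "sniffing", "packet",
              "nic", "experiment", "service", "configuration"]
relevance = title_body_relevance(title_words, body_words)
print(relevance)                      # 1.0
print(hint_from_body(relevance, 20))  # body word count of 20, as assumed in the text
```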
Here, by combining the summary information of the target object and/or the body content information of the target object, the present invention achieves the beneficial effect that the accuracy of the determined optimization indication information reaches 100%.
Those skilled in the art will understand that the above manners of determining the optimization indication information about the object title are only examples; other existing or future manners of determining the optimization indication information about the object title, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, when the object language type information of the target object is inconsistent with the title language type information of the object title, optimization determining device 16' may also include, in the optimization indication information, the reference title information corresponding to the object title under the object language type information. For example, suppose the object language type information of target object object-document is English while the title language type information of object title title is Chinese; optimization determining device 16' may then include in the optimization indication information the reference title information corresponding to object title title under the object language type information, that is, the English reference title information corresponding to object title title.
Providing device 17' provides the optimization indication information to the user corresponding to the target object, for example to that user's user equipment, for the user to read and browse, by means of dynamic web page technologies such as ASP, JSP or PHP, or by other agreed communication modes, such as communication protocols like http or https.
Preferably, optimization determining device 16' includes a correlation determining unit (not shown) and an optimization determining unit (not shown). Specifically, when the object characterization information is lower than the predetermined characterization threshold information, the correlation determining unit determines the degree of correlation between the body content information of the target object and the title vocabulary; the optimization determining unit determines the optimization indication information according to the degree of correlation.
Specifically, when the object characterization information is lower than the predetermined characterization threshold information, the correlation determining unit determines the degree of correlation between the body content information of the target object and the title vocabulary. It does so according to the number of content keywords of the body content information that match title terms in the title vocabulary, for example by taking the ratio of the number of content keywords matching title terms in the title vocabulary to the total number of content keywords as the degree of correlation. For example, suppose characterization determining device 15' determines that the object characterization information of the title title with respect to target object object-document is lower than the predetermined characterization threshold information. The correlation determining unit first performs semantic analysis on the body content information content of target object object-document and obtains the corresponding content keywords "network monitoring sniffing packet network-card experiment service configuration". The correlation determining unit then determines the degree of correlation between the body content information and the title vocabulary from the number of content keywords that match title terms in the title vocabulary, taking the ratio of that number to the total number of content keywords as the degree of correlation. Suppose the content keywords of the body content information content that match title terms in the title vocabulary account for 92% of the total number of content keywords; the correlation determining unit can then determine that the degree of correlation between the body content information content of target object object-document and the title vocabulary is 0.92.
Those skilled in the art will understand that the above manner of determining the degree of correlation between the body content information of the target object and the title vocabulary is only an example; other existing or future manners of determining this degree of correlation, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, the optimization determining unit determines the optimization indication information according to the degree of correlation. For example, when the degree of correlation is greater than a predetermined threshold, it determines that the optimization indication information includes "The object title can be optimized in combination with the body content information"; otherwise, it determines that the optimization indication information includes "It is suggested to optimize the object title". Continuing the example above, the correlation determining unit determines that the degree of correlation between the body content information content of target object object-document and the title vocabulary is 0.92, which is greater than a predetermined threshold such as 0.85; the optimization determining unit therefore determines, according to this degree of correlation of 0.92, optimization indication information such as "The object title can be optimized in combination with the body content information". Had the degree of correlation not exceeded the threshold, the optimization indication information determined by the optimization determining unit would include "It is suggested to optimize the object title".
Fig. 3 shows a flowchart of a method for determining the object characterization information of an object title with respect to a corresponding target object according to another aspect of the present invention.
Specifically, in step S1, information determining equipment 1 obtains multiple training titles; in step S2, information determining equipment 1 builds or updates the corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information; in step S3, information determining equipment 1 obtains the object title of a target object to be processed; in step S4, information determining equipment 1 filters the object title according to the label pattern dictionary; in step S5, information determining equipment 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Here, information determining equipment 1 includes, but is not limited to: 1) a network platform or terminal platform that not only provides information storage space for its logged-in users, so that they can upload and share target objects such as documents, videos and pictures, but also lets users read online, download and exchange target objects shared by other users, such as Baidu Wenku, Docin, Sina iAsk and Doc88, wherein the terminal platform includes, but is not limited to, user equipment such as mobile terminals and PCs; 2) a network platform or terminal platform that provides its logged-in users with information access, information sharing, and information publishing or synchronization, such as third-party websites including social networking sites, forums, personal spaces, blogs and microblogs. Here, information determining equipment 1 includes, but is not limited to, network equipment, user equipment, or equipment formed by integrating network equipment and user equipment through a network. The network equipment includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers; alternatively, information determining equipment 1 may be implemented by user equipment. Here, the cloud consists of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a kind of distributed computing: a super virtual computer made up of a group of loosely coupled computers. The user equipment may be any electronic product capable of human-computer interaction with a user through a keyboard, mouse, touch pad, touch screen, handwriting device or the like, such as a computer, mobile phone, PDA, palmtop computer (PPC) or tablet computer. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs and wireless ad hoc networks (Ad Hoc Network). Those skilled in the art will understand that the above information determining equipment 1 is only an example; other existing or future network equipment or user equipment, if applicable to the present invention, should also fall within the scope of protection of the present invention and is incorporated herein by reference. Here, both the network equipment and the user equipment include an electronic device that automatically performs numerical computation and information processing according to preset or stored instructions, and whose hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP) and embedded devices.
Specifically, in step S1, information determining equipment 1 obtains multiple training titles through an application programming interface (API) provided by third-party devices such as browsers and search engines; or it first obtains multiple user upload logs through an API, provided by third-party devices such as search engines and browsers, for obtaining user upload logs, and then obtains multiple training titles from those user upload logs. For example, in step S1, information determining equipment 1 obtains multiple user upload logs through the upload-log API provided by a browser, the logs recording for instance which documents, videos and pictures users uploaded within a certain period; then, in step S1, information determining equipment 1 obtains from the multiple user upload logs multiple training titles such as the following training titles I to VIII:
I "Chapter 6 Serial Interface 2010 Spring"
II "Algorithms for Page Ranking based on Segment"
III "Chapter 8 Application Layer"
IV "5-5_Minimum cost maximum flow problem-xfj"
V "3-6 Angular momentum of a particle and the angular momentum theorem-1"
VI "2011-12 ground knot"
VII "Experiment 7 Network Sniffing"
VIII "WEB page segmentation algorithm for mobile devices"
............
Those skilled in the art will understand that the above manner of obtaining multiple training titles is only an example; other existing or future manners of obtaining multiple training titles, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S2, information determining equipment 1 builds or updates the corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information. Specifically, in step S2, information determining equipment 1 may first normalize the multiple training titles; then perform labeling processing on the label pattern information in the normalized training titles to determine the one or more label patterns corresponding to the training titles; and then perform statistical processing on the one or more label patterns to obtain the label pattern dictionary. Here, the normalization includes, but is not limited to, at least one of the following: 1) normalizing the letter case in the training title, that is, unifying the case of the letters in the training title; 2) full-width/half-width normalization of the characters in the training title. Here, the label pattern information denotes the parts of a training title that carry no substantive meaning, such as a marker of the chapter or section to which the training title belongs or a time contained in the training title, for example "Chapter 6", "Section 2.1", "Experiment 7", "3-6" or "2011-12". Those skilled in the art will understand that the above label pattern information and normalization manners are only examples; other existing or future label pattern information or normalization manners, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
For example, continuing the above example, in step S2, after normalizing the training titles I to VIII etc. obtained in step S1, the information determination equipment 1 performs label processing on the label pattern information in the normalized training titles I to VIII etc., for instance replacing each numeral with the character "_", to determine the one or more label patterns corresponding to the training titles; for instance, it finds that training titles II and VIII contain no label pattern, while training titles I and III to VII contain the label patterns "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _", respectively. Then, in step S2, the information determination equipment 1 performs statistical processing on the one or more label patterns and stores the label patterns and their corresponding frequency information in the label pattern dictionary, for instance obtaining the label pattern dictionary shown in Table 3 below, wherein the label pattern dictionary includes one or more label patterns and their frequency information. The label pattern dictionary may be updated in a certain manner, for instance periodically at a predetermined interval, at scheduled times, or immediately:
Table 3
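The label processing and statistical processing described above can be sketched as follows. This is an illustrative sketch only: the source does not say how label information is located inside a title, so the regular expression `LABEL_CANDIDATE` and the function names are hypothetical, and the sample titles are English renderings rather than the original training titles.

```python
import re
from collections import Counter

# Hypothetical shapes of label information (an assumption, not from the source).
LABEL_CANDIDATE = re.compile(
    r"(?:chapter|section|experiment)\s*\d+(?:\.\d+)*"  # "chapter 6", "section 2.1"
    r"|\d+(?:-\d+)+",                                  # "3-6", "2011-12"
    re.IGNORECASE,
)


def extract_label_patterns(title):
    """Return the label patterns of one title, each digit replaced by '_'."""
    patterns = []
    for match in LABEL_CANDIDATE.finditer(title):
        patterns.append(re.sub(r"\d", "_", match.group(0).lower()))
    return patterns


def build_label_pattern_dictionary(training_titles):
    """Count how often each label pattern occurs across the training titles."""
    counts = Counter()
    for title in training_titles:
        counts.update(extract_label_patterns(title))
    return dict(counts)
```

Replacing digits one by one keeps the number of digits visible in the pattern, which matches entries such as a four-digit year followed by a two-digit month.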
Preferably, in step S2, the information determination equipment 1 may also first perform label processing on the training titles I to VIII etc. obtained in step S1, to determine the one or more label patterns corresponding to the multiple training titles; then perform statistical processing on the label patterns to obtain a corresponding initial label pattern dictionary, wherein the initial label pattern dictionary includes the label patterns contained in the multiple training titles and their corresponding frequency information; and then, according to the frequency information, screen the label patterns in the initial label pattern dictionary to obtain the label pattern dictionary. For example, again continuing the above example, in step S2, the information determination equipment 1 first performs label processing on the multiple training titles, for instance replacing each numeral with the character "_", to determine the one or more label patterns corresponding to the training titles; for instance, it finds that training titles II and VIII contain no label pattern, while training titles I and III to VII contain the label patterns "Chapter _", "Chapter _", "_-_", "_-_", "____-__" and "Experiment _", respectively. It then performs statistical processing on the label patterns to obtain the corresponding initial label pattern dictionary, for instance the initial label pattern dictionary shown in Table 3 above. Next, according to the frequency information, it screens the label patterns in the initial label pattern dictionary to obtain the label pattern dictionary; for instance, label patterns whose frequency information is below a predetermined threshold such as 50000 are removed, yielding the label pattern dictionary shown in Table 4:
Label pattern     Frequency information
Chapter _         449291
____-__-__        144205
Experiment _      90522
Chapter __        80418
(_)               57856
Table 4
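The frequency-based screening that leads from the initial label pattern dictionary to Table 4 can be sketched as a simple filter. The function name is an assumption; the dictionary keys in the usage note are English renderings of the Table 4 entries, and any low-frequency entry added for demonstration is hypothetical, since the contents of Table 3 are not reproduced in the text.

```python
def screen_label_patterns(initial_dictionary, min_frequency=50000):
    """Drop label patterns whose frequency is below the predetermined threshold."""
    return {
        pattern: frequency
        for pattern, frequency in initial_dictionary.items()
        if frequency >= min_frequency
    }
```

Called with the five Table 4 entries plus a hypothetical rare pattern such as `"_-_": 1200`, the function returns exactly the five Table 4 entries.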
Those skilled in the art will understand that the above manner of establishing or updating the corresponding label pattern dictionary is merely an example; other existing or future manners of establishing or updating the corresponding label pattern dictionary, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S3, the information determination equipment 1 obtains the object title of a target object to be processed through an application programming interface (API) provided by a third-party device such as a browser or a search engine; or, through dynamic web page technologies such as ASP or JSP, it obtains the object title of a target object uploaded by a user through the user equipment (e.g. a PC), and takes it as the object title of the target object to be processed. Here, the target object includes, but is not limited to, information for sharing that a user uploads in media formats such as documents, videos, pictures and logs, or in a combination of one or more of them. For example, assume that user A logs in to Baidu Wenku (http://wenku.baidu.com/) and uploads the PDF documents document1, whose title title1 is "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", and document2, whose title title2 is "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller"; then, in step S3, the information determination equipment 1 can obtain, through the application programming interface (API) provided by Baidu Wenku, the object titles "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" and "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of the target objects uploaded by user A through the user equipment.
Those skilled in the art will understand that the above manner of obtaining the object title of a target object to be processed is merely an example; other existing or future manners of obtaining the object title of a target object to be processed, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S4, the information determination equipment 1 filters the object title according to the label pattern dictionary, for instance filtering out of the object title the label pattern information that corresponds to a label pattern in the label pattern dictionary. For example, continuing the above example, in step S4, according to the label pattern dictionary established in step S2, the information determination equipment 1 filters the object title title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" of document1 and the object title title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2, both uploaded by user A and obtained in step S3; that is, it filters out of each object title the label pattern information corresponding to a label pattern in the label pattern dictionary. For instance, it filters the label pattern information "Chapter 5" out of the object title title2 of document2, whereas the object title title1 of document1 contains no label pattern information corresponding to a label pattern in the label pattern dictionary, so in step S4 the information determination equipment 1 performs no filtering on the object title title1 of document1.
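The filtering of step S4 can be sketched by turning each dictionary pattern back into a matcher in which every "_" stands for one digit. This is a minimal sketch under assumptions: the function names are hypothetical, the titles are the English renderings used in the example above, and the digit-boundary checks and longest-pattern-first ordering are design choices added here so that a short pattern does not match inside a longer number.

```python
import re


def pattern_to_regex(label_pattern):
    """Compile a label pattern such as 'Chapter _' into a regular expression."""
    # Escape literal characters, then let every '_' placeholder match one digit.
    body = re.escape(label_pattern).replace("_", r"\d")
    # The lookarounds keep the match from starting or ending inside a number.
    return re.compile(r"(?<!\d)" + body + r"(?!\d)", re.IGNORECASE)


def filter_title(title, label_pattern_dictionary):
    """Remove label pattern information that matches a dictionary pattern."""
    filtered = title
    # Try longer patterns first so that they win over shorter ones.
    for pattern in sorted(label_pattern_dictionary, key=len, reverse=True):
        filtered = pattern_to_regex(pattern).sub(" ", filtered)
    # Collapse the whitespace left behind by removed labels.
    return " ".join(filtered.split())
```

With a dictionary containing "Chapter _", the title of document2 loses its "Chapter 5" prefix, while the title of document1 is returned unchanged because it contains no matching label pattern information.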
Those skilled in the art will understand that the above manner of filtering the object title is merely an example; other existing or future manners of filtering the object title, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S5, the information determination equipment 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Specifically, in step S5, the information determination equipment 1 first performs word segmentation on the filtered object title to obtain the title words in the object title; then, according to the word-related information of the title words, it determines the object characterization information of the object title with respect to the target object. Here, the word-related information includes, but is not limited to, at least any one of the following: 1) the word frequency information of the title words in the object title, where the word frequency information of a title word may be obtained by querying a word frequency database, and the word frequency database may be preset or may be obtained by statistics over the title words in multiple training titles; 2) the quantity information of the title words in the object title; 3) the quantity information of the characters in the object title. Here, the object characterization information represents the quality information of the object title: it reflects the ability of the object title to characterize the content information of the target object, and also provides a measure of whether the object title characterizes that content information well. It may be expressed quantitatively, e.g. as a numerical value, or qualitatively, e.g. as high or low. For example, in step S4, the information determination equipment 1 filters the object title title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2 and obtains the filtered object title title2' "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller"; then, in step S5, the information determination equipment 1 first performs word segmentation on the filtered object title title2' to obtain the bag-of-words information "MCS-51 microcontroller interface interrupt", i.e. the title word information corresponding to the filtered object title title2'. Then, in step S5, the information determination equipment 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words "MCS-51 microcontroller interface interrupt". For instance, assume that the word frequencies of the title words "MCS-51", "microcontroller", "interface" and "interrupt" in the filtered object title title2' are 9486, 503200, 664560 and 432598, respectively, i.e. the title words include the words "microcontroller", "interface" and "interrupt" whose word frequency exceeds a predetermined threshold such as 400000; then, in step S5, the information determination equipment 1 may determine that the object characterization information of the object title title2 with respect to the target object document2 is high. As another example, assume that the word frequencies of these title words are 9486, 303200, 264560 and 392598, respectively, so that no title word has a word frequency exceeding the predetermined threshold such as 400000, but the quantity of title words is greater than or equal to a predetermined threshold of 4; then, in step S5, the information determination equipment 1 may likewise determine that the object characterization information of the object title title2 with respect to the target object document2 is high. As a further example, if no title word has a word frequency exceeding the predetermined threshold such as 400000 and/or the quantity of title words also fails to satisfy the predetermined threshold of 4, then, in step S5, the information determination equipment 1 may determine that the object characterization information of the object title title2 with respect to the target object document2 is low. Here, the word frequency database may be located in the information determination equipment 1, or in a network device connected to the information determination equipment 1 via a network.
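The decision logic of the three worked examples above can be sketched as follows. This is a hedged illustration: it assumes word segmentation has already produced the list of title words, it models the word frequency database as a plain dictionary, and it follows the worked examples in rating a title low only when both the frequency check and the word-count check fail. The function and parameter names are assumptions.

```python
def characterize_title(title_words, word_frequencies,
                       frequency_threshold=400000, min_word_count=4):
    """Return 'high' or 'low' as a qualitative object characterization."""
    # Check 1: does any title word exceed the word frequency threshold?
    has_frequent_word = any(
        word_frequencies.get(word, 0) > frequency_threshold
        for word in title_words
    )
    if has_frequent_word:
        return "high"
    # Check 2: does the title contain enough words?
    if len(title_words) >= min_word_count:
        return "high"
    return "low"
```

With the four title words and the first set of frequencies from the example, the first check succeeds; with the second set, only the word-count check succeeds; a short title made of rare words fails both checks and is rated low.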
Here, the present invention relies on the word-related information of the title words in the filtered object title; because the filtered object title comes closer to reflecting the real quality of the title content, the present invention achieves the beneficial effect that the recognition rate and the recognition accuracy for low-quality titles reach 93% and 91%, respectively.
Those skilled in the art will understand that the above manner of determining the object characterization information of the object title with respect to the target object is merely an example; other existing or future manners of determining the object characterization information of the object title with respect to the target object, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
The steps of the information determination equipment 1 operate continuously with respect to one another. Specifically, in step S1, the information determination equipment 1 continuously obtains multiple training titles; in step S2, the information determination equipment 1 continuously establishes or updates the corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information; in step S3, the information determination equipment 1 continuously obtains the object title of a target object to be processed; in step S4, the information determination equipment 1 continuously filters the object title according to the label pattern dictionary; in step S5, the information determination equipment 1 continuously determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title. Here, those skilled in the art should understand that "continuously" means that the steps of the information determination equipment 1 respectively keep performing the obtaining of training titles, the establishing or updating of the label pattern dictionary, the obtaining of object titles, the filtering of the object titles and the determining of the object characterization information, until the information determination equipment 1 stops obtaining object titles for a relatively long time.
Preferably, the method further includes step S8 (not shown). Specifically, in step S8, the information determination equipment 1 pre-processes the filtered object title to obtain a pre-processed object title; wherein, in step S5, the information determination equipment 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the pre-processed object title.
Specifically, in step S8, the information determination equipment 1 pre-processes the filtered object title to obtain a pre-processed object title. Here, the pre-processing includes, but is not limited to, at least any one of the following: 1) performing punctuation denoising on the filtered object title, i.e. removing the punctuation marks from the filtered object title; 2) removing ASCII symbols from the filtered object title while, according to a predetermined foreign-language dictionary, retaining those foreign-language words in the filtered object title that are present in the predetermined foreign-language dictionary, wherein the predetermined foreign-language dictionary may be preset, for instance an existing English dictionary that collects English words, arranges them in a certain order and explains them for reference; it may also be obtained by statistics over the title words in multiple English training titles.
For example, for the object title title1 "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" of document1, in step S4 the information determination equipment 1 filters title1 and obtains the filtered object title title1' "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel"; then, in step S8, the information determination equipment 1 pre-processes the filtered object title title1'. Assuming that the English word "LTE" in the filtered object title title1' is present in the predetermined foreign-language dictionary, then in step S8, after pre-processing the filtered object title title1', the information determination equipment 1 obtains the pre-processed object title title1'' "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel". As another example, for the object title title2 "Chapter 5 On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller" of document2, in step S4 the information determination equipment 1 filters title2 and obtains the filtered object title title2' "On-chip Interfaces and Interrupts of the MCS-51 Series Microcontroller"; then, in step S8, the information determination equipment 1 pre-processes the filtered object title title2'. Assuming that the English word "MCS-51" in the filtered object title title2' is not present in the predetermined foreign-language dictionary, then in step S8, after pre-processing the filtered object title title2', the information determination equipment 1 obtains the pre-processed object title title2'' "On-chip Interfaces and Interrupts of the Series Microcontroller".
Those skilled in the art will understand that the above manner of pre-processing the filtered object title is merely an example; other existing or future manners of pre-processing the filtered object title, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S5, the information determination equipment 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the pre-processed object title. Here, the manner in which the information determination equipment 1 determines the object characterization information from the title words in the pre-processed object title is identical or similar to the manner, described above for step S5, in which it determines the object characterization information from the title words in the filtered object title; for the sake of brevity it is not repeated here and is incorporated herein by reference.
Fig. 4 shows a flowchart of a method for determining the object characterization information of an object title with respect to a corresponding target object, according to a preferred embodiment of the present invention.
The method comprises step S1', step S2', step S3', step S4', step S5', step S6' and step S7'. Specifically, in step S1', the information determination equipment 1 obtains multiple training titles; in step S2', the information determination equipment 1 establishes or updates a corresponding label pattern dictionary according to the label pattern information in the multiple training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information; in step S3', the information determination equipment 1 obtains the object title of a target object to be processed; in step S4', the information determination equipment 1 filters the object title according to the label pattern dictionary; in step S5', the information determination equipment 1 determines the object characterization information of the object title with respect to the target object according to the word-related information of the title words in the filtered object title; when the object characterization information is below a predetermined characterization threshold, in step S6', the information determination equipment 1 determines optimization indication information about the object title; in step S7', the information determination equipment 1 provides the optimization indication information to the user corresponding to the target object. Here, steps S1', S2', S3', S4' and S5' are identical or similar to the corresponding steps in the embodiment of Fig. 3; for the sake of brevity they are not repeated here and are incorporated herein by reference.
Specifically, when the object characterization information is below the predetermined characterization threshold, in step S6', the information determination equipment 1 determines the optimization indication information about the object title. Here, the optimization indication information includes information that indicates to the user how to modify and optimize the object title so as to obtain a high-quality object title, such as suggestions for revising the object title. Here, the manner in which the information determination equipment 1 determines the optimization indication information about the object title in step S6' includes, but is not limited to, at least any one of the following:
1) Determining the optimization indication information according to the abstract information of the target object. Specifically, in step S6', the information determination equipment 1 may first perform semantic analysis on the abstract information of the target object to obtain one or more abstract keywords; then, according to the one or more abstract keywords, it performs a matching query in a title lexicon and determines the optimization indication information according to the query result. For example, assume that in step S3' the information determination equipment 1 obtains the following target object object-document to be processed:
Title (title): Experiment 7 Network Sniffing
Abstract information (abstract): Sniffing data packets with the Ethereal sniffer software, and judging the network condition from the sniffed data packets.
Body content information (content): 【Experimental principle】 Network monitoring is a commonly used passive network attack method that can help an intruder easily obtain information that is hard to obtain by other means, including user passwords, accounts, sensitive data, IP addresses, routing information, TCP socket numbers, etc. ........
Assume that in step S5' the information determination equipment 1 determines that the object characterization information of the title with respect to the target object object-document is below the predetermined characterization threshold; then, in step S6', the information determination equipment 1 may first perform semantic analysis on the abstract information abstract of the target object object-document to obtain one or more abstract keywords, such as "Ethereal", "sniff", "data packet" and "network condition". Then, in step S6', the information determination equipment 1 performs a matching query in the title lexicon according to these abstract keywords and determines the optimization indication information according to the query result. For instance, when the matching query finds title terms in the title lexicon that match the abstract keywords, and/or when the ratio of the number of abstract keywords matching title terms in the title lexicon to the total number of abstract keywords satisfies a predetermined threshold such as 0.8, then, in step S6', the optimization indication information determined by the information determination equipment 1 includes "the object title can be optimized in combination with the abstract information"; otherwise, in step S6', the optimization indication information determined by the information determination equipment 1 includes "it is suggested to optimize the object title". Here, the title lexicon may be located in the information determination equipment 1, or in a network device connected to the information determination equipment 1 via a network.
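The abstract-based manner 1) can be sketched as a ratio test. The sketch assumes that semantic analysis has already produced the abstract keywords and models the title lexicon as a set of terms; it implements only the ratio variant of the condition, and the function name and message constants are illustrative assumptions.

```python
COMBINE_WITH_ABSTRACT = "The object title can be optimized in combination with the abstract information."
SUGGEST_OPTIMIZATION = "It is suggested to optimize the object title."


def indication_from_abstract(abstract_keywords, title_lexicon, ratio_threshold=0.8):
    """Choose the optimization indication from the share of keywords found in the lexicon."""
    if not abstract_keywords:
        return SUGGEST_OPTIMIZATION
    matched = sum(1 for keyword in abstract_keywords if keyword in title_lexicon)
    ratio = matched / len(abstract_keywords)
    if ratio >= ratio_threshold:
        return COMBINE_WITH_ABSTRACT
    return SUGGEST_OPTIMIZATION
```

If all four keywords of the example are present in the title lexicon, the ratio is 1.0 and the abstract is recommended as the basis for a better title; if only two of them are present, the ratio of 0.5 falls below 0.8 and only the general suggestion is returned.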
2) Determining the optimization indication information according to the relevance between the object title and the body content information of the target object, in combination with the quantity information of the body words of the body content information of the target object. Specifically, in step S6', the information determination equipment 1 may first determine the relevance between the object title and the body content information of the target object, for instance from the number of title words, in the title word information corresponding to the object title, that match the body word information corresponding to the body content information of the target object, or from the degree of matching between the object title and the body content information of the target object; then, in step S6', the information determination equipment 1 determines the optimization indication information according to this relevance, in combination with the quantity information of the body words of the body content information of the target object. For example, continuing the above example, in step S6', the information determination equipment 1 first performs semantic analysis on the object title and on the body content information of the target object, obtaining the title word information "network", "sniffing" corresponding to the object title and the body word information "network monitoring", "sniffing", "data packet", "network card", "experiment", "service", "configuration" corresponding to the body content information of the target object. Then, in step S6', the information determination equipment 1 determines the relevance between the object title and the body content information of the target object according to the number of title words that match the body word information; for instance, the ratio of the number of title words matching the body word information to the total number of title words is taken as the relevance. Then, in step S6', the information determination equipment 1 determines the optimization indication information according to this relevance, in combination with the quantity information of the body words of the body content information of the target object. For instance, assume that the ratio of the number of title words in the title word information "network", "sniffing" that match the body word information to the total number of title words is 100%; then, in step S6', the information determination equipment 1 determines that the relevance between the object title and the body content information of the target object is 1. Then, in step S6', according to this relevance of 1, and in combination with the quantity information of the body words of the body content information content of the target object object-document, for instance assuming that the body content information contains 20 body words, the information determination equipment 1 determines the optimization indication information, such as "the object title can be optimized in combination with the body content information"; otherwise, in step S6', the optimization indication information determined by the information determination equipment 1 includes "it is suggested to optimize the object title".
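The relevance-based manner 2) can be sketched as follows. The relevance is the ratio stated in the text (matching title words over all title words); the two cut-off parameters `min_relevance` and `min_body_words` are hypothetical, since the text gives only one worked case (relevance 1 with 20 body words) and no explicit thresholds. Function names and message constants are likewise assumptions.

```python
COMBINE_WITH_BODY = "The object title can be optimized in combination with the body content information."
SUGGEST_OPTIMIZATION = "It is suggested to optimize the object title."


def title_body_relevance(title_words, body_words):
    """Share of title words that also occur among the body words."""
    if not title_words:
        return 0.0
    body_vocabulary = set(body_words)
    matched = sum(1 for word in title_words if word in body_vocabulary)
    return matched / len(title_words)


def indication_from_body(title_words, body_words,
                         min_relevance=1.0, min_body_words=20):
    """Combine relevance with the number of body words (hypothetical cut-offs)."""
    relevance = title_body_relevance(title_words, body_words)
    if relevance >= min_relevance and len(body_words) >= min_body_words:
        return COMBINE_WITH_BODY
    return SUGGEST_OPTIMIZATION
```

The design choice here is to require both conditions: a title that is fully covered by the body, and a body long enough to draw a better title from; either condition failing falls back to the general suggestion.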
Here, by combining the abstract information of the target object and/or the body content information of the target object, the present invention achieves the beneficial effect that the accuracy of the determined optimization indication information reaches 100%.
Those skilled in the art will understand that the above manner of determining the optimization indication information about the object title is merely an example; other existing or future manners of determining the optimization indication information about the object title, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, when the object language type information of the target object is inconsistent with the title language type information of the object title, in step S6', the information determination equipment 1 may also include, in the optimization indication information, the reference title information corresponding to the object title under the object language type information. For example, assume that the object language type information of the target object object-document is English while the title language type information of the object title title is Chinese; then, in step S6', the information determination equipment 1 may also include in the optimization indication information the reference title information corresponding to the object title title under the object language type information, i.e. the English reference title information corresponding to the object title title.
In step S7', the information determining equipment 1 provides the optimization indication information to the user corresponding to the target object, for example to the user equipment of that user, for the user to read and browse, by means of a dynamic web page technology such as ASP, JSP or PHP, or by another agreed communication mode such as the http or https communication protocol.
Preferably, step S6' includes step S61' (not shown) and step S62' (not shown). Specifically, when the object characterization information is less than the predetermined characterization threshold information, in step S61' the information determining equipment 1 determines the relevance between the body content information of the target object and the title term library; in step S62', the information determining equipment 1 determines the optimization indication information according to the relevance.
Specifically, when the object characterization information is less than the predetermined characterization threshold information, in step S61' the information determining equipment 1 determines the relevance between the body content information of the target object and the title term library according to keyword quantity information, that is, the number of content keywords corresponding to the body content information that match title terms in the title term library; for example, the ratio of the number of content keywords that match title terms in the title term library to the total number of content keywords is taken as the relevance.
For example, assume that in step S5' the information determining equipment 1 determines that the object characterization information of the title title with respect to the target object object-document is less than the predetermined characterization threshold information. The relevance determining unit then first performs semantic analysis on the body content information content of the target object object-document and obtains the content keywords corresponding to the body content information content, "network monitoring sniff packet network interface card lab-gown business configuration". Then, in step S61', the information determining equipment 1 determines the relevance according to the number of these content keywords that match title terms in the title term library, taking the ratio of that number to the total number of content keywords as the relevance. Assuming that the content keywords matching title terms in the title term library account for 92% of the total number of content keywords, in step S61' the information determining equipment 1 can determine that the relevance between the body content information content of the target object object-document and the title term library is 0.92.
Those skilled in the art will understand that the above manner of determining the relevance between the body content information of the target object and the title term library is merely an example; other existing or future manners of determining the relevance between the body content information of the target object and the title term library, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S62', the information determining equipment 1 determines the optimization indication information according to the relevance. For example, when the relevance is greater than a predetermined threshold, it determines that the optimization indication information includes "the object title can be optimized in combination with the body content information"; otherwise, it determines that the optimization indication information includes "it is suggested that the object title be optimized". Continuing the example above, in step S61' the information determining equipment 1 determines that the relevance between the body content information content of the target object object-document and the title term library is 0.92, which is greater than the predetermined threshold, for example 0.85; then, in step S62', the information determining equipment 1 determines the optimization indication information according to this relevance of 0.92, for example "the object title can be optimized in combination with the body content information"; otherwise, the optimization indication information determined by the optimization determining unit includes "it is suggested that the object title be optimized".
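Steps S61' and S62' can be sketched together as follows. The function names and the sample keywords are hypothetical; the 0.85 threshold is the example value mentioned in the description, and exact string matching against the title term library is an assumption.

```python
def keyword_relevance(content_keywords, title_terms):
    """Step S61' sketch: share of content keywords found in the title term library."""
    if not content_keywords:
        return 0.0
    library = set(title_terms)  # assumption: exact string matching
    matched = sum(1 for kw in content_keywords if kw in library)
    return matched / len(content_keywords)


def choose_indication(relevance, threshold=0.85):
    """Step S62' sketch: pick an indication depending on the threshold."""
    if relevance > threshold:
        return "the object title can be optimized in combination with the body content information"
    return "it is suggested that the object title be optimized"


# Hypothetical data: 3 of the 4 content keywords are in the term library.
content_keywords = ["network", "sniffing", "packet", "configuration"]
title_terms = ["network", "sniffing", "packet", "monitoring"]
relevance = keyword_relevance(content_keywords, title_terms)
indication = choose_indication(relevance)
```

The comparison is strict, following the wording "greater than a predetermined threshold"; a relevance exactly equal to the threshold therefore falls into the second branch.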
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
In addition, part of the present invention can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from the spirit or essential characteristics of the present invention. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are therefore intended to be included in the present invention. Any reference sign in a claim should not be regarded as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim can also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (19)

1. A method for determining object characterization information of an object title with respect to a corresponding target object, wherein the method comprises the following steps:
x. obtaining a plurality of training titles;
y. establishing or updating a corresponding label pattern dictionary according to label pattern information in the plurality of training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information, and step y includes:
- performing label processing on the plurality of training titles to determine one or more label patterns corresponding to the plurality of training titles;
- performing statistical processing on the label patterns to obtain a corresponding initial label pattern dictionary, wherein the initial label pattern dictionary includes the label patterns contained in the plurality of training titles and their corresponding frequency information;
- screening the label patterns in the initial label pattern dictionary according to the frequency information to obtain the label pattern dictionary;
wherein the method further comprises:
a. obtaining the object title of a target object to be processed;
b. filtering the object title according to the label pattern dictionary;
c. determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the filtered object title.
2. The method according to claim 1, wherein the method further comprises:
- preprocessing the filtered object title to obtain the preprocessed object title;
wherein step c includes:
- determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the preprocessed object title.
3. The method according to any one of claims 1 to 2, wherein the method further comprises:
m. when the object characterization information is less than predetermined characterization threshold information, determining optimization indication information for the object title;
- providing the optimization indication information to the user corresponding to the target object.
4. The method according to claim 3, wherein step m includes:
- when the object characterization information is less than the predetermined characterization threshold information, determining the optimization indication information according to summary information of the target object.
5. The method according to claim 4, wherein step m includes:
- when the object characterization information is less than the predetermined characterization threshold information, performing semantic analysis on the summary information of the target object to obtain one or more summary keywords;
- performing a matching query in a title term library according to the one or more summary keywords, so as to determine the optimization indication information according to the query result.
6. The method according to claim 3, wherein step m includes:
m1. when the object characterization information is less than the predetermined characterization threshold information, determining the relevance between the body content information of the target object and the title term library;
- determining the optimization indication information according to the relevance.
7. The method according to claim 6, wherein step m1 includes:
- when the object characterization information is less than the predetermined characterization threshold information, determining the relevance according to keyword quantity information on the content keywords corresponding to the body content information that match title terms in the title term library.
8. The method according to claim 3, wherein step m further includes:
- when the object language type information of the target object is inconsistent with the title language type information of the object title, including in the optimization indication information the reference title information corresponding to the object title under the object language type information.
9. Information determining equipment for determining object characterization information of an object title with respect to a corresponding target object, wherein the information determining equipment comprises:
a training acquisition device for obtaining a plurality of training titles;
a dictionary establishing device for establishing or updating a corresponding label pattern dictionary according to label pattern information in the plurality of training titles, wherein the label pattern dictionary includes one or more label patterns and their frequency information, and the dictionary establishing device is used for:
- performing label processing on the plurality of training titles to determine one or more label patterns corresponding to the plurality of training titles;
- performing statistical processing on the label patterns to obtain a corresponding initial label pattern dictionary, wherein the initial label pattern dictionary includes the label patterns contained in the plurality of training titles and their corresponding frequency information;
- screening the label patterns in the initial label pattern dictionary according to the frequency information to obtain the label pattern dictionary;
wherein the information determining equipment further comprises:
a title acquisition device for obtaining the object title of a target object to be processed;
a filtering device for filtering the object title according to the label pattern dictionary;
a characterization determining device for determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the filtered object title.
10. The information determining equipment according to claim 9, wherein the information determining equipment further comprises:
a preprocessing device for preprocessing the filtered object title to obtain the preprocessed object title;
wherein the characterization determining device is used for:
- determining the object characterization information of the object title with respect to the target object according to word-related information of the title words in the preprocessed object title.
11. The information determining equipment according to claim 9 or 10, wherein the information determining equipment further comprises:
an optimization determining device for determining optimization indication information for the object title when the object characterization information is less than predetermined characterization threshold information;
a providing device for providing the optimization indication information to the user corresponding to the target object.
12. The information determining equipment according to claim 11, wherein the optimization determining device is used for:
- when the object characterization information is less than the predetermined characterization threshold information, determining the optimization indication information according to summary information of the target object.
13. The information determining equipment according to claim 12, wherein the optimization determining device is used for:
- when the object characterization information is less than the predetermined characterization threshold information, performing semantic analysis on the summary information of the target object to obtain one or more summary keywords;
- performing a matching query in a title term library according to the one or more summary keywords, so as to determine the optimization indication information according to the query result.
14. The information determining equipment according to claim 11, wherein the optimization determining device comprises:
a relevance determining unit for determining the relevance between the body content information of the target object and the title term library when the object characterization information is less than the predetermined characterization threshold information;
an optimization determining unit for determining the optimization indication information according to the relevance.
15. The information determining equipment according to claim 14, wherein the relevance determining unit is used for:
- when the object characterization information is less than the predetermined characterization threshold information, determining the relevance according to keyword quantity information on the content keywords corresponding to the body content information that match title terms in the title term library.
16. The information determining equipment according to claim 11, wherein the optimization determining device is further used for:
- when the object language type information of the target object is inconsistent with the title language type information of the object title, including in the optimization indication information the reference title information corresponding to the object title under the object language type information.
17. Computer equipment, comprising the information determining equipment according to any one of claims 9 to 16.
18. A browser, comprising the information determining equipment according to any one of claims 9 to 16.
19. A browser plug-in, comprising the information determining equipment according to any one of claims 9 to 16.
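
The dictionary-building and filtering steps recited in claims 1 and 9 can be sketched roughly as follows. This is a sketch under loudly stated assumptions, not the claimed implementation: it assumes that a "label pattern" is a run of punctuation or symbol characters in a title, that screening keeps patterns whose frequency reaches a hypothetical min_frequency, and that filtering strips dictionary patterns from the object title. None of these details is fixed by the claims.

```python
import re
from collections import Counter

# Assumption: a label pattern is a run of non-word, non-space characters.
SYMBOL_RUN = re.compile(r"[^\w\s]+")


def extract_label_patterns(title):
    """Label processing: list the symbol runs found in one title."""
    return SYMBOL_RUN.findall(title)


def build_label_pattern_dictionary(training_titles, min_frequency=2):
    """Statistics plus screening: count patterns, keep the frequent ones."""
    counts = Counter()
    for title in training_titles:
        counts.update(extract_label_patterns(title))
    # Assumption: screening keeps patterns seen at least min_frequency times.
    return {p: n for p, n in counts.items() if n >= min_frequency}


def filter_title(title, dictionary):
    """Filtering: drop symbol runs that appear in the dictionary."""
    def replace(match):
        return " " if match.group(0) in dictionary else match.group(0)
    return " ".join(SYMBOL_RUN.sub(replace, title).split())


training_titles = ["!!! cheap tickets !!!", "best deal !!!", "what is tcp?"]
dictionary = build_label_pattern_dictionary(training_titles)
filtered = filter_title("!!! network sniffing?", dictionary)
```

In this toy run only "!!!" is frequent enough to enter the dictionary, so it is stripped from the object title while the single "?" is left in place.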
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310260162.4A CN103383697B (en) 2013-06-26 2013-06-26 Method and equipment for determining object representation information of object header

Publications (2)

Publication Number Publication Date
CN103383697A CN103383697A (en) 2013-11-06
CN103383697B true CN103383697B (en) 2017-02-15

Family

ID=49491487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310260162.4A Active CN103383697B (en) 2013-06-26 2013-06-26 Method and equipment for determining object representation information of object header

Country Status (1)

Country Link
CN (1) CN103383697B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630909A (en) * 2015-12-21 2016-06-01 北京奇虎科技有限公司 Method and device for displaying normalized header information
CN109740130B (en) * 2018-11-22 2022-12-09 厦门市美亚柏科信息股份有限公司 Method and device for generating file
CN109729348B (en) * 2019-03-07 2020-06-02 腾讯科技(深圳)有限公司 Method, device and equipment for determining video quality

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315624A (en) * 2007-05-29 2008-12-03 阿里巴巴集团控股有限公司 Text subject recommending method and device
CN102737017A (en) * 2011-03-31 2012-10-17 北京百度网讯科技有限公司 Method and apparatus for extracting page theme
EP2546760A1 (en) * 2011-07-11 2013-01-16 Accenture Global Services Limited Provision of user input in systems for jointly discovering topics and sentiment
CN103136352A (en) * 2013-02-27 2013-06-05 华中师范大学 Full-text retrieval system based on two-level semantic analysis

Also Published As

Publication number Publication date
CN103383697A (en) 2013-11-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant