CN107944898A - Automatic discovery and ranking method for advertisement placement building information - Google Patents

Info

Publication number
CN107944898A
Authority
CN
China
Prior art keywords
building
information
advertisement
data
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610895300.XA
Other languages
Chinese (zh)
Inventor
李美美
董家毅
夏云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chi Yu Information Technology (shanghai) Co Ltd
Original Assignee
Chi Yu Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chi Yu Information Technology (shanghai) Co Ltd
Priority to CN201610895300.XA
Publication of CN107944898A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for automatically discovering and ranking building information for advertisement placement, comprising: extracting and selecting advertisement feature information from the advertiser's requirements and the advertisement's description; performing semantic expansion on the advertisement feature information according to an advertised-product knowledge graph database, the semantic expansion comprising expanding the keywords and attributes contained in the advertisement feature information and conceptualizing the expanded result; computing the semantic similarity between the semantically expanded advertisement feature information and the label information of the corresponding concepts attached to the building feature information in a building knowledge graph database, the resulting similarity matches being the advertisement placement building information; and, if the advertisement placement building information contains more than one recommended placement point for the same advertised product, ranking all recommended placement points by their respective weights. The invention can improve the efficiency of selecting buildings for advertisement placement and the accuracy of the placement.

Description

Automatic discovery and ranking method for advertisement placement building information
Technical field
The present invention relates to the field of computer information technology, and in particular to a method for automatically discovering and ranking building information for advertisement placement.
Background art
The development of the Internet has passed through two stages, Web 1.0 and Web 2.0, and is now moving toward Web 3.0. Web 1.0 was editor-driven: users read the content that websites provided, and its representative sites were the three major portals Sina, Sohu and NetEase. Web 2.0 places more emphasis on user interaction: the user is both a consumer (reader) of website content and a producer of it (microblogs, the Tianya community, self-media).
In these first two generations of the Internet, the smallest unit that a user clicks on (or that is interlinked) is the document, and documents are connected to each other by hyperlinks. Because HTML lacks semantics, the content of these two generations is mainly intended for humans and is difficult for computers to understand automatically. Web 3.0, that is, the Web of Data (the Semantic Web), is a network of data: what is published and interlinked on it are individual entities (things; entities are the constituent units of Semantic Web data), and it achieves data interconnection and interoperability at the entity level.
In the Web of Data, every entity is identified by a globally unique ID, which is the identifier of the corresponding object. This is similar to a web page having a corresponding URL, or a record in a database having a specific primary key. Links between objects represent associations between them, and the large number of entities together with the relations among them form a huge graph. By using a shared data dictionary, namely an ontology (a formal, explicit and detailed specification of a shared conceptual system), to describe objects and links, distributed data sets can be linked together. Under this framework, data sharing, management and exchange become easier. The result is a loosely coupled, decentralized Internet ("loosely coupled" here means that the data schema is decentralized: rather than specifying a schema top-down in advance, each data source builds its own schema bottom-up, and the data are then associated and managed). In the Web of Data technical framework, data exist in the form of ontologies, are expressed in ontology description languages such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL), and are queried with the query language SPARQL.
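As a minimal sketch of the entity-level data model described above, the following Python snippet stores facts as (subject, predicate, object) triples and answers single triple patterns, which are the basic building block of a SPARQL query. The `ex:` identifiers and the `match` and `tenant_industries` helpers are hypothetical and chosen only for illustration; a real system would use an RDF store and SPARQL rather than in-memory lists.

```python
# Hypothetical identifiers; in RDF each would be a globally unique URI.
TRIPLES = [
    ("ex:building/1", "rdf:type", "ex:Building"),
    ("ex:building/1", "ex:hasTenant", "ex:company/7"),
    ("ex:company/7", "rdf:type", "ex:Company"),
    ("ex:company/7", "ex:industry", "ex:industry/finance"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def tenant_industries(triples, building):
    """Follow two links in the graph: building -> tenant -> industry."""
    industries = []
    for _, _, tenant in match(triples, s=building, p="ex:hasTenant"):
        for _, _, industry in match(triples, s=tenant, p="ex:industry"):
            industries.append(industry)
    return industries
```

Because every entity carries its own identifier, new kinds of links can be added as extra triples without changing any schema, which is the extensibility the text attributes to graph data.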
The knowledge graph is an application and realization of the Web of Data technical framework. A knowledge graph moves from describing the objective world as character strings to describing it with structured semantics; it is a mapping of knowledge about the objective world (mapping world knowledge), and an ontology can serve as the conceptual model and logical foundation of its knowledge representation. A knowledge graph can describe conceptual abstractions at different levels and granularities. It can be presented as a huge graph in which the nodes represent data sources and the edges represent links between identical entities in different data sources. A knowledge graph is also a family of diagrams that display the development process and structural relations of knowledge, using visualization techniques to describe knowledge resources and their carriers. It can be used to mine, analyze, construct, draw and display knowledge and the interconnections within it. It combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization and information science to present information visually.
With the development of new media and new technology, the media used for advertising have become almost all-encompassing: any medium that can transmit information can serve as a carrier for advertisements. Competition in the advertising industry is increasingly fierce, and the demand for precise targeting of advertising audiences keeps rising. Deciding which buildings to advertise in solely on the basis of sales staff's experience no longer meets this demand. The advertising value of a building must instead be determined from accurate data such as basic building information (for example location and rent) and basic audience information (for example the companies that occupy the building), so that the advertisement publisher can quickly decide on placements and obtain the maximum return.
Such building information can generally be provided by a building database; however, building databases in the prior art generally use relational databases. As those skilled in the art know, in a relational database the data definitions and descriptions are confined to the database itself. Its data dictionary is provided mainly for people to consult and cannot be used directly by machines, and its relations are stored in documents, SQL code and collective memory, so they cannot be supplied directly to application programs. By contrast, relations in RDF graph data are an inherent property and an explicit model, and can be supplied directly to application programs. Moreover, the data model of the Semantic Web is naturally extensible, whereas merging tables or adding fields in a relational database is costly. Compared with a traditional relational database, Semantic Web graph data are therefore better suited to data maintenance and data fusion, are more readily usable by application programs, and allow data reasoning with a unified inference engine.
In addition, most of the building data in prior-art building databases must be collected and entered manually, and some of the data are incomplete or insufficiently accurate. In particular, existing building data cannot accurately reflect the match between a building and the audience within it, which makes precise advertisement placement difficult.
A navigation electronic map (electronic map for short) mainly comprises geographic framework information with the road network as its skeleton, together with socio-economic information and static and dynamic information superimposed on it. Existing electronic map databases generally provide an open platform for location-based services (LBS). Many LBS open platforms are free, one-stop development platforms that offer developers a full set of LBS tools, from data storage, management and retrieval to map services. They not only support functions such as map positioning, display, annotation, search, driving route planning, public transport queries and navigation, but also allow services such as group buying, message push and map business cards to be implemented quickly from modular components, so that developers can concentrate on building good LBS applications quickly. However, prior-art electronic maps lack an intuitive, centralized, integrated and extensible presentation of building information, and therefore cannot meet the needs of users such as advertisement placement parties for support in making placement decisions.
As noted above, competition in the advertising industry is increasingly fierce and the demand for precise audience targeting keeps rising, yet deciding which buildings an advertisement should be placed in solely on the basis of sales staff's experience meets practical requirements neither in placement accuracy nor in the efficiency of selecting buildings. Using computer information technology to automatically discover and rank building information for advertisement placement, and thereby improve the efficiency of building selection and ensure placement accuracy, has therefore become an urgent problem.
Summary of the invention
The problem to be solved by the present invention is that the prior art cannot readily achieve the automatic discovery and recommendation ranking of advertisement placement building information needed to improve the efficiency of building selection and ensure placement accuracy.
To solve the above problem, the technical solution of the present invention provides a method for automatically discovering advertisement placement building information, that is, building information, found and identified from the advertiser's requirements and the advertisement's description, for buildings suitable for placing the advertisement. The automatic discovery method comprises: extracting and selecting advertisement feature information from the advertiser's requirements and the advertisement's description; performing semantic expansion on the advertisement feature information according to an advertised-product knowledge graph database, the semantic expansion comprising expanding the keywords and attributes contained in the advertisement feature information and conceptualizing the expanded result, where conceptualizing means attaching the label information of the corresponding concepts to the keywords and attributes contained in the expanded result; and computing the semantic similarity between the semantically expanded advertisement feature information and the label information of the corresponding concepts attached to the building feature information in a building knowledge graph database, the resulting similarity matches being the advertisement placement building information. The semantic similarity computation comprises the steps of label candidate generation, label score calculation and label determination.
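The steps above can be illustrated with a small end-to-end sketch. Everything in it is an assumption made for illustration: the toy dictionaries stand in for the two knowledge graph databases, the scoring rule (sum of the building's weights for shared labels) and the threshold are placeholders, and the function names are hypothetical. The patent text does not specify these details.

```python
# Hypothetical toy data standing in for the two knowledge graph databases.
PRODUCT_GRAPH = {"stroller": ["baby carriage", "infant car seat"]}
CONCEPT_LABELS = {
    "stroller": "maternal and child",
    "baby carriage": "maternal and child",
    "infant car seat": "automobile",
}
BUILDING_LABELS = {
    "Building A": {"maternal and child": 0.6, "automobile": 0.3},
    "Building B": {"education and training": 0.8},
}


def extract_keywords(text, vocabulary):
    """Step 1: keep vocabulary terms that occur in the ad description."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


def expand(keywords):
    """Step 2a: add related terms from the product knowledge graph."""
    expanded = list(keywords)
    for keyword in keywords:
        for related in PRODUCT_GRAPH.get(keyword, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def conceptualize(keywords):
    """Step 2b: attach the concept label of each keyword."""
    return {CONCEPT_LABELS[k] for k in keywords if k in CONCEPT_LABELS}


def score_buildings(ad_labels, threshold=0.5):
    """Step 3: candidate generation, score calculation, determination."""
    matches = {}
    for building, labels in BUILDING_LABELS.items():
        candidates = ad_labels & set(labels)          # label candidates
        score = sum(labels[c] for c in candidates)    # label score
        if score >= threshold:                        # label determination
            matches[building] = score
    return matches


core = extract_keywords("Lightweight stroller for new parents", PRODUCT_GRAPH)
ad_labels = conceptualize(expand(core))
result = score_buildings(ad_labels)
```

In this toy run only Building A shares concept labels with the advertisement, so it is the only building returned.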
Optionally, the mining algorithm used to expand the keywords contained in the advertisement feature information is a random walk mining algorithm or a random walk with restart mining algorithm.
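A minimal sketch of random walk with restart on a weighted keyword graph follows. The graph, its weights, the restart probability of 0.15 and the fixed iteration count are all assumptions chosen for illustration, and every node is assumed to have at least one outgoing edge; the patent names the algorithm family but not these parameters.

```python
# Hypothetical keyword co-occurrence graph: node -> {neighbour: weight}.
GRAPH = {
    "stroller": {"baby carriage": 3.0, "car seat": 1.0},
    "baby carriage": {"stroller": 3.0},
    "car seat": {"stroller": 1.0, "automobile": 2.0},
    "automobile": {"car seat": 2.0},
}


def random_walk_with_restart(graph, seed, restart=0.15, iterations=50):
    """Power iteration: at each step the walker follows an edge with
    probability (1 - restart) or jumps back to the seed keyword."""
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            for neighbour, weight in neighbours.items():
                nxt[neighbour] += (1 - restart) * scores[node] * weight / total
        nxt[seed] += restart
        scores = nxt
    return scores


def expand_keyword(graph, seed, top_k=2):
    """Return the top_k keywords most closely related to the seed."""
    scores = random_walk_with_restart(graph, seed)
    ranked = sorted((n for n in graph if n != seed),
                    key=scores.get, reverse=True)
    return ranked[:top_k]
```

The restart term keeps the walk anchored to the core keyword, so terms many hops away receive low scores and are less likely to be added as expanded keywords.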
Optionally, the keywords contained in the expanded result comprise core keywords and expanded keywords. The core keywords are extracted from the advertiser's requirements and the advertisement's description, and the expanded keywords are obtained by performing the semantic expansion on the core keywords.
Optionally, conceptualizing the expanded result comprises: using a concept label set to conceptualize the core keywords, expanded keywords and attributes contained in the expanded result. The sources from which the concept label set is determined include label information provided by employees of the advertisement placement party on the basis of their work experience and the content of the party's internal enterprise resource planning (ERP) system, label information provided by search engine databases, and label information obtained from shopping websites by cluster analysis using a clustering algorithm.
Optionally, after the concept label set has been determined, the conceptualization is performed with a multi-label classification algorithm, which is one of: a multi-label classification algorithm based on AdaBoost, a multi-label classification algorithm extended from the decision tree method, a multi-label support vector machine algorithm, a multi-label k-nearest-neighbor algorithm, a back-propagation multi-label learning algorithm, and a multi-label maximum entropy algorithm.
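To make the multi-label idea concrete, here is a deliberately simplified k-nearest-neighbor sketch in which a keyword set may receive several concept labels at once. It is not the published ML-kNN algorithm (which adds a Bayesian posterior step): it uses Jaccard similarity over token sets and a plain majority vote, and the training examples and labels are hypothetical.

```python
# Hypothetical training data: (keyword set, set of concept labels).
TRAIN = [
    ({"stroller", "infant", "travel"}, {"maternal and child"}),
    ({"car", "seat", "infant"}, {"maternal and child", "automobile"}),
    ({"car", "engine", "oil"}, {"automobile"}),
    ({"course", "exam", "tutor"}, {"education and training"}),
]


def jaccard(a, b):
    """Overlap of two token sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def multilabel_knn(train, query, k=3):
    """Assign every label held by more than half of the k nearest examples."""
    neighbours = sorted(train, key=lambda ex: jaccard(ex[0], query),
                        reverse=True)[:k]
    votes = {}
    for _, labels in neighbours:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    return {label for label, count in votes.items() if count * 2 > k}


predicted = multilabel_knn(TRAIN, {"infant", "car", "seat", "safety"})
```

Unlike single-label classification, the result is a set, so an infant car seat advertisement can be tagged with both the maternal and child concept and the automobile concept.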
Optionally, the building knowledge graph database is built by the following construction method: obtaining building data from more than one data source and integrating the acquired building data, the building data comprising category information for classifying buildings, building geographic information, building basic information, building ownership and user information, and crowd search and label classification information, where the crowd search and label classification information is obtained by label-classified statistics and index calculation over the main search content of the people in the building; and building the building knowledge graph database from the integrated building data, which have undergone hierarchical classification and structuring. The entities in the building knowledge graph comprise building entities, organization (unit) entities, search label entities, search point entities and organization industry entities. The attributes of a building entity comprise a building geographic information attribute, a building basic information attribute, a building ownership and user information attribute, and a crowd search and label classification information attribute; the crowd search and label classification information attribute is associated with the information under the attributes of the search label entities and search point entities, and the building ownership and user information attribute is associated with the information under the attributes of the organization entities. The attributes of an organization entity comprise an organization basic information attribute, an organization industry information attribute and an organization operating information attribute, and the organization industry information attribute is associated with the information under the attributes of the organization industry entities. The building feature information comprises the entities in the building knowledge graph and the attributes they contain.
To solve the above problem, the technical solution of the present invention further provides a method for automatically ranking advertisement placement building information, comprising: automatically discovering the advertisement placement building information corresponding to a target advertisement using the above automatic discovery method; and, if the advertisement placement building information corresponding to the target advertisement contains more than one recommended placement point for the same advertised product, ranking all recommended placement points by their respective weights. The weight of a recommended placement point is determined from ranking factors, which include at least the similarity of that recommended placement point.
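The ranking step can be sketched as a weighted sort. The text only requires that the ranking factors include the similarity; the second factor used here (`foot_traffic`), the linear combination and the coefficient values are hypothetical choices made purely to show how several factors could be merged into one weight.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    building: str
    similarity: float          # from the semantic similarity computation
    foot_traffic: float = 0.0  # hypothetical extra factor, normalized to 0..1


def weight(rec, w_similarity=0.8, w_traffic=0.2):
    """Combine the ranking factors into a single weight (assumed linear)."""
    return w_similarity * rec.similarity + w_traffic * rec.foot_traffic


def rank(recommendations):
    """Sort only when there is more than one recommended placement point."""
    if len(recommendations) <= 1:
        return list(recommendations)
    return sorted(recommendations, key=weight, reverse=True)


recs = [
    Recommendation("A", similarity=0.9, foot_traffic=0.1),
    Recommendation("B", similarity=0.7, foot_traffic=1.0),
    Recommendation("C", similarity=0.5, foot_traffic=0.5),
]
ordered = [r.building for r in rank(recs)]
```

With these assumed coefficients, building B overtakes building A despite its lower similarity, which shows why the choice of additional ranking factors matters.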
Compared with the prior art, the technical solution of the present invention has at least the following advantages:
Building on a building knowledge graph, advertisement feature information is extracted and selected from the advertiser's requirements and the advertisement's description and semantically expanded according to the advertised-product knowledge graph database; relevant potential advertisement placement building information is then found and identified by semantic similarity computation; and, when that information contains more than one recommended placement point for the same advertised product, all recommended placement points are ranked by their respective weights. The automatic discovery and recommendation ranking of advertisement placement building information is thereby achieved, which improves the efficiency of selecting buildings for placement and ensures placement accuracy.
Brief description of the drawings
Fig. 1 is a schematic diagram of the method for constructing the building knowledge graph database according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a partial example of the building knowledge graph according to an embodiment of the present invention;
Fig. 3 is a flow diagram of the standardization of building geographic information in an embodiment of the present invention;
Fig. 4 is a flow diagram of the forward maximum matching algorithm;
Fig. 5 is a schematic diagram of the relationship between the advertising demand-side platform, the advertising supply-side platform and the data management platform;
Fig. 6 is a schematic diagram of the method for automatically discovering and ranking advertisement placement building information according to an embodiment of the present invention.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
In the prior art, the building databases established so that advertisements can be placed accurately in individual buildings generally use relational databases, which have inherent drawbacks: they are ill-suited to data maintenance and data fusion, are hard to use effectively from application programs, and do not support effective data reasoning. Moreover, most building data in such databases must be collected and entered manually, and some data remain incomplete or insufficiently accurate. In particular, existing building data cannot accurately reflect the match between a building and the audience within it, so advertisements cannot be placed more precisely on that basis.
To solve the above problem, an embodiment of the present invention provides a method for constructing a building knowledge graph database, comprising: obtaining building data from more than one data source and integrating the acquired building data, the building data comprising category information for classifying buildings, building geographic information, building basic information, building ownership and user information, and crowd search and label classification information, where the crowd search and label classification information is obtained by label-classified statistics and index calculation over the main search content of the people in the building; and building the building knowledge graph database from the integrated building data, which have undergone hierarchical classification and structuring. The entities in the building knowledge graph comprise building entities, organization (unit) entities, search label entities, search point entities and organization industry entities. The attributes of a building entity comprise a building geographic information attribute, a building basic information attribute, a building ownership and user information attribute, and a crowd search and label classification information attribute; the crowd search and label classification information attribute is associated with the information under the attributes of the search label entities and search point entities, and the building ownership and user information attribute is associated with the information under the attributes of the organization entities. The attributes of an organization entity comprise an organization basic information attribute, an organization industry information attribute and an organization operating information attribute, and the organization industry information attribute is associated with the information under the attributes of the organization industry entities.
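The entity and attribute structure just described can be pictured with a few Python dataclasses. This is a sketch under stated assumptions: the class and field names are hypothetical, the example values are invented, and a real implementation would store these entities as nodes and edges in a graph database rather than as in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class OrganizationIndustry:      # organization industry entity
    name: str


@dataclass
class Organization:              # organization (unit) entity
    name: str                        # basic information
    industry: OrganizationIndustry   # link to the industry entity
    operating_info: dict = field(default_factory=dict)


@dataclass
class SearchLabel:               # search label entity
    name: str
    index: float                 # index computed from label statistics


@dataclass
class SearchPoint:               # search point entity
    name: str


@dataclass
class Building:                  # building entity
    name: str
    geo: tuple                   # geographic information, e.g. (lat, lon)
    basic_info: dict             # floors, rent, elevators, ...
    users: list = field(default_factory=list)          # owners and users
    search_labels: list = field(default_factory=list)  # crowd search labels
    search_points: list = field(default_factory=list)


def building_features(building):
    """Collect label-like features reachable from a building entity."""
    features = {label.name for label in building.search_labels}
    features |= {org.industry.name for org in building.users}
    return features


finance = OrganizationIndustry("finance")
tower = Building(
    name="Tower A", geo=(31.23, 121.47), basic_info={"floors": 32},
    users=[Organization("Example Bank", finance)],
    search_labels=[SearchLabel("automobile", 0.7)],
)
```

Here `building_features(tower)` gathers both the crowd search labels and the industries of the occupying organizations, which is the kind of building feature information later matched against advertisement labels.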
For the method of constructing the building knowledge graph database according to the embodiment of the present invention, refer to Fig. 1.
Building data are collected first. Collection is carried out mainly against multiple data sources, shown in Fig. 1 as data source 1, data source 2, ..., data source n.
In this embodiment, the many data sources can be grouped into three classes, referred to as the first data source, the second data source and the third data source. The first data source is a relational database in which part of the building data is already stored (i.e., the existing database). The second data source is data provided by professional data providers; the crowd search and label classification information, and the organization industry information associated with the building ownership and user information, are taken from the second data source. The third data source is Internet data, which may include data from building-related portal websites, data from online encyclopedias, and search engine results.
In a specific implementation, building data collection can be handled by an information acquisition module that gathers raw data from the Internet, providing the data basis for later analysis and mining. The module can use web crawler and adapter techniques and covers multiple data sources such as portal websites, encyclopedias, search engine results and relational databases. So that users can choose data sources freely, customizable data source entry points can be provided, together with built-in algorithms for automatically detecting website format structure and automatically filtering page content; the user then only needs to set the basic entry point of a website, without extensive configuration. To assess the authority of data sources and the quality of their data, different probabilistic voting methods are used, which bring the relation between source credibility and data value accuracy into the voting and also consider the influence between different data values. The NEWACCU algorithm, for example, mainly uses the average of a data source's quasi-authority and the vote share of its data values as the source's confidence in the computation, and also handles different representations of the same data value.
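The voting idea can be illustrated with a very small iterative scheme in which source trust and value selection reinforce each other. This is explicitly not the NEWACCU algorithm, whose details are not given in the text; it is a simplified stand-in, and the source names, claimed values and update rule are all assumptions.

```python
# Hypothetical conflicting claims: data item -> {source: claimed value}.
CLAIMS = {
    "Tower A floors": {"portal": 30, "encyclopedia": 30, "legacy_db": 28},
    "Tower A elevators": {"portal": 8, "legacy_db": 6, "encyclopedia": 8},
    "Tower B floors": {"portal": 12, "legacy_db": 15},
}


def resolve(claims, rounds=5):
    """Alternate between trust-weighted voting and trust re-estimation."""
    sources = {s for votes in claims.values() for s in votes}
    trust = {s: 0.5 for s in sources}   # start with equal trust
    truth = {}
    for _ in range(rounds):
        # Each source votes for its value with a weight equal to its trust.
        for item, votes in claims.items():
            tally = {}
            for source, value in votes.items():
                tally[value] = tally.get(value, 0.0) + trust[source]
            truth[item] = max(tally, key=tally.get)
        # A source's trust is the share of its claims that won the vote.
        for source in sources:
            made = [(i, v[source]) for i, v in claims.items() if source in v]
            agreed = sum(1 for item, value in made if truth[item] == value)
            trust[source] = agreed / len(made)
    return truth, trust


truth, trust = resolve(CLAIMS)
```

In the toy data the two sources disagree about Tower B, and the conflict is settled in favor of the source that proved reliable on the other items.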
The collected building data comprise category information for classifying buildings, building geographic information, building basic information, building ownership and user information, and crowd search and label classification information. The crowd search and label classification information is obtained by label-classified statistics and index calculation over the main search content of the people in the building; the label categories involved mainly include automobiles, maternal and child, education and training, medical and health, software applications, reading, horoscopes and so on. Introducing crowd search and label classification information into the collected building data allows a building to be matched accurately with the audience within it, so that the building knowledge graph database subsequently built from the data supports more precise advertisement placement. The building geographic information comprises a description of the building's geographic location or its latitude and longitude coordinates. The building basic information is basic information about the building itself, and may include the building name, floor area, number of floors, number of elevators, rent (price), whether it is a landmark building, building age (including newly built), opening hours and foot traffic. The building ownership and user information comprises information on the users and owners of the building, divided into two major classes: enterprises and public institutions, and residents. The category information for classifying buildings includes, for example, rentals, short-term rentals, second-hand homes, new homes, office buildings and shops.
After building data collection is complete and before data modeling (construction of the building knowledge graph), the three classes of data sources above should be integrated at the storage level. Data integration combines data from several scattered sources, logically or physically, into one unified data set. Its core task is to bring together interrelated, distributed, heterogeneous data sources so that users can access them transparently. Data integration must ensure efficient integration and data reliability while also providing second-level analysis and access speeds over petabyte-scale data.
After information collection, most of the data obtained are unstructured text, which requires general text processing to give it structure and lay the foundation for further information extraction and data modeling.
Accordingly, in this embodiment, integrating the acquired building data includes preprocessing them. The preprocessing comprises: format cleaning to remove noise; automatic rearrangement and automatic classification, and conversion of the building data into a predetermined format; and general text processing, which comprises word segmentation, part-of-speech tagging, syntactic analysis, named entity recognition, clustering and classification.
Specifically, format cleaning is performed first to remove noise (such as advertisements) contained in web pages. Automatic rearrangement and automatic classification follow, and the data are converted into the platform's internal custom format. Next, general text processing is applied, including word segmentation, part-of-speech tagging, syntactic analysis, named entity recognition, clustering and classification, providing the basis for subsequent in-depth data analysis and data mining.
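Word segmentation is the first of the general text processing steps, and Fig. 4 refers to the forward maximum matching algorithm. A minimal sketch of that algorithm follows; the dictionary contents and the sample address are hypothetical, and production segmenters combine this greedy matching with statistical disambiguation.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word; fall back to a single character if nothing matches."""
    max_len = max_len or max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical address dictionary (city, district and road names).
ADDRESS_WORDS = {"上海", "上海市", "浦东", "浦东新区", "世纪大道"}
segments = forward_max_match("上海市浦东新区世纪大道", ADDRESS_WORDS)
```

Because the longest match wins, the city name is segmented as "上海市" rather than the shorter dictionary entry "上海", which is the behavior wanted when standardizing address text.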
Because much of the acquired building geographic information, building basic information, ownership and user information and building category information is expressed as text, it must be processed with deep text mining techniques. Text information extraction techniques and methods can generally be used to extract the classes of building data above in order to build the building knowledge graph. Driven by existing application requirements, text information extraction is applied to the crowd search and label classification information, building geographic information, building basic information, ownership and user information, and the category information for classifying buildings; this processing is based on application requirements, and its core is the construction of the building knowledge graph.
Accordingly, in this embodiment, building the building knowledge graph database from the integrated building data comprises: using a text information extraction method to extract the various kinds of information contained in the integrated building data in order to build the building knowledge graph database. The text information extraction method comprises one or a combination of: a method based on manually constructed rules, a method based on rule learning, and a method based on machine learning.
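Of the three families of extraction methods, the one based on manually constructed rules is the easiest to show in a few lines. The sketch below uses regular expressions as the hand-written rules; the patterns, field names and sample sentence are hypothetical and cover only a handful of building basic information attributes.

```python
import re

# Hand-written rules: field name -> pattern with one capturing group.
RULES = {
    "floors": re.compile(r"(\d+)\s*(?:floors|storeys|stories)", re.I),
    "elevators": re.compile(r"(\d+)\s*(?:elevators|lifts)", re.I),
    "area_m2": re.compile(
        r"([\d,]+(?:\.\d+)?)\s*(?:square met(?:er|re)s|m2)", re.I),
}


def extract(text):
    """Apply every rule and return the fields that matched as numbers."""
    record = {}
    for field_name, pattern in RULES.items():
        found = pattern.search(text)
        if found:
            record[field_name] = float(found.group(1).replace(",", ""))
    return record


description = ("Tower A has 32 floors, 8 elevators and a floor area "
               "of 45,000 square meters.")
record = extract(description)
```

Rules like these are precise but brittle, which is one reason the text allows them to be combined with rule learning and machine learning methods.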
In addition, algorithms for deeper data analysis and mining can be provided on top of the general text processing, including information mining on the enterprises in a building, industry analysis of the enterprises in a building, and analysis of the spending power of a building's residents. When data mining is performed, most of the algorithms are built on the domain knowledge graph (the building knowledge graph), which improves the accuracy of the analysis.
In this embodiment, the data obtained from the second data source and the third data source fall broadly into two classes: one class can be integrated into and stored in the existing relational database; the other requires a separate database to be built to store it. In data integration, the points that need emphasis are as follows:
For the crowd search and label classification information from the second data source, and for the unit industry information associated with the building's owner and user information, a relatively independent and complete RDF graph database should be established. These two classes of information are missing from the existing relational database, yet they are indispensable for building the building knowledge graph;
The data structure and storage of the classification information used for building classification should be further optimized so that the categories can be conveniently extended and refined. In addition, the related building geographic information and building basic information need to be supplemented with reference to the classification information used for building classification, to further support the construction of the building knowledge graph.
In this embodiment, the crowd search and label classification information, and the unit industry information associated with the building's owner and user information, are integrated into the correspondingly established RDF graph database. Of the data obtained from the third data source, one part is integrated into the relational database and the other part into the RDF graph database. Building the building knowledge graph database based on the integrated building data includes: converting the data in the relational database into RDF graph data, and fusing them with the data in the RDF graph database into the building knowledge graph database.
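The conversion from relational data to RDF graph data can be pictured with the following minimal sketch, which turns each row into subject-predicate-object triples and serializes them as N-Triples. The IRI prefix, the table and column names, and the use of a Python set as a stand-in for the RDF graph database are all assumptions for illustration; the embodiment does not specify a mapping scheme.

```python
# Hypothetical namespace; any stable IRI prefix would do.
BASE = "http://example.org/building/"


def row_to_triples(table, key_column, row):
    """Map one relational row to (subject, predicate, object) triples.

    The subject IRI is built from the table name and the primary key; every
    other non-empty column becomes one predicate with a literal object.
    """
    subject = f"<{BASE}{table}/{row[key_column]}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append((subject, predicate, f'"{escaped}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples, one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


rows = [{"id": 1, "name": "Tower A", "floors": 32, "rent": None}]
graph = set()  # stands in for the RDF graph database
for row in rows:
    graph.update(row_to_triples("building", "id", row))
print(to_ntriples(sorted(graph)))
```

Because the triples are kept in a set, loading the same row twice does not duplicate statements, which is one reason graph data is convenient to fuse.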
After integration of the building data is complete, the building knowledge graph database can be built based on the integrated building data.
Different types of advertisement target different audiences, and the building knowledge graph model is built to achieve the goal "building ←→ audience" (that is, matching buildings with audiences). Its core is building the knowledge system of buildings and their related information. Considering the basis of this model along the five dimensions of information science, "time, space, organization, people, and events", the building knowledge graph should include the following four kinds of basic knowledge:
Geographic knowledge, that is, the geographic location description or the latitude and longitude coordinates of the building;
Temporal knowledge, that is, the period during which the building is open for use;
Organization and crowd knowledge, that is, information about the users and owners of the building, divided into two major classes: enterprises and institutions, and the population. Because individuals in the population are uncertain and mobile, the main knowledge to be acquired about the population should be its economic strength (determined from price information related to the building), its size (determined by the building's foot traffic), and crowd search information. The main knowledge about enterprises and institutions is their industry, main business, and other related information;
Event knowledge: the events that occur in a building are determined mainly by the building's function, and the function is determined mainly by the building's classification information (the natural function for which the building was originally established) and by the industry and main business information of the organizations in the building (the actual function that arises from human activity after the building is established).
Therefore, the ontology framework of the building knowledge graph is as shown in Table 1. It should be noted that knowledge graph technology is highly extensible and can be extended in stages according to application requirements; Table 1 contains only the basic knowledge that currently needs to be built, and a specific application need not be limited to the ontology form (for example, other techniques such as association rules and labels can be used to assist in building the knowledge graph).
Table 1: Building knowledge graph ontology framework (model)
(1) entity classification level
(2) entity attributes
The building knowledge graph includes five primary entities (concepts):
● Building entity. Based on the building classification information currently obtained, the building entity can be divided into 24 sub-entities, and further categories and finer classification levels can be added. (The sub-entity division may be further divided and graded with reference to the obtained data; besides being obtained from the existing database, the specific building classification information also needs to be crawled from related portal websites and further supplemented with search engine results.)
● Unit entity, including two sub-entities: enterprise units and public institutions.
● Search label entity.
● Search point position entity, with no sub-entities.
● Unit industry entity. The unit industry knowledge graph can be purchased from or customized by a third party; models and techniques of this kind are relatively mature.
In this embodiment, to make entity identity unambiguous, an isolation approach can be used so that every occurrence of an entity in the building knowledge graph is assigned a unique identifier, and a reconciliation approach is used to confirm identical entities and merge them.
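The isolate-then-reconcile idea can be sketched as follows. Each occurrence first receives its own identifier; occurrences are then grouped by a reconciliation key and merged. The key used here (the name with whitespace removed and case folded) and the record fields are assumptions for illustration; a real reconciliation step would likely compare addresses, coordinates, and other attributes as well.

```python
import uuid
from collections import defaultdict


def isolate(mentions):
    """Give every occurrence its own identifier, even if names repeat."""
    return [{"id": uuid.uuid4().hex, **mention} for mention in mentions]


def normalize(name):
    # Assumed reconciliation key: case-folded name with whitespace removed.
    return "".join(name.split()).casefold()


def reconcile(records):
    """Group isolated records whose normalized names agree and merge them."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record["name"])].append(record)
    merged = []
    for members in groups.values():
        # Keep the first identifier and remember the ones merged into it.
        entity = {"id": members[0]["id"], "same_as": [m["id"] for m in members[1:]]}
        for member in members:
            for key, value in member.items():
                if key != "id":
                    entity.setdefault(key, value)  # first value seen wins
        merged.append(entity)
    return merged


mentions = [
    {"name": "Tower A", "floors": 32},
    {"name": "tower  a", "elevators": 8},
    {"name": "Tower B"},
]
entities = reconcile(isolate(mentions))
print(len(entities))
```

Keeping the merged identifiers in `same_as` means the merge can be audited or undone if two occurrences later turn out to be different buildings.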
The attributes of the entities in the building knowledge graph are described below.
In this embodiment, the building entity mainly comprises four basic attributes: geographic information, basic information, owner and user information, and crowd search information.
● Geographic information attribute. The information under it is mainly the geographic location of the building, that is, the building's detailed address information (comprising three kinds: address classification, address text description, and coordinates), which provides the basis for subsequently building the building information map. The address classification takes three values: "not installed" (a building in which no advertisement playback device is installed), "frame", and "building". This classification allows buildings with advertisement placement points and buildings without them to be included together in the building knowledge graph. In general, geographic location information is contained in the existing database, in the data supplied by professional data providers, on the official websites of the enterprises in the building, in encyclopedia entries about those enterprises, and so on, or in the building information of related portal websites.
● Basic information attribute. The information under it includes the building name, floor area, number of floors, number of elevators, rent (housing price), whether the building is a landmark, building age (including newly built), opening hours, and foot traffic. Of these, the floor area, number of floors, number of elevators, rent (housing price), building age (including newly built), and foot traffic can be extracted directly from the data of several databases, or from the building data obtained from related portal websites. Information extracted from search engine results can also serve as a supplement.
● Owner and user information attribute, divided into two sub-attributes: owning unit (that is, the owner) and using unit. The information under these two sub-attributes is the information on the enterprises and institutions in the building under the associated "unit" entity.
● Crowd search information attribute, including search label (the information under it is the information under the associated "search label" entity), point position (the information under it is the information under the associated "search point position" entity), screen type, installation position, and package attributes.
The search label entity mainly comprises three basic attributes: year-month, label class name, and label index. The label classes under the label class name include automobiles, mother-and-baby and parenting, education and training, medical care and health, software applications, reading, customized horoscopes, and so on. The information under the year-month, label class name, and label index attributes is extracted from the data provided by the second data source; the corresponding data have already been well classified hierarchically and (semi-)structured, so they can be merged into the knowledge graph being built relatively easily.
The search point position entity mainly comprises two basic attributes: floor and quantity. As with the attributes of the search label entity, the information under the floor and quantity attributes is extracted from the data provided by the second data source; the corresponding data have also been classified hierarchically and (semi-)structured, and can easily be extended and merged into the knowledge graph.
The unit entity mainly comprises three basic attributes: essential information, industry information, and operation information.
● Essential information attribute, including four sub-attributes: organization, date of establishment, registered capital, and registration authority. The information in the four sub-attributes comes mainly from related portal websites; for accuracy, it is recommended to obtain it mainly from official websites such as enterprise credit inquiry sites, the (Shanghai) enterprise registration information disclosure site, the national enterprise information publicity system, and the Shanghai integrity site (if necessary, it can be purchased from the administration for industry and commerce).
● Industry information attribute. The information under this attribute is the unit industry classification information under the associated "unit industry" entity (determined by the unit industry knowledge graph purchased from or customized by a third party).
● Operation information attribute, including two sub-attributes: main business and main products. The information in the two sub-attributes is drawn mainly from related portal websites.
A (partial) example of the building knowledge graph is shown in Fig. 2. Circles represent concepts or entities, and squares represent atomic types (corresponding to strings or numbers, which are not expanded further). A solid arrow labeled "contains" represents a parent-child concept relation, and a solid arrow labeled with any other relation represents an attribute. A dashed arrow represents a belonging relation between concepts or entities. The upper half of Fig. 2 is the concept layer and the lower half is the instance layer.
Based on the above method for building the building knowledge graph database, an embodiment of the present invention also provides a building knowledge graph database built by the above method. The entities in the building knowledge graph include the building entity, the unit entity, the search label entity, the search point position entity, and the unit industry entity. The attributes of the building entity include a building geographic information attribute, a building basic information attribute, a building owner and user information attribute, and a crowd search and label classification information attribute; the crowd search and label classification information attribute is associated with the information under the attributes of the search label entity and the search point position entity, and the building owner and user information attribute is associated with the information under the attributes of the unit entity. The attributes of the unit entity include a unit essential information attribute, a unit industry information attribute, and a unit operation information attribute; the unit industry information attribute is associated with the information under the attributes of the unit industry entity.
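To make the five entities and their associations easier to picture, the following sketch expresses them as plain Python data classes. It is only an in-memory illustration: the field names are hypothetical, only a few attributes per entity are shown, and the embodiment stores this structure as RDF graph data rather than as objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UnitIndustry:
    name: str


@dataclass
class Unit:
    name: str
    industry: Optional[UnitIndustry] = None   # unit industry information
    main_business: Optional[str] = None       # unit operation information


@dataclass
class SearchLabel:
    year_month: str
    label_class: str
    label_index: float


@dataclass
class SearchPoint:
    floor: int
    quantity: int


@dataclass
class Building:
    name: str
    address_text: Optional[str] = None        # geographic information
    floors: Optional[int] = None              # basic information
    owner: Optional[Unit] = None              # owner and user information
    users: List[Unit] = field(default_factory=list)
    search_labels: List[SearchLabel] = field(default_factory=list)
    search_points: List[SearchPoint] = field(default_factory=list)


finance = UnitIndustry(name="Finance")
tower = Building(
    name="Tower A",
    address_text="1 Century Avenue",
    floors=32,
    users=[Unit(name="Example Bank", industry=finance)],
    search_labels=[SearchLabel("2016-09", "automobile", 0.42)],
)
print(tower.users[0].industry.name)
```

The references from `Building` to `Unit`, `SearchLabel`, and `SearchPoint`, and from `Unit` to `UnitIndustry`, correspond to the associations between attributes and entities described above.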
For the specific implementation of the building knowledge graph database, refer to the implementation of the above construction method; the details are not repeated here.
In this embodiment, building data are obtained from multiple data sources, the building data including classification information used for building classification, building geographic information, building basic information, building owner and user information, and crowd search and label classification information; the acquired building data are integrated; and the building knowledge graph database is built based on the integrated building data. The building knowledge graph uses Semantic Web graph data, which is better suited to data maintenance and data fusion, and a unified inference engine can be used to reason over the data more effectively. An application can thus effectively construct a building knowledge graph model that matches buildings more closely with the audiences in them, so that advertisements can be placed more accurately.
Although electronic maps in the prior art provide numerous functions such as map positioning, display, annotation, search, driving route planning, public transit queries, and navigation, they lack an intuitive, integrated, and extensible centralized presentation of building-related information, and therefore cannot meet the need of users such as advertisement placers for support in making advertisement placement decisions.
For this reason, an embodiment of the present invention also provides a method for building a building information map, including: obtaining building data from more than one data source and integrating the acquired building data, the building data including classification information used for building classification, building geographic information, building basic information, building owner and user information, and crowd search and label classification information, where the crowd search and label classification information is obtained by classifying the main search content of the crowd in a building by label, compiling statistics, and computing an index; standardizing the building geographic information, the standardization including classifying the location information and extracting instances and backbone data of the location information; calling the search interface provided by the map data provider to search for the standardized building geographic information and thereby determine its specific position; and calling the annotation interface provided by the map data provider to annotate the integrated building data at the determined position in a map layer, thereby building the building information map.
In this embodiment, the building information map is built on the basis of the building knowledge graph; the building information map is one of the extended applications that rely on the building knowledge graph.
After the building data have been collected and integrated, in order to use the search interface of the electronic map database, the building geographic information also needs to be standardized; the standardization includes classifying the location information and extracting instances and backbone data of the location information.
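The classification step can be sketched as a small function that decides whether a raw location value is a coordinate pair or a text description. The regular expression, the "longitude first" ordering, and the return format are assumptions made for this example; the embodiment does not prescribe how the two classes are detected.

```python
import re

# Two decimal numbers separated by a comma or whitespace, e.g. "121.4737, 31.2304".
COORD = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*[,\s]\s*(-?\d+(?:\.\d+)?)\s*$")


def classify_location(raw):
    """Return ("coordinate", (lng, lat)) or ("text", cleaned_string)."""
    match = COORD.match(raw)
    if match:
        # Assumed order: longitude first, then latitude.
        lng, lat = float(match.group(1)), float(match.group(2))
        if -180 <= lng <= 180 and -90 <= lat <= 90:
            return ("coordinate", (lng, lat))
    return ("text", raw.strip())


print(classify_location("121.4737, 31.2304"))
print(classify_location(" No. 1 Century Avenue, Pudong "))
```

Values that fall into the text class would then go through the dictionary-based segmentation and labeling used to extract instances and backbone data.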
In this embodiment, the electronic map database specifically uses the Amap LBS open platform. Amap is a leading provider of digital map content, navigation, and location service solutions in China. The Amap LBS open platform is a free one-stop development service platform for developers, providing a full range of LBS development tools from data storage, management, and retrieval to map service capabilities. It not only provides functions such as map positioning, display, annotation, search, driving route planning, public transit queries, and navigation, but also lets developers quickly implement services such as group buying, message push, and map business cards through modular components, so that developers can concentrate on quickly building good LBS applications. The Amap open platform is offered to developers in several ways, such as the JavaScript API, the static map API, and cloud map, which greatly facilitates use. In other embodiments, the electronic map database can also be implemented with other similar LBS open platforms, such as the Baidu Maps LBS open platform or the Google Maps LBS open platform.
In this embodiment, the building data can be annotated on Amap by calling interfaces such as positioning, annotation, and search in the Amap LBS.
The basic steps of building the building information map are as follows:
(1) Standardization of building geographic information: in order to use the map's search interface, the building geographic information must first be standardized (this can be implemented by developing an "address standardization engine"), including classification of the location information (preliminarily into a latitude-longitude class and a text-description class) and extraction of instances and backbone data of the location information.
In this embodiment, extracting instances and backbone data of the location information may include the following steps: building a place-name dictionary whose vocabulary includes place names and suffix words; based on the place-name dictionary, performing forward word segmentation on the character string contained in the building geographic information to be standardized; labeling the place names and suffix words in the building geographic information produced by the forward segmentation; and storing the labeled building geographic information according to pre-created address rules.
In an actual implementation, the forward word segmentation can use the forward maximum matching algorithm.
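Forward maximum matching scans the string from left to right, always trying the longest dictionary word that fits at the current position and shortening the window one character at a time when no match is found. A minimal sketch follows; the tiny dictionary and the sample address are made up for illustration, and emitting an unmatched character as a single-character token is a common convention assumed here rather than something the embodiment specifies.

```python
def forward_max_match(text, dictionary):
    """Segment text with forward maximum matching against a word set."""
    max_len = max(len(word) for word in dictionary)  # assumes a non-empty dictionary
    tokens, i = [], 0
    while i < len(text):
        # Start with the longest window that fits and shrink it from the right.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if piece in dictionary or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical place-name and suffix vocabulary.
place_dictionary = {"上海", "上海市", "浦东", "浦东新区", "世纪大道", "市", "区"}
tokens = forward_max_match("上海市浦东新区世纪大道100号", place_dictionary)
print(tokens)
```

Because the longest candidate is tried first, the string is split into 上海市 rather than 上海 followed by 市, which is the behavior wanted when the dictionary contains both a place name and its suffixed form.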
The basic flow of standardizing the building geographic information is also shown in Fig. 3, in which:
1) Build the place-name dictionary. This in turn includes:
Place names: (a) Chinese provinces, cities, districts, and counties; (b) the towns, townships, and road names of major Chinese cities. These dictionaries essentially cover place-name vocabulary at the road (village) level and above.
Suffix words. Suffix words are usually common Chinese words, so existing natural language processing tools segment them fairly well. Examples are "city", "district", and "county".
In an actual implementation, because the cell dictionaries of the Sogou input method (http://pinyin.sogou.com/dict/) provide ample place-name dictionaries, three dictionaries can be downloaded and used from there: (a) the countries and regions of the world (reserved for future use); (b) Chinese provinces, cities, districts, and counties; (c) the towns, townships, and road names of major Chinese cities. These dictionaries essentially cover the place-name vocabulary that occurs in Chinese enterprise names. The suffix dictionary is added through manual analysis of the data.
2) Forward word segmentation. The forward maximum matching algorithm is used to perform the segmentation. The algorithm works as follows: let MaxLen be the length of the longest word in the dictionary. From the string to be segmented, take a substring of length MaxLen from the left and match it against the dictionary. If the dictionary contains the substring, split it off as a word and take the next substring of length MaxLen from what follows; otherwise, remove the last character of the substring and match against the dictionary again. Repeat until the original string has been fully processed. A flowchart of the forward maximum matching algorithm is shown in Fig. 4.
3) Labeling. After segmentation, the place names and suffixes in the addresses involved in the building geographic information are labeled.
4) Create address rules, that is, analyze the data provided by Focus and the data we crawled, and establish address rules. This should be a complete set of logic rules, and as the "address standardization engine" is developed iteratively in the future, new rules can be added continually (the address-parsing rule engine is implemented with Drools). In an actual implementation, the Drools tool can be downloaded and the address rule setting module developed on top of it.
5) Write to the database according to the rules. The labeled enterprise addresses (building addresses) are written to the database using the existing address rules. In this embodiment, each stored piece of labeled building geographic information is marked as to whether it is normalized and complete; that is, every record should indicate whether its address is normalized and complete. In an actual implementation, a database structure for storing standardized addresses can be built and an address data import module (based on the address rules) developed.
6) Missing-address completion, that is, completing the missing address of an enterprise (building) by querying multiple map APIs and voting. Because the building geographic information in building data obtained from multiple data sources may be missing, the standardization in this embodiment further includes completing the building geographic information that is missing from the building data. In an actual implementation, a missing-address completion module can be developed (based on the multi-map-API query voting method).
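The voting idea in step 6) can be sketched as a majority vote over the addresses returned by several map services for the same enterprise or building. The candidate strings below are invented, no real map API is called, and the whitespace normalization and the two-vote quorum are assumptions chosen for the example.

```python
from collections import Counter


def vote_address(candidates, min_votes=2):
    """Return the address most sources agree on, or None without a quorum."""
    # Collapse runs of whitespace so trivially different strings still agree.
    cleaned = [" ".join(c.split()) for c in candidates if c]
    if not cleaned:
        return None
    address, votes = Counter(cleaned).most_common(1)[0]
    return address if votes >= min_votes else None


# Hypothetical answers from three map APIs plus one failed lookup.
answers = [
    "1 Century Avenue, Pudong",
    "1 Century  Avenue, Pudong",
    "3 Century Avenue, Pudong",
    None,
]
completed = vote_address(answers)
print(completed)
```

Returning `None` when no address reaches the quorum lets the record stay marked as incomplete instead of being filled with a low-confidence value.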
(2) Call the search interface to determine the position: the standardized information is searched for by calling the search interface provided by Amap, thereby determining its specific position.
(3) Building map annotation: the building data are annotated in a map layer; to make them easy to distinguish from other data, the building data exist as a separate layer. With Amap, the actual interface calls and application building are done mainly in three ways, namely the JavaScript API, the static map API, or cloud map:
● The Amap JavaScript API is a set of application programming interfaces written in JavaScript that helps build feature-rich, highly interactive map applications for websites or mobile devices. In addition to the basic map interfaces, the JavaScript API also provides data services such as local search and route planning, which can be used selectively as needed.
■ AMap is the namespace of the entire Amap JavaScript API; all classes and objects are called in the form AMap.XXX;
■ The call address of the JavaScript API is "http://webapi.amap.com/maps?v=1.3"; the basic classes, listed in Table 2, include Pixel, Size, LngLat, and Bounds.
Table 2: Basic classes of the JavaScript API
● The static map API service responds to an HTTP request by returning a map image, allowing users to embed an Amap map in their own web pages as an image.
■ The user specifies the requested image size and map zoom level and adds overlays to the map, such as labels, markers, polylines, polygons, or traffic conditions, and requests a static image. Unless otherwise stated, the input parameters and output data of the interface are uniformly encoded in UTF-8.
■ The number of calls to the static API is unlimited. The maximum number of markers is 50, the maximum number of labels is 50, and the maximum number of polylines and polygons is 4. The API service address is "http://restapi.amap.com/v3/staticmap?parameters"; the parameters are listed in Table 3.
Table 3: Static API parameter list
● In cloud map mode, users store their own data directly in the cloud without occupying server resources; the data are merged with the base map, so rendering is better and loading is faster, and the platforms covered include Web, H5, Android, and iOS. Its features include: lower data storage cost; fast rendering of massive data; retrieval of the user's own data, that is, the developer's own business data are merged with the cloud map's massive data in the cloud to enable personalized data retrieval; and data security, that is, a strict permission system and a sound data backup mechanism. Data can be imported into cloud map in the following three ways:
■ The PC cloud map data management platform: store the data in an Excel (2003 to 2007) or CSV (UTF-8/GBK encoded) file (with no more than 40 fields) and import it through the cloud map data management platform;
■ Import the data through the server-side cloud storage API;
■ Collect the data with a mobile data collection app.
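Of the three modes above, the static map mode is the simplest to illustrate, because a request is just a URL. The sketch below only assembles such a URL for a set of building coordinates and does not send it. The service address and the marker limit are the ones stated in the text; the parameter names (`key`, `location`, `zoom`, `size`, `markers`) and the `mid,,A:` marker style follow Amap's public static map documentation as best recalled and should be treated as assumptions, since Table 3 is not reproduced here. The key is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

STATIC_MAP_URL = "http://restapi.amap.com/v3/staticmap"  # address given in the text
MAX_MARKERS = 50                                          # limit stated in the text


def format_point(point):
    """Format a (longitude, latitude) pair as 'lng,lat' with six decimals."""
    return f"{point[0]:.6f},{point[1]:.6f}"


def static_map_url(key, center, zoom, size, points):
    """Build (but do not send) a static map request marking the given points."""
    if len(points) > MAX_MARKERS:
        raise ValueError("too many markers for one request")
    params = {
        "key": key,                                   # placeholder API key
        "location": format_point(center),             # assumed parameter name
        "zoom": zoom,
        "size": f"{size[0]}*{size[1]}",
        # Assumed marker syntax: "size,color,label:lng,lat;lng,lat".
        "markers": "mid,,A:" + ";".join(format_point(p) for p in points),
    }
    return STATIC_MAP_URL + "?" + urlencode(params)


buildings = [(121.4737, 31.2304), (121.48, 31.24)]
url = static_map_url("YOUR_KEY", buildings[0], 13, (600, 400), buildings)
query = parse_qs(urlsplit(url).query)
print(query["markers"][0])
```

Requests with more buildings than the marker limit would have to be split across several images, which is one reason the JavaScript API or cloud map mode may suit a full building layer better.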
In this embodiment, after the acquired building data have been integrated, the building geographic information is standardized; by calling interfaces such as search, positioning, and annotation provided by the map database LBS, the standardized building geographic information is searched for and the integrated building data are annotated at the determined position in a map layer, thereby building the building information map. The building information map is a centralized presentation of the building information in the information base; it is intuitive, integrated, and extensible, and can well meet the need of users such as advertisement placers for support in making advertisement placement decisions.
As stated in the Background, as competition in the advertising industry intensifies, the requirement for precise targeting of commercial audiences grows ever higher. Relying only on the experience of advertising sales staff to decide which buildings an advertisement should be placed in cannot meet practical requirements, either in the accuracy of the placement or in the efficiency of determining the placement buildings.
Therefore, an embodiment of the present invention also provides a method for automatically discovering and ranking advertisement placement building information. Specifically, on the basis of building the building knowledge graph and successfully realizing the building information map application, automatic discovery and recommendation ranking of advertisement placement building information are achieved through techniques such as semantic expansion and semantic similarity calculation. It should be noted that in this embodiment the advertisement placement building information discovered is information on buildings potentially suitable for placement, found and identified according to the advertiser's requirements and the description of the advertisement.
The background to the method of this embodiment for automatically discovering and ranking advertisement placement building information is first introduced briefly.
Computational advertising is an advertising and marketing science that draws on information science. It aims to maximize the overall return of advertisement placement and focuses on the relevance of matching users with advertisements and on advertisement bidding models. Computational advertising is a fusion of disciplines such as natural language processing, data mining, bidding-based marketing, and creative design. From the user's point of view, it takes forms such as commercial search advertising, advertising placed on browsed pages, and community-based crowd advertising. The core problem of computational advertising is finding the best match between users and advertisements in a given environment. Computational advertising involves various systems, among which the following three platforms are relevant to the method of this embodiment for automatically discovering and ranking advertisement placement building information:
(1) DSP (Demand Side Platform): the demand-side platform. On this platform, advertisers can set the target audience, placement region, bid, and so on for an advertisement.
(2) SSP (Sell Side Platform): the supply-side platform. It focuses on optimizing advertisement slots, display effectiveness, and display bidding. Through this platform, advertisement slot providers (such as Focus Media) can obtain the highest effective display fees without having to sell at low prices.
(3) DMP (Data Management Platform): the data management platform. It serves all parties to an advertising transaction and focuses on data management, data analysis, and data access. Driven by technology, it consolidates scattered data into a unified technology platform, standardizes and segments the data, and pushes the segmentation results into the existing marketing environment, so that both advertisers and suppliers benefit.
The method of this embodiment for automatically discovering and ranking advertisement placement building information is research on, and an application of, the DMP among these platforms. The relationship between the parties in a computational advertising system and the knowledge graphs is shown schematically in Fig. 5.
In Fig. 5, all advertisement slots of the advertisement placer belong to the supply side (SSP), and the building knowledge graph contains descriptions of the knowledge about each attribute of a building (for example: building and building type; building and building price (rent); building and building age; see Table 1 above). Using Semantic Web inference techniques, the consumption preferences and taboos of the crowd gathered in each building can be inferred. In addition, information such as the price of a building's advertisement slots can also be included in the building knowledge graph. The building knowledge graph has been described in detail above and is not described again here.
The client of advertisement putting side is advertiser, belongs to party in request (DSP).Party in request often has when launching advertisement It is required that for example it can propose desired target audience, launch region, advertisement bid etc..In order to more reasonably launch, in DMP systems Should also contain product know-how collection of illustrative plates (can be by third part purchase, such as:This kind of big data supplier of nine powers;Because product Knowledge base needs have certain accumulation just to establish to data such as industry and supply chains, alone complete will take considerable time and into This is excessive), the product of advertiser is analyzed, its product parameters, affiliated industry etc. are found out, easy to pushing away using DMP systems Reason technology more accurately matches advertisement position.
After precisely analyzing the characteristics of both the supply side and the demand side, the whole system finally produces building placement recommendations and a ranking through a matching algorithm (building scoring).
The automatic discovery method for advertisement placement building information of this embodiment of the invention specifically includes: extracting and selecting advertisement feature information from the advertiser's requirements and the description content of the advertisement; performing semantic expansion on the advertisement feature information according to an advertised-product knowledge graph database, the semantic expansion including expanding the keywords and attributes contained in the advertisement feature information and conceptualizing the expanded result, where conceptualization means attaching label information of the corresponding concepts to the keywords and attributes contained in the expanded result; and performing a semantic similarity calculation between the semantically expanded advertisement feature information and the concept label information attached to the building feature information in the building knowledge graph database, the resulting similarity matching result being the advertisement placement building information. The semantic similarity calculation includes the steps of label candidate generation, label score calculation, and label determination.
In this embodiment, "automatic discovery of advertisement placement building information" means finding and marking, on a building information map, the potential advertisement placement buildings that are relevant to the advertiser's requirements and the advertisement content, so that the discovered buildings can be displayed more intuitively. In other embodiments, the automatic discovery is not limited to being implemented on a building information map and may simply provide the corresponding information.
In this embodiment, the building feature information includes each entity contained in the building knowledge graph and the attributes contained in each entity.
In this embodiment the building knowledge graph has already been constructed. As shown in Fig. 2 and Table 1, each building in the knowledge graph has corresponding attributes; for example, the building "Tonghe International Building" has attributes such as the address "No. 477 Zhengli Road, Yangpu District, Shanghai" and an elevator count of 10. These attribute labels describe the building, and the main work of automatic building discovery is to derive, through semantic analysis, the similarity relation between an advertisement and the buildings and finally to give the matching result. The advertisement is analyzed in the following steps (see Fig. 6):
Step 1, feature extraction and selection: advertisement feature information such as keywords and meta-information is extracted from the advertiser's requirements and the description content of the advertisement. Advertisement feature information is information that reflects the characteristics of the advertisement. The feature extraction step ("feature extraction and selection" in Fig. 6) is relatively simple; in general it means extracting core keywords from the text describing the advertisement, through specific steps such as word segmentation, part-of-speech tagging, and named entity recognition. The implementation of feature extraction and selection is known to those skilled in the art and is not described in detail here.
Step 2, semantic expansion: using the advertised-product knowledge graph, keyword expansion and attribute expansion are performed on the advertisement feature information, and the expanded result is conceptualized.
The advertised-product knowledge graph is the product knowledge graph corresponding to the products involved in the advertisement content; it is usually stored in the form of a database.
In this embodiment, the keywords contained in the expanded result include core keywords and expanded keywords. The core keywords are extracted from the advertiser's requirements and the description content of the advertisement, and the expanded keywords are obtained by applying the semantic expansion to the core keywords.
In this embodiment, the advertised-product knowledge graph used for attribute expansion can be purchased from a third party. What is mainly obtained from it is information about the product involved in the advertisement, in the form of "attributes" such as the product's industry, characteristics, parameters, and accessories. These attributes are often unavailable in Step 1. Before a suitable knowledge base has been purchased, the attributes can also be provided by the advertiser; with a product knowledge graph database, however, more attributes can be derived, which increases the accuracy of building discovery.
In this embodiment, the mining algorithm used to expand the keywords contained in the advertisement feature information is a random walk (Random Walk) mining algorithm or a random walk with restart (Random Walk with Restart) mining algorithm.
Take the Random Walk algorithm as an example of the mining algorithm for keyword expansion. It is a graph-based, filter-type feature selection algorithm. The algorithm builds a term for each attribute of each attribute dataset and computes a score for each term through a Markov chain on the knowledge graph. It then combines each term's scores across the categories into one total score per term, which yields a ranking of the terms. The several highest-scoring terms are chosen as the expansion set, giving the expanded keywords.
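As a rough illustration of scoring terms with a Markov chain on a graph, the sketch below runs a random walk with restart from one core keyword and keeps the highest-scoring neighbours as expanded keywords. It is a simplified stand-in, not the algorithm of the embodiment: the graph, the node names, the restart probability, and the helper names are all assumptions, and the per-category score aggregation described above is omitted.

```python
# Hypothetical undirected term graph; each node lists its neighbours.
GRAPH = {
    "Mercedes-Benz S600": ["luxury sedan", "company president"],
    "luxury sedan": ["Mercedes-Benz S600", "company president"],
    "company president": ["Mercedes-Benz S600", "luxury sedan", "company vice president"],
    "company vice president": ["company president"],
}


def random_walk_with_restart(graph, seed, restart=0.15, iterations=50):
    """Iterate the walk; at every step the walker jumps back to `seed` with probability `restart`."""
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                # A node without neighbours returns its probability mass to the seed.
                updated[seed] += (1.0 - restart) * scores[node]
                continue
            share = (1.0 - restart) * scores[node] / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        updated[seed] += restart
        scores = updated
    return scores


def expand_keywords(graph, seed, top_k=2):
    """Return the `top_k` highest-scoring terms other than the seed itself."""
    scores = random_walk_with_restart(graph, seed)
    ranked = sorted((term for term in scores if term != seed), key=lambda t: (-scores[t], t))
    return ranked[:top_k]


scores = random_walk_with_restart(GRAPH, "Mercedes-Benz S600")
expanded = expand_keywords(GRAPH, "Mercedes-Benz S600")
```

The restart term keeps the scores concentrated near the core keyword, which is why terms several hops away receive lower scores than direct neighbours.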
Semantic expansion serves the following two purposes:
First, the building knowledge graph and the product knowledge graph are built for different domains, and the product knowledge graph is moreover purchased from a third party, so the two knowledge graphs are usually heterogeneous. The purpose of semantic expansion is to unify the vocabulary space through the mining algorithm and to enlarge the intersection of the two graphs by adding expanded keywords.
Take a Mercedes-Benz S600 advertisement as an example. Feature extraction in Step 1 may yield only the single core keyword "Mercedes-Benz S600". From the advertised-product knowledge graph, the attributes of the "Mercedes-Benz S600" are obtained: reference price, brand, fuel consumption, standard number of seats, bearings, lubricating oil, engine, and so on. The mining algorithm yields expanded keywords for the Mercedes-Benz S600 such as company president and company vice president. In the building knowledge graph, the mining algorithm can likewise yield expanded keywords for "Tonghe International Building" such as data mining, big data, and company vice president. Semantic expansion thus enlarges the intersection of the two (company vice president).
Second, the attributes obtained directly from a knowledge graph are sometimes very sparse, that is, few in number, so the vocabulary space also needs to be expanded by an algorithm. Expansion is needed all the more when no product knowledge graph has been purchased and the product attributes are provided only by the advertiser.
After the two steps above are completed, building discovery should still not be carried out directly with a similarity matching algorithm, because the results would be very hard to interpret. The attributes and expanded keywords are all produced by algorithms; they do not necessarily match the usage habits of the advertiser or of the placement operator's sales staff, and they may not be the vocabulary the advertisement cares about (for example the Mercedes-Benz S600 attributes bearings, lubricating oil, and engine). Moreover, the "attributes + expanded keywords + core keywords" produced by the algorithm can sometimes number in the tens of thousands; a dimensionality that large is hard for people to understand, so dimensionality reduction is also needed to improve interpretability. More importantly, without dimensionality reduction the buildings recommended by the matching algorithm may sometimes deviate from what the user actually needs.
To solve the above problems, the "attributes + expanded keywords + core keywords" need to be conceptualized with a "concept label" set before the similarity is calculated. Therefore, in this embodiment, conceptualizing the expanded result may include: using the concept label set to conceptualize the core keywords, expanded keywords, and attributes contained in the expanded result.
Conceptualizing the "attributes + expanded keywords + core keywords" with the "concept label" set is a multi-label classification problem; its purpose is to obtain classification labels and thereby reduce dimensionality. For example, the standard number of seats and interior space of the Mercedes-Benz S600 can be conceptualized as "extremely comfortable", and its reference price and fuel consumption as "luxury goods". Company president and company vice president can be conceptualized as "senior management".
The "concept label" set here can be determined from the following sources: 1) provided by employees of the advertisement placement operator on the basis of their work experience and the content of the operator's internal ERP system; 2) label information provided by a search engine database (such as Baidu data); 3) obtained by cluster analysis of certain shopping websites using a clustering algorithm. Once the label set is determined, a classification learning algorithm can be used to process the products, completing the (concept) labeling of products and buildings.
In this embodiment, after the concept label set is determined, a multi-label classification algorithm can be used to carry out the conceptualization. Current methods for solving the multi-label classification problem fall into two kinds according to their general design idea: multi-label classification algorithms based on a single optimization problem, and multi-label classification algorithms based on data decomposition. The former should be used here: methods of this kind do not change the structure of the dataset or destroy the correlations between classes, and so reflect the special nature of multi-label classification. Available algorithms include the multi-label classification algorithms based on AdaBoost (the AdaBoost.MH and AdaBoost.MR algorithms), multi-label classification algorithms extended from decision-tree methods, multi-label support vector machine algorithms, the multi-label k-nearest-neighbor algorithm (ML-KNN), the backpropagation multi-label learning algorithm (BP-MLL), the multi-label maximum entropy algorithm (MLME), and so on.
Taking the AdaBoost.MH algorithm as an example, the basic principle is as follows. For a training dataset formed by m "attributes + expanded keywords + core keywords" samples and k labels, m*k weights are first established (with identical initial values). In each round, the weights of samples that are easy to classify are reduced and the weights of samples that are hard to classify are increased. After many rounds, the resulting weights are used to predict the labels of the "attributes + expanded keywords + core keywords" samples in an unknown dataset.
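The weighting scheme just described can be sketched in a few lines. This is a minimal, assumption-laden illustration of discrete AdaBoost.MH, not the implementation of the embodiment: each weak learner is a one-term stump chosen here for simplicity, and the training samples, label names, and round count are hypothetical.

```python
import math

LABELS = ["luxury goods", "senior management"]
# Hypothetical training samples (sets of terms) and their +1/-1 target for each label.
SAMPLES = [
    {"reference price", "fuel consumption"},
    {"reference price"},
    {"company president", "company vice president"},
    {"company vice president"},
]
TARGETS = [[1, -1], [1, -1], [-1, 1], [-1, 1]]


def train_adaboost_mh(samples, targets, rounds=5):
    m, k = len(samples), len(targets[0])
    vocabulary = sorted(set().union(*samples))
    # One weight per (sample, label) pair: m*k weights, all equal at the start.
    weights = [[1.0 / (m * k)] * k for _ in range(m)]
    model = []
    for _ in range(rounds):
        best = None
        for term in vocabulary:
            # A stump votes +1/-1 per label, separately for "term present" and "term absent".
            votes = {True: [1] * k, False: [1] * k}
            correlation = 0.0
            for present in (True, False):
                for label in range(k):
                    s = sum(weights[i][label] * targets[i][label]
                            for i in range(m) if (term in samples[i]) == present)
                    votes[present][label] = 1 if s >= 0 else -1
                    correlation += abs(s)
            if best is None or correlation > best[0]:
                best = (correlation, term, votes)
        correlation, term, votes = best
        correlation = min(correlation, 1.0 - 1e-9)  # keep alpha finite for a perfect stump
        alpha = 0.5 * math.log((1.0 + correlation) / (1.0 - correlation))
        model.append((alpha, term, votes))
        total = 0.0
        for i in range(m):
            side = votes[term in samples[i]]
            for label in range(k):
                # Correct pairs shrink and misclassified pairs grow.
                weights[i][label] *= math.exp(-alpha * targets[i][label] * side[label])
                total += weights[i][label]
        weights = [[w / total for w in row] for row in weights]
    return model


def predict(model, label_names, sample):
    scores = [0.0] * len(label_names)
    for alpha, term, votes in model:
        side = votes[term in sample]
        for label in range(len(label_names)):
            scores[label] += alpha * side[label]
    return {label_names[j] for j, score in enumerate(scores) if score > 0}


model = train_adaboost_mh(SAMPLES, TARGETS)
```

With so few samples a single stump already separates the data, so the example only shows the mechanics; a realistic label set would need far more training data and more varied weak learners.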
Step 3, semantic annotation: a semantic similarity calculation is performed between the advertisement feature information that has undergone semantic expansion (including conceptualization) and the feature labels of the buildings in the building knowledge graph. Through the three steps of label candidate generation, label score calculation, and label determination, the building types suitable for placing the advertisement are obtained. Because the "concept label" set incorporates the work experience of Focus staff, the resulting recommendation candidate set is more accurate and easier to understand, and is easier for system users to screen.
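The three sub-steps can be illustrated with a small sketch. The text does not specify the similarity measure, so Jaccard similarity over concept labels is used here purely as an assumption; the building names, labels, and threshold are hypothetical as well.

```python
def jaccard(a, b):
    """Jaccard similarity of two label sets (an assumed stand-in for the similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_buildings(ad_labels, building_labels, threshold=0.2):
    # 1) Candidate generation: keep buildings that share at least one concept label with the ad.
    candidates = {name: labels for name, labels in building_labels.items() if ad_labels & labels}
    # 2) Score calculation: score every candidate against the ad's label set.
    scores = {name: jaccard(ad_labels, labels) for name, labels in candidates.items()}
    # 3) Determination: keep candidates at or above the threshold, best first.
    kept = [(name, score) for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical concept labels for one advertisement and three buildings.
AD_LABELS = {"luxury goods", "senior management", "extremely comfortable"}
BUILDING_LABELS = {
    "Building A": {"senior management", "big data", "luxury goods"},
    "Building B": {"students", "fast food"},
    "Building C": {"senior management", "fast food", "students", "budget", "retail", "logistics"},
}

matches = match_buildings(AD_LABELS, BUILDING_LABELS)
```

Because matching runs over a handful of concept labels rather than tens of thousands of raw terms, each match can be explained by listing the shared labels.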
It should be noted that, during the semantic similarity calculation, in addition to accessing the information in the building knowledge graph database, the information in an advertising-point knowledge graph database can also be used. The advertising-point knowledge graph database is a knowledge graph database formed from the "advertising points" in the advertisement content that reflect the characteristics of the advertisement.
On the basis of the automatically discovered advertisement placement building information, if that information contains more than one recommended placement point, the recommended placement points can further be sorted, so that the recommended placement points that best fit the user's needs are placed in the preferentially recommended positions.
Therefore, this embodiment of the invention also provides an automatic sorting method for advertisement placement building information, including: automatically discovering the advertisement placement building information corresponding to a target advertisement with the automatic discovery method described above; and, if the advertisement placement building information corresponding to the target advertisement contains more than one recommended placement point for the same advertised product, sorting all recommended placement points according to the weight of each recommended placement point. The weight of a recommended placement point is determined according to ranking factors, and the ranking factors include at least the similarity of the recommended placement point.
On the basis of the automatic discovery of recommended placement points, the recommended placement points found for the same product may be one or several. When there are several, they need to be sorted (automatic sorting of advertisement placement building information is a further functional extension built on automatic discovery). The sorting is based on the similarity of the recommended placement points; extending the concept of similarity one step further, advertisements can be placed according to the weights of the recommended placement points.
In addition to the similarity factor, the weight calculation can also take several other ranking factors into account, such as the advertiser's regional preference and the advertisement delivery strategy. Therefore, in this embodiment, the ranking factors further include the advertiser's regional preference and the advertisement delivery strategy, and the weight of a recommended placement point is a composite weight formed by mapping the different ranking factors. When several ranking factors are considered, a rank aggregation algorithm or a learning-to-rank algorithm can be used to map the different ranking factors to one composite weight and finally produce the ranking result.
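To make the idea of a composite weight concrete, the sketch below maps three factor scores to one number with a fixed linear combination. This is deliberately simpler than the rank aggregation and learning-to-rank algorithms named above, and it is not taken from the source: the factor names, the coefficients, and the factor scores are hypothetical, and in a learning-to-rank setting the coefficients would be learned rather than set by hand.

```python
# Hypothetical importance of each ranking factor; the coefficients are illustrative only.
FACTOR_WEIGHTS = {"similarity": 0.6, "region_preference": 0.3, "delivery_strategy": 0.1}


def composite_weight(factors, factor_weights=FACTOR_WEIGHTS):
    """Map the per-factor scores of one placement point to a single composite weight."""
    return sum(weight * factors.get(name, 0.0) for name, weight in factor_weights.items())


def rank_points(points):
    """Sort placement points by descending composite weight, breaking ties by name."""
    return sorted(points, key=lambda name: (-composite_weight(points[name]), name))


# Hypothetical factor scores in [0, 1] for three recommended placement points.
POINTS = {
    "Building A": {"similarity": 0.9, "region_preference": 0.2, "delivery_strategy": 0.5},
    "Building B": {"similarity": 0.7, "region_preference": 0.9, "delivery_strategy": 0.5},
    "Building C": {"similarity": 0.4, "region_preference": 0.5},
}

ranking = rank_points(POINTS)
```

In this made-up data the point with the highest similarity is not ranked first, because the regional preference factor outweighs the difference in similarity.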
Take rank aggregation algorithms as an example. Rank aggregation is the process of fusing multiple rankings of a set of candidate placement buildings into one "consensus" ranking. Generally, a directed weighted graph is first constructed from the ranking factors (such as the advertiser's regional preference and the advertisement delivery strategy), and the rankings are then fused by one of three methods. The first method uses the difference between the sum of out-degree weights and the sum of in-degree weights in the directed weighted graph as the score of each candidate and sorts the candidates by this score. The second method uses a bucket pivot algorithm, similar in spirit to quicksort: a pivot (that is, some building) is selected each time, the other candidates are placed on either side of the pivot according to the direction of the edges in the graph, and the same idea is then applied recursively to the candidate set on the left and the candidate set on the right. The third method treats the directed weighted graph as the link relations among candidate buildings, iterates the PageRank algorithm to obtain a stationary distribution, and finally sorts the candidate building set by the stationary probabilities.
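The first of the three methods can be sketched as follows. The text does not say how the directed weighted graph is built, so the construction here is an assumption: each per-factor ranking contributes an edge of weight 1 from every building to each building ranked below it. The building names and the three input rankings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_graph(rankings):
    """Edge (better, worse) accumulates weight 1 for every ranking that places `better` first."""
    graph = defaultdict(float)
    for ranking in rankings:
        for better, worse in combinations(ranking, 2):
            graph[(better, worse)] += 1.0
    return graph


def aggregate_by_degree(rankings):
    """Score = total outgoing weight minus total incoming weight; sort by descending score."""
    score = defaultdict(float)
    for (better, worse), weight in build_preference_graph(rankings).items():
        score[better] += weight
        score[worse] -= weight
    return sorted(score, key=lambda building: (-score[building], building))


# Hypothetical rankings of the same candidates under three ranking factors.
RANKINGS = [
    ["Building A", "Building B", "Building C"],  # by similarity
    ["Building B", "Building A", "Building C"],  # by regional preference
    ["Building B", "Building C", "Building A"],  # by delivery strategy
]

consensus = aggregate_by_degree(RANKINGS)
```

With unit edge weights this score behaves much like a Borda count; weighting the edges by factor importance would let some rankings count more than others.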
In this embodiment, after all recommended placement points have been sorted and the ranking result obtained, the parameters of the rank aggregation algorithm or learning-to-rank algorithm can also be adjusted continually according to feedback from user clicks or advertising effect, and the sorting iterated to optimize the ranking result. For an example of the complete flow of automatic discovery and sorting of recommended placement points, see Fig. 6.
Although the present invention is disclosed above by way of preferred embodiments, these are not intended to limit the invention. Any person skilled in the art may, without departing from the spirit and scope of the invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the invention.

Claims (10)

1. An automatic discovery method for advertisement placement building information, characterized in that the advertisement placement building information is building information that is found and marked, according to the advertiser's requirements and the description content of an advertisement, as suitable for placing the advertisement, the automatic discovery method including:
extracting and selecting advertisement feature information from the advertiser's requirements and the description content of the advertisement;
performing semantic expansion on the advertisement feature information according to an advertised-product knowledge graph database, the semantic expansion including expanding the keywords and attributes contained in the advertisement feature information and conceptualizing the expanded result; the conceptualization means attaching label information of the corresponding concepts to the keywords and attributes contained in the expanded result;
performing a semantic similarity calculation between the semantically expanded advertisement feature information and the concept label information attached to the building feature information in a building knowledge graph database, the resulting similarity matching result being the advertisement placement building information; the semantic similarity calculation includes the steps of label candidate generation, label score calculation, and label determination.
2. The automatic discovery method for advertisement placement building information according to claim 1, characterized in that the mining algorithm used to expand the keywords contained in the advertisement feature information is a random walk mining algorithm or a random walk with restart mining algorithm.
3. The automatic discovery method for advertisement placement building information according to claim 1, characterized in that the keywords contained in the expanded result include core keywords and expanded keywords, the core keywords being extracted from the advertiser's requirements and the description content of the advertisement, and the expanded keywords being obtained by applying the semantic expansion to the core keywords.
4. The automatic discovery method for advertisement placement building information according to claim 3, characterized in that conceptualizing the expanded result includes: using a concept label set to conceptualize the core keywords, expanded keywords, and attributes contained in the expanded result; the sources from which the concept label set is determined include labels provided by employees of the advertisement placement operator on the basis of work experience and the content of the operator's internal ERP system, label information provided by a search engine database, and labels obtained by cluster analysis of shopping websites using a clustering algorithm.
5. The automatic discovery method for advertisement placement building information according to claim 4, characterized in that, after the concept label set is determined, the conceptualization is carried out using a multi-label classification algorithm; the multi-label classification algorithm is one of a multi-label classification algorithm based on the AdaBoost algorithm, a multi-label classification algorithm extended from decision-tree methods, a multi-label support vector machine algorithm, a multi-label k-nearest-neighbor algorithm, a backpropagation multi-label learning algorithm, and a multi-label maximum entropy algorithm.
6. The automatic discovery method for advertisement placement building information according to claim 1, characterized in that the building knowledge graph database is constructed by the following construction method:
obtaining building data from one or more data sources and integrating the acquired building data; the building data include classification information used for classifying buildings, building geographic information, building basic information, information on the users belonging to the building, and crowd search and label classification information; the crowd search and label classification information is information obtained by label classification statistics and index calculation on the main search content of the crowd in the building;
building the building knowledge graph database on the basis of the integrated building data; the integrated building data have undergone hierarchical classification and structuring; the entities in the building knowledge graph include building entities, unit entities, search label entities, search point entities, and unit industry entities; the attributes of a building entity include a building geographic information attribute, a building basic information attribute, a building user information attribute, and a crowd search and label classification information attribute; the information under the crowd search and label classification information attribute is associated with the information contained in the search label entities and search point entities; the information under the building user information attribute is associated with the information contained in the unit entities; the attributes of a unit entity include a unit basic information attribute, a unit industry information attribute, and a unit operation information attribute; the information under the unit industry information attribute is associated with the information contained in the unit industry entities;
the building feature information includes the entities in the building knowledge graph and the attributes they contain.
7. An automatic sorting method for advertisement placement building information, characterized by including:
automatically discovering the advertisement placement building information corresponding to a target advertisement with the automatic discovery method for advertisement placement building information according to any one of claims 1 to 6;
if the advertisement placement building information corresponding to the target advertisement contains more than one recommended placement point for the same advertised product, sorting all recommended placement points according to the weight of each recommended placement point; the weight of a recommended placement point is determined according to ranking factors, and the ranking factors include at least the similarity of the recommended placement point.
8. The automatic sorting method for advertisement placement building information according to claim 7, characterized in that the ranking factors further include the advertiser's regional preference and the advertisement delivery strategy; the weight of a recommended placement point is a composite weight formed by mapping the different ranking factors.
9. The automatic sorting method for advertisement placement building information according to claim 8, characterized in that the mapping of the different ranking factors to the composite weight is carried out using a rank aggregation algorithm or a learning-to-rank algorithm.
10. The automatic sorting method for advertisement placement building information according to claim 9, characterized in that, after all recommended placement points have been sorted and the ranking result obtained, the parameters of the rank aggregation algorithm or learning-to-rank algorithm are adjusted continually according to feedback from user clicks or advertising effect, and the sorting is iterated to optimize the ranking result.
CN201610895300.XA 2016-10-13 2016-10-13 The automatic discovery of advertisement putting building information and sort method Pending CN107944898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610895300.XA CN107944898A (en) 2016-10-13 2016-10-13 The automatic discovery of advertisement putting building information and sort method

Publications (1)

Publication Number Publication Date
CN107944898A true CN107944898A (en) 2018-04-20

Wan et al. Aminer: Search and mining of academic social networks
Amato et al. Big data meets digital cultural heritage: Design and implementation of scrabs, a smart context-aware browsing assistant for cultural environments
Scharl et al. The geospatial web: how geobrowsers, social software and the Web 2.0 are shaping the network society
Keßler et al. An agenda for the next generation gazetteer: Geographic information contribution and retrieval
TWI493367B (en) Progressive filtering search results
CN108345596A (en) Building information converged services platform
Hyvönen Semantic portals for cultural heritage
US8682882B2 (en) System and method for automatically identifying classified websites
CN107943810A (en) The construction method of building information map
Chuang et al. Enabling maps/location searches on mobile devices: Constructing a POI database via focused crawling and information extraction
Atzmueller et al. Exploratory pattern mining on social media using geo-references and social tagging information
US20130031458A1 (en) Hyperlocal content determination
Tulić Ceballos The impact of Web 3.0 technologies on tourism information systems
Ardissono et al. Exploration of cultural heritage information via textual search queries
Mata-Rivera et al. A collaborative learning approach for geographic information retrieval based on social networks
Li et al. A case-based reasoning approach for task-driven spatial–temporally aware geospatial data discovery through geoportals
Haris et al. Mining graphs from travel blogs: a review in the context of tour planning
Sakib Preference Oriented Mining Techniques for Location based Point Search
Suresh Kumar et al. Multi-ontology based points of interests (MO-POIS) and parallel fuzzy clustering (PFC) algorithm for travel sequence recommendation with mobile communication on big social media
Kayed et al. Postal address extraction from the web: A comprehensive survey
Liang Intelligent Tourism Personalized Recommendation Based on Multi-Fusion of Clustering Algorithms
Maree et al. Multi-modality search and recommendation on Palestinian cultural heritage based on the holy-land ontology and extrinsic semantic resources
Sun et al. Conflating point of interest (POI) data: A systematic review of matching methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180420
