CN104391893B - Method for timely discovering and tracking the dynamics of real estate projects - Google Patents

Info

Publication number
CN104391893B
Authority
CN
China
Prior art keywords
information
project
library
real estate
soil
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410633346.5A
Other languages
Chinese (zh)
Other versions
CN104391893A (en)
Inventor
邓伟
张泽泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sharp Data Processing Technology Ltd By Share Ltd
Original Assignee
Chengdu Sharp Data Processing Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-11-11
Filing date
2014-11-11
Publication date
2018-10-30
2014-11-11 Application filed by Chengdu Sharp Data Processing Technology Ltd By Share Ltd
2014-11-11 Priority to CN201410633346.5A
2015-03-04 Publication of CN104391893A
2018-10-30 Application granted
2018-10-30 Publication of CN104391893B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for timely discovering and tracking the dynamics of real estate projects. The method comprises: obtaining land information; structuring the land information; storing the land information in a land information library; determining the positions of real estate indicators; extracting project information; obtaining new project information not yet recorded in the project library; associating land and project information and entering it into the project library; extracting the structured project information; and updating the project's subsequent progress. The beneficial effects of the invention are that it establishes multiple templates for extracting information and optimizes the extraction results, saving time and labor while achieving high coverage.

Description

Method for timely discovering and tracking the dynamics of real estate projects
Technical Field
The invention belongs to the technical field of natural language processing, and in particular relates to a method for timely discovering and tracking the dynamics of real estate projects.
Background
Natural language processing (NLP) is a technology for processing natural-language information; from a linguistic point of view it is also known as computational linguistics. NLP comprises two parts: natural language understanding (NLU) and natural language generation (NLG). Natural language understanding means a deep grasp of the content and intent of natural language; in artificial intelligence it refers specifically to a computer's deep grasp of that content and intent. Natural language generation is the processing that turns non-natural-language input into natural-language output. Understanding and generation are inverse processes of each other. NLP is one of the earliest and most important research fields of artificial intelligence. Its two main tasks are human-computer dialogue and machine translation, and it integrates linguistics, computer science, and mathematics. Owing to the contributions of the new generation of linguistic schools represented by Chomsky and to the development of computer technology, natural language understanding has become an increasingly popular topic, and its many practical applications have led people to study how computer programs can use natural language in some way. Spoken language is the natural form of human communication, and computer users want to be able to talk with machines. Natural-language input can be spoken, or it can be typed on a keyboard and provided as text.
Information extraction is the process of extracting a specified class of information (such as events or facts) from a passage of text and filling it, as structured data, into a database for users to query.
Conditional random fields (CRF or CRFs) are a discriminative probabilistic model and a kind of random field, commonly used to label or analyze sequence data such as natural-language text or biological sequences. Like a Markov random field, a conditional random field is an undirected graphical model: the vertices of the graph represent random variables, and the edges between vertices represent dependencies between those variables. In a conditional random field, the distribution of the random variable Y is a conditional probability, conditioned on the observed random variable X. In principle the graph layout of a conditional random field can be arbitrary; the most common layout is a chain structure, for which efficient algorithms exist for training, inference, and decoding.
Conditional random fields are used for lexical analysis tasks such as Chinese word segmentation and part-of-speech tagging. General sequence classification models often use a hidden Markov model (HMM), for example in class-based Chinese word segmentation. However, a hidden Markov model makes two assumptions: the output independence assumption and the Markov assumption. The output independence assumption requires the sequence data to be strictly independent of each other for the derivation to be correct, whereas in practice most sequence data cannot be represented as a series of independent events. A conditional random field instead uses a probabilistic graphical model that can express long-distance dependencies and overlapping features, better addresses problems such as label (classification) bias, and normalizes all features globally, so that a globally optimal solution can be obtained.
Many entity recognition techniques exist today, but the real estate industry differs from person-name and place-name recognition: it has its own naming rules, such as mixing digits, letters, and Chinese characters, and each marketing purpose has its own naming principles. Existing template extraction techniques do not take industry characteristics into account. Large numbers of projects appear on the web every day; updating them manually is time-consuming and laborious, and coverage is low.
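For reference, the chain-structured conditional random field discussed in the background is conventionally written as below. The notation is standard textbook notation and is an assumption here, since the patent text gives no formula: f_k are feature functions over adjacent labels and the observation sequence, λ_k are their learned weights, T is the sequence length, and Z(x) is the global normalizer that sums over all candidate label sequences y'.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Because Z(x) sums over whole label sequences rather than over each position separately, the normalization is global, which is the property the background contrasts with the per-step assumptions of a hidden Markov model.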
Summary of the Invention
To solve the above problems, the invention proposes a method for timely discovering and tracking the dynamics of real estate projects.
The technical solution of the invention is a method for timely discovering and tracking the dynamics of real estate projects, comprising the following steps:
S1. Obtain target web pages related to land information and extract the land information;
S2. Structure the land information according to a pre-established land and project indicator rule base;
S3. With reference to the land knowledge base, store the structured land information in the land information library;
S4. Read the real estate knowledge base and determine the positions of real estate indicators;
S5. Obtain target web pages related to project information and extract the project information;
S6. Using the naming rules of the project library, obtain new project information not yet recorded in the library by means of the CRF algorithm;
S7. Associate the land and project information and enter it into the project library;
S8. Extract the structured project information;
S9. Update the project's subsequent progress.
Further, the indicator rule base includes the context information of each indicator, range limits on indicator values, type limits, and so on.
Further, the indicator rule base can be updated.
Further, step S6 comprises the following steps:
S61. Label project names using the CRF method;
S62. Compute the N most probable results with the Viterbi algorithm to obtain the project names.
The beneficial effects of the invention are as follows. The method combines the characteristics of the real estate industry with a knowledge-base approach, establishes multiple templates for extracting information according to indicator features, and optimizes the extraction results. It can also automatically and promptly discover and track a project from land acquisition through project establishment to sales, saving time and labor while achieving high coverage.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the method of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawing and specific embodiments.
The detailed flow of the method for timely discovering and tracking the dynamics of real estate projects is shown in Fig. 1.
Step S1. Obtain target web pages related to land information and extract the land information.
Keywords for land information are entered into an existing general-purpose search engine or another search engine, web page links are selected in order from the results, and the links of at least two parent pages chosen from the search results form a link set. The original page content indicated by each link in the link set is then obtained. By examining these original pages, at least two original pages related to land information are selected as target pages according to their content. In each target page, the attributes of the land information and the attribute values corresponding to them are identified; the land information attributes are located, and the attributes and their corresponding attribute values are extracted.
Step S2. Structure the land information according to the pre-established land and project indicator rule base.
The indicator rule base is established in advance from land and project information and is used to describe land and projects. The rules are written in a programming language and can be updated based on experience. Each indicator in the rule base describes an attribute of the land or project, and the indicator relationships in the rule base describe the associations between indicators. The rules include the context information of an indicator, range limits on indicator values, type limits, and so on. For example, for the text "house type area: 101-148 square meters, 3-4 rooms", the rule is established as: value range: 0-10000; unit: square meter; type: numeric. The standardized result is: house type area: minimum 101 square meters, maximum 148 square meters. The resulting information template is: [x], [n+1], [n+2], [n+3]; [indicator], [number], [range symbol], [number], [area unit].
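The house type area example above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `IndicatorRule` record, the `structure_indicator` function, and the English unit aliases are hypothetical names introduced only for illustration. The sketch matches the template [indicator], [number], [range symbol], [number], [area unit] with a regular expression, then applies the type limit (numeric) and the range limit (0-10000).

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndicatorRule:
    """Hypothetical rule record: context keyword, value range, unit."""
    keyword: str                     # context that must precede the value
    value_range: Tuple[float, float]  # allowed (min, max) for the values
    unit_aliases: Tuple[str, ...]     # surface forms of the unit in page text
    unit: str                         # normalized unit name


def structure_indicator(text: str, rule: IndicatorRule) -> Optional[dict]:
    """Match [indicator][number][range symbol][number][unit] and normalize it."""
    units = "|".join(re.escape(u) for u in rule.unit_aliases)
    pattern = re.compile(
        re.escape(rule.keyword)
        + r"\D{0,10}?(\d+(?:\.\d+)?)\s*[-~]\s*(\d+(?:\.\d+)?)\s*(?:"
        + units
        + r")"
    )
    match = pattern.search(text)
    if match is None:
        return None
    # Type limit: both captured values are numeric by construction.
    low, high = float(match.group(1)), float(match.group(2))
    lower_limit, upper_limit = rule.value_range
    # Range limit: reject values outside the rule's allowed interval.
    if not (lower_limit <= low <= high <= upper_limit):
        return None
    return {"indicator": rule.keyword, "min": low, "max": high, "unit": rule.unit}


AREA_RULE = IndicatorRule(
    keyword="house type area",
    value_range=(0, 10000),
    unit_aliases=("square meters", "square meter", "sqm"),
    unit="square meter",
)

result = structure_indicator(
    "house type area: 101-148 square meters, 3-4 rooms", AREA_RULE
)
print(result)
```

Keeping the rule as data rather than hard-coding it in the pattern is one way to reflect the statement that the rule base can be updated based on experience.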
Step S3. With reference to the land knowledge base, store the structured land information in the land information library.
The land information structured in step S2 is stored in the land information library according to the rules in the pre-established indicator rule base.
Step S4. Read the real estate knowledge base and determine the positions of real estate indicators.
Step S41. Each web page can be parsed into a DOM tree using existing parsing tools. The Document Object Model (DOM) is a programming interface for HTML and XML documents; it provides a structured representation of a document and allows the document's content and presentation to be changed. The internal logical structure of the DOM is usually expressed as a node tree. By parsing an HTML page, the elements of the page are converted into node objects in the DOM. If the DOM structure of the page can be analyzed, the corresponding values are obtained automatically using the features of that DOM structure.
Step S42. If the content is descriptive text, the rules in the indicator rule base are applied, and structured information is extracted using the value limits.
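As a rough illustration of obtaining values from the DOM structure, the sketch below parses a page fragment into a DOM tree with Python's standard `xml.dom.minidom` and reads attribute/value pairs from two-cell table rows. The assumptions are mine, not the patent's: the fragment is well-formed XHTML, indicators appear as `th`/`td` pairs, and the names `extract_pairs`, `node_text`, and the sample page are hypothetical.

```python
from xml.dom import minidom


def node_text(node) -> str:
    """Concatenate the text of all descendant text nodes of a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(node_text(child))
    return "".join(parts).strip()


def extract_pairs(xhtml: str) -> dict:
    """Return {attribute: value} for every table row with exactly two cells."""
    dom = minidom.parseString(xhtml)
    pairs = {}
    for row in dom.getElementsByTagName("tr"):
        cells = [
            n for n in row.childNodes
            if n.nodeType == n.ELEMENT_NODE and n.tagName in ("th", "td")
        ]
        if len(cells) == 2:
            pairs[node_text(cells[0])] = node_text(cells[1])
    return pairs


# Hypothetical land-listing fragment used only to exercise the sketch.
PAGE = """
<table>
  <tr><th>Plot ratio</th><td>2.5</td></tr>
  <tr><th>Land area</th><td>35000 square meters</td></tr>
</table>
""".strip()

pairs = extract_pairs(PAGE)
print(pairs)
```

Real pages are rarely well-formed XML, so a tolerant HTML parser would normally stand in for `minidom`; the tree-walking logic stays the same.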
Step S5. Obtain target web pages related to project information and extract the project information.
Keywords for project information are entered into an existing general-purpose search engine or another search engine, web page links are selected in order from the results, and the links of at least two parent pages chosen from the search results form a link set. The original page content indicated by each link in the link set is then obtained. By examining these original pages, at least two original pages related to project information are selected as target pages according to their content. In each target page, the attributes of the project information and the attribute values corresponding to them are identified; the project information attributes are located, and the attributes and their corresponding attribute values are extracted.
Step S6. Using the naming rules of the project library, obtain new project information not yet recorded in the library by means of the CRF algorithm.
When a real estate project is first established, the project library does not yet contain its name. Such project names often have their own characteristics and must be labeled accordingly. Here, the CRF algorithm is used to obtain new project information not yet recorded in the library. CRF stands for conditional random field, a discriminative probabilistic model and a kind of random field, commonly used to label or analyze sequence data such as natural-language text or biological sequences. The Viterbi algorithm is a dynamic programming algorithm for finding the sequence of hidden states, called the Viterbi path, that is most likely to have produced a sequence of observed events, especially in the context of Markov information sources and hidden Markov models. The term is also applied to related dynamic programming algorithms that find the most likely explanation of an observation. For example, in statistical parsing a dynamic programming algorithm can be used to find the most likely context-free derivation (parse) of a string, sometimes called a "Viterbi parse".
Step S61. Label project names using the CRF method.
Step S62. Compute the N most probable results with the Viterbi algorithm to obtain the new project names not yet recorded in the library.
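The two sub-steps above (labeling, then decoding the N most probable label sequences) can be sketched as an N-best Viterbi search over a chain of labels. Everything specific here is an assumption for illustration: the B/I/O label set, the hand-set log scores standing in for the output of a trained CRF, and the names `viterbi_nbest` and `project_name`. The patent does not specify any of them.

```python
import heapq


def viterbi_nbest(emissions, transitions, n=1):
    """Return the n best (score, label_path) pairs for a first-order chain.

    emissions: one {label: log_score} dict per token.
    transitions: {(prev_label, cur_label): log_score}; missing pairs score 0.0.
    """
    labels = list(emissions[0])
    # beams[label] holds up to n best (score, path) hypotheses ending in label.
    beams = {lab: [(emissions[0][lab], (lab,))] for lab in labels}
    for scores in emissions[1:]:
        new_beams = {}
        for cur in labels:
            candidates = [
                (s + transitions.get((prev, cur), 0.0) + scores[cur], path + (cur,))
                for prev in labels
                for s, path in beams[prev]
            ]
            new_beams[cur] = heapq.nlargest(n, candidates)
        beams = new_beams
    final = [hyp for lab in labels for hyp in beams[lab]]
    return heapq.nlargest(n, final)


def project_name(tokens, path):
    """Join the tokens labeled B (begin) or I (inside) into a name."""
    return " ".join(tok for tok, lab in zip(tokens, path) if lab in ("B", "I"))


# Hypothetical tokens and hand-set scores; a trained CRF would supply these.
tokens = ["visit", "Jinxiu", "No.8"]
emissions = [
    {"B": -2.0, "I": -3.0, "O": -0.1},
    {"B": -0.2, "I": -1.5, "O": -2.0},
    {"B": -1.5, "I": -0.3, "O": -1.0},
]
# An I label may not directly follow O.
transitions = {("O", "I"): float("-inf")}

best = viterbi_nbest(emissions, transitions, n=2)
names = [project_name(tokens, path) for _, path in best]
print(best, names)
```

Keeping the top n hypotheses per label at each position, instead of only the single best, is what turns ordinary Viterbi decoding into an N-best search; with `n=1` the function reduces to the standard algorithm.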
Step S7. Associate the land and project information and enter it into the project library.
The land information obtained in step S1 and the project information obtained in step S5 are associated according to the rules in the pre-established land and project indicator rule base, and the associated land and project information is entered into the project library.
Step S8. Extract the structured project information.
According to the rules in the pre-established indicator rule base, real estate project information that associates land information with project information can be extracted from the project library.
Step S9. Update the project's subsequent progress.
The pre-established indicator rule base can be updated based on experience. After the rule base is updated, steps S1 to S8 are repeated to obtain the dynamic information of real estate projects.
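One way to picture the tracking performed when the steps are repeated is to merge each newly extracted record into the project library and report which fields changed. This is only a sketch under assumptions of my own: the library is an in-memory dictionary keyed by project name, and the function `update_project` and the field names are hypothetical, since the patent does not describe the storage format.

```python
def update_project(library: dict, record: dict, key: str = "name") -> dict:
    """Merge a newly extracted record into the library.

    Returns {field: (old_value, new_value)} for every field that changed,
    which is the project's "dynamic" since the previous pass.
    """
    existing = library.setdefault(record[key], {})
    changes = {
        field: (existing.get(field), value)
        for field, value in record.items()
        if field != key and existing.get(field) != value
    }
    existing.update({f: v for f, v in record.items() if f != key})
    return changes


library = {}
first = update_project(library, {"name": "Jinxiu No.8", "status": "land acquired"})
second = update_project(library, {"name": "Jinxiu No.8", "status": "on sale"})
third = update_project(library, {"name": "Jinxiu No.8", "status": "on sale"})
print(first, second, third)
```

An empty result on a later pass means nothing changed for that project, so only non-empty results need to be surfaced as updates.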
Those of ordinary skill in the art will understand that the embodiments described here are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the technical teachings disclosed by the invention, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations remain within the scope of protection of the invention.

Claims (3)

1. A method for timely discovering and tracking the dynamics of real estate projects, characterized by comprising the following steps:
S1. Obtain target web pages related to land information and extract the land information;
S2. Structure the land information according to a pre-established land and project indicator rule base; the indicator rule base includes house type area, value range, and unit; the indicator rule base is established in advance from land and project information and is used to describe land and projects; each indicator in the indicator rule base describes an attribute of the land or project, and the indicator relationships in the indicator rule base describe the associations between indicators; the rules include the context information of an indicator, range limits on indicator values, and type limits;
S3. With reference to the land knowledge base, store the structured land information in the land information library;
S4. Read the real estate knowledge base and determine the positions of real estate indicators;
S5. Obtain target web pages related to project information and extract the project information;
S6. Using the naming rules of the project library, obtain new project information not yet recorded in the library by means of the CRF algorithm;
S7. Associate the land and project information and enter it into the project library;
S8. Extract the structured project information;
S9. Update the project's subsequent progress.
2. The method for timely discovering and tracking the dynamics of real estate projects according to claim 1, characterized in that the indicator rule base can be updated.
3. The method for timely discovering and tracking the dynamics of real estate projects according to claim 1, characterized in that step S6 comprises the following steps:
S61. Label project names using the CRF method;
S62. Compute the N most probable results with the Viterbi algorithm to obtain the project names.
CN201410633346.5A 2014-11-11 2014-11-11 Method for timely discovering and tracking the dynamics of real estate projects Active CN104391893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410633346.5A CN104391893B (en) 2014-11-11 2014-11-11 Method for timely discovering and tracking the dynamics of real estate projects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410633346.5A CN104391893B (en) 2014-11-11 2014-11-11 Method for timely discovering and tracking the dynamics of real estate projects

Publications (2)

Publication Number Publication Date
CN104391893A CN104391893A (en) 2015-03-04
CN104391893B true CN104391893B (en) 2018-10-30

Family

ID=52609797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410633346.5A Active CN104391893B (en) 2014-11-11 2014-11-11 Method for timely discovering and tracking the dynamics of real estate projects

Country Status (1)

Country Link
CN (1) CN104391893B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309926A (en) * 2013-03-12 2013-09-18 中国科学院声学研究所 Chinese and English-named entity identification method and system based on conditional random field (CRF)
CN103488746A (en) * 2013-09-22 2014-01-01 成都锐理开创信息技术有限公司 Method and device for acquiring business information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120158759A1 (en) * 2010-07-21 2012-06-21 Saigh Michael M Information processing device for selecting real estate professionals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GIS应用数据库设计二阶分析模式;赵秀怡 等;《武汉大学学报.信息科学版》;20030228;第28卷(第1期);第102-103页 *

Also Published As

Publication number Publication date
CN104391893A (en) 2015-03-04

Similar Documents

Publication Publication Date Title
CN109684440B (en) Address similarity measurement method based on hierarchical annotation
CN106874378B (en) Method for constructing knowledge graph based on entity extraction and relation mining of rule model
CN107368468B (en) Operation and maintenance knowledge map generation method and system
US20220147836A1 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN104834747B (en) Short text classification method based on convolutional neural networks
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN107944559B (en) Method and system for automatically identifying entity relationship
CN107463658B (en) Text classification method and device
CN111767725B (en) Data processing method and device based on emotion polarity analysis model
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
CN106055675B (en) A kind of Relation extraction method based on convolutional neural networks and apart from supervision
CN107169079B (en) A kind of field text knowledge abstracting method based on Deepdive
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN103823857B (en) Space information searching method based on natural language processing
CN106709754A (en) Power user grouping method based on text mining
CN106407113B (en) A kind of bug localization method based on the library Stack Overflow and commit
CN110781670B (en) Chinese place name semantic disambiguation method based on encyclopedic knowledge base and word vectors
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN103646112A (en) Dependency parsing field self-adaption method based on web search
CN111125367A (en) Multi-character relation extraction method based on multi-level attention mechanism
CN112860898B (en) Short text box clustering method, system, equipment and storage medium
CN112069312A (en) Text classification method based on entity recognition and electronic device
CN115599899A (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN114997288A (en) Design resource association method
CN115795056A (en) Method, server and storage medium for constructing knowledge graph by unstructured information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 611731 No. 7, No. 7, Tianfu Avenue, Chengdu middle and high tech Zone, Sichuan 1

Applicant after: Chengdu Rui Li data processing techniques LLC

Address before: 611731 401A room 4, 6 building D, Tianfu Software Park, 216 Century City South Road, Chengdu, Sichuan.

Applicant before: CHENGDU REALLYINFOR TECHNOLOGY CO.,LTD.

GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: The method of discovering and tracking the real estate project in time

Effective date of registration: 20201215

Granted publication date: 20181030

Pledgee: Chengdu SME financing Company Limited by Guarantee

Pledgor: Chengdu Rui Li data processing techniques LLC

Registration number: Y2020980009271

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20211124

Granted publication date: 20181030

Pledgee: Chengdu SME financing Company Limited by Guarantee

Pledgor: Chengdu Rui Li data processing techniques LLC

Registration number: Y2020980009271

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Methods for timely discovering and tracking the dynamics of real estate projects

Effective date of registration: 20220111

Granted publication date: 20181030

Pledgee: Chengdu SME financing Company Limited by Guarantee

Pledgor: Chengdu Rui Li data processing techniques LLC

Registration number: Y2022980000302

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230131

Granted publication date: 20181030

Pledgee: Chengdu SME financing Company Limited by Guarantee

Pledgor: Chengdu Rui Li data processing techniques LLC

Registration number: Y2022980000302

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method for timely discovering and tracking the dynamics of real estate projects

Effective date of registration: 20230404

Granted publication date: 20181030

Pledgee: Chengdu SME financing Company Limited by Guarantee

Pledgor: Chengdu Rui Li data processing techniques LLC

Registration number: Y2023980037326