CN109299081A - Clean method, apparatus, computer equipment and the storage medium of room rate data - Google Patents

Method, apparatus, computer equipment and storage medium for cleaning room rate data

Info

Publication number
CN109299081A
Authority
CN
China
Prior art keywords
rate data
field
room rate
record
room
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810955918.XA
Other languages
Chinese (zh)
Other versions
CN109299081B (en)
Inventor
王先锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201810955918.XA
Publication of CN109299081A
Application granted
Publication of CN109299081B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate

Landscapes

  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention proposes a method, apparatus, computer equipment and storage medium for cleaning room rate data. The method includes: obtaining initial room rate data; when the fields in the initial room rate data are out of order, adjusting the order of the fields to obtain first room rate data with correctly ordered fields; when the first room rate data lacks fields, performing gap-filling processing on the first room rate data to obtain second room rate data with complete fields; when the second room rate data contains invalid or repeated characters, performing denoising processing that removes the invalid or repeated characters to obtain third room rate data; and when the third room rate data contains redundant duplicate records, performing deduplication processing that removes those records to obtain standard room rate data. The invention thus applies order adjustment, gap filling, denoising and deduplication to the crawled data to obtain accurate, unique standard room rate data that is convenient for subsequent use.

Description

Method, apparatus, computer equipment and storage medium for cleaning room rate data
Technical field
The present invention relates to the technical field of data processing, and in particular to a method, apparatus, computer equipment and storage medium for cleaning room rate data.
Background
Room rate data crawled by a web crawler is generally disorganized. Because websites differ in their business system standards, business fields and wording, the crawled room rate data is not uniform, and sometimes only some of its fields are valid, so it is difficult to extract authentic and valid information from it. In addition, the systems that consume room rate data may use inconsistent data standards, which hinders interaction between them. The crawled room rate data therefore needs to be cleaned. Data cleaning unifies data of different formats and expressions into data that meets a predetermined format, so that the data is standardized and convenient for subsequent processing. At present, however, cleaning crawled room rate data is difficult, and the cleaning results generally fail to achieve the desired effect.
Summary of the invention
The main object of the present invention is to provide a convenient and fast method, apparatus, computer equipment and storage medium for cleaning room rate data.
The present invention proposes a method for cleaning room rate data, comprising: obtaining initial room rate data;
When the fields in the initial room rate data are out of order, adjusting the order of the fields in the initial room rate data to obtain first room rate data with correctly ordered fields;
When the first room rate data lacks fields, performing gap-filling processing on the first room rate data to obtain second room rate data with complete fields;
When the second room rate data contains invalid or repeated characters, performing denoising processing on the second room rate data to remove the invalid or repeated characters and obtain third room rate data;
When the third room rate data contains redundant duplicate records, performing deduplication processing on the third room rate data to remove the redundant duplicate records and obtain standard room rate data.
Further, the step of adjusting the order of the fields in the initial room rate data to obtain the first room rate data comprises:
Reading the fields of each record in the initial room rate data;
Judging whether the format of the fields matches a preset field format;
If the format of the fields does not match the preset field format, judging whether the fields contain the key fields of the preset field format;
If so, adjusting the order of the fields according to the order of the key fields in the preset field format, to obtain the first room rate data.
Further, the step of performing gap-filling processing on the first room rate data to obtain the second room rate data with complete fields comprises:
Reading the fields of each record in the first room rate data;
Judging whether a field is missing according to the result of matching the preset field format against the fields of each record, or according to the association between the fields in the record;
If so, looking up the missing field of the first room rate data in a preset table;
Filling the missing field into the position where it is missing, to obtain the second room rate data.
Further, after the step of performing denoising processing on the second room rate data to remove the invalid or repeated characters and obtain the third room rate data, the method comprises:
Converting the room rate units in the third room rate data to form a unified room rate unit.
Further, the step of performing deduplication processing on the third room rate data to remove the redundant duplicate records and obtain the standard room rate data comprises:
Grouping the records in the third room rate data according to preset core fields;
Removing the non-core fields of each record in a group to obtain multiple records with identical core fields, the non-core fields being the fields other than the preset core fields;
Deleting the redundant records among the multiple records with identical core fields so that only one record with those core fields remains, to obtain the standard room rate data.
Further, the step of obtaining the initial room rate data comprises:
Crawling page data from a real estate website with a crawler tool;
Locating the initial room rate data in the page data and extracting it.
Further, after the step of performing deduplication processing on the third room rate data to remove the redundant duplicate records and obtain the standard room rate data, the method comprises:
Obtaining room rates from the standard room rate data;
Calculating an average room rate from the room rates according to a preset rule.
The apparatus for cleaning room rate data proposed by the present invention comprises:
A first obtaining module, configured to obtain initial room rate data;
A first processing module, configured to, when the fields in the initial room rate data are out of order, adjust the order of the fields in the initial room rate data to obtain first room rate data with correctly ordered fields;
A second processing module, configured to, when the first room rate data lacks fields, perform gap-filling processing on the first room rate data to obtain second room rate data with complete fields;
A third processing module, configured to, when the second room rate data contains invalid or repeated characters, perform denoising processing on the second room rate data to remove the invalid or repeated characters and obtain third room rate data;
A fourth processing module, configured to, when the third room rate data contains redundant duplicate records, perform deduplication processing on the third room rate data to remove the redundant duplicate records and obtain standard room rate data.
The present invention also provides computer equipment comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above method are implemented when the computer program is executed by a processor.
The beneficial effects of the present invention are as follows: the crawled room rate data undergoes order adjustment, gap filling, denoising and deduplication to obtain accurate, unique room rate data that is convenient for subsequent processing, and the cleaning process is convenient and fast. An average room rate can also be calculated from the room rate data, which can be used to assess the property price levels of different residential communities and serve as a reference for users buying a home.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of the method for cleaning room rate data in one embodiment of the present invention;
Fig. 2 is a schematic diagram of the steps of the method for cleaning room rate data in another embodiment of the present invention;
Fig. 3 is a schematic structural block diagram of the apparatus for cleaning room rate data in one embodiment of the present invention;
Fig. 4 is a schematic structural block diagram of the first obtaining module in one embodiment of the present invention;
Fig. 5 is a schematic structural block diagram of the first processing module in one embodiment of the present invention;
Fig. 6 is a schematic structural block diagram of the second processing module in one embodiment of the present invention;
Fig. 7 is a schematic structural block diagram of the fourth processing module in one embodiment of the present invention;
Fig. 8 is a schematic structural block diagram of the apparatus for cleaning room rate data in another embodiment of the present invention;
Fig. 9 is a schematic structural block diagram of the apparatus for cleaning room rate data in another embodiment of the present invention;
Fig. 10 is a schematic structural block diagram of the computer equipment in one embodiment of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
Referring to Fig. 1, the method for cleaning room rate data in this embodiment comprises:
Step S1: obtaining initial room rate data;
Step S2: when the fields in the initial room rate data are out of order, adjusting the order of the fields in the initial room rate data to obtain first room rate data;
Step S3: when the first room rate data lacks fields, performing gap-filling processing on the first room rate data to obtain second room rate data with complete fields;
Step S4: when the second room rate data contains invalid or repeated characters, performing denoising processing on the second room rate data to remove the invalid or repeated characters and obtain third room rate data;
Step S5: when the third room rate data contains redundant duplicate records, performing deduplication processing on the third room rate data to remove the redundant duplicate records and obtain standard room rate data.
In this embodiment, the initial room rate data is obtained from page data that a crawler tool crawls from a real estate website. The page data contains the initial room rate data as well as other data, some related to the room rate data and some unrelated to it. The room rate data includes data such as city, residential community, address, title and price. Specifically, step S1 comprises: crawling page data from the real estate website with the crawler tool, and then locating the uncleaned initial room rate data in the page data to obtain the initial room rate data.
As described in steps S2 to S5, after the initial room rate data is obtained it needs to be cleaned, that is, order adjustment, gap filling, denoising and deduplication are performed in sequence, and the required standard room rate data is finally obtained. Order adjustment corrects fields that are out of order in the room rate data: when the fields in the initial room rate data are out of order, order adjustment is performed to obtain first room rate data with correctly ordered fields; if the fields in the initial room rate data are not out of order, the next step, gap filling, is performed directly. Gap filling completes missing fields: when the correctly ordered room rate data (the first room rate data or the initial room rate data) lacks fields, gap filling is performed on that data to obtain second room rate data with complete fields; if the fields are complete, gap filling is skipped and the next step, denoising, is performed directly. Denoising removes invalid or repeated characters from the room rate data: when the room rate data with complete fields (the initial, first or second room rate data) contains invalid or repeated characters, denoising is performed on that data to obtain third room rate data; if the data contains no invalid or repeated characters, denoising is skipped and the next step, deduplication, is performed directly. Deduplication removes redundant duplicate records: when the room rate data (the initial, first, second or third room rate data) contains redundant duplicate records, deduplication is performed to finally obtain usable standard room rate data.
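The conditional flow of steps S2 to S5 can be sketched as a list of (check, fix) pairs applied in order, where a stage is skipped when its check finds nothing to fix. This is only a minimal illustration under assumptions, not the implementation described by the invention: the names `run_pipeline`, `has_duplicates` and `drop_duplicates` are hypothetical, and the single toy stage stands in for step S5 alone.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Records = List[Record]
# Each stage pairs a check ("is this stage needed?") with the fix it applies.
Stage = Tuple[Callable[[Records], bool], Callable[[Records], Records]]


def run_pipeline(records: Records, stages: List[Stage]) -> Records:
    """Apply each stage in order, skipping any stage whose check is False."""
    for needs_fix, fix in stages:
        if needs_fix(records):
            records = fix(records)
    return records


# Hypothetical toy stage standing in for S5: drop exact duplicate records.
def has_duplicates(records: Records) -> bool:
    keys = [tuple(sorted(r.items())) for r in records]
    return len(keys) != len(set(keys))


def drop_duplicates(records: Records) -> Records:
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


raw = [{"city": "A", "price": "10000"}, {"city": "A", "price": "10000"}]
cleaned = run_pipeline(raw, [(has_duplicates, drop_duplicates)])
```

In a fuller sketch, the stage list would hold one pair each for order adjustment, gap filling, denoising and deduplication, in that order.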
In one embodiment, step S2 comprises:
Step S20: reading the fields of each record in the initial room rate data;
Step S21: judging whether the format of the fields matches a preset field format;
Step S22: if the format of the fields does not match the preset field format, judging whether the fields contain the key fields of the preset field format;
Step S23: if so, adjusting the order of the fields according to the order of the key fields in the preset field format, to obtain the first room rate data.
In this embodiment, the initial room rate data has multiple records, and each record has its own fields. Reading the fields of each record in the initial room rate data yields the format of the fields, that is, the order in which the individual fields appear. In step S21, the format of these fields is compared with the preset field format, which is a field order format set in advance, to judge whether the two match. In step S22, if the format of the fields matches the preset field format, the field order is correct and does not need to be adjusted. If the format does not match, the field order is wrong, and it is then necessary to judge whether the fields contain the key fields of the preset field format. Specifically, the preset key fields of the preset field format are looked up in the fields. If the key fields can be found, the fields are suitable for order adjustment, and their order can be adjusted according to the order of the key fields in the preset field format. If the fields do not contain the key fields of the preset field format, they are not suitable for order adjustment, and the record with the wrongly ordered fields can be filtered out.
For example, the preset field order format for room rate data is "city, address, price", but the fields read from a record arrive in a different order, for example "city A, price C, address B". Because this format does not match the preset field format, the field order is judged to be wrong. It is then judged whether the wrongly ordered fields are suitable for adjustment, that is, whether they contain the key fields "city", "address" and "price" of the preset field format. In this example they do, so the fields are rearranged according to the preset field format into "city A, address B, price C". Similarly, if the fields read from a record are "region A, community B, number C", the format is also judged not to match the preset field format, but because the key fields "city", "address" and "price" cannot be found in "region A, community B, number C", the fields are judged unsuitable for order adjustment and the record is filtered out directly.
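Steps S20 to S23 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method as implemented: a record is modelled as a list of (field name, value) pairs, `PRESET_FORMAT` and `reorder_fields` are hypothetical names, and two choices are assumptions not specified in the text: a record is kept if at least one key field is present, and fields outside the preset format are appended after the key fields in their original order.

```python
from typing import List, Optional, Tuple

Field = Tuple[str, str]  # (field name, field value)

# Assumed preset field order, taken from the "city, address, price" example.
PRESET_FORMAT = ["city", "address", "price"]


def reorder_fields(record: List[Field]) -> Optional[List[Field]]:
    """Return the record in preset order, or None if it should be filtered out."""
    names = [name for name, _ in record]
    if names == PRESET_FORMAT:
        return record  # S21: format already matches, nothing to adjust
    present = [name for name in PRESET_FORMAT if name in names]
    if not present:
        return None  # S22: no key field found, record is not suitable for adjustment
    values = dict(record)
    extras = [field for field in record if field[0] not in PRESET_FORMAT]
    # S23: key fields in preset order, then any remaining fields (assumption)
    return [(name, values[name]) for name in present] + extras


shuffled = [("city", "A"), ("price", "C"), ("address", "B")]
unrelated = [("region", "A"), ("community", "B"), ("number", "C")]
reordered = reorder_fields(shuffled)
filtered = reorder_fields(unrelated)
```

Here `reordered` holds the fields in the order city, address, price, while `filtered` is `None` because none of the key fields appears in the record.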
In one embodiment, step S3 comprises:
Step S30: reading the fields of each record in the first room rate data;
Step S31: judging whether a field is missing according to the result of matching the preset field format against the fields of each record, or according to the association between the fields in the record;
Step S32: if so, looking up the missing field of the first room rate data in a preset table;
Step S33: filling the missing field into the position where it is missing, to obtain the second room rate data.
In this embodiment, each record in the first room rate data is read first, and whether a field is missing is then judged according to the preset field format or the association between the fields in the record. Specifically, the association between fields includes the semantic correspondence between them: if adjacent fields do not correspond semantically, a field is missing. For example, in "the property price in city A is 10,000 yuan per square metre", the semantics of "property price" normally imply that the following field is a specific price, such as the 10,000 yuan per square metre above; if the field following "property price" is not a price, the record lacks a field. When judging by the preset field format: suppose the normal fields of a record are "the property price at address xxx in a certain city is xxx", but the crawled record is "the property price in a certain city is xxx"; according to the preset field format "city, address, price", it can be judged that this record lacks the address field. When judging by the association between fields: suppose the normal fields of a record are "the property price at address xxx in a certain city is xxx", but the crawled record is "the property price at the address in a certain city is"; according to the association between the fields, it can be judged that this record lacks the specific price and the specific address.
In step S32, if the first room rate data is judged to lack a field, the missing field can be looked up in the preset table. Specifically, the preset table is compiled from the web page data crawled from the real estate website. It should be understood that the crawler tool crawls the data of the entire page, while the room rate data occupies only one region of the page; other regions of the page may also hold content related to the room rate data. Because only the room rate data is needed, only the room rate data is cleaned and the other data is not. When certain fields are missing from the room rate data, the content related to the room rate data can be searched for those missing fields, and any that are found are filled into the positions where they are missing. In this embodiment, the preset table compiled from the page data includes the uncleaned initial room rate data, all data related to the room rate data (such as data that duplicates the initial room rate data) and other data unrelated to the room rate data (such as advertising data), so the missing field of a record can be looked up in the preset table. Specifically, the record is matched against the preset table to obtain the missing field, which is then filled into the position where it is missing; for example, the fields "the property price in a certain city is xxx" are completed as "the property price at address xxx in a certain city is xxx". If the record cannot be found in the preset table, or no missing field can be matched, the record can be filtered out and deleted.
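The lookup in steps S31 to S33 can be sketched as below. This is a hedged illustration, not the patent's implementation: records and preset-table rows are modelled as dictionaries, `PRESET_FORMAT` and `fill_missing` are hypothetical names, and matching a record to a table row by equality of all its known fields is an assumption, since the text does not specify how the match is made.

```python
from typing import Dict, List, Optional

# Assumed preset field format, taken from the "city, address, price" example.
PRESET_FORMAT = ["city", "address", "price"]


def fill_missing(record: Dict[str, str],
                 table: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Fill missing fields from the preset table; return None to filter the record."""
    # S31: a field is missing if the preset format names it and it is absent or empty.
    missing = [name for name in PRESET_FORMAT if not record.get(name)]
    if not missing:
        return record
    known = {name: value for name, value in record.items() if value}
    if not known:
        return None  # nothing to match on
    # S32: look for a table row that agrees with every known field (assumption).
    for row in table:
        agrees = all(row.get(name) == value for name, value in known.items())
        if agrees and all(row.get(name) for name in missing):
            filled = dict(record)
            # S33: copy the missing values into the record.
            filled.update({name: row[name] for name in missing})
            return filled
    return None  # no matching row: the record is filtered out


preset_table = [{"city": "A", "address": "B", "price": "10000"}]
completed = fill_missing({"city": "A", "price": "10000"}, preset_table)
unmatched = fill_missing({"city": "Z", "price": "1"}, preset_table)
```

With this toy table, `completed` gains the address "B", and `unmatched` is `None` because no row agrees with the record's known fields.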
In another embodiment, the corresponding missing field can instead be matched for a record according to the association between the fields in the record, and filled in between the corresponding fields. In this embodiment, the association includes the grammatical correspondence between fields, and the missing field here is a simple connective. If the crawled record is "certain city address xxx property price xxx", the corresponding connectives can be matched through the association between the fields and filled in between the corresponding fields, completing the record as "the property price at address xxx in a certain city is xxx".
In one embodiment, the fields in the second room rate data are denoised to obtain the third room rate data. Specifically, denoising removes invalid or repeated characters. It is first judged whether the fields in the room rate data contain invalid or repeated characters. Invalid characters include null field values, garbled text, and content that does not match the field; an example of mismatched content is a price field that should be numeric but is not. If a field in the room rate data is judged to contain invalid or repeated characters, those characters are removed. For example, if a price field that should be numeric contains text, the text is an invalid character and is removed, so that the price field is regularized.
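The denoising described above can be sketched with regular expressions. This is a minimal sketch under assumptions: `clean_price` and `collapse_repeated_punctuation` are hypothetical helpers, treating the first number found as the price is an assumption, and interpreting "repeated characters" as runs of the same punctuation mark is one possible reading that the text does not confirm.

```python
import re
from typing import Optional


def clean_price(raw: Optional[str]) -> Optional[str]:
    """Keep only the numeric part of a price field; None marks an invalid value."""
    if raw is None:
        return None  # null field value
    # Drop thousands separators, then take the first number found (assumption).
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    return match.group(0) if match else None


def collapse_repeated_punctuation(text: str) -> str:
    """Collapse runs of one punctuation mark into a single mark (assumed reading)."""
    return re.sub(r"([^\w\s])\1+", r"\1", text)


price = clean_price("about 10000 yuan")
address = collapse_repeated_punctuation("Block B,,, No. 3!!")
```

Digits are deliberately left untouched by `collapse_repeated_punctuation`, since collapsing repeated digits would corrupt values such as 10000.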
Further, after the invalid or repeated characters have been removed to obtain the third room rate data, the room rate units can be converted to form a unified room rate unit. For example, inconsistent room rate unit fields such as "thousand yuan per square metre", "ten thousand yuan per square metre" and "thousand yuan per square decimetre" are all unified into the same unit field according to a preset rule, for instance by converting them all into "ten thousand yuan per square metre" with conversion formulas, which is convenient for subsequent use.
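The unit conversion can be sketched as a table of factors that map each source unit to "ten thousand yuan per square metre". The unit labels and the name `to_wan_per_m2` are assumptions made for this illustration; the factors follow from 1 ten thousand yuan = 10 thousand yuan and 1 square metre = 100 square decimetres.

```python
from decimal import Decimal

# Factor that converts a value in the given unit to ten thousand yuan per m2.
# The unit labels are hypothetical; a real data set would need its own mapping.
TO_WAN_PER_M2 = {
    "thousand yuan/m2": Decimal("0.1"),     # 1 thousand = 0.1 ten-thousand
    "ten thousand yuan/m2": Decimal("1"),
    "thousand yuan/dm2": Decimal("10"),     # x100 for dm2 -> m2, x0.1 for the currency scale
}


def to_wan_per_m2(value: str, unit: str) -> Decimal:
    """Convert a price string in a known unit to ten thousand yuan per m2."""
    return Decimal(value) * TO_WAN_PER_M2[unit]


converted = to_wan_per_m2("35", "thousand yuan/m2")
```

`Decimal` is used instead of `float` so that conversions such as 35 x 0.1 stay exact; an unknown unit raises `KeyError`, which a caller could treat as a record to filter out.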
In one embodiment, step S5 comprises:
Step S50: grouping the records in the third room rate data according to preset core fields;
Step S51: removing the non-core fields of each record in a group to obtain multiple records with identical core fields, the non-core fields being the fields other than the preset core fields;
Step S52: deleting the redundant records among the multiple records with identical core fields so that only one record with those core fields remains, to obtain the standard room rate data.
Deduplication mainly removes duplicate records according to core fields. In this embodiment, deduplication is performed on the third room rate data as follows. Core fields are preset first; they include fields such as community name, community address, price, area and city. The records of the third room rate data are then grouped by core fields, that is, records with identical core fields are placed in one group, and the records in each group are deduplicated: the non-core fields of each record in the group are removed so that only the core fields such as community name, community address, price, area and city are kept, and duplicates are then removed within the group so that each group contains only one record. The non-core fields are the fields other than the preset core fields. For example, take two records in the third room rate data: the fields of the first record are community name, community address and decoration price, and the fields of the second record are community name, community address and parking space price, where community name and community address are core fields and the rest are non-core fields. Because the core fields of the two records are identical, they fall into the same group. During deduplication the non-core fields are removed, that is, the decoration price field of the first record and the parking space price field of the second record, leaving two records with identical core fields in the group. One of these redundant records is then removed, and the remaining record contains the community name and community address fields, which ensures that only one valid, standard room rate data record remains.
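Steps S50 to S52 can be sketched as grouping by a key built from the core fields and keeping one stripped record per group. This is an illustrative sketch only: `CORE_FIELDS` uses just the two core fields from the worked example, and the field names, sample values and the function name `deduplicate` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed core fields, following the two-record example in the text.
CORE_FIELDS = ("community_name", "community_address")


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one record per combination of core-field values."""
    groups: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for record in records:
        # S51: strip non-core fields, keeping only the core ones.
        core = {name: record.get(name, "") for name in CORE_FIELDS}
        # S50: the tuple of core values identifies the group.
        key = tuple(core[name] for name in CORE_FIELDS)
        # S52: only the first record of each group is kept.
        groups.setdefault(key, core)
    return list(groups.values())


third_data = [
    {"community_name": "Sunny Garden", "community_address": "1 Main St",
     "decoration_price": "2000"},
    {"community_name": "Sunny Garden", "community_address": "1 Main St",
     "parking_price": "150000"},
]
standard_data = deduplicate(third_data)
```

Both sample records share the same core fields, so `standard_data` contains a single record holding only the community name and community address.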
In one embodiment, referring to Fig. 2, after step S5 the method comprises:
Step S6: obtaining room rates from the standard room rate data;
Step S7: calculating an average room rate from the room rates according to a preset rule.
After the obtained room rate data has been processed in steps S2 to S5, accurate and unique standard room rate data is obtained. Room rates can then be obtained from the processed standard room rate data and used for calculation according to a preset rule; specifically, the average room rate is calculated by weighting the room rates by the corresponding property areas. Two cases arise when obtaining room rates: room rates with a property area and room rates without one. The preset rule is as follows. When the crawled data includes the property area, the average room rate can be calculated as avg1 = sum(property area * unit price) / sum(property area), and the average price of a residential community can be calculated as community average price = (avg1*count1 + avg2*count2 + ... + avgn*countn) / (count1 + count2 + ... + countn), where avg is the average room rate of each property in the community and count is the area of each property in the community. When the crawled data does not include the property area, the crawled room rate is taken as the average room rate by default. The average room rate can be used to assess the property price levels of different communities and can serve as a reference for users buying a home.
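The two formulas above can be sketched as one area-weighted mean applied at two levels. This is a minimal sketch, assuming listings are given as (area, unit price) pairs and properties as (avg, count) pairs; the function names and sample figures are hypothetical, and the case without a property area (where the crawled price is used directly) is left out.

```python
from typing import List, Tuple


def area_weighted_average(listings: List[Tuple[float, float]]) -> float:
    """listings holds (area, unit_price) pairs; returns sum(area*price)/sum(area)."""
    total_area = sum(area for area, _ in listings)
    if total_area == 0:
        raise ValueError("total area must be positive")
    return sum(area * price for area, price in listings) / total_area


def community_average(properties: List[Tuple[float, float]]) -> float:
    """properties holds (avg_i, count_i) pairs, where count_i is the property's area."""
    return area_weighted_average([(count, avg) for avg, count in properties])


# Two listings: 100 m2 at 10000 per m2 and 50 m2 at 13000 per m2.
avg1 = area_weighted_average([(100, 10000), (50, 13000)])
# Two properties with averages 11000 and 12000 over areas 150 and 50.
community = community_average([(11000.0, 150), (12000.0, 50)])
```

For these sample figures, `avg1` is 1650000 / 150 = 11000.0 and `community` is 2250000 / 200 = 11250.0.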
Referring to Fig. 3, the apparatus for cleaning room rate data in this embodiment comprises:
A first obtaining module 100, configured to obtain initial room rate data;
A first processing module 200, configured to, when the fields in the initial room rate data are out of order, adjust the order of the fields in the initial room rate data to obtain first room rate data;
A second processing module 300, configured to, when the first room rate data lacks fields, perform gap-filling processing on the first room rate data to obtain second room rate data with complete fields;
A third processing module 400, configured to, when the second room rate data contains invalid or repeated characters, perform denoising processing on the second room rate data to remove the invalid or repeated characters and obtain third room rate data;
A fourth processing module 500, configured to, when the third room rate data contains redundant duplicate records, perform deduplication processing on the third room rate data to remove the redundant duplicate records and obtain standard room rate data.
In this embodiment, the initial room rate data is obtained from page data that a crawler tool crawls from a real estate website. The page data contains the initial room rate data as well as other data, some related to the room rate data and some unrelated to it. The room rate data includes data such as city, residential community, address, title and price. Specifically, referring to Fig. 4, the first obtaining module 100 comprises:
A crawling submodule 110, configured to crawl page data from the real estate website with the crawler tool;
A lookup submodule 120, configured to locate the initial room rate data in the page data, to obtain the initial room rate data.
After the initial room rate data is obtained it needs to be cleaned, that is, order adjustment, gap filling, denoising and deduplication are performed in sequence, and the required standard room rate data is finally obtained. Order adjustment corrects fields that are out of order in the room rate data: when the fields in the initial room rate data are out of order, order adjustment is performed to obtain first room rate data with correctly ordered fields; if the fields in the initial room rate data are not out of order, the next step, gap filling, is performed directly. Gap filling completes missing fields: when the correctly ordered room rate data (the first room rate data or the initial room rate data) lacks fields, gap filling is performed on that data to obtain second room rate data with complete fields; if the fields are complete, gap filling is skipped and the next step, denoising, is performed directly. Denoising removes invalid or repeated characters from the room rate data: when the room rate data with complete fields (the initial, first or second room rate data) contains invalid or repeated characters, denoising is performed on that data to obtain third room rate data; if the data contains no invalid or repeated characters, denoising is skipped and the next step, deduplication, is performed directly. Deduplication removes redundant duplicate records: when the room rate data (the initial, first, second or third room rate data) contains redundant duplicate records, deduplication is performed to finally obtain usable standard room rate data.
Referring to Fig. 5, in one embodiment, the first processing module 200 comprises:
First reading submodule 210, configured to read the fields of each record in the initial house price data;
First matching submodule 220, configured to determine whether the format of the fields matches a preset field format;
First judging submodule 230, configured to determine, if the format of the fields does not match the preset field format, whether the fields include the key fields of the preset field format;
Adjusting submodule 240, configured to reorder the fields, when they include the key fields of the preset field format, according to the order of the key fields in the preset field format, so as to obtain the first house price data.
In this embodiment, the initial house price data contains multiple records, and each record has corresponding fields. The first reading submodule 210 reads the fields of each record in the initial house price data, that is, it obtains the format of the fields and therefore the order of the individual fields. The first matching submodule 220 compares the format of the fields with the preset field format, which is a field order format set in advance, and determines whether the two match. If they match, the field order is correct and no adjustment is needed. If they do not match, the field order is wrong, and the first judging submodule 230 then determines whether the fields include the key fields of the preset field format; specifically, it searches the fields for the preset key fields. If the key fields can be found, the fields are suitable for order adjustment, and the adjusting submodule 240 reorders them according to the order of the key fields in the preset field format. If the fields do not include the key fields of the preset field format, they are not suitable for order adjustment, and the misordered fields may be filtered out.
For example, the preset field order format for house price data is "city address price", but the fields read from a house price record contain city A, address B and price C in a different order. Because this field format does not match the preset field format, the field order is determined to be wrong. It is then determined whether the misordered fields are suitable for adjustment, that is, whether they include the key fields "city", "address" and "price" of the preset field format. In this example they do, so the misordered fields are rearranged according to the preset field format into "city A address B price C". If instead the fields read from the house price data are "area A community B No. C", the field format is likewise determined not to match the preset field format; but because none of the key fields "city", "address" and "price" of the preset field format can be found in "area A community B No. C", the fields are judged unsuitable for order adjustment and are filtered out directly.
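The example above can be sketched as follows. The representation of a record as a list of (field name, value) pairs is an assumption, as is the reading that a record is "suitable for adjustment" when at least one key field is present; the text does not say whether all key fields are required, and a record that lacks some key fields is left for the later gap-filling step here.

```python
from typing import List, Optional, Sequence, Tuple

Field = Tuple[str, str]                       # (field name, value), assumed layout
PRESET_FORMAT = ("city", "address", "price")  # preset field order from the example


def reorder_record(record: List[Field],
                   preset: Sequence[str] = PRESET_FORMAT) -> Optional[List[Field]]:
    """Return the record in preset order, or None if it should be filtered out."""
    names = [name for name, _ in record]
    if names == list(preset):
        return record                         # format matches: nothing to adjust
    present = [key for key in preset if key in names]
    if not present:
        return None                           # no key field found: not adjustable
    by_name = dict(record)
    ordered = [(key, by_name[key]) for key in present]
    # Fields outside the preset format keep their relative order at the end.
    extras = [(name, value) for name, value in record if name not in preset]
    return ordered + extras
```

Returning `None` for the unadjustable case keeps the filtering decision visible to the caller, which can then drop the record.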
Referring to Fig. 6, in one embodiment, the second processing module 300 comprises:
Second reading submodule 310, configured to read the fields of each record in the first house price data;
Second judging submodule 320, configured to determine whether a field is missing, according to the result of matching the preset field format against the fields of each record, or according to the association between the fields in the record;
Search submodule 330, configured to look up the missing field of the first house price data in a preset table when it is determined, according to the preset field format and/or the association between the fields in the record, that a field is missing;
Gap-filling submodule 340, configured to fill the missing field into the position of the missing field, so as to obtain the second house price data.
In this embodiment, the second reading submodule 310 first reads each record in the first house price data, and the second judging submodule 320 then determines whether a field is missing according to the preset field format or the association between the fields in the record. Specifically, the association between fields includes semantic correspondence between fields: if, judged semantically, what comes before does not correspond to what comes after, a field is missing. Take "the property price in city A is 10,000 yuan per square meter": under normal circumstances the semantics of "property price" imply that the following field is a specific price, such as the 10,000 yuan per square meter above; if the field following "property price" is not a price, the record lacks a field. When the second judging submodule 320 judges by the preset field format, suppose the normal form of a record in the house price data is "the property price at address xxx in a certain city is xxx" but the crawled record reads "the property price in a certain city is xxx"; according to the preset field format "city address price", it can be determined that this record lacks the address field. When the second judging submodule 320 judges by the association between the fields in the record, suppose the normal form is again "the property price at address xxx in a certain city is xxx" but the crawled record reads "the property price at the address in a certain city is"; according to the association between the fields, it can be determined that this record lacks both the specific property price and the specific address.
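Both checks described above can be approximated with one small function, sketched here under the assumption that a record is a dict of field name to value. A field that is absent from the record corresponds to the mismatch with the preset field format; a field that is present but empty corresponds to the case where a label such as "price" is not followed by a value. The function name `missing_fields` is illustrative.

```python
from typing import Dict, List, Optional, Sequence

PRESET_FORMAT = ("city", "address", "price")  # preset field format from the text


def missing_fields(record: Dict[str, Optional[str]],
                   preset: Sequence[str] = PRESET_FORMAT) -> List[str]:
    """List the preset fields that are absent from the record or have no value."""
    return [name for name in preset if not record.get(name)]
```

An empty result means the record is complete and gap filling can be skipped for it.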
If the second judging submodule 320 determines that the first house price data lacks a field, the search submodule 330 may look up the missing field of the first house price data in a preset table. Specifically, the preset table is formed by organizing the web page data crawled from the real estate website. It should be understood that the crawler tool crawls the data of the entire page, while the house price data occupies only one region of the page; other regions of the page may also hold content related to the house price data. Because only the house price data is needed, only the house price data is cleaned and the other data is not. When certain fields are missing from the house price data, the content related to the house price data can be searched to check whether the missing fields are there; if they are found, the gap-filling submodule 340 fills them into the positions of the missing fields. In this embodiment, the preset table is obtained by organizing the page data and includes the uncleaned initial house price data, all data related to the house price data (for example, data that repeats the initial house price data), and other data unrelated to the house price data (for example, advertising data). The missing field of a record can therefore be looked up in the preset table: the record is matched against the preset table to obtain the missing field, which is then filled into the position of the missing field, for example completing "the property price in a certain city is xxx" into "the property price at address xxx in a certain city is xxx". If the record cannot be found in the preset table and no missing field can be matched, the record may be filtered out and deleted.
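The lookup-and-fill step can be sketched as below. The text does not define how a record is matched against the preset table, so this sketch assumes the table is a list of dict rows and treats a row as a match when it agrees with every field value the record already has; the function name `fill_from_table` and the data layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, str]
PRESET_FORMAT = ("city", "address", "price")


def fill_from_table(record: Record,
                    table: List[Record],
                    preset: Sequence[str] = PRESET_FORMAT) -> Optional[Record]:
    """Fill missing preset fields from the first matching table row.

    Returns a completed copy of the record, or None when no row matches,
    in which case the caller may filter the record out.
    """
    missing = [name for name in preset if not record.get(name)]
    if not missing:
        return dict(record)                   # already complete
    known = {name: value for name, value in record.items() if value}
    for row in table:
        agrees = all(row.get(name) == value for name, value in known.items())
        if agrees and all(row.get(name) for name in missing):
            filled = dict(record)
            filled.update({name: row[name] for name in missing})
            return filled
    return None
```

Taking the first matching row is a simplification; if several rows agree with the record but disagree with each other, a real system would need a rule for choosing among them.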
In another embodiment, the missing field may instead be matched for the record according to the association between the fields in the record, and then filled in between the corresponding fields. In this embodiment, the association includes grammatical correspondence between fields, and the missing fields here are simple connecting fields. If the crawled record reads "property price xxx certain city address xxx", the corresponding connecting fields can be matched through the association between the fields and filled in between the corresponding fields, for example completing "property price xxx certain city address xxx" into "the property price at address xxx in a certain city is xxx".
In one embodiment, the fields in the second house price data are denoised to obtain the third house price data. Specifically, denoising removes invalid or repeated characters. It is first determined whether the fields in the house price data contain invalid or repeated characters, where invalid characters include field values that are null, garbled, or inconsistent with the field's content; for example, a price field should be a number, and a price field that is not a number has inconsistent content. If a field in the house price data is found to contain invalid or repeated characters, those characters are removed. For example, if a price field that should be numeric contains text, the text is invalid and is removed, so that the price field is normalized.
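A minimal sketch of the denoising rules just described, using only the standard `re` module. It assumes the price unit is stored in a separate field, so that stripping text from the price value loses nothing; the function names and the choice to treat the Unicode replacement character as garbled text are illustrative assumptions.

```python
import re
from typing import Optional

_NUMBER = re.compile(r"\d+(?:\.\d+)?")   # first integer or decimal number


def clean_price(raw: Optional[str]) -> Optional[str]:
    """Keep only the numeric part of a price field; None if there is none."""
    if raw is None:
        return None                       # null value: invalid
    match = _NUMBER.search(raw.replace(",", ""))
    return match.group(0) if match else None


def clean_text(raw: Optional[str]) -> Optional[str]:
    """Drop replacement characters (a common sign of garbled text) and padding."""
    if raw is None:
        return None
    cleaned = raw.replace("\ufffd", "").strip()
    return cleaned or None                # empty after cleaning counts as null
```

A record whose price cleans to `None` has no usable price and could be handed back to gap filling or filtered out, depending on the pipeline's policy.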
Further, referring to Fig. 8, the device for cleaning house price data further comprises:
Conversion module 600, configured to convert the house price units so as to form a unified house price unit.
In this embodiment, after the invalid or repeated characters have been removed from the third house price data, the conversion module 600 may convert the house price units so as to form a unified house price unit. For example, inconsistent house price unit fields such as "thousand yuan per square meter", "ten thousand yuan per square meter" and "thousand yuan per square decimeter" are all unified into the same unit field according to a preset rule; for instance, all of these inconsistent units are converted by formula into "ten thousand yuan per square meter", which facilitates subsequent use.
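The unit conversion can be sketched with a table of factors toward the target unit "ten thousand yuan per square meter". The three unit strings come from the example above; the factors follow from arithmetic (one square meter is 100 square decimeters, so 1 thousand yuan per square decimeter equals 100 thousand yuan, that is 10 ten-thousand yuan, per square meter). The function name and the use of `Decimal` are choices made for this sketch.

```python
from decimal import Decimal
from typing import Tuple

TARGET_UNIT = "ten thousand yuan per square meter"

# Multiplicative factor that converts a value in each unit to TARGET_UNIT.
TO_TARGET = {
    "ten thousand yuan per square meter": Decimal("1"),
    "thousand yuan per square meter": Decimal("0.1"),
    "thousand yuan per square decimeter": Decimal("10"),
}


def convert_price(value: str, unit: str) -> Tuple[Decimal, str]:
    """Convert a numeric price string to the unified unit.

    Raises KeyError for a unit that has no conversion rule.
    """
    return Decimal(value) * TO_TARGET[unit], TARGET_UNIT
```

`Decimal` is used instead of `float` so that prices such as 35 thousand yuan convert to exactly 3.5 rather than a binary approximation.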
Referring to Fig. 7, in one embodiment, the fourth processing module 500 comprises:
Grouping submodule 510, configured to group the records in the third house price data according to preset core fields;
Rejecting submodule 520, configured to remove the non-core fields from the fields of each record in a group, so as to obtain multiple records with identical core fields, the non-core fields being the fields other than the preset core fields;
Deleting submodule 530, configured to delete the surplus records among the multiple records with identical core fields, so as to obtain standard house price data in which only one record with those core fields remains.
Deduplication mainly removes repeated records according to the core fields. In this embodiment, deduplication is performed on the third house price data as follows. Core fields are first preset; they include fields such as community name, community address, price, area and city. The grouping submodule 510 then groups the records of the third house price data according to the core fields, that is, records with identical core fields are placed in one group, and deduplication is performed on the records within each group: the non-core fields of each record in the group are removed, so that only the core fields such as community name, community address, price, area and city are kept, and the records are then deduplicated by core fields so that each group contains only one record. The non-core fields are the fields other than the preset core fields. For example, consider two records in the third house price data: the fields of the first record are community name, community address and decoration price, and the fields of the second record are community name, community address and parking space price, where community name and community address are core fields and the rest are non-core fields. Because the core fields of the two records are identical, they are placed in the same group. During deduplication the non-core fields are removed, namely the decoration price field of the first record and the parking space price field of the second record, so that the group is left with two records containing the same core fields. One of them is redundant and is removed, leaving a single record containing the community name and community address fields, which ensures that only one valid, standard house price record remains.
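The group, strip, and keep-one procedure above can be sketched in a few lines. The record layout (a dict per record) and the field names `community_name` and `community_address` are assumptions mirroring the two core fields of the example; the embodiment lists further core fields such as price, area and city, which could be added to the tuple.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]
CORE_FIELDS = ("community_name", "community_address")  # core fields of the example


def deduplicate(records: List[Record],
                core_fields: Sequence[str] = CORE_FIELDS) -> List[Record]:
    """Group by core fields, drop non-core fields, keep one record per group."""
    groups: Dict[Tuple, Record] = {}
    for record in records:
        core_only = {name: record.get(name) for name in core_fields}
        key = tuple(core_only[name] for name in core_fields)
        groups.setdefault(key, core_only)     # the first record of a group wins
    return list(groups.values())
```

Because the non-core fields are dropped before comparison, the decoration-price and parking-price records of the example collapse into a single record.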
Referring to Fig. 9, in one embodiment, the device for cleaning house price data further comprises:
Second obtaining module 700, configured to obtain house prices from the standard house price data;
Computing module 800, configured to calculate an average house price from the house prices according to a preset rule.
After processing, the obtained house price data becomes accurate, unique fourth house price data (the standard house price data). Because the fourth house price data contains information such as house prices, the second obtaining module 700 can obtain the house prices from the processed fourth house price data, and the computing module 800 then performs the calculation on those house prices according to the preset rule. Specifically, the average house price is calculated by weighting each house price by the corresponding property area. The obtained house prices fall into two cases: house prices with a property area and house prices without a property area. The preset rule is as follows. When the crawled data includes the property area, the average house price can be calculated as avg1 = sum(property area * unit price) / sum(property area), and the average price of a community can be calculated as community average price = (avg1 * count1 + avg2 * count2 + ... + avgn * countn) / (count1 + count2 + ... + countn), where avg is the average house price of each property in the community and count is the area of each property in the community. When the crawled data does not include the property area, the crawled house price is taken as the average house price by default. The average house price can be used to assess the property price levels of different communities and can serve as a reference for users when buying a home.
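Both formulas above are area-weighted means, so one helper covers them: it takes (price, area) pairs for the per-property average and (avg, count) pairs for the community average. This is a sketch under that reading; the function names are illustrative, and the fallback for prices without an area follows the default rule stated in the text.

```python
from typing import Iterable, Optional, Tuple


def area_weighted_average(items: Iterable[Tuple[float, float]]) -> float:
    """Compute sum(price * area) / sum(area) over (price, area) pairs."""
    pairs = list(items)
    total_area = sum(area for _, area in pairs)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(price * area for price, area in pairs) / total_area


def average_price(price: float, area: Optional[float] = None) -> float:
    """For a single listing, the crawled price is itself the average price.

    This holds whether or not an area is known; the area only matters when
    several listings are combined with area_weighted_average.
    """
    return price
```

For example, two listings at 10,000 and 20,000 per square meter with areas of 100 and 300 square meters weight the larger property three times as heavily as the smaller one.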
Referring to Fig. 10, an embodiment of the present invention further provides a computer device, which may be a server and whose internal structure may be as shown in Fig. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as the preset method for cleaning house price data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a method for cleaning house price data.
The processor executes the steps of the method for cleaning house price data: obtaining initial house price data; when the initial house price data contains misordered fields, adjusting the order of the fields in the initial house price data to obtain first house price data with the correct field order; when the first house price data lacks a field, performing gap filling on the first house price data to obtain second house price data with complete fields; when the second house price data contains invalid or repeated characters, performing denoising on the second house price data to remove the invalid or repeated characters and obtain third house price data; and when the third house price data contains redundant records, performing deduplication on the third house price data to remove the redundant records and obtain standard house price data.
In the computer device, the step of adjusting the order of the fields in the initial house price data to obtain the first house price data comprises: reading the fields of each record in the initial house price data; determining whether the format of the fields matches a preset field format; if the format of the fields does not match the preset field format, determining whether the fields include the key fields of the preset field format; and if so, reordering the fields according to the order of the key fields in the preset field format, so as to obtain the first house price data.
In one embodiment, the step of performing gap filling on the first house price data to obtain second house price data with complete fields comprises: reading the fields of each record in the first house price data; determining whether a field is missing, according to the result of matching the preset field format against the fields of each record, or according to the association between the fields in the record; if so, looking up the missing field of the first house price data in a preset table; and filling the missing field into the position of the missing field, so as to obtain the second house price data.
In one embodiment, after the step of performing denoising on the second house price data to remove invalid or repeated characters and obtain the third house price data, the method comprises: converting the house price units in the third house price data so as to form a unified house price unit.
In one embodiment, the step of performing deduplication on the third house price data to remove redundant records and obtain the standard house price data comprises: grouping the records in the third house price data according to preset core fields; removing the non-core fields of each record in a group to obtain multiple records with identical core fields, the non-core fields being the fields other than the preset core fields; and deleting the surplus records among the multiple records with identical core fields, so as to obtain standard house price data in which only one record with those core fields remains.
In one embodiment, the step of obtaining the initial house price data comprises: crawling page data from a real estate website by means of a crawler tool; and searching the page data for the initial house price data to obtain the initial house price data.
In one embodiment, after the step of performing deduplication on the third house price data to remove redundant records and obtain the standard house price data, the method comprises: obtaining house prices from the standard house price data; and calculating an average house price from the house prices according to a preset rule.
Those skilled in the art will understand that the structure shown in Fig. 10 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a method for cleaning house price data, specifically: obtaining initial house price data; when the initial house price data contains misordered fields, adjusting the order of the fields in the initial house price data to obtain first house price data with the correct field order; when the first house price data lacks a field, performing gap filling on the first house price data to obtain second house price data with complete fields; when the second house price data contains invalid or repeated characters, performing denoising on the second house price data to remove the invalid or repeated characters and obtain third house price data; and when the third house price data contains redundant records, performing deduplication on the third house price data to remove the redundant records and obtain standard house price data.
In the computer-readable storage medium, the step of adjusting the order of the fields in the initial house price data to obtain the first house price data comprises: reading the fields of each record in the initial house price data; determining whether the format of the fields matches a preset field format; if the format of the fields does not match the preset field format, determining whether the fields include the key fields of the preset field format; and if so, reordering the fields according to the order of the key fields in the preset field format, so as to obtain the first house price data.
In one embodiment, the step of performing gap filling on the first house price data to obtain second house price data with complete fields comprises: reading the fields of each record in the first house price data; determining whether a field is missing, according to the result of matching the preset field format against the fields of each record, or according to the association between the fields in the record; if so, looking up the missing field of the first house price data in a preset table; and filling the missing field into the position of the missing field, so as to obtain the second house price data.
In one embodiment, after the step of performing denoising on the second house price data to remove invalid or repeated characters and obtain the third house price data, the method comprises: converting the house price units in the third house price data so as to form a unified house price unit.
In one embodiment, the step of performing deduplication on the third house price data to remove redundant records and obtain the standard house price data comprises: grouping the records in the third house price data according to preset core fields; removing the non-core fields of each record in a group to obtain multiple records with identical core fields, the non-core fields being the fields other than the preset core fields; and deleting the surplus records among the multiple records with identical core fields, so as to obtain standard house price data in which only one record with those core fields remains.
In one embodiment, the step of obtaining the initial house price data comprises: crawling page data from a real estate website by means of a crawler tool; and searching the page data for the initial house price data to obtain the initial house price data.
In one embodiment, after the step of performing deduplication on the third house price data to remove redundant records and obtain the standard house price data, the method comprises: obtaining house prices from the standard house price data; and calculating an average house price from the house prices according to a preset rule.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article or method that includes the element.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A method for cleaning house price data, characterized by comprising:
obtaining initial house price data;
when the initial house price data contains misordered fields, adjusting the order of the fields in the initial house price data to obtain first house price data with the correct field order;
when the first house price data lacks a field, performing gap filling on the first house price data to obtain second house price data with complete fields;
when the second house price data contains invalid or repeated characters, performing denoising on the second house price data to remove the invalid or repeated characters and obtain third house price data;
when the third house price data contains redundant records, performing deduplication on the third house price data to remove the redundant records and obtain standard house price data.
2. The method for cleaning house price data according to claim 1, characterized in that the step of adjusting the order of the fields in the initial house price data to obtain the first house price data comprises:
reading the fields of each record in the initial house price data;
determining whether the format of the fields matches a preset field format;
if the format of the fields does not match the preset field format, determining whether the fields include the key fields of the preset field format;
if so, reordering the fields according to the order of the key fields in the preset field format, so as to obtain the first house price data.
3. The method for cleaning house price data according to claim 1, characterized in that the step of performing gap filling on the first house price data to obtain second house price data with complete fields comprises:
reading the fields of each record in the first house price data;
determining whether a field is missing, according to the result of matching the preset field format against the fields of each record, or according to the association between the fields in the record;
if so, looking up the missing field of the first house price data in a preset table;
filling the missing field into the position of the missing field, so as to obtain the second house price data.
4. The method for cleaning house price data according to claim 1, characterized in that, after the step of performing denoising on the second house price data to remove invalid or repeated characters and obtain the third house price data, the method comprises:
converting the house price units in the third house price data so as to form a unified house price unit.
5. The method for cleaning house price data according to claim 1, characterized in that the step of performing deduplication on the third house price data to remove redundant records and obtain the standard house price data comprises:
grouping the records in the third house price data according to preset core fields;
removing the non-core fields of each record in a group to obtain multiple records with identical core fields, the non-core fields being the fields other than the preset core fields;
deleting the surplus records among the multiple records with identical core fields, so as to obtain standard house price data in which only one record with those core fields remains.
6. The method for cleaning house price data according to claim 1, characterized in that the step of obtaining the initial house price data comprises:
crawling page data from a real estate website by means of a crawler tool;
searching the page data for the initial house price data, and obtaining the initial house price data.
7. The method for cleaning house price data according to claim 1, characterized in that, after the step of performing deduplication on the third house price data to remove redundant records and obtain the standard house price data, the method comprises:
obtaining house prices from the standard house price data;
calculating an average house price from the house prices according to a preset rule.
8. A device for cleaning house price data, characterized by comprising:
a first obtaining module, configured to obtain initial house price data;
a first processing module, configured to adjust, when the initial house price data contains misordered fields, the order of the fields in the initial house price data to obtain first house price data with the correct field order;
a second processing module, configured to perform, when the first house price data lacks a field, gap filling on the first house price data to obtain second house price data with complete fields;
a third processing module, configured to perform, when the second house price data contains invalid or repeated characters, denoising on the second house price data to remove the invalid or repeated characters and obtain third house price data;
a fourth processing module, configured to perform, when the third house price data contains redundant records, deduplication on the third house price data to remove the redundant records and obtain standard house price data.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201810955918.XA 2018-08-21 2018-08-21 Method, device, computer equipment and storage medium for cleaning house price data Active CN109299081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810955918.XA CN109299081B (en) 2018-08-21 2018-08-21 Method, device, computer equipment and storage medium for cleaning house price data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810955918.XA CN109299081B (en) 2018-08-21 2018-08-21 Method, device, computer equipment and storage medium for cleaning house price data

Publications (2)

Publication Number Publication Date
CN109299081A true CN109299081A (en) 2019-02-01
CN109299081B CN109299081B (en) 2024-04-05

Family

ID=65165314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810955918.XA Active CN109299081B (en) 2018-08-21 2018-08-21 Method, device, computer equipment and storage medium for cleaning house price data

Country Status (1)

Country Link
CN (1) CN109299081B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069687A (en) * 2019-03-15 2019-07-30 平安城市建设科技(深圳)有限公司 Target cell room rate tendency drawing generating method, device, equipment and storage medium
CN115757423A (en) * 2022-11-29 2023-03-07 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium
CN117436496A (en) * 2023-11-22 2024-01-23 深圳市网安信科技有限公司 Training method and detection method of anomaly detection model based on big data log

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239581A (en) * 2017-07-07 2017-10-10 小草数语(北京)科技有限公司 Data cleaning method and device
CN107741990A (en) * 2017-11-01 2018-02-27 深圳汇生通科技股份有限公司 Data cleansing integration method and system
CN108153789A (en) * 2016-12-02 2018-06-12 航天星图科技(北京)有限公司 A kind of transaction platform data processing method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153789A (en) * 2016-12-02 2018-06-12 航天星图科技(北京)有限公司 A kind of transaction platform data processing method
CN107239581A (en) * 2017-07-07 2017-10-10 小草数语(北京)科技有限公司 Data cleaning method and device
CN107741990A (en) * 2017-11-01 2018-02-27 深圳汇生通科技股份有限公司 Data cleansing integration method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069687A (en) * 2019-03-15 2019-07-30 平安城市建设科技(深圳)有限公司 Target cell room rate tendency drawing generating method, device, equipment and storage medium
CN115757423A (en) * 2022-11-29 2023-03-07 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium
CN115757423B (en) * 2022-11-29 2024-01-30 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium
CN117436496A (en) * 2023-11-22 2024-01-23 深圳市网安信科技有限公司 Training method and detection method of anomaly detection model based on big data log

Also Published As

Publication number Publication date
CN109299081B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN110489633B (en) Intelligent brain service system based on library data
CN103714084B (en) The method and apparatus of recommendation information
CN109299081A (en) Method, device, computer equipment and storage medium for cleaning house price data
CN104504003B (en) The searching method and device of diagram data
CN108536708A (en) A kind of automatic question answering processing method and automatically request-answering system
CN107526807A (en) Information recommendation method and device
CN109508420B (en) Method and device for cleaning attributes of knowledge graph
CN108108426A (en) Understanding method, device and the electronic equipment that natural language is putd question to
CN106934023A (en) A kind of data managing method and device
JP2015505629A (en) Information search method and server
CN106933863B (en) Data clearing method and device
CN102760151A (en) Implementation method of open source software acquisition and searching system
CN104978356A (en) Synonym identification method and device
CN106503223A (en) A kind of binding site and the online source of houses searching method and device of key word information
CN109697256B (en) Method, device, storage medium and electronic equipment for determining related search terms
CN103970753A (en) Pushing method and pushing device for related knowledge
CN104199945A (en) Data storing method and device
CN106936778A (en) The abnormal detection method of website traffic and device
CN112434216A (en) Intelligent investment project recommendation method and device, storage medium and computer equipment
CN103309984A (en) Data processing method and device
CN106709805A (en) Method and system for acquiring user income data
CN104572932A (en) Method and device for determining interest label
CN107527289A (en) A kind of investment combination industry distribution method, apparatus, server and storage medium
CN106603742A (en) IP address and domain name corresponding relationship update method and device
CN106708880A (en) Topic associated word obtaining method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant