CN108804594A - Method and device for constructing a full-text search engine for news content - Google Patents

Method and device for constructing a full-text search engine for news content

Info

Publication number
CN108804594A
Authority
CN
China
Prior art keywords
news
data
website
time
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810523561.8A
Other languages
Chinese (zh)
Inventor
李雄
张传新
刘春阳
张旭
王萌
王慧
王利军
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianrun Foundation Technology Development Ltd By Share Ltd
National Computer Network and Information Security Management Center
Original Assignee
Beijing Tianrun Foundation Technology Development Ltd By Share Ltd
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tianrun Foundation Technology Development Ltd By Share Ltd, National Computer Network and Information Security Management Center filed Critical Beijing Tianrun Foundation Technology Development Ltd By Share Ltd
Priority to CN201810523561.8A priority Critical patent/CN108804594A/en
Publication of CN108804594A publication Critical patent/CN108804594A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method and device for constructing a full-text search engine for news content. The method comprises the following steps: obtaining real-time website logs carrying real-time access information; obtaining news website data carrying news popularity and comment information; classifying the real-time website logs and the news website data; processing, indexing, and storing the classified news website data; obtaining and storing the news meta-information in the news website data; and obtaining and storing the popularity information in the news website data and computing statistics over that popularity information. The invention strikes a reasonable balance among query performance, index space, and index construction performance; it takes into account that statistical data change over time and updates the index results dynamically; it improves the robustness of the system; and it improves the performance of compound queries over statistical data and text data.

Description

Method and device for constructing a full-text search engine for news content
Technical Field
The present invention relates to a method and device for constructing a full-text search engine for news content, and in particular to a method for constructing a news content full-text search engine based on incremental updating of a statistical index. It belongs to the technical field of data processing.
Background
Traditional full-text search engines for news content support indexing of free text only, while statistical analysis of data is usually served by a database. When news content retrieval results and news-based statistical data are needed at the same time, the two kinds of retrieval have to be performed separately. On the one hand, the locality of the index data and the statistical data cannot be guaranteed, so the performance of retrieval and statistics cannot be guaranteed and the process is time-consuming and laborious; on the other hand, relying on an external database increases the coupling of the system.
Summary of the Invention
The purpose of the present invention is to provide a method and device for constructing a full-text search engine for news content, suitable for use in data management engines that mix data statistics with content analysis, so as to improve the performance of compound queries over statistical data and text data.
A method for constructing a full-text search engine for news content, including:
S1, obtaining real-time website logs carrying real-time access information;
S2, obtaining news website data carrying news popularity and comment information;
S3, classifying the real-time website logs and the news website data;
S4, processing, indexing, and storing the classified news website data;
S5, obtaining and storing the news meta-information in the news website data;
S6, obtaining and storing the popularity information in the news website data, and computing statistics over the popularity information in the news website data.
Wherein, indexing the classified news website data includes:
S41, selecting a word segmenter according to the language of the news content;
S42, segmenting the target news content with the selected segmenter to obtain a token list;
S43, deleting the stop words in the token list using a stop-word dictionary to obtain a filtered token list;
S44, generating an inverted index for the filtered token list and storing it in a distributed index database.
Wherein, the inverted index is generated with partitions divided by time, and the inverted index of the news website data is then stored in the corresponding logical partition.
Wherein, the time-based partitions are divided by month.
Wherein, the popularity information in the news website data is collected once per day.
Wherein, the popularity information includes at least one of: article ID, access time, page views, and unique visitors.
Wherein, the meta-information includes at least one of: the publication date of the news article, the publishing author, the sentiment information of the news, relevance information, and classification information.
Wherein, computing statistics over the popularity information in the news website data includes:
S61, retrieving the relevant articles by keyword;
S62, determining the query time interval;
S63, aggregating the access volume information within the time interval.
Further, the present invention provides a news content full-text retrieval method, including:
D1, obtaining the search conditions and specifying the query time interval;
D2, determining the index partitions to be retrieved;
D3, parsing the input keyword expression, generating the query required by the search engine, and then retrieving in the corresponding partitions;
D4, obtaining the retrieval hits and looking up the meta-information of the news articles;
D5, finding the popularity information of the news articles within the specified time interval, aggregating it, and outputting the result.
Further, the present invention provides a news content full-text retrieval device, including:
a log acquisition unit, for obtaining real-time website logs carrying real-time access information;
a data acquisition unit, for obtaining news website data carrying news popularity and comment information;
a classification unit, for classifying the real-time website logs and the news website data;
an indexing unit, for processing, indexing, and storing the classified news website data;
a news meta-information acquisition unit, for obtaining and storing the news meta-information in the news website data;
a news popularity information processing unit, for obtaining and storing the popularity information in the news website data, and for computing statistics over the popularity information in the news website data.
The advantages and effects of the construction method, retrieval method, and device for a news content full-text search engine according to the present invention are:
1, The invention strikes a reasonable balance among query performance, index space, and index construction performance; compared with a conventional full-text search engine, its retrieval performance is comparable and the additional storage it occupies is very small.
2, A new index is built by combining statistical data with full-text retrieval data; in addition, because statistical data change over time, the index results are updated dynamically, so that a searcher can obtain content results and statistical results in a single pass.
3, Taking into account the temporal nature of news article content, the index is stored in logical partitions according to the publication time, a metadata attribute of the news article. On the one hand this makes the index data better suited to distributed storage; on the other hand it makes the index data easier to manage. For example, if the index data of one time interval needs to be re-indexed, only that part of the data has to be re-fetched from the external source and re-indexed, without affecting retrieval and statistics on the other index partitions, which improves the robustness of the system.
4, The invention is suitable for use in data management engines that mix data statistics with content analysis, and improves the performance of compound queries over statistical data and text data.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the method for constructing a news content full-text search engine in an embodiment of the present invention.
Fig. 2 is a flow diagram of the method for indexing the classified news website data in an embodiment of the present invention.
Fig. 3 is a flow diagram of computing statistics over the popularity information in the news website data in an embodiment of the present invention.
Fig. 4 is a flow diagram of a news content full-text retrieval method in an embodiment of the present invention.
Fig. 5 is a structural diagram of a news content full-text retrieval device in an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
An embodiment of the present invention provides a method for constructing a full-text search engine for news content. The method includes: S1, obtaining real-time website logs carrying real-time access information; S2, obtaining news website data carrying news popularity and comment information; S3, classifying the real-time website logs and the news website data; S4, processing, indexing, and storing the classified news website data; S5, obtaining and storing the news meta-information in the news website data; S6, obtaining and storing the popularity information in the news website data, and computing statistics over the popularity information in the news website data. The construction method is described in detail below.
Specifically, the web logs carrying real-time access information are obtained by a web crawler. The crawler collects two main data sources: one is the real-time website logs carrying real-time access information, and the other is the news website data carrying news popularity and comment information. Next, the data are simply classified according to the crawler data source, and the raw crawler data of each class are processed and indexed separately. Then, for the filtered token list, an inverted index is generated and stored in the distributed index database. The invention targets the latest news hot spots. Because of the temporal nature of news, the access popularity of a news item typically peaks on the day of publication and then gradually decays; to improve retrieval performance, the index is therefore partitioned by time.
As shown in Fig. 2, indexing the classified news website data includes: S41, selecting a word segmenter according to the language of the news content; S42, segmenting the target news content with the selected segmenter to obtain a token list; S43, deleting the stop words in the token list using a stop-word dictionary to obtain a filtered token list; S44, generating an inverted index for the filtered token list and storing it in a distributed index database.
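Steps S41 to S44 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the two tokenizers, the stop-word lists, and the in-memory dictionary standing in for the distributed index database are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stop-word lists; a real system would load full dictionaries.
STOP_WORDS = {"en": {"the", "a", "of", "in"}, "zh": {"的", "了"}}


def tokenize_en(text):
    """Whitespace tokenizer for space-delimited languages."""
    return text.lower().split()


def tokenize_zh(text):
    """Stand-in for a real Chinese word segmenter: one token per character."""
    return [ch for ch in text if not ch.isspace()]


# S41: the segmenter is chosen by the language of the news content.
TOKENIZERS = {"en": tokenize_en, "zh": tokenize_zh}


def index_article(index, article_id, text, lang):
    """Add one article to an in-memory inverted index (assumed structure)."""
    tokens = TOKENIZERS[lang](text)                            # S42: segment
    kept = [t for t in tokens if t not in STOP_WORDS[lang]]    # S43: filter
    for position, token in enumerate(kept):                    # S44: postings
        index[token].append((article_id, position))
    return kept


index = defaultdict(list)
index_article(index, 1, "The price of housing in Nanjing", "en")
index_article(index, 2, "Housing policy", "en")
```

Each posting here records the article ID and the token position after stop-word removal; a production index would typically also store term frequencies and write the postings to the partitioned store rather than a dictionary.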
The inverted index is generated with partitions divided by time, and the inverted index of the news website data is then stored in the corresponding logical partition. Specifically, every news article to be indexed should have an explicit publication time; to limit the number of index partitions, partitioning by month may be used. For example, if the publication time of an article is 2018-01-31 13:33:21, partition 201801 is used for the index of this article, and the inverted index of the article is stored in logical partition 201801. The index information is physically stored in a distributed key-value database. In this implementation the Raft protocol is used with a default replica count of 3; that is, the index information is actually stored on three different index nodes. For example, all index information of logical partition 201801 is distributed across three different physical nodes.
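The monthly partitioning rule can be illustrated as follows. This is a sketch only: the function names are hypothetical, and replication through Raft is outside its scope.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(publish_time: datetime) -> str:
    """Map a publication time to its monthly logical partition (YYYYMM)."""
    return publish_time.strftime("%Y%m")


def assign_partitions(articles):
    """Group (article_id, publish_time) pairs by monthly partition."""
    partitions = defaultdict(list)
    for article_id, publish_time in articles:
        partitions[partition_key(publish_time)].append(article_id)
    return dict(partitions)


articles = [
    (1, datetime(2018, 1, 31, 13, 33, 21)),  # the example from the text
    (2, datetime(2018, 1, 2, 8, 0, 0)),
    (3, datetime(2018, 2, 1, 0, 0, 5)),
]
partitions = assign_partitions(articles)
```

Because the key depends only on the publication time, re-indexing one month touches exactly one logical partition and leaves the others untouched.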
The popularity information in the news website data is collected once per day. The popularity information includes at least one of: article ID, access time, page views, and unique visitors. Specifically, the invention collects article popularity once per day; for any article, the daily popularity is a four-tuple (cid, day, pv, uv) consisting of the article ID (cid), the access date (day), the page views (pv), and the unique visitors (uv). Likewise, the popularity four-tuples of the articles are stored in the distributed key-value database, again using the Raft protocol.
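A possible shape for the daily four-tuple and its key in a key-value store is sketched below. The key layout `heat/<cid>/<day>`, the function names, and the plain dictionary standing in for the distributed database are assumptions made for illustration; the text does not specify them.

```python
from typing import Dict, NamedTuple


class HeatRecord(NamedTuple):
    """One day's popularity for one article: (cid, day, pv, uv)."""
    cid: int   # article ID
    day: str   # access date, e.g. "2018-01-31"
    pv: int    # page views
    uv: int    # unique visitors


def heat_key(cid: int, day: str) -> str:
    """Hypothetical key layout: one entry per article per day."""
    return f"heat/{cid}/{day}"


def put_daily_heat(store: Dict[str, HeatRecord], record: HeatRecord) -> None:
    """Write a record; a later write for the same day replaces the earlier one."""
    store[heat_key(record.cid, record.day)] = record


store: Dict[str, HeatRecord] = {}
put_daily_heat(store, HeatRecord(cid=42, day="2018-01-31", pv=1200, uv=800))
put_daily_heat(store, HeatRecord(cid=42, day="2018-01-31", pv=1500, uv=950))
put_daily_heat(store, HeatRecord(cid=42, day="2018-02-01", pv=600, uv=410))
```

Keying by article and day means a recount for a given day is an overwrite rather than an append, which is one simple way to keep the stored statistics current.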
The news meta-information includes at least one of: the publication date of the news article, the publishing author, the sentiment information of the news, relevance information, and classification information. This meta-information does not change over time, and even if it does change, only one value needs to be kept. Like the inverted index, the meta-information of a news article is logically partitioned by the article's publication time and is then stored in the distributed key-value database of the corresponding logical partition. In addition to keyword retrieval, the article meta-information can be used for filtering; for example, retrieval can be restricted to articles whose sentiment value falls within a given interval. The meta-information search conditions can be combined with the basic logical operators AND, OR, and NOT.
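Filtering on meta-information with AND, OR, and NOT can be sketched with small predicate combinators. The field names (`sentiment`, `category`, `author`) and the sample records are hypothetical and serve only to show how conditions compose.

```python
def between(field, low, high):
    """Condition: the field value lies in the closed interval [low, high]."""
    return lambda meta: low <= meta[field] <= high


def equals(field, value):
    """Condition: the field value equals the given value."""
    return lambda meta: meta[field] == value


def all_of(*conditions):
    """Logical AND of conditions."""
    return lambda meta: all(c(meta) for c in conditions)


def any_of(*conditions):
    """Logical OR of conditions."""
    return lambda meta: any(c(meta) for c in conditions)


def negate(condition):
    """Logical NOT of a condition."""
    return lambda meta: not condition(meta)


articles = [
    {"id": 1, "author": "Li", "sentiment": 0.8, "category": "finance"},
    {"id": 2, "author": "Wang", "sentiment": -0.4, "category": "finance"},
    {"id": 3, "author": "Li", "sentiment": 0.6, "category": "sports"},
]

# Sentiment in [0.5, 1.0] AND NOT category == "sports".
condition = all_of(between("sentiment", 0.5, 1.0),
                   negate(equals("category", "sports")))
matching_ids = [a["id"] for a in articles if condition(a)]
```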
To improve the performance of certain retrievals, the invention treats the popularity information of the publication day as meta-information of the news article and stores it together with the other meta-information. For example, when only the popularity data for the day of publication is needed, the statistics can be computed directly from the popularity information in the metadata, which greatly improves the performance of this type of retrieval.
In an embodiment of the present invention, as shown in Fig. 3, computing statistics over the popularity information in the news website data includes: S61, retrieving the relevant articles by keyword; S62, determining the query time interval; S63, aggregating the access volume information within the time interval. Specifically, the relevant articles are first retrieved by keyword, and then the access volume information within the determined query time interval is aggregated. For the aggregation of article popularity information, the embodiment adds a simple SQL query capability on top of the distributed key-value database: pv values can be summed, while uv values are aggregated using the HyperLogLog algorithm. In addition, the access volume information of an article is partitioned by the article's publication date, which greatly improves the performance of the combined retrieve-and-count process; usually a single partition lookup is enough to locate the article's own information in the distributed key-value database.
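The reason pv and uv are aggregated differently is that page views are additive across days, whereas the same visitor may appear on several days, so daily uv counts cannot simply be summed. The sketch below shows one way this could work, assuming (this is not stated in the text) that a small HyperLogLog sketch of visitor identifiers is kept per day and that sketches are merged at query time. It is a textbook-style HyperLogLog with linear counting for small cardinalities, not the patented code; the class and parameter names are hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # The union of two sketches is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting
        return raw


# Two days of traffic with 1000 visitors in common (5000 distinct overall).
day1, day2 = HyperLogLog(), HyperLogLog()
for i in range(3000):
    day1.add(f"visitor-{i}")
for i in range(2000, 5000):
    day2.add(f"visitor-{i}")

merged = HyperLogLog()
merged.merge(day1)
merged.merge(day2)

total_pv = sum([4200, 3900])   # pv is additive across days
total_uv = merged.count()      # uv comes from the merged sketch
```

Summing the two daily uv values would report about 6000 visitors; the merged sketch estimates the size of the union instead, at the cost of a small relative error.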
To further demonstrate the advantages of the construction method provided by the present invention, the invention provides a news content full-text retrieval method applied to the above search engine. As shown in Fig. 4, the method includes: D1, obtaining the search conditions and specifying the query time interval; D2, determining the index partitions to be retrieved; D3, parsing the input keyword expression, generating the query required by the search engine, and then retrieving in the corresponding partitions; D4, obtaining the retrieval hits and looking up the meta-information of the news articles; D5, finding the popularity information of the news articles within the specified time interval, aggregating it, and outputting the result. The retrieval method is described in detail below.
In the present invention, obtaining the user's search conditions may consist of obtaining the user's search keywords; the user can search the content of news articles by keyword. After the validity of the retrieval time range has been checked, the index partitions to be retrieved are determined, comprising one or more index partitions; the relevant articles are retrieved by keyword and then restricted to the determined query time interval. After the crawler collects news content data, it analyzes the content and extracts fields from it, and the publication time of a news article is extracted in this way as well. If the extracted time is earlier than a fixed cutoff time or later than the current time, it is considered invalid, and such articles are rejected when the indexing interface of the search engine is called. The invention focuses mainly on the analysis and retrieval of Internet hot news of recent years, so news content that is too old is not counted. Retrieval is concerned with the popularity statistics of particular news articles within a specific period; to improve performance and the maintainability of the system, the retrieval information is partitioned, with the partition determined by the month of the article's publication time. For example, if the publication time of an article is 2018-05-04 12:30:21, the index information of the article is assigned to logical partition 201805.
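The validity check on publication times and the mapping from a query time interval to monthly partitions can be sketched as follows. The cutoff date and the function names are hypothetical, and mapping the interval directly onto publication-month partitions is one plausible reading of step D2 rather than something the text spells out.

```python
from datetime import datetime

# Hypothetical fixed cutoff; articles published earlier are rejected.
EARLIEST_ACCEPTED = datetime(2010, 1, 1)


def is_valid_publish_time(publish_time: datetime, now: datetime) -> bool:
    """Reject times before the cutoff or later than the current time."""
    return EARLIEST_ACCEPTED <= publish_time <= now


def partitions_for_range(start: datetime, end: datetime):
    """List the monthly partitions (YYYYMM) covered by [start, end]."""
    if start > end:
        raise ValueError("start must not be later than end")
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


now = datetime(2018, 5, 28)
to_search = partitions_for_range(datetime(2017, 11, 20), datetime(2018, 2, 3))
```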
The input keyword expression is parsed, the query required by the search engine is generated, and retrieval is then performed in the corresponding partitions. Specifically, the core of the retrieval system supports the following combined queries, where a combined Query is a composition of a series of sub-query conditions (Child Query). Conjunction: a retrieval result must match all sub-queries. Disjunction: a retrieval result must match one or more sub-queries. Boolean: includes Must (must be satisfied) and Must Not (must not be satisfied). For example, the search term entered by the user, "(Nanjing & endowment insurance)!(house property | medical treatment)", contains logical AND, logical NOT, and logical OR. For this retrieval syntax, a series of BNF expressions is written and a custom Parser is generated with the BISON tool. The Parser scans the user's search term once, generates a combined Query built from the three basic queries above, and submits it to the index for retrieval.
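The three combined query types can be modeled as a small tree that is evaluated against posting sets. This sketch builds the tree by hand instead of parsing it (the BISON-generated parser is not reproduced here), and the class layout and sample postings are assumptions. The example query above is read as: match both "Nanjing" and "endowment insurance", and exclude anything matching "house property" or "medical treatment".

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Postings = Dict[str, Set[int]]  # term -> IDs of articles containing it


@dataclass(frozen=True)
class Term:
    word: str

    def evaluate(self, postings: Postings) -> Set[int]:
        return set(postings.get(self.word, set()))


@dataclass(frozen=True)
class Conjunction:
    children: Tuple  # every child must match

    def evaluate(self, postings: Postings) -> Set[int]:
        results = [c.evaluate(postings) for c in self.children]
        return set.intersection(*results) if results else set()


@dataclass(frozen=True)
class Disjunction:
    children: Tuple  # at least one child must match

    def evaluate(self, postings: Postings) -> Set[int]:
        return set().union(*(c.evaluate(postings) for c in self.children))


@dataclass(frozen=True)
class Boolean:
    must: object      # sub-query that must be satisfied
    must_not: object  # sub-query that must not be satisfied

    def evaluate(self, postings: Postings) -> Set[int]:
        return self.must.evaluate(postings) - self.must_not.evaluate(postings)


postings = {
    "Nanjing": {1, 2, 3},
    "endowment insurance": {1, 2, 4},
    "house property": {2},
    "medical treatment": {5},
}

query = Boolean(
    must=Conjunction((Term("Nanjing"), Term("endowment insurance"))),
    must_not=Disjunction((Term("house property"), Term("medical treatment"))),
)
hits = query.evaluate(postings)
```

A parser for the retrieval syntax would produce exactly this kind of tree, so the evaluation logic stays independent of how the query text is written.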
The retrieval hits are obtained, and the meta-information of the news articles is looked up. Specifically, the retrieval hits are the positions of a series of news articles in the index, each position being represented by a 64-bit ID. Once an ID has been obtained, it is combined with the partition information, the news item with the corresponding ID is found in the key-value database of that partition, and the meta-information of the news item is read out, including the article content, the popularity information, and so on. D5: the popularity information of the news articles within the specified time interval is found, aggregated, and output.
Together with the original content of the news articles, the aggregated popularity statistics of the articles for the specified time interval are provided. For the aggregation of article popularity information, the invention adds a simple SQL query capability on top of the distributed key-value database: pv values can be summed, while uv values are aggregated using the HyperLogLog algorithm. For example, given the keyword "house prices", a user may want the popularity statistics of the news articles containing this word over the most recent month. The information includes how many new articles appear each day, the pv and uv of each news item, and how the daily pv and uv are distributed across the provinces and regions of the country. The device provides detailed statistics; for the data within one month, results can be produced per day or accumulated per week, providing trend information on how popularity changes.
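Rolling daily page views up into weekly totals can be sketched as below. Grouping by ISO week is an assumption made for illustration, and the figures are invented; only pv is rolled up here, because weekly uv would require merging per-day visitor sketches rather than adding counts.

```python
from collections import defaultdict
from datetime import date


def weekly_pv(daily_pv):
    """Sum daily page views into (ISO year, ISO week) buckets."""
    totals = defaultdict(int)
    for day, pv in daily_pv.items():
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += pv
    return dict(totals)


daily = {
    date(2018, 5, 1): 10,  # Tuesday
    date(2018, 5, 6): 5,   # Sunday, same ISO week
    date(2018, 5, 7): 7,   # Monday, next ISO week
}
weekly = weekly_pv(daily)
```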
Traditional full-text search engines for news content support indexing of free text only. Additional, time-varying information attached to the text that needs statistical analysis is usually retrieved and aggregated with a database. When news content retrieval results and news-based statistical data are needed at the same time, the two kinds of retrieval can only be performed separately. On the one hand, the locality of the index data and the statistical data cannot be guaranteed, so the performance of retrieval and statistics cannot be guaranteed; on the other hand, relying on an external database increases the coupling of the system.
In another embodiment, the present invention provides a news content full-text retrieval device. As shown in Fig. 5, the device includes: a log acquisition unit, for obtaining real-time website logs carrying real-time access information; a data acquisition unit, for obtaining news website data carrying news popularity and comment information; a classification unit, for classifying the real-time website logs and the news website data; an indexing unit, for processing, indexing, and storing the classified news website data; a news meta-information acquisition unit, for obtaining and storing the news meta-information in the news website data; and a news popularity information processing unit, for obtaining and storing the popularity information in the news website data and computing statistics over the popularity information in the news website data. The working process and principle of the news content full-text retrieval device are similar to those of the news content full-text retrieval method described above and can be carried out with reference to that method; they are not repeated here.
The advantages and effects of the construction method, retrieval method, and device for a news content full-text search engine provided by the embodiments of the present invention are: 1, The invention strikes a reasonable balance among query performance, index space, and index construction performance; compared with a conventional full-text search engine, its retrieval performance is comparable and the additional storage it occupies is very small. 2, A new index is built by combining statistical data with full-text retrieval data; in addition, because statistical data change over time, the index results are updated dynamically, so that a searcher can obtain content results and statistical results in a single pass. 3, Taking into account the temporal nature of news article content, the index is stored in logical partitions according to the publication time, a metadata attribute of the news article. On the one hand this makes the index data better suited to distributed storage; on the other hand it makes the index data easier to manage. For example, if the index data of one time interval needs to be re-indexed, only that part of the data has to be re-fetched from the external source and re-indexed, without affecting retrieval and statistics on the other index partitions, which improves the robustness of the system. 4, The invention is suitable for use in data management engines that mix data statistics with content analysis, and improves the performance of compound queries over statistical data and text data.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element. Orientation or position terms such as "upper" and "lower" are based on the orientations or positions shown in the drawings; they are used only to facilitate and simplify the description of the present invention and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the invention. Unless otherwise expressly specified and limited, the terms "installed", "connected", and "coupled" shall be understood broadly; for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediary, or internal between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the circumstances.
Numerous specific details are set forth in the specification of the present invention. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description. Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The invention is not limited to any single aspect or to any single embodiment, nor to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the invention may be used alone or in combination with one or more other aspects and/or embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, and some or all of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention, and they should all be covered by the scope of the claims and the specification of the invention.

Claims (10)

1. A method for constructing a full-text search engine for news content, characterized in that the method comprises the following steps:
S1, obtaining real-time website logs containing real-time access information;
S2, obtaining data of news websites containing news popularity (heat) and comment information;
S3, classifying the real-time website logs and the news website data;
S4, processing, indexing, and storing the classified news website data;
S5, obtaining and storing the news meta-information in the news website data;
S6, obtaining and storing the popularity information in the news website data, and compiling statistics on that popularity information.
2. The method according to claim 1, characterized in that indexing the classified news website data comprises:
S41, selecting a word segmenter according to the language of the news content;
S42, segmenting the target news content with the selected segmenter to obtain a token list;
S43, deleting the stop words from the token list using a stop-word dictionary to obtain a filtered token list;
S44, generating an inverted index from the filtered token list and storing it in a distributed index database.
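As a non-limiting illustration, the indexing steps S41 to S44 may be sketched in Python as follows. The two tokenizers, the short stop-word lists, and the in-memory dictionary standing in for the distributed index database are assumptions made for this sketch; the claim does not name a particular segmenter or database.

```python
import re
from collections import defaultdict

# Hypothetical stop-word lists; a real system would load full dictionaries.
STOP_WORDS = {
    "en": {"the", "a", "of", "in", "and"},
    "zh": {"的", "了", "在"},
}

def tokenize_en(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tokenize_zh(text):
    """Naive per-character split of CJK text (stand-in for a real segmenter)."""
    return re.findall(r"[\u4e00-\u9fff]", text)

# S41: the segmenter is chosen by the language of the news content.
TOKENIZERS = {"en": tokenize_en, "zh": tokenize_zh}

def build_inverted_index(docs):
    """docs: iterable of (doc_id, language, text).

    Returns a dict mapping each term to a sorted list of document ids.
    """
    index = defaultdict(set)
    for doc_id, lang, text in docs:
        tokens = TOKENIZERS[lang](text)                            # S42
        tokens = [t for t in tokens if t not in STOP_WORDS[lang]]  # S43
        for token in tokens:                                       # S44
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = [
    (1, "en", "The engine indexes news in real time"),
    (2, "en", "News search and the index"),
    (3, "zh", "新闻的搜索"),
]
index = build_inverted_index(docs)
```

Looking up `index["news"]` returns the ids of both English documents, while stop words such as "the" never become index keys.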
3. The method according to claim 2, characterized in that the inverted index is generated in partitions divided by time period, so that the inverted index of the news website data is stored in logical partitions.
4. The method according to claim 3, characterized in that the time-period partitions are divided by month.
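The time-partitioned index of claims 3 and 4 may be illustrated, under stated assumptions, by the sketch below. The `YYYY-MM` partition key, the `PartitionedIndex` class, and the in-memory storage are hypothetical choices for this example only.

```python
from collections import defaultdict
from datetime import date

def partition_key(published):
    """Map a publication date to its monthly partition, e.g. '2018-05'."""
    return f"{published.year:04d}-{published.month:02d}"

def months_between(start, end):
    """List the monthly partition keys that overlap [start, end]."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys

class PartitionedIndex:
    """Inverted index stored as one logical partition per month."""

    def __init__(self):
        # partition key -> term -> set of article ids
        self.partitions = defaultdict(lambda: defaultdict(set))

    def add(self, article_id, published, terms):
        part = self.partitions[partition_key(published)]
        for term in terms:
            part[term].add(article_id)

    def search(self, term, start, end):
        """Search only the partitions that overlap the query interval."""
        hits = set()
        for key in months_between(start, end):
            if key in self.partitions:
                hits |= self.partitions[key].get(term, set())
        return sorted(hits)

idx = PartitionedIndex()
idx.add(1, date(2018, 5, 28), ["news", "index"])
idx.add(2, date(2018, 3, 1), ["news"])
may_hits = idx.search("news", date(2018, 5, 1), date(2018, 5, 31))
spring_hits = idx.search("news", date(2018, 3, 1), date(2018, 5, 31))
```

Because a query touches only the months that overlap its time interval, older partitions need not be scanned. Note that this sketch returns every match in an overlapping month; filtering to the exact day range would be a further step.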
5. The method according to claim 1, characterized in that statistics on the popularity (heat) information obtained from the news website data are compiled once per day.
6. The method according to claim 1, characterized in that the popularity (heat) information includes at least one of: article ID, access time, page views, and number of unique visitors.
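As an illustrative sketch of the daily statistics of claim 5 over the fields named in claim 6, the function below rolls access-log entries up into per-article, per-day page views and unique visitors. The log tuple layout `(article_id, access_time, visitor_id)` and the function name `daily_heat` are assumptions; the claims do not specify a log format.

```python
from collections import defaultdict
from datetime import date, datetime

def daily_heat(log_entries):
    """Roll access-log entries up into daily page views and unique visitors.

    log_entries: iterable of (article_id, access_time, visitor_id).
    Returns {(article_id, day): {"pv": int, "uv": int}}.
    """
    views = defaultdict(int)
    visitors = defaultdict(set)
    for article_id, access_time, visitor_id in log_entries:
        key = (article_id, access_time.date())
        views[key] += 1                  # every access counts as a page view
        visitors[key].add(visitor_id)    # a set keeps each visitor once
    return {key: {"pv": views[key], "uv": len(visitors[key])} for key in views}

log = [
    ("a1", datetime(2018, 5, 28, 9, 0), "u1"),
    ("a1", datetime(2018, 5, 28, 10, 0), "u1"),
    ("a1", datetime(2018, 5, 28, 11, 0), "u2"),
    ("a1", datetime(2018, 5, 29, 8, 0), "u1"),
    ("a2", datetime(2018, 5, 28, 9, 30), "u3"),
]
stats = daily_heat(log)
```

Running such a roll-up once per day keeps the stored popularity records small compared with the raw logs.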
7. The method according to claim 1, characterized in that the news meta-information includes at least one of: the publication date of the article, the author of the article, sentiment information of the news, relevance information, and classification information.
8. The method according to claim 1, characterized in that compiling statistics on the popularity (heat) information in the news website data comprises:
S61, retrieving the relevant articles by keyword;
S62, determining the query time interval;
S63, aggregating the visit-count information within the time interval.
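Steps S61 to S63 may be sketched as follows. The dictionary `index` (keyword to article ids) and the per-article list of daily rows in `daily_stats` are hypothetical data layouts chosen for this example, and summing page views is only one possible form of aggregation.

```python
from datetime import date

def aggregate_heat(index, daily_stats, keyword, start, end):
    """Sum page views per article for a keyword within [start, end]."""
    article_ids = index.get(keyword, [])              # S61: keyword retrieval
    totals = {}
    for article_id in article_ids:
        totals[article_id] = sum(                     # S63: aggregation
            row["pv"]
            for row in daily_stats.get(article_id, [])
            if start <= row["day"] <= end             # S62: query time interval
        )
    return totals

index = {"budget": ["a1", "a2"]}
daily_stats = {
    "a1": [
        {"day": date(2018, 5, 1), "pv": 10},
        {"day": date(2018, 5, 2), "pv": 5},
        {"day": date(2018, 6, 1), "pv": 7},
    ],
    "a2": [{"day": date(2018, 5, 15), "pv": 3}],
}
totals = aggregate_heat(index, daily_stats, "budget",
                        date(2018, 5, 1), date(2018, 5, 31))
```

The row dated outside the interval is excluded from the total for article `a1`.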
9. A full-text search method for news content, characterized in that the method comprises:
D1, obtaining the search conditions and the specified query time interval;
D2, determining the index partitions to be searched;
D3, parsing the input keyword expression, generating the query required by the search engine, and then searching the corresponding partitions;
D4, obtaining the search hits and looking up the meta-information of the articles;
D5, looking up the popularity (heat) information of the articles within the specified time interval, aggregating it, and outputting the result.
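The retrieval flow D1 to D5 may be illustrated end to end by the sketch below. The in-memory stores `INDEX`, `META`, and `HEAT`, the monthly partition keys, and the treatment of the keyword expression as terms joined by AND are all assumptions made for this example; the claim does not fix a query syntax or storage layout.

```python
from datetime import date

# Hypothetical in-memory stores standing in for the index, metadata,
# and popularity databases.
INDEX = {  # monthly partition -> term -> article ids
    "2018-04": {"election": {"a1"}, "budget": {"a1", "a2"}},
    "2018-05": {"election": {"a3"}, "budget": {"a3"}},
}
META = {"a1": {"author": "Li"}, "a2": {"author": "Wang"}, "a3": {"author": "Zhang"}}
HEAT = {  # article id -> list of (day, page views)
    "a1": [(date(2018, 4, 3), 120), (date(2018, 5, 2), 30)],
    "a2": [(date(2018, 4, 21), 45)],
    "a3": [(date(2018, 5, 9), 200), (date(2018, 5, 10), 80)],
}

def partitions_for(start, end):
    """D2: list the monthly partitions that overlap [start, end]."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys

def parse_query(expression):
    """D3: read the expression as terms joined by AND (a simplification)."""
    return [t.lower() for t in expression.split() if t.upper() != "AND"]

def search(expression, start, end):
    """D1 supplies the arguments; D2 to D5 are carried out below."""
    terms = parse_query(expression)
    hits = set()
    for key in partitions_for(start, end):
        part = INDEX.get(key, {})
        matched = [part.get(term, set()) for term in terms]
        if matched:
            hits |= set.intersection(*matched)  # articles containing all terms
    results = []
    for article_id in sorted(hits):
        page_views = sum(pv for day, pv in HEAT.get(article_id, [])
                         if start <= day <= end)                    # D5
        results.append({"id": article_id,
                        "author": META[article_id]["author"],       # D4
                        "page_views": page_views})
    return results

results = search("budget AND election", date(2018, 4, 1), date(2018, 5, 31))
```

Restricting the interval to April 2018 would search only the `2018-04` partition and count only the page views recorded in that month.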
10. A full-text search device for news content, characterized in that it comprises:
a log acquisition unit, configured to obtain real-time website logs containing real-time access information;
a data acquisition unit, configured to obtain data of news websites containing news popularity (heat) and comment information;
a classification unit, configured to classify the real-time website logs and the news website data;
an indexing unit, configured to process, index, and store the classified news website data;
a news meta-information acquisition unit, configured to obtain and store the news meta-information in the news website data;
a news popularity information processing unit, configured to obtain and store the popularity information in the news website data, and to compile statistics on that popularity information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810523561.8A CN108804594A (en) 2018-05-28 2018-05-28 A kind of construction method and device of news content full-text search engine

Publications (1)

Publication Number Publication Date
CN108804594A true CN108804594A (en) 2018-11-13

Family

ID=64090493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810523561.8A Pending CN108804594A (en) 2018-05-28 2018-05-28 A kind of construction method and device of news content full-text search engine

Country Status (1)

Country Link
CN (1) CN108804594A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070250501A1 (en) * 2005-09-27 2007-10-25 Grubb Michael L Search result delivery engine
CN103365902A (en) * 2012-03-31 2013-10-23 北大方正集团有限公司 Method and device for evaluating Internet News
CN103365924A (en) * 2012-04-09 2013-10-23 北京大学 Method, device and terminal for searching information
CN103577501A (en) * 2012-08-10 2014-02-12 深圳市世纪光速信息技术有限公司 Hot topic searching system and hot topic searching method
CN103605658A (en) * 2013-10-14 2014-02-26 北京航空航天大学 Search engine system based on text emotion analysis
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN104731864A (en) * 2015-02-26 2015-06-24 国家计算机网络与信息安全管理中心 Data storage method for mass unstructured data
CN107918644A (en) * 2017-10-31 2018-04-17 北京锐思爱特咨询股份有限公司 News subject under discussion analysis method and implementation system in reputation Governance framework

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110138583A (en) * 2019-03-03 2019-08-16 北京立思辰安科技术有限公司 A kind of methods of exhibiting of warning intelligent analysis
CN110138583B (en) * 2019-03-03 2022-04-12 杭州立思辰安科科技有限公司 Display method for intelligent alarm analysis
CN110516157A (en) * 2019-08-30 2019-11-29 盈盛智创科技(广州)有限公司 A kind of document retrieval method, equipment and storage medium
CN110516157B (en) * 2019-08-30 2022-04-01 盈盛智创科技(广州)有限公司 Document retrieval method, document retrieval equipment and storage medium
CN112115154A (en) * 2020-09-27 2020-12-22 北京有竹居网络技术有限公司 Data processing and data query method, device, equipment and computer readable medium
CN112817921A (en) * 2021-04-20 2021-05-18 泰德网聚(北京)科技股份有限公司 Cloud resource acquisition management system based on data center
CN112817921B (en) * 2021-04-20 2021-09-10 泰德网聚(北京)科技股份有限公司 Cloud resource acquisition management system based on data center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2018-11-13)