CN113032436B

CN113032436B - Searching method and device based on article content and title

Info

Publication number: CN113032436B
Application number: CN202110412837.7A
Authority: CN
Inventors: 姚鑫; 白杰; 白会杰; 宋东瑞
Original assignee: Suzhou Zhenxuan Data Information Technology Co ltd
Current assignee: Suzhou Zhenxuan Data Information Technology Co ltd
Priority date: 2021-04-16
Filing date: 2021-04-16
Publication date: 2022-05-31
Anticipated expiration: 2041-04-16
Also published as: CN113032436A

Abstract

The invention provides a search method and device based on article content and titles. The method comprises: storing article data in a distributed manner using a search system implemented with ElasticSearch; when an article search request is received from a user terminal, retrieving article contents and titles in real time in the search system; and aggregating the data matched during retrieval with the article as the object, and highlighting the matched articles on the user terminal in a paged display. This solves the prior-art problem that retrieved data cannot be aggregated.

Description

Searching method and device based on article content and title

Technical Field

The invention relates to the field of big data search, in particular to a search method and a search device based on article contents and titles.

Background

With the progress of computer technology, big data search has developed rapidly. Current big data search is implemented with technologies such as MySQL, Solr, ElasticSearch and Hermes. Solr and ElasticSearch focus on search and full-text retrieval, with data scales that can reach millions of records. Solr uses ZooKeeper for distributed management, supports data in various formats and is widely used in traditional search applications, but it is inefficient for real-time search applications. ElasticSearch supports data in JSON format, has built-in distributed coordination and management, and is more efficient than Solr for real-time search applications. Hermes is a real-time retrieval and analysis platform for massive data based on large-index technology; it emphasizes data analysis, with data scales ranging from hundreds of millions to trillions of records.

Among these schemes, MySQL full-text retrieval is inefficient, the relevance of matched results is low, word-segmented retrieval is not supported, and the product experience suffers; Solr is inefficient for real-time applications; Hermes focuses on data analysis and is relatively inefficient for search and full-text retrieval; and ElasticSearch search operates on individual records, so during aggregation the multiple records matched by a search cannot be merged, in an object-oriented way, into a single item.

Disclosure of Invention

The invention mainly aims to provide a searching method and a searching device based on article contents and titles, which are used for solving the problem that the retrieved data cannot be aggregated in the prior art.

To achieve the above object, according to one aspect of the present invention, a search method based on article contents and titles is provided, comprising: storing article data in a distributed manner using a search system, wherein the search system is implemented with ElasticSearch; when an article search request is received from a user terminal, retrieving article contents and titles in real time in the search system; and aggregating the data matched during retrieval with the article as the object, and highlighting the matched articles on the user terminal in a paged display.

Optionally, storing article data in a distributed manner using the search system comprises: splitting each whole article by title and paragraph and storing the parts in a distributed storage system in a standard data structure, wherein the standard data structure comprises the following fields: the identifier of the article to which the record belongs, the article type, the article source, the article name and content, the article URL, whether the record is the article title, the article release time, the data generation time and the storage time.

Optionally, retrieving article contents and titles in real time in the search system when an article search request is received comprises: retrieving article contents and titles in real time in the search system according to the keywords in the article search request to obtain a search result, wherein the search result comprises a single article or a plurality of articles. For a single article, the cases are: the article matches only the title, matches only a paragraph, or matches both the title and a paragraph. For a plurality of articles, the cases are: all matches are titles of the articles; some articles match both the title and paragraphs; some articles match only the title; and other articles (different from the former, and possibly some or all of the remaining articles) match only paragraphs.

Optionally, aggregating the data matched during retrieval with the article as the object comprises: for data that matches a title, aggregating the records that share the same article identifier; and for data that matches only a paragraph, aggregating the records by the identifier of the article to which they belong.

Optionally, highlighting the matched articles on the user terminal in a paged display comprises: calculating, with a relevance algorithm, the relevance between each matched article and the keywords in the article search request; and highlighting the matched articles on the user terminal in descending order of relevance in a paged display.

Optionally, highlighting the matched articles on the user terminal in descending order of relevance comprises: doing so according to a preset display configuration, wherein the preset display configuration is either displaying only the title or displaying the title together with its paragraphs.

Optionally, after the matched articles have been highlighted in a paged display, when a further search request is received from the user terminal and its keyword is the same as the keyword in the earlier article search request, the same search result as before is returned to the user terminal.

To achieve the above object, according to one aspect of the present invention, a search apparatus based on article contents and titles is also provided, comprising: an article data storage unit, configured to store article data in a distributed manner using a search system, wherein the search system is implemented with ElasticSearch; a search unit, configured to retrieve article contents and titles in real time in the search system when an article search request is received from the user terminal; and a display unit, configured to aggregate the data matched during retrieval with the article as the object and to highlight the matched articles on the user terminal in a paged display.

Optionally, the storage unit is further configured to: split each whole article by title and paragraph and store the parts in a distributed storage system in a standard data structure, wherein the standard data structure comprises the following fields: the identifier of the article to which the record belongs, the article type, the article source, the article name and content, the article URL, whether the record is the article title, the article release time, the data generation time and the storage time.

Optionally, the search unit is further configured to: retrieve article contents and titles in real time in the search system according to the keywords in an article search request to obtain a search result, wherein the search result comprises a single article or a plurality of articles. For a single article, the cases are: the article matches only the title, matches only a paragraph, or matches both the title and a paragraph. For a plurality of articles, the cases are: all matches are titles of the articles; some articles match both the title and paragraphs; some articles match only the title; and some articles match only paragraphs.

Optionally, the display unit is further configured to: for data that matches a title, aggregate the records that share the same article identifier; and for data that matches only a paragraph, aggregate the records by the identifier of the article to which they belong.

Optionally, the display unit is further configured to: calculate, with a relevance algorithm, the relevance between each matched article and the keywords in the article search request; and highlight the matched articles on the user terminal in descending order of relevance in a paged display.

Optionally, the display unit is further configured to: highlight the matched articles on the user terminal in descending order of relevance according to a preset display configuration, wherein the preset display configuration is either displaying only the title or displaying the title together with its paragraphs.

Optionally, the apparatus of the present application may further comprise a response unit, configured to: after the matched articles have been highlighted in a paged display, when a further search request is received from the user terminal and its keyword is the same as the keyword in the earlier article search request, return the same search result as before to the user terminal.

According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program which, when executed, performs the above-described method.

According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above method through the computer program.

By applying the technical scheme of the invention, ElasticSearch provides distributed storage at the scale of millions of records and real-time retrieval of article contents and titles; the matched data is aggregated with the article as the object, a set number of articles is displayed per page, and the matched text is highlighted. This solves the prior-art problem that retrieved data cannot be aggregated, and improves the user experience.

Besides the objects, features and advantages described above, the present invention has other objects, features and advantages, which are described in further detail below with reference to the drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention. In the drawings:

FIG. 1 illustrates a flow chart of an alternative article content and title based search method in accordance with the present invention;

FIG. 2 is a schematic diagram illustrating search results of an optional article according to the present invention;

FIG. 3 is a schematic diagram of an alternative correlation calculation scheme in accordance with the present invention;

FIG. 4 is a schematic diagram of an alternative correlation calculation scheme in accordance with the present invention;

FIG. 5 is a schematic diagram of an alternative correlation calculation scheme in accordance with the present invention;

FIG. 6 is a schematic diagram illustrating an alternative correlation calculation result according to the present invention;

FIG. 7 is a schematic diagram of an alternative data node in accordance with the present invention;

FIG. 8 is a schematic diagram illustrating an alternative data retrieval in accordance with the present invention; and

FIG. 9 shows a schematic diagram of an alternative article search scheme in accordance with the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances for describing embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

According to an aspect of embodiments of the present application, there is provided an embodiment of a search method based on article content and titles. As shown in fig. 1:

Step S101: store article data in a distributed manner using a search system, wherein the search system is implemented with ElasticSearch.

Step S102: when an article search request is received from the user terminal, retrieve the article contents and titles in real time in the search system.

Optionally, retrieving article contents and titles in real time in the search system when an article search request is received comprises: retrieving article contents and titles in real time in the search system according to the keywords in the article search request to obtain search results, wherein the search results comprise a single article or a plurality of articles. For a single article, the cases are: the article matches only the title, matches only a paragraph, or matches both the title and a paragraph. For a plurality of articles, the cases are: all matches are titles of the articles; some articles match both the title and paragraphs; some articles match only the title; and some articles match only paragraphs.

Step S103: aggregate the data matched during retrieval with the article as the object, and highlight the matched articles on the user terminal in a paged display.
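Step S103 combines real-time retrieval, paging and highlighting. A minimal sketch of an ElasticSearch search request body that does all three, assuming the searchable text is in the field sortText (the field named later in this description); the page arithmetic and the `<em>` highlight tags are illustrative choices, not taken from the original:

```python
from typing import Any, Dict


def build_search_body(keyword: str, page_no: int = 1, page_size: int = 10) -> Dict[str, Any]:
    """Build an ElasticSearch _search body with paging and highlighting (sketch).

    Assumption: sortText holds the analyzed title or paragraph text.
    """
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be positive")
    return {
        # Full-text match on the analyzed search field; hits are scored by relevance.
        "query": {"match": {"sortText": keyword}},
        # from/size paging: skip the earlier pages, return one page of hits.
        "from": (page_no - 1) * page_size,
        "size": page_size,
        # Wrap the matched terms in tags so the client can render them highlighted.
        "highlight": {
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
            "fields": {"sortText": {}},
        },
    }
```

Note that from/size pages over indexed records (titles and paragraphs), not over aggregated articles; if each page is meant to hold a fixed number of articles, one option is to aggregate first and page the aggregated list afterwards.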

Optionally, highlighting the matched articles on the user terminal in descending order of relevance comprises: doing so according to a preset display configuration on the user terminal, wherein the preset display configuration is either displaying only the title or displaying the title together with its paragraphs.

In related use cases, ElasticSearch has been used for PB-scale search; as the core of a search architecture providing users with timely and accurate music search; for text data analysis, collecting various metrics and user-defined data on servers and presenting multi-dimensional analyses to help locate instance or service-level anomalies; for analyzing and processing hundreds of millions of real-time log entries; and for building log collection and analysis systems. Given the product's need for real-time search and real-time data analysis, this scheme uses ElasticSearch to implement full-text retrieval over article contents and titles.

In this scheme, ElasticSearch provides distributed storage for millions of records and real-time retrieval of article contents and titles; the matched data is aggregated with the article as the object, a set number of articles is displayed per page, and the matched text is highlighted, improving the user experience. The technical solution of the present application is further detailed below with reference to specific embodiments:

Step 1: create the data structure by splitting each whole article by title and paragraph. The format of the data structure is shown in Table 1 below.

TABLE 1
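The contents of Table 1 are not reproduced here, but the field list given earlier (article identifier, type, source, name and content, URL, title flag, release time, generation time, storage time) is enough for a sketch of one record and of the title/paragraph split. Only paperId, isTitle and sortText are field names used in this description; all other names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class ArticleRecord:
    """One indexed record: either an article's title or one of its paragraphs."""
    paperId: str      # identifier of the article the record belongs to
    paperType: str    # hypothetical name: article type
    source: str       # hypothetical name: article source
    paperName: str    # hypothetical name: article name
    sortText: str     # searchable text: the title itself or one paragraph
    url: str          # hypothetical name: article URL
    isTitle: bool     # True if this record is the article title
    publishTime: str  # hypothetical name: article release time
    createTime: str   # hypothetical name: data generation time
    storeTime: str    # hypothetical name: storage time


def split_article(paper_id: str, paper_type: str, source: str, name: str,
                  url: str, publish_time: str, paragraphs: List[str]) -> List[ArticleRecord]:
    """Split a whole article into one title record plus one record per non-empty paragraph."""
    now = datetime.now(timezone.utc).isoformat()
    # Simplification: generation and storage time are both set to "now" here.
    texts = [(name, True)] + [(p.strip(), False) for p in paragraphs if p.strip()]
    return [
        ArticleRecord(paper_id, paper_type, source, name, text, url,
                      is_title, publish_time, now, now)
        for text, is_title in texts
    ]
```

Because every record repeats the article-level metadata, a paragraph that matches on its own still carries enough information to be displayed under its article.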

Step 2: analyze the user's search behavior.

In the search behavior analysis, a single article gives the following cases, as shown in FIG. 2: the search result matches only the title, matches only paragraphs, or matches both the title and paragraphs.

Several articles give the following cases: the matched data are the titles of different articles; some articles match both the title and paragraphs; some articles match only the title; or some articles match only paragraphs.

The search (filter) field is sortText, i.e., the field matched against the search keyword. Two display modes are supported: titles only, or titles together with their paragraphs.

Because titles and paragraphs are handled differently as match results, the isTitle flag of each hit must be checked; records of the same article are aggregated, and the article data structure is as follows:

The structure of the final aggregated search results is as follows:

Implementation of article aggregation:

The data returned by ES is obtained, and articles and paragraphs are aggregated by the paperId in the data structure (records with the same paperId belong to the same article).

A. Title matches first: take the data that matched a title and aggregate all records with the same paperId.

Traverse the titles, URLs, source names and release times of all articles, and bind the article paragraphs to the article titles (via the article id of each title) to obtain a set of article paragraphs. For each paragraph in the set, if the article id of a title is the same as the article id of the paragraph, the paragraph is judged to come from that article.

B. Paragraph matches: a search result may match only paragraphs, so the paragraph-matched data is then aggregated by paperId as well.
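Passes A and B can be sketched as two loops over the hits returned by ES, grouping by paperId. This is an illustration, not the original implementation; it assumes each hit's source carries paperId, isTitle and sortText (named in this description) plus the hypothetical metadata fields paperName, url, source and publishTime:

```python
from typing import Any, Dict, List


def _new_article(hit: Dict[str, Any]) -> Dict[str, Any]:
    # Article-level metadata is copied from whichever record first creates the entry.
    # Field names other than paperId/isTitle/sortText are hypothetical.
    return {
        "paperId": hit["paperId"],
        "title": hit["paperName"],
        "url": hit["url"],
        "source": hit["source"],
        "publishTime": hit["publishTime"],
        "titleMatched": False,
        "paragraphs": [],
    }


def aggregate_by_paper(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge ES hit sources into one entry per article, keyed by paperId."""
    articles: Dict[str, Dict[str, Any]] = {}  # dicts preserve insertion order
    # Pass A: records that matched a title, one entry per paperId.
    for hit in hits:
        if hit["isTitle"]:
            articles.setdefault(hit["paperId"], _new_article(hit))["titleMatched"] = True
    # Pass B: paragraph records are bound to the entry with the same paperId;
    # paragraph-only matches create their own entry.
    for hit in hits:
        if not hit["isTitle"]:
            entry = articles.setdefault(hit["paperId"], _new_article(hit))
            entry["paragraphs"].append(hit["sortText"])
    return list(articles.values())
```

Processing title hits first means articles whose title matched are created, and therefore listed, before paragraph-only articles; any relevance ordering (Step 4) would be applied to this aggregated list.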

Step 3: perform word-segmentation analysis on the search text. The segmentation mechanism used is shown in Table 2.

TABLE 2

Component	Function	Example
Character Filter	Processes the original text	Removing HTML tags, special characters, etc.
Tokenizer	Segments the original text into words	"medical information" -> "medical", "information"
Token Filters	Processes the tokens after segmentation	Converting to lower case, removing modal particles, adding synonyms, etc.
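The three stages in Table 2 run in a fixed order: character filters, then the tokenizer, then token filters. The following is a toy Python imitation of that pipeline for illustration only. It is not ElasticSearch's analyzer: it splits on whitespace, whereas segmenting Chinese text requires a dedicated word-segmentation tokenizer, and the stop-word and synonym lists are hypothetical inputs:

```python
import re
from typing import Dict, FrozenSet, List, Optional


def char_filter(text: str) -> str:
    # Character filter: strip HTML tags, then replace special characters with spaces.
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"[^\w\s]", " ", text)


def tokenizer(text: str) -> List[str]:
    # Tokenizer: whitespace split (a stand-in for real word segmentation).
    return text.split()


def token_filters(tokens: List[str], stopwords: FrozenSet[str],
                  synonyms: Dict[str, List[str]]) -> List[str]:
    # Token filters: lowercase, drop stop words, and append synonyms after each token.
    out: List[str] = []
    for tok in tokens:
        tok = tok.lower()
        if tok in stopwords:
            continue
        out.append(tok)
        out.extend(synonyms.get(tok, []))
    return out


def analyze(text: str, stopwords: FrozenSet[str] = frozenset(),
            synonyms: Optional[Dict[str, List[str]]] = None) -> List[str]:
    return token_filters(tokenizer(char_filter(text)), stopwords, synonyms or {})
```

For example, `analyze("<p>Medical Information!</p>")` yields `["medical", "information"]`. In ElasticSearch the same roles are filled by built-in components such as the `html_strip` character filter and the `lowercase`, `stop` and `synonym` token filters.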

Step 4: calculate the relevance using a relevance algorithm, as shown in FIG. 3.

Step 401: determine the phrase to be queried, for example "Alfred way".

Step 402: determine the term frequency TF, i.e., the number of times the keyword appears in each document doc.

Step 403: determine the inverse document frequency IDF of the keyword, which reflects how often the keyword appears across the whole index (the more documents contain it, the lower its weight).

Step 404: determine the field-length norm; the longer the field, the smaller the value.

Step 405: determine the final score score(q, d) of a document doc for the query q.

Step 406: use queryNorm(q) to normalize the otherwise scattered scores into a comparable range without changing their relative order, so that they are easier to interpret.

Step 407: use coord(q, d) to reward documents that match more of the query terms: the more query terms a matched document doc contains, the higher its score. The matches are determined from the inverted index of the queried field (the example field here is username).

Step 408: use the summation Σ to add up the weights of each term of the query for the document doc.

Step 409: tf(t in d) is the square root of the number of times the term t appears in the document doc.

Step 410: t.getBoost() is the weight (boost) set for the term t.

Step 411: norm(t, d) is the field-length norm; the longer the field, the smaller the result.
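Steps 405 to 411 correspond to the factors of Lucene's classic TF/IDF "practical scoring function", which ElasticSearch used by default before it switched to BM25. Written out as that model is documented (ignoring index-time boosts and the lossy encoding of the norm), with $\mathrm{freq}$ the occurrences of $t$ in the field, $\mathrm{numDocs}$ the number of documents, $\mathrm{docFreq}$ the number of documents containing $t$, and $\mathrm{numTerms}$ the field length:

```latex
\mathrm{score}(q,d) = \mathrm{queryNorm}(q)\cdot \mathrm{coord}(q,d)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t\ \mathrm{in}\ d)\cdot \mathrm{idf}(t)^{2}\cdot
  t.\mathrm{getBoost}()\cdot \mathrm{norm}(t,d)\Big)

\mathrm{tf}(t\ \mathrm{in}\ d) = \sqrt{\mathrm{freq}}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}+1}, \qquad
\mathrm{norm}(t,d) = \frac{1}{\sqrt{\mathrm{numTerms}}}

\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}, \qquad
\mathrm{coord}(q,d) = \frac{\text{number of query terms found in } d}{\text{number of terms in } q}
```

Since queryNorm(q) is the same for every document of one query, it rescales scores without changing their order, which matches Step 406.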

The scheme can be implemented with the TF/IDF model, as shown in FIG. 4, or with the BM25 model, as shown in FIG. 5; a comparison of the scores produced by the TF/IDF model and the BM25 model is shown in FIG. 6.
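The figures comparing the two models are not reproduced here, but their main difference in term-frequency handling can be shown numerically. Below is a sketch of the per-term score under each model, using the classic formulas above and the textbook BM25 formula with ElasticSearch's default parameters k1 = 1.2 and b = 0.75. Query-level factors (queryNorm, coord, boosts) are omitted, and Lucene's exact implementation differs in details such as norm encoding:

```python
import math


def classic_term_score(freq: int, doc_len: int, num_docs: int, doc_freq: int) -> float:
    """Classic TF/IDF term weight: tf * idf^2 * norm."""
    tf = math.sqrt(freq)
    idf = 1 + math.log(num_docs / (doc_freq + 1))
    norm = 1 / math.sqrt(doc_len)
    return tf * idf * idf * norm


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq: int, doc_len: int, avg_doc_len: float, num_docs: int,
                    doc_freq: int, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 term weight: term frequency saturates, and length is normalized by the average."""
    length_factor = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(num_docs, doc_freq) * freq * (k1 + 1) / (freq + length_factor)
```

Under the classic model, raising a term's frequency from 1 to 100 multiplies its weight by 10 (the square root keeps growing), whereas under BM25 the weight approaches, but never reaches, idf × (k1 + 1); in this sketch, going from 1 to 100 occurrences raises it only by a factor of about 2.2 for a document of average length.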

Step 5: distributed storage. To avoid the risk of data loss caused by a single point of failure, the scheme uses multi-node distributed storage and improves system availability through a master-slave (primary-replica) design.

The working scheme of the data nodes for write operations is shown in FIG. 7:

1) The client sends a create, index or delete request to NODE 1 (the master node) in the cluster.

2) NODE 1 uses the document id to determine that the document belongs to shard 0, and forwards the request to NODE 3, where the primary copy of shard 0 is located.

3) NODE 3 executes the request on the primary shard. If it succeeds, NODE 3 forwards the request to the corresponding replica shards on NODE 1 and NODE 2. When all replicas report success, NODE 3 reports success to the requesting node, which in turn reports success to the client.
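The write path above can be simulated in a few lines. ElasticSearch documents its routing rule as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id, and it uses a Murmur3 hash. The sketch below substitutes MD5 as a deterministic stand-in, and the Node/Cluster classes are hypothetical in-memory models of steps 1) to 3), not ElasticSearch internals:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Route a document to a primary shard: hash(id) % number_of_primary_shards."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()  # stand-in for Murmur3
    return int.from_bytes(digest[:4], "big") % num_primary_shards


@dataclass
class Node:
    name: str
    shards: Dict[int, Dict[str, dict]] = field(default_factory=dict)  # shard -> {doc_id: doc}

    def index(self, shard: int, doc_id: str, doc: dict) -> bool:
        self.shards.setdefault(shard, {})[doc_id] = doc
        return True  # a real node could fail here


@dataclass
class Cluster:
    nodes: Dict[str, Node]
    primary: Dict[int, str]         # shard -> node holding the primary copy
    replicas: Dict[int, List[str]]  # shard -> nodes holding replica copies

    def write(self, doc_id: str, doc: dict) -> bool:
        # Steps 1-2: the receiving node computes the shard and forwards to its primary.
        shard = shard_for(doc_id, len(self.primary))
        if not self.nodes[self.primary[shard]].index(shard, doc_id, doc):
            return False
        # Step 3: the primary forwards to every replica; success only if all succeed.
        results = [self.nodes[n].index(shard, doc_id, doc) for n in self.replicas[shard]]
        return all(results)
```

The modulo rule is also why the number of primary shards of an index cannot simply be changed after creation: a different divisor would send existing ids to different shards.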

The retrieval scheme is shown in FIG. 8:

1) The client sends a get request to NODE 1.

2) NODE 1 uses the document id to determine that the document belongs to shard 0; copies of shard 0 (primary and replicas) exist on all three nodes. In this example, it forwards the request to NODE 2.

3) NODE 2 returns the document to NODE 1, which returns it to the client.

For read requests, to balance the load, the requesting node selects a different shard copy for each request, cycling through all copies of the shard. An indexed document may already exist on the primary shard but not yet have been synchronized to a replica shard; in that case the replica reports that the document is not found, while the primary would return it successfully. Once the index request has returned success to the user, the document is available on both the primary shard and the replica shards.
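The round-robin read and the "not yet synchronized replica" case can be illustrated with a small in-memory model. This is a hypothetical sketch of the behaviour described above, not ElasticSearch code: each node's copy of the shard is a plain dict, and a copy that has not received the write yet simply lacks the document:

```python
import itertools
from typing import Dict, Optional, Tuple


class ShardReader:
    """Spread get requests over all copies (primary and replicas) of one shard."""

    def __init__(self, copies: Dict[str, Dict[str, dict]]):
        # copies: node name -> that node's copy of the shard ({doc_id: document})
        self.copies = copies
        self._order = itertools.cycle(list(copies))  # round-robin over node names

    def get(self, doc_id: str) -> Tuple[str, Optional[dict]]:
        node = next(self._order)
        # A replica that has not yet received the write returns None ("not found").
        return node, self.copies[node].get(doc_id)
```

With `{"NODE 1": {"d1": doc}, "NODE 2": {}, "NODE 3": {"d1": doc}}`, successive calls to `get("d1")` hit NODE 1, NODE 2 and NODE 3 in turn, and the lagging NODE 2 answers `None`, which is the window the paragraph above describes.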

Optionally, the id with which the user registered is used as a unique identifier, and the data the user requests for the first time is associated with the user through a Redis cache. When the user issues the same request again, the data is read from the cache, further improving response speed.

As shown in FIG. 9, after the user registers, the user id is bound to the uri (Uniform Resource Identifier) of the request. If all parameters of the accessed uri other than pageNo and pageSize are the same, the cached data is displayed to the user; otherwise, ElasticSearch is queried again.
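The cache rule of FIG. 9 reduces to choosing the right cache key: user id plus the uri with pageNo and pageSize removed. A minimal sketch under these assumptions: a Python dict stands in for Redis, the full result list is cached once and sliced per page, and `search_backend` is a hypothetical callable that runs the ElasticSearch query for a uri. The default page size of 10 is also an assumption:

```python
from typing import Any, Callable, Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit

PAGING_PARAMS = {"pageNo", "pageSize"}


def cache_key(user_id: str, uri: str) -> str:
    """Key = user id + path + every query parameter except pageNo/pageSize (sorted)."""
    parts = urlsplit(uri)
    params = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                    if k not in PAGING_PARAMS)
    return f"{user_id}:{parts.path}?{urlencode(params)}"


class SearchCache:
    """Caches the full result list per key and pages it locally (dict stands in for Redis)."""

    def __init__(self, search_backend: Callable[[str], List[Any]]):
        self._store: Dict[str, List[Any]] = {}
        self._search = search_backend  # hypothetical: runs the ElasticSearch query

    def get_page(self, user_id: str, uri: str) -> List[Any]:
        params = dict(parse_qsl(urlsplit(uri).query))
        page_no = int(params.get("pageNo", 1))
        page_size = int(params.get("pageSize", 10))
        key = cache_key(user_id, uri)
        if key not in self._store:  # cache miss: query the search system once
            self._store[key] = self._search(uri)
        start = (page_no - 1) * page_size  # slice one page from the cached list
        return self._store[key][start:start + page_size]
```

Sorting the remaining parameters makes `?kw=a&pageNo=1` and `?pageNo=2&kw=a` share a key. A real Redis deployment would also need serialization of the result list and an expiry time so that stale results are eventually refreshed.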

The technical scheme of the application adopts a cluster scheme and improves system availability through a master-slave mode; it uses the ElasticSearch inverted-index principle to segment the search conditions into as many words as possible, improving matching relevance, and displays the matches to the user in highlighted form, improving the user experience; and, following an article-object-oriented approach, it aggregates the matched content into articles before displaying them to the user.

Because the business requires its own ordering, the default relevance algorithm of ES is bypassed: a filter is used in place of a match query, because a filter does not sort by score, performs better, and can be cached after the same operation has been repeated several times.
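In ElasticSearch's query DSL, putting a clause under `bool.filter` runs it in filter context: it decides whether a document matches but contributes no relevance score, and frequently used filters are eligible for caching. A sketch of such a request body, assuming sortText is the searched field and using a hypothetical publishTime field for the business ordering (the original does not say which field is used):

```python
from typing import Any, Dict


def build_filter_body(keyword: str, page_no: int = 1, page_size: int = 10,
                      sort_field: str = "publishTime") -> Dict[str, Any]:
    """Search body that matches in filter context and orders by a business field (sketch).

    Assumptions: sortText is the searched field; publishTime is a hypothetical sort field.
    """
    return {
        # Clauses under bool.filter are not scored and may be cached by ElasticSearch.
        "query": {"bool": {"filter": [{"match": {"sortText": keyword}}]}},
        # With no relevance score to rank by, an explicit sort defines the order.
        "sort": [{sort_field: {"order": "desc"}}],
        "from": (page_no - 1) * page_size,
        "size": page_size,
    }
```

The trade-off is that hits no longer carry a useful relevance score, so this form suits the business-ordered listing, while the scored `match` query remains the choice when results should be ranked by relevance as in Step 4.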

The scheme uses ElasticSearch to provide distributed storage for millions of records and real-time retrieval of article contents and titles; the matched data is aggregated with the article as the object, a set number of articles is displayed per page, and the matched text is highlighted, improving the user experience.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.

Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method according to the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

According to another aspect of the embodiments of the application, a search device based on article content and titles is also provided. The device comprises: an article data storage unit, configured to store article data in a distributed manner using a search system, wherein the search system is implemented with ElasticSearch; a search unit, configured to retrieve article contents and titles in real time in the search system when an article search request is received from a user terminal; and a display unit, configured to aggregate the data matched during retrieval with the article as the object and to highlight the matched articles on the user terminal in a paged display.

Optionally, the search unit is further configured to: retrieve article contents and titles in real time in the search system according to the keywords in an article search request to obtain a search result, wherein the search result comprises a single article or a plurality of articles. For a single article, the cases are: the article matches only the title, matches only a paragraph, or matches both the title and a paragraph. For a plurality of articles, the cases are: all matches are titles of the articles; some articles match both the title and paragraphs; some articles match only the title; and some articles match only paragraphs.

Optionally, the display unit is further configured to: for data that matches a title, aggregate the records that share the same article identifier; and for data that matches only a paragraph, aggregate the records by the identifier of the article to which they belong.

The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Spatially relative terms, such as "on," "over," "on the upper surface of," "upper," and the like, may be used herein for ease of description to describe the spatial relationship of one device or feature to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above" can include both an orientation of "above" and "below". The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.

In the description of the present invention, it should be understood that orientation words such as "front, rear, upper, lower, left, right", "lateral, vertical, horizontal", and "top, bottom" usually indicate orientations or positional relationships based on those shown in the drawings and are used only for convenience and simplicity of description. Unless stated otherwise, these orientation words do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the scope of the present invention. The terms "inner" and "outer" refer to the inside and outside relative to the contour of each component itself.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A search method based on article content and titles, characterized by comprising the following steps:

storing article data in a search system in a distributed storage mode, which comprises: splitting each whole article by title and paragraph and then storing the pieces in a distributed storage system in a standard data structure, wherein the standard data structure comprises the following fields: the identifier of the article to which the entry belongs, the article type, the article source, the name of the article, the content, the URL of the article, whether the entry is the article title, the release time of the article, the data generation time, and the storage time; and the search system is implemented with ElasticSearch;

when an article search request is received from a user terminal, searching article contents and titles in real time in the search system, which comprises: retrieving article contents and titles in real time in the search system according to the keywords in the article search request to obtain a search result, wherein the search result comprises a single article or a plurality of articles; a single article may match only the title, only a paragraph, or both the title and a paragraph; and, for a plurality of articles, some of the matched articles may match both titles and paragraphs, some may match only titles, and some may match only paragraphs;

aggregating the data matched during retrieval with the article as the object, which comprises: for data matched with the title, aggregating the data having the same identifier of the article to which it belongs; for data matched only with paragraphs, aggregating the data having the same identifier of the article to which it belongs; and highlighting the matched articles on the user terminal in a paginated display mode;

after the matched articles are highlighted in the paginated display mode, when a further search request is received from the user terminal, if the keyword in that search request is the same as the keyword in the earlier article search request, returning the same search result as before to the user terminal.

2. The method of claim 1, wherein highlighting the matched articles on the user terminal in a paginated display mode comprises:

calculating, by using a relevance algorithm, the relevance between the matched articles and the keywords in the article search request; and

highlighting the matched articles on the user terminal in descending order of relevance in a paginated display mode.

3. The method of claim 2, wherein highlighting the matched articles on the user terminal in descending order of relevance comprises:

highlighting the matched articles in descending order of relevance according to a preset display configuration on the user terminal, wherein the preset display configuration is either displaying only the title or displaying the title and the paragraphs together.

4. A search apparatus based on article content and titles, comprising:

a storage unit, configured to store article data in a search system in a distributed storage mode by: splitting each whole article by title and paragraph and then storing the pieces in a distributed storage system in a standard data structure, wherein the standard data structure comprises the following fields: the identifier of the article, the article type, the article source, the name of the article, the content, the URL of the article, whether the entry is the article title, the release time of the article, the data generation time, and the storage time; and the search system is implemented with ElasticSearch;

a search unit, configured to search article contents and titles in real time in the search system when an article search request is received from a user terminal, by: retrieving article contents and titles in real time in the search system according to the keywords in the article search request to obtain a search result, wherein the search result comprises a single article or a plurality of articles; a single article may match only the title, only a paragraph, or both the title and a paragraph; and, for a plurality of articles, some of the matched articles may match both titles and paragraphs, some may match only titles, and some may match only paragraphs;

a display unit, configured to aggregate the data matched during retrieval with the article as the object by: for data matched with the title, aggregating the data having the same identifier of the article to which it belongs; for data matched only with paragraphs, aggregating the data having the same identifier of the article to which it belongs; and highlighting the matched articles on the user terminal in a paginated display mode;

the display unit being further configured to: after the matched articles are highlighted in the paginated display mode, when a further search request is received from the user terminal, return the same search result as before to the user terminal if the keyword in that search request is the same as the keyword in the earlier article search request.

5. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program when executed performs the method of any of the preceding claims 1 to 3.

6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the method of any of the preceding claims 1 to 3 by means of the computer program.
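The display behaviour set out in claims 1 to 4 (descending relevance, paginated highlighting, an optional title-only configuration, and returning the earlier result for a repeated keyword) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation. The source does not name its relevance algorithm, so each aggregated article's `score` is assumed to come from the search system, for example the `_score` of its best-matching entry, which Elasticsearch computes with BM25 by default. The `<em>` tags mirror Elasticsearch's default highlight tags. `ArticleSearchService`, `highlight`, the backend callable and the dict keys are all hypothetical names, and an in-memory dict keyed by keyword stands in for whatever result store is actually intended.

```python
import re
from typing import Callable, Dict, List


def highlight(text: str, keyword: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each case-insensitive occurrence of keyword in highlight tags."""
    if not keyword:
        return text  # an empty pattern would match between every character
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


class ArticleSearchService:
    """Ranks aggregated articles, pages them, and reuses results for a repeated keyword."""

    def __init__(self, backend: Callable[[str], List[dict]],
                 page_size: int = 10, show_paragraphs: bool = True):
        # backend(keyword) is a hypothetical stand-in for the ElasticSearch query plus
        # per-article aggregation; it returns dicts with the assumed keys
        # "article_id", "title", "paragraphs" and "score".
        self._backend = backend
        self._page_size = page_size
        self._show_paragraphs = show_paragraphs  # False models the "title only" configuration
        self._cache: Dict[str, List[dict]] = {}  # unbounded and never expired: sketch only

    def _ranked(self, keyword: str) -> List[dict]:
        # A request with the same keyword as an earlier one gets the earlier result back.
        if keyword not in self._cache:
            articles = self._backend(keyword)
            self._cache[keyword] = sorted(articles, key=lambda a: a["score"], reverse=True)
        return self._cache[keyword]

    def page(self, keyword: str, page_no: int = 1) -> List[dict]:
        """Return one page (1-based) of highlighted articles in descending relevance."""
        start = (page_no - 1) * self._page_size
        rendered = []
        for article in self._ranked(keyword)[start:start + self._page_size]:
            item = {"article_id": article["article_id"],
                    "title": highlight(article["title"], keyword)}
            if self._show_paragraphs:
                item["paragraphs"] = [highlight(p, keyword) for p in article["paragraphs"]]
            rendered.append(item)
        return rendered


if __name__ == "__main__":
    calls = []

    def fake_backend(keyword: str) -> List[dict]:
        calls.append(keyword)
        return [
            {"article_id": "a2", "title": "Index design", "paragraphs": ["Search with care"], "score": 0.9},
            {"article_id": "a1", "title": "Search basics", "paragraphs": [], "score": 3.2},
            {"article_id": "a3", "title": "Search tuning", "paragraphs": ["search again"], "score": 2.5},
        ]

    service = ArticleSearchService(fake_backend, page_size=2)
    print(service.page("search", 1))  # a1 then a3, with "Search"/"search" wrapped in <em>
    print(service.page("search", 2))  # a2
    print(calls)  # ['search']: the second page reused the cached ranking
```

Sorting once and caching the ranked list means later pages for the same keyword are served from memory, which is one simple way to meet the repeated-keyword behaviour; a real deployment would presumably also need eviction, and invalidation when new articles are stored, which the source does not discuss.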