CN115640442A - Information screening method and device - Google Patents


Info

Publication number
CN115640442A
Authority
CN
China
Prior art keywords
data source
information
screening
search
final
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110817653.9A
Other languages
Chinese (zh)
Inventor
夏正新
王东传
邓鹏�
李鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yizhanshendeng Network Information Technology Co Ltd
Original Assignee
Nanjing Yizhanshendeng Network Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yizhanshendeng Network Information Technology Co Ltd filed Critical Nanjing Yizhanshendeng Network Information Technology Co Ltd
Priority to CN202110817653.9A priority Critical patent/CN115640442A/en
Publication of CN115640442A publication Critical patent/CN115640442A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for screening information. The screening method comprises the following steps: acquiring information content required by a user, and determining a pre-search keyword according to the information content; expanding the pre-search keyword to obtain final keywords; searching in a search engine using the final keywords to determine a first data source; expanding the first data source to obtain a second data source; and determining the required information content in the second data source according to the final keywords. The method determines the pre-search keyword according to the information theme, determines the first data source by using the search engine and the expanded keywords, and finally obtains the required information content by using the final keywords. On the one hand, this helps a user obtain the required information accurately and efficiently; on the other hand, it ensures that the obtained information is richer, more comprehensive, and better ordered.

Description

Information screening method and device
Technical Field
The invention relates to the technical field of internet, in particular to a method and a device for screening information.
Background
With the continuous development and popularization of the internet, people increasingly learn about news and other information through network information platforms, for example by logging in to and browsing an information platform website from a terminal, or by using the information platform's app installed on the terminal.
In such a situation of information flooding, it is increasingly difficult to find accurate information. For specific needs in particular, it is very difficult for a user to retrieve the desired information simply through a search engine, and a lot of time is spent removing non-target information.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and an apparatus for screening information comprehensively and accurately.
In order to achieve the above object, a first aspect of the present invention provides a method for screening information, including:
acquiring information content required by a user, and determining a pre-search keyword according to the information content;
expanding the pre-search keywords to obtain final keywords;
searching in a search engine using the final keyword to determine a first data source;
expanding the first data source to obtain a second data source;
and determining the required information content in the second data source according to the final keyword.
In the screening method, preferably, the expanding the first data source to obtain a second data source further includes:
inputting one or more of the final keywords in a search engine and selecting information in search conditions of the search engine, thereby obtaining a third data source;
screening the third data source according to a screening condition to obtain a fourth data source;
and combining the fourth data source with the first data source to obtain a second data source.
In the screening method, preferably, the screening condition is one or more of: whether the website in the third data source is an information website, whether the website publishes 5 or more new information items per day, and whether the daily browsing volume of the website reaches 5w (i.e., 50,000).
In the screening method, preferably, the expanding the first data source to obtain a second data source further includes:
acquiring the website type of the first data source;
expanding homogeneous websites or competitive websites according to the website types;
and merging the same type of websites or competitive websites with the first data source to obtain a second data source.
In the screening method, preferably, the determining, in the second data source, the required information content according to the final keyword further includes:
acquiring all information in the second data source;
acquiring title information or summary information of all the information;
and searching the final keyword in the title information or the abstract information to determine the required information content.
In the screening method, preferably, the determining, in the second data source, the required information content according to the final keyword further includes:
obtaining the subject information of all columns in the second data source;
determining target columns in the subject information of all columns;
and searching the final keyword in the target column to determine the required information content.
In another aspect, the present invention provides an apparatus for screening information, comprising:
an acquisition unit, configured to acquire information content required by a user and determine a pre-search keyword according to the information content;
the first expansion unit is used for expanding the pre-search keywords to obtain final keywords;
a first determining unit, configured to perform a search in a search engine using the final keyword to determine a first data source;
the second expansion unit is used for expanding the first data source to obtain a second data source;
and the second determining unit is used for determining the required information content in the second data source according to the final keyword.
In the screening apparatus, it is preferable that the second expansion unit includes:
the search module is used for inputting one or more of the final keywords in a search engine and selecting information in search conditions of the search engine so as to obtain a third data source;
the screening module is used for screening the third data source according to screening conditions to obtain a fourth data source;
and the merging module is used for merging the fourth data source and the first data source to obtain a second data source.
In the screening apparatus, it is preferable that the screening condition is one or more of: whether the website in the third data source is an information website, whether the website publishes 5 or more new information items per day, and whether the daily browsing volume of the website reaches 5w (i.e., 50,000).
Compared with the prior art, the invention has the following beneficial effects: the information screening method determines the pre-search keyword according to the information theme, expands the pre-search keyword, determines the first data source by using the search engine and the expanded keywords, expands the first data source, and finally obtains the required information content by using the final keywords. On the one hand, the method helps the user obtain the required information accurately and efficiently; on the other hand, it ensures that the obtained information is richer, more comprehensive, and better ordered.
Drawings
FIG. 1 is a flowchart illustrating a method for screening information according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus for screening information according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As used in this application and the appended claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
It should be noted that the terms "first", "second", and the like are used to define the components, and are only used for convenience of distinguishing the corresponding components, and the terms have no special meanings unless otherwise stated, so that the scope of the present application is not to be construed as being limited. Further, although the terms used in the present application are selected from publicly known and used terms, some of the terms mentioned in the specification of the present application may be selected by the applicant at his or her discretion, the detailed meanings of which are described in relevant parts of the description herein. Further, it is required that the present application is understood, not simply by the actual terms used but by the meaning of each term lying within.
Referring to FIG. 1, the present embodiment provides a method for screening information, including the following steps:
S10: acquiring information content required by a user, and determining a pre-search keyword according to the information content;
Specifically, the subject content included in the information required by the user is determined, and the subject content is then analyzed to obtain the pre-search keyword required for the search.
S20: expanding the pre-search keywords to obtain final keywords;
It should be understood that the manner of expansion includes, but is not limited to, synonyms of the pre-search keywords, near-synonyms, Chinese and foreign-language translations, and the like.
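As a minimal sketch of step S20, the expansion can be pictured as a lookup in a table of synonyms and translations. The table `EXPANSIONS`, its entries, and the function name `expand_keywords` are illustrative assumptions; the patent does not say where the synonyms or translations come from.

```python
from typing import Dict, Iterable, List

# Hypothetical expansion table (synonyms, near-synonyms, translations).
# The entries are illustrative only and are not taken from the patent.
EXPANSIONS: Dict[str, List[str]] = {
    "business opportunity": ["wind gap", "商机"],
}


def expand_keywords(pre_search: Iterable[str],
                    table: Dict[str, List[str]] = EXPANSIONS) -> List[str]:
    """Return the pre-search keywords plus their expansions, without duplicates."""
    final: List[str] = []
    for keyword in pre_search:
        # Keep the original keyword first, then append its expansions.
        for candidate in [keyword, *table.get(keyword, [])]:
            if candidate not in final:
                final.append(candidate)
    return final


final_keywords = expand_keywords(["business opportunity"])
```

Keeping the original keyword at the front of the list preserves the user's own wording alongside the expanded terms.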
S30: searching in a search engine using the final keyword to determine a first data source;
Specifically, all final keywords are input into each major search engine (Baidu search, Sogou search, 360 search, etc.). For each information article found, the website that published it is identified (every article must be published somewhere), and whether the website is added to the second data source is judged according to the search result.
The search thus yields the final keywords for the information and the websites that carry more of such information (i.e., the first data source). For example, to find information related to "business opportunity" on the network, the product provides the final keywords "business opportunity" and "wind gap", together with several target websites, such as website A, website B, and website C, that are most likely to carry such information.
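One way to picture step S30 is to group search-result URLs by the website that published them and keep the sites that appear most often. This is only a sketch under that assumption: the patent does not state how "websites with more information" are ranked, and the function name `first_data_source` and the `top_n` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlparse


def first_data_source(result_urls: Iterable[str], top_n: int = 3) -> List[str]:
    """Return the hosts that appear most often among the search-result URLs."""
    # Assumption: a site that contributes more results carries more of the
    # required information. The patent does not prescribe this ranking.
    hosts = Counter(urlparse(url).netloc for url in result_urls)
    return [host for host, _count in hosts.most_common(top_n)]
```

For example, given result URLs from `a.example` (twice) and `b.example` (once), calling `first_data_source(urls, top_n=1)` keeps only `a.example`.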
S40: expanding the first data source to obtain a second data source;
In some embodiments, the final keywords are input in turn into a search engine (for example, Baidu search, Sogou search, or 360 search) with "information" selected as the search condition (i.e., the search engine's own filter). The search results then contain many information items that may be required, and the task is to find the items that are really the target. Each item is published by some website (a data source), so it must be judged whether that data source can be added to the target data sources. The judgment criteria are:
1. Whether the website is an information website, that is, whether the theme of the website meets the requirement. For example, if a website is a training website or a bidding website, its type does not meet the requirement and it cannot be added to the target data sources.
2. Whether the website publishes 5 or more new information items per day. If a website with few updates is selected as a target data source, little business opportunity information will be obtained from it later.
3. Whether the daily browsing volume of the website reaches 5w (i.e., 50,000). The daily browsing volume indicates whether the information quality of the website is good.
If one or all of the three conditions are met, the website is added to the first data source, finally yielding the second data source.
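The three screening criteria above can be sketched as a simple filter followed by a merge. The `Site` record, its field names, and the `require_all` switch are assumptions made for illustration, and "5w" is read here as 50,000. Because the text allows "one or all" of the conditions, both modes are shown.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_DAILY_UPDATES = 5      # "5 or more" updated items per day
MIN_DAILY_VIEWS = 50_000   # assumption: "5w" means 50,000 daily views


@dataclass
class Site:
    """Hypothetical record describing a candidate website (third data source)."""
    name: str
    is_information_site: bool
    daily_updates: int
    daily_views: int


def passes_screening(site: Site, require_all: bool = True) -> bool:
    """Apply the three criteria; require_all=False accepts any one of them."""
    checks = [
        site.is_information_site,
        site.daily_updates >= MIN_DAILY_UPDATES,
        site.daily_views >= MIN_DAILY_VIEWS,
    ]
    return all(checks) if require_all else any(checks)


def second_data_source(first: Iterable[str], third: Iterable[Site],
                       require_all: bool = True) -> List[str]:
    """Screen the third data source (giving the fourth) and merge it with the first."""
    merged = list(first)
    for site in third:
        if passes_screening(site, require_all) and site.name not in merged:
            merged.append(site.name)
    return merged
```

How these values (site type, update count, browsing volume) are measured is not described in the source, so the sketch simply takes them as given fields.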
In other embodiments, data source websites of the same type are added on the basis of the existing target data source websites. The added data source websites are taken as a third data source, and the third data source is combined with the first data source to obtain a second data source.
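A minimal sketch of this type-based expansion follows, assuming a lookup from each site to its type and a directory from each type to its peer sites. Both mappings and the function name `expand_by_type` are hypothetical; the patent does not say how same-type or competitive websites are discovered.

```python
from typing import Dict, Iterable, List


def expand_by_type(first: Iterable[str],
                   site_types: Dict[str, str],
                   directory: Dict[str, List[str]]) -> List[str]:
    """Merge the first data source with peer sites of the same type."""
    first_sites = list(first)
    merged = list(first_sites)
    for site in first_sites:
        site_type = site_types.get(site)
        # Peers found through the directory play the role of the third data source.
        for peer in directory.get(site_type, []) if site_type else []:
            if peer not in merged:
                merged.append(peer)
    return merged
```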
S50: determining the required information content in the second data source according to the final keyword.
Specifically, with the final keywords and the corresponding target information websites in hand, the required information content needs to be determined from the target websites, but each target website has many information columns. For example, website A has columns such as science and technology, life, city, investment, automobile, enterprise service, and innovation, and at this point we do not know in which column of the website the required information content appears.
In an embodiment, all information in the second data source may be obtained first, that is, all information under all columns of the website is captured; then, acquiring the title information or the summary information of all the information; and finally, searching the final key words in the title information or the abstract information to determine the required information content. The method can relatively completely find the required information content in the second data source.
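The title/summary matching in this embodiment can be sketched as a case-insensitive substring check. The `Article` record and the function name `screen_articles` are illustrative assumptions; crawling the columns is outside the scope of the sketch, and the source does not specify the matching rule.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Article:
    """Hypothetical record for one captured information item."""
    title: str
    summary: str


def screen_articles(articles: Iterable[Article],
                    final_keywords: Iterable[str]) -> List[Article]:
    """Keep articles whose title or summary contains any final keyword."""
    keywords = [kw.lower() for kw in final_keywords]
    kept = []
    for article in articles:
        text = f"{article.title} {article.summary}".lower()
        if any(kw in text for kw in keywords):
            kept.append(article)
    return kept
```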
In another embodiment, the subject information of all columns in the second data source is obtained first; for example, website A has columns such as science and technology, life, city, initial investment, automobile, enterprise service, and innovation. Then, target columns are determined from the subject information of all columns, that is, the columns most likely to carry the required information content are selected from all columns as the capture targets. For example, the initial investment and innovation columns of website A are selected as the column data sources to capture, because research shows that much of the required information content on website A appears in these two columns. Finally, the final keyword is searched in the target columns to determine the required information content. This method can accurately find the business opportunity information in the website while the crawler captures relatively less information; compared with the first method, it improves the screening proportion and reduces the overall running time.
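The column-based variant can be sketched as follows, assuming each website is represented as a mapping from column name to the titles captured under it. That representation and the function name `screen_target_columns` are hypothetical, and choosing the target columns is left to the caller because the patent describes that choice only as the result of research.

```python
from typing import Dict, Iterable, List


def screen_target_columns(columns: Dict[str, List[str]],
                          target_columns: Iterable[str],
                          final_keywords: Iterable[str]) -> List[str]:
    """Search the final keywords only in the chosen target columns."""
    keywords = [kw.lower() for kw in final_keywords]
    hits: List[str] = []
    for name in target_columns:
        # Columns that are not targets are never visited, which is what
        # reduces the amount of captured data compared with a full crawl.
        for title in columns.get(name, []):
            if any(kw in title.lower() for kw in keywords):
                hits.append(title)
    return hits
```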
In another embodiment, the final keyword is retrieved by using the in-site search function in the second data source to determine the required information content, i.e., the required information content is obtained through the website's own in-site search. Research has found that many websites offer in-site search, and the final keywords ("business opportunity", "wind gap") are already available, so the final keywords are entered in the search box inside the website and the results that appear in the site are then captured. This method can accurately find the business opportunity information in the website, the screening proportion of the information is higher, and the overall running time is greatly reduced.
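For the in-site search variant, a sketch of building the query URLs is shown below. The search endpoint and the query parameter name `q` are hypothetical, since every website defines its own search interface; fetching and parsing the result pages is deliberately left out.

```python
from typing import Iterable, List
from urllib.parse import urlencode


def build_site_search_urls(search_endpoint: str,
                           final_keywords: Iterable[str],
                           param: str = "q") -> List[str]:
    """Build one in-site search URL per final keyword."""
    # Assumption: the site accepts the keyword as a GET query parameter.
    return [f"{search_endpoint}?{urlencode({param: kw})}" for kw in final_keywords]


urls = build_site_search_urls("https://a.example/search",
                              ["business opportunity", "wind gap"])
```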
In other embodiments, referring to fig. 2, the present invention further provides an information screening apparatus, including:
an obtaining unit 100, configured to obtain information content required by a user, and determine a pre-search keyword according to the information content; it should be noted that, since the specific obtaining method and process are already described in detail in step S10 of the information screening method, they are not described herein again.
A first expansion unit 200, configured to expand the pre-search keyword to obtain a final keyword; it should be noted that, since the specific obtaining method and process are already described in detail in step S20 of the information screening method, they are not described herein again.
A first determining unit 300 for searching in a search engine using the final keyword to determine a first data source; it should be noted that, since the specific obtaining method and process are already described in detail in step S30 of the information screening method, they are not described herein again.
A second expansion unit 400, configured to expand the first data source to obtain a second data source; it should be noted that, since the specific obtaining method and process are already described in detail in step S40 of the information screening method, they are not described herein again.
A second determining unit 500, configured to determine the required information content in the second data source according to the final keyword. It should be noted that, since the specific obtaining method and process are already described in detail in step S50 of the information screening method, they are not described herein again.
In another embodiment, the second expansion unit 400 includes:
a search module for inputting one or more of the final keywords in a search engine and selecting information in search conditions of the search engine, thereby obtaining a third data source;
the screening module is used for screening the third data source according to screening conditions to obtain a fourth data source;
and the merging module is used for merging the fourth data source and the first data source to obtain a second data source.
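The five units described above (100 to 500) can be pictured as a pipeline in which each unit is a pluggable function. This is a structural sketch only: the class name `ScreeningApparatus` and the field names and signatures are assumptions, and the patent does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScreeningApparatus:
    """Hypothetical composition of units 100 to 500 as plain callables."""
    acquire: Callable[[str], List[str]]                           # unit 100
    expand_keywords: Callable[[List[str]], List[str]]             # unit 200
    find_sources: Callable[[List[str]], List[str]]                # unit 300
    expand_sources: Callable[[List[str], List[str]], List[str]]   # unit 400
    select_content: Callable[[List[str], List[str]], List[str]]   # unit 500

    def run(self, need: str) -> List[str]:
        pre_search = self.acquire(need)                   # S10
        final = self.expand_keywords(pre_search)          # S20
        first = self.find_sources(final)                  # S30
        second = self.expand_sources(first, final)        # S40
        return self.select_content(second, final)         # S50
```

Passing the units in as functions mirrors the statement that units may be integrated or exist separately, since each one can be replaced without touching the others.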
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may store a program, and when the program is executed, it performs part or all of the steps of any information screening method described in the above method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
An exemplary flowchart of a screening method for information according to an embodiment of the present invention is described above with reference to the accompanying drawings. It should be noted that the numerous details included in the above description are merely exemplary of the invention and are not limiting of the invention. In other embodiments of the invention, the method may have more, fewer, or different steps, and the order, inclusion, function, etc. of the steps may be different from that described and illustrated.

Claims (10)

1. A method for screening information, comprising:
acquiring information content required by a user, and determining a pre-search keyword according to the information content;
expanding the pre-search keywords to obtain final keywords;
searching in a search engine using the final keyword to determine a first data source;
expanding the first data source to obtain a second data source;
and determining the required information content in the second data source according to the final keyword.
2. The screening method of claim 1, wherein expanding the first data source to obtain a second data source further comprises:
inputting one or more of the final keywords in a search engine and selecting information in search conditions of the search engine, thereby obtaining a third data source;
screening the third data source according to a screening condition to obtain a fourth data source;
and combining the fourth data source with the first data source to obtain a second data source.
3. The screening method according to claim 2, wherein: the screening condition is one or more of whether the website in the third data source is an information website, whether the website publishes 5 or more new information items per day, and whether the daily browsing volume of the website reaches 5w (i.e., 50,000).
4. The screening method of claim 1, wherein augmenting the first data source to obtain a second data source further comprises:
acquiring the website type of the first data source;
expanding homogeneous websites or competitive websites according to the website types;
and merging the same type of websites or competitive websites with the first data source to obtain a second data source.
5. The screening method of claim 1, wherein the determining the required information content in the second data source according to the final keyword further comprises:
acquiring all information in the second data source;
acquiring title information or summary information of all the information;
and searching the final keyword in the title information or the abstract information to determine the required information content.
6. The screening method of claim 1, wherein the determining the required information content in the second data source according to the final keyword further comprises:
obtaining the subject information of all columns in the second data source;
determining target columns in the theme information of all columns;
and searching the final keyword in the target column to determine the required information content.
7. An apparatus for screening information, comprising:
an acquisition unit, configured to acquire information content required by a user and determine a pre-search keyword according to the information content;
the first expansion unit is used for expanding the pre-search keywords to obtain final keywords;
a first determining unit, configured to perform a search in a search engine using the final keyword to determine a first data source;
the second expansion unit is used for expanding the first data source to obtain a second data source;
and the second determining unit is used for determining the required information content in the second data source according to the final keyword.
8. The screening apparatus according to claim 7, wherein the second expansion unit includes:
a search module for inputting one or more of the final keywords in a search engine and selecting information in search conditions of the search engine, thereby obtaining a third data source;
the screening module is used for screening the third data source according to screening conditions to obtain a fourth data source;
and the merging module is used for merging the fourth data source and the first data source to obtain a second data source.
9. The screening apparatus of claim 8, wherein: the screening condition is one or more of whether the website in the third data source is an information website, whether the website publishes 5 or more new information items per day, and whether the daily browsing volume of the website reaches 5w (i.e., 50,000).
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a method for screening information according to any one of claims 1 to 6.
CN202110817653.9A 2021-07-20 2021-07-20 Information screening method and device Pending CN115640442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110817653.9A CN115640442A (en) 2021-07-20 2021-07-20 Information screening method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110817653.9A CN115640442A (en) 2021-07-20 2021-07-20 Information screening method and device

Publications (1)

Publication Number Publication Date
CN115640442A true CN115640442A (en) 2023-01-24

Family

ID=84940421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110817653.9A Pending CN115640442A (en) 2021-07-20 2021-07-20 Information screening method and device

Country Status (1)

Country Link
CN (1) CN115640442A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination