CN113868235A - Big data-based information retrieval and analysis system - Google Patents

Info

Publication number
CN113868235A
CN113868235A CN202111151682.2A CN202111151682A CN113868235A CN 113868235 A CN113868235 A CN 113868235A CN 202111151682 A CN202111151682 A CN 202111151682A CN 113868235 A CN113868235 A CN 113868235A
Authority
CN
China
Prior art keywords
retrieval
information
module
user
storage end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111151682.2A
Other languages
Chinese (zh)
Inventor
朱志辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhu Zhihui
Original Assignee
Shenzhen Lianyin Intercommunication Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Lianyin Intercommunication Information Co ltd
Priority to CN202111151682.2A
Publication of CN113868235A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information retrieval and analysis system based on big data, which relates to the technical field of information retrieval and comprises an information input module, an information receiving module, a browsing module and an evaluation module; when a user inputs retrieval information through the information input module, the analysis module tracks the login account of the user and performs statistical analysis on the user's historical retrieval records to obtain a storage end sequence table for the current retrieval; after the information retrieval module receives the storage end sequence table, it sequentially retrieves the learning resources stored in the corresponding storage ends in combination with the retrieval information input by the current user, which avoids synchronous interaction of information within the large-capacity original resource database, reduces the data retrieval pressure on the original resource database, improves retrieval efficiency and avoids wasting retrieval resources; the information receiving module receives the retrieval results of the information retrieval module, audits and filters them, and pushes the corresponding retrieval results to the user terminal, which further improves retrieval efficiency.

Description

Big data-based information retrieval and analysis system
Technical Field
The invention relates to the technical field of information retrieval, in particular to an information retrieval analysis system based on big data.
Background
With the popularity of internet applications and the advent of the big data age, the number of web pages on the global internet grows by tens of millions every day. To retrieve the required information from this vast network, search engines have become an indispensable aid for accessing the internet.
The document with publication number CN106503199A discloses a network-based computer information retrieval system comprising a foreground information input system and a background information retrieval system, both of which are bidirectionally electrically connected through a computer central system; the foreground information input system comprises a picture input subsystem, a language input subsystem and a character input subsystem; the background information retrieval system comprises an information retrieval subsystem, a retrieval subsystem and a retrieval sharing subsystem. When retrieval is needed, three kinds of retrieval information (pictures, language and characters) can be input, which solves the problem that traditional retrieval systems offer only a single retrieval mode; the retrieval sharing subsystem realizes retrieval sharing and remote transmission.
However, when that system performs a search, information is synchronously interacted within an information storage unit of large storage capacity, which increases the data retrieval pressure on the information storage unit; after the retrieval is finished, all possible results are presented to the user, who must select the required retrieval items, which increases the burden on the user and reduces search efficiency; in addition, an effective evaluation of the retrieval and analysis system is lacking.
Disclosure of Invention
In order to solve the problems existing in the above scheme, the invention provides an information retrieval and analysis system based on big data. The invention sequentially retrieves the learning resources stored in the corresponding storage ends according to the retrieval priority value, which avoids synchronous interaction of information within the large-capacity original resource database, reduces the data retrieval pressure on the original resource database, improves retrieval efficiency and avoids wasting retrieval resources.
The purpose of the invention can be realized by the following technical scheme:
an information retrieval and analysis system based on big data comprises a data acquisition module, an information input module, an analysis module, an information receiving module, a browsing module and an evaluation module;
the data acquisition module is used for acquiring learning resource information of an education platform to form an original resource database, and the original resource database comprises a plurality of storage ends;
the information input module is used for logging in by a user, inputting retrieval information and sending the input retrieval information to the information retrieval module; when a user inputs retrieval information through the information input module, the analysis module is used for tracking a login account of the user, performing statistical analysis on a historical retrieval record of the user to obtain a storage end sequence table of the retrieval, and feeding the storage end sequence table back to the information retrieval module;
after the information retrieval module receives the storage end sequence table, the learning resources stored in the corresponding storage end are sequentially retrieved by combining the retrieval information input by the current user;
the information receiving module is used for receiving the retrieval result of the information retrieval module, auditing and filtering the retrieval result, and pushing the corresponding retrieval result to the user terminal; the browsing module is used for the user terminal to select the retrieval result for looking up until the target data is found, and feeding the target data back to the information retrieval module; and when the user logs out, the evaluation module is used for evaluating the retrieval service of the learning resources by the user.
Further, the specific analysis steps of the analysis module are as follows:
when a user inputs retrieval information, tracking a login account of the user, and collecting retrieval records of the user in the last three months; the retrieval records carry corresponding target data;
acquiring a storage end where each target data is located, and counting the occurrence times of the same storage end and the total browsing time of the target data in the same storage end; calculating to obtain a retrieval attraction value Gi of the storage end;
acquiring all retrieval results fed back by historical retrieval information matched with the current retrieval information; counting the distribution proportion of the corresponding retrieval results at each storage end and marking as the storage end occupation ratio Zi;
using the formula (published only as image BDA0003287370800000031, not reproduced in this text), calculating a retrieval priority value JSi of the storage end in the current retrieval, and sorting the storage ends according to the size of the retrieval priority value JSi to obtain the storage end sequence table of the current retrieval.
Further, the specific auditing and filtering steps of the information receiving module are as follows:
S1: acquiring a plurality of retrieval results of the retrieval information; extracting original keywords of the learning resources corresponding to each retrieval result, and performing data cleaning on the original keywords to obtain learning keywords;
S2: then storing the learning keywords in a specific data format as key information, and establishing a key information coding table of the learning resources;
S3: carrying out coverage rate analysis on the key information coding tables corresponding to any two retrieval results, and filtering to obtain representative retrieval results;
S4: performing access value analysis on the representative retrieval results, and selecting the top W1 representative retrieval results ranked by access value to feed back to the user terminal, wherein W1 is a preset value.
Further, the original keywords are keywords whose frequency of occurrence in the text corresponding to the learning resource exceeds a set threshold; the specific process of performing data cleaning on the original keywords comprises: unifying keywords with the same or similar meanings, and removing keywords without actual analysis meaning.
Further, the coverage rate is expressed as the proportion of identical codes in the two key information coding tables; the proportion = number of identical codes / code number calculation value, where the code number calculation value is the smaller of the total numbers of codes in the two coding tables.
Further, the filtering in step S3 to obtain the representative search result specifically includes:
if the coverage rate exceeds gamma%, taking the retrieval result with the larger number of codes as the representative retrieval result and rejecting the other retrieval result; if the coverage rate does not exceed gamma%, taking both retrieval results as representative retrieval results; then performing coverage rate analysis on the representative retrieval results and the remaining retrieval results, and so on, wherein gamma is a preset value.
Further, the specific working steps of the evaluation module are as follows:
marking the service score given by the user as Qs; acquiring the number of representative retrieval results consulted before the user finds the target data and marking it as Cs; calculating the time difference between the time when the user inputs the retrieval information and the time when the target data is fed back to obtain the retrieval duration GT;
calculating the retrieval satisfaction value QR of the user by using the formula QR = (Qs × r1)/(Cs × r2 + GT × r3), wherein r1, r2 and r3 are coefficient factors; the evaluation module is used for time-stamping the retrieval satisfaction value QR and storing it in the storage module, and for transmitting the retrieval satisfaction value QR to the display module for real-time display.
Furthermore, the original resource database is used for extracting the release time information of each piece of stored learning resource information and classifying the stored learning resource information according to a plurality of time periods; each storage terminal is in one-to-one correspondence with each type of learning resource information and is used for storing the learning resource information of the corresponding type.
Compared with the prior art, the invention has the beneficial effects that:
1. when a user inputs retrieval information through the information input module, the analysis module is used for tracking a login account of the user, performing statistical analysis on historical retrieval records of the user and generating a corresponding storage end sequence table;
2. the information receiving module is used for receiving the retrieval results of the information retrieval module and auditing and filtering them: coverage rate analysis is first performed on the key information coding tables corresponding to any two retrieval results, homologous retrieval results are screened out, and the retrieval result with the larger number of codes is kept as the representative retrieval result, so that the number of options is reduced while the user still obtains richer and more comprehensive learning resources, the user is prevented from spending time and energy on similar learning resources, and the retrieval efficiency is improved;
3. when the user logs out, the evaluation module is used for evaluating the retrieval service of the learning resources by the user, calculating the retrieval satisfaction value of the user by combining the service score of the user, the number of representative retrieval results consulted before the user finds the target data and the retrieval duration, and transmitting the retrieval satisfaction value to the display module for real-time display, so that the administrator can conveniently and visually know the retrieval satisfaction value.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic block diagram of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an information retrieval and analysis system based on big data includes a data acquisition module, an original resource database, an information input module, an information retrieval module, an analysis module, an information receiving module, a browsing module, an evaluation module, a storage module, and a display module;
the data acquisition module is used for acquiring learning resource information of the education platform to form an original resource database, and the original resource database is used for extracting the release time information of each piece of stored learning resource information and classifying the stored learning resource information according to a plurality of time periods;
the original resource database comprises a plurality of storage ends, each storage end corresponds to each type of learning resource information one by one, and each storage end is used for storing the corresponding type of learning resource information;
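The time-period classification described above can be sketched as follows. This is a minimal illustration only: the period boundaries, the resource names and the function name `assign_storage_end` are assumptions made for the example and do not come from the patent text.

```python
from bisect import bisect_right
from datetime import date

def assign_storage_end(release: date, boundaries: list) -> int:
    """Return the index of the storage end whose time period contains `release`.

    `boundaries` is sorted ascending; storage end i holds resources released in
    [boundaries[i-1], boundaries[i]), with open-ended first and last periods.
    """
    return bisect_right(boundaries, release)

# Hypothetical period boundaries and learning resources.
boundaries = [date(2019, 1, 1), date(2020, 1, 1), date(2021, 1, 1)]
resources = [
    ("algebra-notes", date(2018, 6, 1)),
    ("python-course", date(2020, 3, 15)),
    ("ml-lecture", date(2021, 9, 1)),
]

# Each storage end stores exactly one class (time period) of resources.
storage_ends = {}
for name, released in resources:
    storage_ends.setdefault(assign_storage_end(released, boundaries), []).append(name)
```

With n boundaries there are n + 1 storage ends, so every resource falls into exactly one of them.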
the information input module is used for logging in by a user, inputting retrieval information and sending the input retrieval information to the information retrieval module; the information retrieval module is used for retrieving the learning resources according to the retrieval information; the retrieval information comprises retrieval keywords;
the information input module is connected with the analysis module, when a user inputs retrieval information through the information input module, the analysis module is used for tracking a login account of the user, performing statistical analysis on a historical retrieval record of the user to obtain a storage end sequence table of the retrieval, and the specific analysis steps are as follows:
the first step is as follows: when a user inputs retrieval information, tracking a login account of the user, and collecting retrieval records of the user in the last three months; the retrieval record comprises input retrieval information and corresponding target data; one piece of retrieval information corresponds to one or more retrieval results, and a user selects a required retrieval item, namely target data;
the second step: acquiring the storage end where each target datum is located, counting the number of times each storage end occurs and marking it as the storage end frequency Pi; wherein i represents the ith storage end;
accumulating the browsing time of the target data in the same storage end to obtain the total storage end duration Ti; normalizing the storage end frequency and the total storage end duration and taking their numerical values;
calculating the retrieval attraction value Gi of the storage end by using the formula Gi = Pi × a1 + Ti × a2, wherein a1 and a2 are coefficient factors;
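The second step can be sketched as below. The patent does not state how the normalization is done or which coefficient values are used, so dividing by the maximum and the values a1 = 0.6, a2 = 0.4 are assumptions for illustration, as is the function name `attraction_values`.

```python
from collections import defaultdict

def attraction_values(records, a1=0.6, a2=0.4):
    """Compute Gi = Pi * a1 + Ti * a2 for every storage end.

    `records` is an iterable of (storage_end_id, browse_seconds), one entry per
    target datum found in the user's retrieval records of the last three months.
    Pi and Ti are normalized by their maxima (an assumed normalization).
    """
    freq = defaultdict(int)      # Pi: how often each storage end occurs
    total = defaultdict(float)   # Ti: accumulated browsing time per storage end
    for end_id, seconds in records:
        freq[end_id] += 1
        total[end_id] += seconds
    if not freq:
        return {}
    max_p = max(freq.values())
    max_t = max(total.values()) or 1.0  # guard against all-zero browsing time
    return {i: (freq[i] / max_p) * a1 + (total[i] / max_t) * a2 for i in freq}

# Hypothetical history: two target data in storage end "A", one in "B".
g = attraction_values([("A", 120), ("A", 60), ("B", 360)])
```

In this example storage end "A" is visited more often but "B" is browsed for longer, so the two coefficient factors decide which one ends up with the higher attraction value.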
the third step: segmenting the retrieval information into keywords and matching the currently input retrieval information against the retrieval information input historically; if the overlap of the keywords exceeds mu%, the matching is successful; wherein mu is a preset value and takes a value of 95;
acquiring all retrieval results fed back for the historical retrieval information matched with the current retrieval information; counting, according to the storage end to which each retrieval result belongs, the proportion of these retrieval results distributed at each storage end, and marking it as the storage end occupation ratio Zi; wherein Zi is in one-to-one correspondence with Gi;
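The third step might look like the following sketch. The patent does not define how the keyword overlap is measured, so the Jaccard ratio used here is an assumption, as are the names `overlap` and `storage_end_ratios` and the sample history.

```python
from collections import Counter

def overlap(a: set, b: set) -> float:
    """Keyword overlap as a Jaccard ratio in [0, 1] (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def storage_end_ratios(current_keywords: set, history, mu: float = 95.0) -> dict:
    """Compute Zi, the share of matched historical results held by each storage end.

    `history` is a list of (keyword_set, [storage_end_id for each returned result]).
    Only historical queries whose overlap with the current query exceeds mu% count.
    """
    counts = Counter()
    for keywords, result_ends in history:
        if overlap(current_keywords, keywords) * 100 > mu:
            counts.update(result_ends)
    n = sum(counts.values())
    return {end: c / n for end, c in counts.items()} if n else {}

# Hypothetical history: the first query matches, the second does not.
history = [
    ({"linear", "algebra"}, ["A", "A", "B", "C"]),
    ({"linear", "regression"}, ["B", "B"]),
]
z = storage_end_ratios({"linear", "algebra"}, history)
```

With mu = 95 only near-identical historical queries contribute, so a user without closely matching history gets an empty ratio table and the ordering would have to rely on Gi alone.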
the fourth step: normalizing the retrieval attraction value and the storage end occupation ratio and taking their numerical values;
using the formula (published only as image BDA0003287370800000071, not reproduced in this text), calculating a retrieval priority value JSi of the storage end in the current retrieval, wherein f1 and f2 are preset coefficient factors and eta is a fixed value;
sorting the storage ends according to the size of the retrieval priority value JSi to obtain a storage end sequence table of the retrieval;
the analysis module is used for feeding back the storage end sequence table searched at this time to the information retrieval module, and after the information retrieval module receives the storage end sequence table, the information retrieval module sequentially retrieves the learning resources stored in the corresponding storage end by combining the retrieval information input by the current user;
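A sketch of how the storage end sequence table could drive the retrieval is given below. Because the formula for JSi is not reproduced in this text, the priority values are taken as given inputs; descending order, the keyword-intersection match and the optional `limit` early stop are assumptions made for illustration.

```python
def sequence_table(priority: dict) -> list:
    """Order storage end ids by retrieval priority value JSi, largest first (assumed)."""
    return sorted(priority, key=priority.get, reverse=True)

def retrieve_in_order(query_keywords: set, storage_ends: dict, order: list, limit=None) -> list:
    """Search storage ends one after another in sequence-table order.

    `storage_ends` maps a storage end id to a list of resources, each a dict with
    a "title" and a "keywords" set. If `limit` is given, the search stops as soon
    as that many hits are found, so low-priority storage ends may never be touched.
    """
    hits = []
    for end_id in order:
        for resource in storage_ends.get(end_id, []):
            if query_keywords & resource["keywords"]:
                hits.append(resource)
                if limit is not None and len(hits) >= limit:
                    return hits
    return hits

# Hypothetical priority values and storage end contents.
order = sequence_table({"A": 0.3, "B": 0.9, "C": 0.5})
storage_ends = {
    "A": [{"title": "a1", "keywords": {"algebra"}}],
    "B": [{"title": "b1", "keywords": {"algebra", "matrix"}},
          {"title": "b2", "keywords": {"history"}}],
    "C": [],
}
hits = retrieve_in_order({"algebra"}, storage_ends, order)
```

Visiting the storage ends one at a time, rather than querying the whole original resource database at once, is what the description credits with reducing retrieval pressure.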
the method can sequentially retrieve the learning resources stored in the corresponding storage end according to the retrieval priority value JSi, can avoid synchronous interaction of information in the original resource database with large storage capacity, reduce the data retrieval pressure of the original resource database, improve the retrieval efficiency and avoid the waste of retrieval resources;
the information receiving module is used for receiving the retrieval result of the information retrieval module, auditing and filtering the retrieval result, and pushing the corresponding retrieval result to the user terminal; the specific auditing and filtering steps are as follows:
S1: acquiring a plurality of retrieval results of the retrieval information; extracting original keywords of the learning resources corresponding to each retrieval result, and performing data cleaning on the original keywords to obtain learning keywords;
the original keywords are keywords whose frequency of occurrence in the text corresponding to the learning resource exceeds a set threshold; the specific process of performing data cleaning on the original keywords comprises: unifying keywords with the same or similar meanings, and removing keywords without actual analysis meaning;
S2: then storing the learning keywords in a specific data format as key information, and establishing a key information coding table of the learning resources, wherein each learning keyword in the key information corresponds to one binary code;
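Steps S1 and S2 can be sketched as follows. The synonym map, the list of meaningless keywords, the frequency threshold and the fixed-width binary encoding are all hypothetical choices; the patent only requires that each learning keyword corresponds to one binary code.

```python
from collections import Counter

# Hypothetical cleaning resources, not taken from the patent.
SYNONYMS = {"ml": "machine learning", "machine-learning": "machine learning"}
MEANINGLESS = {"introduction", "chapter"}

def learning_keywords(tokens, threshold=2):
    """S1: keep tokens occurring more than `threshold` times, then clean them."""
    counts = Counter(tokens)
    original = [word for word, count in counts.items() if count > threshold]
    unified = {SYNONYMS.get(word, word) for word in original}  # merge same-meaning keywords
    return unified - MEANINGLESS                               # drop keywords without analysis meaning

class CodeBook:
    """S2: assigns every distinct learning keyword one fixed-width binary code."""

    def __init__(self, width=16):
        self.width = width
        self.codes = {}

    def encode(self, keyword):
        if keyword not in self.codes:
            self.codes[keyword] = format(len(self.codes), f"0{self.width}b")
        return self.codes[keyword]

    def table(self, keywords):
        """Key information coding table of one learning resource."""
        return {self.encode(k) for k in sorted(keywords)}

tokens = ["ml", "ml", "machine-learning", "machine-learning",
          "chapter", "chapter", "python", "python", "rare"]
keywords = learning_keywords(tokens, threshold=1)
book = CodeBook(width=4)
coding_table = book.table(keywords)
```

Sharing one `CodeBook` across all retrieval results ensures the same keyword always maps to the same code, which the coverage comparison in S3 depends on.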
S3: performing coverage rate analysis on the key information coding tables corresponding to any two retrieval results, and if the coverage rate exceeds gamma%, regarding the two retrieval results as homologous retrieval results, wherein gamma is a preset value and takes a value of 97;
for homologous retrieval results, counting the number of codes in the key information coding table corresponding to each retrieval result, selecting the retrieval result with the larger number of codes as the representative retrieval result, and rejecting the other retrieval result; if the coverage rate does not exceed gamma%, taking both retrieval results as representative retrieval results; then performing coverage rate analysis on the representative retrieval results and the remaining retrieval results, and so on;
wherein the coverage rate is expressed as the proportion of identical codes in the two key information coding tables; the proportion = number of identical codes / code number calculation value, where the code number calculation value is the smaller of the total numbers of codes in the two coding tables;
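The coverage rate and the S3 filtering can be sketched as below. The coverage formula follows the definition above; the single-pass greedy loop and the tie-breaking rule (an already kept result wins when the code counts are equal) are assumptions, since the patent only says the comparison continues "and so on".

```python
def coverage(table_a: set, table_b: set) -> float:
    """Coverage rate in percent: identical codes / smaller total code count."""
    base = min(len(table_a), len(table_b))
    return 100.0 * len(table_a & table_b) / base if base else 0.0

def representative_results(tables: dict, gamma: float = 97.0) -> list:
    """Keep one representative per group of homologous retrieval results.

    `tables` maps a retrieval result id to its key information coding table
    (a set of binary code strings).
    """
    reps = []
    for rid, table in tables.items():
        similar = [r for r in reps if coverage(table, tables[r]) > gamma]
        if any(len(tables[r]) >= len(table) for r in similar):
            continue  # an equally rich or richer homologous result is already kept
        # The new result has more codes than every homologous representative.
        reps = [r for r in reps if r not in similar] + [rid]
    return reps

# Hypothetical coding tables: r1 is fully covered by the richer r2.
tables = {
    "r1": {"0001", "0010"},
    "r2": {"0001", "0010", "0011"},
    "r3": {"0100", "0101"},
}
reps = representative_results(tables)
```

Dividing by the smaller table means a short resource that is entirely contained in a longer one scores 100%, which is why the longer one is kept as the representative.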
according to the invention, the information receiving module is used for auditing and filtering the homologous retrieval results, so that a user can obtain more abundant and comprehensive learning resources, options can be reduced, the user is prevented from spending time and energy on similar learning resources, and the retrieval efficiency is improved;
S4: acquiring the representative retrieval results obtained in step S3, and performing access value analysis on them; sorting the representative retrieval results according to their access values, and selecting the top W1 representative retrieval results to feed back to the user terminal, so that the pushed results are more accurate and the retrieval efficiency is improved; wherein W1 is a preset value;
the access value is acquired as follows:
S31: acquiring the access information of the representative retrieval result within the ten days before the current time of the system; the access information comprises the access object and the access time;
S32: counting the number of visitors of the representative retrieval result according to the access objects and marking it as R1;
sorting the access times of the representative retrieval result in chronological order, and calculating the time difference between adjacent access times to obtain the single access intervals;
summing all the single access intervals and averaging to obtain the access interval mean value Gz;
calculating the time difference between the latest access time and the current time of the system to obtain the buffer duration HT; normalizing the number of visitors, the access interval mean value and the buffer duration and taking their numerical values;
calculating the access value FW of the representative retrieval result by using the formula FW = (R1 × b1)/(Gz × b2 + HT × b3), wherein b1, b2 and b3 are coefficient factors;
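The access value computation and the top-W1 selection can be sketched as follows. Measuring Gz and HT in hours, using the full window length as Gz when there is only one access, and the default coefficients of 1.0 are assumptions made so the example is runnable; the patent leaves normalization and coefficients open.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=10)  # access information from the ten days before "now"

def access_value(accesses, now, b1=1.0, b2=1.0, b3=1.0) -> float:
    """Compute FW = (R1 * b1) / (Gz * b2 + HT * b3) for one representative result.

    `accesses` is a list of (visitor_id, access_time) pairs.
    """
    recent = sorted((t, v) for v, t in accesses if now - WINDOW <= t <= now)
    if not recent:
        return 0.0                                   # never accessed in the window
    r1 = len({v for _, v in recent})                 # number of distinct visitors
    times = [t for t, _ in recent]
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    gz = sum(gaps) / len(gaps) if gaps else WINDOW.total_seconds() / 3600
    ht = (now - times[-1]).total_seconds() / 3600    # buffer duration since last access
    denom = gz * b2 + ht * b3
    return (r1 * b1) / denom if denom else float("inf")

def top_w1(values: dict, w1: int) -> list:
    """Ids of the W1 representative results with the highest access value."""
    return sorted(values, key=values.get, reverse=True)[:w1]

# Hypothetical access log for one representative retrieval result.
now = datetime(2021, 9, 30, 12, 0)
log = [
    ("u1", now - timedelta(hours=4)),
    ("u2", now - timedelta(hours=2)),
    ("u1", now - timedelta(hours=1)),
    ("u3", now - timedelta(days=20)),  # outside the ten-day window, ignored
]
fw = access_value(log, now)
```

Many visitors, short gaps between accesses and a recent last access all push FW up, so frequently and recently used resources are pushed to the user first.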
the browsing module is used for the user terminal to select the retrieval result for reference until the target data is found; feeding target data back to the information retrieval module;
when the user logs out, the evaluation module is used for evaluating the retrieval service of the learning resource by the user, and the evaluation rule is as follows: scoring the retrieval service, wherein the full score is 100; the specific working steps of the evaluation module are as follows:
marking the service score of the user as Qs, acquiring the number of representative retrieval results consulted before the user finds the target data, and marking the representative retrieval results as Cs;
calculating the time difference between the time when the user inputs the retrieval information and the time when the target data is fed back to obtain a retrieval time GT;
normalizing the service score, the number of representative retrieval results and the retrieval duration and taking their numerical values; calculating the retrieval satisfaction value QR of the user by using the formula QR = (Qs × r1)/(Cs × r2 + GT × r3), wherein r1, r2 and r3 are coefficient factors; the smaller Cs and GT are, the faster the user finds the target data, the higher the retrieval efficiency, and the higher the retrieval satisfaction value of the user;
the evaluation module is used for time-stamping the retrieval satisfaction value QR and storing it in the storage module, and for transmitting the retrieval satisfaction value QR to the display module for real-time display.
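The evaluation step can be sketched as below. Scaling the score by the full mark of 100, expressing GT in minutes and the default coefficients of 1.0 are assumptions for illustration; the `Evaluation` record and the function name `satisfaction` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Evaluation:
    value: float          # retrieval satisfaction value QR
    stamped_at: datetime  # time stamp stored together with QR

def satisfaction(qs, cs, gt_seconds, r1=1.0, r2=1.0, r3=1.0, now=None) -> Evaluation:
    """Compute QR = (Qs * r1) / (Cs * r2 + GT * r3) and time-stamp it.

    qs: service score given by the user (full score 100).
    cs: number of representative results consulted before the target was found.
    gt_seconds: time from entering the query to feeding back the target data.
    """
    qs_n = qs / 100.0          # assumed normalization of the score
    gt_min = gt_seconds / 60.0 # assumed unit for the retrieval duration
    denom = cs * r2 + gt_min * r3
    if denom <= 0:
        raise ValueError("Cs and GT must not both be zero")
    return Evaluation(qs_n * r1 / denom, now or datetime.now())

# Hypothetical session: score 90, three results consulted, two minutes elapsed.
qr = satisfaction(90, 3, 120)
```

Because Cs and GT sit in the denominator, a user who finds the target after fewer results or in less time yields a higher QR for the same score.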
All of the above formulas are calculated on dimensionless numerical values; each formula was obtained by collecting a large amount of data and fitting it by software simulation so that it is as close as possible to the real situation, and the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained by simulation on a large amount of data.
The working principle of the invention is as follows:
when the information retrieval and analysis system is in operation and a user inputs retrieval information through the information input module, the analysis module tracks the login account of the user and performs statistical analysis on the user's historical retrieval records: it obtains the retrieval attraction value of each storage end from the distribution of the target data over the storage ends, segments the retrieval information to obtain the historical retrieval results corresponding to it, obtains the proportion of these retrieval results distributed at each storage end, and combines the retrieval attraction value and the storage end occupation ratio to obtain the retrieval priority value of each storage end in the current retrieval, from which the corresponding storage end sequence table is generated; after the information retrieval module receives the storage end sequence table, it sequentially retrieves the learning resources stored in the corresponding storage ends in combination with the retrieval information input by the current user;
the information receiving module receives the retrieval results of the information retrieval module, audits and filters them, and pushes the corresponding retrieval results to the user terminal: it first extracts the original keywords of the learning resources corresponding to each retrieval result and performs data cleaning on the original keywords to obtain the learning keywords; it then stores the learning keywords in a specific data format as key information and establishes a key information coding table of the learning resources; it performs coverage rate analysis on the key information coding tables corresponding to any two retrieval results to obtain the representative retrieval results; it then performs access value analysis on the representative retrieval results, sorts them according to their access values, and selects the top W1 representative retrieval results to feed back to the user terminal;
the browsing module is used for the user terminal to select the retrieval result for reference until the target data is found; feeding target data back to the information retrieval module; when the user logs out, the evaluation module is used for evaluating the retrieval service of the learning resources by the user, calculating the retrieval satisfaction value of the user by combining the service score of the user, the number of representative retrieval results consulted before the user finds the target data and the retrieval time length, and transmitting the retrieval satisfaction value to the display module for real-time display, so that the administrator can conveniently and visually know the retrieval satisfaction value.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (8)

1. An information retrieval and analysis system based on big data is characterized by comprising a data acquisition module, an information input module, an analysis module, an information receiving module, a browsing module and an evaluation module;
the data acquisition module is used for acquiring learning resource information of an education platform to form an original resource database, and the original resource database comprises a plurality of storage ends;
the information input module is used for logging in by a user, inputting retrieval information and sending the input retrieval information to the information retrieval module; when a user inputs retrieval information through the information input module, the analysis module is used for tracking a login account of the user, performing statistical analysis on a historical retrieval record of the user to obtain a storage end sequence table of the retrieval, and feeding the storage end sequence table back to the information retrieval module;
after the information retrieval module receives the storage end sequence table, the learning resources stored in the corresponding storage end are sequentially retrieved by combining the retrieval information input by the current user;
the information receiving module is used for receiving the retrieval result of the information retrieval module, auditing and filtering the retrieval result, and pushing the corresponding retrieval result to the user terminal; the browsing module is used for the user terminal to select the retrieval result for looking up until the target data is found, and feeding the target data back to the information retrieval module; and when the user logs out, the evaluation module is used for evaluating the retrieval service of the learning resources by the user.
2. The big data-based information retrieval and analysis system according to claim 1, wherein the analysis module comprises the following specific analysis steps:
when a user inputs retrieval information, tracking a login account of the user, and collecting retrieval records of the user in the last three months; the retrieval records carry corresponding target data;
acquiring a storage end where each target data is located, and counting the occurrence times of the same storage end and the total browsing time of the target data in the same storage end; calculating to obtain a retrieval attraction value Gi of the storage end;
acquiring all retrieval results fed back by historical retrieval information matched with the current retrieval information; counting the distribution proportion of the corresponding retrieval results at each storage end and marking it as the storage end proportion Zi;
calculating the retrieval priority value JSi of each storage end for this retrieval using the formula
[formula image FDA0003287370790000021]
and sorting the storage ends by the magnitude of the retrieval priority value JSi to obtain the storage end sequence table of the retrieval.
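The data flow of claim 2 can be sketched roughly as follows. The formula for JSi appears in this text only as an image reference, and the formula for Gi is not given in this section, so both combining rules below (plain weighted sums with hypothetical weights a1, a2, b1, b2) are placeholder assumptions and not the formulas of the application. Only the counting, the proportion Zi, and the final sort follow the claim wording.

```python
from collections import defaultdict

def attraction_values(records, a1=1.0, a2=1.0):
    """records: iterable of (storage_end, browse_seconds), one per target datum
    found in the user's recent retrieval records.
    Returns {storage_end: Gi}. The weighted sum is a placeholder assumption."""
    counts = defaultdict(int)
    total_time = defaultdict(float)
    for end, seconds in records:
        counts[end] += 1              # occurrences of the same storage end
        total_time[end] += seconds    # total browsing time in that storage end
    return {end: a1 * counts[end] + a2 * total_time[end] for end in counts}

def storage_proportions(result_ends):
    """result_ends: storage end of each historical result matching the current query.
    Returns {storage_end: Zi}, the share of those results held by each storage end."""
    counts = defaultdict(int)
    for end in result_ends:
        counts[end] += 1
    n = len(result_ends)
    return {end: c / n for end, c in counts.items()} if n else {}

def storage_end_sequence(records, result_ends, b1=1.0, b2=1.0):
    """Sort storage ends by a priority value JSi, highest first.
    JSi = b1 * Gi + b2 * Zi is a hypothetical stand-in for the unreproduced formula."""
    g = attraction_values(records)
    z = storage_proportions(result_ends)
    ends = set(g) | set(z)
    js = {e: b1 * g.get(e, 0.0) + b2 * z.get(e, 0.0) for e in ends}
    return sorted(ends, key=lambda e: (-js[e], e))
```

Sorting in descending order of JSi is itself an assumption; the claim only states that the storage ends are sorted by the magnitude of JSi.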
3. The big data-based information retrieval and analysis system according to claim 1, wherein the specific auditing and filtering steps of the information receiving module are as follows:
S1: acquiring a plurality of retrieval results of the retrieval information; extracting the original keywords of the learning resource corresponding to each retrieval result, and performing data cleaning on the original keywords to obtain learning keywords;
S2: storing the learning keywords in a specific data format as key information, and establishing a key information coding table of the learning resource;
S3: performing coverage rate analysis on the key information coding tables corresponding to any two retrieval results, and filtering to obtain representative retrieval results;
S4: performing access value analysis on the representative retrieval results, and selecting the top W1 representative retrieval results ranked by access value to feed back to the user terminal, wherein W1 is a preset value.
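The final selection step can be illustrated with a short sketch. How the access value is computed is not stated in this section, so the values are taken here as given inputs; the function name `top_w1` and the dictionary layout are hypothetical.

```python
import heapq

def top_w1(access_values, w1):
    """access_values: {result_id: access value}.
    Returns the ids of the W1 results with the highest access values,
    ordered from highest to lowest."""
    return heapq.nlargest(w1, access_values, key=access_values.get)
```

For example, with access values 5, 9 and 7 for three representative results and W1 = 2, the results with values 9 and 7 would be fed back, in that order.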
4. The big data-based information retrieval and analysis system according to claim 3, wherein the original keywords are keywords whose frequency of occurrence in the text corresponding to the learning resource exceeds a set threshold; the specific process of performing data cleaning on the original keywords comprises: unifying keywords that have the same or similar meanings, and removing keywords that have no actual analytical meaning.
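A minimal sketch of the keyword extraction and cleaning described in claim 4 is given below. It assumes the resource text has already been split into tokens, and that a synonym map and a set of meaningless keywords are supplied by the operator; both inputs, and the function names, are hypothetical and not specified by the application.

```python
from collections import Counter

def original_keywords(tokens, threshold):
    """Keep tokens whose frequency in the resource text exceeds the threshold."""
    freq = Counter(tokens)
    return [word for word, n in freq.items() if n > threshold]

def clean_keywords(keywords, synonyms, meaningless):
    """Map each keyword to a canonical form (unifying same/similar meanings)
    and drop keywords with no analytical meaning.
    The order of first appearance is preserved and duplicates are removed."""
    cleaned = []
    for word in keywords:
        canonical = synonyms.get(word, word)
        if canonical in meaningless or canonical in cleaned:
            continue
        cleaned.append(canonical)
    return cleaned
```

The cleaned list corresponds to the learning keywords that are subsequently encoded into the key information coding table.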
5. The big data-based information retrieval and analysis system according to claim 3, wherein the coverage rate is expressed as the proportion of identical codes in the two key information coding tables: coverage rate = number of identical codes / code number calculation value, where the code number calculation value is the minimum of the total numbers of codes in the two coding tables.
6. The big-data-based information retrieval and analysis system according to claim 5, wherein the filtering in step S3 to obtain the representative retrieval result specifically includes:
if the coverage rate exceeds gamma%, taking the retrieval result with the larger number of codes as a representative retrieval result, and rejecting the other retrieval result; if the coverage rate does not exceed gamma%, taking both retrieval results as representative retrieval results; then performing coverage rate analysis on the representative retrieval results and the remaining retrieval results, and so on, wherein gamma is a preset value.
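The coverage rate of claim 5 and the filtering of claim 6 can be sketched as a single greedy pass. The sketch assumes each coding table is a collection of unique codes and that, when two overlapping results have the same number of codes, the earlier one is kept; the claims do not address ties, so that rule is an assumption, as are the function names.

```python
def coverage_rate(codes_a, codes_b):
    """Identical codes divided by the code count of the smaller table."""
    a, b = set(codes_a), set(codes_b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def representative_results(tables, gamma):
    """tables: {result_id: iterable of codes}; gamma: threshold in percent.
    Each result is compared with the current representatives in turn."""
    reps = {}
    for rid, codes in tables.items():
        codes = set(codes)
        overlapping = [o for o, oc in reps.items()
                       if coverage_rate(codes, oc) * 100 > gamma]
        # An existing representative with at least as many codes wins (tie rule assumed).
        if any(len(reps[o]) >= len(codes) for o in overlapping):
            continue
        # Otherwise the new result has more codes and replaces the ones it overlaps.
        for o in overlapping:
            del reps[o]
        reps[rid] = codes
    return list(reps)
```

Dividing by the smaller table means that a short coding table fully contained in a longer one yields a coverage rate of 100%, so the shorter result is the one rejected.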
7. The big data-based information retrieval and analysis system according to claim 1, wherein the evaluation module specifically works as follows:
marking the service score given by the user as Qs; acquiring the number of representative retrieval results consulted before the user finds the target data, and marking this number as Cs; calculating the time difference between the time when the user inputs the retrieval information and the time when the target data is fed back to obtain the retrieval duration GT;
calculating the retrieval satisfaction value QR of the user using the formula QR = (Qs × r1) / (Cs × r2 + GT × r3), wherein r1, r2 and r3 are coefficient factors; the evaluation module is used for time-stamping the retrieval satisfaction value QR, storing it in the storage module, and transmitting it to the display module for real-time display.
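The satisfaction formula of claim 7 translates directly into code. In the sketch below the default coefficient values of 1.0 and the guard against a non-positive denominator are assumptions added for illustration; the application does not state values for r1, r2 and r3.

```python
def retrieval_satisfaction(qs, cs, gt, r1=1.0, r2=1.0, r3=1.0):
    """QR = (Qs * r1) / (Cs * r2 + GT * r3).
    qs: service score, cs: number of representative results consulted,
    gt: retrieval duration (units are left to the caller)."""
    denominator = cs * r2 + gt * r3
    if denominator <= 0:
        raise ValueError("Cs * r2 + GT * r3 must be positive")
    return (qs * r1) / denominator
```

QR rises with the service score and falls as the user consults more results or takes longer to find the target data.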
8. The big data-based information retrieval and analysis system according to claim 1, wherein the original resource database is configured to extract the release time information of each piece of stored learning resource information and to classify the stored learning resource information according to a plurality of time periods; the storage ends correspond one-to-one with the classes of learning resource information, and each storage end stores the learning resource information of its corresponding class.
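The time-period classification of claim 8 can be sketched as a bucketing step. The period boundaries, the half-open convention (a resource released exactly on a boundary falls into the later period), and the function names are assumptions; the application does not specify how the time periods are delimited.

```python
from bisect import bisect_right
from datetime import date

def assign_storage_end(release_date, boundaries):
    """boundaries: sorted list of dates splitting the timeline into
    len(boundaries) + 1 periods. Returns the period index, which is used
    here as the index of the storage end."""
    return bisect_right(boundaries, release_date)

def build_storage_ends(resources, boundaries):
    """resources: iterable of (name, release_date).
    Returns one list of resource names per storage end."""
    ends = [[] for _ in range(len(boundaries) + 1)]
    for name, released in resources:
        ends[assign_storage_end(released, boundaries)].append(name)
    return ends
```

With boundaries at 2020-01-01 and 2021-01-01, for instance, three storage ends result: resources released before 2020, during 2020, and from 2021 onward.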
CN202111151682.2A 2021-09-29 2021-09-29 Big data-based information retrieval and analysis system Pending CN113868235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111151682.2A CN113868235A (en) 2021-09-29 2021-09-29 Big data-based information retrieval and analysis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111151682.2A CN113868235A (en) 2021-09-29 2021-09-29 Big data-based information retrieval and analysis system

Publications (1)

Publication Number Publication Date
CN113868235A true CN113868235A (en) 2021-12-31

Family

ID=78992827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111151682.2A Pending CN113868235A (en) 2021-09-29 2021-09-29 Big data-based information retrieval and analysis system

Country Status (1)

Country Link
CN (1) CN113868235A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238588A (en) * 2022-02-24 2022-03-25 江西医之健科技有限公司 Data retrieval method, system, readable storage medium and computer equipment
CN114238588B (en) * 2022-02-24 2022-06-17 江西医之健科技有限公司 Data retrieval method, system, readable storage medium and computer device
CN114722179A (en) * 2022-04-26 2022-07-08 国信专达(杭州)科技有限公司 Retrieval analysis and data fusion method based on information tracing
CN114722179B (en) * 2022-04-26 2023-07-04 国信专达(杭州)科技有限公司 Retrieval analysis and data fusion method based on information tracing
CN114741609A (en) * 2022-05-13 2022-07-12 北京思源智通科技有限责任公司 Teaching resource retrieval system
CN115600011A (en) * 2022-11-30 2023-01-13 大能手教育科技(北京)有限公司(Cn) Educational resource pairing method, system and storage medium based on distribution algorithm
CN116719954A (en) * 2023-08-04 2023-09-08 中国人民解放军海军潜艇学院 Information retrieval method, electronic equipment and storage medium
CN116719954B (en) * 2023-08-04 2023-10-17 中国人民解放军海军潜艇学院 Information retrieval method, electronic equipment and storage medium
CN117573727A (en) * 2024-01-17 2024-02-20 湖南天承信息技术有限公司 Practitioner health physical examination information retrieval system
CN117573727B (en) * 2024-01-17 2024-03-26 湖南天承信息技术有限公司 Practitioner health physical examination information retrieval system

Similar Documents

Publication Publication Date Title
CN113868235A (en) Big data-based information retrieval and analysis system
CN108363821A (en) A kind of information-pushing method, device, terminal device and storage medium
CN109299271B (en) Training sample generation method, text data method, public opinion event classification method and related equipment
CN100545847C (en) A kind of method and system that blog articles is sorted
CN101814083A (en) Automatic webpage classification method and system
CN105069102A (en) Information push method and apparatus
CN104077407A (en) System and method for intelligent data searching
CN111191111A (en) Content recommendation method, device and storage medium
CN111723256A (en) Government affair user portrait construction method and system based on information resource library
CN112632405A (en) Recommendation method, device, equipment and storage medium
CN113297457A (en) High-precision intelligent information resource pushing system and pushing method
CN105512300B (en) information filtering method and system
CN113157867A (en) Question answering method and device, electronic equipment and storage medium
CN110795613A (en) Commodity searching method, device and system and electronic equipment
CN111753526A (en) Similar competitive product data analysis method and system
US20120239657A1 (en) Category classification processing device and method
CN108509449B (en) Information processing method and server
CN110175289B (en) Mixed recommendation method based on cosine similarity collaborative filtering
JP4667889B2 (en) Data map creation server and data map creation program
CN111104422B (en) Training method, device, equipment and storage medium of data recommendation model
CN117171650A (en) Document data processing method, system and medium based on web crawler technology
CN114528448B (en) Accurate analytic system of drawing of portrait of global foreign trade customer
CN115510202A (en) Intelligent question-answering system based on power grid equipment knowledge graph
CN111078972B (en) Questioning behavior data acquisition method, questioning behavior data acquisition device and server
CN113971213A (en) Smart city management public information sharing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220829

Address after: No. 302, Unit 3, Building 15, Zone C, Jiying Garden, Jiying Street, Economic and Technological Development Zone, Kaifeng City, Henan Province, 475000

Applicant after: Zhu Zhihui

Address before: 518100 a309, building A1, creative Town, No. 29, Nanxin street, Nanling village community, Nanwan street, Longgang District, Shenzhen, Guangdong

Applicant before: Shenzhen Lianyin intercommunication Information Co.,Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211231