CN112269816A - Government affair appointment event correlation retrieval method - Google Patents

Government affair appointment event correlation retrieval method

Info

Publication number
CN112269816A
CN112269816A CN202011244701.1A CN202011244701A CN112269816A CN 112269816 A CN112269816 A CN 112269816A CN 202011244701 A CN202011244701 A CN 202011244701A CN 112269816 A CN112269816 A CN 112269816A
Authority
CN
China
Prior art keywords
index
retrieval
service
appointment
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011244701.1A
Other languages
Chinese (zh)
Other versions
CN112269816B (en)
Inventor
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202011244701.1A
Publication of CN112269816A
Application granted
Publication of CN112269816B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 Fuzzy queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Automation & Control Theory (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a correlation retrieval method for government affair appointment matters, belonging to the technical field of government affair appointments. A correlation-type index is generated by a scheduled task that summarizes user operation records, and a normal-type index is generated through basic information maintenance. The correlation-type index is generated with a score-based statistical model, so that appointment service items are retrieved through a combination of keyword retrieval, associated-word retrieval, and ranking by the relevance between keywords and appointment services. When members of the public book services online, the invention can accurately identify what the applicant needs and display related services, improving appointment efficiency; its data-driven statistical analysis also continually improves query accuracy, and with it performance and user experience.

Description

Government affair appointment event correlation retrieval method
Technical Field
The invention relates to the technical field of government affair appointments, and in particular to a correlation retrieval method for government affair appointment matters.
Background
As government services continue to develop and the mobile internet enters a new stage, online appointment booking through web pages, apps, and mini programs gives the public a simple, convenient, and efficient way to handle government affairs. Demand for intelligent handling is nevertheless growing: more and more people want an intelligent, personalized, and accurate process. To move government service capability from "can be handled" and "handled quickly" to "handled intelligently", the service model must change further: online services should apply statistical analysis of user data to raise the keyword hit rate and strengthen government service governance.
Disclosure of Invention
The technical task of the invention is to provide a correlation retrieval method for government affair appointment matters. When a member of the public books a service online, the method can accurately identify the applicant's needs and display related services, improving appointment efficiency; its data-driven statistical analysis can also continually improve query accuracy, and with it performance and user experience.
The technical scheme adopted by the invention to solve the technical problem is as follows:
a correlation retrieval method for government affair appointment matters, in which a correlation-type index is generated by a scheduled task that summarizes user operation records, and a normal-type index is generated through basic information maintenance;
the correlation-type index is generated with a score-based statistical model, so that appointment service items are retrieved through a combination of keyword retrieval, associated-word retrieval, and ranking by the relevance between keywords and appointment services. In multi-channel online appointment scenarios, this enables fast item search together with analysis and recommendation of related services, and supports accurate prediction, intelligent analysis, and recommendation of what applicants need, meeting the requirements of personalization, intelligence, and high accuracy while reducing database search pressure and improving efficiency for the public.
The method can accurately identify the applicant's needs and display related services when members of the public book services online, improving appointment efficiency; its data-driven statistical analysis can also continually improve query accuracy, and with it performance and user experience.
Preferably, the Elasticsearch search engine and the Chinese IK tokenizer are selected for retrieval. Elasticsearch is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time and holds the leading share of the open-source search field; the Chinese IK tokenizer extracts keywords accurately. A business analysis and retrieval method can therefore be built on Elasticsearch.
The aim is to replace simple database retrieval with a search engine, for which Elasticsearch is a good choice. Elasticsearch is an open-source, distributed, RESTful search and data analysis engine built on the open-source library Apache Lucene. As a distributed full-text retrieval engine it scales well and supports PB-scale structured or unstructured data, so it is well suited to quickly locating appointment item services in large, centrally deployed systems with very large data volumes.
Elasticsearch offers many high-quality tokenizers; the Chinese IK tokenizer is chosen here. It provides two segmentation modes, ik_smart and ik_max_word. To locate the user's target data as fully as possible, the finest-grained mode, ik_max_word, is used: it segments text semantically at multiple levels, so more index entries are created and positioning is more precise.
Preferably, RocketMQ is used as the message queue to decouple the normal business flow from result recording.
To achieve higher search precision, the user's search terms and search results must be collected and aggregated, then wait for a subsequent scheduled task to pull the data into the analysis model service. A message queue provides asynchronous decoupling so that the normal business flow is not affected. RocketMQ is a good choice: it provides a transactional message solution, which ensures that each result set is consumed and stored correctly.
Preferably, the correlation type index includes three index types; that is, the retrieval result set is composed of results from three indexes plus a database SQL query. The three index types are:
N type (Normal): an index that associates each keyword obtained by segmenting the appointment item service name with that appointment item service;
C type (Correlation): an index of relevance patterns, summarized periodically from user feedback and triggered-behavior logs;
R type (Related): an appointment item service index reached through the associated words of a keyword.
The three index types differ in importance. Search results are ordered by relevance in the sequence C-type results, N-type results, R-type results, and then deduplicated, so appointment items are displayed in relevance order and search reliability improves.
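The ordering and deduplication just described can be sketched as follows. This is a minimal illustration and not the patent's implementation: the `Hit` record, its field names, and the choice to break ties within a type by descending relevance value are assumptions.

```python
from dataclasses import dataclass

# Display order stated in the text: C-type first, then N-type, then R-type.
TYPE_PRIORITY = {"C": 0, "N": 1, "R": 2}

@dataclass(frozen=True)
class Hit:
    item_id: str        # business item ID (hypothetical field name)
    index_type: str     # "C", "N" or "R"
    relevance: int = 0  # relevance value; assumed to matter mainly for C-type hits

def merge_hits(hits):
    """Order hits by index type, then by descending relevance, keeping one hit per item."""
    ordered = sorted(hits, key=lambda h: (TYPE_PRIORITY[h.index_type], -h.relevance))
    seen, result = set(), []
    for hit in ordered:
        if hit.item_id not in seen:   # deduplicate: keep the highest-priority hit per item
            seen.add(hit.item_id)
            result.append(hit)
    return result

hits = [
    Hit("item-2", "N"),
    Hit("item-1", "R"),
    Hit("item-1", "C", relevance=140),
    Hit("item-3", "C", relevance=220),
]
ranked = merge_hits(hits)
```

Here `item-1` appears under both the C-type and R-type indexes; only its C-type hit survives, because that type is ranked first.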
Furthermore, each record in the retrieval result set contains the business item name, business item ID, business item department, keyword, index type, and index ID; the overall result set also carries a UUID for the retrieval, which provides data for subsequent user log collection and recall records.
Preferably, basic information maintenance generates the normal type index. When the management service maintains business items, additions, modifications, and deletions all affect the basic index (i.e. the N-type index):
after an appointment item is added, its service name is segmented, and each keyword together with the appointment item ID forms index data that is stored in the ES service;
after an appointment item is modified, the original basic index is deleted by item ID and a new basic index is regenerated;
after an appointment item is deleted, the original basic index is deleted by item ID, and the other two types of index are also deleted by item ID, which keeps the data accurate.
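The three maintenance rules above can be sketched with an in-memory stand-in for the ES service. The class name, the dictionary layout, and the whitespace tokenizer used in the demo are all assumptions made for illustration; a real deployment would segment names with the IK tokenizer and write to Elasticsearch.

```python
class BasicIndexStore:
    """In-memory stand-in for the ES service (hypothetical, for illustration only)."""

    def __init__(self, tokenize):
        self.tokenize = tokenize                  # tokenizer callable (assumed interface)
        self.n_index = {}                         # N-type: keyword -> set of item IDs
        self.other_indexes = {"C": {}, "R": {}}   # item ID -> entries (details omitted)

    def add_item(self, item_id, name):
        # New item: segment the service name and index each keyword with the item ID.
        for keyword in self.tokenize(name):
            self.n_index.setdefault(keyword, set()).add(item_id)

    def _drop_basic(self, item_id):
        # Remove every N-type entry that points at this item.
        for keyword in list(self.n_index):
            self.n_index[keyword].discard(item_id)
            if not self.n_index[keyword]:
                del self.n_index[keyword]

    def modify_item(self, item_id, new_name):
        # Modified item: delete the old basic index, then regenerate it.
        self._drop_basic(item_id)
        self.add_item(item_id, new_name)

    def delete_item(self, item_id):
        # Deleted item: drop the basic index and the other two index types as well.
        self._drop_basic(item_id)
        for index in self.other_indexes.values():
            index.pop(item_id, None)

store = BasicIndexStore(tokenize=str.split)
store.add_item("item-1", "vehicle annual inspection")
store.modify_item("item-1", "vehicle registration")
```

Regenerating the index on modification, rather than patching it, keeps stale keywords from the old name out of the index.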
Preferably, log collection on the user's operation records comprises:
after the client's search request has been processed, each record of the search result set is assembled and put into the message queue, and the log service, as the message consumer, records the log information; the assembled log fields are the search term, keyword, ES index ID, item ID, index type, and the UUID of the retrieval;
after the client user obtains the retrieval result, clicking and browsing a piece of retrieved information forms a click-positioning recall log, and the client sends the data to the log service for storage; repeated clicks are recorded only once to avoid distorting the analysis data; the log contains the retrieval UUID, index ID, item ID, keyword, index type, and search term;
after the client user obtains the retrieval result, clicking and browsing a piece of retrieved information and then successfully completing the service forms a successful-transaction recall log, and the client sends the data to the log service for storage; the log contains the retrieval UUID, index ID, index type, item ID, keyword, and search term;
the three logs collected for one retrieval share the same retrieval UUID and are treated as one group of retrieval process logs; after each enters the log service, the group waits for a scheduled task scan and is then sent to the analysis model service for analysis.
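Grouping the three logs by retrieval UUID can be sketched as below. The record layout and the `kind` labels ("search", "click", "transact") are hypothetical names introduced here; the text only specifies that logs sharing a UUID form one group.

```python
from collections import defaultdict

def group_by_search(logs):
    """Group log records by retrieval UUID, preserving arrival order within a group."""
    groups = defaultdict(list)
    for log in logs:
        groups[log["uuid"]].append(log)
    return dict(groups)

logs = [
    {"uuid": "u1", "kind": "search", "keyword": "vehicle", "item_id": "item-1"},
    {"uuid": "u1", "kind": "click", "keyword": "vehicle", "item_id": "item-1"},
    {"uuid": "u2", "kind": "search", "keyword": "passport", "item_id": "item-9"},
    {"uuid": "u1", "kind": "transact", "keyword": "vehicle", "item_id": "item-1"},
]
groups = group_by_search(logs)
```

A scheduled task could then hand each group to the analysis step as one unit.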
Preferably, the relevance index is generated as follows:
a scheduled task in the log service scans the collected logs, packages the three logs by UUID, and sends them to the analysis model service, which generates the relevance index. Relevance is calculated with a numerical statistical rule: each newly generated relevance index has a relevance field that defaults to 100 and ranges from 0 to 1000.
The relevance index contains the keyword, the corresponding basic index ID, and the relevance value; in essence it is a many-to-many mapping between keywords and basic indexes with a relevance value attached, and when this index type is searched, the basic index ID stored in it ultimately points to the basic index. Different step values are set for different log types: browse recall (+1), transaction recall (+2), and missed recall (-1). The relevance value is adjusted accordingly, and indexes whose relevance value is 0 are scanned for and deleted periodically. The aim is for the analysis service module to keep correcting the relevance value of each relevance index and so raise the hit rate;
although the logs carry different index types, every log carries two pieces of information by default, the keyword and the basic index ID, so processing is much the same in each case: check whether a relevance index exists; if so, adjust its value by the step rule; if not, generate a new index by the generation rule;
as the model is continually calibrated on a large amount of data, the relevance values of all keywords of one appointment service item come to approximately follow a normal distribution, and the relevance value determines display priority.
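The step rule can be sketched as follows, using the default (100), range (0 to 1000), and step values (+1, +2, -1) stated above. Two points are assumptions: that a newly created entry starts at the default without a step applied, and that values are clamped to the range; the function and key names are hypothetical.

```python
DEFAULT_RELEVANCE = 100
MIN_RELEVANCE, MAX_RELEVANCE = 0, 1000
STEP = {"browse": +1, "transact": +2, "miss": -1}   # log type -> step value

def apply_log(index, keyword, basic_index_id, log_type):
    """Create or adjust one relevance entry keyed by (keyword, basic index ID)."""
    key = (keyword, basic_index_id)
    if key not in index:
        # Assumption: a new entry is created at the default value, with no step applied.
        index[key] = DEFAULT_RELEVANCE
        return index[key]
    value = index[key] + STEP[log_type]
    index[key] = max(MIN_RELEVANCE, min(MAX_RELEVANCE, value))   # keep within 0..1000
    return index[key]

def purge_zero(index):
    """Scheduled cleanup: drop entries whose relevance has fallen to 0."""
    for key in [k for k, v in index.items() if v == MIN_RELEVANCE]:
        del index[key]

relevance = {}
apply_log(relevance, "vehicle", "n-1", "browse")     # entry created at 100
apply_log(relevance, "vehicle", "n-1", "browse")     # 100 + 1
apply_log(relevance, "vehicle", "n-1", "transact")   # 101 + 2
apply_log(relevance, "vehicle", "n-1", "miss")       # 103 - 1
```

Because a completed transaction moves the value twice as far as a browse, keyword-to-item pairs that lead to finished appointments rise in the ranking faster than pairs that are only viewed.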
Generating the associated word index:
the associated word index contains a keyword and its array of associated words. Its data source is the IK segmentation result, which the appointment service search interface puts into the message queue after processing the search; the analysis model service, as the consumer, processes it and generates the associated index, with the message queue providing asynchronous decoupling.
Searching with keywords and relevance alone may still give inaccurate results, so the associated word index is used to provide inference-based recommendation and improve search accuracy.
The retrieval method matches results on both keywords and associated words during fuzzy queries on search terms, raising the hit rate for the public;
based on the logs of what users browse and successfully transact after search results are shown, it provides a relevance retrieval mode driven by the recall rate and displays results by relevance weight, improving accuracy;
the recall-based relevance analysis model improves the accuracy of the association between keywords and item information;
the associated index between keywords and associated words strengthens personalized recommendation.
The invention also claims a government affair appointment matter correlation retrieval device, comprising at least one memory and at least one processor;
the at least one memory is used to store a machine-readable program;
the at least one processor is used to call the machine-readable program and execute the above method.
The invention also claims a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the above-described method.
Compared with the prior art, the correlation retrieval method for government affair appointment matters has the following beneficial effects:
because the system no longer relies entirely on database retrieval for service retrieval and fuzzy keyword search, pressure on the database in concurrent environments is reduced and concurrency capacity is improved;
the data returned by the retrieval interface comes from Elasticsearch indexes and a partial database query, which can provide millisecond-level response, reduce waiting time for the public, and improve the user experience;
statistical analysis of user operation logs, summarized by scheduled tasks, makes query results more intelligent, accurate, and personalized, replaces the original database-based fuzzy search, and improves positioning accuracy for the public.
Drawings
Fig. 1 is a flowchart of a method for retrieving the relevancy of government appointment matters according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of client and appointment server retrieval according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
In current government service scenarios, members of the public trying to find the standard item service that matches their needs face low efficiency and low accuracy when they are unfamiliar with the services, and database-based retrieval increasingly fails to meet production requirements. When service query and retrieval is based on database search, keywords can only be matched fuzzily with the database LIKE operator. With province-wide unified booking and a very large number of service items, a slight variation in the keywords entered produces different query results, and it is difficult to pinpoint the item that matches the need. The query and retrieval process therefore needs word segmentation, fuzzy query, support for queries whose wording does not correspond exactly to the item name, and millisecond-level response to improve the situation.
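The limitation of LIKE matching can be illustrated with a small sketch using SQLite from the Python standard library. The table, the item name, and the whitespace tokenization are assumptions for illustration; English words stand in for Chinese text, which would need a real tokenizer such as IK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, name TEXT)")
conn.execute("INSERT INTO items VALUES ('item-1', 'motor vehicle annual inspection')")

def like_search(term):
    """Substring match, as a database LIKE query would do."""
    rows = conn.execute("SELECT id FROM items WHERE name LIKE ?", (f"%{term}%",))
    return [row[0] for row in rows]

def token_search(term):
    """Match when any token of the term occurs among the item's tokens."""
    wanted = set(term.split())
    return [item_id for item_id, name in conn.execute("SELECT id, name FROM items")
            if wanted & set(name.split())]

exact = like_search("annual inspection")        # wording matches the stored name
reordered = like_search("inspection annual")    # same words, different order
tokenized = token_search("inspection annual")   # token overlap still finds the item
```

The reordered query returns nothing under LIKE because the substring does not occur verbatim, while the token-based lookup still finds the item. This is the kind of gap that word segmentation is meant to close.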
The embodiment of the invention provides a correlation retrieval method for government affair appointment matters. Building on the index retrieval functions of the Elasticsearch search engine and the Chinese IK tokenizer, a correlation-type index is generated by a scheduled task that summarizes user operation records, and a normal-type index is generated through basic information maintenance;
the correlation-type index is generated with a score-based statistical model, so that appointment service items are retrieved through a combination of keyword retrieval, associated-word retrieval, and ranking by the relevance between keywords and appointment services. In multi-channel online appointment scenarios, this enables fast item search together with analysis and recommendation of related services, and supports accurate prediction, intelligent analysis, and recommendation of what applicants need, meeting the requirements of personalization, intelligence, and high accuracy while reducing database search pressure and improving efficiency for the public.
The method can accurately identify the applicant's needs and display related services when members of the public book services online, improving appointment efficiency; its data-driven statistical analysis can also continually improve query accuracy, and with it performance and user experience.
Elasticsearch is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time and holds the leading share of the open-source search field; the Chinese IK tokenizer extracts keywords accurately. A business analysis and retrieval method can therefore be built on Elasticsearch.
The aim is to replace simple database retrieval with a search engine, for which Elasticsearch is a good choice. Elasticsearch is an open-source, distributed, RESTful search and data analysis engine built on the open-source library Apache Lucene. As a distributed full-text retrieval engine it scales well and supports PB-scale structured or unstructured data, so it is well suited to quickly locating appointment item services in large, centrally deployed systems with very large data volumes.
Elasticsearch offers many high-quality tokenizers; the Chinese IK tokenizer is chosen here. It provides two segmentation modes, ik_smart and ik_max_word. To locate the user's target data as fully as possible, the finest-grained mode, ik_max_word, is used: it segments text semantically at multiple levels, so more index entries are created and positioning is more precise.
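For orientation, an index definition that applies ik_max_word to the item name might look like the sketch below, written as Python dictionaries that would be serialized to JSON request bodies. The field names (`item_id`, `item_name`, `department`) and the sample text are assumptions, not taken from the patent; the `ik_max_word` analyzer is only available when the IK analysis plugin is installed in Elasticsearch.

```python
import json

# Hypothetical index body: the item name is analyzed with ik_max_word at index
# time and at search time; IDs and departments are stored as exact values.
index_body = {
    "mappings": {
        "properties": {
            "item_id": {"type": "keyword"},
            "department": {"type": "keyword"},
            "item_name": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
            },
        }
    }
}

# Body for Elasticsearch's _analyze API, which returns the tokens an analyzer
# produces; the sample text is an illustrative assumption.
analyze_request = {"analyzer": "ik_max_word", "text": "车辆年检"}

index_json = json.dumps(index_body, ensure_ascii=False, indent=2)
analyze_json = json.dumps(analyze_request, ensure_ascii=False)
```

Sending `analyze_request` to the `_analyze` endpoint is a convenient way to inspect how a given service name is segmented before any index is built.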
RocketMQ is selected as the message queue to decouple the normal business flow from result recording. To achieve higher search precision, the user's search terms and search results must be collected and aggregated, then wait for a subsequent scheduled task to pull the data into the analysis model service. A message queue provides asynchronous decoupling so that the normal business flow is not affected. RocketMQ is a good choice: it provides a transactional message solution, which ensures that each result set is consumed and stored correctly.
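The decoupling idea can be sketched with the standard library's `queue.Queue` standing in for a RocketMQ topic. This illustrates only the producer/consumer separation; it does not model RocketMQ's API or its transactional messages, and all names are hypothetical.

```python
import queue
import threading

log_queue = queue.Queue()   # stand-in for a message queue topic
stored_logs = []            # stand-in for the log service's storage

def log_consumer():
    """Log service side: consume messages until the stop sentinel arrives."""
    while True:
        message = log_queue.get()
        if message is None:          # sentinel: stop consuming
            break
        stored_logs.append(message)

def handle_search(term):
    """Search side: answer the request and enqueue the log without waiting for storage."""
    results = [{"item_id": "item-1", "keyword": term}]   # placeholder search result
    log_queue.put({"search_term": term, "results": results})
    return results

worker = threading.Thread(target=log_consumer)
worker.start()
response = handle_search("vehicle")
log_queue.put(None)   # tell the consumer to stop
worker.join()
```

The search handler returns as soon as the message is enqueued, so slow or failing log storage does not delay the response to the client.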
As used herein:
search term: the search sentence or words entered by the user at the client;
keyword: each segmentation result produced by the IK tokenizer in ik_max_word mode;
associated word: a keyword that has the same or similar search results as a given keyword is called an associated word of that keyword.
The retrieval result set consists of results from three indexes plus a database SQL query. The three index types are:
N type: "Normal", an index that associates each keyword obtained by segmenting the appointment item service name with that appointment item service;
C type: "Correlation", an index of relevance patterns, summarized periodically from user feedback and triggered-behavior logs;
R type: "Related", an appointment item service index reached through the associated words of a keyword.
The retrieval flow between the client and the appointment server is, in brief: the appointment server receives the search term entered by the user and uses the Chinese IK tokenizer in ik_max_word mode to split it into several keywords; for each keyword it calls the ES container service to obtain the three types of index results and extracts the service item primary key ID from each index; it then queries the database for the basic service information, assembles the result set, and returns it to the client, as shown in Fig. 2.
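The flow above can be sketched end to end with injected stand-ins for the tokenizer, the ES lookup, and the database query. Every function name, parameter, and field name here is hypothetical, and the in-memory dictionaries only imitate the external services.

```python
import uuid

def search(term, tokenize, lookup_index, load_items):
    """Search term -> keywords -> index hits per keyword -> item details -> result set."""
    hits = []
    for keyword in tokenize(term):
        for index_type in ("C", "N", "R"):
            for index_id, item_id in lookup_index(index_type, keyword):
                hits.append({"keyword": keyword, "index_type": index_type,
                             "index_id": index_id, "item_id": item_id})
    # One database lookup for the basic information of all matched items.
    details = load_items({hit["item_id"] for hit in hits})
    records = [{**hit, **details[hit["item_id"]]} for hit in hits
               if hit["item_id"] in details]
    # The result set carries a UUID so later click and transaction logs can refer to it.
    return {"uuid": str(uuid.uuid4()), "records": records}

# Hypothetical in-memory stand-ins for the ES service and the database.
INDEXES = {("N", "vehicle"): [("n-1", "item-1")],
           ("C", "inspection"): [("c-7", "item-1")]}
ITEMS = {"item-1": {"item_name": "vehicle annual inspection", "department": "traffic"}}

result = search(
    "vehicle inspection",
    tokenize=str.split,
    lookup_index=lambda index_type, keyword: INDEXES.get((index_type, keyword), []),
    load_items=lambda ids: {i: ITEMS[i] for i in ids if i in ITEMS},
)
```

Ordering and deduplication of the records are left out here so the sketch stays focused on the lookup chain.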
The three index types differ in importance. Search results are ordered by relevance in the sequence C-type results, N-type results, R-type results, and then deduplicated, so appointment items are displayed in relevance order and search reliability improves.
Each record in the retrieval result set contains the business item name, business item ID, business item department, keyword, index type, and index ID; the overall result set also carries a UUID for the retrieval, which provides data for subsequent user log collection and recall records.
The analysis model service needs a large volume of user behavior logs, so the full log chain of search analysis contains three search-related log records. Their generation, contents, and storage are as follows:
After the client's search request has been processed, each record of the search result set is assembled and put into the message queue, and the log service, as the message consumer, records the log information. The assembled log fields are the search term, keyword, ES index ID, item ID, index type, and the UUID of the retrieval.
After the client user obtains the retrieval result, clicking and browsing a piece of retrieved information forms a click-positioning recall log, and the client sends the data to the log service for storage. The log contains the retrieval UUID, index ID, item ID, keyword, index type, and search term; repeated clicks are recorded only once to avoid distorting the analysis data.
After the client user obtains the retrieval result, clicking and browsing a piece of retrieved information and then successfully completing the service forms a successful-transaction recall log, and the client sends the data to the log service for storage; the log contains the retrieval UUID, index ID, index type, item ID, keyword, and search term.
The three logs collected for one retrieval share the same retrieval UUID and are treated as one group of retrieval process logs; after each enters the log service, the group waits for a scheduled task scan and is then sent to the analysis model service for analysis.
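The rule that repeated clicks are recorded only once can be sketched as below. The class, the `kind` labels, and the choice of (retrieval UUID, index ID) as the deduplication key are assumptions; the text does not say which fields identify a repeated click.

```python
class RecallLog:
    """Stores recall logs; a repeated click on the same result is stored only once."""

    def __init__(self):
        self.records = []
        self._seen_clicks = set()

    def record(self, kind, search_uuid, index_id, item_id, keyword, index_type, search_term):
        if kind == "click":
            key = (search_uuid, index_id)    # assumed identity of a click
            if key in self._seen_clicks:
                return False                 # duplicate click: ignore it
            self._seen_clicks.add(key)
        self.records.append({
            "kind": kind, "uuid": search_uuid, "index_id": index_id,
            "item_id": item_id, "keyword": keyword,
            "index_type": index_type, "search_term": search_term,
        })
        return True

recall_log = RecallLog()
first = recall_log.record("click", "u1", "n-1", "item-1", "vehicle", "N", "vehicle inspection")
second = recall_log.record("click", "u1", "n-1", "item-1", "vehicle", "N", "vehicle inspection")
done = recall_log.record("transact", "u1", "n-1", "item-1", "vehicle", "N", "vehicle inspection")
```

Without this guard, a user clicking the same result several times would inflate the browse-recall count for that keyword.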
The management service maintains the basic index:
when the management service maintains business items, additions, modifications, and deletions all affect the basic index (i.e. the N-type index):
after an appointment item is added, its service name is segmented, and each keyword together with the appointment item ID forms index data that is stored in the ES service;
after an appointment item is modified, the original basic index is deleted by item ID and a new basic index is regenerated;
after an appointment item is deleted, the original basic index is deleted by item ID, and the other two types of index are also deleted by item ID, which keeps the data accurate.
The analysis model service generates a relevance index:
The scheduled task of the log service scans the collected logs, packages the three logs that share a UUID, and sends them to the analysis model service for processing, which generates a relevance index. The relevance of the index is calculated by a numerical statistical rule: a newly generated relevance index carries a relevance field whose default value is 100 and whose range is 0-1000.
The relevance index contains the keyword associated with the key information, the corresponding basic index ID, and the relevance value. In essence it maps keywords to basic indexes in a many-to-many association and attaches a relevance value to each mapping; when this type of index is searched, the basic index ID it holds is used to point to the basic index. Different step values are set for the different log types: browse recall (+1), handling recall (+2), and missed recall (-1). The relevance value changes accordingly, and indexes whose relevance value is 0 are scanned and deleted at regular intervals. The aim is for the analysis service module to keep correcting the relevance value of each relevance index and thereby improve the hit rate;
although the logs carry different index types, every log carries two pieces of information by default, the keyword and the basic index ID, so the processing is roughly the same in each case: check whether a relevance index exists; if it does, modify its value according to the step rule; if it does not, generate a new index according to the generation rule.
As the model is continuously calibrated on a large amount of data, the relevance values of all keywords of a given appointment service item come to follow an approximately normal distribution, and the relevance value determines the display sorting priority.
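The step rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary keyed by (keyword, basic index ID), the log type labels, and the function names are assumptions, and the sketch further assumes that a newly created index starts at the default value without the step being applied.

```python
# Step values per log type, taken from the description.
STEP = {"browse": 1, "handle": 2, "miss": -1}
DEFAULT_RELEVANCE = 100
MIN_RELEVANCE, MAX_RELEVANCE = 0, 1000


def apply_log(index, keyword, basic_index_id, log_type):
    """Update or create the relevance entry for one log record."""
    key = (keyword, basic_index_id)
    if key not in index:
        # Generation rule (assumed): a new index starts at the default value.
        index[key] = DEFAULT_RELEVANCE
    else:
        # Step rule: adjust by the step and keep the value inside 0-1000.
        updated = index[key] + STEP[log_type]
        index[key] = max(MIN_RELEVANCE, min(MAX_RELEVANCE, updated))
    return index[key]


def prune(index):
    """Scheduled cleanup: delete entries whose relevance value is 0."""
    for key in [k for k, value in index.items() if value == 0]:
        del index[key]


def rank(index, keyword):
    """Basic index IDs for a keyword, highest relevance value first."""
    hits = [(value, bid) for (kw, bid), value in index.items() if kw == keyword]
    return [bid for value, bid in sorted(hits, key=lambda h: -h[0])]
```

Clamping keeps every value inside the stated interval, so an entry that receives enough missed recalls reaches 0 and is removed by the next cleanup pass.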
The analysis model service generates an associated word index:
the fields of the associated word index are a keyword and its array of associated words. The data source is the IK word segmentation result: after the appointment service search interface has processed a search result, it puts the IK segmentation result into a message queue, and the analysis model service, as the consumer, processes it and generates the associated index; the message queue provides asynchronous decoupling.
The IK segmentation result is the segmentation result of one search. For example, when the search term is "vehicle annual inspection", segmentation yields keywords such as "vehicle", "annual inspection", and "vehicle annual inspection" (five keywords in the original example), which are associated words of one another. An associated index is generated for each keyword in turn, and each index contains its own keyword and its array of associated words. When a later search is triggered, the associated words are found from the keyword, and the basic index is then searched with those associated words. The associated word array of an associated-type index is continuously supplemented from the IK segmentation results and holds at most 20 words, so that searching the basic index does not become slow.
Searching with keywords and relevance alone may still give inaccurate results, so the associated word index is used to provide inference-based recommendation and improve search accuracy.
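The way the associated word arrays grow and are later used to expand a query can be sketched as follows. This is an illustration under assumptions: a plain dictionary stands in for the index, the segmented keywords are passed in directly rather than consumed from a message queue, and the function names are hypothetical.

```python
MAX_ASSOCIATED = 20  # upper bound on the associated word array, per the description


def update_associated(index, segmented_keywords):
    """Record that the keywords from one search are associated with each other."""
    unique = list(dict.fromkeys(segmented_keywords))  # de-duplicate, keep order
    for keyword in unique:
        words = index.setdefault(keyword, [])
        for other in unique:
            if other != keyword and other not in words and len(words) < MAX_ASSOCIATED:
                words.append(other)


def expand(index, keyword):
    """Return the keyword followed by its associated words for the basic-index lookup."""
    return [keyword] + index.get(keyword, [])
```

Once an array holds 20 words, further words are simply ignored in this sketch; the patent does not say whether older words are replaced, so that behaviour is an assumption.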
The retrieval method matches results against both keywords and associated words when the search terms are queried fuzzily, which improves the hit rate for the public;
based on the logs of what users browse and successfully handle after the search results are displayed, it provides a relevance retrieval mode driven by the recall rate and displays results according to their relevance weight, which improves accuracy;
the recall-rate-based relevance analysis model improves the accuracy of the association between keywords and item information;
the associated index between keywords and associated words is implemented, which strengthens the personalized recommendation function.
The invention also claims a government affairs appointment affairs correlation retrieval device, comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the above-mentioned government appointment correlation retrieval method.
An embodiment of the present invention further provides a computer-readable medium, where the computer-readable medium stores thereon computer instructions, and when executed by a processor, the computer instructions cause the processor to execute the method for retrieving the correlation between the government appointment matters in the above embodiments of the present invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that various combinations of the code auditing means in the various embodiments described above may be used to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims (10)

1. A government affair appointment event correlation retrieval method, characterized in that relevance-type indexes are generated by scheduled-task induction based on user operation records, and a common-type index is generated by basic information maintenance;
the relevance-type indexes are generated using a scoring-form statistical model, so that appointment service item retrieval combines keyword retrieval, associated word retrieval, and ranking by the relevance between keywords and appointment services.
2. The method according to claim 1, wherein Elasticsearch is selected as the search engine and the IK Chinese word segmenter is used for word segmentation.
3. The method according to claim 1 or 2, wherein RocketMQ is selected as the message queue to decouple normal business processing from the recording of results.
4. The method according to claim 1, wherein the correlation type index includes three index types:
an index generated by associating the keywords obtained from word segmentation of the appointment service name with the current appointment service, marked as N type;
a relevance index obtained by scheduled inductive analysis of user feedback and trigger behavior logs, marked as C type;
and an appointment service item index reached through the associated words of a keyword, marked as R type;
the retrieval results are sorted by relevance in the order of C-type index results, N-type index results, and R-type index results, and duplicate results are removed.
5. The method according to claim 4, wherein each information field in the search result set contains a service item name, a service item ID, a service item department, a keyword, an index type and an index ID, and the total result set further contains a UUID of the search, so as to provide data for subsequent user log collection and recall records.
6. The method according to claim 2, wherein generating the common-type index by basic information maintenance comprises:
after the appointment item is newly added, the service name is subjected to word segmentation processing, and each keyword and the current appointment service item ID form index data to be stored in the ES service;
after the appointment items are modified, deleting the original basic type index and regenerating a new basic type index according to the item ID;
after the appointment item is deleted, the original basic type index is deleted according to the item ID, and the other two types of indexes are deleted according to the item ID, so that the accuracy of the data is ensured.
7. The method according to claim 1, 2, 4, 5 or 6, wherein the log collection of the user operation records comprises:
after the client's search request has been processed, each record of the search result set is assembled into log information and put into a message queue, and the log service, as the consumer of the messages, records the log information; the assembled log information fields comprise the search word, the keyword, the ES index ID, the item ID, the index type, and the UUID of the search;
after obtaining a retrieval result, the client user clicks and browses a piece of retrieval information, forming a click-positioning recall log; the data is sent through the client to the log service for storage, and repeated clicks are recorded only once; the log information comprises the UUID of the search, the index ID, the item ID, the keyword, the index type, and the search word;
after obtaining the retrieval result, the client user clicks and browses a piece of retrieval information and successfully handles the service, forming a successful-handling recall log; the data is sent through the client to the log service for storage; the log information comprises the UUID of the search, the index ID, the index type, the item ID, the keyword, and the search word;
the three collected logs that share the same search UUID are regarded as one group of retrieval process logs; after they have each entered the log service, the group waits to be scanned by the scheduled task and is then passed to the analysis model service for analysis.
8. The method according to claim 7, wherein the relevance index is generated as follows:
the scheduled task of the log service scans the collected logs, packages the three logs according to the UUID, and sends them to the analysis model service for processing, which generates a relevance index;
the relevance index comprises keywords, the corresponding basic index IDs, and relevance values; the keywords and the basic indexes are mapped in a many-to-many association and the relevance values are attached, and when the index is searched, the associated basic index ID finally points to the basic index; meanwhile, different step values are set according to the log type: browse recall (+1), handling recall (+2), and missed recall (-1); the relevance value changes accordingly, and indexes whose relevance value is 0 are scanned and deleted at regular intervals;
whether a relevance index exists is checked; if so, its value is modified according to the step rule, and if not, a new index is generated according to the generation rule;
an associated word index is generated:
the fields of the associated word index are a keyword and its array of associated words; after the appointment service search interface has processed the search result, the IK word segmentation result is put into a message queue, and the analysis model service, as the consumer, processes it and generates the associated index; the message queue provides asynchronous decoupling.
9. A government affair appointment event correlation retrieval device, comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 8.
10. A computer-readable medium, characterized in that it has stored thereon computer instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 8.
CN202011244701.1A 2020-11-10 2020-11-10 Government affair appointment correlation retrieval method Active CN112269816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011244701.1A CN112269816B (en) 2020-11-10 2020-11-10 Government affair appointment correlation retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011244701.1A CN112269816B (en) 2020-11-10 2020-11-10 Government affair appointment correlation retrieval method

Publications (2)

Publication Number Publication Date
CN112269816A true CN112269816A (en) 2021-01-26
CN112269816B CN112269816B (en) 2023-04-21

Family

ID=74339950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011244701.1A Active CN112269816B (en) 2020-11-10 2020-11-10 Government affair appointment correlation retrieval method

Country Status (1)

Country Link
CN (1) CN112269816B (en)

Also Published As

Publication number Publication date
CN112269816B (en) 2023-04-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant