CN112269913A - Enterprise-level full data intelligent search implementation method and system - Google Patents
- Publication number
- CN112269913A (application CN202011174923.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- algorithm
- layer
- retrieval
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An enterprise-level full-data intelligent search method and system. The system comprises an access layer, a model layer, an algorithm layer, a component layer, a service layer, and a display layer. The system design follows the "four unifications" principle: unified leadership, unified planning, unified standards, and unified construction. The demonstration application fully considers the security protection, fault tolerance, and anti-interference capability of the system, ensures its long-term stable, secure, reliable, and efficient operation, and offers good compatibility and extensibility. It follows a customer-centered design concept, provides a consistent and user-friendly experience, meets actual business needs to the greatest extent, and is convenient to operate, functionally complete, and friendly in its interface. The application design adopts an internationally advanced technical route, makes full use of technology compatible with legacy and heterogeneous systems, and protects the existing IT investment of the State Grid company. It complies with common international and national standards and supports a variety of hardware platforms.
Description
Technical Field
The invention relates to the field of information technology, and in particular to a method and system for implementing enterprise-level full-data intelligent search.
Background
Information technologies such as cloud computing, big data, the Internet of Things, and mobile applications are developing rapidly. Companies have carried out a large amount of research and application work, which brings new opportunities for changing enterprise production and management modes. During the 13th Five-Year Plan period, the company's information systems are being upgraded from partially supporting business operations to comprehensively assisting analysis and decision-making. This requires stronger comprehensive analysis and mining of data across business fields, stronger analysis of semi-structured and unstructured data, and more rule-based and self-learning data analysis. Information systems must be able to store, rapidly compute, and deeply analyze and mine massive data of many types, and must offer rich information visualization, so as to minimize uncertainty, randomness, and subjectivity in decision-making, make decisions more rational, scientific, and responsive, and improve their benefit and efficiency. This leads the development of the intelligent enterprise toward a more centralized, more intelligent, and more interactive direction. Accelerating data flow and improving data utilization therefore pose a new challenge: providing applications with fast and accurate data retrieval, efficient management, and effective utilization.
Based on their research focus and service performance in different periods, search engines can be divided into two generations.
First-generation search engines appeared in 1994, represented by Yahoo, InfoSeek, and AltaVista. They used manual or semi-manual indexing and keyword-based meta search techniques, with the goal of finding as many web pages as possible. Such engines generally indexed fewer than 1 million web pages, rarely re-collected pages or refreshed the index, and were slow, with waits of 10 seconds or longer being typical. They were implemented largely with mature information retrieval (IR), network, and database technologies, and amounted to applying existing technologies to the WWW.
Second-generation search engine systems, which appeared in 1996, mostly adopted a distributed scheme (multiple microcomputers working together) to increase data size, response speed, and the number of users served. They generally maintained an index database of about 50 million web pages and could respond to 10 million user retrieval requests per day. The development direction was a continued increase in index database size, with typical commercial search engines holding on the order of tens or even hundreds of millions of web pages.
1) Existing deficiencies and drawbacks
Existing search engines still have a number of defects, mainly in the following respects:
2) Logical operators
The query functions provided by existing search engines are quite limited, and most provide only the most basic Boolean connections among keywords. For example, Yahoo provides only AND and OR operations, and once a logical operator is selected it must be applied to all keywords. The Open Text Index allows users to use different Boolean operators, but permits only 4 operators, which must be evaluated in order of occurrence. A query language as expressive as SQL cannot be used in existing search engines.
3) Querying with keywords only
Existing search engines only allow a query to be composed of a set of keywords and logical operators. Keyword retrieval is a blind match and cannot fully satisfy user requirements, while natural language understanding is a very difficult task that is still under study.
4) Inability to retrieve historical information
Every search starts from scratch and cannot further refine a previous query result.
5) Simple result presentation
Most search engines return only a long list of search results, typically several pages. The list may contain thousands of links to Web sites. Users typically examine only a small portion and discard the rest for lack of patience, and as a result may miss much useful information.
6) Limitation of a single engine
As the amount of information on the Web grows, a single search engine cannot cover the whole network. The capability of its indexing robot, the size of its index database, and its system maintenance overhead all limit its recall. Users must therefore try multiple search engines to find the information they want. At worst, the engines overlap and the user finds the same information repeatedly. Some solutions have appeared, such as meta search engines and distributed search engines. In addition, major commercial search engines are reported to receive 15,000 to 20,000 queries per minute, which puts great pressure on the index servers.
7) Difficulty in providing effective personalized services for users
Different users have different interests, so the retrieval results they need are also specific to them. Existing search engines cannot provide effective personalized service for an individual user, which greatly increases the time users spend finding useful information.
Disclosure of Invention
(I) Technical problem to be solved
To address the deficiencies of the prior art, the invention provides a method and system for implementing enterprise-level full-data intelligent search.
(II) Technical solution
To achieve the above purpose, the invention provides the following technical solution: an enterprise-level full-data intelligent search system, comprising:
an access layer: implements permission-aware collection of unstructured data sources from the unstructured data platform, the enterprise portal, and knowledge management;
a model layer: establishes a permission model, a business model, an interest-domain model, and a similarity model, providing a modeling basis for data analysis;
an algorithm layer: uses data mining and analysis algorithms to establish the analysis processes of feature-value modeling, data analysis and retrieval, model evaluation, and high-dimensional visualization;
a component layer: develops common components to support calls from upper-layer services, implements data relationship analysis and association based on a permission-aware data retrieval component, a named entity recognition component, an automatic tagging component, and the like, and provides basic components for the comprehensive utilization of data;
a service layer: applies the models of the model layer and the algorithms of the algorithm layer to the data, and encapsulates the components as required to form common services that support the business;
a display layer: displays the retrieval results, and includes the enterprise portal system, the knowledge management system, and the "three intensifications and five large systems" business systems.
As an improvement of the invention, the service layer comprises a cross-business retrieval service, a document association retrieval service, a related recommendation service, and an automatic push service.
As an improvement of the invention, a corpus is provided in the access layer. The corpus is built by combining manual and automated methods according to the requirements of the business fields and the condition of the data. On the manual side, the relevant business data is sorted according to the requirements of the business scenarios and classified accordingly. On the automated side, the corpus of a given field is characterized mainly by word segmentation together with machine-learning feature modeling and pattern analysis, so that corpora of different types are established.
As an improvement of the invention, the algorithms of the algorithm layer comprise: association rule and sequence pattern algorithms, classification and prediction pattern algorithms, cluster analysis pattern algorithms, and heterogeneous analysis pattern algorithms.
The invention further provides a method for implementing enterprise-level full-data intelligent search, comprising the following steps:
access: implementing permission-aware collection of unstructured data sources from the unstructured data platform, the enterprise portal, and knowledge management;
model: establishing a permission model, a business model, an interest-domain model, and a similarity model, providing a modeling basis for data analysis;
algorithm: using data mining and analysis algorithms to establish the analysis processes of feature-value modeling, data analysis and retrieval, model evaluation, and high-dimensional visualization;
component: developing common components to support calls from upper-layer services, implementing data relationship analysis and association based on a permission-aware data retrieval component, a named entity recognition component, an automatic tagging component, and the like, and providing basic components for the comprehensive utilization of data;
service: applying the models of the model layer and the algorithms of the algorithm layer to the data, and encapsulating the components as required to form common services that support the business;
display: displaying the retrieval results.
As an improvement of the invention, the access step includes building a corpus, which is constructed in two steps:
step 1: collecting unstructured data sources, with permissions, from the unstructured data platform, the enterprise portal, and knowledge management, parsing the collected unstructured data, manually classifying it, and storing it in the corpus;
step 2: building the corpus of the electric power dictionary by combining manual and automated methods. On the manual side, the relevant business data is sorted according to the requirements of the business scenarios and classified accordingly. On the automated side, the corpus of a given field is characterized mainly by word segmentation together with machine-learning feature modeling and pattern analysis.
The external dictionary is manually sorted, classified, and stored in the corpus.
As an improvement of the invention, the method further comprises a step of preprocessing the corpus data, in which filtering is performed by a word segmentation component, a filtering component, and a user literary composition component.
As an improvement of the invention, the algorithm step comprises a word similarity algorithm, a document similarity algorithm, user behavior analysis, and item feature analysis.
As an improvement of the invention, the model step covers fuzzy retrieval, interest domains, and a business relationship map.
(III) Advantageous effects
Compared with the prior art, the invention provides an enterprise-level full-data intelligent search system with the following beneficial effects. The system design follows the "four unifications" principle: unified leadership, unified planning, unified standards, and unified construction. The demonstration application fully considers the security protection, fault tolerance, and anti-interference capability of the system, ensures its long-term stable, secure, reliable, and efficient operation, and offers good compatibility and extensibility. It follows a customer-centered design concept, provides a consistent and user-friendly experience, meets actual business needs to the greatest extent, and is convenient to operate, functionally complete, and friendly in its interface. The application design adopts an internationally advanced technical route, makes full use of technology compatible with legacy and heterogeneous systems, and protects the existing IT investment of the State Grid company. It complies with common international and national standards, supports a variety of hardware platforms, and has good openness and portability. Standard open platform interfaces are adopted to support data exchange and sharing with other systems, which facilitates maintenance, expansion, and interconnection.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a flow chart of an example case of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1 to 3, an enterprise-level full-data intelligent search system includes:
an access layer: implements permission-aware collection of unstructured data sources from the unstructured data platform, the enterprise portal, and knowledge management;
a model layer: establishes a permission model, a business model, an interest-domain model, and a similarity model, providing a modeling basis for data analysis;
an algorithm layer: uses data mining and analysis algorithms to establish the analysis processes of feature-value modeling, data analysis and retrieval, model evaluation, and high-dimensional visualization;
a component layer: develops common components to support calls from upper-layer services, implements data relationship analysis and association based on a permission-aware data retrieval component, a named entity recognition component, an automatic tagging component, and the like, and provides basic components for the comprehensive utilization of data;
a service layer: applies the models of the model layer and the algorithms of the algorithm layer to the data, and encapsulates the components as required to form common services that support the business;
a display layer: displays the retrieval results, and includes the enterprise portal system, the knowledge management system, and the "three intensifications and five large systems" business systems.
As an improvement of the invention, the service layer comprises a cross-business retrieval service, a document association retrieval service, a related recommendation service, and an automatic push service.
The technical route is selected within the overall State Grid architecture. Following the principles of localization and of reducing duplicated construction, a model that combines in-house development with mature open-source software is chosen, so that the system can be built rapidly while the architecture remains advanced and stable.
A survey of mature open-source full-text retrieval products in the industry found that Elasticsearch is an open-source, distributed, RESTful search engine built on Lucene. It is designed for distributed computing and can deliver real-time, stable, reliable, and fast search.
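As an illustration of how a permission-aware query might be expressed for Elasticsearch, the following minimal sketch builds a query body in the Elasticsearch query DSL. The patent does not give index mappings, so the field names (`title`, `body`, `attachment_text`, `allowed_roles`) and the helper `build_search_body` are assumptions made only for this example.

```python
import json


def build_search_body(keywords, user_roles, size=10):
    """Build an Elasticsearch query DSL body (field names are hypothetical)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match over several fields; the title is boosted.
                "must": [
                    {
                        "multi_match": {
                            "query": keywords,
                            "fields": ["title^2", "body", "attachment_text"],
                        }
                    }
                ],
                # Non-scoring filter: keep only documents the user may read.
                "filter": [{"terms": {"allowed_roles": user_roles}}],
            }
        },
    }


body = build_search_body("overhaul scheme", ["dept_ops", "all_staff"])
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to the `_search` endpoint of an index over the RESTful interface. Putting the permission check in the `filter` clause keeps it from affecting relevance scores.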
As an improvement of the invention, a corpus is provided in the access layer. The corpus is built by combining manual and automated methods according to the requirements of the business fields and the condition of the data. On the manual side, the relevant business data is sorted according to the requirements of the business scenarios and classified accordingly. On the automated side, the corpus of a given field is characterized mainly by word segmentation together with machine-learning feature modeling and pattern analysis, so that corpora of different types are established.
As an improvement of the invention, the algorithms of the algorithm layer comprise: association rule and sequence pattern algorithms, classification and prediction pattern algorithms, cluster analysis pattern algorithms, and heterogeneous analysis pattern algorithms.
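To make the association rule family concrete, the sketch below computes support and confidence for pairwise rules over small "transactions" (for instance, the set of tags attached to each document). The patent does not specify which association rule algorithm is used; the function `pair_rules`, the threshold, and the sample data are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    pair_counts = Counter(
        p for t in transactions for p in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support >= min_support:
            # confidence(a -> b) = count(a and b) / count(a)
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical tag sets, one per document.
TRANSACTIONS = [["x", "y"], ["x", "y"], ["x"], ["z"]]
for rule in pair_rules(TRANSACTIONS):
    print(rule)
```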
The invention further provides a method for implementing enterprise-level full-data intelligent search, comprising the following steps:
access: implementing permission-aware collection of unstructured data sources from the unstructured data platform, the enterprise portal, and knowledge management;
model: establishing a permission model, a business model, an interest-domain model, and a similarity model, providing a modeling basis for data analysis;
algorithm: using data mining and analysis algorithms to establish the analysis processes of feature-value modeling, data analysis and retrieval, model evaluation, and high-dimensional visualization;
component: developing common components to support calls from upper-layer services, implementing data relationship analysis and association based on a permission-aware data retrieval component, a named entity recognition component, an automatic tagging component, and the like, and providing basic components for the comprehensive utilization of data;
service: applying the models of the model layer and the algorithms of the algorithm layer to the data, and encapsulating the components as required to form common services that support the business;
display: displaying the retrieval results.
As an improvement of the invention, the access step includes building a corpus, which is constructed in two steps:
step 1: collecting unstructured data sources, with permissions, from the unstructured data platform, the enterprise portal, and knowledge management, parsing the collected unstructured data, manually classifying it, and storing it in the corpus;
step 2: building the corpus of the electric power dictionary by combining manual and automated methods. On the manual side, the relevant business data is sorted according to the requirements of the business scenarios and classified accordingly. On the automated side, the corpus of a given field is characterized mainly by word segmentation together with machine-learning feature modeling and pattern analysis.
The external dictionary is manually sorted, classified, and stored in the corpus.
As an improvement of the invention, the method further comprises a step of preprocessing the corpus data, in which filtering is performed by a word segmentation component, a filtering component, and a user literary composition component.
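The segmentation and filtering stages of such a preprocessing step can be sketched as follows. This is a simplified illustration, not the patent's components: a regular-expression splitter stands in for a real word segmenter (Chinese text would need a dedicated segmentation library), and the stop-word list and function names are assumptions.

```python
import re

# Illustrative stop-word list; a real deployment would use a curated one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for"}


def segment(text):
    """Stand-in segmenter: lowercase, then split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def filter_tokens(tokens, stop_words=STOP_WORDS, min_len=2):
    """Drop stop words and very short tokens."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


def preprocess(text):
    """Run segmentation followed by filtering."""
    return filter_tokens(segment(text))


print(preprocess("The overhaul scheme of the transformer is approved."))
# prints ['overhaul', 'scheme', 'transformer', 'approved']
```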
As an improvement of the invention, the algorithm step comprises a word similarity algorithm, a document similarity algorithm, user behavior analysis, and item feature analysis.
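The patent does not say which similarity measures are used. As one possible reading, the sketch below uses Jaccard similarity over character bigrams for words and cosine similarity over term counts for documents; both choices, and all function names, are assumptions for illustration.

```python
import math
from collections import Counter


def char_bigrams(word):
    """Set of character bigrams; falls back to the word itself if too short."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}


def word_similarity(a, b):
    """Jaccard similarity of the two words' character-bigram sets."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def document_similarity(tokens_a, tokens_b):
    """Cosine similarity between term-count vectors of two token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(word_similarity("overhaul", "overhauls"))
print(document_similarity(["overhaul", "scheme"], ["overhaul", "budget"]))
```

In this toy setting, identical inputs score 1 and inputs with nothing in common score 0, which is the property a ranking or recommendation service would rely on.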
As an improvement of the invention, the model step covers fuzzy retrieval, interest domains, and a business relationship map.
A one-stop retrieval service is constructed that provides data classification and push functions. A batch transmission interface to the unstructured platform is developed to automatically extract, classify, and cluster unstructured data. Combined with enterprise-level knowledge management, unstructured data is automatically gathered into knowledge: classification, clustering, and knowledge pushing from unstructured data to knowledge management are implemented in the knowledge management Q&A forum, the knowledge construction module, and the new-knowledge recommendation module. This improves the quality of knowledge, supports the conversion of experience into knowledge, and lays a foundation for the cross-business associated use of data.
The system and method provide different retrieval modes, specifically:
(1) Comprehensive retrieval
A one-stop search service over all company data is constructed, providing centralized retrieval of data from different businesses and solving the problem of searching multi-source information scattered across systems and businesses. Comprehensive retrieval extends only the breadth and depth of retrieval and performs one-to-one exact retrieval.
In the search box and other corresponding visual search interfaces, the user can perform exact retrieval, fuzzy retrieval, and combined retrieval under certain logical relations, based on single words and phrases. The user can freely choose the retrieval depth and can restrict retrieval to a specific range, including the title, body text, attachments, and other key attributes. For example, if the user types in the term "三集五大" ("three intensifications and five large systems"), the system returns a list of results whose title, body text, attachments, or other key attributes contain those four characters.
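The three retrieval modes can be illustrated with a small in-memory sketch. It is not the patent's implementation: the sample documents, the field names, and the use of `difflib.SequenceMatcher` with a 0.8 threshold for fuzzy matching are all assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher

# Hypothetical documents; doc 3 contains a misspelling on purpose.
DOCS = [
    {"id": 1, "title": "Overhaul scheme 2020", "body": "Plan for transformer maintenance.", "attachment": ""},
    {"id": 2, "title": "Marketing report", "body": "Quarterly summary.", "attachment": "overhaul budget table"},
    {"id": 3, "title": "Overhal checklist", "body": "Inspection items.", "attachment": ""},
]


def exact_match(term, text):
    """Exact retrieval: case-insensitive substring test."""
    return term.lower() in text.lower()


def fuzzy_match(term, text, threshold=0.8):
    """Fuzzy retrieval: any word of the text is close enough to the term."""
    term = term.lower()
    return any(
        SequenceMatcher(None, term, word).ratio() >= threshold
        for word in text.lower().split()
    )


def search(terms, fields=("title", "body", "attachment"), mode="exact", combine="and"):
    """Search the chosen fields; combine terms with AND or OR."""
    match = exact_match if mode == "exact" else fuzzy_match
    aggregate = all if combine == "and" else any
    hits = []
    for doc in DOCS:
        text = " ".join(doc[f] for f in fields)
        if aggregate(match(t, text) for t in terms):
            hits.append(doc["id"])
    return hits


print(search(["overhaul"]))                                   # all fields, exact
print(search(["overhaul"], fields=("title",), mode="fuzzy"))  # title only, fuzzy
print(search(["overhaul", "scheme"], combine="and"))          # combined retrieval
```

Restricting `fields` corresponds to the user choosing the retrieval depth, while `combine` models the logical relation between terms.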
(2) Relevance retrieval
Whereas comprehensive retrieval finds the scattered information points a user needs in massive information and returns them as knowledge points, relevance retrieval connects those scattered information points from multiple angles through association relationships and returns the result as a relationship network. In each retrieval the user obtains not only the comprehensive retrieval result but also new facts or new relationships, which can prompt a series of further queries and make retrieval deeper and broader. Relevance retrieval is a fuzzy, extended search that performs one-to-many retrieval of related meanings.
On the basis of comprehensive retrieval, a relationship graph of the information points is constructed, forming a complete knowledge network that relates the search source to the results. The information is ordered by its degree of relationship and displayed layer by layer, from inner to outer, in a graphical visualization. For example, if the user types in the term "三集五大" ("three intensifications and five large systems"), the system returns a list of results whose title, body text, attachments, or other key attributes contain those four characters. At the same time, it returns related results for the topics that make up the term, such as the intensive management of human resources, finance, and materials, and the large planning, large construction, large production, large overhaul, and large marketing systems, and it may also return related information such as "twelve and four sets".
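One way to obtain the layer-by-layer ordering described above is a breadth-first traversal of the relationship graph, grouping nodes by their distance from the search source. The graph contents and the function `related_by_degree` below are hypothetical; the patent does not state how the degree of relationship is computed.

```python
from collections import deque

# Hypothetical relationship graph: topic -> directly related topics.
GRAPH = {
    "three intensifications and five large systems": [
        "human resources", "finance", "materials", "large overhaul",
    ],
    "large overhaul": ["overhaul system construction"],
    "overhaul system construction": ["inspection checklist"],
}


def related_by_degree(graph, source, max_degree=2):
    """Group topics reachable from source by hop distance (1, 2, ...)."""
    layers = {}
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, degree = queue.popleft()
        if degree == max_degree:
            continue  # do not expand beyond the outermost requested layer
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                layers.setdefault(degree + 1, []).append(neighbour)
                queue.append((neighbour, degree + 1))
    return layers


layers = related_by_degree(GRAPH, "three intensifications and five large systems")
for degree, topics in sorted(layers.items()):
    print(degree, topics)
```

Each key of the result corresponds to one ring of the visualization, with degree 1 closest to the search source.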
On the basis of comprehensive retrieval, a context map of the information points is also constructed, forming a complete picture of how the information related to the search source and the results has evolved. The information is ordered by key attributes such as degree of relationship or time and displayed item by item, from near to far, in a graphical visualization. For example, if the user types in the term "large overhaul system" (five characters in the original Chinese), the system returns a list of results whose title, body text, attachments, or other key attributes contain that term, together with a relationship map related to the large overhaul system. In addition, the system returns the key milestones in the construction of the large overhaul system along a time axis, in the form of a chronicle of events. The context map is provided only for search sources that are events.
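The chronicle along a time axis can be sketched as a filter followed by a sort on date. The events, dates, and the function `build_timeline` below are invented purely for illustration and do not come from the patent.

```python
from datetime import date

# Hypothetical milestone records; titles and dates are made up.
EVENTS = [
    {"title": "Overhaul system pilot launched", "date": date(2012, 6, 1)},
    {"title": "Overhaul system plan approved", "date": date(2011, 11, 15)},
    {"title": "Marketing system review", "date": date(2012, 1, 10)},
    {"title": "Overhaul system rolled out company-wide", "date": date(2013, 3, 20)},
]


def build_timeline(events, keyword):
    """Keep events whose title mentions the keyword, oldest first."""
    matched = [e for e in events if keyword.lower() in e["title"].lower()]
    return sorted(matched, key=lambda e: e["date"])


for event in build_timeline(EVENTS, "overhaul system"):
    print(event["date"].isoformat(), event["title"])
```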
The system also has an automatic push function. Retrieval can be divided into active retrieval and passive retrieval: comprehensive retrieval and relevance retrieval are active retrieval initiated by the user, whereas automatic push is passive retrieval initiated by the system toward the user. Without any retrieval operation by the user, the system can push the company's hot-spot and important information in real time. The system can also let users pre-select their focus of attention, and then automatically pushes information that falls within each user's pre-selected range.
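The two push rules described above (important items go to everyone, other items go to users whose pre-selected focus matches) can be sketched as follows. The document fields `tags` and `important`, the subscription structure, and the function `select_recipients` are assumptions for this example.

```python
def select_recipients(doc, subscriptions):
    """Return the users who should receive a pushed document.

    doc: dict with a "tags" list and an optional "important" flag.
    subscriptions: mapping of user name to a set of focus tags.
    """
    if doc.get("important"):
        # Hot-spot or important information is pushed to every user.
        return sorted(subscriptions)
    tags = set(doc["tags"])
    # Otherwise push only where the user's pre-selected focus overlaps.
    return sorted(user for user, focus in subscriptions.items() if tags & focus)


# Hypothetical users and their pre-selected focus areas.
SUBSCRIPTIONS = {
    "alice": {"overhaul", "materials"},
    "bob": {"marketing"},
    "carol": {"finance", "overhaul"},
}

print(select_recipients({"tags": ["overhaul"]}, SUBSCRIPTIONS))
print(select_recipients({"tags": ["notice"], "important": True}, SUBSCRIPTIONS))
```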
Through this project, a one-stop intelligent search based on the unstructured data management platform is constructed. It gives users at all levels of the company a business-oriented, integrated, intelligent, proactive, and personalized entry point to unstructured data resources, explores technical methods for mining the value of unstructured big data at the information retrieval layer, reduces the construction cost of other business systems, and improves the return on the unstructured data management platform. It has the following three business objectives:
1. A comprehensive search service over all company data is constructed, providing centralized retrieval of data from different businesses and solving the problem of searching multi-source information scattered across systems and businesses. Comprehensive retrieval extends only the breadth and depth of retrieval and performs one-to-one exact retrieval.
2. Comprehensive retrieval returns results to the user as knowledge points. Building on those results, relevance retrieval connects the scattered information points from multiple angles through association relationships and returns the result to the user as a relationship network, making the search deeper and broader.
3. The current situation of purely user-initiated retrieval is changed, achieving a shift from "people looking for data" to "data finding people". Comprehensive retrieval and relevance retrieval are active retrieval initiated by the user, whereas automatic push is passive retrieval initiated by the system toward the user.
The overall process is described below by way of a simple example, with reference to Fig. 3:
1. Data acquisition: in addition to web pages, the data collected for enterprise search include various databases (structured, semi-structured and unstructured), electronic documents, text, multimedia and the like; the heterogeneous data are extracted, integrated and cleaned.
2. Data modeling: the acquired data undergo preprocessing (word segmentation, stop-word removal, function-word filtering and the like), feature representation, feature selection and feature weight calculation; a knowledge graph, a user interest model and a similarity model are built using text mining and analysis algorithms, providing model support for data retrieval and display.
3. User request: word segmentation and semantic understanding are performed on the keywords or phrases entered by the user, and the recognized tokens, together with the user's permission information, are passed to the search engine as a query.
4. Search engine processing: the retrieval request first passes through permission filtering; the index results that satisfy the user's permissions are retrieved and, by default, ranked from high to low by relevance.
5. Result display: based on the data model, the retrieval results are displayed visually in forms such as a business association map and an interest domain map.
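Steps 2 to 4 above (preprocessing, permission filtering, relevance ranking) can be sketched as follows. This is an illustrative sketch only: whitespace splitting stands in for real word segmentation, the stop-word list is invented, the smoothed TF-IDF score is one common choice of feature weight rather than the one the patent necessarily uses, and the `allowed_groups` field is a hypothetical way of representing permissions.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "a", "for", "and"}  # assumed stop-word list


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words (stand-in for word segmentation)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def search(query, documents, user_groups):
    """Filter by permission first, then rank the permitted documents by a TF-IDF score."""
    # Permission filtering: keep only documents the user's groups may see.
    visible = [d for d in documents if d["allowed_groups"] & user_groups]
    tokenized = {d["id"]: tokenize(d["text"]) for d in visible}
    n = len(visible)
    scored = []
    for d in visible:
        counts = Counter(tokenized[d["id"]])
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df:
                idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
                score += counts[term] * idf
        if score > 0:
            scored.append((score, d["id"]))
    # Highest relevance first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Made-up sample data for illustration only.
documents = [
    {"id": "doc1", "text": "Overhaul scheme for the main transformer",
     "allowed_groups": {"maintenance"}},
    {"id": "doc2", "text": "Overhaul scheme overhaul plan and overhaul budget",
     "allowed_groups": {"maintenance", "finance"}},
    {"id": "doc3", "text": "Overhaul scheme confidential audit",
     "allowed_groups": {"audit"}},
    {"id": "doc4", "text": "Holiday schedule",
     "allowed_groups": {"maintenance"}},
]

results = search("overhaul scheme", documents, user_groups={"maintenance"})
print(results)
```

Filtering before scoring mirrors the order given in step 4: documents the user may not see never enter the ranking, so they cannot leak through scores or result counts.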
To achieve the final goal of cross-business, strongly associated, intelligent one-stop search of enterprise information, the system is developed pilot by pilot and phase by phase. The goal of the current phase is mainly to build the system framework of the one-stop search engine and to complete development of key functions such as cross-business retrieval, strong association and automatic push. In addition, typical scenarios in the field of informatization construction are sorted out, and the application effect is verified through trial operation.
New functions: development of 4 new function modules is completed, namely comprehensive retrieval (comprising 5 secondary modules, such as cross-business-system retrieval, exact retrieval and fuzzy retrieval), relevance retrieval (comprising 2 secondary modules: constructing a relation map of information points and constructing a context map of information points), automatic push (comprising 5 secondary modules, such as real-time information push, a user interest model and a recommendation algorithm), and the knowledge collection and one-stop retrieval service pilot (comprising 2 secondary modules: knowledge collection and one-stop retrieval service).
1) Pilot deployment and implementation at the company headquarters is completed.
2) Data integration with 4 systems (collaborative office, IRS, knowledge management and the portal) is completed, enabling centralized retrieval of materials related to the informatization construction process and improving staff efficiency.
3) Single sign-on integration with the portal is completed, making retrieval more convenient for users.
4) Integration of the unstructured platform with the unified permission system is completed.
The system functions are mainly divided into the following 5 blocks:
System and data integration: at this stage, the integrated business systems are required to sort out their business models and permission models and to provide a unified search interface, so that the business systems can be integrated and adapted.
Comprehensive retrieval: a one-stop search service over the company's full data is built, enabling centralized retrieval of data from different businesses and solving the problem of multi-source information being searched separately across systems and businesses. Comprehensive retrieval not only expands the breadth and depth of retrieval but also provides one-to-one exact retrieval.
Relevance retrieval: whereas comprehensive retrieval finds the scattered information points that the user needs within massive information and returns them as knowledge points, relevance retrieval connects those scattered points from multiple angles through certain association relationships and returns the results as a relation network. With each retrieval, the user not only obtains the comprehensive retrieval results but may also learn new facts or new relationships, which can prompt a series of new search queries and make retrieval deeper and broader. Relevance retrieval is a fuzzy, extended search that provides one-to-many retrieval of related meanings.
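One way to picture the relation network is as a graph in which information points are linked whenever they share an attribute value. The sketch below makes that idea concrete; the attribute names (`author`, `project`), the sample points, and the choice of shared attributes as the association relationship are assumptions for illustration, since the source does not define which relationships are used.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_network(points, keys=("author", "project")):
    """Link information points that share a value for any of the given attributes."""
    buckets = defaultdict(set)
    for point_id, attrs in points.items():
        for key in keys:
            if key in attrs:
                buckets[(key, attrs[key])].add(point_id)
    network = defaultdict(set)
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            network[a].add(b)
            network[b].add(a)
    return network


def related(network, start, max_hops=2):
    """Breadth-first expansion: points reachable from `start` within `max_hops` links."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in sorted(network.get(node, ())):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return sorted(seen - {start})


# Made-up sample data for illustration only.
points = {
    "report": {"author": "Li", "project": "overhaul"},
    "budget": {"author": "Wang", "project": "overhaul"},
    "memo": {"author": "Wang", "project": "training"},
    "notice": {"author": "Zhao", "project": "safety"},
}

network = build_relation_network(points)
print(related(network, "report", max_hops=1))
print(related(network, "report", max_hops=2))
```

Expanding by more than one hop is what gives the one-to-many character described above: a hit can surface items that do not match the query themselves but are connected to something that does.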
Automatic push: retrieval can be divided into active retrieval and passive retrieval. Comprehensive retrieval and relevance retrieval are active retrieval initiated by the user, whereas automatic push is passive retrieval initiated by the system toward the user.
Retrieval application: analysis is performed according to interest domain models built from information such as the user's identity, and recommendation results are displayed through knowledge management. After logging in to the portal system, the user can enter one or more keywords in the portal's search interface to initiate retrieval. Encryption and decryption algorithms protect the user information from being stolen or tampered with in transit, and after logging in to the portal the user can access full-text retrieval without re-entering a user name and password.
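The source does not say which algorithm protects the login hand-off from the portal to the search service. As one hedged illustration, the sketch below uses an HMAC-signed token from the Python standard library. Note the swap: HMAC signing detects tampering but does not hide the payload, so confidentiality would still depend on an encrypted transport such as TLS. The secret key, token layout, and function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"demo-shared-secret"  # hypothetical key shared by portal and search service


def issue_token(username, expires_at):
    """Portal side: serialize the identity and sign it with HMAC-SHA256."""
    payload = json.dumps({"user": username, "exp": expires_at}, sort_keys=True).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def verify_token(token, now):
    """Search side: return the user name if the token is intact and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload or signature was altered
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None  # token has expired
    return claims["user"]


token = issue_token("alice", expires_at=2000)
print(verify_token(token, now=1000))
```

With a token like this, the search service can accept a user who has already authenticated at the portal without asking for a user name and password again, which is the behavior the paragraph describes.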
System management: management of users and logging tasks is provided.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. An enterprise-level full-data intelligent search system, comprising:
an access layer: collecting unstructured data from sources including the unstructured data platform, the enterprise portal, knowledge management and permissions;
a model layer: establishing a permission model, a business model, an interest domain model and a similarity model, and providing a modeling basis for data analysis;
an algorithm layer: establishing feature value modeling, data analysis and retrieval, model evaluation and high-dimensional visualization algorithm analysis processes by using data mining and analysis algorithms;
a component layer: developing common components to support calls from upper-layer services, implementing data relationship analysis and association based on a permission-aware data retrieval component, a named entity recognition component, an automatic tagging component and the like, and providing basic components for comprehensive data utilization;
a service layer: applying the models of the model layer and the algorithms of the algorithm layer to data, and encapsulating the components as required to form public services that support the business;
a display layer: displaying the retrieval results, the display layer comprising an enterprise portal system, a knowledge management system and three sets of five major business systems.
2. The system of claim 2, wherein the service layer comprises a cross-business retrieval service, a document association retrieval service, a related recommendation service, and an automatic push service.
3. The system according to claim 3, wherein a corpus is provided in the access layer; the corpus is processed and constructed by combining manual and programmatic automated methods according to business domain requirements and data conditions; on the manual side, the related business data are sorted according to business scenario requirements and classified accordingly; on the automated side, word segmentation together with machine-learning feature modeling and pattern analysis is mainly used to characterize the corpus of a given domain and to establish different types of corpora.
4. The system of claim 1, wherein the algorithms in the algorithm layer comprise: association rule and sequence pattern algorithms, classification and prediction pattern algorithms, cluster analysis pattern algorithms and heterogeneous analysis pattern algorithms.
5. An enterprise-level full-data intelligent search implementation method, comprising the following steps:
access: collecting unstructured data from sources including the unstructured data platform, the enterprise portal, knowledge management and permissions;
model: establishing a permission model, a business model, an interest domain model and a similarity model, and providing a modeling basis for data analysis;
algorithm: establishing feature value modeling, data analysis and retrieval, model evaluation and high-dimensional visualization algorithm analysis processes by using data mining and analysis algorithms;
component: developing common components to support calls from upper-layer services, implementing data relationship analysis and association based on a permission-aware data retrieval component, a named entity recognition component, an automatic tagging component and the like, and providing basic components for comprehensive data utilization;
service: applying the models of the model layer and the algorithms of the algorithm layer to data, and encapsulating the components as required to form public services that support the business;
display: displaying the retrieval results.
6. The method for implementing enterprise-level full-data intelligent search according to claim 1, wherein the access step includes construction of a corpus, specifically comprising two construction steps:
step 1, collecting unstructured data from sources including the unstructured data platform, the enterprise portal, knowledge management and permissions, analyzing the collected unstructured data, manually classifying the data, and storing the data into the corpus;
step 2, processing and constructing a corpus of the electric power dictionary through manual and programmatic automated methods, wherein on the manual side the related business data are sorted according to business scenario requirements and classified accordingly, and on the automated side word segmentation together with machine-learning feature modeling and pattern analysis is mainly used to characterize the corpus of a given domain;
and manually sorting and classifying the external dictionary and storing it into the corpus.
7. The method of claim 6, further comprising a data preprocessing step for the corpus, in which the data are filtered by a word segmentation component, a filtering component and a user's literary component.
8. The method of claim 5, wherein the algorithm steps include word similarity algorithm, document similarity algorithm, user behavior analysis, and project feature analysis.
9. The method of claim 5, wherein the model step comprises fuzzy search, interest domain and business relationship graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011174923.0A CN112269913A (en) | 2020-10-28 | 2020-10-28 | Enterprise-level full data intelligent search implementation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112269913A true CN112269913A (en) | 2021-01-26 |
Family
ID=74345156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011174923.0A Pending CN112269913A (en) | 2020-10-28 | 2020-10-28 | Enterprise-level full data intelligent search implementation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112269913A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1822005A (en) * | 2006-04-07 | 2006-08-23 | 张天山 | Information pushing system and method based on web sit automatic forming and search engine |
CN102075560A (en) * | 2010-11-19 | 2011-05-25 | 福建富士通信息软件有限公司 | Fukutomi enterprise search engine technology based on system coupling |
KR20130011271A (en) * | 2011-07-21 | 2013-01-30 | 박미정 | The system which supports a recommendation/no-recommendation information about a human resources or company |
CN104899268A (en) * | 2015-05-25 | 2015-09-09 | 浪潮集团有限公司 | Distributed enterprise information vertical search method |
US20170344552A1 (en) * | 2016-05-26 | 2017-11-30 | Yahoo! Inc. | Computerized system and method for optimizing the display of electronic content card information when providing users digital content |
CN109710701A (en) * | 2018-12-14 | 2019-05-03 | 浪潮软件股份有限公司 | A kind of automated construction method for public safety field big data knowledge mapping |
CN110704577A (en) * | 2019-10-10 | 2020-01-17 | 国家电网公司华中分部 | Method and system for searching power grid scheduling data |
CN111723273A (en) * | 2019-03-18 | 2020-09-29 | 北京中电翔云信息技术有限公司 | Smart cloud retrieval system and method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113256264A (en) * | 2021-06-07 | 2021-08-13 | 国网安徽省电力有限公司 | Management system, method and device for architecture full-flow management and control and readable storage medium |
CN113743885A (en) * | 2021-08-11 | 2021-12-03 | 南方电网数字电网研究院有限公司 | Construction method for enterprise-level data service access |
CN113743885B (en) * | 2021-08-11 | 2024-04-19 | 南方电网数字电网研究院有限公司 | Construction method for enterprise-level data service access |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210126 |