CN115858729A - Multi-type knowledge retrieval and statistics method, device, storage medium and equipment - Google Patents

Info

Publication number
CN115858729A
Authority
CN
China
Prior art keywords
retrieval
knowledge
retrieved
picture
vectorization
Prior art date
Legal status
Pending
Application number
CN202211625173.3A
Other languages
Chinese (zh)
Inventor
杨娟
翟士丹
林健
Current Assignee
Beijing Haizhi Xingtu Technology Co ltd
Original Assignee
Beijing Haizhi Xingtu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Haizhi Xingtu Technology Co ltd
Priority to CN202211625173.3A
Publication of CN115858729A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device, a storage medium and equipment for retrieving and compiling statistics on multiple types of knowledge. The method comprises: parsing the knowledge to be retrieved; for text retrieval, performing semantic recognition on the parsed knowledge and, according to the semantic recognition result, performing full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval on it; for picture retrieval, vectorizing the parsed knowledge and, according to the vectorization result, performing intra-knowledge picture retrieval and picture knowledge retrieval on it; inputting either the full-text, vectorization, question-answer and knowledge graph retrieval results or the intra-knowledge picture and picture knowledge retrieval results into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved; and statistically classifying the retrieval result. The invention can retrieve different types of knowledge and compile statistics on the retrieval results along different dimensions.

Description

Multi-type knowledge retrieval and statistics method, device, storage medium and equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a multi-type knowledge retrieval and statistics method, a device, a storage medium and equipment.
Background
With the development of science and technology, people generate large numbers of documents, pictures, audio and video files in daily life and work, and these are difficult to find when needed: an operating system's search generally matches only document titles and cannot retrieve document content.
At present, there are many knowledge search products on the market, but few support retrieval across multiple types of knowledge, and fewer still provide multi-dimensional statistics on search results. Existing approaches can only search documents that have been opened, each tool handles only a single document type, and pictures, audio and video can only be matched by judging whether they are similar to one another. Such search is costly to use and inefficient.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a storage medium and a device for retrieving and compiling statistics on multiple types of knowledge, which can retrieve different types of knowledge and compile statistics on the retrieval results along different dimensions.
In a first aspect, an embodiment of the present invention provides a multi-type knowledge retrieval and statistics method, where the method includes:
analyzing the knowledge to be retrieved to obtain structured data;
executing, according to the attribute identifier of the knowledge to be retrieved, the retrieval mode corresponding to that identifier, wherein the retrieval modes comprise text retrieval and picture retrieval;
when the retrieval mode is text retrieval, performing semantic recognition on the structured data of the knowledge to be retrieved;
according to the semantic recognition result, performing full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval on the knowledge to be retrieved, respectively;
when the retrieval mode is picture retrieval, vectorizing the knowledge to be retrieved;
carrying out intra-knowledge picture retrieval and picture knowledge retrieval on the knowledge to be retrieved according to the vectorization result;
inputting the full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval results, or the intra-knowledge picture retrieval and picture knowledge retrieval results, into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved;
and carrying out statistical classification on the retrieval results.
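Taken together, the claimed steps amount to a dispatch on the attribute identifier, coarse retrieval over several branches, fine ranking, and statistics. The Python sketch below shows only that control flow under stated assumptions: `Knowledge`, `run_pipeline` and the callable arguments are hypothetical names, and semantic recognition and vectorization are assumed to happen inside the retriever callables.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Knowledge:
    """Hypothetical structured record produced by the parsing step."""
    attr: str                                   # attribute identifier: "text" or "picture"
    content: str
    meta: Dict[str, str] = field(default_factory=dict)

Retriever = Callable[[Knowledge], List[str]]    # returns candidate knowledge ids

def run_pipeline(query: Knowledge,
                 text_retrievers: List[Retriever],
                 picture_retrievers: List[Retriever],
                 rerank: Callable[[Knowledge, List[str]], List[str]],
                 attribute_of: Callable[[str], str]) -> Tuple[List[str], Counter]:
    # Select the retrieval mode from the attribute identifier.
    if query.attr == "text":
        retrievers = text_retrievers            # full-text, vector, FAQ, knowledge graph
    elif query.attr == "picture":
        retrievers = picture_retrievers         # intra-knowledge picture, picture knowledge
    else:
        raise ValueError(f"unknown attribute identifier: {query.attr!r}")
    # Coarse retrieval: gather the candidates returned by every branch of the mode.
    candidates = [doc_id for retrieve in retrievers for doc_id in retrieve(query)]
    # Fine ranking: re-rank the merged candidate list.
    results = rerank(query, candidates)
    # Statistics: count the final results along one attribute dimension.
    stats = Counter(attribute_of(doc_id) for doc_id in results)
    return results, stats
```

Passing the retrievers and the re-ranker in as callables keeps the branch logic separate from any particular search engine or model.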
Further, before performing semantic recognition on the structured data of the knowledge to be retrieved and before performing vectorization processing on the knowledge to be retrieved, the method further includes:
acquiring a user identity;
and screening a knowledge list corresponding to the user identity in a knowledge base according to the user identity, wherein the knowledge base comprises data of various knowledge types.
Further, semantic recognition is performed on the structured data of the knowledge to be retrieved using a natural language processing method.
Further, performing semantic recognition on the structured data of the knowledge to be retrieved comprises at least: Chinese word segmentation, named entity recognition, part-of-speech tagging, synonym analysis, word vector analysis, dependency parsing, word position analysis, semantic normalization, knowledge error correction and label extraction.
Further, statistically classifying the search results includes:
and carrying out statistical classification according to the creation time, knowledge classification, knowledge owners, knowledge labels and knowledge types of the entity knowledge in the retrieval result.
In a second aspect, an embodiment of the present invention provides a multi-type knowledge retrieval and statistics apparatus, including:
the analysis module is used for analyzing the knowledge to be retrieved to obtain structured data;
the retrieval mode determining module is used for executing, according to the attribute identifier of the knowledge to be retrieved, the retrieval mode corresponding to that identifier, wherein the retrieval modes comprise text retrieval and picture retrieval;
the semantic recognition module is used for carrying out semantic recognition on the structured data of the knowledge to be retrieved when the retrieval mode is text retrieval;
the first retrieval module is used for performing full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval on the knowledge to be retrieved, respectively, according to the semantic recognition result;
the vectorization module is used for vectorizing the knowledge to be retrieved when the retrieval mode is picture retrieval;
the second retrieval module is used for performing intra-knowledge picture retrieval and picture knowledge retrieval on the knowledge to be retrieved according to the vectorization result;
the fine ranking module is used for inputting the full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval results, or the intra-knowledge picture retrieval and picture knowledge retrieval results, into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved;
and the classification statistical module is used for carrying out statistical classification on the retrieval result.
Further, the apparatus further comprises:
the acquisition module is used for acquiring the user identity;
and the screening module is used for screening, in a knowledge base and according to the user identity, the knowledge list corresponding to that user identity, wherein the knowledge base contains data of multiple knowledge types.
In a third aspect, an embodiment of the present invention provides a storage medium storing a computer program, where the computer program is configured to execute, when run, the method of any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a device including a memory and a processor, where the memory stores a computer program and the processor is configured to execute the computer program to perform the method of any embodiment of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program product including a computer program which, when executed by a processor, implements the steps of the method of any embodiment of the first aspect.
In the technical solution provided by the invention, for text retrieval, after the knowledge list corresponding to the user's permissions is screened out, semantic recognition is performed on the knowledge to be retrieved; full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval are then performed separately to obtain coarse retrieval results, which are aggregated and input into a fine ranking model for re-ranking to obtain the final retrieval result. For picture retrieval, after the knowledge list corresponding to the user's permissions is screened out, the knowledge to be retrieved is vectorized, intra-knowledge picture retrieval and picture knowledge retrieval are performed according to the vectorization result, and their results are input into the fine ranking model for re-ranking to obtain the retrieval result. Finally, the retrieval result is classified and counted. The method and apparatus can therefore retrieve multiple types of knowledge and compile statistics on the results, and because coarse retrieval is followed by fine ranking, retrieval accuracy is improved.
The foregoing description is only an overview of the technical solutions of the present invention. Embodiments are described below so that the technical means of the invention can be understood more clearly and the above and other objects, features and advantages of the invention become more apparent.
Drawings
FIG. 1 is a flow chart of a multi-type knowledge retrieval and statistics method provided by an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a multi-type knowledge retrieval and statistics apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flow chart of a multi-type knowledge retrieval and statistics method provided by an embodiment of the present invention, the method includes the following steps:
Step 101: parsing the knowledge to be retrieved to obtain structured data.
In this step, the knowledge to be retrieved may be an unstructured document, which can be parsed with an existing document parser to obtain structured data. After parsing, the keywords, title, paragraph contents and full-text content of the knowledge to be retrieved are obtained.
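As an illustration of the structured record this step produces, the sketch below parses a plain-text document under loudly stated assumptions: the first block is the title, paragraphs are separated by blank lines, tokens are split on non-alphanumerics, and keywords are the most frequent non-stopwords. `parse_document` and its output keys are hypothetical; real documents (PDF, Word, Chinese text) would need a format-specific parser and word segmentation.

```python
import re
from collections import Counter
from typing import Dict

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}  # illustrative

def parse_document(raw: str, num_keywords: int = 3) -> Dict[str, object]:
    """Turn a plain-text document into a structured record (hypothetical schema)."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw) if b.strip()]
    title = blocks[0] if blocks else ""
    paragraphs = blocks[1:]
    full_text = "\n".join(paragraphs)
    # Naive keyword extraction: most frequent non-stopword tokens in the body.
    tokens = [t for t in re.findall(r"[a-z0-9]+", full_text.lower()) if t not in STOPWORDS]
    keywords = [word for word, _ in Counter(tokens).most_common(num_keywords)]
    return {"title": title, "paragraphs": paragraphs,
            "full_text": full_text, "keywords": keywords}
```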
Step 102: executing, according to the attribute identifier of the knowledge to be retrieved, the retrieval mode corresponding to that identifier, wherein the retrieval modes comprise text retrieval and picture retrieval.
In this step, the attribute identifier of the knowledge to be retrieved indicates the type of that knowledge, which is either text or picture. When the attribute identifier indicates text, the knowledge is retrieved in the text retrieval mode; when it indicates a picture, the knowledge is retrieved in the picture retrieval mode.
Step 103: when the retrieval mode is text retrieval, performing semantic recognition on the structured data of the knowledge to be retrieved.
In this step, semantic recognition may be performed on the structured data of the knowledge to be retrieved using natural language processing. Natural language processing (NLP) aims to extract information from text data in a form suitable for computer algorithms.
For example, NLP techniques may be used to perform stemming on the structured data of the knowledge to be retrieved, i.e., removing the inflected or derived form of a word and reducing it to its stem or root form. The goal of stemming is to reduce related words to the same stem; for example, the English words "beautiful" and "beautifully" both have the stem "beauti".
For example, NLP techniques may be used to perform lemmatization on the structured data of the knowledge to be retrieved, i.e., reducing each word to its dictionary form (lemma). Unlike stemming, lemmatization takes into account the meaning of the word in its sentence and its relation to adjacent sentences. For example, in English, "beautiful" and "beautifully" are lemmatized to "beautiful" and "beautifully", respectively.
For example, NLP techniques may be used to perform word vectorization on the structured data of the knowledge to be retrieved, i.e., representing natural language as a set of real numbers. Word vectorization captures relationships between words in numerical form; through it, a word or phrase is represented by a vector of fixed dimension, for example a vector of length 100.
For example, NLP techniques may be used to perform part-of-speech tagging on the structured data of the knowledge to be retrieved, i.e., labeling each word in a sentence as a noun, verb, adjective, adverb and so on. For the sentence "Ashok killed the snake with a stick", part-of-speech tagging would identify: Ashok (proper noun), killed (verb), the (determiner), snake (noun), with (preposition), a (determiner), stick (noun), "." (punctuation).
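To make the difference between stemming and lemmatization concrete, here is a toy sketch. The suffix list and the lemma dictionary are illustrative assumptions; this is NOT the Porter stemmer or a real lemmatizer, which rely on fuller rule sets and lexicons.

```python
# Toy suffix-stripping stemmer: an illustrative stand-in, NOT the Porter algorithm.
SUFFIXES = ["fully", "ness", "ful", "ing", "ly", "ed", "s"]   # longest suffixes first

def stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        # Require at least three remaining characters so short words survive intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

# Toy lemma dictionary; real lemmatizers use a full lexicon plus part-of-speech context.
LEMMAS = {"ran": "run", "running": "run", "better": "good", "mice": "mouse"}

def lemmatize(word: str) -> str:
    w = word.lower()
    return LEMMAS.get(w, w)   # words already in dictionary form map to themselves
```

With these rules `stem("beautiful")` and `stem("beautifully")` both yield `"beauti"`, which is not a real word, while `lemmatize` leaves both words unchanged because each is already a dictionary form.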
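A learned embedding model (for example word2vec) is the usual way to obtain such fixed-length vectors. The sketch below substitutes a much simpler, assumed technique, hashed character trigrams, only to show the shape of the result: every word or phrase maps to a normalized vector of length 100. Unlike learned embeddings it captures surface similarity, not meaning.

```python
import hashlib
import math
from typing import List

DIM = 100  # fixed vector length, matching the example length in the text

def _bucket(feature: str) -> int:
    # Stable hash; Python's built-in hash() is salted per process for strings.
    return int(hashlib.md5(feature.encode("utf-8")).hexdigest(), 16) % DIM

def vectorize(text: str) -> List[float]:
    """Map a word or phrase to a fixed-length, L2-normalized vector of hashed trigrams."""
    padded = f"#{text.lower()}#"          # boundary markers make prefixes/suffixes distinct
    vec = [0.0] * DIM
    for i in range(len(padded) - 2):
        vec[_bucket(padded[i:i + 3])] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```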
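A dictionary-lookup tagger is the simplest possible sketch of the tagging example above. The lexicon and the Universal Dependencies tag names (`PROPN`, `ADP` for prepositions, and so on) are assumptions for illustration; practical taggers are statistical models trained on annotated corpora.

```python
from typing import List, Tuple

# Illustrative lexicon; a real tagger is trained on an annotated corpus.
LEXICON = {
    "ashok": "PROPN", "killed": "VERB", "the": "DET", "a": "DET",
    "snake": "NOUN", "stick": "NOUN", "with": "ADP", ".": "PUNCT",
}

def tokenize(sentence: str) -> List[str]:
    # Split off periods so punctuation becomes its own token.
    return sentence.replace(".", " .").split()

def pos_tag(sentence: str) -> List[Tuple[str, str]]:
    # Unknown words default to NOUN, a common fallback for open-class words.
    return [(tok, LEXICON.get(tok.lower(), "NOUN")) for tok in tokenize(sentence)]
```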
For example, NLP techniques may be used to perform named entity disambiguation on the structured data of the knowledge to be retrieved, i.e., determining which entity a mention in a sentence refers to. For the sentence "Apple earned a revenue of 200 Billion USD in 2016", named entity disambiguation would infer that "Apple" refers to the company Apple and not the fruit.
For example, NLP techniques may be used to perform entity recognition on the structured data of the knowledge to be retrieved, i.e., identifying entities with specific meanings in a sentence and classifying them into categories such as person names, organizations, dates, place names and times.
For example, NLP techniques may be used to perform sentiment analysis on the structured data of the knowledge to be retrieved. Sentiment analysis is a broad form of subjective analysis that identifies the sentiment of customer reviews, the positive or negative sentiment expressed by a sentence, or the sentiment conveyed by speech or written text. For example, "I dislike chocolate ice cream" is a negative evaluation of the ice cream, while "I do not dislike chocolate ice cream" can be considered a neutral evaluation.
For example, NLP techniques may be used to perform semantic text similarity analysis on the structured data of the knowledge to be retrieved, i.e., analyzing how similar two pieces of text are in meaning. Note that similarity differs from relatedness: cars and buses are similar, whereas cars and fuel are related.
For example, NLP techniques may be used to produce a text summary of the structured data of the knowledge to be retrieved, i.e., shortening the text by identifying its key points and composing a summary from them. The purpose of text summarization is to shorten the text as much as possible without changing its meaning.
For example, NLP techniques may be used to perform language identification on the structured data of the knowledge to be retrieved, i.e., determining which language a text is written in. It relies on the statistical and syntactic properties of each language and can be regarded as a special case of text classification.
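One classic way to disambiguate a mention is to compare its sentence context with cue words for each candidate sense (a simplified Lesk approach). The sketch below assumes a hand-written sense inventory; `SENSES` and its cue words are hypothetical, whereas production systems link mentions against a knowledge base using learned context models.

```python
import re
from typing import Dict, Set

# Hypothetical sense inventory: each candidate entity with a bag of context cue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "apple": {
        "Apple Inc. (company)": {"revenue", "usd", "billion", "iphone", "shares", "earned"},
        "apple (fruit)": {"eat", "ate", "tree", "juice", "ripe", "pie", "orchard"},
    }
}

def disambiguate(mention: str, sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence (simplified Lesk)."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = SENSES[mention.lower()]
    # Ties go to the first sense listed, because max() keeps the first maximum.
    return max(candidates, key=lambda sense: len(candidates[sense] & context))
```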
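A lexicon-based scorer with a negation window reproduces the two ice-cream examples. The polarity scores, the two-token window and the rule that a negated sentiment word counts as neutral are all assumptions chosen to mirror the text, not a standard sentiment resource; trained classifiers handle negation and context far more robustly.

```python
import re

# Illustrative polarity lexicon; scores are assumptions, not from any standard resource.
POLARITY = {"like": 1, "love": 2, "good": 1, "dislike": -1, "hate": -2, "bad": -1}
NEGATORS = {"not", "no", "never"}

def sentiment(sentence: str) -> str:
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        # A negator within the two preceding tokens cancels the word's polarity,
        # mirroring the text's reading of "do not dislike" as neutral.
        if any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
            continue
        score += POLARITY[tok]
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```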
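A minimal extractive summarizer, assuming word frequency is an adequate proxy for a sentence's importance: each sentence is scored by the average corpus frequency of its content words, and the top `k` sentences are kept in their original order. The stop list and scoring rule are illustrative choices; abstractive summarization with a trained model is a different technique.

```python
import re
from collections import Counter
from typing import List

STOP = {"a", "an", "the", "of", "and", "to", "in", "is"}  # illustrative stop list

def _content_words(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def summarize(text: str, k: int = 1) -> str:
    """Frequency-based extractive summary: keep the k best sentences in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence: str) -> float:
        words = _content_words(sentence)
        # Average rather than total frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))
```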
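For a knowledge base that mixes Chinese and English, a crude first pass can rely on the writing system alone. The sketch assumes that CJK ideographs imply Chinese and basic Latin letters imply English; it cannot separate languages that share a script, for which a statistical classifier (for example over character n-grams) would be needed.

```python
def identify_language(text: str) -> str:
    """Crude script-based language ID: 'zh' for CJK ideographs, 'en' for ASCII letters."""
    # Count characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Count basic Latin letters; this cannot tell English from other Latin-script languages.
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if cjk == 0 and latin == 0:
        return "unknown"
    return "zh" if cjk >= latin else "en"
```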
Thus, after semantic recognition is performed on the structured data of the knowledge to be retrieved, attribute information of that knowledge, such as word segmentation results, keywords, synonyms, business term recognition results and parts of speech, can be obtained.
Step 104: according to the semantic recognition result, performing full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval on the knowledge to be retrieved, respectively.
In this step, full-text retrieval is performed on the knowledge to be retrieved according to the semantic recognition result. Full-text retrieval takes text as the search object and finds texts containing specified words. For example, the structured data of the knowledge to be retrieved includes a keyword index; the knowledge most relevant to the knowledge to be retrieved is searched for in the knowledge list according to the keyword index, and the top first preset number of results are selected according to the similarity score.
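The keyword-index search can be sketched as an inverted index plus a relevance score. The text does not name a scoring function; BM25 is used here as an assumed, commonly used choice, and `FullTextIndex`, `k1` and `b` are illustrative names and defaults. `top_n` plays the role of the first preset number.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class FullTextIndex:
    """Inverted index with BM25 scoring (BM25 is an assumed choice of relevance score)."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.doc_len = {doc_id: sum(c.values()) for doc_id, c in self.tf.items()}
        self.avg_len = sum(self.doc_len.values()) / max(len(docs), 1)
        # Inverted index: term -> ids of the documents that contain it.
        self.postings: Dict[str, List[str]] = defaultdict(list)
        for doc_id, counts in self.tf.items():
            for term in counts:
                self.postings[term].append(doc_id)

    def idf(self, term: str) -> float:
        n, df = len(self.tf), len(self.postings.get(term, []))
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query: str, top_n: int) -> List[Tuple[str, float]]:
        scores: Dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            for doc_id in self.postings.get(term, []):   # only documents containing the term
                f = self.tf[doc_id][term]
                length_norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
                scores[doc_id] += self.idf(term) * f * (self.k1 + 1) / (f + length_norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Only documents that appear in the posting lists of the query terms are scored, which is what makes an inverted index cheaper than scanning the whole knowledge list.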
In this step, vectorization retrieval is performed on the knowledge to be retrieved according to the semantic recognition result. Specifically, the semantic recognition result of the knowledge to be retrieved is vectorized, the pictures, audio and video in the knowledge list are vectorized, the similarity between the vectorized knowledge to be retrieved and each vectorized picture, audio and video item is calculated, and the second preset number of most relevant items are selected as search results according to the similarity.
In this step, retrieval-type question-answer (FAQ) retrieval is performed on the knowledge to be retrieved according to the semantic recognition result. FAQ retrieval calculates the similarity between the knowledge to be retrieved and the questions already in the knowledge base and returns the most accurate corresponding answers, yielding a third preset number of search results.
In this step, knowledge graph retrieval is performed on the knowledge to be retrieved according to the semantic recognition result. Knowledge graph retrieval builds on full-text retrieval: the first-degree relations of the entities are queried according to the index data matched by the retrieval, and a fourth preset number of results are taken from the graph retrieval result as search results.
Through step 104, therefore, coarse retrieval results for the knowledge to be retrieved are obtained; their total number is the first preset number + the second preset number + the third preset number + the fourth preset number.
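Once the query and the pictures, audio and video have been mapped to vectors, the similarity search itself is short. The sketch assumes the vectors are already computed by some encoder (not shown, hypothetical) and uses cosine similarity with brute-force scanning; `top_n` plays the role of the second preset number, and a large collection would normally use an approximate nearest-neighbour index instead. The same routine would apply to comparing picture vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0   # zero vectors get similarity 0

def vector_search(query_vec: Sequence[float],
                  item_vecs: Dict[str, Sequence[float]],
                  top_n: int) -> List[Tuple[str, float]]:
    """Brute-force nearest-neighbour search over precomputed item vectors."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]
```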
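FAQ retrieval compares the query with stored questions and returns the paired answers. In the sketch below, Jaccard token overlap is an assumed stand-in for whatever similarity model a real system would use (for example a sentence-embedding model); `faq_search` is a hypothetical name and `top_n` corresponds to the third preset number.

```python
import re
from typing import List, Set, Tuple

def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def faq_search(query: str, faq: List[Tuple[str, str]], top_n: int) -> List[Tuple[str, float]]:
    """Match the query against stored questions and return (answer, score) pairs."""
    q = _tokens(query)
    scored = []
    for question, answer in faq:
        t = _tokens(question)
        union = q | t
        # Jaccard similarity between the query and the stored question.
        scored.append((answer, len(q & t) / len(union) if union else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```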
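The two-stage graph lookup (match entities first, then expand one hop) can be sketched over an adjacency-list graph. The graph format, the substring match used in place of real full-text retrieval, and `graph_search` itself are assumptions for illustration; a production system would query a graph database. `top_n` corresponds to the fourth preset number.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge graph: entity -> list of (relation, neighbour) pairs.
Graph = Dict[str, List[Tuple[str, str]]]

def graph_search(query: str, graph: Graph, top_n: int) -> List[Tuple[str, str, str]]:
    """Match entities named in the query, then return their first-degree relations."""
    q = query.lower()
    # Stage 1: a crude stand-in for full-text matching, entities whose name occurs in the query.
    matched = [entity for entity in graph if entity.lower() in q]
    # Stage 2: expand each matched entity by exactly one hop.
    triples: List[Tuple[str, str, str]] = []
    seen: Set[Tuple[str, str, str]] = set()
    for entity in matched:
        for relation, neighbour in graph[entity]:
            triple = (entity, relation, neighbour)
            if triple not in seen:
                seen.add(triple)
                triples.append(triple)
    return triples[:top_n]
```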
Step 105: when the retrieval mode is picture retrieval, vectorizing the knowledge to be retrieved.
In this step, when the retrieval mode is picture retrieval, the structured data of the knowledge to be retrieved, such as its keywords, title and full-text content, is vectorized.
Step 106: performing intra-knowledge picture retrieval and picture knowledge retrieval on the knowledge to be retrieved according to the vectorization result.
In this step, intra-knowledge picture retrieval is mainly used to retrieve the parts of a document that contain pictures, audio or video: the similarity between the vectorization result of the knowledge to be retrieved and the vectorization results of the pictures within the knowledge is calculated, and a fifth preset number of search results is obtained.
Step 107: inputting the full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval results, or the intra-knowledge picture retrieval and picture knowledge retrieval results, into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved.
In this step, for text retrieval, the first preset number of results obtained by full-text retrieval, the second preset number obtained by vectorization retrieval, the third preset number obtained by retrieval-type question-answer retrieval and the fourth preset number obtained by knowledge graph retrieval are input into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved. For picture retrieval, the fifth preset number of results obtained by intra-knowledge picture retrieval and the sixth preset number obtained by picture knowledge retrieval are input into the fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved.
In this step, the fine ranking model may use a convolutional neural network, or alternatively a recall algorithm, to calculate the similarity between the knowledge to be retrieved and each search result obtained in the coarse retrieval stage.
Step 108: statistically classifying the retrieval results.
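The fine ranking stage can be sketched as "merge, de-duplicate, re-score, sort". Two assumptions are made here: candidates returned by several branches are de-duplicated (the text does not say so), and the convolutional neural network is replaced by a placeholder `overlap_scorer`; a trained model would be passed in through the hypothetical `scorer` parameter.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Scorer = Callable[[str, str], float]   # (query, candidate text) -> relevance score

def fine_rank(query: str,
              coarse_results: Iterable[List[str]],
              corpus: Dict[str, str],
              scorer: Scorer,
              top_n: int) -> List[Tuple[str, float]]:
    """Merge candidate ids from all coarse branches, de-duplicate, and re-score them."""
    merged: List[str] = []
    seen = set()
    for branch in coarse_results:
        for doc_id in branch:
            if doc_id not in seen:            # keep the first occurrence only
                seen.add(doc_id)
                merged.append(doc_id)
    rescored = [(doc_id, scorer(query, corpus[doc_id])) for doc_id in merged]
    rescored.sort(key=lambda kv: kv[1], reverse=True)   # stable: ties keep merge order
    return rescored[:top_n]

def overlap_scorer(query: str, text: str) -> float:
    """Placeholder relevance model: fraction of query words that occur in the text."""
    q = query.lower().split()
    t = set(text.lower().split())
    return sum(w in t for w in q) / len(q) if q else 0.0
```

Because the coarse branches use different scoring scales (BM25 scores, cosine similarities, Jaccard overlaps), re-scoring every candidate with a single model is what makes their results comparable.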
In this step, because semantic recognition processing has been performed on the knowledge to be retrieved and on each piece of knowledge in the knowledge base, attribute information is available for each of them; it may include the creation time, knowledge classification, knowledge owner, knowledge tags, knowledge type and so on. The retrieval results are then classified and counted according to the attribute information of each piece of knowledge.
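Counting results along several attribute dimensions at once reduces to one counter per dimension. The field names in the usage below (`type`, `owner`, `created_year`) are hypothetical, and bucketing creation time by year is an assumed simplification.

```python
from collections import Counter
from typing import Dict, Iterable, List

def classify_stats(results: Iterable[Dict[str, str]],
                   dimensions: List[str]) -> Dict[str, Counter]:
    """Count retrieval results along each attribute dimension (missing values -> 'unknown')."""
    stats = {dim: Counter() for dim in dimensions}
    for item in results:
        for dim in dimensions:
            stats[dim][item.get(dim, "unknown")] += 1
    return stats
```

For example, `classify_stats(results, ["type", "owner", "created_year"])` returns one `Counter` per dimension, which can feed a faceted display of the search results.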
In some embodiments, prior to step 103, the method further comprises:
Step 103a: acquiring a user identity;
Step 103b: screening, in a knowledge base and according to the user identity, the knowledge list corresponding to that user identity, wherein the knowledge base contains data of multiple knowledge types.
In this embodiment, users with different permissions are given access to different search content, which improves the security of the search results.
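The screening step can be sketched as role-based filtering of the knowledge base before any retrieval runs. Role-based access control, the `KnowledgeItem` record and the `user_roles` mapping are all assumptions for illustration; the text only states that the knowledge list is screened by user identity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class KnowledgeItem:
    item_id: str
    knowledge_type: str                          # e.g. "document", "picture", "audio", "video"
    allowed_roles: Set[str] = field(default_factory=set)

def screen_knowledge(user_id: str,
                     user_roles: Dict[str, Set[str]],
                     knowledge_base: List[KnowledgeItem]) -> List[KnowledgeItem]:
    """Return only the items the user's roles may see; unknown users see nothing."""
    roles = user_roles.get(user_id, set())
    return [item for item in knowledge_base if item.allowed_roles & roles]
```

Filtering before retrieval, rather than after, keeps restricted items out of every coarse branch and out of the statistics.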
Referring to fig. 2, fig. 2 is a block diagram of a multi-type knowledge retrieval and statistics apparatus according to an embodiment of the present invention, the apparatus includes:
the analysis module 21 is configured to analyze the knowledge to be retrieved to obtain structured data;
the retrieval mode determining module 22 is configured to execute, according to the attribute identifier of the knowledge to be retrieved, the retrieval mode corresponding to that identifier, where the retrieval modes include text retrieval and picture retrieval;
the semantic recognition module 23 is configured to perform semantic recognition on the structured data of the knowledge to be retrieved when the retrieval mode is text retrieval;
the first retrieval module 24 is configured to perform full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval on the knowledge to be retrieved according to the semantic recognition result;
the vectorization module 25 is configured to perform vectorization processing on the knowledge to be retrieved when the retrieval mode is picture retrieval;
the second retrieval module 26 is configured to perform intra-knowledge picture retrieval and picture knowledge retrieval on the knowledge to be retrieved according to the vectorization result;
the fine ranking module 27 is configured to input the full-text retrieval, vectorization retrieval, retrieval-type question-answer retrieval and knowledge graph retrieval results, or the intra-knowledge picture retrieval and picture knowledge retrieval results, into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved;
and a classification statistical module 28, configured to perform statistical classification on the search result.
In some embodiments, the apparatus further comprises:
an obtaining module 29, configured to obtain a user identity;
the screening module 30 is configured to screen a knowledge list corresponding to the user identity from a knowledge base according to the user identity, where the knowledge base includes data of multiple knowledge types.
It should be noted that the multi-type knowledge retrieval and statistics apparatus in this embodiment and the multi-type knowledge retrieval and statistics method in the above embodiment belong to the same inventive concept; for technical details not described for the apparatus, refer to the description of the method, which is not repeated here.
Furthermore, an embodiment of the present invention further provides a storage medium, in which a computer program is stored, where the computer program is configured to execute the foregoing method when running.
Furthermore, an embodiment of the present invention provides a computer program product, which includes a computer program, and the computer program can implement the foregoing method when being executed by a processor.
FIG. 3 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in FIG. 3, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from a storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data necessary for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the multi-type knowledge retrieval and statistics method.
In some embodiments, the multi-type knowledge retrieval and statistics method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. Such a computer program may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the computer program, when executed by the processor, causes the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. The computer program may execute entirely on a machine, partly on a machine, partly on a machine as a stand-alone software package and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the difficult management and poor service scalability of traditional physical hosts and virtual private server (VPS) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
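For example, the four text-retrieval channels named in claim 1 (full-text retrieval, vectorization retrieval, retrieval type question-answer retrieval, and map retrieval) do not depend on one another's output, so one way to use the freedom to run steps in parallel is to dispatch them concurrently and merge their results afterwards. The following is a minimal Python sketch under stated assumptions: each channel is assumed to be a plain callable mapping a query string to a list of (document ID, score) pairs, and the channel names and toy results are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical channel signature: query string -> list of (doc_id, score) hits.
Hit = Tuple[str, float]
Channel = Callable[[str], List[Hit]]


def run_channels_in_parallel(query: str, channels: Dict[str, Channel]) -> Dict[str, List[Hit]]:
    """Submit every retrieval channel to a thread pool and gather the hits by channel name."""
    with ThreadPoolExecutor(max_workers=max(1, len(channels))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in channels.items()}
        # result() waits for each channel; completion order does not affect the output.
        return {name: fut.result() for name, fut in futures.items()}


# Toy stand-ins for the four text-retrieval channels (hypothetical results).
channels: Dict[str, Channel] = {
    "full_text": lambda q: [("doc1", 0.9)],
    "vectorization": lambda q: [("doc2", 0.8)],
    "question_answer": lambda q: [("doc1", 0.7)],
    "map": lambda q: [("doc3", 0.6)],
}

results = run_channels_in_parallel("knowledge retrieval", channels)
print(sorted(results))  # ['full_text', 'map', 'question_answer', 'vectorization']
```

A thread pool is a reasonable fit when channels mostly wait on external services such as a search engine or a vector index; CPU-heavy channels would more likely call for a process pool.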
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A multi-type knowledge retrieval and statistics method, the method comprising:
analyzing the knowledge to be retrieved to obtain structured data;
executing, according to the attribute identifier of the knowledge to be retrieved, the retrieval mode corresponding to that attribute identifier, wherein the retrieval modes comprise text retrieval and picture retrieval;
when the retrieval mode is text retrieval, performing semantic recognition on the structured data of the knowledge to be retrieved;
according to the semantic recognition result, performing full-text retrieval, vectorization retrieval, retrieval type question-answer retrieval and map retrieval on the knowledge to be retrieved respectively;
when the retrieval mode is picture retrieval, vectorizing the structured data of the knowledge to be retrieved;
carrying out in-knowledge picture retrieval and picture knowledge retrieval on the knowledge to be retrieved according to the vectorization result;
inputting the results of the full-text retrieval, vectorization retrieval, retrieval type question-answer retrieval and map retrieval, or the results of the in-knowledge picture retrieval and picture knowledge retrieval, into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved;
and carrying out statistical classification on the retrieval result.
2. The multi-type knowledge retrieval and statistics method of claim 1, wherein before performing semantic recognition on the structured data of the knowledge to be retrieved, and before performing vectorization processing on the knowledge to be retrieved, the method further comprises:
acquiring a user identity;
and screening a knowledge list corresponding to the user identity in a knowledge base according to the user identity, wherein the knowledge base comprises data of various knowledge types.
3. The method for multi-type knowledge retrieval and statistics according to claim 1, wherein a natural language processing method is used to perform semantic recognition on the structured data of the knowledge to be retrieved.
4. The method for multi-type knowledge retrieval and statistics according to claim 3, wherein the semantic recognition of the structured data of the knowledge to be retrieved comprises at least: Chinese word segmentation, named entity recognition, part-of-speech tagging, synonym analysis, word vector analysis, dependency parsing, word position analysis, semantic normalization, knowledge error correction, and label extraction.
5. The multi-type knowledge retrieval and statistics method of claim 1, wherein statistically classifying the retrieval result comprises:
and carrying out statistical classification according to the creation time, knowledge classification, knowledge owners, knowledge labels and knowledge types of the entity knowledge in the retrieval result.
6. A multi-type knowledge retrieval and statistics apparatus, the apparatus comprising:
the analysis module is used for analyzing the knowledge to be retrieved to obtain structured data;
the retrieval mode determining module is used for executing, according to the attribute identifier of the knowledge to be retrieved, the retrieval mode corresponding to that attribute identifier, wherein the retrieval modes comprise text retrieval and picture retrieval;
the semantic recognition module is used for carrying out semantic recognition on the structured data of the knowledge to be retrieved when the retrieval mode is text retrieval;
the first retrieval module is used for respectively carrying out full-text retrieval, vectorization retrieval, retrieval type question-answer retrieval and map retrieval on the knowledge to be retrieved according to the semantic recognition result;
the vectorization module is used for vectorizing the knowledge to be retrieved when the retrieval mode is picture retrieval;
the second retrieval module is used for carrying out in-knowledge picture retrieval and picture knowledge retrieval on the knowledge to be retrieved according to the vectorization result;
the fine ranking module is used for inputting the results of the full-text retrieval, vectorization retrieval, retrieval type question-answer retrieval and map retrieval, or the results of the in-knowledge picture retrieval and picture knowledge retrieval, into a fine ranking model for re-ranking to obtain the retrieval result of the knowledge to be retrieved;
and the classification statistical module is used for carrying out statistical classification on the retrieval result.
7. The apparatus of claim 6, further comprising:
the acquisition module is used for acquiring the user identity;
and the screening module is used for screening, according to the user identity, a knowledge list corresponding to the user identity in a knowledge base, wherein the knowledge base comprises data of various knowledge types.
8. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 5 when executed.
9. An apparatus comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 5.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
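The claimed flow can be illustrated with a small, self-contained Python sketch covering the text-retrieval branch of claim 1, the user-identity filtering of claim 2, and the statistical classification of claim 5. Everything in it is an assumption for illustration, not the patented implementation: the `Knowledge` record and its `visible_to` access list are hypothetical, whitespace tokenization stands in for semantic recognition, a bag-of-words cosine stands in for vectorization retrieval, and a fixed weighted-sum score fusion stands in for the fine ranking model, whose actual form the claims do not specify. The retrieval type question-answer, map, and picture-retrieval channels are omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Knowledge:
    # Hypothetical record layout; the statistics fields follow the dimensions in claim 5.
    doc_id: str
    text: str
    created: str  # ISO date string, e.g. "2022-12-16"
    category: str
    owner: str
    labels: List[str]
    ktype: str
    visible_to: List[str] = field(default_factory=list)  # hypothetical access list (claim 2)


def filter_by_user(kb: List[Knowledge], user_id: str) -> List[Knowledge]:
    """Claim 2 step: keep only the knowledge entries visible to this user."""
    return [k for k in kb if user_id in k.visible_to]


def semantic_recognition(query: str) -> List[str]:
    """Crude stand-in for semantic recognition: lowercase whitespace tokenization only."""
    return query.lower().split()


def full_text_scores(tokens: List[str], kb: List[Knowledge]) -> Dict[str, float]:
    """Stand-in for full-text retrieval: fraction of query tokens found in each document."""
    if not tokens:
        return {}
    scores = {}
    for k in kb:
        words = set(k.text.lower().split())
        scores[k.doc_id] = sum(t in words for t in tokens) / len(tokens)
    return scores


def bow_cosine(a: List[str], b: List[str]) -> float:
    """Stand-in for vectorization retrieval: cosine similarity of bag-of-words counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def fine_rank(channel_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Placeholder for the fine ranking model: weighted score sum, highest first, zeros dropped."""
    merged: Dict[str, float] = {}
    for name, scores in channel_scores.items():
        for doc_id, s in scores.items():
            merged[doc_id] = merged.get(doc_id, 0.0) + weights.get(name, 1.0) * s
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, s in ranked if s > 0]


def classify(hits: List[Knowledge]) -> Dict[str, Counter]:
    """Claim 5 step: count results by creation year, category, owner, label and type."""
    return {
        "year": Counter(k.created[:4] for k in hits),
        "category": Counter(k.category for k in hits),
        "owner": Counter(k.owner for k in hits),
        "label": Counter(lab for k in hits for lab in k.labels),
        "type": Counter(k.ktype for k in hits),
    }


def text_search(query: str, kb: List[Knowledge], user_id: str) -> Tuple[List[Knowledge], Dict[str, Counter]]:
    visible = filter_by_user(kb, user_id)
    tokens = semantic_recognition(query)
    channel_scores = {
        "full_text": full_text_scores(tokens, visible),
        "vectorization": {k.doc_id: bow_cosine(tokens, k.text.lower().split()) for k in visible},
        # Retrieval type question-answer and map channels are omitted in this sketch.
    }
    ranked = fine_rank(channel_scores, {"full_text": 0.5, "vectorization": 0.5})
    by_id = {k.doc_id: k for k in visible}
    hits = [by_id[i] for i in ranked]
    return hits, classify(hits)


kb = [
    Knowledge("k1", "vector retrieval of knowledge", "2022-05-01", "search", "alice", ["nlp"], "doc", ["u1"]),
    Knowledge("k2", "picture retrieval", "2021-03-02", "vision", "bob", ["cv"], "image", ["u1"]),
    Knowledge("k3", "knowledge graph retrieval", "2022-07-09", "search", "alice", ["kg"], "doc", ["u2"]),
]
hits, stats = text_search("knowledge retrieval", kb, "u1")
print([k.doc_id for k in hits])  # ['k1', 'k2']
print(dict(stats["category"]))  # {'search': 1, 'vision': 1}
```

In a real deployment each stand-in would presumably be replaced by a dedicated component (a search engine for full-text retrieval, a dense embedding index for vectorization retrieval, a trained re-ranking model), while the data flow of filter, multi-channel recall, re-rank, and classify would stay the same.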
Priority Applications (1)

Application number: CN202211625173.3A
Title: Multi-type knowledge retrieval and statistics method, device, storage medium and equipment
Priority date: 2022-12-16
Filing date: 2022-12-16
Status: Pending

Publications (1)

Publication number: CN115858729A
Publication date: 2023-03-28

Family

Family ID: 85673748
Country: CN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination