CN116450913A - Retrieval method, retrieval device, server and computer readable storage medium - Google Patents

Publication number
CN116450913A
Authority
CN
China
Prior art keywords
search
data
target
client
retrieval
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210010310.6A
Other languages
Chinese (zh)
Inventor
雷中杰
杨凯峰
石正贵
梅勇
白波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd
Priority to CN202210010310.6A
Publication of CN116450913A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a retrieval method, a retrieval device, a server and a computer readable storage medium, comprising: receiving a search request sent by a client, and acquiring a search keyword based on the search request; determining candidate search results from a pre-constructed search database based on the search keywords; acquiring preference information corresponding to the client, and sorting the candidate search results based on the preference information to obtain target search results; and sending a search response carrying the target search result to the client. In the retrieval process, the personalized target retrieval result can be determined based on the preference information, so that the personalized requirement of the client is met, the ideal retrieval result is obtained, and the retrieval efficiency is improved.

Description

Retrieval method, retrieval device, server and computer readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, but not exclusively, to a retrieval method, a retrieval device, a server, and a computer readable storage medium.
Background
With the continuous development of internet technology, a user can obtain the desired information from a server by entering keywords in a client and running a search.
In the related art, search information matching a keyword is determined from server resources by word matching, and the obtained search information is used as the search result. Such a search result considers only the keyword and does not take the personalized requirements of the client into account. As a result, the search result may not match those requirements, the ideal search result cannot be obtained, and the search efficiency is reduced.
Disclosure of Invention
In view of this, embodiments of the present application provide a retrieval method, apparatus, server, and computer-readable storage medium.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a retrieval method, which comprises the following steps:
receiving a search request sent by a client, and acquiring a search keyword based on the search request;
determining candidate search results from a pre-constructed search database based on the search keywords;
acquiring preference information corresponding to the client, and sorting the candidate search results based on the preference information to obtain target search results;
and sending a search response carrying the target search result to the client.
The embodiment of the application provides a retrieval device, which comprises:
the first acquisition module is used for receiving a search request sent by a client and acquiring a search keyword based on the search request;
the first determining module is used for determining candidate search results from a pre-constructed search database based on the search keywords;
the ordering module is used for acquiring preference information corresponding to the client, and ordering the candidate search results based on the preference information to obtain target search results;
and the sending module is used for sending the search response carrying the target search result to the client.
The embodiment of the application provides a server, which comprises:
a processor; and
a memory for storing a computer program executable on the processor;
wherein the computer program, when executed by the processor, implements the above-described retrieval method.
Embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions configured to perform the above-described retrieval method.
The embodiment of the application provides a retrieval method, a retrieval device, a server and a computer readable storage medium, wherein the retrieval method comprises the following steps: when a search request of a client is received, acquiring search keywords based on the search request; then, determining candidate search results from a search database based on the search keywords, wherein the search database is a pre-constructed database, and the search database contains a plurality of data; then, obtaining preference information corresponding to the client, and sorting candidate search results based on the preference information, so as to obtain target search results matched with the preference information; and finally, sending a search response carrying the target search result to the client so that the client can display the target search result matched with the preference information. In the retrieval process, the personalized target retrieval result can be determined based on the preference information, so that the personalized requirement of the client is met, the ideal retrieval result is obtained, and the retrieval efficiency is improved.
Drawings
In the drawings (which are not necessarily drawn to scale), like numerals may describe similar components in different views. The drawings illustrate generally, by way of example and not by way of limitation, various embodiments discussed herein.
FIG. 1 is a schematic flow chart of an implementation of a search method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of an implementation of determining candidate search results according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of an implementation of obtaining a target search result according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of another implementation of the search method according to the embodiment of the present application;
FIG. 5A is a schematic diagram of a block diagram of a retrieval system according to an embodiment of the present application;
FIG. 5B is a schematic structural diagram of a block diagram of a retrieval system in an office system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a composition structure of a search device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a composition structure of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a particular ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a particular order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
In the related art, content retrieval may be achieved by the following means including: creating a discipline ontology by using a semantic association editor; adding a predefined associated word family to the subject ontology; determining relevant subject matter of the ontology and referring to the resource; extracting related topics of the index articles by utilizing a predefined ontology word family; storing the extracted subject term as an attribute index; and searching the content by using the subject term or the keyword, and clustering the articles under the related ontology. The method can effectively define the subject ontology, improves the content retrieval accuracy in a controlled environment (such as school professional courseware resources), and improves the association degree between the resources.
However, when searching is performed for images and texts with low resolution, the content is difficult to query and analyze, and meanwhile, the analysis is required to be performed according to the search category used for many times, so that the judgment is also affected. In addition, the search result obtained by the method does not fully consider the personalized requirement of the client, so that the problem that the search result is not matched with the personalized requirement of the client is caused, the ideal search result cannot be obtained, and the search efficiency is reduced.
Based on the problems of the related art, the embodiment of the present application provides a search method. The method may be implemented by a computer program which, when executed, carries out the search method provided in the embodiment of the present application; the search method is applied to a server. In some embodiments, the computer program may be executed on a processor in a server. Fig. 1 is a flowchart of an implementation of a search method provided in an embodiment of the present application, where, as shown in fig. 1, the search method includes:
step S101, receiving a search request sent by a client, and acquiring a search keyword based on the search request.
Here, the server receives the search request sent by the client, where the server may be a dedicated server or a general server, or may be a rack server or a cabinet server. The client is a terminal which establishes communication connection with the server, and can be a computer, a mobile phone, a household appliance, a wearable device, an intelligent vehicle-mounted device and the like.
In this embodiment of the present application, the server may receive, over the established communication connection, a search request sent by the client, where the search request includes a search keyword. The server may obtain the search keyword by parsing the search request; the parsing may cover the entire search request or only one field of it. In actual implementation, the parsing may be a decoding operation.
By way of example, the search keywords may be "artificial intelligence algorithm", "communication mode", "baking", and the like.
In some embodiments, the search request may also carry speech data, where this step may be implemented by performing speech recognition on the speech data carried in the search request to obtain search text data, and then determining a search keyword from the search text data. The search keyword may be one word or a plurality of words.
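The parsing described for step S101 can be sketched as follows. This is a minimal illustration only: the patent does not specify a wire format, so the JSON encoding, the `keywords` field name, and the function name `extract_search_keywords` are all assumptions made for the example.

```python
import json


def extract_search_keywords(raw_request: bytes) -> list[str]:
    """Decode a search request and return the search keywords it carries.

    Assumption: the request body is UTF-8 encoded JSON with a hypothetical
    "keywords" field holding either one string or a list of strings.
    """
    request = json.loads(raw_request.decode("utf-8"))  # the "decoding operation"
    keywords = request.get("keywords", [])
    if isinstance(keywords, str):
        keywords = [keywords]  # a single keyword is also allowed
    # Drop surrounding whitespace and discard empty entries.
    return [k.strip() for k in keywords if k.strip()]
```

Only one field of the request is parsed here, which corresponds to the "parsing a field in the search request" variant; a speech-carrying request would need a speech recognition step before this one, which is not shown.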
Step S102, candidate search results are determined from a pre-constructed search database based on the search keywords.
Here, when the search database is constructed, resources crawled from the internet are stored in the database, and data uploaded by the client may also be stored in the database. After acquiring the data and storing it in the database, the server performs document preprocessing on the stored data to extract the useful information in each piece of data. Chinese data is segmented to obtain the word segmentation result corresponding to the data, and parts of speech and named entities are determined from the word segmentation result. Feature extraction is then performed to extract from each document the key information that best expresses its content and characteristics, and the data is classified according to the characteristics of the document.
In the embodiment of the application, the word segmentation results corresponding to all data in the search database are obtained first; next, a target word segmentation result containing the search keyword is determined from the word segmentation results; then, the data corresponding to the target word segmentation result is determined as target data, and the target data is determined as the candidate search result. The target data contains the search keyword. In general there are multiple pieces of target data; in special cases there may be only one.
Step S103, obtaining preference information corresponding to the client, and sorting candidate search results based on the preference information to obtain target search results.
Here, the preference information is a set of characteristics reflecting the data that the client browses or accesses; for example, the preference information may be domestic authors, electronic fields, entertainment fields, science and technology fields, etc.
In the embodiment of the application, the search database may further include classification information of the data. Based on this, the server may acquire the classification information of each target data in the candidate search results; next, the similarity between the preference information and the classification information of each target data is determined, for example from a Euclidean distance, a Manhattan distance, a Mahalanobis distance, or the like; finally, all the target data are sorted according to the obtained similarities to obtain the target search result, where the higher the similarity, the higher the sorting position. That is, the target data are ordered by similarity: target data with a larger similarity are ranked earlier, and target data with a smaller similarity are ranked later.
Based on this, the sorted target data is determined as a target retrieval result.
Step S104, the search response carrying the target search result is sent to the client.
In the embodiment of the application, the server encodes the target search result to generate a search response corresponding to the search request, and then sends the search response to the client based on the established communication connection, so that the client obtains the target search result based on the search response and displays the target search result.
In some embodiments, the client may display the target search result by default, or may display the target search result by a custom manner, where the custom manner may be obtained from the search response or may be obtained from the setting information of the client itself.
The embodiment of the application provides a retrieval method. When the server receives a search request from a client, the search keyword is acquired based on the search request; then, candidate search results are determined from a search database based on the search keyword, wherein the search database is a pre-constructed database containing a plurality of data; then, preference information corresponding to the client is obtained, and the candidate search results are sorted based on the preference information, so as to obtain target search results matched with the preference information; finally, a search response carrying the target search result is sent to the client so that the client can display the target search result matched with the preference information. In the retrieval process, the personalized target retrieval result can be determined based on the preference information, so that the personalized requirement of the client is met, the ideal retrieval result is obtained, and the retrieval efficiency is improved.
In actual implementation, referring to fig. 2, the step S102 "determining candidate search results from a pre-constructed search database based on search keywords" may be implemented by the following steps S1021 to S1023:
step S1021, word segmentation results corresponding to the data in the search database are obtained.
Here, the word segmentation result corresponding to each data may be obtained from the search database through the word segmentation obtaining instruction, where each word segmentation result corresponding to each data may include a plurality of words, and for example, the word segmentation result corresponding to a certain data may be "artificial intelligence", "inference algorithm", "neural network", "input layer".
Step S1022, determining target word segmentation results containing the search keywords from the word segmentation results.
Here, the target word segmentation result can be determined either by a comparison method or by an intersection method. When the comparison method is used, the words in each word segmentation result are compared with the search keyword: if the words include the search keyword, the word segmentation result is determined to be a target word segmentation result; if the words do not include the search keyword, the word segmentation result is not a target word segmentation result.
In some embodiments, if the target word segmentation result is determined by the intersection method, the words in each word segmentation result are assembled into a word set, and the intersection between the word set and the search keywords is determined. If the intersection is not empty, the words in the word segmentation result contain the search keyword, and the word segmentation result is determined to be a target word segmentation result. If the intersection is empty, the words in the word segmentation result do not contain the search keyword, and the word segmentation result is not a target word segmentation result.
Step S1023, determining target data corresponding to the target word segmentation result as a candidate retrieval result.
Here, the data corresponding to the target word segmentation result is determined as target data, and then the target data is determined as a candidate search result matched with the search keyword.
In the embodiment of the present application, through the steps S1021 to S1023, the word segmentation result is obtained first, then the target word segmentation result is determined according to whether the word segmentation result includes the search keyword, and the target data corresponding to the target word segmentation result is determined as the candidate search result, so that the candidate search result can be obtained quickly and conveniently, and the search speed is improved.
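Steps S1021 to S1023 can be sketched as below, using the intersection method. The sketch assumes the word segmentation results are already available as a mapping from a data identifier to its list of words; the mapping layout and the function name `find_candidates` are hypothetical.

```python
def find_candidates(segmented: dict[str, list[str]], keywords: list[str]) -> list[str]:
    """Return the identifiers of data whose word segmentation result
    contains at least one of the search keywords (intersection method)."""
    keyword_set = set(keywords)
    candidates = []
    for data_id, words in segmented.items():
        # A non-empty intersection means this is a target word segmentation
        # result, so the corresponding data becomes target data.
        if set(words) & keyword_set:
            candidates.append(data_id)
    return candidates
```

The comparison method would give the same result with a membership test per keyword, for example `any(k in words for k in keywords)`.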
In actual implementation, referring to fig. 3, the step S103 "obtaining the preference information corresponding to the client, and sorting the candidate search results based on the preference information, to obtain the target search result" may be implemented by the following steps S1031 to S10310:
step S1031, a history browsing record of the client is obtained, and a plurality of history browsing data are obtained based on the history browsing record.
Here, the identification information of the client may be obtained by analyzing the search request, and then the history browsing record of the client may be obtained from the log of the server based on the identification information of the client, where the history browsing record may be a browsing record of a week or a browsing record of a month, or may be a browsing record of a quarter; then, history browsing data corresponding to the history record is obtained.
Step S1032, obtaining the classification information corresponding to each history browsing data.
Here, the classification information corresponding to each historical browsing data may be obtained from the search database through a classification obtaining instruction. Each piece of data may have at least one item of classification information; for example, the classification information corresponding to a certain piece of data may be "author type", "belonging field", and "nature of enterprise".
Step S1033, determining preference information corresponding to the client based on the classification information corresponding to each of the historical browsing data.
Here, the classification information is processed statistically to find the author type, belonging field, and nature of enterprise that occur most frequently, and these most frequent values are determined as the preference information.
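The statistical processing in step S1033 can be sketched as a frequency count per classification dimension. The record layout (one dictionary per historical browsing data, keyed by dimension) and the name `determine_preferences` are assumptions for illustration.

```python
from collections import Counter


def determine_preferences(history: list[dict[str, str]]) -> dict[str, str]:
    """For each classification dimension, pick the value that occurs most
    often across the client's historical browsing data."""
    counters: dict[str, Counter] = {}
    for record in history:
        for dimension, value in record.items():
            counters.setdefault(dimension, Counter())[value] += 1
    # most_common(1) returns the single most frequent (value, count) pair.
    return {dim: counter.most_common(1)[0][0] for dim, counter in counters.items()}
```

How ties between equally frequent values should be broken is not stated in the source; this sketch simply keeps the value seen first.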
Step S1034, obtaining classification information of each target data in the candidate search results.
Here, step S1034 is similar to the implementation process of step S1032 described above, and therefore, the implementation process of step S1034 may refer to the implementation process of step S1032 described above.
Step S1035, a degree of similarity between the preference information and the classification information of each target data is determined.
Here, the similarity may be determined from a distance measure such as the Euclidean distance, the Manhattan distance, or the Mahalanobis distance; a larger similarity indicates that the preference information and the classification information are more alike, and a smaller similarity indicates that they are less alike.
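One possible way to turn a distance into a similarity for step S1035 is sketched below. The source names the distance measures but not how labels become vectors or how a distance maps to a similarity, so the one-hot encoding over a fixed vocabulary and the mapping 1 / (1 + distance) are both assumptions of this example.

```python
import math


def one_hot(labels: set[str], vocabulary: list[str]) -> list[float]:
    """Encode a set of classification labels as a 0/1 vector (assumed encoding)."""
    return [1.0 if term in labels else 0.0 for term in vocabulary]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(a: list[float], b: list[float], distance=euclidean) -> float:
    """Map a distance to a similarity in (0, 1]; identical vectors give 1.0.
    The 1 / (1 + d) mapping is an illustrative choice, not from the source."""
    return 1.0 / (1.0 + distance(a, b))
```

With this mapping a smaller distance always yields a larger similarity, which matches the ordering the text requires. A Mahalanobis variant would additionally need a covariance matrix estimated from the data and is omitted here.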
Step S1036, determining whether or not the external voting result of each target data in the candidate search results can be obtained.
Here, the external voting result refers to the number of times the target data is accessed or browsed within a period of time. If the external voting result of each target data cannot be obtained through the voting obtaining instruction, this indicates that no external voting result exists in the search database, and the process proceeds to step S1037; if the external voting result of each target data is obtained through the voting obtaining instruction, this indicates that the external voting result exists in the search database, and the process proceeds to step S1038.
Step S1037, sorting all target data based on the similarity to obtain target retrieval results.
At this time, the external voting results of the target data cannot be obtained through the voting obtaining instruction, which indicates that no external voting result exists in the search database, so the target data are ranked based on the similarity alone. The higher the similarity, the higher the sorting position; that is, the target data are sorted by similarity, with target data of larger similarity ranked earlier and target data of smaller similarity ranked later.
Step S1038, obtaining a preset preference duty ratio and a preset voting duty ratio.
At this time, the external voting results of the target data can be obtained through the voting obtaining instruction, which indicates that the external voting results exist in the search database.
In this case, the target data are ranked based on both the similarity and the external voting result. First, a preset preference duty ratio and a preset voting duty ratio are acquired. The sum of the preference duty ratio and the voting duty ratio is 1, and each of them takes a value between 0 and 1. The two values may be the same or different.
Step S1039, determining preference factors of the respective target data based on the preference ratios and the respective similarities, and determining voting factors of the respective target data based on the voting ratios and external voting results of the respective target data.
Here, the product of the preference duty ratio and the respective degrees of similarity may be determined as a preference factor of the respective target data; similarly, the product of the voting duty and the external voting result of each target data may be determined as the voting factor of each target data.
Step S10310, sorting the target data based on the preference factors of the target data and the voting factors of the target data, and obtaining the target retrieval result.
Here, the preference factors of each target data and the voting factors of each target data may be accumulated to obtain accumulated values of each target data; next, the respective target data are sorted based on the accumulated value of the respective target data, wherein the higher the accumulated value, the higher the sorting position. That is, the target data are sorted according to the magnitude relation of the accumulated values, the target data corresponding to the larger accumulated value are sorted earlier, and the target data corresponding to the smaller accumulated value are sorted later.
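Steps S1036 to S10310 can be sketched as a single ranking function. The function name `rank_targets`, the default ratio of 0.6, and the treatment of a missing vote as 0 are assumptions; the sketch also assumes the external voting results have already been scaled to a range comparable with the similarity, since raw access counts would otherwise dominate the sum.

```python
from typing import Optional


def rank_targets(
    similarities: dict[str, float],
    votes: Optional[dict[str, float]] = None,
    preference_ratio: float = 0.6,
) -> list[str]:
    """Return target data identifiers ordered from highest to lowest score."""
    if not 0.0 <= preference_ratio <= 1.0:
        raise ValueError("preference_ratio must lie between 0 and 1")
    if votes is None:
        # No external voting result available: rank by similarity alone (S1037).
        scores = dict(similarities)
    else:
        voting_ratio = 1.0 - preference_ratio  # the two ratios sum to 1 (S1038)
        scores = {
            data_id: preference_ratio * sim            # preference factor (S1039)
            + voting_ratio * votes.get(data_id, 0.0)   # voting factor (S1039)
            for data_id, sim in similarities.items()
        }
    # A larger accumulated value gives an earlier position (S10310).
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with equal ratios a heavily visited item can overtake an item that matches the preference information more closely, which is the trade-off the two preset ratios control.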
In the embodiment of the present application, through the steps S1031 to S10310, a plurality of history browsing data are obtained according to the history browsing record of the client; the classification information corresponding to each historical browsing data is acquired, and the preference information corresponding to the client is determined based on that classification information; then, the classification information of each target data is acquired, and the similarity between the preference information and the classification information of each target data is determined. Finally, when the external voting result of the target data cannot be obtained, the target data are sorted directly by similarity to obtain the search result, so that a search result matched with the preference information is obtained, the personalized requirement of the client is met, the search efficiency is improved, and the search effect is optimized. When the external voting result of the target data is obtained, a preset preference duty ratio and a preset voting duty ratio are also obtained, each preference factor is determined based on the preference duty ratio and each similarity, each voting factor is determined based on the voting duty ratio and each external voting result, and the target data are sorted based on the preference factors and the voting factors to obtain the final target retrieval result.
In some embodiments, the data in the search database is preprocessed before the content search is performed, referring to fig. 4, before step S101 "receiving the search request sent by the client and obtaining the search keyword based on the search request", the following steps S11 to S16 may be further executed:
step S11, judging whether the data is a picture.
Here, the type of the data may be acquired, and whether the data in the search database is a picture is determined according to the data type. If the data is not a picture, the process proceeds to step S12; if the data is a picture, the process proceeds to step S13.
Step S12, parsing the data in the search database to obtain structured data.
At this time, the data in the search database is not a picture, which indicates that the data is text. The decoding mode corresponding to the encoding of the data is determined, and the data is parsed based on that decoding mode to obtain the structured data. That is, the different parts of the text, such as the title, date, and body, are identified; unnecessary symbols can also be filtered out during this process.
Step S13, performing recognition processing on the picture using a preset image recognition algorithm to obtain the characters contained in the picture or the content information represented by the picture.
At this time, the data in the search database is a picture, and the picture needs to be recognized to identify the characters it contains; if the picture contains a graphic, the content information represented by the picture is identified. The image recognition processing may be implemented through a preset image recognition algorithm, which may be optical character recognition (Optical Character Recognition, OCR), an artificial intelligence algorithm, or the like. Illustratively, the artificial intelligence algorithm may be a neural network algorithm, a Bayesian network algorithm, or the like.
In an actual implementation, the pictures may be flowcharts, block diagrams, schematic diagrams, photographs, and the like.
Step S14, parsing the characters or content information to obtain the structured data corresponding to the picture.
Here, the implementation of step S14 is similar to that of step S12 described above; thus, for the implementation of step S14, reference may be made to the implementation of step S12.
Step S15, performing word segmentation on the structured data to obtain a word segmentation result.
Here, the structured data may be subjected to word segmentation processing by a dictionary-based method, a statistical-based method, or a rule-based method, thereby obtaining a word segmentation result corresponding to each data.
In some embodiments, to simplify the processing procedure, word segmentation may also be directly performed on the data to obtain a word segmentation result corresponding to each data.
Step S16, extracting features of the structured data to obtain feature information; and determining classification information of the data based on the feature information.
Here, the feature extraction method may be one-hot encoding or term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, TF-IDF), and the obtained feature information can represent key information of the structured data. Classification information of the data is then determined based on the feature information, which may be, for example, an office class, a scientific class, a national enterprise class, or the like.
In some embodiments, an abstract of the data is also extracted so that the client can subsequently display it, which improves the readability of the data.
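The TF-IDF option mentioned for step S16 can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses the unsmoothed textbook weighting (term frequency times the logarithm of the number of documents over the document frequency), the function name `tf_idf` and the sample corpus are hypothetical, and the input is assumed to be the word segmentation results from step S15.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute a TF-IDF weight for every term of every segmented document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

corpus = [
    ["office", "meeting", "schedule"],
    ["office", "research", "paper"],
    ["office", "meeting", "minutes"],
]
features = tf_idf(corpus)
```

A term that occurs in every document (here "office") receives weight zero, while a term unique to one document receives the largest weight, so the highest-weighted terms are the ones that best characterize a document for classification.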
In the embodiment of the present application, through steps S11 to S16, when the data in the search database is not a picture, the data is parsed directly to obtain the structured data; when the data is a picture, the picture is first recognized to obtain the characters it contains or the content information it represents, and the characters or content information are then parsed to obtain the structured data. Word segmentation and feature extraction are then performed in sequence, and classification information of the data is determined based on the feature information, which provides a basis for subsequent content retrieval and improves retrieval accuracy.
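The branching in steps S11 to S14 can be sketched as a small dispatch function. This is a minimal sketch, not the implementation of the application: the `Record` type, its `kind` tag, and the `recognize` callback (standing in for OCR or another recognizer) are hypothetical, and parsing is reduced to splitting a title from a body and dropping stray symbols. Steps S15 and S16 are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Record:
    kind: str       # hypothetical type tag: "picture" or "text"
    payload: bytes  # raw content as stored in the search database

def parse_text(raw: str) -> Dict[str, str]:
    """S12/S14: drop stray symbols, then split the text into title and body."""
    cleaned = "".join(ch for ch in raw if ch.isalnum() or ch.isspace())
    lines = [ln.strip() for ln in cleaned.splitlines() if ln.strip()]
    title = lines[0] if lines else ""
    return {"title": title, "body": " ".join(lines[1:])}

def preprocess(record: Record, recognize: Callable[[bytes], str]) -> Dict[str, str]:
    if record.kind == "picture":          # S11: the data is a picture
        text = recognize(record.payload)  # S13: recognize characters or content
    else:                                 # S11: the data is not a picture
        text = record.payload.decode("utf-8")
    return parse_text(text)               # S12 or S14: produce structured data

def fake_ocr(_: bytes) -> str:
    # Stand-in recognizer used only for this example.
    return "Quarterly Report\n## revenue grew"

from_picture = preprocess(Record("picture", b"\x89PNG"), fake_ocr)
from_text = preprocess(Record("text", b"Meeting Notes\nbudget: approved!"), fake_ocr)
```

Passing the recognizer in as a callback keeps the dispatch independent of whichever image recognition algorithm is preset.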
Based on the above embodiments, the embodiments of the present application further provide a search method applied to a server. In the embodiments of the present application, a block diagram of a search system in the server is shown in fig. 5A.
The system comprises: a user login module 51 for managing user information; a user authority management module 52 for clearly defining authority and dividing it finely; a risk control module 53 for customer identification and risk control; and a knowledge graph construction module 54, an index construction management module 55, a content retrieval module 56 and a graphic analysis and storage module 57.
Furthermore, in some embodiments, the system may further comprise: an office management module for providing attributes of office tools commonly used by clients; an account setting module for collecting account information and personalized information actively fed back by the client; a personalized pushing module for providing, based on the information acquired by the system, office tools, office supplies, office modes and the like that the client may use; and a page layout control module for providing page layouts for the client to select.
The search method includes the following steps S501 to S508:
Step S501, receiving personalized data collected by the client and storing it in a database.
Here, the personalized data refers to data uploaded to the server through the client.
Step S502, web spider.
Here, new resources are continuously crawled from the internet while the resources are updated periodically.
Step S503, preprocessing the document.
Here, resources obtained from the internet come in various formats, and each format needs its own parser so that extraneous symbols can be ignored and useful information extracted.
Step S504, chinese word segmentation.
Here, sentences are decomposed into individual words, and parts of speech and named entities are determined.
Step S505, document feature extraction.
Here, the purpose of feature extraction is to extract key information from a document that is most capable of expressing the content and characteristics of the document, thereby minimizing the computational effort of a computer while accurately describing the document.
Step S506, the documents are automatically classified.
Here, the digitized document resources are classified into corresponding contexts based on document features.
Step S507, the document is automatically abstracted in real time.
Here, the automatic document summarization can automatically extract the content summary of a web document, and the length of the summary text can be adjusted according to the needs. In addition, the summary results may be used to allow the user to quickly view a summary of the content of the resource when listing the search results.
Step S508, distributed information retrieval.
Here, "distributed" includes both a multi-node distribution of the index data and a multi-node distribution of the query task execution. In the distributed search system, a central server is responsible for receiving a user's search request, distributing it to the specific sub-nodes that execute the query task, receiving and merging the query results, and returning the merged results to the user.
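The fan-out and merge performed by the central server can be sketched as follows. This is a minimal illustration under stated assumptions: worker threads stand in for the query sub-nodes, each shard is an in-memory dictionary holding part of the index data, and merged results are ordered by hit count. The names `search_shard` and `distributed_search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Shard = Dict[str, List[str]]  # document id -> word segmentation result

def search_shard(shard: Shard, keyword: str) -> List[Tuple[str, int]]:
    """Sub-node task: return (doc_id, hit_count) for each matching document."""
    return [(doc_id, words.count(keyword))
            for doc_id, words in shard.items() if keyword in words]

def distributed_search(shards: List[Shard], keyword: str) -> List[str]:
    """Central node: distribute the query, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, keyword), shards)
        merged = [hit for partial in partials for hit in partial]
    # Most hits first; ties are broken by document id for a stable order.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return [doc_id for doc_id, _ in merged]

shards = [
    {"d1": ["leave", "policy"], "d2": ["travel", "policy", "policy"]},
    {"d3": ["expense", "form"], "d4": ["policy", "update"]},
]
results = distributed_search(shards, "policy")
```

In a real deployment the sub-nodes would be separate machines reached over the network; the thread pool only models the pattern of distributing one request and combining the answers.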
Referring to fig. 5A, the functions of the functional modules contained therein are as follows:
Full-text retrieval is the discovery of meaningful articles or knowledge from the vast, unordered content of a database.
Chinese word segmentation is the process of recombining a continuous character sequence into a word sequence according to a certain specification. In English text, spaces serve as natural delimiters between words. In Chinese, characters, sentences and paragraphs can be delimited simply by obvious delimiters, but words have no formal delimiter. Although English also has the problem of dividing phrases, at the word level Chinese is more complex and more difficult than English. Chinese word segmentation aims to speed up the analysis that returns the information required by the user, which is a great convenience for Chinese-language users.
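As an illustration of dictionary-based segmentation, one of the approaches named earlier for step S15, the following is a minimal sketch of forward maximum matching. The application does not specify this particular algorithm; the function name, the tiny dictionary and the maximum word length of 4 are assumptions made for the example.

```python
from typing import List, Set

def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedily take the longest dictionary word starting at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

vocab = {"检索", "数据库", "数据", "方法"}
tokens = forward_max_match("检索数据库方法", vocab)
```

Because the longest match wins, the input is split at "数据库" rather than at the shorter dictionary word "数据", which shows why word boundaries in Chinese cannot be read off delimiters alone.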
Information quantification and measurement technology is a data mining technology that can measure the information in a database to realize a full-text information retrieval system with a relevance retrieval function; the measurements may also be used to generate statistical information that is applied to document statistics and decision analysis. Based on data mining, enterprises can build automated information processing systems.
The intelligent search engine proactively serves personalized office demands through search technology. Its key points are the processing of image-text retrieval, the filling of content, and the establishment of index relations. To improve office efficiency, the meaning of image-text attributes is redefined through the acquired data, suitable office attribute relations are reconstructed and stored, and results are matched to the user so as to provide a quick and convenient office retrieval environment.
User login is based on a single sign-on (Single Sign On, SSO) framework, so that the user can access all mutually trusted application systems by logging in only once. In application, the system provides functions such as registration, one-key registration, logout, automatic registration without first registration, and verification code checking. The user is not troubled by multiple logins, programs only need to add the single sign-on protocol, and the burden of managing user accounts is relieved.
Rights management provides paging, conditional query, advanced query, field list display and Chinese name escaping and, most importantly, filters data according to company conditions so that users can only manage their own data. The technology employed may be role-based access control (Role Based Access Control, RBAC).
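The combination of role-based permissions and company-level data filtering can be sketched as follows. This is a minimal illustration only: the role table, the `User` fields and the record layout are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical role table: role name -> permitted actions.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"read", "write", "delete"},
    "staff": {"read"},
}

@dataclass
class User:
    name: str
    role: str
    company: str

def is_allowed(user: User, action: str) -> bool:
    """RBAC check: the action must be granted to the user's role."""
    return action in ROLE_PERMISSIONS.get(user.role, set())

def visible_records(user: User, records: List[dict]) -> List[dict]:
    """Data filtering: a user only sees records of the user's own company."""
    if not is_allowed(user, "read"):
        return []
    return [r for r in records if r["company"] == user.company]

records = [
    {"id": 1, "company": "A"},
    {"id": 2, "company": "B"},
    {"id": 3, "company": "A"},
]
alice = User("alice", "staff", "A")
guest = User("guest", "visitor", "A")
```

Permissions attach to roles rather than to individual users, so changing what a role may do does not require touching any user account.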
The primary goal of planning and implementing the office automation (Office Automation, OA) system is to establish a security system for the OA system that protects the security of the enterprise's information assets. Technical and administrative security measures are established, and security functions such as user authentication, operation authorization, network security detection and virus attack prevention are provided, so as to protect the security, stability, integrity, effectiveness and confidentiality of hardware, software and data in the OA system, and to prevent the system or data from being destroyed, tampered with or leaked for malicious or accidental reasons, which could ultimately cause serious economic loss to the enterprise.
The knowledge graph construction comprises two aspects: knowledge acquisition and knowledge fusion. Knowledge acquisition extracts concepts, entities, attributes and relationships from data such as open web pages, online encyclopedias and core word libraries; the main purpose of knowledge fusion is to realize time sequence fusion and multi-data-source fusion of knowledge. An attempt is made to understand the user's real demand at the moment by using the time and geographic location at which the user issues a query, together with historical information such as the user's past queries and the corresponding click records.
For example, referring to FIG. 5B, in an intelligent retrieval enterprise office system, the retrieval system may also include an intelligent retrieval storage module 58, a retrieval log module 59, and a structured database module 510.
In the embodiment of the present application, knowledge graph construction mainly consists of ontology definition (entity, relation, attribute) and construction, and content uploading (document, content, attribute); the acquired data are redefined, relations are reconstructed, and effective information conforming to the search content is screened out.
The index construction mainly comprises an organization structure, a construction process, a compression coding technology, a dynamic updating technology, a large-scale data storage technology and the like of the index, wherein index compression, dynamic updating and large-scale data storage are taken as research emphasis, and a prototype system for an experimental environment is designed on the basis. Index compression and encoding can effectively save memory space and reduce Input/Output (I/O) traffic. Aiming at the problem of large-scale data storage and processing, a distributed data storage and processing strategy is designed, which not only effectively meets the requirements of data distributed storage and data processing, but also has high fault tolerance. When the user searches information by keywords, the search engine searches in the database, if a website which accords with the content required by the user is found, a special algorithm is adopted to calculate the relevance and ranking level of each webpage according to the matching degree, the occurrence position/frequency, the link quality and the like of the keywords in the webpage, and then the webpage links are sequentially returned to the user according to the relevance.
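To illustrate how index compression and encoding save space, the following minimal sketch builds an inverted index and compresses one posting list with gap (delta) encoding followed by variable-byte encoding. The application does not name a specific compression scheme; this common combination is chosen here as an assumption, and all function names are hypothetical.

```python
from typing import Dict, List

def build_index(docs: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Inverted index: term -> ascending list of ids of documents containing it."""
    index: Dict[str, List[int]] = {}
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            index.setdefault(term, []).append(doc_id)
    return index

def compress(postings: List[int]) -> bytes:
    """Store the gaps between ascending ids, 7 bits per byte, low bits first."""
    out = bytearray()
    previous = 0
    for doc_id in postings:
        gap = doc_id - previous
        previous = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decompress(data: bytes) -> List[int]:
    postings, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap          # undo the gap encoding
            postings.append(current)
            gap, shift = 0, 0
    return postings

index = build_index({3: ["policy", "leave"], 200: ["policy"], 1000: ["policy", "travel"]})
packed = compress(index["policy"])
```

Here the three document ids fit in five bytes, and small gaps between neighbouring ids take a single byte each, which is what reduces both storage and I/O traffic.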
Content retrieval is mainly divided into four steps: crawling and library building, retrieval and ranking, external voting, and result display. Crawling and library building aims to crawl as many valuable resources as possible under limited hardware and bandwidth resources without affecting normal user access to the websites being crawled. Retrieval and ranking includes intersecting the page sets that correspond to the different parts of the query after word segmentation, so that retrieval becomes a comparison and intersection between page names; this makes it possible to search data on the scale of hundreds of millions of items within milliseconds. External voting refers to expressing the relevance and importance of web pages through scores calculated from hyperlinks; it is one of the important reference factors a search engine uses to evaluate web pages, and it directly participates in the ranking calculation of search results. By filtering and cleaning hyperlinks, high-quality hyperlinks are promoted to raise the ranking of data and thereby indirectly increase the access volume, which yields a high-quality content pushing mode over the long term. Result display takes two forms. The first is structured display, which takes many forms and currently covers 80% of search requirements, that is, complex display patterns appear under 80% of keywords. The second is single-paragraph abstract display, the most basic presentation mode, with only a title, a two-line abstract and some links.
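The intersection step in retrieval and ranking can be sketched as a merge over sorted posting lists. This is a minimal illustration that assumes each query term maps to an ascending list of page ids; the function names and the sample index are hypothetical.

```python
from typing import Dict, List

def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two ascending id lists."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common

def search(query_terms: List[str], index: Dict[str, List[int]]) -> List[int]:
    """Return ids of pages that contain every term of the segmented query."""
    # Start from the shortest list so intermediate results stay small.
    postings = sorted((index.get(term, []) for term in query_terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for other in postings[1:]:
        result = intersect(result, other)
    return result

page_index = {"leave": [1, 4, 7, 9], "policy": [2, 4, 9, 12], "annual": [4, 9]}
hits = search(["leave", "policy", "annual"], page_index)
```

Each intersection runs in time linear in the lengths of the two lists, which is why retrieval reduces to cheap comparisons between page identifiers.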
Image-text analysis calculates a hash value (hash code) for each picture and uses it as the fingerprint of the picture; the distance between the fingerprints of two pictures is then calculated to judge whether the pictures are similar. Alternatively, for example, a color distribution histogram is generated for each picture by a "color distribution method", and the histograms of the two pictures are compared for similarity. High-quality pictures can increase the overall browsing volume of the website, improve user interaction and participation, establish good trust, and help enterprises respond quickly to user demands.
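The fingerprint comparison can be sketched with an average hash and a Hamming distance. The application does not state which hash is used, so the average hash is an assumption; the sketch also operates on tiny grayscale grids given as nested lists, whereas a practical version would first shrink each picture to a fixed small size such as 8 by 8 pixels.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel values in the range 0-255

def average_hash(pixels: Image) -> int:
    """One bit per pixel: 1 if the pixel is at least as bright as the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(h1 ^ h2).count("1")

img_a = [[10, 200], [220, 30]]
img_b = [[12, 190], [210, 40]]   # same layout with slightly changed brightness
img_c = [[250, 20], [15, 240]]   # bright and dark regions swapped

near = hamming_distance(average_hash(img_a), average_hash(img_b))
far = hamming_distance(average_hash(img_a), average_hash(img_c))
```

A small distance indicates similar pictures; the threshold that separates "similar" from "different" is a tuning choice not given in the text.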
The embodiment of the present application focuses on proactively serving the user's personalized requirements during searching: it caters to the user's interests, matches the user's usage habits, and provides what the user wants. Even needs the user has not yet thought of can be anticipated, which can greatly improve working efficiency in enterprise offices and provide strong backing for efficient work.
Based on the foregoing embodiments, the embodiments of the present application provide a search device, where each module included in the search device and each unit included in each module may be implemented by a processor in a computer device; of course, they may also be implemented by corresponding logic circuits. In practice, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), or the like.
An embodiment of the present application further provides a retrieval apparatus. Fig. 6 is a schematic diagram of the composition structure of the retrieval apparatus provided in the embodiment of the present application. As shown in fig. 6, the retrieval apparatus 600 includes:
a first obtaining module 601, configured to receive a search request sent by a client, and obtain a search keyword based on the search request;
a first determining module 602, configured to determine candidate search results from a search database that is built in advance based on the search keyword;
the ranking module 603 is configured to obtain preference information corresponding to the client, rank the candidate search results based on the preference information, and obtain a target search result;
and the sending module 604 is configured to send a search response carrying the target search result to the client.
In some embodiments, the first determining module 602 includes:
the first acquisition unit is used for acquiring word segmentation results corresponding to each data in the search database;
the first determining unit is used for determining target word segmentation results containing the search keywords from the word segmentation results;
and the second determining unit is used for determining target data corresponding to the target word segmentation result as the candidate retrieval result.
In some embodiments, the ordering module 603 includes:
a second obtaining unit for obtaining classification information of each target data in the candidate search result;
a third determining unit configured to determine a degree of similarity between the preference information and the classification information of the respective target data;
and the sorting unit is used for sorting the target data based on the similarity to obtain a target retrieval result, wherein the higher the similarity is, the earlier the sorting position is.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring the history browsing record of the client and acquiring a plurality of history browsing data based on the history browsing record;
the third acquisition module is used for acquiring classification information corresponding to each historical browsing data;
and the second determining module is used for determining preference information corresponding to the client based on the classification information corresponding to each historical browsing data.
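One way to derive preference information from the classification of historical browsing data, and to compare it with the classification of a target data item, is sketched here. This is an assumption made for illustration: preference is modelled as the share of each class in the browsing history, and similarity as the share of the target's class. The application does not fix either definition, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def build_preference(history_classes: List[str]) -> Dict[str, float]:
    """Preference information: share of each class in the browsing history."""
    counts = Counter(history_classes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def similarity(preference: Dict[str, float], target_class: str) -> float:
    """Assumed similarity: how strongly the client prefers the target's class."""
    return preference.get(target_class, 0.0)

history = ["office", "office", "science", "office"]
preference = build_preference(history)
office_score = similarity(preference, "office")
enterprise_score = similarity(preference, "enterprise")
```

A class the client has never browsed scores zero, so such target data sorts behind items from classes the client visits often.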
In some embodiments, the apparatus further comprises:
a fourth obtaining module, configured to obtain an external voting result of each target data in the candidate search results;
accordingly, the sorting module 603 includes:
a third obtaining unit, configured to obtain a preset preference ratio and a preset voting ratio;
A fourth determining unit configured to determine preference factors of the respective target data based on the preference ratios and the respective similarities, and determine voting factors of the respective target data based on the voting ratios and external voting results of the respective target data;
and the second sorting unit is used for sorting the target data based on the preference factors of the target data and the voting factors of the target data to obtain target retrieval results.
In some embodiments, the apparatus further comprises:
the analysis module is used for analyzing the data in the search database to obtain structured data;
the word segmentation module is used for carrying out word segmentation processing on the structured data to obtain word segmentation results;
the third determining module is used for extracting the characteristics of the structured data to obtain characteristic information; and determining classification information of the data based on the characteristic information.
In some embodiments, the parsing module includes:
the identification unit is used for carrying out identification processing on the picture by utilizing a preset image identification algorithm to obtain characters contained in the picture or content information represented by the picture;
And the analysis unit is used for analyzing the characters or the content information to obtain the structured data corresponding to the picture.
It should be noted that, the description of the search device in the embodiment of the present application is similar to the description of the method embodiment described above, and has similar advantageous effects as the method embodiment. For technical details not disclosed in the embodiments of the present apparatus, please refer to the description of the embodiments of the method of the present application for understanding.
In the embodiment of the present application, if the above-mentioned search method is implemented in the form of a software functional module and sold or used as a separate product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a computer software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read only memory (Read Only Memory, ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, an embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the retrieval method provided in the above embodiment.
An embodiment of the present application provides a server. Fig. 7 is a schematic diagram of the composition structure of the server provided in the embodiment of the present application. As shown in fig. 7, the server 700 includes: a processor 701, at least one communication bus 702, a user interface 703, at least one external communication interface 704 and a memory 705. The communication bus 702 is configured to enable connection and communication between these components. The user interface 703 may include a display screen, and the external communication interface 704 may include a standard wired interface and a wireless interface. The processor 701 is configured to execute a program of the search method stored in the memory to implement the search method provided in the above embodiments.
The description of the server and storage medium embodiments above is similar to that of the method embodiments described above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the server and the storage medium of the present application, please refer to the description of the method embodiments of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present application.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
Alternatively, the integrated units described above may be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing an AC to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The foregoing is merely an embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of searching, the method comprising:
receiving a search request sent by a client, and acquiring a search keyword based on the search request;
determining candidate search results from a pre-constructed search database based on the search keywords;
acquiring preference information corresponding to the client, and sorting the candidate search results based on the preference information to obtain target search results;
and sending a search response carrying the target search result to the client.
2. The method of claim 1, wherein the determining candidate search results from a pre-constructed search database based on the search keywords comprises:
obtaining word segmentation results corresponding to each data in the search database;
Determining target word segmentation results containing the search keywords from the word segmentation results;
and determining target data corresponding to the target word segmentation result as the candidate retrieval result.
3. The method according to claim 2, wherein the ranking the candidate search results based on the preference information to obtain a target search result comprises:
acquiring classification information of each target data in the candidate search result;
determining the similarity between the preference information and the classification information of each target data;
and sorting each target data based on the similarity to obtain a target retrieval result, wherein the higher the similarity is, the earlier the sorting position is.
4. A method as claimed in claim 3, further comprising:
acquiring a history browsing record of the client, and acquiring a plurality of history browsing data based on the history browsing record;
acquiring classification information corresponding to each historical browsing data;
and determining preference information corresponding to the client based on the classification information corresponding to each historical browsing data.
5. A method as claimed in claim 3, further comprising:
Obtaining an external voting result of each target data in the candidate retrieval results;
correspondingly, the ranking the candidate search results based on the preference information to obtain a target search result, and the method further comprises:
acquiring a preset preference ratio and a preset voting ratio;
determining preference factors of the target data based on the preference ratio and the similarity, and determining voting factors of the target data based on the voting ratio and external voting results of the target data;
and sequencing the target data based on the preference factors of the target data and the voting factors of the target data to obtain a target retrieval result.
6. The method according to any one of claims 1 to 5, wherein prior to the receiving the client-transmitted retrieval request, the method further comprises:
analyzing the data in the search database to obtain structured data;
performing word segmentation on the structured data to obtain word segmentation results;
extracting features of the structured data to obtain feature information; and determining classification information of the data based on the characteristic information.
7. The method of claim 6, wherein if the data is a picture, parsing the data in the search database comprises:
carrying out recognition processing on the picture by using a preset image recognition algorithm to obtain characters contained in the picture or content information represented by the picture;
and analyzing the character or the content information to obtain the structured data corresponding to the picture.
8. A search device, characterized in that the search device comprises:
the first acquisition module is used for receiving a search request sent by a client and acquiring a search keyword based on the search request;
the first determining module is used for determining candidate search results from a pre-constructed search database based on the search keywords;
the ordering module is used for acquiring preference information corresponding to the client, and ordering the candidate search results based on the preference information to obtain target search results;
and the sending module is used for sending the search response carrying the target search result to the client.
9. A server, the server comprising:
A processor; and
a memory for storing a computer program executable on the processor;
wherein the computer program, when executed by the processor, implements the retrieval method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein computer-executable instructions configured to perform the retrieval method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210010310.6A CN116450913A (en) 2022-01-06 2022-01-06 Retrieval method, retrieval device, server and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116450913A (en) 2023-07-18

Family ID: 87128941

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202210010310.6A Pending CN116450913A (en) 2022-01-06 2022-01-06 Retrieval method, retrieval device, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116450913A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719954A (en) * 2023-08-04 2023-09-08 中国人民解放军海军潜艇学院 Information retrieval method, electronic equipment and storage medium
CN116719954B (en) * 2023-08-04 2023-10-17 中国人民解放军海军潜艇学院 Information retrieval method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination