CN117591631A - Elasticsearch text vectorization search system based on AI PaaS platform - Google Patents
- Publication number
- CN117591631A
- Authority
- CN
- China
- Prior art keywords
- search
- text
- elastic
- platform
- elastic search
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F16/33—Querying
            - G06F16/3331—Query processing
              - G06F16/334—Query execution
                - G06F16/3344—Query execution using natural language analysis
            - G06F16/335—Filtering based on additional data, e.g. user or group profiles
            - G06F16/338—Presentation of query results
          - G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
      - G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
        - G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
          - G06F21/31—User authentication
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
  - Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    - Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
      - Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of intelligent office work, and in particular to an Elasticsearch text vectorization search system based on an AI PaaS platform. The system comprises an AI PaaS platform module, an Elasticsearch engine module, a text optimization module and a search output module. The AI PaaS platform module builds the AI PaaS platform, creates text search projects, sets search and access permissions, integrates a selected large language model into the platform, builds a Web application with the FastAPI framework, and loads the AI PaaS platform into that Web application. The Elasticsearch engine module uses Elasticsearch as the document data storage engine to search and analyze large volumes of text data, querying it by matching keywords and by interpreting and executing complex query statements. Text vectorization lets the system interpret user queries more intelligently and return more accurate search results; vector-based similarity matching recommends documents that are semantically related to the user's query, thereby improving the search experience.
Description
Technical Field
The invention relates to the technical field of intelligent office, in particular to an elastic search text vectorization search system based on an AI PaaS platform.
Background
An AI PaaS platform is a cloud platform that provides artificial intelligence services, such as pre-trained models, natural language processing services and image processing services. It is highly open and can integrate multiple functions.
Elasticsearch is an open-source distributed search engine that searches and analyzes large amounts of data in real time. It supports full-text search, structured search, analytics and other functions, and is widely used for log analysis, text search, recommendation systems and similar applications.
Combining an AI PaaS platform with the Elasticsearch engine brings together the strengths of both, making it possible to develop a fast and efficient text search method and to improve text search efficiency.
Disclosure of Invention
This section outlines some aspects of the embodiments of the invention and briefly introduces some preferred embodiments. Simplifications or omissions may be made in this section, in the abstract and in the title of the application to avoid obscuring their purpose; such simplifications or omissions should not be used to limit the scope of the invention.
The present invention has been made in view of the above-described problems.
Accordingly, the technical problem solved by the invention is to develop Elasticsearch-based text search on an AI PaaS platform so as to optimize search efficiency and improve the search success rate.
To solve this technical problem, the invention provides the following technical solution: an AI PaaS platform-based Elasticsearch text vectorization search system comprising an AI PaaS platform module, an Elasticsearch engine module, a text optimization module and a search output module;
the AI PaaS platform module is used to build an AI PaaS platform, create text search projects, set search permissions and access permissions, integrate a selected large language model into the AI PaaS platform, build a Web application using the FastAPI framework, and load the AI PaaS platform into the Web application;
the Elasticsearch engine module is used to search and match texts stored in the Elasticsearch engine database;
the text storage optimization module optimizes the storage of text in the Elasticsearch engine by means of text segmentation and vector storage;
and the search output module outputs the retrieved text information on the Elasticsearch user interaction interface and displays it to the user.
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: Elasticsearch is used as the document data storage engine to search and analyze large volumes of text data, and the text data is queried by matching keywords and by interpreting and executing complex query statements.
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: texts are stored as data in the Elasticsearch engine, each text is assigned a unique identifier, and texts are stored in indices by logical grouping, an index being a collection of multiple texts; the Elasticsearch engine provides multiple search modes and an identity authentication function.
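To make the storage layout concrete, the following Python sketch builds an index definition and bulk-indexing actions in which every text receives its own identifier. It only constructs request bodies and does not contact a cluster. The index name `texts`, the field names and the choice of a content-derived UUID as the unique identifier are illustrative assumptions, not details from the patent; `text`, `keyword` and `integer` are standard Elasticsearch field types.

```python
import uuid
from typing import Dict, List

# Hypothetical index name and fields; the patent does not define a mapping.
INDEX_NAME = "texts"
INDEX_BODY = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "group": {"type": "keyword"},         # logical grouping of texts
            "access_level": {"type": "integer"},  # used for level-based access
        }
    }
}

def make_doc_id(content: str) -> str:
    """Derive a stable unique identifier from the text content (one possible choice)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, content))

def to_bulk_actions(index: str, docs: List[Dict]) -> List[Dict]:
    """Interleave an `index` action line with each document, as the bulk API expects."""
    actions: List[Dict] = []
    for doc in docs:
        actions.append({"index": {"_index": index, "_id": make_doc_id(doc["content"])}})
        actions.append(doc)
    return actions

docs = [
    {"title": "Leave policy", "content": "Employees may request annual leave online.",
     "group": "hr", "access_level": 1},
    {"title": "Budget 2024", "content": "The approved budget covers three projects.",
     "group": "finance", "access_level": 3},
]
bulk = to_bulk_actions(INDEX_NAME, docs)
```

With the official `elasticsearch` Python client (8.x), these bodies could be sent with `client.indices.create(index=INDEX_NAME, mappings=INDEX_BODY["mappings"])` and `client.bulk(operations=bulk)`. Deriving the ID from the content makes re-indexing the same text overwrite rather than duplicate it; a random `uuid4` would be the alternative if identical texts must be stored separately.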
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: different query modes are configured in the Elasticsearch engine, namely full-text search, fuzzy search, range search and Boolean search, and aggregation operations are built over the retrieved texts to perform statistics, grouping and filtering;
the identity authentication function establishes identity authentication for the Elasticsearch engine and authenticates users: unauthorized users are not allowed to access data in the search engine, authorized users are assigned access levels, and each user can access only text information of the corresponding level.
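The four query modes and the aggregation step map naturally onto Elasticsearch's Query DSL (`match`, `fuzzy`, `range`, `bool` and a `terms` aggregation). The sketch below is a minimal illustration rather than the patent's configuration: it assembles such a request body in Python, and the field names `content`, `created_at` and `group` are hypothetical.

```python
import json

# Hypothetical fields: "content" (text), "created_at" (date), "group" (keyword).

def full_text(query: str) -> dict:
    """Full-text search: the query string is analyzed and matched against `content`."""
    return {"match": {"content": query}}

def fuzzy(term: str) -> dict:
    """Fuzzy search: tolerates small edit distances (typos) in a single term."""
    return {"fuzzy": {"content": {"value": term, "fuzziness": "AUTO"}}}

def date_range(start: str, end: str) -> dict:
    """Range search on a date field, bounds inclusive."""
    return {"range": {"created_at": {"gte": start, "lte": end}}}

def boolean(must=(), should=(), must_not=(), filters=()) -> dict:
    """Boolean search combining other clauses; empty clause lists are omitted."""
    clauses = {"must": list(must), "should": list(should),
               "must_not": list(must_not), "filter": list(filters)}
    return {"bool": {name: items for name, items in clauses.items() if items}}

def with_group_stats(query: dict) -> dict:
    """Aggregation: count matching texts per group (size 0 returns only the counts)."""
    return {"size": 0, "query": query,
            "aggs": {"hits_per_group": {"terms": {"field": "group"}}}}

body = with_group_stats(
    boolean(
        must=[full_text("annual budget")],
        should=[fuzzy("budjet")],
        filters=[date_range("2023-01-01", "2023-12-31")],
    )
)
print(json.dumps(body, indent=2))
```

With the 8.x Python client, the parts could be passed as `client.search(index="texts", query=body["query"], aggs=body["aggs"], size=0)`. Putting the date range in `filter` rather than `must` excludes out-of-range texts without affecting relevance scores.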
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: optimizing the storage of text in the Elasticsearch engine comprises first segmenting each text into a plurality of equal-length fragments before it is stored in the engine; the segmentation uses a hidden Markov model to divide the text into equal-length fragments with the same word occurrence frequency.
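The patent does not spell out how the hidden Markov model yields "equal-length fragments with the same word occurrence frequency", so the sketch below covers only the fixed-length part: it splits a token sequence into consecutive windows of a chosen size. A simple regular-expression tokenizer stands in for the HMM; for Chinese text, an HMM-based word segmenter (for example the `jieba` library's `lcut` with `HMM=True`) would take its place, because `\w+` treats an unspaced Chinese sentence as a single token. Function names are hypothetical.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (lowercased word characters); the patent uses an HMM here."""
    return re.findall(r"\w+", text.lower())

def segment(tokens: List[str], size: int) -> List[List[str]]:
    """Split tokens into consecutive fragments of `size` tokens.

    The final fragment holds whatever remains and may therefore be shorter.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

tokens = tokenize("The quick brown fox jumps over the lazy dog")
fragments = segment(tokens, 4)
```

Whether a trailing short fragment is padded, merged into its predecessor or kept as is remains a design choice the text leaves open; this sketch keeps it.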
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: optimizing the storage of text in the Elasticsearch engine further comprises vectorizing the segmented text fragments and constructing a vocabulary.
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: constructing the vocabulary comprises taking the words that appear in all texts as its elements to build a vocabulary of unique words:
$$V = \{w_1, w_2, \ldots, w_N\}$$
where $V$ is the vocabulary, $w_i$ is the number assigned to the $i$-th distinct word in the vocabulary, and $N$ is the number of distinct words;
each text $D_i$ is cut into a number of equal-length segments, denoted $D_{in}$, and the word-number sequence of $D_i$ is converted into vectors according to the vocabulary.
The resulting vectors are stored in the Elasticsearch database, and the search flow is optimized using cosine similarity.
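Because the vector-conversion formula is not reproduced in this text, the following sketch assumes the simplest reading: a vocabulary that assigns each distinct word an index in order of first appearance, and a bag-of-words count vector per fragment whose length equals the vocabulary size. Function names are hypothetical, and indices are 0-based whereas the text numbers words from 1.

```python
from typing import Dict, List

def build_vocabulary(fragments: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct word an index in order of first appearance (0-based)."""
    vocab: Dict[str, int] = {}
    for fragment in fragments:
        for word in fragment:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def vectorize(fragment: List[str], vocab: Dict[str, int]) -> List[int]:
    """Bag-of-words count vector of length len(vocab); unknown words are ignored."""
    vector = [0] * len(vocab)
    for word in fragment:
        index = vocab.get(word)
        if index is not None:
            vector[index] += 1
    return vector

fragments = [["budget", "report", "budget"], ["leave", "report"]]
vocab = build_vocabulary(fragments)
vectors = [vectorize(f, vocab) for f in fragments]
```

Count vectors like these are sparse and grow with the vocabulary; dense embeddings from a language model are a common alternative when semantic rather than lexical similarity is wanted.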
As a preferred scheme of the AI PaaS platform-based Elasticsearch text vectorization search system of the invention: text search is optimized using cosine similarity:
$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}$$
where $A \cdot B$ is the inner product of the vectors and $\|A\|$ and $\|B\|$ are their respective norms. The cosine similarity is computed to judge how relevant each retrieved text is to the search question given by the user, and the search answer is given accordingly.
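A direct Python transcription of cosine similarity, cos θ = A·B / (‖A‖ ‖B‖), plus a helper that ranks stored vectors against a query vector, might look like this. The zero-vector convention (returning 0.0 where the formula is undefined) and the `rank` helper are assumptions added for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for a zero vector; treated here as "no similarity"
    return dot / (norm_a * norm_b)

def rank(query: Sequence[float], stored: Dict[str, Sequence[float]],
         top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k stored vectors most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, `rank([1, 0], {"x": [1, 0], "y": [0, 1], "z": [1, 1]})` orders the documents x, z, y. Because the score depends only on direction, a long text and a short text with the same word proportions score alike.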
The invention also provides an AI PaaS platform-based Elasticsearch text vectorization search system, which enables efficient text search through the constructed text vectorization search system.
The invention also provides a computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the AI PaaS platform-based Elasticsearch text vectorization search system.
The invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the AI PaaS platform-based Elasticsearch text vectorization search system.
Beneficial effects of the invention: in the method provided by the invention, an integrated, improved Elasticsearch engine is built on an AI PaaS platform, text vectorization is designed and developed, and the text data stored by the Elasticsearch engine is optimized, so that more accurate results are returned when the text data is searched;
vectorization enables similarity matching, and the constructed cosine-similarity search algorithm recommends texts that are semantically related to the user's query; search speed and text search relevance are improved by more than 50%, and the search experience is significantly improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of the overall method of the AI PaaS platform-based Elasticsearch text vectorization search system of the present invention.
FIG. 2 is a system architecture diagram of the AI PaaS platform-based Elasticsearch text vectorization search system of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to Fig. 1, a first embodiment of the present invention provides a method for the AI PaaS platform-based Elasticsearch text vectorization search system.
S1: Build an AI PaaS platform and integrate a large language model into it.
Specifically, an AI PaaS platform is built, text search projects are created, search permissions and access permissions are set, a selected large language model is integrated into the AI PaaS platform, a Web application is built with the FastAPI framework, and the AI PaaS platform is loaded into the Web application.
S2: Store the text data using Elasticsearch as the document data storage engine.
Specifically, Elasticsearch is adopted as the document data storage engine; large volumes of text data are searched and analyzed with the Elasticsearch engine, and the text data is queried by matching keywords and by interpreting and executing complex query statements.
Further, texts are stored as data in the Elasticsearch engine, each text is assigned a unique identifier, and texts are stored in indices by logical grouping, an index being a collection of multiple texts.
Different query modes are configured in the Elasticsearch engine: full-text search, fuzzy search, range search and Boolean search are set up, and aggregation operations over the retrieved texts are built to perform statistics, grouping and filtering.
Identity authentication is established for the Elasticsearch engine to authenticate users: unauthorized users are not allowed to access data in the search engine, authorized users are assigned access levels, and each user can access only text information of the corresponding level.
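Elasticsearch's own security features (user authentication, role-based access control) would normally handle the authentication part in production. The sketch below only illustrates the level rule at the application layer: unknown users are rejected, and a known user's query is wrapped in a filter that hides texts above that user's level. The user registry, the `access_level` field and the convention that a higher number grants broader access are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical registry: authenticated user name -> access level.
# Assumption: a higher number grants access to more sensitive texts.
USER_LEVELS: Dict[str, int] = {"alice": 3, "bob": 1}

class AccessDenied(Exception):
    """Raised when an unauthenticated or unknown user attempts a search."""

def secure_query(user: Optional[str], query: dict) -> dict:
    """Reject unknown users; restrict known users to texts at or below their level."""
    if user is None or user not in USER_LEVELS:
        raise AccessDenied("user is not authorized to search this engine")
    level = USER_LEVELS[user]
    return {
        "bool": {
            "must": [query],
            # A filter clause excludes documents without changing relevance scores.
            "filter": [{"range": {"access_level": {"lte": level}}}],
        }
    }

q = secure_query("bob", {"match": {"content": "budget"}})
```

Enforcing the level inside the query, rather than discarding results afterwards, keeps result counts, pagination and aggregations consistent with what the user is actually allowed to see.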
S3: Optimize the storage of text in the Elasticsearch engine.
Specifically, before a text is stored in the Elasticsearch engine, it is first segmented into a plurality of equal-length fragments; the segmentation uses a hidden Markov model to divide the text into equal-length fragments with the same word occurrence frequency.
Further, the segmented text fragments are vectorized.
A vocabulary is constructed by taking the words that appear in all texts as its elements, forming a vocabulary of unique words:
$$V = \{w_1, w_2, \ldots, w_N\}$$
where $V$ is the vocabulary, $w_i$ is the number assigned to the $i$-th distinct word in the vocabulary, and $N$ is the number of distinct words;
each text $D_i$ is cut into a number of equal-length segments, denoted $D_{in}$, and the word-number sequence of $D_i$ is converted into vectors according to the vocabulary.
The resulting vectors are stored in the Elasticsearch database, and the search flow is optimized using cosine similarity.
S4: Search the Elasticsearch vector store for the user's question by means of the large language model.
The search flow is optimized using cosine similarity:
$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}$$
where $A \cdot B$ is the inner product of the vectors and $\|A\|$ and $\|B\|$ are their respective norms. Computing the cosine similarity determines how relevant each retrieved text is to the user's search question, and the search answer is given accordingly.
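Inside Elasticsearch itself, cosine-similarity ranking over stored vectors can be expressed as a `script_score` query that calls the Painless function `cosineSimilarity` on a `dense_vector` field. The sketch below builds such a request body. It assumes the user's question has already been turned into a vector by an embedding model served through the AI PaaS platform (the patent does not name one); the field name `embedding` and the 3-dimensional size are placeholders.

```python
from typing import Sequence

EMBEDDING_DIMS = 3  # placeholder; must match the embedding model's output size

# Hypothetical mapping for stored fragment vectors.
VECTOR_MAPPING = {
    "properties": {
        "content": {"type": "text"},
        "embedding": {"type": "dense_vector", "dims": EMBEDDING_DIMS},
    }
}

def vector_search_body(query_vector: Sequence[float], k: int = 5,
                       field: str = "embedding") -> dict:
    """Score every document by cosine similarity to the query vector.

    cosineSimilarity returns values in [-1, 1]; adding 1.0 keeps scores
    non-negative, which script_score requires.
    """
    if len(query_vector) != EMBEDDING_DIMS:
        raise ValueError("query vector has the wrong dimensionality")
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }

body = vector_search_body([0.1, 0.2, 0.3], k=3)
```

With the 8.x Python client this could be issued as `client.search(index="texts", query=body["query"], size=body["size"])`. This brute-force form scores every document matched by the inner query; replacing `match_all` with a filter (for example on access level) narrows the candidates, and Elasticsearch 8.x also offers approximate k-nearest-neighbour search through the `knn` search option for large indices.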
Example 2
Referring to Fig. 2, a second embodiment of the present invention provides an AI PaaS platform-based Elasticsearch text vectorization search system.
The system comprises an AI PaaS platform module, an Elasticsearch engine module, a text optimization module and a search output module;
the AI PaaS platform module is used to build an AI PaaS platform, create text search projects, set search permissions and access permissions, integrate a selected large language model into the AI PaaS platform, build a Web application using the FastAPI framework, and load the AI PaaS platform into the Web application;
the Elasticsearch engine module uses Elasticsearch as the document data storage engine to search and analyze large volumes of text data, and queries the text data by matching keywords and by interpreting and executing complex query statements;
the text storage optimization module optimizes the storage of text in the Elasticsearch engine by means of text segmentation and vector storage;
and the search output module outputs the retrieved text information on the Elasticsearch user interaction interface and displays it to the user.
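One way to picture how the four modules cooperate is a small in-memory pipeline in which each module's job is injected as a function. Everything here is hypothetical: a Python dict stands in for the Elasticsearch index, and the toy embedding and dot-product scorer in the usage lines only demonstrate the data flow, not real semantic search.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]

@dataclass
class SearchSystem:
    """Hypothetical wiring of the four modules; each stage is injected as a callable."""
    embed: Callable[[str], Vector]                 # model served via the AI PaaS platform module
    segment: Callable[[str], List[str]]            # text storage optimization module
    similarity: Callable[[Vector, Vector], float]  # scoring used by the engine module
    # Stand-in for the Elasticsearch index: fragment id -> (fragment text, vector).
    index: Dict[str, Tuple[str, Vector]] = field(default_factory=dict)

    def ingest(self, doc_id: str, text: str) -> None:
        """Segment a text, vectorize each fragment and store it under a unique id."""
        for n, fragment in enumerate(self.segment(text)):
            self.index[f"{doc_id}#{n}"] = (fragment, self.embed(fragment))

    def search(self, question: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Engine module: rank stored fragments against the question vector."""
        q = self.embed(question)
        scored = [(key, self.similarity(q, vec)) for key, (_, vec) in self.index.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def render(self, hits: List[Tuple[str, float]]) -> str:
        """Search output module: one tab-separated line per hit."""
        return "\n".join(f"{key}\t{score:.3f}\t{self.index[key][0]}" for key, score in hits)

# Toy stand-ins, for demonstration only.
system = SearchSystem(
    embed=lambda text: [float(text.count("a")), float(text.count("b"))],
    segment=lambda text: text.split("|"),
    similarity=lambda u, v: sum(x * y for x, y in zip(u, v)),
)
system.ingest("d1", "aa|bb")
hits = system.search("a", top_k=1)
print(system.render(hits))
```

Injecting the model, segmenter and scorer as callables lets each be replaced independently; replacing the dict with real Elasticsearch calls would only touch `ingest` and `search`.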
Example 3
A third embodiment of the invention differs from the previous embodiments as follows:
if the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and various other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for instance by optically scanning the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Furthermore, in an effort to provide a concise description of the exemplary embodiments, not all features of an actual implementation may be described (i.e., those unrelated to the best mode presently contemplated for carrying out the invention, or those unrelated to practicing the invention).
It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions may be made. Such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.
Claims (10)
1. An AI PaaS platform-based Elasticsearch text vectorization search system, characterized in that the system comprises an AI PaaS platform module, an Elasticsearch engine module, a text optimization module and a search output module;
the AI PaaS platform module is used to build an AI PaaS platform, create text search projects, set search permissions and access permissions, integrate a selected large language model into the AI PaaS platform, build a Web application using the FastAPI framework, and load the AI PaaS platform into the Web application;
the Elasticsearch engine module is used to search and match texts stored in the Elasticsearch engine database;
the text storage optimization module optimizes the storage of text in the Elasticsearch engine by means of text segmentation and vector storage;
and the search output module outputs the retrieved text information on the Elasticsearch user interaction interface and displays it to the user.
2. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 1, wherein Elasticsearch is used as the document data storage engine to search and analyze large volumes of text data, and the text data is queried by matching keywords and by interpreting and executing complex query statements.
3. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 2, wherein texts are stored as data in the Elasticsearch engine, each text is assigned a unique identifier, and texts are stored in indices by logical grouping, an index being a collection of multiple texts; and the Elasticsearch engine provides multiple search modes and an identity authentication function.
4. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 3, wherein different query modes are configured in the Elasticsearch engine, namely full-text search, fuzzy search, range search and Boolean search, and aggregation operations are built over the retrieved texts to perform statistics, grouping and filtering;
the identity authentication function comprises establishing identity authentication for the Elasticsearch engine and authenticating users, such that unauthorized users are not allowed to access data in the search engine, authorized users are assigned access levels, and each user can access text information of the corresponding level.
5. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 4, wherein optimizing the storage of text in the Elasticsearch engine comprises first segmenting each text into a plurality of equal-length fragments before it is stored in the engine; the segmentation uses a hidden Markov model to divide the text into equal-length fragments with the same word occurrence frequency.
6. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 5, wherein optimizing the storage of text in the Elasticsearch engine further comprises vectorizing the segmented text fragments and constructing a vocabulary.
7. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 6, wherein: constructing the vocabulary comprises building a vocabulary of unique words, taking every word that appears in any of the texts as an element:
$$V=\{w_1, w_2, \ldots, w_n\}$$
where $V$ is the vocabulary set, $w_i$ is the numeric index of a distinct word in the vocabulary, and $n$ is the number of distinct words;
each text $D_i$ is cut into a number of equal-length segments, each denoted $d \in D_i$, and $D_i$ is converted into vectors according to the numeric indices of the vocabulary:
The vectors converted from the texts are stored in the Elasticsearch engine database, and the search process is optimized using cosine similarity.
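Claim 7 does not spell out the vector components in this text, so the sketch below assumes the simplest reading: a bag-of-words count vector in which position $i$ holds how often the word with index $w_i$ occurs in the segment. The function names are hypothetical, and words outside the vocabulary are silently ignored (also an assumption).

```python
def build_vocabulary(segments):
    """Map every distinct word across all segments to an integer index w_i."""
    vocab = {}
    for segment in segments:
        for word in segment:
            if word not in vocab:
                # Indices are assigned in order of first appearance.
                vocab[word] = len(vocab)
    return vocab


def vectorize(segment, vocab):
    """Bag-of-words vector: position i holds the count of the word with index i."""
    vec = [0] * len(vocab)
    for word in segment:
        idx = vocab.get(word)
        if idx is not None:  # ignore words outside the vocabulary
            vec[idx] += 1
    return vec
```

With segments `[["power", "outage", "report"], ["power", "restored"]]`, the vocabulary is `{"power": 0, "outage": 1, "report": 2, "restored": 3}`, and the segment `["power", "power", "restored"]` becomes `[2, 0, 0, 1]`. Count vectors keep the sketch transparent; weighted schemes such as TF-IDF would be a drop-in alternative.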
8. The AI PaaS platform-based Elasticsearch text vectorization search system of claim 7, wherein: the cosine similarity comprises optimizing text search with the cosine similarity technique:
$$\cos(A,B)=\frac{A\cdot B}{\lVert A\rVert\,\lVert B\rVert}$$
where $A\cdot B$ denotes the inner product of the vectors and $\lVert A\rVert$ and $\lVert B\rVert$ denote their norms; the cosine similarity is computed to judge how relevant each searched text is to the search question given by the user, and the search answer is given accordingly.
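The cosine similarity in claim 8 can be computed directly from its definition. The sketch below ranks stored vectors against a query vector in plain Python, returning 0.0 when either vector is all zeros (an assumed convention, since the formula is undefined there). It also builds a `script_score` request body that asks Elasticsearch itself to score documents with its Painless `cosineSimilarity` function; this assumes the vectors are stored in a field mapped as `dense_vector`, and the field name `content_vector` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||); 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs sorted by descending cosine similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def script_score_query(query_vec, field="content_vector"):
    """Query DSL body scoring every document by cosine similarity in Elasticsearch.

    Assumes `field` is mapped as dense_vector; "+ 1.0" keeps scores non-negative,
    since Elasticsearch does not accept negative scores.
    """
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vec},
                },
            }
        }
    }
```

For a query vector `[1, 0, 0]` and stored vectors `a = [1, 1, 0]`, `b = [0, 1, 1]`, `c = [1, 0, 0]`, the scores are 1.0 for `c`, about 0.707 for `a`, and 0.0 for `b`, so `c` ranks first. Scoring inside Elasticsearch avoids transferring every stored vector to the client.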
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the AI PaaS platform-based Elasticsearch text vectorization search method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the AI PaaS platform-based Elasticsearch text vectorization search system of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311572486.1A CN117591631A (en) | 2023-11-23 | 2023-11-23 | Elastic search text vectorization search system based on AI PaaS platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117591631A | 2024-02-23 |
Family ID: 89916159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311572486.1A Pending CN117591631A (en) | 2023-11-23 | 2023-11-23 | Elastic search text vectorization search system based on AI PaaS platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117591631A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103412933A (en) * | 2013-08-20 | 2013-11-27 | 南京物联网应用研究院有限公司 | Cloud search platform |
US20180300415A1 (en) * | 2017-04-16 | 2018-10-18 | Radim Rehurek | Search engine system communicating with a full text search engine to retrieve most similar documents |
CN109189752A (en) * | 2018-10-12 | 2019-01-11 | 国网山东省电力公司电力科学研究院 | Power marketing knowledge base system based on intelligent Search Technique |
US20200228402A1 (en) * | 2019-01-15 | 2020-07-16 | Affirmed Networks, Inc. | Dynamic auto-configuration of multi-tenant paas components |
CN113761312A (en) * | 2021-07-09 | 2021-12-07 | 杭州叙简科技股份有限公司 | Network handwriting detection method based on Elasticsearch and microblog comments |
US20220092099A1 (en) * | 2020-09-21 | 2022-03-24 | Samsung Electronics Co., Ltd. | Electronic device, contents searching system and searching method thereof |
CN114996551A (en) * | 2021-02-17 | 2022-09-02 | Gsi 科技公司 | System and method for improved similarity search of search engines |
WO2022232501A1 (en) * | 2021-04-29 | 2022-11-03 | American Chemical Society | Artificial intelligence assisted reviewer recommender and originality evaluator |
CN116821285A (en) * | 2023-07-11 | 2023-09-29 | 海默潘多拉数据科技(深圳)有限公司 | Text processing method, device, equipment and medium based on artificial intelligence |
- 2023-11-23: application CN202311572486.1A filed in CN (publication CN117591631A), status: active, pending
Non-Patent Citations (5)
Title |
---|
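张誌; 张延彬; 邢庆文; 杨滨: "人工智能能力平台建设思路" [Approaches to building an artificial intelligence capability platform], 网络安全和信息化, no. 03, 5 March 2020 (2020-03-05) * |
杨文杰; 倪平波; 宋卫平; 杨帆: "基于Elasticsearch服务化的探究" [An exploration of providing Elasticsearch as a service], 科技资讯, no. 24, 23 August 2020 (2020-08-23) * |
许大宏: "Elasticsearch在车牌识别系统中的应用研究" [Research on the application of Elasticsearch in license plate recognition systems], 计算机时代, no. 12, 15 December 2014 (2014-12-15) * |
郭建磊; 董蕾; 邱忠杰: "一种工业云PaaS平台统一日志服务系统" [A unified log service system for an industrial cloud PaaS platform], 信息技术与信息化, 28 March 2020 (2020-03-28), pages 43-45 * |
钱红兵; 李艳丽; 张蕊: "WebCollector和ElasticSearch在高校网站群敏感词检测中的应用研究" [Research on the application of WebCollector and ElasticSearch to sensitive-word detection in university website clusters], 电子设计工程, no. 24, 20 December 2019 (2019-12-20) * |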
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109635273B (en) | Text keyword extraction method, device, equipment and storage medium | |
US20220261427A1 (en) | Methods and system for semantic search in large databases | |
CN117290489B (en) | Method and system for quickly constructing industry question-answer knowledge base | |
US8375061B2 (en) | Graphical models for representing text documents for computer analysis | |
CN112800170A (en) | Question matching method and device and question reply method and device | |
CN116701431A (en) | Data retrieval method and system based on large language model | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN112115232A (en) | Data error correction method and device and server | |
CN115795061B (en) | Knowledge graph construction method and system based on word vector and dependency syntax | |
Li et al. | TagDC: A tag recommendation method for software information sites with a combination of deep learning and collaborative filtering | |
US12067061B2 (en) | Systems and methods for automated information retrieval | |
CN113590811B (en) | Text abstract generation method and device, electronic equipment and storage medium | |
US20220114340A1 (en) | System and method for an automatic search and comparison tool | |
CN113434639A (en) | Audit data processing method and device | |
CN115563313A (en) | Knowledge graph-based document book semantic retrieval system | |
CN113901783B (en) | Domain-oriented document duplication checking method and system | |
CN113761104A (en) | Method and device for detecting entity relationship in knowledge graph and electronic equipment | |
CN114391142A (en) | Parsing queries using structured and unstructured data | |
US20090234836A1 (en) | Multi-term search result with unsupervised query segmentation method and apparatus | |
CN117591631A (en) | Elastic search text vectorization search system based on AI PaaS platform | |
KR102541806B1 (en) | Method, system, and computer readable record medium for ranking reformulated query | |
CN113139034A (en) | Statement matching method, statement matching device and intelligent equipment | |
CN115688771B (en) | Document content comparison performance improving method and system | |
CN117725555B (en) | Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium | |
CN118069122B (en) | Structured query statement multiplexing method, device, electronic equipment and medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |