CN113382063A - ES-based file uploading retrieval analysis method and device - Google Patents


Publication number
CN113382063A
Authority
CN
China
Prior art keywords
information
text
uploaded
fragments
uploading
Prior art date
Legal status
Pending
Application number
CN202110636741.9A
Other languages
Chinese (zh)
Inventor
王善博
程林
杨培强
Current Assignee
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Software Technology Co Ltd
Priority to CN202110636741.9A
Publication of CN113382063A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/144 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1074 Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L67/1078 Resource delivery mechanisms
    • H04L67/108 Resource delivery mechanisms characterised by resources being split in blocks or fragments

Abstract

The invention discloses an ES-based (Elasticsearch-based) file uploading, retrieval and analysis method and device, belongs to the technical field of file analysis, and aims to solve the technical problem of how to upload, retrieve and analyze files quickly and accurately. The method comprises the following steps: acquiring text information uploaded locally to a browser, wherein the text information comprises a text format and a text size; analyzing and judging the content of the text information to obtain a text type and a text name; uniformly parsing the content of the text information into Word format; fragmenting the text information to obtain information fragments; uploading the information fragments to ES through Mapping, wherein during transmission Discovery determines whether the information fragments have been completely uploaded and whether re-fragmentation is needed, and is responsible for re-fragmentation on the master node of the cluster, and wherein the uploaded information fragments realize information interaction through Transport; and transmitting the information fragments to a River data source.

Description

ES-based file uploading retrieval analysis method and device
Technical Field
The invention relates to the technical field of file analysis, in particular to an ES-based file uploading retrieval analysis method and device.
Background
With the growing demand for tax information and intelligence collection, the cells of a collection form can no longer display all of the information. The number of data items keeps increasing and the data content varies, so a single input field cannot summarize it; sometimes a Word or Excel file has to be uploaded as a supplement or explanation, and that file serves as an information node of the information or intelligence. At present, file uploading is slow, the storage of file address information needs improvement, and the information and addresses of uploaded files cannot be stored and queried quickly and accurately.
Based on the above analysis, how to upload, retrieve and analyze files quickly and accurately is a technical problem to be solved.
Disclosure of Invention
The technical task of the invention is to provide an ES-based (Elasticsearch-based) file uploading, retrieval and analysis method and device that solve the technical problem of how to upload, retrieve and analyze files quickly and accurately.
In a first aspect, the present invention provides an ES-based file uploading retrieval analysis method, which cuts a file into blocks, uploads the blocks, and monitors the upload stream data, the method including the steps of:
acquiring text information uploaded locally to a browser, wherein the text information comprises a text format and a text size;
analyzing and judging the content of the text information to obtain a text type and a text name;
uniformly parsing the content of the text information into Word format;
fragmenting the text information to obtain information fragments;
uploading the information fragments to ES through Mapping, wherein during transmission Discovery determines whether the information fragments have been completely uploaded and whether re-fragmentation is needed, and is responsible for re-fragmentation on the master node of the cluster, and wherein the uploaded information fragments realize information interaction through Transport;
and transmitting the information fragments to the River data source.
Preferably, when the information fragment is uploaded to the ES through mapping, a temporary file name is created, and after the information fragment is transmitted to the River data source, the file name is modified.
Preferably, the River data source exists in the ES in the form of a plug-in.
Preferably, the Transport is integrated in the ES by means of a plug-in.
Preferably, the Transport uses the TCP protocol for internal interaction by default and supports the HTTP protocol, Thrift, servlet, memcached and ZeroMQ, and the HTTP protocol supports the JSON format.
Preferably, the storage address of the plug-in used by the uploaded information fragments is configured by path.plugins.
Preferably, path.plugins defaults to the plugins folder under the ES root directory.
In a second aspect, the present invention provides an apparatus comprising: at least one memory and at least one processor;
the at least one memory is configured to store a machine-readable program;
the at least one processor is configured to invoke the machine-readable program to perform the method of any of the first aspects.
In a third aspect, the present invention provides a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of the first aspects.
The ES-based file uploading retrieval analysis method and device have the following advantages: the text information is fragmented, the information fragments are uploaded to ES through Mapping, Discovery handles re-fragmentation on the master node of the cluster, the uploaded information fragments realize information interaction through Transport, and the information fragments are finally transmitted to the River data source.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of an ES-based file upload retrieval analysis method according to embodiment 1.
Detailed Description
The present invention is further described in the following with reference to the drawings and the specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not to be construed as limiting the present invention, and the embodiments and the technical features of the embodiments can be combined with each other without conflict.
The embodiment of the invention provides an ES-based (Elasticsearch-based) file uploading, retrieval and analysis method and device, and aims to solve the technical problem of how to upload, retrieve and analyze files quickly and accurately.
Example 1:
An ES-based file uploading retrieval analysis method cuts a file into blocks, uploads the blocks, and monitors the uploaded data. The method comprises the following steps:
S100, acquiring text information uploaded locally to a browser, wherein the text information comprises a text format and a text size;
S200, analyzing and judging the content of the text information to obtain a text type and a text name;
S300, uniformly parsing the content of the text information into Word format;
S400, fragmenting the text information to obtain information fragments;
S500, uploading the information fragments to ES through Mapping, wherein during transmission Discovery determines whether the information fragments have been completely uploaded and whether re-fragmentation is needed, and is responsible for re-fragmentation on the master node of the cluster, and wherein the uploaded information fragments realize information interaction through Transport;
S600, transmitting the information fragments to a River data source.
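As an illustration of the fragmentation step (S400), the following is a minimal Python sketch. The function name `split_into_fragments` and the fixed `fragment_size` are assumptions made for this example; the patent does not specify how fragment boundaries are chosen.

```python
from typing import Iterator, Tuple


def split_into_fragments(data: bytes, fragment_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, fragment) pairs that cover `data` in order.

    Hypothetical helper: the fixed-size policy is an assumption, not
    something the patent prescribes.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    for index, start in enumerate(range(0, len(data), fragment_size)):
        yield index, data[start:start + fragment_size]


# Example: a 10-byte payload cut into fragments of at most 4 bytes.
fragments = list(split_into_fragments(b"abcdefghij", 4))

# Concatenating the fragments in index order restores the original payload.
reassembled = b"".join(fragment for _, fragment in fragments)
```

Carrying the index with each fragment lets the receiving side detect gaps and reassemble the fragments in order.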
When the information fragments are uploaded to the ES through mapping, temporary file names are created, and the file names are modified after the information fragments are transmitted to the River data source. The ES stores the index in the memory by default, and then persists the index in a Gateway when the memory is full, wherein the Gateway represents a persistent storage mode of the ES index.
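The temporary-name-then-rename idea can be sketched on a local file system as follows. This is only an analogy under stated assumptions: the helper `store_with_temporary_name` and the `.part` suffix are hypothetical, and the patent does not describe how ES or the River data source performs the renaming.

```python
import os
import tempfile


def store_with_temporary_name(fragments, final_path):
    """Write fragments under a temporary name, then rename to the final name.

    Hypothetical local-file-system analogy of the temporary file name
    described above; not an Elasticsearch API.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, temp_path = tempfile.mkstemp(suffix=".part", dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            for fragment in fragments:
                handle.write(fragment)
        # The final name only becomes visible once every fragment is written.
        os.replace(temp_path, final_path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return final_path
```

Readers that look up the final name never see a partially written file, which is the usual reason for choosing this pattern.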
The River data source exists in the ES in the form of a plug-in.
Mapping is very similar to a data type in a statically typed language. For example, once a variable is declared as int, it can only store int data afterwards. Mapping not only tells ES the type of each field, it also tells ES how to index the data and whether the data is indexed at all.
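A mapping of this kind can be written as a JSON body. The sketch below builds one as a Python dictionary in the shape used by recent Elasticsearch versions (a `properties` object under `mappings`); the field names `file_name`, `fragment_index`, `content` and `storage_address` are assumptions for illustration and do not come from the patent.

```python
import json

# Hypothetical mapping for uploaded fragments. The field names are
# illustrative assumptions; the "type" and "index" parameters are standard
# Elasticsearch mapping parameters.
fragment_mapping = {
    "mappings": {
        "properties": {
            "file_name": {"type": "keyword"},       # exact-match field
            "fragment_index": {"type": "integer"},  # position of the fragment
            "content": {"type": "text"},            # analyzed full-text field
            # Stored with the document but not searchable.
            "storage_address": {"type": "keyword", "index": False},
        }
    }
}

mapping_json = json.dumps(fragment_mapping, indent=2)
```

Here each `type` states which kind of data a field holds, and `"index": False` is an example of telling ES that a field should not be indexed.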
Discovery is mainly responsible for master node discovery in the cluster. For example, when a node suddenly leaves or joins, re-fragmentation is performed.
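The effect of a node leaving or joining can be pictured with a toy reassignment routine. This is a deliberately simplified sketch and is not Elasticsearch's actual shard allocation logic; `assign_fragments`, `reassign_after_change` and the node names are hypothetical.

```python
def assign_fragments(fragment_ids, nodes):
    """Round-robin assignment of fragment ids to node names (toy model)."""
    ordered = sorted(nodes)
    if not ordered:
        raise ValueError("at least one node is required")
    return {fid: ordered[i % len(ordered)] for i, fid in enumerate(fragment_ids)}


def reassign_after_change(assignment, live_nodes):
    """Keep fragments on nodes that are still alive; move the others.

    Fragments whose node has left are spread round-robin over the live nodes.
    """
    ordered = sorted(live_nodes)
    if not ordered:
        raise ValueError("at least one live node is required")
    result = {}
    moved = 0
    for fid, node in assignment.items():
        if node in live_nodes:
            result[fid] = node
        else:
            result[fid] = ordered[moved % len(ordered)]
            moved += 1
    return result


# Node "n2" leaves and node "n3" joins: only the fragments held by "n2" move.
initial = assign_fragments([0, 1, 2, 3], ["n1", "n2"])
after_change = reassign_after_change(initial, {"n1", "n3"})
```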
Transport represents the way ES nodes interact internally and the way the cluster interacts with clients. By default, the TCP protocol is used for internal interaction; Transport also supports other transport protocols, integrated as plug-ins, such as HTTP (JSON format), Thrift, servlet, memcached and ZeroMQ.
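For the HTTP (JSON) path, a fragment could be sent as a document through the Elasticsearch REST document API (`PUT /<index>/_doc/<id>`). The sketch below only builds the request and does not send it; the index name `uploaded_fragments`, the document id and the document fields are assumptions, and `http://localhost:9200` is simply the usual default address of a local node.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) an HTTP request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical fragment document; all names below are illustrative.
request = build_index_request(
    "http://localhost:9200",
    "uploaded_fragments",
    "report-0",
    {"file_name": "report.docx", "fragment_index": 0, "content": "first fragment"},
)
```

Passing the request to `urllib.request.urlopen` would send it to a running cluster; that call is left out so the sketch has no network dependency.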
In this embodiment, the storage address of the plug-in used by the uploaded information fragments is configured by path.plugins, which defaults to the plugins folder under the ES root directory.
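The default described here can be expressed as a small lookup. This sketch assumes the settings are available as a plain dictionary; the function `resolve_plugins_dir` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_plugins_dir(es_home, settings=None):
    """Return the plug-in directory: path.plugins if set, else <es_home>/plugins."""
    configured = (settings or {}).get("path.plugins")
    if configured:
        return PurePosixPath(configured)
    return PurePosixPath(es_home) / "plugins"


default_dir = resolve_plugins_dir("/opt/es")
custom_dir = resolve_plugins_dir("/opt/es", {"path.plugins": "/data/es-plugins"})
```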
The text information is processed according to its type.
If the text information is of the docx type:
(1) the text information is fragmented directly;
(2) the information fragments are uploaded through Mapping;
(3) Discovery is responsible for re-fragmentation on the master node of the cluster and determines whether the uploading of the file fragments is finished and whether re-fragmentation is needed;
(4) the uploaded information fragments realize information interaction through Transport;
(5) the storage address of the plug-in used by the uploaded information is configured through path.plugins, which defaults to the plugins folder under the ES root directory;
(6) finally, the information fragments are transmitted to the River data source.
If the text information is of the xlsx type:
(1) the text is first converted to docx and then fragmented;
(2) the information fragments are uploaded through Mapping;
(3) Discovery is responsible for re-fragmentation on the master node of the cluster and determines whether the uploading of the file fragments is finished and whether re-fragmentation is needed;
(4) the uploaded information fragments realize information interaction through Transport;
(5) the storage address of the plug-in used by the uploaded information is configured through path.plugins, which defaults to the plugins folder under the ES root directory;
(6) finally, the information fragments are transmitted to the River data source.
If the text information is of the pdf type:
(1) the text is first converted to docx and then fragmented;
(2) the information fragments are uploaded through Mapping;
(3) Discovery is responsible for re-fragmentation on the master node of the cluster and determines whether the uploading of the file fragments is finished and whether re-fragmentation is needed;
(4) the uploaded information fragments realize information interaction through Transport;
(5) the storage address of the plug-in used by the uploaded information is configured through path.plugins, which defaults to the plugins folder under the ES root directory;
(6) finally, the information fragments are transmitted to the River data source.
If the text information is of the png type:
(1) the text is first converted to docx and then fragmented;
(2) the information fragments are uploaded through Mapping;
(3) Discovery is responsible for re-fragmentation on the master node of the cluster and determines whether the uploading of the file fragments is finished and whether re-fragmentation is needed;
(4) the uploaded information fragments realize information interaction through Transport;
(5) the storage address of the plug-in used by the uploaded information is configured through path.plugins, which defaults to the plugins folder under the ES root directory;
(6) finally, the information fragments are transmitted to the River data source.
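The four type-specific procedures differ only in whether a conversion to docx precedes fragmentation. A minimal Python sketch of that dispatch is shown below; the function `plan_processing` and the step labels are assumptions for illustration, and the conversion itself is not implemented because the patent does not name a converter.

```python
import os

# File types handled in the embodiment: docx is fragmented directly,
# the other three are converted to docx first.
DIRECT_TYPES = {".docx"}
CONVERT_FIRST_TYPES = {".xlsx", ".pdf", ".png"}

# Steps shared by every type (labels are illustrative).
COMMON_STEPS = [
    "fragment",
    "upload through Mapping",
    "check completion and re-fragmentation (Discovery)",
    "interact through Transport",
    "resolve plug-in address (path.plugins)",
    "transmit to River data source",
]


def plan_processing(file_name):
    """Return the ordered list of step labels for a file, based on its extension."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension in DIRECT_TYPES:
        return list(COMMON_STEPS)
    if extension in CONVERT_FIRST_TYPES:
        return ["convert to docx"] + COMMON_STEPS
    raise ValueError(f"unsupported file type: {extension!r}")


docx_plan = plan_processing("notes.docx")
pdf_plan = plan_processing("scan.PDF")
```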
In the ES-based file uploading retrieval analysis method of this embodiment, the uploaded file is partitioned and analyzed through a service interface, and the data blocks of the uploaded file are obtained, so that the file is cut into blocks and uploaded block by block, which prepares the ground for faster and more reliable file uploading.
Example 2:
An apparatus of the present invention comprises: at least one memory and at least one processor; the at least one memory is used for storing a machine-readable program; the at least one processor is used for calling the machine-readable program to execute the method disclosed in embodiment 1.
Example 3:
A computer-readable medium of the present invention has stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method disclosed in embodiment 1. Specifically, a system or an apparatus may be provided that is equipped with a storage medium storing software program code that realizes the functions of any of the above-described embodiments, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that many more embodiments of the invention are possible that combine the features of the different embodiments described above and still fall within the scope of the invention.

Claims (9)

1. An ES-based file uploading retrieval analysis method, characterized in that a file is cut into blocks and uploaded, and the uploaded data is monitored, the method comprising the following steps:
acquiring text information uploaded locally to a browser, wherein the text information comprises a text format and a text size;
analyzing and judging the content of the text information to obtain a text type and a text name;
uniformly parsing the content of the text information into Word format;
fragmenting the text information to obtain information fragments;
uploading the information fragments to ES through Mapping, wherein during transmission Discovery determines whether the information fragments have been completely uploaded and whether re-fragmentation is needed, and is responsible for re-fragmentation on the master node of the cluster, and wherein the uploaded information fragments realize information interaction through Transport;
and transmitting the information fragments to the River data source.
2. The ES-based file uploading retrieval analysis method of claim 1, wherein when the information fragment is uploaded to the ES through mapping, a temporary file name is created, and after the information fragment is transmitted to the River data source, the file name is modified.
3. The ES-based file upload search analysis method according to claim 1, wherein the River data source exists in the ES in the form of a plug-in.
4. The ES-based file uploading retrieval analysis method of claim 1, wherein the Transport is integrated into the ES by means of plug-in.
5. The ES-based file uploading retrieval analysis method of claim 4, wherein the Transport uses the TCP protocol for internal interaction by default and supports the HTTP protocol, Thrift, servlet, memcached and ZeroMQ, and the HTTP protocol supports the JSON format.
6. The ES-based file uploading retrieval analysis method according to claim 1, 2, 3, 4 or 5, wherein the storage address of the plug-in used by the uploaded information fragments is configured by path.plugins.
7. The method according to claim 6, wherein path.plugins defaults to the plugins folder under the ES root directory.
8. An apparatus, comprising: at least one memory and at least one processor;
the at least one memory is configured to store a machine-readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 7.
9. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 7.
CN202110636741.9A 2021-06-07 2021-06-07 ES-based file uploading retrieval analysis method and device Pending CN113382063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110636741.9A CN113382063A (en) 2021-06-07 2021-06-07 ES-based file uploading retrieval analysis method and device


Publications (1)

Publication Number Publication Date
CN113382063A true CN113382063A (en) 2021-09-10

Family

ID=77576592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110636741.9A Pending CN113382063A (en) 2021-06-07 2021-06-07 ES-based file uploading retrieval analysis method and device

Country Status (1)

Country Link
CN (1) CN113382063A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106850451A (en) * 2017-02-13 2017-06-13 济南浪潮高新科技投资发展有限公司 A kind of data transmission method, apparatus and system
CN110276025A (en) * 2019-06-27 2019-09-24 北京首汽智行科技有限公司 A kind of thermodynamic chart load and methods of exhibiting based on mass data
US10877984B1 (en) * 2017-12-07 2020-12-29 Palantir Technologies Inc. Systems and methods for filtering and visualizing large scale datasets
CN112506886A (en) * 2021-02-05 2021-03-16 北京通付盾人工智能技术有限公司 Multi-source service operation log acquisition method and system
CN112765103A (en) * 2021-01-26 2021-05-07 上海销氪信息科技有限公司 File analysis method, system, device and equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
北冥大鱼鱼: "ElasticSearch入门详解", 《CSDN》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210910
