CN112612905A - Elasticsearch-based data processing method, system, computer and readable storage medium - Google Patents

Elasticsearch-based data processing method, system, computer and readable storage medium

Info

Publication number
CN112612905A
Authority
CN
China
Prior art keywords
entity
relationship
data
index
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011582357.7A
Other languages
Chinese (zh)
Inventor
杜芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Minglue Technology Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202011582357.7A
Publication of CN112612905A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an Elasticsearch-based data processing method, system, computer and readable storage medium. The method comprises: a data acquisition step of acquiring source data to be processed, the source data being graph data, and extracting entity data and relationship data from it; an index storage step of indexing and storing the entity data and the relationship data separately, using optimistic-lock concurrent write operations of Elasticsearch, to obtain at least one entity index and a relationship index, wherein each entity index corresponds to a plurality of entity documents and each relationship index corresponds to a plurality of relationship documents; and an entity retrieval step of acquiring entity information to be retrieved and searching the entity index with it to obtain an entity or entity set matching that information. The method replaces the approach of deploying Elasticsearch and a graph database side by side: no graph database needs to be deployed, which achieves lightweight deployment and efficient retrieval of graph data.

Description

Elasticsearch-based data processing method, system, computer and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to an Elasticsearch-based data processing method, system, computer device, and computer-readable storage medium.
Background
The graph data is stored by using the Elasticsearch index, namely, an entity index and a relation index are respectively established by using the Elasticsearch, and entity information and relation information in the graph data are respectively stored in the corresponding indexes.
Although many graph databases are available for storing graph data, in scenarios where retrieval and graph functions coexist and the demand for graph computation is low, deploying both Elasticsearch and a graph database is somewhat cumbersome, and the retrieval efficiency of a graph database is often lower than that of Elasticsearch.
Disclosure of Invention
The embodiments of the application provide an Elasticsearch-based data processing method, system, computer device and computer-readable storage medium, which achieve lightweight deployment and efficient retrieval of graph data.
In a first aspect, an embodiment of the present application provides a data processing method based on an Elasticsearch, including:
a data acquisition step, which is used for acquiring source data to be processed and extracting entity data and relationship data in the source data to be processed; the source data to be processed is graph data;
an index storage step, for indexing and storing the entity data and the relationship data separately, based on optimistic-lock concurrent write operations of Elasticsearch, to obtain at least one entity index and a relationship index, where each entity index corresponds to multiple entity documents and each relationship index corresponds to multiple relationship documents;
and an entity retrieval step, for acquiring entity information to be retrieved and searching the entity index based on that information, so as to obtain an entity or an entity set matching the entity information to be retrieved.
Each entity document corresponds to one piece of entity information and each relationship document to one piece of relationship information. Specifically, the entity information comprises at least one of, or any combination of, an entity ID, a primary key, an entity name, label information, an attribute value and an entity source ID. The relationship information is triple information and comprises at least one of, or any combination of, a relationship ID, a relationship name, a subject entity ID, an object entity ID, a subject entity key, an object entity key, label information, a flag indicating whether the relationship is directed from the subject to the object, and a relationship source ID. The entity source IDs and relationship source IDs are stored as arrays and are used to trace entity source information and relationship source information.
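As an illustration of the two document shapes described above, the following minimal Python sketch models an entity document and a relationship document. All attribute names (`entity_id`, `subject_id`, `source_ids` and so on) are hypothetical and chosen only for readability; the actual index mappings are given in the patent's figures and are not reproduced here.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class EntityDoc:
    """One entity document (hypothetical field names)."""
    entity_id: str
    key: str                      # primary key of the entity
    name: str
    labels: Dict[str, str]        # e.g. {"en": "Person", "zh": "人物"}
    properties: Dict[str, str] = field(default_factory=dict)
    source_ids: List[str] = field(default_factory=list)  # array used for source tracing


@dataclass
class RelationDoc:
    """One relationship document, i.e. a (subject, relation, object) triple."""
    relation_id: str
    name: str
    subject_id: str
    object_id: str
    subject_key: str
    object_key: str
    labels: Dict[str, str] = field(default_factory=dict)
    directed: bool = True         # whether the relation points from subject to object
    source_ids: List[str] = field(default_factory=list)


alice = EntityDoc("e1", "k1", "Alice", {"en": "Person", "zh": "人物"})
knows = RelationDoc("r1", "knows", "e1", "e2", "k1", "k2", source_ids=["src-7"])
entity_body = asdict(alice)       # plain dict, suitable as a JSON document body
```

Keeping the source IDs as a list mirrors the statement that sources are stored as arrays, so that one entity or relationship can be traced back to several origins.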
The data storage realized by the above steps supports entity retrieval and relationship expansion. Entities can be retrieved conveniently by entity ID or attribute value, which achieves efficient entity retrieval, while the optimistic lock mechanism ensures the correctness of concurrent data writes.
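The optimistic lock idea can be sketched without a running cluster. The Python below is an in-memory stand-in, not the Elasticsearch client: `TinyIndex`, `VersionConflict` and `add_source` are hypothetical names. It assumes a compare-and-set on a per-document sequence number, similar in spirit to the `if_seq_no` and `if_primary_term` parameters that Elasticsearch's index API accepts; the text above does not state which concrete mechanism the application uses, and the primary term is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class VersionConflict(Exception):
    """Raised when the caller's expected sequence number is stale."""


@dataclass
class Stored:
    source: dict
    seq_no: int


class TinyIndex:
    """In-memory stand-in for one index (assumption, not a real client)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Stored] = {}
        self._next_seq = 0

    def get(self, doc_id: str) -> Optional[Stored]:
        return self._docs.get(doc_id)

    def index(self, doc_id: str, source: dict, if_seq_no: Optional[int] = None) -> int:
        # The write succeeds only if the document is still at the expected
        # sequence number (None means "expected not to exist yet").
        current = self._docs.get(doc_id)
        current_seq = current.seq_no if current else None
        if if_seq_no != current_seq:
            raise VersionConflict(f"{doc_id}: expected {if_seq_no}, found {current_seq}")
        seq = self._next_seq
        self._next_seq += 1
        self._docs[doc_id] = Stored(dict(source), seq)
        return seq


def add_source(index: TinyIndex, doc_id: str, source_id: str, max_retries: int = 3) -> int:
    """Read-modify-write with retry: re-read and merge again after a conflict."""
    for _ in range(max_retries):
        current = index.get(doc_id)
        base = current.source if current else {"source_ids": []}
        merged = sorted(set(base["source_ids"]) | {source_id})
        try:
            return index.index(doc_id, {**base, "source_ids": merged},
                               if_seq_no=current.seq_no if current else None)
        except VersionConflict:
            continue
    raise VersionConflict(f"{doc_id}: gave up after {max_retries} attempts")
```

A writer that holds a stale sequence number is rejected rather than silently overwriting a newer document, which is the property that keeps concurrent merges of source IDs correct.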
In some of these embodiments, the method further comprises: a graph construction step of acquiring the entity IDs in the entity set and performing relationship expansion on the entity IDs according to the relationship index, so as to construct a graph from the graph data.
These steps achieve efficient relationship expansion and source tracing: the graph information required by the user is assembled by expanding relationships from entity IDs. In scenarios where retrieval and graph functions coexist and the demand for graph computation is low, entity relationships can be obtained quickly and the graph constructed by deploying Elasticsearch alone. This replaces the approach of deploying Elasticsearch and a graph database side by side, so no graph database needs to be deployed and the operation and maintenance cost of the system is reduced.
In some of these embodiments, the entity retrieving step further comprises:
an attribute value retrieval step, for acquiring an entity attribute value from the entity information to be retrieved and obtaining the entity set matching the attribute value by searching on that value;
an entity ID retrieval step, for acquiring an entity ID from the entity information to be retrieved and obtaining the entity matching the entity ID by searching on that ID;
and a primary key value retrieval step, for acquiring a unique identifier from the entity information to be retrieved and obtaining the entity matching the unique identifier by searching on that identifier. Specifically, the unique identifier consists of a primary key value and label information, where the label information comprises a Chinese label and an English label. Elasticsearch term retrieval is performed on the entity primary key value, the entity English label and the entity Chinese label, and an entity is regarded as hit only when all three values match their respective fields.
The entity retrieval step thus provides three retrieval modes according to the embodiment of the application, and one or any combination of them can be selected flexibly according to the scenario and requirements.
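The three retrieval modes can be sketched as Elasticsearch query DSL bodies built in Python. The field names `propertyValue`, `key`, `eDelabel` and `cNlabel` are the ones given in the detailed description; `entityId` is a hypothetical field name, and the choice of a `match` query for attribute values is an assumption, since the text only prescribes term retrieval for the primary key mode.

```python
def by_property_value(value: str) -> dict:
    """Attribute value retrieval: may hit a set of entities."""
    return {"query": {"match": {"propertyValue": value}}}


def by_entity_id(entity_id: str) -> dict:
    """Entity ID retrieval: exact match on a (hypothetical) entityId field."""
    return {"query": {"term": {"entityId": entity_id}}}


def by_primary_key(key: str, en_label: str, zh_label: str) -> dict:
    """Primary key retrieval: all three term clauses must match to hit an entity."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"key": key}},
                    {"term": {"eDelabel": en_label}},
                    {"term": {"cNlabel": zh_label}},
                ]
            }
        }
    }


body = by_primary_key("110101199001010000", "Person", "人物")
```

Each function returns a plain dictionary, so the same body can be passed to whichever Elasticsearch client or HTTP library a deployment uses.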
In some of these embodiments, the graph construction step further comprises:
a relationship expansion step, for searching the relationship index by subject entity ID and/or object entity ID to obtain the relationships matching that ID, and obtaining the opposite-end entity ID of each relationship, so that entity retrieval can be performed with the opposite-end entity ID and the entity information at the opposite end of the relationship can be expanded;
and a loop step, for executing the relationship expansion step repeatedly so as to expand all relationships of the entities and the graph data formed by the entities.
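The expansion loop described above can be sketched as a breadth-first traversal. The Python below is only an illustration: it replaces the relationship index with an in-memory list of triples, and the `max_hops` bound is an added assumption (the text speaks of expanding all relationships, which corresponds to a sufficiently large bound).

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Relation = Tuple[str, str, str]  # (subject_id, relation_name, object_id)


def expand(seed_ids: Iterable[str], relations: Iterable[Relation],
           max_hops: int) -> Tuple[Set[str], Set[Relation]]:
    """Repeat the relationship expansion step until no new entity appears
    or the hop limit is reached."""
    # Stand-in for "search the relationship index by subject or object ID".
    by_endpoint: Dict[str, List[Relation]] = {}
    for rel in relations:
        subject_id, _, object_id = rel
        by_endpoint.setdefault(subject_id, []).append(rel)
        by_endpoint.setdefault(object_id, []).append(rel)

    seeds = list(seed_ids)
    seen_entities: Set[str] = set(seeds)
    seen_relations: Set[Relation] = set()
    frontier = deque((entity_id, 0) for entity_id in seeds)

    while frontier:
        entity_id, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for rel in by_endpoint.get(entity_id, []):
            seen_relations.add(rel)
            subject_id, _, object_id = rel
            # The opposite-end entity is whichever endpoint is not the current one.
            peer = object_id if subject_id == entity_id else subject_id
            if peer not in seen_entities:
                seen_entities.add(peer)
                frontier.append((peer, hops + 1))
    return seen_entities, seen_relations


triples = [("a", "knows", "b"), ("b", "works_at", "c"), ("d", "owns", "c")]
one_hop = expand(["a"], triples, max_hops=1)
full = expand(["a"], triples, max_hops=3)
```

Tracking visited entity IDs prevents the loop from revisiting entities in cyclic graphs, which is what lets the repeated expansion terminate.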
In a second aspect, an embodiment of the present application provides an Elasticsearch-based data processing system, including:
the data acquisition module is used for acquiring source data to be processed and extracting entity data and relationship data in the source data to be processed;
the index storage module is used for indexing and storing the entity data and the relationship data separately, based on optimistic-lock concurrent write operations of Elasticsearch, to obtain at least one entity index and a relationship index, wherein each entity index corresponds to a plurality of entity documents and each relationship index corresponds to a plurality of relationship documents;
and the entity retrieval module is used for acquiring entity information to be retrieved and searching the entity index based on that information, so as to obtain an entity or an entity set matching the entity information to be retrieved.
Each entity document corresponds to one piece of entity information and each relationship document to one piece of relationship information. Specifically, the entity information comprises at least one of, or any combination of, an entity ID, a primary key, an entity name, label information, an attribute value and an entity source ID. The relationship information is triple information and comprises at least one of, or any combination of, a relationship ID, a relationship name, a subject entity ID, an object entity ID, a subject entity key, an object entity key, label information, a flag indicating whether the relationship is directed from the subject to the object, and a relationship source ID.
The data storage realized by these modules supports entity retrieval and relationship expansion. Entities can be retrieved conveniently by ID or attribute value, which achieves efficient entity retrieval, while the optimistic lock mechanism ensures the correctness of concurrent data writes.
In some of these embodiments, the system further comprises: a graph construction module, used for acquiring the entity IDs in the entity set and performing relationship expansion on the entity IDs according to the relationship index, so as to construct a graph from the graph data.
These modules achieve efficient relationship expansion and source tracing: the graph information required by the user is assembled by expanding relationships from entity IDs. In scenarios where retrieval and graph functions coexist and the demand for graph computation is low, entity relationships can be obtained quickly and the graph constructed by deploying Elasticsearch alone. This replaces the approach of deploying Elasticsearch and a graph database side by side, so no graph database needs to be deployed and the operation and maintenance cost of the system is reduced.
In some of these embodiments, the entity retrieval module further comprises:
the attribute value retrieval module is used for acquiring an entity attribute value from the entity information to be retrieved and obtaining the entity set matching the attribute value by searching on that value;
the entity ID retrieval module is used for acquiring an entity ID from the entity information to be retrieved and obtaining the entity matching the entity ID by searching on that ID;
and the primary key value retrieval module is used for acquiring a unique identifier from the entity information to be retrieved and obtaining the entity matching the unique identifier by searching on that identifier. Specifically, the unique identifier consists of a primary key value and label information, where the label information comprises a Chinese label and an English label. Elasticsearch term retrieval is performed on the entity primary key value, the entity English label and the entity Chinese label, and an entity is regarded as hit only when all three values match their respective fields.
The entity retrieval module thus provides three retrieval modes according to the embodiment of the application, and one or any combination of them can be selected flexibly according to the scenario and requirements.
In some embodiments, the graph construction module further comprises:
a relationship expansion module, used for searching the relationship index by subject entity ID and/or object entity ID to obtain the relationships matching that ID, and obtaining the opposite-end entity ID of each relationship, so that entity retrieval can be performed with the opposite-end entity ID and the entity information at the opposite end of the relationship can be expanded;
and a loop module, used for running the relationship expansion module repeatedly so as to expand all relationships of the entities and the graph data formed by the entities.
The concurrent data writing operation adopts an optimistic locking mechanism to ensure atomicity and consistency of the data writing operation.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for processing data based on the Elasticsearch according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for processing data based on an Elasticsearch as described in the first aspect above.
Compared with the related art, the Elasticsearch-based data processing method, system, computer device and computer-readable storage medium provided by the embodiments of the application achieve efficient entity retrieval, relationship expansion and source tracing through Elasticsearch-based data storage and data structure design, while an optimistic lock mechanism ensures the correctness of concurrent data writes. In scenarios where retrieval and graph functions coexist and the demand for graph computation is low, entity relationships can be obtained quickly and the graph constructed by deploying Elasticsearch alone. This replaces the approach of deploying Elasticsearch and a graph database side by side, so no graph database needs to be deployed and the operation and maintenance cost of the system is reduced.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart of an Elasticsearch-based data processing method according to an embodiment of the present application;
Fig. 2 is a block diagram of an Elasticsearch-based data processing system according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the Elasticsearch-based optimistic lock concurrent write operation in step S2 according to an embodiment of the present application.
Description of reference numerals:
1. a data acquisition module; 2. an index storage module; 3. an entity retrieval module;
4. a graph construction module; 401. a relationship expansion module; 402. a loop module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
In order to achieve lightweight deployment and efficient querying of graph data, embodiments of the present application provide a solution for storing graph data in Elasticsearch indexes.
This embodiment provides an Elasticsearch-based data processing method. Fig. 1 is a flowchart of an Elasticsearch-based data processing method according to an embodiment of the present application. As shown in Fig. 1, the flow includes the following steps:
a data obtaining step S1, configured to obtain source data to be processed and extract entity data and relationship data in the source data to be processed; the source data to be processed is graph data.
And an index storage step S2, configured to perform index storage on the entity data and the relationship data respectively based on optimistic lock concurrent writing operation of the Elasticsearch to obtain at least one entity index and a relationship index, where each entity index corresponds to multiple entity documents, each relationship index corresponds to multiple relationship documents, and the entity documents and the relationship documents correspond to one piece of entity information and relationship information respectively.
An entity retrieval step S3, for acquiring entity information to be retrieved and searching the entity index based on that information, so as to obtain an entity or an entity set matching the entity information to be retrieved.
A graph construction step S4, for acquiring the entity IDs in the entity set and performing relationship expansion on the entity IDs according to the relationship index, so as to construct a graph from the graph data.
Specifically, the entity information comprises at least one of, or any combination of, an entity ID, a primary key, an entity name, label information, an attribute value and an entity source ID. The relationship information is triple information and comprises at least one of, or any combination of, a relationship ID, a relationship name, a subject entity ID, an object entity ID, a subject entity key, an object entity key, label information, a flag indicating whether the relationship is directed from the subject to the object, and a relationship source ID. The entity source IDs and relationship source IDs are stored as arrays and are used to trace entity source information and relationship source information. The specific mapping structure of the entity index is as follows:
[Figure: mapping structure of the entity index; image not reproduced]
The specific mapping structure of the relationship index is as follows:
[Figure: mapping structure of the relationship index; image not reproduced]
Storing the data in the manner described in the above steps supports entity retrieval and relationship expansion. Entities can be retrieved conveniently by ID or attribute value, which achieves efficient entity retrieval, while the optimistic lock mechanism ensures the correctness of concurrent data writes. In addition, the embodiment of the application achieves efficient relationship expansion and source tracing, so that the graph information required by the user is assembled by expanding relationships from entity IDs. In scenarios where retrieval and graph functions coexist and the demand for graph computation is low, entity relationships can be obtained quickly and the graph constructed by deploying Elasticsearch alone. This replaces the approach of deploying Elasticsearch and a graph database side by side, so no graph database needs to be deployed and the operation and maintenance cost of the system is reduced.
In some of these embodiments, the entity retrieving step S3 further includes:
an attribute value retrieval step, for acquiring an entity attribute value from the entity information to be retrieved and obtaining the entity set matching the attribute value by searching the attribute value field propertyValue;
an entity ID retrieval step, for acquiring an entity ID from the entity information to be retrieved and obtaining the entity matching the entity ID by searching on that ID;
and a primary key value retrieval step, for acquiring a unique identifier from the entity information to be retrieved and obtaining the entity matching the unique identifier by searching on that identifier. Specifically, the unique identifier consists of a primary key value and label information, where the label information comprises a Chinese label and an English label. Elasticsearch term retrieval is performed on the entity primary key field key, the entity English label field eDelabel and the entity Chinese label field cNlabel, and the entity is regarded as hit only when all three values match their respective fields.
The entity retrieval step S3 thus provides three retrieval modes according to the embodiment of the present application, and one or any combination of them can be selected flexibly according to the scenario and requirements.
In some of these embodiments, the graph construction step S4 further includes:
a relationship expansion step S401, configured to obtain a relationship matching the subject entity ID and/or the object entity ID by retrieving the subject entity ID and/or the object entity ID in the relationship index, and obtain an opposite end entity ID of the relationship from the obtained relationship, so as to perform entity retrieval according to the opposite end entity ID, and expand entity information of an opposite end of the relationship;
a loop step S402, for executing the relationship expansion step S401 repeatedly so as to expand all relationships of the entities and the graph data formed by the entities.
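One round of step S401 against the relationship index might look like the following sketch. It only builds the request body and post-processes hits; the field names `subjectId` and `objectId` are hypothetical, since the mapping of the relationship index is given only in the figure, and the `direction` parameter is an illustrative addition reflecting the "and/or" choice between subject and object IDs.

```python
from typing import Iterable, List, Set


def relation_query(entity_ids: Iterable[str], direction: str = "both") -> dict:
    """Build a query that finds relationships touching any of the given entities."""
    ids = list(entity_ids)
    clauses: List[dict] = []
    if direction in ("out", "both"):
        clauses.append({"terms": {"subjectId": ids}})   # entity is the subject
    if direction in ("in", "both"):
        clauses.append({"terms": {"objectId": ids}})    # entity is the object
    if not clauses:
        raise ValueError("direction must be 'out', 'in' or 'both'")
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def peer_ids(hits: Iterable[dict], entity_ids: Iterable[str]) -> Set[str]:
    """Collect the opposite-end entity IDs from relationship hits."""
    known = set(entity_ids)
    peers: Set[str] = set()
    for hit in hits:
        source = hit["_source"]
        for endpoint in (source["subjectId"], source["objectId"]):
            if endpoint not in known:
                peers.add(endpoint)
    return peers


query = relation_query(["e1"])
sample_hits = [
    {"_source": {"subjectId": "e1", "objectId": "e2"}},
    {"_source": {"subjectId": "e3", "objectId": "e1"}},
]
next_round = peer_ids(sample_hits, ["e1"])
```

The IDs returned by `peer_ids` would then be fed to the entity ID retrieval of step S3, and the loop step S402 repeats the round with those IDs as the new starting set.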
This embodiment also provides an Elasticsearch-based data processing system, which implements the above embodiments and preferred embodiments; descriptions already given are not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may be implemented as a combination of software and/or hardware that performs a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 2 is a block diagram of an Elasticsearch-based data processing system according to an embodiment of the present application. Referring to Fig. 2, the system includes: the data acquisition module 1, used for acquiring source data to be processed and extracting entity data and relationship data from the source data to be processed; the index storage module 2, used for indexing and storing the entity data and the relationship data separately, based on optimistic-lock concurrent write operations of Elasticsearch, to obtain at least one entity index and at least one relationship index; the entity retrieval module 3, used for acquiring entity information to be retrieved and searching the entity index based on that information, so as to obtain an entity or an entity set matching the entity information to be retrieved; and the graph construction module 4, used for acquiring the entity IDs in the entity set and performing relationship expansion on the entity IDs according to the relationship index, so as to construct a graph from the graph data.
Each entity index corresponds to a plurality of entity documents and each relationship index corresponds to a plurality of relationship documents; each entity document corresponds to one piece of entity information and each relationship document to one piece of relationship information. Specifically, the entity information comprises at least one of, or any combination of, an entity ID, a primary key, an entity name, label information, an attribute value and an entity source ID. The relationship information is triple information and comprises at least one of, or any combination of, a relationship ID, a relationship name, a subject entity ID, an object entity ID, a subject entity key, an object entity key, label information, a flag indicating whether the relationship is directed from the subject to the object, and a relationship source ID.
Specifically, the entity retrieval module 3 further includes: an attribute value retrieval module, configured to acquire an entity attribute value from the entity information to be retrieved and obtain the entity set matching that attribute value by searching on the attribute value propertyValue; an entity ID retrieval module, configured to acquire an entity ID from the entity information to be retrieved and obtain the entity matching that entity ID by searching on the entity ID; and a primary key value retrieval module, configured to acquire a unique identifier from the entity information to be retrieved and obtain the entity matching that unique identifier by searching on the unique identifier. Specifically, the unique identifier consists of the primary key value and the label information, where the label information includes a Chinese label and an English label. An Elasticsearch term search is performed on the entity primary key field key, the entity English label field eDelabel, and the entity Chinese label field cNlabel, and an entity is considered a hit only when all three values match their corresponding fields at the same time. The entity retrieval module thus represents three retrieval modes according to the embodiment of the present application, and one of them, or any combination of them, can be selected flexibly for entity retrieval according to the scenario and requirements.
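The three retrieval modes can be sketched as Elasticsearch Query DSL request bodies. The sketch below only builds the JSON bodies as Python dictionaries and does not contact a cluster. The field names `propertyValue`, `key`, `eDelabel`, and `cNlabel` are taken from the text; the helper function names are hypothetical, and it is an assumption that these fields are mapped so that exact `term` matching applies (for example as `keyword` fields).

```python
from typing import Any, Dict


def by_property_value(value: str) -> Dict[str, Any]:
    """Mode 1: match every entity whose propertyValue contains the value."""
    return {"query": {"term": {"propertyValue": value}}}


def by_entity_id(entity_id: str) -> Dict[str, Any]:
    """Mode 2: match the single entity with the given document _id."""
    return {"query": {"ids": {"values": [entity_id]}}}


def by_unique_identifier(key: str, en_label: str, cn_label: str) -> Dict[str, Any]:
    """Mode 3: all three term clauses must match for an entity to be a hit."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"key": key}},
                    {"term": {"eDelabel": en_label}},
                    {"term": {"cNlabel": cn_label}},
                ]
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(by_unique_identifier("k1", "Person", "人物"), ensure_ascii=False))
```

A `bool` query with `filter` clauses is used here because the text requires all three values to match simultaneously and no relevance scoring is needed; `must` clauses would select the same documents.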
Specifically, the graph construction module 4 further includes: a relationship expansion module 401, configured to obtain the relationships matching a subject entity ID and/or an object entity ID by searching the relationship index for that subject entity ID and/or object entity ID, and to obtain from each relationship found the entity ID at the opposite end of the relationship, so that entity retrieval can be performed with the opposite-end entity ID and the entity information at the opposite end of the relationship can be expanded; and a loop module 402, configured to execute the relationship expansion module in a loop so as to expand all relationships of the entities and the graph data formed by the entities.
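The relationship expansion and loop modules together amount to a breadth-first traversal over the relationship index. The following is a minimal sketch under stated assumptions: the relationship index is replaced by an in-memory list, the linear scan stands in for a term search on the subject and object ID fields, and the names `expand_graph`, `subjectId`, `objectId`, and `max_hops` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Relation = Dict[str, str]  # keys assumed here: "id", "subjectId", "objectId"


def expand_graph(
    seed_ids: Iterable[str],
    relations: List[Relation],
    max_hops: Optional[int] = None,
) -> Tuple[Set[str], List[Relation]]:
    """Expand outward from the seed entities, following relations both ways."""
    seeds = list(seed_ids)
    seen_entities: Set[str] = set(seeds)
    seen_relations: Set[str] = set()
    edges: List[Relation] = []
    frontier = deque((entity_id, 0) for entity_id in seeds)

    while frontier:
        entity_id, hops = frontier.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand beyond the requested depth
        # Stand-in for searching the relationship index by subject/object ID.
        for rel in relations:
            if rel["subjectId"] == entity_id:
                peer = rel["objectId"]
            elif rel["objectId"] == entity_id:
                peer = rel["subjectId"]
            else:
                continue
            if rel["id"] not in seen_relations:
                seen_relations.add(rel["id"])
                edges.append(rel)
            if peer not in seen_entities:
                seen_entities.add(peer)
                frontier.append((peer, hops + 1))
    return seen_entities, edges


if __name__ == "__main__":
    rels = [
        {"id": "r1", "subjectId": "A", "objectId": "B"},
        {"id": "r2", "subjectId": "B", "objectId": "C"},
        {"id": "r3", "subjectId": "D", "objectId": "E"},
    ]
    print(expand_graph(["A"], rels))
```

The `max_hops` bound is an addition for illustration; the text describes looping until all relationships are expanded, which corresponds to leaving it as `None`.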
Specifically, Fig. 3 is a schematic diagram of the Elasticsearch-based optimistic-lock concurrent write operation in step S2 according to an embodiment of the present application. As shown in Fig. 3, concurrent writing of entity data based on this operation includes:
First, the given information is obtained, such as the entity primary key, Chinese label, English label, attributes, and entity source ID. The values of the entity primary key key, the Chinese label cNlabel, and the English label eLabel are concatenated, and the concatenated value is converted by an SHA (Secure Hash Algorithm) function into a fixed-length character string that conforms to the Elasticsearch _id format; this string is used as the entity ID.
Then, the initial version number is set to 1, the entity information is assembled into an Elasticsearch document, and the document is indexed by specifying its _id and version. If the data is written successfully without conflict, the method returns successfully. If a 409 version conflict error is returned, the following operations are performed:
first, the document and version of the entity are queried in the entity index by entity ID; the document is denoted document_conflict and the version is denoted version_conflict;
then, the attribute English-name array attributeEnNames, the attribute Chinese-name array attributeCnNames, the attribute value array attributeValues, and the entity source ID array sourceIds of the current document are merged with those of document_conflict;
finally, the version is set to version_conflict + 1, and the document is indexed again by specifying its _id and version. If the data is written successfully without conflict, the method returns successfully. If a 409 version conflict error is returned again, the above operations are repeated until the write succeeds.
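The entity write procedure above can be sketched without a running cluster. In the sketch below, `VersionedIndex` is a hypothetical in-memory stand-in for an index that rejects a write unless the supplied version is higher than the stored one, and `VersionConflict` stands in for the HTTP 409 response. SHA-256, the separator used when concatenating, and the way the parallel attribute arrays are merged (as de-duplicated triples) are all assumptions; the text says only "SHA" and "merge".

```python
import hashlib
import threading
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Stands in for the 409 version conflict response."""


class VersionedIndex:
    """In-memory stand-in for an index with version-checked writes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}
        self._lock = threading.Lock()

    def get(self, doc_id: str) -> Optional[Tuple[int, dict]]:
        with self._lock:
            return self._docs.get(doc_id)

    def index(self, doc_id: str, doc: dict, version: int) -> None:
        with self._lock:
            stored = self._docs.get(doc_id)
            if stored is not None and version <= stored[0]:
                raise VersionConflict(doc_id)
            self._docs[doc_id] = (version, doc)


def entity_id(key: str, cn_label: str, en_label: str) -> str:
    """Fixed-length ID from the concatenated key and labels (SHA-256 assumed)."""
    joined = "\x1f".join((key, cn_label, en_label))  # separator is an assumption
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def merge_documents(stored: dict, incoming: dict) -> dict:
    """Merge attribute arrays (kept aligned as triples) and source IDs."""
    fields = ("attributeEnNames", "attributeCnNames", "attributeValues")
    triples = list(dict.fromkeys(
        list(zip(*(stored[f] for f in fields)))
        + list(zip(*(incoming[f] for f in fields)))
    ))
    merged = dict(incoming)
    for position, name in enumerate(fields):
        merged[name] = [triple[position] for triple in triples]
    merged["sourceIds"] = list(dict.fromkeys(stored["sourceIds"] + incoming["sourceIds"]))
    return merged


def write_entity(index: VersionedIndex, doc: dict) -> Tuple[str, int]:
    """Write with version 1 first; on conflict, merge and retry with version + 1."""
    doc_id = entity_id(doc["key"], doc["cNlabel"], doc["eLabel"])
    version = 1
    while True:
        try:
            index.index(doc_id, doc, version)
            return doc_id, version
        except VersionConflict:
            version_conflict, document_conflict = index.get(doc_id)
            doc = merge_documents(document_conflict, doc)
            version = version_conflict + 1
```

Because every retry re-reads the stored document before merging, a writer that loses the race never overwrites another writer's attributes or source IDs; it simply merges them into its next attempt.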
Concurrent writing of relationship data based on this operation further includes:
First, the given information is obtained, such as the subject entity, the object entity, the relationship name, whether the relationship is directed, and the relationship source ID. The subject entity ID, the object entity ID, and the relationship name are concatenated, and the concatenated value is converted by an SHA (Secure Hash Algorithm) function into a fixed-length character string that conforms to the Elasticsearch _id format; this string is used as the relationship ID.
Then, the initial version number is set to 1, the relationship information is assembled into an Elasticsearch document, and the document is indexed by specifying its _id and version. If the data is written successfully without conflict, the method returns successfully. If a 409 version conflict error is returned, the following operations are performed:
first, the document and version of the relationship are queried in the relationship index by relationship ID; the document is denoted document_conflict and the version is denoted version_conflict;
then, the relationship source ID array sourceIds of the current document is merged with that of document_conflict;
finally, the version is set to version_conflict + 1, and the document is indexed again by specifying its _id and version. If the data is written successfully without conflict, the method returns successfully; if a 409 version conflict error is returned again, the above operations are repeated until the write succeeds.
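For relationship data, the parts that differ from the entity case are the ID derivation and the merge step. A minimal sketch of those two parts follows; the retry loop itself is as described in the steps above. SHA-256 and the separator are assumptions (the text says only "SHA"), and the function names and the `sourceIds` de-duplication policy are hypothetical.

```python
import hashlib
from typing import Dict, List


def relationship_id(subject_id: str, object_id: str, name: str) -> str:
    """Fixed-length ID from subject entity ID, object entity ID, and relation name."""
    joined = "\x1f".join((subject_id, object_id, name))  # separator is an assumption
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def merge_relationship(document_conflict: Dict, current: Dict) -> Dict:
    """Keep the current document but union its sourceIds with the stored ones."""
    merged = dict(current)
    combined: List[str] = document_conflict["sourceIds"] + current["sourceIds"]
    merged["sourceIds"] = list(dict.fromkeys(combined))  # de-duplicate, keep order
    return merged


if __name__ == "__main__":
    rid = relationship_id("e1", "e2", "works_at")
    stored = {"name": "works_at", "sourceIds": ["s1", "s2"]}
    incoming = {"name": "works_at", "sourceIds": ["s2", "s3"]}
    print(rid, merge_relationship(stored, incoming))
```

Because the ID is derived from the ordered triple, the same subject, object, and relationship name always map to the same document, while swapping subject and object yields a different ID.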
The above modules may be functional modules or program modules, and may be implemented in software or in hardware. For modules implemented in hardware, the modules may all be located in the same processor, or they may be distributed across different processors in any combination.
In addition, the Elasticsearch-based data processing method of the embodiments of the present application, described in conjunction with Fig. 1, may be implemented by a computer device. The computer device may include a processor and a memory storing computer program instructions. In particular, the processor may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application. The memory may be used to store or cache various data files for processing and/or communication, as well as computer program instructions to be executed by the processor. The processor reads and executes the computer program instructions stored in the memory to implement any one of the Elasticsearch-based data processing methods of the above embodiments.
The memory may include, among other things, mass storage for data or instructions. By way of example, and not limitation, the memory may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile memory. In particular embodiments, the memory includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), Extended Data Out Dynamic Random-Access Memory (EDODRAM), Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
In addition, in combination with the Elasticsearch-based data processing method of the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementing the method. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any one of the Elasticsearch-based data processing methods of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above examples express only several embodiments of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. An Elasticsearch-based data processing method, characterized by comprising the following steps:
a data acquisition step, which is used for acquiring source data to be processed and extracting entity data and relationship data in the source data to be processed;
an index storage step, configured to perform index storage on the entity data and the relationship data respectively based on an optimistic-lock concurrent write operation of Elasticsearch to obtain at least one entity index and a relationship index, where each entity index corresponds to multiple entity documents, and each relationship index corresponds to multiple relationship documents;
and an entity retrieval step, which is used for retrieving in the entity index based on the acquired entity information to be retrieved and the entity to be retrieved, so as to obtain an entity or an entity set matched with the entity information to be retrieved.
2. The Elasticsearch-based data processing method of claim 1, further comprising: a graph construction step of acquiring the entity IDs in the entity set and performing relationship expansion on the entity IDs according to the relationship index, so as to construct a graph.
3. The Elasticsearch-based data processing method according to claim 1 or 2, wherein said entity retrieval step further comprises:
an attribute value retrieval step, configured to acquire an entity attribute value from the entity information to be retrieved and obtain an entity set matching the attribute value by retrieving the attribute value;
an entity ID retrieval step, configured to acquire an entity ID from the entity information to be retrieved and obtain an entity matching the entity ID by retrieving the entity ID;
and a primary key value retrieval step, configured to acquire a unique identifier from the entity information to be retrieved and obtain an entity matching the unique identifier by retrieving the unique identifier.
4. The Elasticsearch-based data processing method of claim 2, wherein said graph construction step further comprises:
a relationship expansion step, configured to obtain a relationship matching a subject entity ID and/or an object entity ID by retrieving the subject entity ID and/or the object entity ID in the relationship index, and obtain an opposite-end entity ID of the relationship from the obtained relationship, so as to perform entity retrieval according to the opposite-end entity ID and expand the entity information at the opposite end of the relationship;
and a loop step, configured to execute the relationship expansion step in a loop so as to expand all relationships of the entities and the graph data formed by the entities.
5. An Elasticsearch-based data processing system, comprising:
the data acquisition module is used for acquiring source data to be processed and extracting entity data and relationship data in the source data to be processed;
the index storage module is used for respectively carrying out index storage on the entity data and the relationship data based on an optimistic-lock concurrent write operation of Elasticsearch to obtain at least one entity index and a relationship index, wherein each entity index corresponds to a plurality of entity documents, and each relationship index corresponds to a plurality of relationship documents;
and the entity retrieval module is used for retrieving in the entity index based on the acquired entity information to be retrieved and the entity to be retrieved, so as to obtain an entity or an entity set matched with the entity information to be retrieved.
6. The Elasticsearch-based data processing system of claim 5, further comprising: a graph construction module, configured to acquire the entity IDs in the entity set and perform relationship expansion on the entity IDs according to the relationship index, so as to construct a graph.
7. The Elasticsearch-based data processing system according to claim 5 or 6, wherein said entity retrieval module further comprises:
an attribute value retrieval module, configured to acquire an entity attribute value from the entity information to be retrieved and obtain an entity set matching the attribute value by retrieving the attribute value;
an entity ID retrieval module, configured to acquire an entity ID from the entity information to be retrieved and obtain an entity matching the entity ID by retrieving the entity ID;
and a primary key value retrieval module, configured to acquire a unique identifier from the entity information to be retrieved and obtain an entity matching the unique identifier by retrieving the unique identifier.
8. The Elasticsearch-based data processing system of claim 6, wherein the graph construction module further comprises:
a relationship expansion module, configured to obtain a relationship matching a subject entity ID and/or an object entity ID by retrieving the subject entity ID and/or the object entity ID in the relationship index, and obtain an opposite-end entity ID of the relationship from the obtained relationship, so as to perform entity retrieval according to the opposite-end entity ID and expand the entity information at the opposite end of the relationship;
and a loop module, configured to execute the relationship expansion module in a loop so as to expand all relationships of the entities and the graph data formed by the entities.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the Elasticsearch based data processing method according to any of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the Elasticsearch-based data processing method according to any of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011582357.7A CN112612905A (en) 2020-12-28 2020-12-28 Elasticissearch-based data processing method, system, computer and readable storage medium

Publications (1)

Publication Number Publication Date
CN112612905A true CN112612905A (en) 2021-04-06

Family

ID=75248594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011582357.7A Pending CN112612905A (en) 2020-12-28 2020-12-28 Elasticissearch-based data processing method, system, computer and readable storage medium

Country Status (1)

Country Link
CN (1) CN112612905A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535682A (en) * 2021-07-23 2021-10-22 中信银行股份有限公司 Data version management system, method, device and storage medium
CN113535682B (en) * 2021-07-23 2024-05-17 中信银行股份有限公司 Data version management system, method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014039884A1 (en) * 2012-09-07 2014-03-13 Magnet Systems, Inc. Time-based graph data model
CN110134796A (en) * 2019-04-19 2019-08-16 平安科技(深圳)有限公司 Clinical test search method, device, computer equipment and the storage medium of knowledge based map
CN111382226A (en) * 2018-12-29 2020-07-07 北京神州泰岳软件股份有限公司 Database query retrieval method and device and electronic equipment
CN111913949A (en) * 2019-05-07 2020-11-10 北京京东尚科信息技术有限公司 Data processing method, system, device and computer readable storage medium
CN112131214A (en) * 2019-06-25 2020-12-25 北京京东尚科信息技术有限公司 Method, system, equipment and storage medium for data writing and data query

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230926

Address after: Room 401, 4th Floor, Building J, Yunmi City, No. 19 Ningshuang Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210000

Applicant after: Nanjing Minglue Technology Co.,Ltd.

Address before: 100089 a1002, 10th floor, building 1, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant before: MININGLAMP SOFTWARE SYSTEMS Co.,Ltd.
