CN113282579A - Heterogeneous data storage and retrieval method, device, equipment and storage medium - Google Patents

Heterogeneous data storage and retrieval method, device, equipment and storage medium

Info

Publication number
CN113282579A
Authority
CN
China
Prior art keywords
attribute
data
partition
storage
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110414397.9A
Other languages
Chinese (zh)
Inventor
曲子乐
程宏国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110414397.9A
Publication of CN113282579A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data storage and retrieval method, device, equipment and storage medium. The data storage method comprises: acquiring data to be stored comprising M attributes; classifying the M attributes according to their attribute characteristics to obtain a first type of data whose attribute characteristics are system attributes and a second type of data whose attribute characteristics are non-system attributes; and storing the first type of data in a first storage partition and the second type of data in a second storage partition. In this way, the data to be stored is classified according to the attribute characteristics of each of its attributes, divided into the first type of data (system attributes) and the second type of data (non-system attributes), and each type is stored in its corresponding storage partition. Because data with the same attribute characteristics is stored together in the same partition, the attributes need not each be stored as a separate row, which avoids, to a certain extent, generating a large number of rows.

Description

Heterogeneous data storage and retrieval method, device, equipment and storage medium
Technical Field
The present application relates to big data technologies, and in particular, to a method, an apparatus, a device, and a storage medium for storing and retrieving heterogeneous data.
Background
In some scenarios, such as the storage of course data, it is desirable to use a database (e.g., MySQL) that can store heterogeneous data and support transactions. Course data includes at least live courses, on-demand courses, image-text courses, paid courses and the like. These share the same state machine (with states such as not submitted, not on the shelf, pending review, rejected, and approved) but may also have their own unique attributes, such as the video-related attributes of on-demand courses and the price-related attributes of paid courses. As the business evolves, the attributes may change frequently, and new types (e.g., offline courses) may even be added. To cope with such changes, the prior art provides a database-based solution for heterogeneous data storage.
Specifically, with the Entity-Attribute-Value (EAV) database model, the attributes of a model are stored in a data table as records. Even if an attribute needs to be added or removed, only the record corresponding to that attribute in the data table needs to be added or deleted, and the database structure does not need to be modified.
However, in the EAV model each attribute of an entity occupies a separate row, so the data grows very quickly. If an entity type has 10 attributes, data that occupies 1 million rows in the traditional model occupies about 10 million rows in the EAV model; that is, using the EAV model for heterogeneous data storage generates a large number of rows.
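The row-count comparison above can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical, and the entity count of one million is an assumption chosen to match the traditional-model figure quoted above.

```python
def wide_table_rows(entity_count: int) -> int:
    """Traditional model: one row per entity, one column per attribute."""
    return entity_count


def eav_rows(entity_count: int, attributes_per_entity: int) -> int:
    """EAV model: one (entity, attribute, value) row per attribute."""
    return entity_count * attributes_per_entity


entities = 1_000_000  # assumed entity count
traditional = wide_table_rows(entities)
eav = eav_rows(entities, attributes_per_entity=10)
print(traditional, eav)  # 1000000 10000000
```

The tenfold growth comes entirely from the per-attribute rows, which is the cost the partitioned storage described in this application aims to reduce.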
Disclosure of Invention
In order to solve the foregoing technical problems, it is desirable to provide a method, an apparatus, a device, and a storage medium for storing and retrieving heterogeneous data.
The technical scheme of the application is realized as follows:
In a first aspect, a data storage method is provided, the method including:
acquiring data to be stored comprising M attributes; wherein M is a positive integer;
classifying the M attributes according to the attribute characteristics of the M attributes to obtain first class data with the attribute characteristics being system attributes and second class data with the attribute characteristics being non-system attributes;
and storing the first type of data to a first storage partition, and storing the second type of data to a second storage partition.
In the above scheme, each attribute in the data to be stored includes a first attribute identifier and an attribute value; the method further comprises: acquiring metadata corresponding to the data to be stored, wherein the metadata includes custom attribute information, the custom attribute information including a custom attribute feature and an attribute set corresponding to the attribute feature, and the attribute set comprising a second attribute identifier of at least one attribute; and, using the first attribute identifier of a target attribute and the mapping relation between first attribute identifiers and second attribute identifiers, taking the corresponding custom attribute feature in the metadata as the attribute feature of the target attribute.
In the above solution, the storing the first type of data in a first storage partition and the storing the second type of data in a second storage partition includes: determining a second attribute identifier corresponding to the first attribute identifier of each attribute by using the mapping relation; taking the second attribute identifier of each attribute in the first type of data as a key, taking an attribute value as a value, and storing the attribute value in the first storage partition; and storing the second attribute identifier of each attribute in the second type of data as a key and the attribute value as a value to the second storage partition.
In the above solution, the custom attribute feature includes a system attribute feature and a plurality of non-system attribute features, and the second storage partition includes a plurality of second sub-storage partitions; the storing the second type of data to a second storage partition includes: classifying the second class of data according to non-system attributes to obtain at least two third class of data; and storing the third type of data to a corresponding second sub-storage partition according to the non-system attribute type of the third type of data.
In the above scheme, the custom attribute information further includes a partition identifier corresponding to the custom attribute feature; the method further comprises the following steps: determining a first partition identifier in the metadata according to the system attribute characteristics of the first type of data; determining a second partition identifier in the metadata according to the non-system attribute characteristics of the third type of data; the storing the first type of data to a first storage partition and the storing the second type of data to a second storage partition includes: storing the first type of data into a corresponding first storage partition according to the first partition identification; and storing the third class of data to a corresponding second sub-storage partition according to the second partition identification.
In a second aspect, a data retrieval method is provided, which is applied to a full-text search engine, the method including: acquiring retrieval information; determining N first attribute identifiers according to the retrieval information, wherein N is a positive integer; searching the storage partitions according to the N first attribute identifiers to obtain the corresponding N segments of retrieval data, wherein the storage partitions store attributes partitioned according to their attribute characteristics; and aggregating the N segments of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information.
In the foregoing solution, the searching the storage partitions according to the N first attribute identifiers to obtain the corresponding N segments of retrieval data includes: determining the N second attribute identifiers of the N first attribute identifiers based on a preset mapping relation between each first attribute identifier and the corresponding second attribute identifier; and searching the storage partitions based on the N second attribute identifiers to acquire the N segments of retrieval data.
In the foregoing scheme, the aggregating the N segments of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information includes: aggregating the N segments of retrieval data according to the order of the N first attribute identifiers to obtain the retrieval result.
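As a rough illustration of the retrieval steps above, the following sketch maps each first attribute identifier to its second attribute identifier, looks it up in the storage partitions, and aggregates the results in the order of the requested identifiers. The in-memory dictionaries, the identifier values and the function name `retrieve` are all hypothetical stand-ins; the application itself does not prescribe this implementation.

```python
from typing import Dict, List

# Hypothetical mapping from first attribute identifiers (names used by callers)
# to second attribute identifiers (keys used inside the partitions).
ID_MAPPING: Dict[str, str] = {"title": "sys_title", "price": "101", "intro": "202"}

# Hypothetical storage partitions, each a key-value store keyed by second identifier.
PARTITIONS: List[Dict[str, object]] = [
    {"sys_title": "Intro to SQL"},     # system attributes
    {"101": 9.9},                      # basic or small text attributes
    {"202": "A long description..."},  # large text attributes
]


def retrieve(first_ids: List[str]) -> Dict[str, object]:
    """Look up each requested attribute and aggregate results in request order."""
    result: Dict[str, object] = {}
    for first_id in first_ids:  # iteration order fixes the aggregation order
        second_id = ID_MAPPING[first_id]
        for partition in PARTITIONS:
            if second_id in partition:
                result[first_id] = partition[second_id]
                break
    return result


print(retrieve(["price", "title"]))
```

Because Python dictionaries preserve insertion order, the returned result lists the segments in the same order as the requested identifiers, which is one simple way to realize an order-based aggregation strategy.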
In a third aspect, there is provided a data storage device comprising:
a first obtaining unit configured to obtain data to be stored including M attributes; wherein M is a positive integer;
a classification unit configured to classify the M attributes according to the attribute characteristics of the M attributes to obtain a first type of data whose attribute characteristics are system attributes and a second type of data whose attribute characteristics are non-system attributes;
and the storage unit is used for storing the first type of data to a first storage partition and storing the second type of data to a second storage partition.
In a fourth aspect, there is provided a data retrieval apparatus for use in a full text search engine, the apparatus comprising:
a second acquisition unit configured to acquire retrieval information;
a determining unit, configured to determine N first attribute identifiers according to the search information; wherein N is a positive integer;
a searching unit configured to search the storage partitions according to the N first attribute identifiers and acquire the corresponding N segments of retrieval data, wherein the storage partitions store attributes partitioned according to their attribute characteristics;
and an aggregation unit configured to aggregate the N segments of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information.
In a fifth aspect, there is provided a data storage device comprising: a processor and a memory configured to store a computer program operable on the processor, wherein the processor is configured to perform the steps of the aforementioned method when executing the computer program.
In a sixth aspect, there is provided a full text search engine, comprising: a processor and a memory configured to store a computer program operable on the processor, wherein the processor is configured to perform the steps of the aforementioned method when executing the computer program.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the aforementioned method.
With the above technical solution, the data to be stored is classified according to the attribute characteristics of each of its attributes and divided into a first type of data whose attribute characteristics are system attributes and a second type of data whose attribute characteristics are non-system attributes, and the two types of data are then stored in their corresponding storage partitions. Because data with the same attribute characteristics is stored together in the same partition, the attributes need not each be stored as a separate row, which avoids, to a certain extent, generating a large number of rows.
Drawings
FIG. 1 is a block diagram of an overall framework for data storage and retrieval in an embodiment of the present application;
FIG. 2 is a first flowchart of a data storage method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a UML class diagram in an embodiment of the present application;
FIG. 4 is a flowchart illustrating a data classification method according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating the preprocessing of attribute values according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a storage method after data classification in an embodiment of the present application;
FIG. 7 is a first flowchart of a data retrieval method according to an embodiment of the present application;
FIG. 8 is a second flowchart of a data retrieval method according to an embodiment of the present application;
FIG. 9 is a second sub-flowchart of a data retrieval method according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data storage device according to an embodiment of the present application;
FIG. 11 is a diagram illustrating a first structure of a full text search engine according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a data storage device according to an embodiment of the present application;
FIG. 13 is a second structural diagram of the full text search engine in the embodiment of the present application.
Detailed Description
So that the manner in which the features and elements of the present embodiments can be understood in detail, a more particular description of the embodiments, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
Before the data storage and retrieval methods are introduced, the overall architecture is described. FIG. 1 is a schematic diagram of the overall framework for data storage and retrieval in an embodiment of the present application.
As shown in FIG. 1, the overall framework of data storage and retrieval mainly includes an API layer, an application layer, and a storage layer.
The API layer comprises a RESTful API interface or an RPC API interface, and provides services externally through that interface.
Each filter (Filter) in the application layer filters attributes in a particular way, for example by screening the text or pictures in an attribute for sensitive information; different filters implement different filtering functions. The application layer also encapsulates the complexity of the underlying storage and retrieval: it classifies the data to be stored by partitioning attributes according to their attribute characteristics, and stores the classified data in the corresponding database of the storage layer.
The data to be stored may include 3 different attribute characteristics, denoted by a square, a triangle and a circle, and a different partition range is set for each attribute characteristic. For example, the partition range corresponding to the square is 0-255, which can be further divided into 0-127 and 128-255; the partition range corresponding to the triangle is 256-511; and the partition range corresponding to the circle is 512-767. Different partition ranges correspond to different read/write processors and to different storage partitions (i.e., databases). The databases of the storage layer include an OSS database, a MySQL database, and other databases (e.g., a Redis database or an HBase database).
The storage layer further includes Elasticsearch (ES), a full-text search engine, which acquires the corresponding data from the different databases and aggregates it to obtain the final retrieval result.
The following is a detailed description of data storage and data retrieval, respectively.
Example one
An embodiment of the present application provides a data storage method. FIG. 2 is a first flowchart of the data storage method in the embodiment of the present application. As shown in FIG. 2, the data storage method may specifically include:
Step 201: acquiring data to be stored comprising M attributes; wherein M is a positive integer;
It should be noted that the data to be stored refers to data whose structure differs from that of the data already stored.
Here, taking course data as an example, if the stored data includes an on-demand course and the data to be stored includes a paid course, the two share some attributes (e.g., the course name) and also have their own unique attributes (e.g., the on-demand course has video-related attributes, and the paid course has price-related attributes). The data structure of the data to be stored may therefore change at any time, and the data storage method is aimed mainly at storing such heterogeneous data.
Step 202: classifying the M attributes according to the attribute characteristics of the M attributes to obtain first class data with the attribute characteristics being system attributes and second class data with the attribute characteristics being non-system attributes;
It should be noted that the attribute characteristics of an attribute include system attributes and non-system attributes. A system attribute is an attribute common to the data to be stored, and a non-system attribute is an attribute that is not common.
It should be noted that step 202 mainly obtains the attribute feature of each attribute and then classifies the M attributes according to those attribute features. Attribute features generally include system attributes and non-system attributes, so the N attributes whose attribute feature is a system attribute are taken as the first type of data, and the P attributes whose attribute feature is a non-system attribute are taken as the second type of data, where N + P = M, and N and P are positive integers.
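A minimal sketch of the classification in step 202 is given below, assuming the set of system attributes is already known. The attribute names, the `SYSTEM_ATTRIBUTES` set and the `classify` function are hypothetical and serve only to illustrate splitting M attributes into N system attributes and P non-system attributes.

```python
from typing import Dict, Set, Tuple

# Hypothetical set of attribute identifiers treated as system attributes.
SYSTEM_ATTRIBUTES: Set[str] = {"id", "name", "state"}


def classify(record: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split a record into (first type: system, second type: non-system)."""
    first_type = {k: v for k, v in record.items() if k in SYSTEM_ATTRIBUTES}
    second_type = {k: v for k, v in record.items() if k not in SYSTEM_ATTRIBUTES}
    return first_type, second_type


# A paid on-demand course with M = 5 attributes (illustrative values).
course = {
    "id": 1,
    "name": "Intro to SQL",
    "state": "approved",
    "price": 9.9,
    "video_url": "https://example.com/v/1",
}
first, second = classify(course)
print(len(first), len(second), len(course))  # N, P, M
```

Every attribute lands in exactly one of the two groups, so the sizes of the groups always add up to the number of attributes in the record.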
In some embodiments, each attribute in the data to be stored includes a first attribute identifier and an attribute value; the method further comprises the following steps: acquiring metadata corresponding to the data to be stored; wherein the metadata includes custom attribute information, the custom attribute information including: self-defining attribute characteristics and an attribute set corresponding to the attribute characteristics, wherein the attribute set comprises a second attribute identifier of at least one attribute; and using the first attribute identifier of the target attribute and the mapping relation between the first attribute identifier and the second attribute identifier to take the corresponding custom attribute feature in the metadata as the attribute feature of the target attribute.
It should be noted that this embodiment is one method for obtaining the attribute feature of each attribute. The attribute feature of each attribute is customized in advance in the metadata corresponding to the data to be stored, and the attribute feature of each attribute in the data to be stored is then determined from the attribute features set in the metadata.
It should be noted that each attribute in the data to be stored includes a first attribute identifier and an attribute value, the metadata is custom attribute information of the first attribute identifier for each attribute in the data to be stored, and the custom attribute information at least includes a custom attribute feature and a second attribute identifier. Here, in determining the attribute characteristics of the target attribute, the metadata may be queried using the data to be stored, or the data to be stored may be queried using the metadata. Specifically, iterating a first attribute identifier of each attribute in the data to be stored, determining a second attribute identifier corresponding to the first attribute identifier according to the mapping relation, and taking a custom attribute feature corresponding to the second attribute identifier as an attribute feature of a corresponding attribute; or iterating each piece of custom attribute information in the metadata, determining a first attribute identifier corresponding to a second attribute identifier in each piece of custom attribute information according to the mapping relation, and taking the custom attribute feature corresponding to the second attribute identifier as the attribute feature of the corresponding attribute.
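The two lookup directions described above (iterating the data and querying the metadata, or iterating the metadata and querying the data) can be sketched as follows. The metadata layout, the identifier values and both function names are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical metadata: each entry pairs a custom attribute feature with the
# set of second attribute identifiers that carry it.
METADATA: List[dict] = [
    {"feature": "system", "attribute_ids": {"sys_name", "sys_state"}},
    {"feature": "basic", "attribute_ids": {"101"}},
]

# Hypothetical mapping from first to second attribute identifiers.
FIRST_TO_SECOND: Dict[str, str] = {
    "name": "sys_name",
    "state": "sys_state",
    "price": "101",
}


def features_by_data(record: Dict[str, object]) -> Dict[str, str]:
    """Iterate the data to be stored and query the metadata for each attribute."""
    features: Dict[str, str] = {}
    for first_id in record:
        second_id = FIRST_TO_SECOND[first_id]
        for entry in METADATA:
            if second_id in entry["attribute_ids"]:
                features[first_id] = entry["feature"]
                break
    return features


def features_by_metadata(record: Dict[str, object]) -> Dict[str, str]:
    """Iterate the metadata and query the data to be stored for each entry."""
    second_to_first = {second: first for first, second in FIRST_TO_SECOND.items()}
    features: Dict[str, str] = {}
    for entry in METADATA:
        for second_id in entry["attribute_ids"]:
            first_id = second_to_first.get(second_id)
            if first_id in record:
                features[first_id] = entry["feature"]
    return features


record = {"name": "Intro to SQL", "price": 9.9}
print(features_by_data(record))
print(features_by_metadata(record))
```

Both directions yield the same attribute features; which one is cheaper depends on whether the record or the metadata has fewer entries.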
Step 203: and storing the first type of data to a first storage partition, and storing the second type of data to a second storage partition.
It should be noted that, in step 203, the first type data and the second type data classified in the previous step are respectively stored in different storage partitions, so that the subsequent retrieval of the data can be performed in a targeted manner, that is, if only the first type data needs to be acquired, the first storage partition is accessed; if only the second type of data needs to be acquired, the second storage partition is accessed.
In some embodiments, the step specifically includes: determining a second attribute identifier corresponding to the first attribute identifier of each attribute by using the mapping relation; taking the second attribute identifier of each attribute in the first type of data as a key, taking an attribute value as a value, and storing the attribute value in the first storage partition; and storing the second attribute identifier of each attribute in the second type data as a key and the attribute value as a value to the second storage partition.
It should be noted that, when the first type of data whose attribute features are system attributes is stored, the first attribute identifier of each attribute is replaced with the second attribute identifier and used as a key, and the attribute value corresponding to the first attribute identifier is used as a value and stored in the first storage partition. Here, the second attribute identification may be an attribute logical name.
It should be noted that, when the second type of data whose attribute features are non-system attributes is stored, the first attribute identifier of each attribute is replaced with the second attribute identifier and used as a key, and the attribute value corresponding to the first attribute identifier is used as a value and stored in the second storage partition. Here, the second attribute identifier may be an attribute ID. Replacing the attribute identifier with the attribute ID, that is, referencing the attribute by its ID, makes subsequent modification easier and saves some space compared with a verbose attribute identifier (such as the attribute's English name).
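The key-value storage described above might look like the sketch below, in which system attributes are keyed by a logical name and non-system attributes by a numeric attribute ID. The two mapping tables, the in-memory dictionaries standing in for the storage partitions, and the `store` function are all hypothetical.

```python
from typing import Dict

# Hypothetical mappings: system attributes map to logical names,
# non-system attributes map to numeric attribute IDs.
SYSTEM_IDS: Dict[str, str] = {"course_name": "name", "course_state": "state"}
NON_SYSTEM_IDS: Dict[str, int] = {"course_price": 101, "course_video_url": 102}


def store(record: Dict[str, object],
          first_partition: Dict[str, object],
          second_partition: Dict[int, object]) -> None:
    """Write each attribute under its second attribute identifier."""
    for first_id, value in record.items():
        if first_id in SYSTEM_IDS:
            first_partition[SYSTEM_IDS[first_id]] = value
        else:
            second_partition[NON_SYSTEM_IDS[first_id]] = value


first_partition: Dict[str, object] = {}
second_partition: Dict[int, object] = {}
store({"course_name": "Intro to SQL", "course_price": 9.9},
      first_partition, second_partition)
print(first_partition, second_partition)
```

Keying non-system attributes by a short ID means a later rename of the attribute only touches the mapping, not the stored data.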
In some embodiments, the custom attribute feature comprises a system attribute feature and a plurality of non-system attribute features, the second storage partition comprising a plurality of second child storage partitions; the storing the second type of data to a second storage partition includes: classifying the second class of data according to non-system attributes to obtain at least two third class of data; and storing the third type of data to the corresponding second sub-storage partition according to the non-system attribute type of the third type of data.
It should be noted that, since the non-system attribute features include a plurality of kinds, the second type data needs to be classified again.
Here, the non-system attribute features include at least: basic or small text attributes, large text attributes, and frequently updated attributes. According to these non-system attribute feature types, the second type of data is divided into three third types of data, which are stored in the corresponding, different second sub-storage partitions.
A frequently updated attribute may be, for example, a view count.
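A possible way to split the second type of data into third types of data by non-system attribute feature is sketched below. The attribute IDs, the feature labels and the function name `split_second_type` are hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical assignment of non-system attribute IDs to feature types.
FEATURE_OF: Dict[int, str] = {
    101: "basic_or_small_text",
    202: "large_text",
    303: "frequently_updated",
}


def split_second_type(second_type: Dict[int, object]) -> Dict[str, Dict[int, object]]:
    """Group non-system attributes by feature type, one group per sub-partition."""
    groups: Dict[str, Dict[int, object]] = defaultdict(dict)
    for attr_id, value in second_type.items():
        groups[FEATURE_OF[attr_id]][attr_id] = value
    return dict(groups)


third_types = split_second_type({101: 9.9, 202: "A long description...", 303: 1500})
print(third_types)
```

Each resulting group can then be written to its own second sub-storage partition, so that, for instance, a frequently changing view count does not share a partition with rarely changing large text.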
In some embodiments, the custom attribute information further includes a partition identifier corresponding to the custom attribute feature; the method further comprises the following steps: determining a first partition identifier in the metadata according to the system attribute characteristics of the first type of data; determining a second partition identifier in the metadata according to the non-system attribute characteristics of the third type of data; the storing the first type of data to a first storage partition and the storing the second type of data to a second storage partition includes: storing the first type of data to a corresponding first storage partition according to the first partition identification; and storing the third class of data to a corresponding second sub-storage partition according to the second partition identification.
It should be noted that, on the basis of the previous embodiment, in order to clearly distinguish the attribute characteristics of each attribute, different partition identifiers are preset for the system attribute characteristics and the non-system attribute characteristics in the embodiment; different partition identifications are preset aiming at different non-system attribute characteristics. Whether the attribute characteristics of at least two attributes are the same can be distinguished directly through the partition identification.
Specifically, the partition identifier of the first type of data is determined as a first partition identifier in the metadata, a preset partition range is determined based on the first partition identifier, and then the first type of data is stored in a first storage partition corresponding to the partition range. And determining the partition identifier of the second type of data as a second partition identifier in the metadata, determining a preset partition range based on the second partition identifier, and storing the second type of data to a second sub-storage partition corresponding to the partition range. Here, the different partition ranges correspond to different second child storage partitions.
For example, the partition identifier of the system attribute feature is set to partition 0, and the partition identifiers of the non-system attribute features are set to non-zero partitions. Specifically, the partition range set for the basic or small text attributes is 1-127, the partition range set for the large text attributes is 128-255, the partition range set for the frequently updated attributes is 256-511, and the partition range set for the uniqueness constraint attributes is 512-767. Here, if the partition identifier of the first type of data is determined in the metadata to be partition 0, the first storage partition corresponding to partition 0 (such as a MySQL database) is determined, and the first type of data is stored in that MySQL database. If the partition identifier of the second type of data is determined in the metadata to be partition 1, the partition range corresponding to partition 1 is determined to be 1-127, and the second type of data is stored in the second sub-storage partition corresponding to the partition range 1-127 (such as a MySQL database); if the partition identifier of the second type of data is determined to be partition 128, the partition range corresponding to partition 128 is determined to be 128-255, and the second type of data is stored in the second sub-storage partition corresponding to the partition range 128-255; if the partition identifier of the second type of data is determined to be partition 256, the partition range corresponding to partition 256 is determined to be 256-511, and the second type of data is stored in the second sub-storage partition corresponding to the partition range 256-511 (such as a MySQL database); and if the partition identifier of the second type of data is determined to be partition 512, the partition range corresponding to partition 512 is determined to be 512-767, and the second type of data is stored in the second sub-storage partition corresponding to the partition range 512-767 (such as a MySQL database).
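The partition ranges in the example above (0, 1-127, 128-255, 256-511 and 512-767) lend themselves to a simple range lookup. The sketch below is an assumption-laden illustration: the labels and the function name `partition_for` are hypothetical, and a real system would map each range to a database connection rather than to a string.

```python
from typing import List, Tuple

# Partition ranges taken from the example above; labels are illustrative names.
PARTITION_RANGES: List[Tuple[range, str]] = [
    (range(0, 1), "system"),
    (range(1, 128), "basic_or_small_text"),
    (range(128, 256), "large_text"),
    (range(256, 512), "frequently_updated"),
    (range(512, 768), "uniqueness_constraint"),
]


def partition_for(partition_id: int) -> str:
    """Return the label of the partition range containing partition_id."""
    for id_range, label in PARTITION_RANGES:
        if partition_id in id_range:
            return label
    raise ValueError(f"partition identifier {partition_id} is outside all ranges")


print(partition_for(0), partition_for(128), partition_for(767))
```

Keeping the ranges in one table makes it easy to see that they do not overlap and that any identifier outside 0-767 is rejected rather than silently stored somewhere.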
Here, the execution subject of steps 201 to 203 may be a processor of a data storage device.
By adopting the technical scheme, the data to be stored is classified according to the attribute characteristics of each attribute in the data to be stored, the data to be stored is divided into first-class data with the attribute characteristics being system attributes and second-class data with the attribute characteristics being non-system attributes, and then the first-class data and the second-class data are respectively stored in the corresponding storage partitions. By performing attribute partition storage on data to be stored, data with the same attribute characteristics are stored in the same partition in a centralized manner, and the data do not need to be stored in a row, so that a large amount of row data is avoided to a certain extent.
Based on the above embodiments, the present application provides an example of a UML class diagram, and fig. 3 is a schematic diagram of the UML class diagram in the embodiment of the present application, as shown in fig. 3, specifically,
the metadata refers to three tables: entity_type (entity type), entity_attribute (entity attribute), and entity_attribute_enum (entity attribute enumeration).
Here, the attributes in entity_type (entity type) include: id (long), root_id (long), parent_id (long), site id (long), namespace (String), entity name (String), entity plural alias (String), entity singular alias (String), configuration information in JSON format (String), version number (int), state (int), creator (String), modifier (String), creation time (Date), and modification time (Date).
root_id (long) is used to implement entity type inheritance: a child type can inherit the attributes of a parent type and can also override them.
parent_id (long) is likewise used to implement entity type inheritance: a child type can inherit or override the attributes of its parent type.
Site id (long) supports multiple sites and the logical isolation of data between sites, for example a merchant help center and a merchant learning center.
Configuration information (String) is a JSON array format for configuring filters and their execution order. Each object typically has several options: filter (filter name), order (filter execution order), params (some personalization parameters of the filter, optional part).
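As a minimal sketch of how such a configuration could be read, the snippet below parses a JSON array and returns the filters in execution order. The filter names (`trim`, `sensitive_word`) and the `level` parameter are hypothetical values invented for the example; only the option names `filter`, `order`, and `params` come from the text.

```python
import json

# Hypothetical configuration value; only the option names come from the text.
CONFIG_JSON = """[
  {"filter": "trim", "order": 2},
  {"filter": "sensitive_word", "order": 1, "params": {"level": "strict"}}
]"""


def ordered_filters(raw):
    """Parse the JSON array and return (filter name, params) pairs sorted by 'order'."""
    entries = json.loads(raw)
    entries.sort(key=lambda entry: entry["order"])
    # 'params' is optional, so fall back to an empty dict.
    return [(entry["filter"], entry.get("params", {})) for entry in entries]
```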
The version number (int) of the entity type equals the maximum version number in its entity attribute set and is used to keep different versions of data compatible.
The attributes in entity_attribute (entity attribute) include: id (long), parent_id (long), entity type id (long), attribute name (String), attribute alias (String), system attribute (String), data type (String), regular check (String), data partition (int), document delimiter (ES attribute, String), document tokenizer (ES attribute, String), document data type (ES attribute, String), whether the document is searchable (ES attribute, byte), whether the document is stored (ES attribute, byte), document nested object (ES attribute, byte), unique constraint grouping (int), unique constraint index order (int), whether the value is required (byte), default value (String), remark (String), version number (int), state (int), creator (String), modifier (String), creation time (Date), and modification time (Date).
Here, parent_id (long) is used to implement composite type attributes: the attribute may be an object (a common object or a nested object, the type being chosen according to the query mode).
Entity type id (long) points to the primary key of the entity type table, indicating to which entity type the attribute belongs.
Attribute name (String) is the Chinese name of the attribute and is used as a comment.
Attribute alias (String) is the English name of the attribute. The system generates a camel-case name from the attribute alias (the first letter after each underscore is capitalized and the underscore is removed) and uses it to interact with the client. Naming specification: a combination of lowercase letters, digits, and underscores; a digit cannot appear first, and words are separated by underscores.
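The alias-to-camel-case rule can be illustrated with a short sketch. The function name `to_camel_case` and the exact regular expression used for the naming specification are assumptions; the text only states the rule informally.

```python
import re

# Lowercase letters, digits and underscores; must not start with a digit;
# words are separated by single underscores (one reading of the naming specification).
ALIAS_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def to_camel_case(alias):
    """Convert an attribute alias such as 'app_key' to the client-facing name 'appKey'."""
    if not ALIAS_PATTERN.match(alias):
        raise ValueError(f"alias {alias!r} violates the naming specification")
    head, *rest = alias.split("_")
    # Capitalize the first character after each underscore and drop the underscore.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)
```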
System attribute (String) refers specifically to an attribute in the entity table (also called a generic attribute or a global attribute). Because each service system follows its own naming rules, names must be unified at storage time (the attribute alias is mapped to the system attribute), and the reverse operation is performed at retrieval time (the system attribute is restored to the attribute alias). For example, the primary key of the curriculum schedule of the learning center is core_id (attribute alias), which can be mapped to the system attribute service primary key; the modifier may be named update_user, modifier, and so on in different systems, all of which can be mapped to the system attribute modifier. The system attributes are: parent_id, service primary key, status, creator, modifier, creation time, and modification time.
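The forward mapping at storage time and the reverse mapping at retrieval time might look like the sketch below. The dictionary contents follow the example in the text; the spelling `service_primary_key` and the function names are assumptions made for illustration.

```python
# Attribute alias -> system attribute, following the example in the text.
# 'service_primary_key' is an assumed spelling of the system attribute name.
ALIAS_TO_SYSTEM = {"core_id": "service_primary_key", "update_user": "modifier"}
SYSTEM_TO_ALIAS = {system: alias for alias, system in ALIAS_TO_SYSTEM.items()}


def to_system_names(record):
    """Storage direction: rename aliases to the unified system attribute names."""
    return {ALIAS_TO_SYSTEM.get(key, key): value for key, value in record.items()}


def to_alias_names(record):
    """Retrieval direction: restore the aliases used by the service system."""
    return {SYSTEM_TO_ALIAS.get(key, key): value for key, value in record.items()}
```

Attributes that have no system mapping pass through unchanged in both directions.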
Data type (String) is the Java data type. The currently supported types are: byte, short, int, long, float, double, decimal, enum, date, string.
Regular check (String) verifies whether the value of the attribute is legal using a regular expression, for example a range check or a length check.
Data partition (int) is the partition assigned according to attribute characteristics. For example: 000 to 255: JSON partitions. 000: partition 0 is the system partition; it does not store any data, and the partition of a system attribute is fixed at 0. 001 to 127: basic data partitions, for basic type attributes or small text attributes. 128 to 255: OSS partitions, for very long text, large objects, files, and pictures. 256 to 511: partitions for statistical or numeric (frequently updated) attributes. 512 to 767: unique key partitions, which guarantee the uniqueness of attribute values and are used together with the unique constraint grouping and the unique constraint index order.
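A minimal sketch of converting a raw value to its declared data type and then applying the regular check is given below. The text describes a Java implementation; this Python version only illustrates the idea, covers a subset of the listed types, and uses the hypothetical names `CONVERTERS` and `convert_and_check`.

```python
import re
from decimal import Decimal

# Converters for a subset of the data types listed in the text (assumed mapping).
CONVERTERS = {
    "int": int,
    "long": int,
    "double": float,
    "decimal": Decimal,
    "string": str,
}


def convert_and_check(raw, data_type, pattern=None):
    """Convert raw to data_type, then validate it against an optional regular expression."""
    value = CONVERTERS[data_type](raw)  # raises ValueError for malformed numbers
    if pattern is not None and re.fullmatch(pattern, str(value)) is None:
        raise ValueError(f"value {value!r} does not match {pattern!r}")
    return value
```

A length check such as "at most three digits" can be expressed as the pattern `\d{1,3}` in this sketch.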
Document delimiters (ES attributes, String) are used to convert the original String data (e.g., business enrollment, platform rules, industry standards) into array storage by designated delimiters to facilitate retrieval from single or multiple values.
The document's tokenizer (ES property, String), i.e., the tokenizer used in building the full-text index.
Document data type (ES attribute, String) currently supports: byte, short, integer, integer_range, long, long_range, float, float_range, double, double_range, boolean, date, date_range, ip, ip_range, keyword, text.
Unique constraint index order (int): this field is valid only when the data partition lies within the unique key partition range. The attribute values of the same unique key partition are joined in index order (sequence), starting from 1, similar to Seq_in_index in a database, and a hash value is generated from the result. For example: a unique constraint is created on two attributes (app_key, external activity ID) of the activity entity; the data partition of both attributes is 512, the index order of app_key is 1, and the index order of external activity ID is 2. The hash is computed as hash(app_key:external activity ID).
Required (byte) indicates whether a value must be supplied for the attribute.
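The ordered join followed by a hash can be sketched as below. The text does not name the hash function, so `zlib.crc32` is used purely as a stand-in, and `unique_hash` is a hypothetical helper name.

```python
import zlib


def unique_hash(values_by_order):
    """Join attribute values by ascending index order with ':' and hash the result.

    values_by_order maps the unique constraint index order (1, 2, ...) to the value.
    crc32 is only a stand-in; the hash function is not specified in the text.
    """
    joined = ":".join(str(values_by_order[order]) for order in sorted(values_by_order))
    return joined, zlib.crc32(joined.encode("utf-8"))
```

Because the values are sorted by index order before joining, the same pair of values always produces the same string, regardless of the order in which the attributes arrive.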
A default value (String), a default value displayed when the value is empty.
Version number (int) is the attribute version number, which allows multiple versions of an attribute to coexist. For example: in version v1, the article abstract attribute and all other attributes are stored in partition 1. As the business develops, the abstract becomes longer and needs to be stored in an OSS partition, so a v2 version of the abstract attribute is added with partition 128. New and old data then coexist in the system: if the version number of the data is v1 (the version in the entity table), the abstract is read from partition 1; if the version number is v2, the abstract is read from the OSS storage of partition 128.
The attributes in entity_attribute_enum (entity attribute enumeration) include: id (long), attribute id (long), enumeration index (int), enumeration key (String), enumeration value (String), state (int), creator (String), modifier (String), creation time (Date), and modification time (Date).
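The version-dependent lookup in this example can be sketched as follows. The table contents mirror the abstract example; the rule of falling back to the newest attribute version not exceeding the data version is an assumption, as are the names `ATTRIBUTE_PARTITIONS` and `partition_for`.

```python
# (attribute alias, attribute version) -> data partition, mirroring the example in the text.
ATTRIBUTE_PARTITIONS = {
    ("abstract", 1): 1,
    ("abstract", 2): 128,
    ("title", 1): 1,
}


def partition_for(alias, data_version):
    """Pick the partition of the newest attribute version not newer than the data version."""
    versions = [
        version
        for (name, version) in ATTRIBUTE_PARTITIONS
        if name == alias and version <= data_version
    ]
    if not versions:
        raise KeyError(f"no definition of {alias!r} for data version {data_version}")
    return ATTRIBUTE_PARTITIONS[(alias, max(versions))]
```

With this table, v1 data reads the abstract from partition 1 and v2 data reads it from partition 128, while an attribute that was never revised keeps its v1 partition.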
Wherein, the attribute ID (Long) points to the primary key of the entity attribute table.
The enumeration index (int), i.e. the enumeration code, is a number, which is typically used by the storage layer.
Enumeration key (String) is the English name and usually corresponds to the constant name in a Java enumeration class.
Enumeration value (String) is the Chinese name, a custom name used to describe the enumeration.
After the data to be stored is classified according to the attribute characteristics of the attributes in the metadata, four tables are obtained: entity, entity_unique, entity_text (entity value-JSON), and entity_long.
Here, the attributes in the entity table include: id (long), parent_id (long), service primary key (long), entity type id (long), language (int), version number (int), state (int), creator (String), modifier (String), creation time (Date), and modification time (Date).
Here, parent_id (long) implements the To-One association: if a comment has multiple replies, the parent_id of each reply points to the comment. It also implements multiple languages: multiple pieces of sub-language data are placed under the main-language data, and the parent_id of each sub-language record points to the main-language record.
Entity type id (long) is used to point to the primary key of the entity type table.
The language (int) is used to support internationalization, or multiple languages for a site.
The state (int) refers to the state of the business data; a negative value indicates invalid data.
The attributes in entity_unique (entity uniqueness constraint) include: id (long), entity type id (long), entity id (long), data partition (int), hash (long), and value_255byte (String).
Where entity id (long) is used to point to the primary key of the entity table.
Entity type id (long) is used to point to the primary key of the entity type table.
Data partition (int) is the partition assigned according to attribute characteristics. For example: 000 to 255: JSON partitions. 000: partition 0 is the system partition; it does not store any data, and the partition of a system attribute is fixed at 0. 001 to 127: basic data partitions, for basic type attributes or small text attributes; the attributes of the same partition may be stored together in JSON format. 128 to 255: OSS partitions (i.e., OSS database storage), which hold very long text, large objects, files, pictures, and so on; only a link or key that uniquely identifies the object is stored. 256 to 511: partitions for statistical or numeric (frequently updated) attributes. 512 to 767: unique key partitions, which guarantee the uniqueness of attribute values and are used together with the unique constraint grouping and the unique constraint index order.
Hash (long) is obtained by concatenating the attribute values of the same partition in index order and hashing the result (for example, ie93jd34:20001). This field only narrows the query range; because hash collisions can occur, the values must also be compared for equality. An ordinary index is added on the three numeric fields [entity type id, data partition, hash], and the application layer checks for duplicate data under a write lock (SELECT … FOR UPDATE) or a read lock (LOCK IN SHARE MODE), with the following filter condition:
WHERE entity type id = ? AND data partition = ? AND hash = ? AND value = ?
value_255byte (String) is in JSON format, for example: {"21": "ie93jd34", "28": "20001"}.
The attributes in entity_text (entity value-JSON) include: id (long), entity id (long), data partition (int), and value (JSON, 8192 bytes, String).
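The duplicate check against the entity_unique table can be sketched with an in-memory SQLite database. This is only an illustration under several assumptions: SQLite does not support `SELECT … FOR UPDATE` or `LOCK IN SHARE MODE`, so the locking step described in the text is omitted; `zlib.crc32` stands in for the unspecified hash function; and the column and function names are hypothetical.

```python
import json
import sqlite3
import zlib


def open_store():
    """Create an in-memory table shaped like entity_unique, with the ordinary index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_unique (id INTEGER PRIMARY KEY, entity_type_id INTEGER, "
        "entity_id INTEGER, data_partition INTEGER, hash INTEGER, value TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_unique ON entity_unique (entity_type_id, data_partition, hash)"
    )
    return conn


def insert_if_unique(conn, entity_type_id, entity_id, partition, ordered_values):
    """ordered_values is a list of (attribute id, value) pairs in unique index order."""
    joined = ":".join(value for _, value in ordered_values)
    hash_value = zlib.crc32(joined.encode("utf-8"))  # stand-in hash function
    value_json = json.dumps({str(attr_id): value for attr_id, value in ordered_values})
    # The hash narrows the search; comparing the value rules out hash collisions.
    duplicate = conn.execute(
        "SELECT id FROM entity_unique "
        "WHERE entity_type_id = ? AND data_partition = ? AND hash = ? AND value = ?",
        (entity_type_id, partition, hash_value, value_json),
    ).fetchone()
    if duplicate is not None:
        return False
    conn.execute(
        "INSERT INTO entity_unique (entity_type_id, entity_id, data_partition, hash, value) "
        "VALUES (?, ?, ?, ?, ?)",
        (entity_type_id, entity_id, partition, hash_value, value_json),
    )
    return True
```

In a MySQL deployment the SELECT would additionally carry one of the locking clauses named in the text so that concurrent writers cannot both pass the check.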
Where entity id (long) is used to point to the primary key of the entity table.
Data partition (int) is the partition assigned according to attribute characteristics. For example: 000 to 255: JSON partitions. 000: partition 0 is the system partition; it does not store any data, and the partition of a system attribute is fixed at 0. 001 to 127: basic data partitions, for basic type attributes or small text attributes; the attributes of the same partition may be stored together in JSON format. 128 to 255: OSS partitions (i.e., OSS database storage), which hold very long text, large objects, files, pictures, and so on; only a link or key that uniquely identifies the object is stored. 256 to 511: partitions for statistical or numeric (frequently updated) attributes. 512 to 767: unique key partitions, which guarantee the uniqueness of attribute values and are used together with the unique constraint grouping and the unique constraint index order.
The attributes in entity_long (entity value-value) include: id (long), entity id (long), attribute id_0 (long), attribute id_1 (long), value_0 (long), and value_1 (long).
Where entity id (long) is used to point to the primary key of the entity table.
Attribute id_0 (long) points to the primary key of the entity attribute table. Rule: attributes whose data partition lies in the range 256 to 511 MUST be stored here; this is typically used for frequently updated numeric fields. Within one entity type, different attributes MUST NOT share the same partition within the range 256 to 511.
Value_0 (long) is the value of the attribute.
Attribute id_1 (long) is optional. When most entity types have at least 2 to 4 frequently updated numeric attributes, the table can be extended accordingly by adding several key-value pairs, where the key corresponds to the attribute ID and the value corresponds to the value of the attribute.
Value_1 (long) is the value of the attribute.
Here, an entity_relationship table is also included, and its attributes include: id (int), entity id-source_id (int), entity id-target_id (int), and state (int).
The entity id-source_id (int) and the entity id-target_id (int) are used to implement To-Many associations. For example: given a user, find the list of stores in which the user is interested. If the user entity type (entity_type) ID is less than the store entity type ID, the condition is source_id = user_id, and the queried target_id list is the list of store entity IDs; otherwise, the condition is target_id = user_id, and the queried source_id list is the list of store entity IDs. Given a store, finding the users who are interested in the store works the other way round.
State (int): state 1 indicates valid, and state -1 indicates deleted.
For the multiplicities appearing in fig. 3: 1..1 means that an object of another class is related to exactly one object of the class; 0..* means that an object of another class is related to zero or more objects of the class; 1..* means that an object of another class is related to one or more objects of the class; and 0..1 means that an object of another class is related to no object or to only one object of the class.
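The direction rule for To-Many lookups can be sketched over in-memory rows. The sketch assumes, as the text implies, that the entity whose type ID is smaller is always stored as source_id; the function name `related_ids` and the sample IDs are hypothetical.

```python
def related_ids(rows, given_type_id, other_type_id, given_id):
    """Return IDs of related entities of the other type, skipping deleted rows (state -1).

    Assumes the entity with the smaller entity type ID is stored as source_id.
    """
    if given_type_id < other_type_id:
        # The given entity sits on the source side, so read the target side.
        return [r["target_id"] for r in rows if r["source_id"] == given_id and r["state"] == 1]
    # Otherwise the given entity sits on the target side, so read the source side.
    return [r["source_id"] for r in rows if r["target_id"] == given_id and r["state"] == 1]


# Hypothetical data: user type ID 1, store type ID 2.
ROWS = [
    {"source_id": 10, "target_id": 200, "state": 1},
    {"source_id": 10, "target_id": 201, "state": -1},
    {"source_id": 11, "target_id": 200, "state": 1},
]
```

Here `related_ids(ROWS, 1, 2, 10)` lists the stores of user 10, and `related_ids(ROWS, 2, 1, 200)` lists the users of store 200.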
Example two
On the basis of the above embodiments, the present application also provides a data storage method, which includes a data classification method (i.e., fig. 4 and 5) and a storage method after data classification (i.e., fig. 6).
Fig. 4 is a schematic flowchart of a data classification method in an embodiment of the present application, and as shown in fig. 4, specific steps may include:
step 401: starting;
step 402: acquiring the data to be stored and the corresponding metadata;
here, the data to be stored includes M attributes, and the metadata is custom attribute information for each attribute.
Step 403: acquiring a custom attribute information set from the metadata;
here, the custom attribute information set includes custom attribute information corresponding to each attribute, the custom attribute information includes custom attribute features and other attribute sets, and the other attribute sets include attribute logical names.
Step 404: iterating the custom attribute information set;
namely, the custom attribute information corresponding to each attribute in the custom attribute information set is iterated in sequence.
Step 405: whether there is a next one; if yes, go to step 406; if not, go to step 413;
step 406: obtaining the custom attribute information of the next attribute;
step 407: searching a first attribute identifier of a corresponding attribute in the data to be stored according to the attribute logic name in the user-defined attribute information;
each attribute in the M attributes includes a first attribute identifier and an attribute value, the self-defined attribute information of the metadata includes a logic name (i.e., a second attribute identifier) of each attribute, the logic name of each attribute and the first attribute identifier have a mapping relationship, and the first attribute identifier of the corresponding attribute in the data to be stored can be searched according to the logic name of the attribute, so that the self-defined attribute feature corresponding to the logic name of the attribute is the attribute feature of the corresponding attribute in the data to be stored.
Here, after step 407 is usually executed, some pre-processing is performed on the attribute values, i.e., steps 501 to 509.
Fig. 5 is a schematic diagram illustrating a flow of attribute value preprocessing in an embodiment of the present application. As shown in fig. 5, the specific steps include the following:
step 501: whether a default value is set; if yes, go to step 502; if not, go to step 503;
here, if a default value is set, the default value is not null, and the attribute value of the entity attribute is null, step 502 is executed; if the attribute value of the entity attribute is not null, step 503 is executed.
Step 502: converting the default value to the data type specified in the attribute metadata;
step 503: converting the attribute value of the entity attribute into a data type specified in the attribute metadata;
Step 502 and step 503 perform data type conversion through DataType. DataType is a custom data type enumeration that implements the Java Function interface; its apply method converts the data type, and each enumeration constant has its own implementation.
Step 504: checking the data type;
step 505: checking whether the value is required and checking the value against the regular expression;
step 506: checking the value range, where required;
step 505 and step 506 implement data type verification through DataType.
Step 507: whether the verification is passed; if yes, go to step 508; if not, go to step 509;
step 508: whether the attribute value is null; if yes, go to step 412; if not, go to step 408;
step 509: throwing an exception carrying the error information;
step 408: whether the attribute is a system attribute; if yes, go to step 409; if not, go to step 410;
here, the system attribute is a common attribute of each attribute in the data to be stored, and the non-system attribute is a non-common attribute of each attribute in the data to be stored.
Here, if the attribute is a system attribute, the attribute corresponds to a partition 0; if the attribute is a non-system attribute, it corresponds to a non-0 partition.
Step 409: acquiring an attribute logic name of the attribute mapped in the metadata and using the attribute logic name as a KEY;
step 410: acquiring an attribute ID of the attribute in the metadata and using the attribute ID as a KEY;
step 411: setting the attribute value in the attribute partition set corresponding to the attribute value;
step 412: finishing the current attribute processing;
step 413: processing the system attribute set to populate values onto the system attributes of the entities;
step 414: iterating a non-system attribute partition set, wherein KEY is a partition number, and VALUE is an attribute set;
step 415: constructing an entity value object instance, setting a partition and setting a JSON value;
step 416: setting an entity value object set for the current entity;
step 417: ending.
Based on the steps in fig. 4 and fig. 5, if the data to be stored includes KEY1-VALUE1, KEY2-VALUE2, KEY3-VALUE3 … KEY-VALUE, the classified data to be stored are: partition 0 includes KEY1-VALUE1 and KEY2-VALUE 2; partition 1 comprises KEY3-VALUE3, KEY4-VALUE4 and KEY5-VALUE 5; partition 128 includes KEY6-VALUE 6; partition 256 includes KEY7-VALUE 7. Here, the same partition may store at least one different attribute. In addition, the partitioning can be continued according to the requirement, which is not specifically described herein.
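The classification flow of fig. 4 can be sketched as follows, with the step numbers noted in comments. This is a simplified illustration rather than the implementation of the application: the preprocessing of fig. 5 is left out, and the metadata layout, attribute names, and the function name `classify` are assumptions.

```python
import json
from collections import defaultdict

# Hypothetical metadata: one entry of custom attribute information per attribute.
METADATA = [
    {"id": 1, "alias": "update_user", "system_attribute": "modifier", "partition": 0},
    {"id": 21, "alias": "title", "system_attribute": None, "partition": 1},
    {"id": 22, "alias": "summary", "system_attribute": None, "partition": 1},
    {"id": 30, "alias": "body", "system_attribute": None, "partition": 128},
    {"id": 40, "alias": "view_count", "system_attribute": None, "partition": 256},
]


def classify(data, metadata):
    """Split the data into system attributes and per-partition attribute sets."""
    system = {}
    partitions = defaultdict(dict)
    for meta in metadata:                     # steps 404 to 406: iterate the metadata
        value = data.get(meta["alias"])       # step 407: find the attribute by logical name
        if value is None:                     # step 508: null value, nothing to store
            continue
        if meta["partition"] == 0:            # step 408: system attribute
            system[meta["system_attribute"]] = value                 # steps 409 and 411
        else:
            partitions[meta["partition"]][str(meta["id"])] = value   # steps 410 and 411
    # Steps 414 and 415: one entity value object per partition, holding a JSON value.
    entity_values = [
        {"partition": number, "value": json.dumps(attrs)}
        for number, attrs in sorted(partitions.items())
    ]
    return system, entity_values
```

Calling `classify` on a record with a title, a summary, and a view count yields one entity value for partition 1 (holding both text attributes) and one for partition 256.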
Aiming at the non-system attribute characteristics, data of different partitions are required to be stored into different storage partitions, so that the storage method after the non-system attribute characteristic data are classified is provided. Fig. 6 is a flowchart illustrating a storage method after data classification in an embodiment of the present application.
As shown in fig. 6, the specific steps may include:
step 601: acquiring the range of the partition according to the partition identification of the attribute; if the partition range is 1-127, go to step 602; if the partition range is 128-255, executing step 603; if the partition range is 256-511, execute step 607; if the partition range is 512-767, go to step 609;
step 602: determining a processor within a partition range of 1-127;
here, the 1 ~ 127 partition range stores basic attributes or related data of small text data.
The processor for the 1-127 partition range operates on the corresponding storage partition (such as a MySQL database) through the EntityText object. Specifically, step 613 is executed: if the primary key exists, the originally stored data needs to be updated, and step 615 is executed; if the primary key does not exist, new data needs to be inserted, and step 614 is executed.
Step 603: determining a processor in a 128-255 partition range;
step 604: generating OSS unique KEY;
step 605: writing the large text into the OSS, and acquiring the MD5 of the write result;
step 606: taking KEY and MD5 as new values to replace the original content as a link to the OSS file;
here, the 128-255 partition range stores data related to large text attributes; only a corresponding link is stored for such data.
The processor for the 128-255 partition range operates on the corresponding storage partition (such as an OSS database) through the EntityText object. Specifically, step 613 is executed: if the primary key exists, the originally stored data needs to be updated, and step 615 is executed; if the primary key does not exist, new data needs to be inserted, and step 614 is executed.
Step 607: determining processors within the 256-511 partition range;
step 608: converting the EntityText object into an EntityLong object;
here, the 256-511 partition range stores related data of frequently updated attributes.
The processor for the 256-511 partition range converts the EntityText object into an EntityLong object and then operates on the corresponding storage partition (such as a MySQL database). Specifically, step 613 is executed: if the primary key exists, the originally stored data needs to be updated, and step 615 is executed; if the primary key does not exist, new data needs to be inserted, and step 614 is executed.
Step 609: determining a processor in a partition range of 512-767;
here, the 512 ~ 767 partition range stores uniqueness constraint related data. The processors in the 512-767 partition range are generally higher than the processors in other partition ranges.
Step 610: converting the EntityText object into an EntityUnique object;
and converting the EntityText object into an EntityUnique object by using a processor in a partition range of 512-767.
Step 611: a write lock mode/read lock mode;
in order to ensure the global uniqueness of the joint attribute, the range of 512-767 partitions does not support updating operation.
Step 612: generating a HASH value according to the sequence of the joint unique KEY in the metadata;
step 613: whether a primary key exists; if yes, go to step 615; if not, go to step 614;
step 614: INSERT into the database;
step 615: UPDATE the database.
Based on the above steps, the partitioning can be continued, and different storage partitions, such as Hbase database, are set for different partition ranges. And will not be described in detail herein.
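The dispatch of fig. 6 can be sketched as a small routine that picks a handler by partition range and then inserts or updates depending on whether the primary key exists. The in-memory dictionaries stand in for the real storage partitions, and the handler labels and function names are hypothetical.

```python
def choose_handler(partition):
    """Step 601: select the handler by the partition range of the attribute."""
    if 1 <= partition <= 127:
        return "text"      # step 602: basic or small text attributes
    if 128 <= partition <= 255:
        return "oss"       # step 603: large text attributes
    if 256 <= partition <= 511:
        return "long"      # step 607: frequently updated attributes
    if 512 <= partition <= 767:
        return "unique"    # step 609: uniqueness constraint attributes
    raise ValueError(f"partition {partition} has no handler")


def save(tables, partition, primary_key, row):
    """Steps 613 to 615: UPDATE when the primary key exists, otherwise INSERT."""
    handler = choose_handler(partition)
    table = tables.setdefault(handler, {})
    if primary_key in table:
        if handler == "unique":
            # The text states that the 512-767 range does not support updates.
            raise ValueError("unique key partitions do not support updates")
        table[primary_key] = row
        return "UPDATE"
    table[primary_key] = row
    return "INSERT"
```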
By adopting the technical scheme, the data to be stored is classified according to the attribute characteristics of each attribute in the data to be stored, the data to be stored is divided into first-class data with the attribute characteristics being system attributes and second-class data with the attribute characteristics being non-system attributes, and then the first-class data and the second-class data are respectively stored in the corresponding storage partitions. By performing attribute partition storage on data to be stored, data with the same attribute characteristics are stored in the same partition in a centralized manner, and the data do not need to be stored in a row, so that a large amount of row data is avoided to a certain extent.
Example three
An embodiment of the present application provides a data retrieval method, fig. 7 is a first flow diagram of the data retrieval method in the embodiment of the present application, and as shown in fig. 7, the data retrieval method is applied to a full-text search engine, and the specific steps may include:
step 701: acquiring retrieval information;
it should be noted that the retrieval information is obtained by using a full text search Engine (ES).
Here, the ES may implement dynamic addition of heterogeneous attributes, which defines how these dynamically added attributes should be mapped onto appropriate data types, for the purpose of seamless linking with the database storing data and external indexing.
Specifically, the ES constructs indexes through dynamic templates (Dynamic templates) and automatically establishes attribute mappings with each database. That is, after new custom attribute information is added for an attribute in the database and data is inserted, the ES automatically creates the missing attribute information according to the metadata of the attribute type to which the data belongs. The ES is therefore flexible and well suited to retrieving heterogeneous data, which is why data retrieval is performed through the ES.
Here, in order to avoid too many (more than 1000) attribute information in a single index (index) when an index is constructed, n indexes with the same structure may be created and consistent hash calculation is used, so that all data of a fixed site fall into the same index, and it is ensured that the number of attribute information of each index does not exceed a preset threshold.
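One way to keep all data of a site in the same index is a consistent hash ring over the n indexes, sketched below. The index names, the number of virtual nodes, and the use of MD5 as the ring hash are assumptions; the text only says that consistent hashing is used.

```python
import bisect
import hashlib


class IndexRing:
    """A small consistent hash ring mapping a site id to one of n same-structure indexes."""

    def __init__(self, index_names, replicas=64):
        # Each index gets several virtual nodes to spread sites more evenly.
        self._ring = sorted(
            (self._hash(f"{name}#{i}"), name)
            for name in index_names
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def index_for(self, site_id):
        """Walk clockwise to the first virtual node after the site's hash."""
        position = bisect.bisect(self._points, self._hash(str(site_id)))
        return self._ring[position % len(self._ring)][1]
```

A useful property of this design is that adding an index only moves some sites to the new index; sites never move between the existing indexes.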
Step 702: determining N first attribute identifications according to the retrieval information; wherein N is a positive integer;
it should be noted that the ES acquires the search information and determines N first attribute identifiers included in the search information, and the ES specifically executes the search operation according to the N first attribute identifiers.
Step 703: searching a storage partition according to the N first attribute identifications to obtain corresponding N sections of search data; the storage partition is used for storing the attribute in a partition mode according to the attribute characteristics;
it should be noted that, because data is stored in different storage partitions according to the attribute characteristics of the attributes, the storage partition holding the retrieval data can be determined at acquisition time, and the retrieval data can be read directly from that partition without fetching unnecessary data, which improves the retrieval speed to a certain extent.
When the data retrieval is realized by the ES, the complexity of different underlying storage technologies is shielded, a uniform retrieval entry is provided, and corresponding data can be quickly acquired from different storage partitions.
In some embodiments, the specific steps include: determining N second attribute identifiers of the N first attribute identifiers based on a preset mapping relation between each first attribute identifier and the corresponding second attribute identifier; and searching a storage partition based on the N second attribute identifications to acquire the N sections of retrieval data.
Here, when the ES is used to realize data retrieval, the stored data (i.e., the attribute value) cannot be directly obtained according to the second attribute identifier of the attribute, the second attribute identifier of the attribute needs to be replaced, and the corresponding retrieved data is searched by using the replaced identifier (hereinafter, referred to as a third attribute identifier), so that the corresponding third attribute identifier is set in advance in a self-defined manner for the second attribute identifier of each attribute.
Specifically, a first attribute identifier of an attribute is input, a second attribute identifier corresponding to the first attribute identifier needs to be replaced by a third attribute identifier during retrieval, corresponding retrieval data is obtained through the third attribute identifier, and then the third attribute identifier is replaced by the second attribute identifier, so that retrieval data corresponding to the second attribute identifier can be obtained, and retrieval data corresponding to the first attribute identifier is obtained.
For example, to query the courses whose instructor name equals king, the input first attribute identifier is king, and the second attribute identifier used in the storage process may be ${teacher_name} = king; when the ES actually performs the search, the physical name of the attribute is used: keyword_12 = king (i.e., the third attribute identifier). In practical applications, replacing ${teacher_name} = king with keyword_12 = king is done through a placeholder replacement process.
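The placeholder replacement in both directions can be sketched with regular expressions. The mapping from the logical name teacher_name to the physical name keyword_12 follows the example in the text; the function names and the `${...}` parsing rule are assumptions.

```python
import re

# Logical attribute name -> physical ES field name, following the example in the text.
LOGICAL_TO_PHYSICAL = {"teacher_name": "keyword_12"}
PHYSICAL_TO_LOGICAL = {physical: logical for logical, physical in LOGICAL_TO_PHYSICAL.items()}


def to_physical(expression):
    """Replace each ${logical_name} placeholder with the physical field name."""
    return re.sub(r"\$\{(\w+)\}", lambda m: LOGICAL_TO_PHYSICAL[m.group(1)], expression)


def to_logical(expression):
    """Reverse replacement: turn physical field names back into ${logical_name}."""
    names = "|".join(re.escape(name) for name in PHYSICAL_TO_LOGICAL)
    return re.sub(
        rf"\b({names})\b",
        lambda m: "${" + PHYSICAL_TO_LOGICAL[m.group(1)] + "}",
        expression,
    )
```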
Step 704: and aggregating the N sections of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information.
It should be noted that the preset aggregation policy is a policy of aggregating according to the arrangement order of the N first attribute identifiers. The obtained N sections of retrieval data are aggregated according to the arrangement sequence of the N first attribute identifications, and retrieval results corresponding to the N first attribute identifications are obtained.
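Aggregating in the arrangement order of the identifiers can be sketched in a few lines. The function name `aggregate` and the shape of the segments (a dictionary keyed by first attribute identifier) are assumptions made for the example.

```python
def aggregate(identifiers, segments):
    """Combine retrieved segments in the arrangement order of the identifiers.

    segments maps a first attribute identifier to its retrieved data; the order in
    which segments arrived from the storage partitions does not matter.
    """
    return [(identifier, segments[identifier]) for identifier in identifiers if identifier in segments]
```

Even if the segment for the last identifier arrives first, the result follows the order given in the retrieval information.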
By adopting the technical scheme, the full-text search engine is utilized to respectively search the N first attribute identifications from different storage partitions to obtain N sections of search data, and the N sections of search data are aggregated to obtain a search result. Because the full-text search engine provides a uniform retrieval interface, the complexity of retrieving and aggregating data from different storage partitions is shielded, and the retrieval efficiency is improved.
Example four
The embodiment of the present application further provides a data retrieval method, and fig. 8 is a second flow diagram of the data retrieval method in the embodiment of the present application.
As shown in fig. 8, the specific steps include the following:
Step 801: parsing the retrieval DSL (original retrieval);
Here, the Domain Specific Language (DSL) query is parsed to obtain the first attribute identifier of the attribute to be retrieved.
If the first attribute identifier is Wang, the second attribute identifier obtained by parsing may be: ${teacher_name} = Wang.
Step 802: first placeholder replacement;
${teacher_name} = Wang is replaced with keyword_12 = Wang (the third attribute identifier) through a placeholder replacement process.
Step 803: security wrapping of the retrieval DSL;
Step 804: constructing the ES retrieval condition;
The retrieval condition includes at least one of the items in step 805.
Step 805: multi-language settings; paging-related information and aggregation-related information; include/exclude settings for the queried first attribute identifiers; sorting, by default in descending order of creation time; routing information;
Step 806: executing the search with ES to obtain the corresponding search result;
Here, the search result corresponding to keyword_12 = Wang (the third attribute identifier) is obtained.
Step 807: second placeholder replacement;
keyword_12 = Wang (the third attribute identifier) is replaced with ${teacher_name} = Wang (the second attribute identifier) through a placeholder replacement process.
Step 808: processing the retrieval result.
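Steps 804 and 805 can be illustrated by assembling a search request as a plain dictionary. This is a sketch under stated assumptions: ES is taken to be Elasticsearch, the field name `create_time` and the function `build_search_request` are hypothetical, and the field selection is read as `_source` include/exclude filtering. The multi-language and aggregation items of step 805 are left out because the source gives no detail on them.

```python
def build_search_request(physical_query, page=1, page_size=10,
                         includes=None, excludes=None, routing=None):
    """Assemble a search body and request parameters for a physical query."""
    body = {
        "query": physical_query,
        # Paging: skip the records of the preceding pages.
        "from": (page - 1) * page_size,
        "size": page_size,
        # Default sort: descending by creation time (field name assumed).
        "sort": [{"create_time": {"order": "desc"}}],
    }
    if includes or excludes:
        # Return only the requested fields of each hit.
        body["_source"] = {"includes": includes or [],
                           "excludes": excludes or []}
    # Routing is passed as a request parameter, not inside the body.
    params = {"routing": routing} if routing is not None else {}
    return body, params


body, params = build_search_request(
    {"term": {"keyword_12": "Wang"}},  # condition after step 802
    page=2, page_size=20, includes=["keyword_12"], routing="p1",
)
print(body)
print(params)
```

Keeping the body as a plain dictionary keeps the sketch independent of any particular client library; the result would still have to be passed to a search client to execute step 806.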
Here, the metadata corresponding to each attribute in the data to be stored includes multiple pieces of custom attribute information, and different custom attribute information corresponds to different partitions. During retrieval, either only the data of a specified partition or the data of all partitions of the attribute may be retrieved. The present application therefore further refines the parsing in step 801 by defining the partitions to be retrieved. Fig. 9 is a second sub-flow diagram of the data retrieval method in the embodiment of the present application.
As shown in fig. 9, the specific steps include the following:
Step 901: determining whether to query only the data of a specified partition; if yes, go to step 902; if not, go to step 903;
Step 902: constructing the set of specified partition ranges;
Step 903: constructing the set of all partition ranges contained in the metadata;
Step 904: iterating over the partition range set;
Step 905: determining whether there is a next partition range; if not, go to step 906; if yes, go to step 907;
Step 906: the retrieval is complete;
Step 907: looking up the corresponding read processor according to the partition range;
Step 908: splitting the request into tasks;
Step 909: determining the read processor for the partition range 1-127, and searching the corresponding database to obtain first data;
Step 910: determining the read processor for the partition range 128-255, and searching the corresponding database to obtain second data;
Step 911: determining the read processor for the partition range 256-511, and searching the corresponding database to obtain third data;
Step 912: determining the read processor for the partition range 512-767, and searching the corresponding database to obtain fourth data;
Step 913: performing an aggregation operation on the first data, the second data, the third data and the fourth data to obtain the corresponding aggregation result.
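The flow of steps 901 to 913 can be sketched as a dispatch from partition ranges to read processors. Everything named here is hypothetical: `make_reader` builds stand-in read processors that return labelled strings instead of querying a database, and the use of a thread pool for task splitting is an assumption. The third range is taken to start at 256 so that the four ranges are contiguous.

```python
from concurrent.futures import ThreadPoolExecutor


def make_reader(name):
    """Build a stand-in read processor; a real one would query a database."""
    def read(query):
        return [f"{name}:{query}"]
    return read


# Partition range -> read processor (steps 907, 909 to 912).
READERS = {
    (1, 127): make_reader("db_a"),
    (128, 255): make_reader("db_b"),
    (256, 511): make_reader("db_c"),
    (512, 767): make_reader("db_d"),
}


def select_ranges(specified=None):
    """Steps 901 to 903: the specified ranges, or all ranges in the metadata."""
    return list(specified) if specified else list(READERS)


def retrieve(query, specified=None):
    ranges = select_ranges(specified)
    # Step 908: split the request into one task per partition range.
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [pool.submit(READERS[r], query) for r in ranges]
        # Step 913: aggregate the partial results in partition-range order.
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


print(retrieve("q"))                # all partitions
print(retrieve("q", [(128, 255)]))  # only the specified partition
```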
Based on the above steps, once data has been stored in partitions according to attribute features, a search can target only the data in the required partition ranges, without searching the data in other partition ranges, which improves search efficiency to a certain extent.
With the above technical solution, the full-text search engine searches the different storage partitions for each of the N first attribute identifiers to obtain N segments of retrieval data, and aggregates the N segments to obtain the retrieval result. Because the full-text search engine provides a unified retrieval interface, the complexity of retrieving and aggregating data from different storage partitions is hidden, and retrieval efficiency is improved.
Example five
In order to implement the method of the embodiment of the present application, based on the same inventive concept, a data storage device is further provided in the embodiment of the present application, and fig. 10 is a schematic structural diagram of the data storage device in the embodiment of the present application. As shown in fig. 10, the data storage device includes:
a first obtaining unit 1001 configured to obtain data to be stored including M attributes; wherein M is a positive integer;
a classifying unit 1002, configured to classify the M attributes according to the attribute features of the M attributes, so as to obtain a first type of data whose attribute feature is a system attribute and a second type of data whose attribute feature is a non-system attribute;
a storage unit 1003, configured to store the first type of data in a first storage partition and the second type of data in a second storage partition.
In some embodiments, each attribute in the data to be stored includes a first attribute identifier and an attribute value; the method further comprises: acquiring metadata corresponding to the data to be stored, wherein the metadata includes custom attribute information, and the custom attribute information includes a custom attribute feature and an attribute set corresponding to that feature, the attribute set including a second attribute identifier of at least one attribute; and, using the first attribute identifier of a target attribute and the mapping relationship between first attribute identifiers and second attribute identifiers, taking the corresponding custom attribute feature in the metadata as the attribute feature of the target attribute.
In some embodiments, the storage unit 1003 is specifically configured to: determine, by using the mapping relationship, the second attribute identifier corresponding to the first attribute identifier of each attribute; store each attribute of the first type of data in the first storage partition, with its second attribute identifier as the key and its attribute value as the value; and store each attribute of the second type of data in the second storage partition, with its second attribute identifier as the key and its attribute value as the value.
In some embodiments, the custom attribute features comprise a system attribute feature and a plurality of non-system attribute features, and the second storage partition comprises a plurality of second sub-storage partitions; the storing the second type of data to a second storage partition includes: classifying the second type of data according to non-system attributes to obtain at least two groups of third-type data; and storing each group of third-type data to the corresponding second sub-storage partition according to its non-system attribute type.
In some embodiments, the custom attribute information further includes a partition identifier corresponding to the custom attribute feature; the method further comprises: determining a first partition identifier in the metadata according to the system attribute feature of the first type of data; and determining a second partition identifier in the metadata according to the non-system attribute feature of the third-type data; the storing the first type of data to a first storage partition and the storing the second type of data to a second storage partition includes: storing the first type of data to the corresponding first storage partition according to the first partition identifier; and storing the third-type data to the corresponding second sub-storage partition according to the second partition identifier.
With the above technical solution, the data to be stored is classified according to the attribute feature of each of its attributes into a first type of data, whose attribute feature is a system attribute, and a second type of data, whose attribute feature is a non-system attribute, and the two types are stored in their corresponding storage partitions. By storing the data in partitions by attribute, data with the same attribute feature is stored together in the same partition and does not need to be stored as rows, which avoids a large amount of row data to a certain extent.
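The partitioned storage of this embodiment can be sketched as follows. The metadata layout, the attribute names, the partition identifiers, and the function `store` are all hypothetical; the sketch only mirrors the described flow, in which each first attribute identifier is mapped to a second attribute identifier, the custom attribute feature is looked up in the metadata, and the value is stored as a key-value pair in the partition named by the partition identifier.

```python
# Hypothetical metadata: custom attribute feature -> partition identifier
# and attribute set (second attribute identifiers).
METADATA = {
    "system": {"partition_id": "p_system",
               "attributes": {"id", "create_time"}},
    "course": {"partition_id": "p_course",
               "attributes": {"teacher_name"}},
    "pricing": {"partition_id": "p_pricing",
                "attributes": {"price"}},
}

# Hypothetical mapping: first attribute identifier -> second attribute identifier.
FIRST_TO_SECOND = {"ID": "id", "Created": "create_time",
                   "Instructor": "teacher_name", "Price": "price"}


def store(record, metadata, first_to_second):
    """Group a record's attributes into partitions keyed by partition id."""
    partitions = {}
    for first_id, value in record.items():
        second_id = first_to_second[first_id]
        for info in metadata.values():
            if second_id in info["attributes"]:
                # Key: second attribute identifier; value: attribute value.
                partitions.setdefault(info["partition_id"], {})[second_id] = value
                break
        else:
            raise KeyError(f"{second_id!r} belongs to no attribute set")
    return partitions


print(store({"ID": 7, "Instructor": "Wang", "Price": 99},
            METADATA, FIRST_TO_SECOND))
```

Here "system" plays the role of the first storage partition, while "course" and "pricing" stand for two second sub-storage partitions holding non-system attributes.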
Example six
In order to implement the method of the embodiment of the present application, based on the same inventive concept, a full-text search engine is further provided in the embodiment of the present application, and fig. 11 is a first schematic structural diagram of the full-text search engine in the embodiment of the present application. As shown in fig. 11, the full-text search engine includes:
a second obtaining unit 1101, configured to obtain retrieval information;
a determining unit 1102, configured to determine N first attribute identifiers according to the retrieval information; wherein N is a positive integer;
a searching unit 1103, configured to search the storage partitions according to the N first attribute identifiers to obtain the corresponding N segments of retrieval data; wherein the storage partitions store attributes partitioned according to their attribute features;
an aggregation unit 1104, configured to aggregate the N segments of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information.
In some embodiments, the searching unit 1103 is specifically configured to determine the N second attribute identifiers of the N first attribute identifiers based on a preset mapping relationship between each first attribute identifier and its corresponding second attribute identifier, and to search the storage partitions based on the N second attribute identifiers to obtain the N segments of retrieval data.
In some embodiments, the aggregation unit 1104 is specifically configured to aggregate the N segments of retrieval data according to the arrangement order of the N first attribute identifiers to obtain the retrieval result.
With the above technical solution, the full-text search engine searches the different storage partitions for each of the N first attribute identifiers to obtain N segments of retrieval data, and aggregates the N segments to obtain the retrieval result. Because the full-text search engine provides a unified retrieval interface, the complexity of retrieving and aggregating data from different storage partitions is hidden, and retrieval efficiency is improved.
The embodiment of the present application further provides another data storage device, and fig. 12 is a schematic structural diagram of the data storage device in the embodiment of the present application. As shown in fig. 12, the data storage device includes: a processor 1201 and a memory 1202 configured to store a computer program capable of running on the processor;
wherein the processor 1201 is configured to execute the method steps in the aforementioned embodiments when running the computer program.
Of course, in actual practice, the various components in the data storage device are coupled together by a bus system 1203 as shown in FIG. 12. It will be appreciated that the bus system 1203 is used to implement the connection communication between these components. The bus system 1203 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for the sake of clarity the various busses are labeled in figure 12 as the bus system 1203.
Fig. 13 is a second schematic structural diagram of the full-text search engine in the embodiment of the present application. As shown in fig. 13, the full text search engine includes: a processor 1301 and a memory 1302 configured to store a computer program operable on the processor;
wherein the processor 1301 is configured to execute the method steps in the preceding embodiments when running the computer program.
Of course, in practice, as shown in FIG. 13, the various components of the full text search engine are coupled together via a bus system 1303. It is understood that the bus system 1303 is used to enable connection communication between these components. The bus system 1303 includes a power bus, a control bus, and a status signal bus, in addition to the data bus. But for clarity of illustration the various buses are labeled in figure 13 as the bus system 1303.
In practical applications, the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device for implementing the above processor function may be other devices, and the embodiments of the present application are not limited in particular.
The memory may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory. The memory provides instructions and data to the processor.
In an exemplary embodiment, the present application further provides a computer-readable storage medium for storing a computer program.
Optionally, the computer-readable storage medium may be applied to any one of the methods in the embodiments of the present application, and the computer program enables a computer to execute the corresponding processes implemented by the processor in each method in the embodiments of the present application; for brevity, these are not described again here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or certain features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may be separately used as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit. Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in this application may be combined in any combination to arrive at a new method or apparatus embodiment without conflict.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (13)

1. A method of data storage, the method comprising:
acquiring data to be stored comprising M attributes; wherein M is a positive integer;
classifying the M attributes according to the attribute characteristics of the M attributes to obtain first class data with the attribute characteristics being system attributes and second class data with the attribute characteristics being non-system attributes;
and storing the first type of data to a first storage partition, and storing the second type of data to a second storage partition.
2. The method according to claim 1, wherein each attribute in the data to be stored comprises a first attribute identifier and an attribute value; the method further comprises the following steps:
acquiring metadata corresponding to the data to be stored; wherein the metadata includes custom attribute information, the custom attribute information including: a custom attribute feature and an attribute set corresponding to the custom attribute feature, wherein the attribute set comprises a second attribute identifier of at least one attribute;
and using the first attribute identifier of the target attribute and the mapping relation between the first attribute identifier and the second attribute identifier to take the corresponding custom attribute feature in the metadata as the attribute feature of the target attribute.
3. The method of claim 2,
the storing the first type of data to a first storage partition and the storing the second type of data to a second storage partition includes:
determining a second attribute identifier corresponding to the first attribute identifier of each attribute by using the mapping relation;
taking the second attribute identifier of each attribute in the first type of data as a key, taking an attribute value as a value, and storing the attribute value in the first storage partition;
and storing the second attribute identifier of each attribute in the second type data as a key and the attribute value as a value to the second storage partition.
4. The method of claim 2,
the custom attribute features comprise system attribute features and a plurality of non-system attribute features, and the second storage partition comprises a plurality of second sub-storage partitions;
the storing the second type of data to a second storage partition includes:
classifying the second type of data according to non-system attributes to obtain at least two groups of third-type data;
and storing the third type of data to a corresponding second sub-storage partition according to the non-system attribute type of the third type of data.
5. The method of claim 4, wherein the custom attribute information further includes a partition identifier corresponding to the custom attribute feature;
the method further comprises the following steps:
determining a first partition identifier in the metadata according to the system attribute characteristics of the first type of data;
determining a second partition identifier in the metadata according to the non-system attribute characteristics of the third type of data;
the storing the first type of data to a first storage partition and the storing the second type of data to a second storage partition includes:
storing the first type of data to a corresponding first storage partition according to the first partition identification;
and storing the third class of data to a corresponding second sub-storage partition according to the second partition identification.
6. A data retrieval method applied to a full-text search engine, characterized by comprising the following steps:
acquiring retrieval information;
determining N first attribute identifications according to the retrieval information; wherein N is a positive integer;
searching a storage partition according to the N first attribute identifications to obtain corresponding N sections of retrieval data; the storage partition is used for storing the attribute in a partition mode according to the attribute characteristics;
and aggregating the N sections of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information.
7. The method according to claim 6, wherein said searching for the storage partition according to the N first attribute identifiers to obtain the corresponding N segments of search data comprises:
determining N second attribute identifications of the N first attribute identifications based on a preset mapping relation between each first attribute identification and the corresponding second attribute identification;
and searching a storage partition based on the N second attribute identifications to acquire the N sections of retrieval data.
8. The method according to claim 6, wherein the aggregating the N segments of search data according to a preset aggregation policy to obtain the search result corresponding to the search information comprises:
and aggregating the N sections of retrieval data according to the arrangement sequence of the N first attribute identifications to obtain the retrieval result.
9. A data storage device, characterized in that the device comprises:
a first obtaining unit configured to obtain data to be stored including M attributes; wherein M is a positive integer;
the classification unit is used for classifying the M attributes according to the attribute characteristics of the M attributes to obtain first-class data with the attribute characteristics being system attributes and second-class data with the attribute characteristics being non-system attributes;
and the storage unit is used for storing the first type of data to a first storage partition and storing the second type of data to a second storage partition.
10. A data retrieval apparatus for use in a full text search engine, the apparatus comprising:
a second acquisition unit, configured to acquire retrieval information;
a determining unit, configured to determine N first attribute identifiers according to the search information; wherein N is a positive integer;
the searching unit is used for searching the storage partitions according to the N first attribute identifications and acquiring corresponding N sections of retrieval data; the storage partition is used for storing the attribute in a partition mode according to the attribute characteristics;
and the aggregation unit is used for aggregating the N sections of retrieval data according to a preset aggregation strategy to obtain a retrieval result corresponding to the retrieval information.
11. A data storage device, the data storage device comprising: a processor and a memory configured to store a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of the method of any one of claims 1 to 5 when running the computer program.
12. A full text search engine, comprising: a processor and a memory configured to store a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of the method of any one of claims 6 to 8 when running the computer program.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202110414397.9A 2021-04-16 2021-04-16 Heterogeneous data storage and retrieval method, device, equipment and storage medium Pending CN113282579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110414397.9A CN113282579A (en) 2021-04-16 2021-04-16 Heterogeneous data storage and retrieval method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110414397.9A CN113282579A (en) 2021-04-16 2021-04-16 Heterogeneous data storage and retrieval method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113282579A true CN113282579A (en) 2021-08-20

Family

ID=77276918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110414397.9A Pending CN113282579A (en) 2021-04-16 2021-04-16 Heterogeneous data storage and retrieval method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113282579A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121139A (en) * 2022-01-27 2022-03-01 合肥悦芯半导体科技有限公司 Chip testing method and device, electronic equipment and storage medium
CN114925145A (en) * 2022-05-25 2022-08-19 盐城金堤科技有限公司 Data storage method and device, storage medium and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121139A (en) * 2022-01-27 2022-03-01 合肥悦芯半导体科技有限公司 Chip testing method and device, electronic equipment and storage medium
CN114121139B (en) * 2022-01-27 2022-05-17 合肥悦芯半导体科技有限公司 Chip testing method and device, electronic equipment and storage medium
CN114925145A (en) * 2022-05-25 2022-08-19 盐城金堤科技有限公司 Data storage method and device, storage medium and electronic equipment
CN114925145B (en) * 2022-05-25 2024-05-14 盐城天眼察微科技有限公司 Data storage method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
US11468103B2 (en) Relational modeler and renderer for non-relational data
AU2017243870B2 (en) "Methods and systems for database optimisation"
US8443002B2 (en) Operationally complete hierarchical repository in a relational database
US20110314060A1 (en) Markup language based query and file generation
DE202014010938U1 (en) Omega name: name generation and derivation
US20140046928A1 (en) Query plans with parameter markers in place of object identifiers
US10445675B2 (en) Confirming enforcement of business rules specified in a data access tier of a multi-tier application
CN113282579A (en) Heterogeneous data storage and retrieval method, device, equipment and storage medium
CN111708805A (en) Data query method and device, electronic equipment and storage medium
CN108140022B (en) Data query method and database system
US20230350899A1 (en) Query engine for recursive searches in a self-describing data system
US20080294673A1 (en) Data transfer and storage based on meta-data
US11567969B2 (en) Unbalanced partitioning of database for application data
CN112912870A (en) Tenant identifier conversion
US20230385308A1 (en) Conversion and migration of key-value store to relational model
WO2002103573A1 (en) A flexible virtual database system including a hierarchical application parameter repository
US7536398B2 (en) On-line organization of data sets
CN114118944A (en) Forensic laboratory grading management method, terminal device and storage medium
CN114385145A (en) Web system back-end architecture design method and computer equipment
EP3436988B1 (en) "methods and systems for database optimisation"
CN116783587A (en) Data storage for list-based data searching
CN113779068A (en) Data query method, device, equipment and storage medium
CN114385555A (en) Data query method, device, equipment and storage medium
CN111881220A (en) Data operation method and device under list storage, electronic equipment and storage medium
CN112596719A (en) Method and system for generating front-end and back-end codes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination