CN110263082B - Data distribution analysis method and device of database, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110263082B
Authority
CN
China
Prior art keywords
data
separator
database
sample
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910447214.6A
Other languages
Chinese (zh)
Other versions
CN110263082A (en)
Inventor
姜华
牟晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201910447214.6A
Publication of CN110263082A
Application granted
Publication of CN110263082B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/26 Visual data mining; Browsing structured data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a data distribution analysis method and device for a database, an electronic device, and a storage medium. The method includes identifying a separator in each data sample in a target data sample set and separating the data sample into at least one data segment according to the separator; then determining the data structure of the data sample according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample; and finally generating analysis result information characterizing the data distribution of the database according to the data structures of all data samples in the target data sample set. In this way, all data samples are analyzed, so the resulting data distribution information is more reliable; moreover, no manual intervention is needed, which improves the efficiency of data distribution analysis and reduces labor costs.

Description

Data distribution analysis method and device of database, electronic equipment and storage medium
Technical Field
The present disclosure relates to data statistical analysis, and in particular, to a method and an apparatus for analyzing data distribution of a database, an electronic device, and a storage medium.
Background
A database (DB) is an organized, shareable, centrally managed collection of data stored in a computer over the long term. Databases offer well-structured data, good sharing, strong data independence, fine storage granularity, and so on, and provide a user-friendly interface.
Before using the data samples in a database, a user needs a basic understanding of the data structure of the data samples across the entire database in order to decide how to use or process them. Conventionally, this understanding is obtained as follows: a subset of data samples is extracted from the large number of samples in the database, and management staff analyze the extracted subset to get a rough picture of the data distribution of the database. However, because the extracted samples are only a fraction of the database, the resulting data distribution information is not very reliable, and analyzing the extracted samples one by one consumes considerable labor.
Disclosure of Invention
The purpose of the present disclosure is to provide a method, an apparatus, an electronic device, and a storage medium for analyzing the data distribution of a database, so as to obtain highly reliable data distribution information about the data samples of the database efficiently and at low cost.
In order to achieve the above object, a first aspect of the embodiments of the present disclosure provides a data distribution analysis method for a database, including:
acquiring a target data sample set from a database;
identifying a delimiter in each data sample in the target set of data samples and separating the data sample into at least one data segment according to the delimiter;
determining the data structure of the data sample according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample;
and generating analysis result information for characterizing the data distribution of the database according to the data structures of all the data samples in the target data sample set.
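The four steps above can be illustrated end to end with a minimal Python sketch. It is not the patented implementation: the separator set (`SEPARATORS`), the letter and digit codes X and Y, the fallback code Z for mixed or empty segments, and the function names `structure` and `analyze` are all assumptions made for illustration, and the acquisition step is assumed to have already produced a list of sample strings.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed separator set (hypothetical); the disclosure lets users extend it.
SEPARATORS = "-@&+_|."
# A capturing group makes re.split keep each separator between the segments.
_SPLIT = re.compile(f"([{re.escape(SEPARATORS)}])")


def structure(sample: str) -> str:
    """Split on separators and encode each data segment as type(length)."""
    parts = _SPLIT.split(sample)
    encoded = []
    for i, part in enumerate(parts):
        if i % 2:  # odd indexes are the captured separators, kept verbatim
            encoded.append(part)
        else:      # even indexes are data segments (possibly empty)
            kind = "X" if part.isalpha() else "Y" if part.isdigit() else "Z"
            encoded.append(f"{kind}({len(part)})")
    return "".join(encoded)


def analyze(samples: Iterable[str]) -> Dict[str, float]:
    """Share of each data structure over all samples, most common first."""
    counts = Counter(structure(s) for s in samples)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.most_common()}


print(analyze(["fxsxd@163.com", "yueli@163.com", "hongqi@163.com"]))
```

Because every sample is encoded rather than a hand-picked subset, the shares reflect the whole target set, which is the reliability argument the disclosure makes.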
Optionally, before determining the data structure of the data sample according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample, the method further includes:
caching the data segments one by one at different addresses of a first ordered linked list, and the separators one by one at different addresses of a second ordered linked list, so as to record the order in which the separators and the at least one data segment are arranged in the data sample.
Optionally, each data segment includes string length information and/or a data type, and determining the data structure of the data sample according to the separator, the at least one data segment, and the order in which they are arranged in the data sample includes:
determining the data structure of the data sample according to the separator, the string length information and/or the data type, and the order in which the separator and the at least one data segment are arranged in the data sample.
Optionally, the generating, according to the data structure of all the data samples in the target data sample set, analysis result information for characterizing data distribution of the database includes:
classifying the data samples according to a preset classification rule;
counting the number of data samples of each class and determining the proportion of each class;
and generating analysis result information characterizing the data distribution of the database according to the data structure and the proportion of each class of data samples.
Optionally, the obtaining a target data sample set from a database includes:
identifying non-string data in the database according to the metadata information in the database;
and removing the non-string data from the database to obtain the target data sample set.
A second aspect of the embodiments of the present disclosure provides an apparatus for analyzing data distribution of a database, where the apparatus includes:
a data acquisition module configured to acquire a set of target data samples from a database;
a data identification module configured to identify a separator in each data sample in the target set of data samples and separate the data sample into at least one data segment according to the separator;
a data structure determination module configured to determine the data structure of the data sample according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample;
a result generation module configured to generate analysis result information for characterizing data distribution of the database according to the data structures of all the data samples in the target data sample set.
Optionally, the apparatus further includes a sequential recording module configured to cache the data segments one by one at different addresses of a first ordered linked list and the separators one by one at different addresses of a second ordered linked list, so as to record the order in which the separators and the at least one data segment are arranged in the data sample.
Optionally, the data structure determining module is specifically configured to determine the data structure of the data sample according to the separator, the character string length information and/or the data type, and the arrangement order of the separator and the at least one data segment in the data sample.
Optionally, the result generation module includes:
a classification submodule configured to classify the data samples according to preset classification rules;
a proportion determination submodule configured to count the data samples of each class and determine the proportion of each class;
and a result generation submodule configured to generate analysis result information characterizing the data distribution of the database according to the data structure and the proportion of each class of data samples.
Optionally, the data obtaining module includes:
a character recognition submodule configured to identify non-string data in the database according to metadata information in the database;
and a character elimination submodule configured to remove the non-string data from the database to obtain the target data sample set.
A third aspect of embodiments of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method of the first aspect.
A fourth aspect of an embodiment of the present disclosure provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect.
By adopting the technical scheme provided by the disclosure, the following technical effects can be at least achieved:
The data distribution analysis method and apparatus, electronic device, and storage medium provided by the disclosure identify a separator in each data sample in a target data sample set and divide the data sample into at least one data segment according to the separator; then determine the data structure of the data sample according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample; and finally generate analysis result information characterizing the data distribution of the database according to the data structures of all data samples in the target data sample set. In this way, all data samples are analyzed, so the resulting data distribution information is more reliable; moreover, no manual intervention is needed, which improves the efficiency of data distribution analysis and reduces labor costs.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is an interaction diagram of an application environment of a data distribution analysis method for a database according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a data distribution analysis method for a database according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a data distribution analysis method for a database according to another embodiment of the present disclosure;
FIG. 4 is a flowchart of a data distribution analysis method for a database according to another embodiment of the present disclosure;
FIG. 5 is a flowchart of a data distribution analysis method for a database according to another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a data distribution analysis apparatus for a database according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a data distribution analysis apparatus for a database according to another embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a data distribution analysis apparatus for a database according to another embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a data distribution analysis apparatus for a database according to another embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present disclosure, not to limit it.
It should be noted that the terms "first," "second," and the like in the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The data distribution analysis method and apparatus, electronic device, and storage medium provided by the embodiments of the present disclosure can be applied in the application environment shown in FIG. 1. As shown in FIG. 1, a client 100 and an electronic device (a server 200 in FIG. 1, although the electronic device is not limited to a server) are located in a network 300 and exchange data through the network 300. In the embodiments of the present disclosure, at least one application (APP) is installed on the client 100 and works with the server 200 to provide services to the user. The server 200 may be, but is not limited to, a web server, a database server, a cloud server, or the like. The client 100 may be, but is not limited to, a smartphone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), or the like. The operating system of the client 100 may be, but is not limited to, Android, iOS, Windows Phone, Windows, or the like.
An embodiment of the present disclosure provides a data distribution analysis method for a database, applied to an electronic device. Before the method is executed, a user may set the conditions for acquiring data samples in the operation interface of an application on the client, for example the number of data samples to acquire and the dimension information of the data samples, where the dimension information may include time, region, subject, and the like. After setting these conditions, the user clicks an analysis button, and the client sends an analysis instruction carrying the acquisition conditions to the electronic device. As shown in FIG. 2, the data distribution analysis method for the database includes:
S21: a target data sample set is acquired from the database.
In response to the analysis instruction sent by the client, the electronic device acquires the target data sample set from the database according to that instruction. For example, if the analysis instruction specifies 10,000 data samples, the electronic device draws 10,000 data samples from the database using a random sampling algorithm; as another example, if the data dimension information carried in the analysis instruction is Beijing and a particular quarter, the electronic device retrieves all data samples for Beijing in that quarter. These sample counts and dimensions are only examples and are not limiting. In this embodiment, the data samples in the database may be customer information records in a customer relationship management (CRM) system; for example, each customer record may include the customer's name, telephone number, email address, ID number, and so on.
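How S21 might honor the two kinds of acquisition conditions can be sketched as follows. The row layout (plain dictionaries), the field names `region` and `quarter`, and the function `acquire_target_set` are hypothetical; `random.Random.sample` from the standard library stands in as one possible random sampling algorithm.

```python
import random
from typing import Dict, List, Optional

Row = Dict[str, str]


def acquire_target_set(
    rows: List[Row],
    count: Optional[int] = None,
    dimensions: Optional[Dict[str, str]] = None,
    seed: Optional[int] = None,
) -> List[Row]:
    """Filter rows by dimension values, then randomly sample `count` of them."""
    selected = rows
    if dimensions:
        # Keep only rows whose fields match every requested dimension value.
        selected = [r for r in selected
                    if all(r.get(k) == v for k, v in dimensions.items())]
    if count is not None and count < len(selected):
        # Uniform random sample without replacement; the seed makes runs repeatable.
        selected = random.Random(seed).sample(selected, count)
    return selected


# Hypothetical customer rows for illustration only.
rows = [
    {"region": "Beijing", "quarter": "Q1", "email": "fxsxd@163.com"},
    {"region": "Beijing", "quarter": "Q2", "email": "yueli@163.com"},
    {"region": "Shanghai", "quarter": "Q1", "email": "hongqi@163.com"},
]
beijing_q1 = acquire_target_set(rows, dimensions={"region": "Beijing", "quarter": "Q1"})
two_random = acquire_target_set(rows, count=2, seed=0)
```

Applying the dimension filter before sampling means a count limit draws only from rows that already satisfy the requested dimensions.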
S22: a separator in each data sample in the target data sample set is identified, and each data sample is separated into at least one data segment according to the separator.
Specifically, separators may be identified as follows: each character of a data sample is matched against a preset separator set, and the character is determined to be a separator if the separator set contains it. The separators may include, but are not limited to, "-", "@", "&", "+", "_", "|", ".", and the like. The user can also modify and update the preset separator set through the client, for example by adding "," or ";" to it. For example, in the string "fxsxd@163.com", "@" and "." are recognized as separators, and the string is split into three data segments: "fxsxd", "163", and "com".
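The character-by-character matching described above can be sketched in Python as follows. The default separator set mirrors the examples listed, and the names `DEFAULT_SEPARATORS` and `split_sample` are assumptions. Returning one more segment than separators, with empty segments where separators are adjacent, is a design choice of this sketch that keeps segments and separators easy to interleave later.

```python
from typing import FrozenSet, List, Tuple

# Assumed default separator set, taken from the examples in the text.
DEFAULT_SEPARATORS: FrozenSet[str] = frozenset({"-", "@", "&", "+", "_", "|", "."})


def split_sample(
    sample: str, separators: FrozenSet[str] = DEFAULT_SEPARATORS
) -> Tuple[List[str], List[str]]:
    """Return (data segments, separators) in the order they occur in `sample`."""
    segments: List[str] = []
    found: List[str] = []
    current: List[str] = []
    for ch in sample:
        if ch in separators:  # the character is in the separator set
            segments.append("".join(current))
            found.append(ch)
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))  # segment after the last separator
    return segments, found


print(split_sample("fxsxd@163.com"))  # (['fxsxd', '163', 'com'], ['@', '.'])
# A user-extended set, e.g. adding "," as described above:
print(split_sample("a,b", DEFAULT_SEPARATORS | {","}))  # (['a', 'b'], [','])
```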
S23: the data structure of the data sample is determined according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample.
Optionally, the data structure of the data sample may be determined according to the separator, the string length information and/or the data type of the data segment, and the arrangement order of the separator and the at least one data segment in the data sample.
For example, for the string "fxsxd@163.com", when both the string length and the data type of each data segment are considered, the data segments and separators appear in the order "fxsxd" (length 5, letters, denoted X), "@", "163" (length 3, digits, denoted Y), ".", "com" (length 3, letters, denoted X as above), and the resulting data structure may be "X(5)@Y(3).X(3)". When only the string length of each data segment is considered, the order is "fxsxd" (length 5), "@", "163" (length 3), ".", "com" (length 3), and the data structure may be "X(5)@X(3).X(3)". When only the data type of each data segment is considered, the order is "fxsxd" (letters, X), "@", "163" (digits, Y), ".", "com" (letters, X), and the data structure may be "X@Y.X". The data structure is not limited to these forms, which are merely examples.
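The three encodings in this example can be reproduced with a small Python sketch. The letter and digit codes X and Y come from the text; the code Z for segments that are neither purely alphabetic nor purely numeric, the separator set, and the names `segment_code` and `data_structure` are assumptions. Note that `str.isalpha` also accepts non-ASCII letters such as Chinese characters.

```python
import re
from typing import List

SEPARATORS = "-@&+_|."  # assumed separator set
_SPLIT = re.compile(f"([{re.escape(SEPARATORS)}])")


def segment_code(segment: str, use_type: bool = True, use_length: bool = True) -> str:
    """Encode one data segment by data type and/or string length."""
    if not use_type:
        code = "X"  # length-only mode writes every segment as X, as in the text
    elif segment.isalpha():
        code = "X"  # letters
    elif segment.isdigit():
        code = "Y"  # digits
    else:
        code = "Z"  # hypothetical code for mixed or empty segments
    return f"{code}({len(segment)})" if use_length else code


def data_structure(sample: str, use_type: bool = True, use_length: bool = True) -> str:
    """Interleave encoded segments with their separators in original order."""
    parts: List[str] = _SPLIT.split(sample)  # [segment, sep, segment, sep, ...]
    return "".join(
        part if i % 2 else segment_code(part, use_type, use_length)
        for i, part in enumerate(parts)
    )


print(data_structure("fxsxd@163.com"))                    # X(5)@Y(3).X(3)
print(data_structure("fxsxd@163.com", use_type=False))    # X(5)@X(3).X(3)
print(data_structure("fxsxd@163.com", use_length=False))  # X@Y.X
```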
S24: and generating analysis result information for characterizing the data distribution of the database according to the data structures of all the data samples in the target data sample set.
For example, statistics may be compiled over the data structures of all data samples, and analysis result information on the data distribution may then be generated. Optionally, this analysis result information may be fed back to the client for display, so that the user can see directly what the data in the database look like.
With the above method steps, a separator is identified in each data sample in the target data sample set, and the data sample is separated into at least one data segment according to the separator; the data structure of the data sample is then determined according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample; finally, analysis result information characterizing the data distribution of the database is generated according to the data structures of all data samples in the target data sample set. In this way, all data samples are analyzed, so the resulting data distribution information is more reliable; moreover, no manual intervention is needed, which improves the efficiency of data distribution analysis and reduces labor costs.
Optionally, as shown in FIG. 3, before S23 the method further includes:
S31: the data segments are cached one by one at different addresses of a first ordered linked list, and the separators one by one at different addresses of a second ordered linked list, so as to record the order in which the separators and the at least one data segment are arranged in the data sample.
An ordered linked list here is a linked list whose data are ordered from the head node to the tail, and each node can represent the cache address of one data item. S31 may be implemented as follows: the identified data segments are cached, in the order in which they were identified, into successive nodes starting from the head node of the first ordered linked list; and the identified separators are cached, in the order in which they were identified, into successive nodes starting from the head node of the second ordered linked list. The order of the data segments and separators is thus recorded by the position of each data segment in the first ordered linked list and of each separator in the second ordered linked list. For example, for the string "fxsxd@163.com", the data segments cached from the head node of the first ordered linked list onward are "fxsxd", "163", and "com", and the separators cached from the head node of the second ordered linked list onward are "@" and ".".
When the data structure of "fxsxd@163.com" needs to be determined, "fxsxd" is first taken from the first node of the first ordered linked list and recognized to generate its segment structure; "@" is then taken from the first node of the second ordered linked list; "163" is then taken from the second node of the first ordered linked list and recognized to generate its segment structure; and so on until the last data segment of the string. Finally, the data structure of the string is assembled from the retrieved pieces in the order in which they were taken out.
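A minimal sketch of S31 and of the alternating read-back just described, assuming a hand-rolled singly linked list (a Python `list` would serve equally well in practice). The class and function names, the separator set passed in, and the length-only segment encoder passed as `describe` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Set, Tuple


@dataclass
class Node:
    value: str
    next_node: Optional["Node"] = None


class OrderedLinkedList:
    """Singly linked list whose nodes keep insertion order from the head."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, value: str) -> None:
        node = Node(value)
        if self.tail is None:  # empty list: the new node becomes the head
            self.head = self.tail = node
        else:
            self.tail.next_node = node
            self.tail = node

    def __iter__(self) -> Iterator[str]:
        node = self.head
        while node is not None:
            yield node.value
            node = node.next_node


def cache_sample(
    sample: str, separators: Set[str]
) -> Tuple[OrderedLinkedList, OrderedLinkedList]:
    """S31: segments go to the first list, separators to the second, in order."""
    segments, seps = OrderedLinkedList(), OrderedLinkedList()
    current = ""
    for ch in sample:
        if ch in separators:
            segments.append(current)
            seps.append(ch)
            current = ""
        else:
            current += ch
    segments.append(current)  # always one more segment than separators
    return segments, seps


def rebuild_structure(
    segments: OrderedLinkedList, seps: OrderedLinkedList, describe: Callable[[str], str]
) -> str:
    """Read segment 1, separator 1, segment 2, ... and join their encodings."""
    seg_iter = iter(segments)
    parts = [describe(next(seg_iter))]
    for sep in seps:
        parts.append(sep)
        parts.append(describe(next(seg_iter)))
    return "".join(parts)


segs, seps = cache_sample("fxsxd@163.com", {"@", "."})
print(list(segs), list(seps))  # ['fxsxd', '163', 'com'] ['@', '.']
print(rebuild_structure(segs, seps, lambda s: f"X({len(s)})"))  # X(5)@X(3).X(3)
```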
Optionally, as shown in FIG. 4, S21 may be implemented as follows:
S211: non-string data in the database are identified according to the metadata information in the database.
In a database, data samples are usually stored in tables, and each row or column of a table is labeled with metadata information. For example, for the customer data sample "Wang Guo, 13256238792, fxsxd@163.com, Haichi District, Beijing, May 1, 2019" in the database, the metadata of the individual fields are name, telephone, email, residence, and date, respectively. Based on the metadata information, non-string data in the database can therefore be determined; such non-string data include dates, timestamps, large fields (large objects), and the like.
S212: the non-string data are removed from the database to obtain the target data sample set.
For example, in the example in S211, the data in a row or column whose metadata information is a date are not strings, so no data structure can be recognized from them. To avoid interfering with the final analysis result information characterizing the data distribution of the database, rows or columns whose metadata information is a date are removed, yielding the target data sample set.
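S211 and S212 can be sketched as a filter over column metadata. This assumes each table is described by a mapping from column name to declared type and that rows are dictionaries; the set of type names treated as non-string (`NON_STRING_TYPES`) and the function name are hypothetical and would depend on the actual database.

```python
from typing import Dict, List, Optional

# Hypothetical declared types treated as non-string: dates, timestamps, large objects.
NON_STRING_TYPES = {"DATE", "TIME", "TIMESTAMP", "BLOB", "CLOB"}


def target_sample_set(
    columns: Dict[str, str], rows: List[Dict[str, Optional[str]]]
) -> List[str]:
    """Keep values only from columns whose metadata marks them as strings."""
    # S211: use the column metadata to find which columns hold string data.
    string_cols = [name for name, col_type in columns.items()
                   if col_type.upper() not in NON_STRING_TYPES]
    # S212: values from non-string columns are dropped; the rest form the target set.
    return [row[c] for row in rows for c in string_cols if row.get(c) is not None]


columns = {"name": "VARCHAR", "telephone": "VARCHAR", "email": "VARCHAR",
           "residence": "VARCHAR", "date": "DATE"}
rows = [{"name": "Wang Guo", "telephone": "13256238792", "email": "fxsxd@163.com",
         "residence": "Haichi District, Beijing", "date": "2019-05-01"}]
print(target_sample_set(columns, rows))
```

Filtering by declared column type rather than by inspecting values keeps the step cheap: whole columns are skipped without reading their contents.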
Optionally, as shown in FIG. 5, S24 may be implemented as follows:
S241: the data samples are classified according to a preset classification rule.
The preset classification rule may group data samples with identical data structures into one class. For example, among the data samples "fxsxd@163.com", "yueli@163.com", and "hongqi@163.com", the first two both have the data structure "X(5)@X(3).X(3)" and are grouped into one class, while the last has the data structure "X(6)@X(3).X(3)" and forms another class. Alternatively, the preset classification rule may group data samples that contain the same separators in the same order into one class. For example, among the data samples "fxsxd@163.com", "hongqi@163.com", and "010-…", the first two both contain the separators "@" and "." in the same order and are grouped into one class, while the separators of the last are "-" and "-", so "010-…" is placed in another class.
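Both classification rules can be sketched in Python. The length-only encoding matches the example structures above; the separator set and function names are assumptions, and the telephone-style sample "010-1234-5678" is a made-up stand-in for the truncated example in the text.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

SEPARATORS = "-@&+_|."  # assumed separator set
_SEP_CLASS = f"[{re.escape(SEPARATORS)}]"
_SPLIT = re.compile(f"({_SEP_CLASS})")


def length_structure(sample: str) -> str:
    """Length-only data structure, e.g. 'fxsxd@163.com' -> 'X(5)@X(3).X(3)'."""
    parts = _SPLIT.split(sample)  # odd indexes hold separators
    return "".join(p if i % 2 else f"X({len(p)})" for i, p in enumerate(parts))


def classify_by_structure(samples: List[str]) -> Dict[str, List[str]]:
    """Rule 1: samples with identical data structures form one class."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for s in samples:
        classes[length_structure(s)].append(s)
    return dict(classes)


def classify_by_separators(samples: List[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Rule 2: samples with the same separators in the same order form one class."""
    classes: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for s in samples:
        classes[tuple(re.findall(_SEP_CLASS, s))].append(s)
    return dict(classes)


print(classify_by_structure(["fxsxd@163.com", "yueli@163.com", "hongqi@163.com"]))
print(classify_by_separators(["fxsxd@163.com", "hongqi@163.com", "010-1234-5678"]))
```

Rule 2 is the coarser of the two: it ignores segment lengths and types, so it groups all "@"/"." addresses together regardless of the user-name length.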
S242: the data samples of each class are counted, and the proportion of each class is determined.
For example, for the data samples in S241, the proportion of samples with the data structure "X(5)@X(3).X(3)" is 66.7%, and the proportion with the data structure "X(6)@X(3).X(3)" is 33.3%.
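The counting in S242 amounts to a frequency table over the class keys. A minimal sketch, assuming each sample has already been reduced to its data structure string; the function name is hypothetical and rounding to one decimal place is a presentation choice of this sketch.

```python
from collections import Counter
from typing import Dict, List


def class_proportions(structures: List[str]) -> Dict[str, float]:
    """Percentage of samples per data structure, rounded to one decimal place."""
    counts = Counter(structures)
    total = len(structures)
    return {s: round(100 * n / total, 1) for s, n in counts.most_common()}


print(class_proportions(["X(5)@X(3).X(3)", "X(5)@X(3).X(3)", "X(6)@X(3).X(3)"]))
# {'X(5)@X(3).X(3)': 66.7, 'X(6)@X(3).X(3)': 33.3}
```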
S243: analysis result information characterizing the data distribution of the database is generated according to the data structure and the proportion of each class of data samples.
For example, the analysis result information generated for the data samples in S242 is "X(5)@X(3).X(3) 66.7%; X(6)@X(3).X(3) 33.3%". From this information the user can quickly and accurately understand the data in the database; for example, it shows that these strings are email addresses. In another case, suppose the user finds "X(6)@X(3).X(6) 3%" in the analysis result information. Since the email suffixes here are all ".com", the last data segment should have the structure X(3); the entry "X(6)@X(3).X(6) 3%" therefore indicates that 3% of the data samples have an incorrectly recorded email suffix.
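The report format and the reasoning about the email suffix can be sketched as follows. This assumes the proportions are already available as percentages keyed by data structure; the function names are hypothetical, the expected final segment `X(3)` encodes the text's observation that the suffixes are ".com", and the 64/33/3 split in the usage example is invented purely for illustration.

```python
from typing import Dict, List


def format_result(shares: Dict[str, float]) -> List[str]:
    """One line of analysis result information per data structure."""
    return [f"{structure} {share}%" for structure, share in shares.items()]


def suspect_share(shares: Dict[str, float], expected_last: str = "X(3)") -> float:
    """Total percentage of samples whose last segment differs from the expected one."""
    total = 0.0
    for structure, share in shares.items():
        if not structure.endswith(expected_last):
            total += share
    return total


shares = {"X(5)@X(3).X(3)": 64.0, "X(6)@X(3).X(3)": 33.0, "X(6)@X(3).X(6)": 3.0}
print(format_result(shares))
print(suspect_share(shares))  # 3.0
```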
An embodiment of the present disclosure further provides a data distribution analysis apparatus for a database, which implements the data distribution analysis method for a database provided by the method embodiments above. The basic principle and technical effects of this apparatus are the same as those of the embodiments above; for brevity, refer to the corresponding content of those embodiments for anything not mentioned here. As shown in FIG. 6, the apparatus 600 includes a data acquisition module 601, a data identification module 602, a data structure determination module 603, and a result generation module 604.
The data acquisition module 601 is configured to acquire a set of target data samples from a database.
It is understood that the data acquisition module 601 may perform S21 described above.
The data identification module 602 is configured to identify a separator in each data sample in the target set of data samples and separate the data sample into at least one data segment according to the separator.
It is to be appreciated that the data identification module 602 may perform S22 described above.
The data structure determination module 603 is configured to determine the data structure of the data sample according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample.
It is to be understood that the data structure determining module 603 may perform S23 described above.
The result generation module 604 is configured to generate analysis result information for characterizing data distribution of the database according to the data structure of all data samples in the target data sample set.
It is understood that the result generation module 604 may perform S24 described above.
With this apparatus, a separator is identified in each data sample in the target data sample set, and the data sample is separated into at least one data segment according to the separator; the data structure of the data sample is then determined according to the separator, the at least one data segment, and the order in which the separator and the at least one data segment are arranged in the data sample; finally, analysis result information characterizing the data distribution of the database is generated according to the data structures of all data samples in the target data sample set. In this way, all data samples are analyzed, so the resulting data distribution information is more reliable; moreover, no manual intervention is needed, which improves the efficiency of data distribution analysis and reduces labor costs.
Optionally, as shown in fig. 7, the apparatus 600 further includes:
a sequential recording module 701, configured to buffer the data segments one by one at different addresses of a first ordered linked list and the separators one by one at different addresses of a second ordered linked list, so as to record the arrangement order of the separator and the at least one data segment in the data sample.
It is to be understood that the sequential recording module 701 may perform S31 described above.
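The buffering performed by the sequential recording module 701 (S31) might look like the following sketch. The text does not specify how the two ordered linked lists are later interleaved to recover the arrangement order, so the sketch assumes each node also stores the item's position in the sample. The names Node, OrderedLinkedList, record_order and arrangement are hypothetical, and non-alphanumeric characters are assumed to be separators.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    position: int                  # assumed: index of the item in the sample's token order
    value: str                     # the data segment or separator text
    next: Optional["Node"] = None


class OrderedLinkedList:
    """Minimal singly linked list that preserves insertion order."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, position: int, value: str) -> None:
        node = Node(position, value)       # each item is cached in its own node
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def __iter__(self) -> Iterator[Node]:
        node = self.head
        while node is not None:
            yield node
            node = node.next


def record_order(sample: str) -> tuple[OrderedLinkedList, OrderedLinkedList]:
    """Cache segments in a first list and separators in a second list."""
    segments, separators = OrderedLinkedList(), OrderedLinkedList()
    position, current = 0, ""
    for ch in sample:
        if ch.isalnum():                   # assumed rule: non-alphanumerics are separators
            current += ch
            continue
        if current:
            segments.append(position, current)
            position += 1
            current = ""
        separators.append(position, ch)
        position += 1
    if current:
        segments.append(position, current)
    return segments, separators


def arrangement(segments: OrderedLinkedList, separators: OrderedLinkedList) -> list[str]:
    """Rebuild the original arrangement order by merging on position."""
    nodes = sorted([*segments, *separators], key=lambda n: n.position)
    return [n.value for n in nodes]


segs, seps = record_order("2019-05-27")
print([n.value for n in segs])     # ['2019', '05', '27']
print([n.value for n in seps])     # ['-', '-']
print(arrangement(segs, seps))     # ['2019', '-', '05', '-', '27']
```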
Optionally, each data segment includes character string length information and/or a data type, and the data structure determining module 603 is specifically configured to determine the data structure of the data sample according to the separator, the character string length information and/or the data type, and the arrangement order of the separator and the at least one data segment in the data sample.
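When each data segment carries character string length information and/or a data type, the structure can distinguish samples that share separators but differ in their segments. The sketch assumes hypothetical type codes (N for digits only, A for letters only, M for mixed) followed by the segment length, with non-alphanumeric characters treated as separators. The function names are illustrative only.

```python
def describe_segment(segment: str) -> str:
    """Hypothetical type codes: N = digits only, A = letters only, M = mixed."""
    if segment.isdigit():
        kind = "N"
    elif segment.isalpha():
        kind = "A"
    else:
        kind = "M"
    return f"{kind}{len(segment)}"      # data type followed by string length


def typed_structure(sample: str) -> str:
    """Structure from separators, segment type/length, and their order."""
    parts: list[str] = []
    current = ""
    for ch in sample:
        if ch.isalnum():                 # assumed: separators are non-alphanumeric
            current += ch
            continue
        if current:
            parts.append(describe_segment(current))
            current = ""
        parts.append(ch)
    if current:
        parts.append(describe_segment(current))
    return "".join(parts)


print(typed_structure("2019-05-27"))          # N4-N2-N2
print(typed_structure("user01@example.com"))  # M6@A7.A3
```

With these codes, "2019-05-27" (N4-N2-N2) and "2019-5-27" (N4-N1-N2) fall into different structures even though their separators are identical.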
Optionally, as shown in fig. 8, the data acquisition module 601 includes:
a character recognition sub-module 801, configured to identify non-character-string data in the database according to the metadata information of the database.
It is to be understood that the character recognition sub-module 801 may perform S211 described above.
a character eliminating sub-module 802, configured to eliminate the non-character-string data from the database to obtain the target data sample set.
It is understood that the character eliminating sub-module 802 may perform S212 described above.
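The metadata-based elimination in S211 and S212 could be sketched against SQLite using Python's standard sqlite3 module, whose PRAGMA table_info statement reports each column's declared type. Treating columns with SQLite TEXT affinity as the character-string columns is an assumption made for this example; other databases expose column types through their own catalogs (for instance information_schema.columns). Table and column names are interpolated directly into the SQL, so the sketch assumes trusted identifiers; string_columns and target_samples are hypothetical names.

```python
import sqlite3


def string_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the columns whose declared type has TEXT affinity in SQLite.

    SQLite's rules: a declared type containing "INT" gets INTEGER affinity;
    otherwise one containing "CHAR", "CLOB" or "TEXT" gets TEXT affinity.
    """
    columns = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, decl_type, *_rest in conn.execute(f"PRAGMA table_info({table})"):
        t = (decl_type or "").upper()
        if "INT" not in t and any(k in t for k in ("CHAR", "CLOB", "TEXT")):
            columns.append(name)
    return columns


def target_samples(conn: sqlite3.Connection, table: str) -> list[str]:
    """Collect non-NULL values from the string columns only."""
    samples = []
    for col in string_columns(conn, table):
        for (value,) in conn.execute(f'SELECT "{col}" FROM {table}'):
            if value is not None:
                samples.append(value)
    return samples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name VARCHAR(20), born TEXT, score REAL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "alice", "1990-01-02", 9.5), (2, "bob", "1985-12-30", 7.0)],
)
print(string_columns(conn, "users"))   # ['name', 'born']
```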
Optionally, as shown in fig. 9, the result generation module 604 includes:
a classification sub-module 901, configured to classify the data samples according to a preset classification rule.
It is understood that the classification submodule 901 may perform the above S241.
a proportion determining sub-module 902, configured to count the number of data samples of each type and determine the proportion of each type of data sample.
It is to be appreciated that the proportion determining sub-module 902 may perform S242 described above.
a result generation sub-module 903, configured to generate analysis result information characterizing the data distribution of the database according to the data structure and the proportion of each type of data sample.
It is understood that the result generation sub-module 903 may perform S243 described above.
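The classification, counting and proportion steps S241 to S243 can be sketched with collections.Counter. The preset classification rule here is hypothetical (every run of ASCII letters and digits is replaced by S, so samples are grouped by their separator pattern), and the report format is likewise only illustrative.

```python
import re
from collections import Counter


def classify(sample: str) -> str:
    """Hypothetical preset rule: each ASCII letter/digit run becomes "S"."""
    return re.sub(r"[A-Za-z0-9]+", "S", sample)


def analyze(samples: list[str]) -> list[tuple[str, int, float]]:
    """Return (class, count, proportion) for each class, largest first."""
    if not samples:
        return []
    counts = Counter(classify(s) for s in samples)
    total = len(samples)
    # most_common() sorts by count; ties keep first-encountered order
    return [(cls, n, n / total) for cls, n in counts.most_common()]


def report(samples: list[str]) -> str:
    """Render the analysis result as one line per class."""
    return "\n".join(f"{cls}: {n} ({share:.1%})" for cls, n, share in analyze(samples))


samples = ["2019-05-27", "2020-01-01", "2019/05/27", "a@b.com"]
print(report(samples))
# S-S-S: 2 (50.0%)
# S/S/S: 1 (25.0%)
# S@S.S: 1 (25.0%)
```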
A fourth aspect of an embodiment of the present disclosure provides an electronic device, including:
a memory having a computer program stored thereon;
a processor configured to execute the computer program in the memory to implement the steps of the data distribution analysis method for a database provided in the above embodiments.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a block diagram illustrating an electronic device 1000 according to an exemplary embodiment. For example, the electronic device 1000 may be provided as a server. Referring to fig. 10, the electronic device 1000 includes one or more processors 1022 and a memory 1032 for storing computer programs executable by the processor 1022. The computer programs stored in the memory 1032 may include one or more modules, each corresponding to a set of instructions. The processor 1022 may be configured to execute the computer program to perform the data distribution analysis method of the database described above. For example, the processor 1022 may perform the following operations: acquiring a target data sample set from a database; identifying a separator in each data sample in the target data sample set and separating the data sample into at least one data segment according to the separator; determining the data structure of the data sample according to the separator, the at least one data segment, and the order in which they are arranged in the data sample; generating analysis result information characterizing the data distribution of the database according to the data structures of all data samples in the target data sample set; and outputting the analysis result information.
In addition, the electronic device 1000 may include a power component 1026 and a communication component 1050. The power component 1026 may be configured to perform power management for the electronic device 1000, and the communication component 1050 may be configured to enable wired or wireless communication for the electronic device 1000. The electronic device 1000 may also include input/output (I/O) interfaces 1058. The electronic device 1000 may operate based on an operating system stored in the memory 1032, such as Windows Server, Mac OS X™, Unix™, Linux, and the like.
In another exemplary embodiment, there is also provided a computer readable storage medium including program instructions which, when executed by a processor, implement the steps of the above-described method of analyzing data distribution of a database. For example, the computer readable storage medium may be the memory 1032 comprising program instructions executable by the processor 1022 of the electronic device 1000 to perform the method of analyzing the data distribution of the database described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the present disclosure is not limited to the specific details of these embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical concept, and all such simple modifications fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the possible combinations are not described separately in the present disclosure.
In addition, the various embodiments of the present disclosure may be combined arbitrarily, and such combinations should likewise be regarded as disclosed by the present disclosure, as long as they do not depart from its spirit.

Claims (8)

1. A method for analyzing data distribution of a database, the method comprising:
acquiring a target data sample set from a database;
identifying a separator in each data sample in the target data sample set and separating the data sample into at least one data segment according to the separator;
determining a data structure of the data sample according to the separator, the at least one data segment, and the arrangement order of the separator and the at least one data segment in the data sample;
generating analysis result information for representing data distribution of the database according to data structures of all data samples in the target data sample set;
wherein, before the determining of the data structure of the data sample according to the separator, the at least one data segment, and their arrangement order in the data sample, the method further comprises:
caching the data segments one by one at different addresses of a first ordered linked list and caching the separators one by one at different addresses of a second ordered linked list, so as to record the arrangement order of the separator and the at least one data segment in the data sample.
2. The method according to claim 1, wherein each data segment comprises character string length information and/or a data type, and the determining of the data structure of the data sample according to the separator, the at least one data segment, and the arrangement order of the separator and the at least one data segment in the data sample comprises:
determining the data structure of the data sample according to the separator, the character string length information and/or the data type, and the arrangement order of the separator and the at least one data segment in the data sample.
3. The method according to claim 1 or 2, wherein the generating analysis result information for characterizing the data distribution of the database according to the data structures of all the data samples in the target data sample set comprises:
classifying the data samples according to a preset classification rule;
counting the number of data samples of each type and determining the proportion of each type of data sample; and
generating analysis result information for characterizing the data distribution of the database according to the data structure and the proportion of each type of data sample.
4. The method according to claim 1 or 2, wherein the acquiring of a target data sample set from a database comprises:
identifying non-character-string data in the database according to the metadata information of the database; and
eliminating the non-character-string data from the database to obtain the target data sample set.
5. An apparatus for analyzing data distribution of a database, the apparatus comprising:
a data acquisition module configured to acquire a set of target data samples from a database;
a data identification module configured to identify a separator in each data sample in the target data sample set and separate the data sample into at least one data segment according to the separator;
a data structure determination module configured to determine a data structure of the data sample according to the separator, the at least one data segment, and the arrangement order of the separator and the at least one data segment in the data sample;
a result generation module configured to generate analysis result information for characterizing data distribution of the database according to data structures of all data samples in the target data sample set;
wherein the apparatus further comprises a sequential recording module configured to buffer the data segments one by one at different addresses of a first ordered linked list and the separators one by one at different addresses of a second ordered linked list, so as to record the arrangement order of the separator and the at least one data segment in the data sample.
6. The apparatus according to claim 5, wherein each of the data segments comprises character string length information and/or a data type, and the data structure determining module is specifically configured to determine the data structure of the data sample according to the separator, the character string length information and/or the data type, and an arrangement order of the separator and the at least one data segment in the data sample.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
8. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 4.
CN201910447214.6A 2019-05-27 2019-05-27 Data distribution analysis method and device of database, electronic equipment and storage medium Active CN110263082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910447214.6A CN110263082B (en) 2019-05-27 2019-05-27 Data distribution analysis method and device of database, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110263082A CN110263082A (en) 2019-09-20
CN110263082B true CN110263082B (en) 2022-04-29

Family

ID=67915522

Country Status (1)

Country Link
CN (1) CN110263082B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261244A (en) * 2020-01-19 2020-06-09 戴纳智慧医疗科技有限公司 Sample information acquisition and storage system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1920858A (en) * 2005-08-26 2007-02-28 毛山 Two-dimensional bar code based information transfer method and encoding/decoding device
CN105335652A (en) * 2015-11-24 2016-02-17 小米科技有限责任公司 Debug method and debug device of application process of mobile terminal
CN106886417A (en) * 2017-03-09 2017-06-23 华东师范大学 A kind of universal parallel method for digging of linear temporal specification
CN107038118A (en) * 2017-03-28 2017-08-11 福建星云电子股份有限公司 The universal process method that a kind of distinct electronic apparatuses assembling is tested
CN109614382A (en) * 2018-12-11 2019-04-12 杭州数梦工场科技有限公司 A kind of the log dividing method and device of application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant