CN112199048B - Data reading method, system, device and medium - Google Patents

Data reading method, system, device and medium Download PDF

Info

Publication number
CN112199048B
CN112199048B (application CN202011127396.8A)
Authority
CN
China
Prior art keywords
disk
data
pod
service
ceph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011127396.8A
Other languages
Chinese (zh)
Other versions
CN112199048A (en)
Inventor
贺宁
魏程琛
雷强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202011127396.8A priority Critical patent/CN112199048B/en
Publication of CN112199048A publication Critical patent/CN112199048A/en
Application granted granted Critical
Publication of CN112199048B publication Critical patent/CN112199048B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 3/0604: Improving or facilitating administration, e.g. storage management
    • G06F 3/0656: Data buffering arrangements
    • G06F 3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F 16/2453: Query optimisation
    • G06F 16/24552: Database cache management
    • G06F 16/2471: Distributed queries


Abstract

The application provides a data reading method, system, device, and medium. The method comprises the following steps: acquiring data and storing it, via micro-services, to both a local disk and a ceph shared disk, wherein the ceph shared disk stores the full data set in a distributed file system and the partial data stored on each local disk is used exclusively by its corresponding micro-service; when a micro-service receives a query request, judging whether the service state of each pod is normal; when a pod's service state is normal, distributing the query request to that pod to query its local disk, then aggregating and sorting to obtain the query result; and when a pod's service state is abnormal, querying the partition data corresponding to that pod on the ceph shared disk, then aggregating and sorting to obtain the query result. Mixing the ceph shared disk with local disks improves system stability; each local disk is used exclusively by a single micro-service, improving data reading efficiency; and each pod queries only its portion of the data rather than the full data set, dividing the work and improving overall query speed.

Description

Data reading method, system, device and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data reading method, system, device, and medium.
Background
With the development of storage technology, distributed systems are widely used. A distributed system includes a master node and multiple slave nodes storing the same data; a terminal can write data into the distributed system through the master node and read the data stored on any of the nodes.
However, distributed micro-cloud services need to share feature data, and in the prior art this feature data is mostly stored in shared ceph storage, which has the advantages of high reliability and of being accessible to all services on all nodes. In actual use, however, continuously ingested data accumulates into mass data, causing data redundancy; meanwhile, because the disks on different nodes use different file systems, data retrieval becomes very slow.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present application is to provide a data reading method, system, device and medium, which are used to solve the problem of low efficiency of micro-cloud data reading in the prior art.
To achieve the above and other related objects, the present application provides a data reading method, including:
step S1, acquiring data and storing it, via micro-services, to a local disk and a ceph shared disk, wherein the ceph shared disk stores the full data set in a distributed file system, and the partial data stored on each local disk is used exclusively by its corresponding micro-service;
step S2, when a micro-service receives a query request, judging whether the service state of each pod is normal; when a pod's service state is normal, distributing the query request to that pod to query its local disk, then aggregating and sorting to obtain the query result; and when a pod's service state is abnormal, querying the partition data corresponding to that pod on the ceph shared disk, then aggregating and sorting to obtain the query result.
An object of the present application is to provide a data reading system, including:
a data storage module, used for acquiring data and storing it, via micro-services, to a local disk and a ceph shared disk, wherein the ceph shared disk stores the full data set in a distributed file system, and the partial data stored on each local disk is used exclusively by its corresponding micro-service;
a query request module, used for judging whether the service state of each pod is normal when a micro-service receives a query request; when a pod's service state is normal, distributing the query request to that pod to query its local disk, then aggregating and sorting to obtain the query result; and when a pod's service state is abnormal, querying the partition data corresponding to that pod on the ceph shared disk, then aggregating and sorting to obtain the query result.
Another object of the present application is to provide an electronic device, comprising:
one or more processing devices;
a memory for storing one or more programs; when the one or more programs are executed by the one or more processing devices, the one or more processing devices are caused to perform the data reading method.
A further object of the present application is to provide a computer-readable storage medium storing a computer program which, when executed, causes a computer to perform the data reading method.
As described above, the data reading method, system, device, and medium of the present application have the following beneficial effects:
mixing the ceph shared disk with local disks improves system stability; each local disk is used exclusively by a single micro-service, improving data reading efficiency; and during a query, each pod queries only its portion of the data rather than the full data set, dividing the work and improving overall query speed.
Drawings
FIG. 1 is a flow chart of a data reading method provided in the present application;
FIG. 2 illustrates a flow chart for adding feature storage to local storage provided herein;
FIG. 3 shows a flow chart of a local store query provided herein;
FIG. 4 shows a flow chart for local storage daily data validation provided herein;
FIG. 5 shows a feature drift synchronization flow chart provided for the present application;
FIG. 6 is a block diagram of a data reading system according to the present application;
fig. 7 shows a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present application, and the drawings only show the components related to the present application and are not drawn according to the number, shape and size of the components in actual implementation, and the type, number and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, a flow chart of the data reading method provided in the present application, the method comprising:
step S1, acquiring data and storing it, via micro-services, to a local disk and a ceph (distributed file system) shared disk, wherein the ceph shared disk stores the full data set in a distributed file system, and the partial data stored on each local disk is used exclusively by its corresponding micro-service;
wherein step S1 specifically comprises: when the corresponding topic is newly created in the middleware kafka, dividing each topic into 60 partitions;
starting the micro-service and calling the micro cloud interface to confirm the number of pods the micro-service is expected to start, wherein the pods of each micro-service are named pod0 to pod(N-1); and calculating the partition count and partition numbers corresponding to each pod, where N is a natural number greater than 1;
judging whether a spare disk exists among the local disks corresponding to the micro-service; if so, formatting the spare disk, mounting the pod on the corresponding disk, and synchronizing the data stored in the ceph shared disk to that disk; if no spare disk exists, calling the micro cloud interface to suspend the pod and letting it drift to another server for secondary scheduling, so that the pod can be mounted there;
parsing the data corresponding to the micro-service to obtain structured data and feature data, wherein the partition in which the structured data and feature data are stored depends on the load of the middleware kafka itself, and kafka updates the offset of the corresponding partition after storage is finished;
monitoring the offset of the corresponding topic in the middleware kafka; when the monitored offset is not the latest offset, persisting the data according to the storage time or storage capacity of the memory and writing it to the local disk and the ceph shared disk; and when the monitored offset is the latest offset, doing nothing.
Step S2, when a micro-service receives a query request, judging whether the service state of each pod is normal; when a pod's service state is normal, distributing the query request to that pod to query its local disk, then aggregating and sorting to obtain the query result; and when a pod's service state is abnormal, querying the partition data corresponding to the abnormal pod on the ceph shared disk, then aggregating and sorting to obtain the query result.
In this embodiment, mixing the ceph shared disk with local disks improves system stability; each local disk is used exclusively by a single micro-service, improving data reading efficiency; and during a query, each pod queries only its portion of the data rather than the full data set, dividing the work and improving overall query speed.
Referring to fig. 2, for the local storage added feature storage flowchart provided in the present application, step S1 includes:
step S1.1, when the corresponding topic is newly created in the middleware kafka, dividing each topic into 60 partitions (the partition count is fixed at creation time, with replicas = 3 and partitions = 60), the partitions being numbered 0 to 59;
step 1.2, during micro-service startup, calling the micro cloud interface to detect the number of pods to be started for the micro-service, wherein the pods of each micro-service are named pod0 to pod(N-1); calculating the partition count and partition numbers corresponding to each pod, and thereby the correspondence between pods and partitions, where N is a natural number greater than 1.
For example, the per-pod count is the partition count divided by the pod count, rounded down: M = 60 / N, where N is the number of pods to start and M is the number of partitions per service. The remainder A = 60 mod N goes to the last pod, so the last pod (pod(N-1)) holds B = M + A partitions, while pods 0 through N-2 each hold M partitions, assigned in order starting from partition 0. After the calculation is finished, the correspondence between pods and partitions is written directly into the partition.aux file under the /home path of the ceph shared disk storage.
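The arithmetic above can be sketched in Python; `assign_partitions` is a hypothetical helper, not code from the patent, and the 60-partition count comes from step S1.1:

```python
def assign_partitions(num_partitions: int = 60, num_pods: int = 4) -> dict:
    """Assign partitions to pods: pods 0..N-2 each get M = 60 // N
    partitions; the last pod also takes the remainder A = 60 mod N,
    for B = M + A partitions in total."""
    m = num_partitions // num_pods
    mapping, start = {}, 0
    for pod in range(num_pods):
        count = m if pod < num_pods - 1 else m + num_partitions % num_pods
        mapping[pod] = list(range(start, start + count))
        start += count
    return mapping

# With N = 4, each pod gets exactly 15 partitions; with N = 7,
# pods 0-5 get 8 each and pod 6 gets 8 + 4 = 12.
```

In practice the resulting mapping would be serialized into the partition.aux file described above.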
Step 1.3, during service startup, detecting whether a spare disk exists among the local disks by using lsblk to check whether any storage device in the server has an empty MOUNTPOINT; if such a device exists, formatting it (file type EXT4) and mounting the directory /home/podN (N being the pod number) on the corresponding disk; then detecting whether data exists in the ceph shared disk and, if so, synchronizing it to the local disk via rsync. If no spare disk exists among the local disks, refer to step 4.5 (the pod hung on the current server drifts so that it can be mounted on another node).
Step 1.4, after the service is started, the analysis micro-service parses the picture data into structured data and feature data and stores them in the middleware kafka; which partition they are stored in is determined by kafka's load balancing, and the offset corresponding to the partition is updated after kafka finishes storing;
step 1.5, the view library microserver acquires the offset of kafka once per second, and if the offset is detected to be not the latest, acquires the corresponding data in kafka (kafka itself records the offset consumed by each group and the latest offset), and after acquiring the latest data, stores the data in the memory, and persists the data once when the number of the memory reaches a preset number (e.g., 10000) or when the time in the memory reaches a preset time (e.g., 5 minutes).
And step 1.6, during persistence the data is stored in two copies: one on the ceph shared disk and one on the local disk. Each piece of data is persisted according to the partition it was obtained from. On the ceph shared disk, the data is written directly into the file under the corresponding partition; for the local disk, the storage positions recorded in the ceph shared disk and the file's partition are read first.
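A minimal sketch of this dual-write persistence step, using the 10000-record and 5-minute thresholds from step 1.5; the class name, file layout, and paths are illustrative assumptions, not the patent's implementation:

```python
import os
import time


class DualWriteBuffer:
    """Buffer records in memory; on reaching max_records or max_age
    seconds, flush each record to a per-partition file on BOTH the
    ceph shared disk and the local disk (paths are hypothetical)."""

    def __init__(self, ceph_root, local_root, max_records=10000, max_age=300):
        self.ceph_root, self.local_root = ceph_root, local_root
        self.max_records, self.max_age = max_records, max_age
        self.buffer = []              # list of (partition, record) pairs
        self.last_flush = time.time()

    def add(self, partition, record):
        self.buffer.append((partition, record))
        if (len(self.buffer) >= self.max_records
                or time.time() - self.last_flush >= self.max_age):
            self.flush()

    def flush(self):
        # Write every buffered record to both storage roots, grouped by
        # the partition it was consumed from, then clear the memory copy.
        for root in (self.ceph_root, self.local_root):
            for partition, record in self.buffer:
                path = os.path.join(root, f"partition-{partition}.dat")
                with open(path, "a") as f:
                    f.write(record + "\n")
        self.buffer.clear()
        self.last_flush = time.time()
```

The same flush doubles as the "delete the data in memory" step mentioned in the worked example later in the description.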
In this embodiment, mixing the ceph shared disk with local disks improves system stability, and actively identifying feature resources, traversing all nodes, and dynamically starting services further improve the system's stability and availability; each local disk is used exclusively by a single micro-service, improving data reading efficiency.
Referring to fig. 3, a flow chart of a local storage query provided by the present application includes:
when a micro-service receives a query request, judging whether the service state of each pod is normal; when a pod's service state is normal, distributing the query request to that pod to query its local disk, with the pod that distributed the query request aggregating and sorting to obtain the query result; and when a pod's service state is abnormal, the pod that distributed the query request queries the partition data corresponding to the abnormal pod on the ceph shared disk, then aggregates and sorts to obtain the query result.
Step 2.1, when the FDS (Flex Data Services) micro-service receives a query request, it calls the micro cloud interface to judge whether the running state of each pod is normal; if every pod's running state is normal, it calls the query interface of each micro-service, distributes the query request to the other micro-services in normal state, obtains the query results from the local disks of the corresponding micro-services, and aggregates and sorts them to obtain the final query result.
And 2.2, if a service's running state is abnormal, based on the pod number(s) in abnormal state returned by the micro cloud, the partition.aux file recorded under the /home path is read to determine the partition numbers corresponding to the abnormal pods.
Further, the partition data corresponding to the abnormal-state pods is stored in the ceph shared disk, with the feature files stored in 60 separate partitions.
And 2.3, if a pod's state is abnormal, the pod that distributed the request queries the corresponding partition data on its behalf and finally aggregates the results. In addition, the nodes in normal state have their query interfaces called via the distributed query request; the queried micro-services query their local disks and then return the data. After the query is finished, the data is aggregated and returned by the service that distributed the request.
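The scatter-gather behaviour of steps 2.1 to 2.3 can be sketched as follows. The callbacks `query_local` and `query_ceph` stand in for the local-disk query and the ceph-partition fallback, and sorting by a `score` field is an assumption (the patent only says the results are aggregated and sorted):

```python
def scatter_gather_query(query, pods, pod_status, query_local, query_ceph,
                         partition_map):
    """Distribute a query: pods in normal state answer from their local
    disks; partitions owned by abnormal pods are read from the ceph
    shared disk by the dispatching pod. Results are merged and sorted."""
    results = []
    for pod in pods:
        if pod_status[pod] == "normal":
            # Normal pod: forward the query to its local-disk index.
            results.extend(query_local(pod, query))
        else:
            # Abnormal pod: read its partitions from ceph on its behalf.
            for partition in partition_map[pod]:
                results.extend(query_ceph(partition, query))
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

With 60 partitions spread over the pods, each branch touches only a slice of the full data set, which is the divide-and-conquer effect the embodiment claims.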
In this embodiment, each pod queries only part of the data rather than the full data set, dividing the work and improving overall query speed.
Referring to fig. 4, a flow chart for confirming local storage daily data provided by the present application includes:
monitoring whether the background service time reaches a preset time; when it does, detecting whether the sizes of the files stored on the local disk corresponding to the micro-service and on the ceph shared disk are consistent;
when the file sizes on the local disk and the ceph shared disk are detected to be inconsistent, synchronizing the corresponding partition data of the ceph shared disk to the local disk;
and when the file sizes on the local disk and the ceph shared disk are consistent, doing nothing.
In this embodiment, the local storage daily confirmation procedure specifically includes:
s3.1, monitoring time by using a micro-service background; when the monitoring time reaches the preset time, judging the sizes of the feature files corresponding to the local disks corresponding to the services and the ceph shared disk;
For example, the background time monitor checks whether the server time is 3 a.m.; when it is, each corresponding micro-service judges whether the file sizes on its local disk are consistent with those on the ceph shared disk.
S3.2, for each file whose size needs to be judged, recording the position of its file pointer; detecting whether the pointer positions of the files under the corresponding partition on the local disk and on the ceph shared disk are the same; if the pointer positions are the same, the file sizes are consistent; if they differ, the file sizes are inconsistent.
S3.3, if the file sizes are consistent, the files are not synchronized; if they are inconsistent, the rsync (remote file synchronization tool) method is called to synchronize them.
For example, rsync is called for file synchronization with the ceph shared storage: if the local disk's file is larger than the file in the ceph shared storage, the local file is synchronized to the ceph shared storage, with the feature files stored in 60 separate partitions; and if the local disk's file is smaller than the file in the ceph shared storage, the ceph copy is synchronized to the local disk.
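A sketch of the size comparison and sync-direction choice just described; the rsync flags, and the use of file size in place of the pointer-position check of step S3.2, are simplifying assumptions:

```python
import os
import subprocess


def sync_partition_file(local_path, ceph_path, dry_run=True):
    """Compare the two copies of a partition file; the larger copy is
    treated as the source of truth and rsync'd over the smaller one.
    Returns the rsync command used, or None if the files match."""
    local_size = os.path.getsize(local_path)
    ceph_size = os.path.getsize(ceph_path)
    if local_size == ceph_size:
        return None  # consistent: nothing to synchronize
    # Larger file wins: local -> ceph backs up new data; ceph -> local
    # restores a local copy that fell behind.
    src, dst = ((local_path, ceph_path) if local_size > ceph_size
                else (ceph_path, local_path))
    cmd = ["rsync", "-a", src, dst]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Running this once per day per partition file mirrors the 3 a.m. consistency check in the example above.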
In this embodiment, the files are synchronized at a fixed time every day; without affecting normal data reading, new data is synchronized to the ceph shared disk's shared storage and backed up in time, so that when a disaster occurs or a fault forces a restart, the data can be effectively recovered, improving data processing capability.
Please refer to fig. 5, which is a flow chart of feature drift synchronization provided in the present application, including:
when it is detected that a server or micro-service hangs abnormally and drifts, the pods are created and initialized to obtain, from the configuration file, the number of pods that need to be started; the partitions corresponding to each pod are calculated and stored on the ceph shared disk; whether the current server has a spare disk is judged, and if so, the pod is mounted on the corresponding disk; if the current server has no spare disk, the pod hung on the current server drifts so that it can be mounted on another node.
In the above embodiment, the step of mounting the pod to the corresponding disk includes:
formatting the current spare disk, mounting the directory onto the spare disk with the mount method, obtaining the corresponding data from the ceph shared disk according to the correspondence between the pod and its partitions to complete synchronization, and finishing service initialization and state update.
In the foregoing embodiment, the step of implementing pod mount to another node according to pod drift hung by a current server includes:
detecting whether a failure information file exists on the ceph shared disk; if it does not exist, calling the micro cloud interface to obtain the number of micro cloud nodes, their corresponding IPs, and the IPs in abnormal state, recording the abnormal IPs into an abnormal-IP list, incrementing the node counter and writing it into the newly created failure information file, and calling the micro cloud interface delete so that the service drifts to another node; if the failure information file exists, obtaining the current server IP and reading the file, then judging whether the current server IP is already in it; if it is, drifting to another node again without counting; and if the current server IP is not in the failure information file, drifting again to another node with an idle spare disk, according to the number of nodes in the current micro cloud cluster.
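The failure-file bookkeeping can be sketched as follows. The file name `failcount.aux` is taken from the worked example later in the description, but the JSON layout, function name, and string return values are assumptions made for illustration:

```python
import json
import os


def handle_drift(ceph_home, my_ip, node_count, abnormal_ips):
    """Decide whether a pod without a spare disk should drift again.
    Returns "drift" (call the micro cloud delete interface so the pod
    is rescheduled) or "pending" (all nodes tried; give up)."""
    path = os.path.join(ceph_home, "failcount.aux")
    if not os.path.exists(path):
        # First failure: create the file, record this node and the
        # abnormal IPs reported by the micro cloud, count = 1.
        state = {"failed_ips": sorted(set([my_ip] + abnormal_ips)),
                 "count": 1}
        with open(path, "w") as f:
            json.dump(state, f)
        return "drift"
    with open(path) as f:
        state = json.load(f)
    if my_ip in state["failed_ips"]:
        # This node already failed: drift again without counting.
        return "drift"
    state["failed_ips"].append(my_ip)
    state["count"] += 1
    with open(path, "w") as f:
        json.dump(state, f)
    # Keep drifting while untried nodes remain; otherwise go pending.
    return "drift" if node_count > state["count"] else "pending"
```

This matches the intent of steps 4.5 to 4.7: exhaust the cluster's nodes once each, then park the pod in the pending state.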
In another embodiment, whether an idle spare disk exists within the node-count range is judged; if one exists, the micro cloud interface delete is called so that the service drifts to another node, and the pod is mounted on the corresponding disk; if no idle spare disk exists within the node-count range, the service is started and initialized, the drift ends, and the pod state changes to pending.
The following are the detailed steps by which the drift feature restores a server or micro-service that hung abnormally due to external factors:
step 4.1: when a server or a micro-service is hung abnormally and drifts (the drift refers to micro-cloud secondary scheduling, and because a pod state cannot be pulled up or a node of a micro-cloud cannot be scheduled, the micro-service is scheduled to other node nodes), in the creating process of the spots, in the initializing process, a partition. aux file in a ceph shared disk storage/under a home path is read, and the corresponding relation between the spots and the partition is confirmed.
Step 4.2: once the correspondence between pods and partitions is obtained, judging whether the current server has a vacant disk by calling lsblk to check whether any storage device's MOUNTPOINT mount directory is empty.
Step 4.3: if a storage device's MOUNTPOINT directory is empty, calling mke2fs to format the disk (ext4 format), mounting the directory /home/podN on the disk with the mount method (podN matching the alias of the pod being initialized), writing the storage device information with its corresponding directory and format into the /etc/fstab file, and completing data synchronization, so that the file system remains effective after a server restart;
Step 4.4: after mounting is finished, obtaining the corresponding data from the ceph shared disk storage via rsync according to the correspondence between the pod and its partitions; after synchronization finishes, service initialization completes, the state is updated to normal, and the startup logic ends;
Step 4.5: when it is detected that the current server has no spare disk, judging whether a failure information file failcount.aux exists under /home on the ceph shared disk; if not, calling the micro cloud interface to obtain the number of micro cloud nodes, the IPs of all nodes, and the IPs of nodes currently in abnormal state, adding a counter with a default count of 0, judging whether the node count is greater than the counter, and recording the local IP and the abnormal IPs returned by the micro cloud into the failcount.aux file.
Step 4.6: when the drifted server's IP is in the failed-IP list and the node count is greater than the counter, calling the micro cloud interface delete so that the service drifts to another node, then executing the judgment of step 4.3 again.
Step 4.7: when the drifted server's IP is in the failed-IP list and the node count is less than or equal to the counter, the service starts and initializes, the drift ends, the pod state changes to pending, and the logic ends.
In this embodiment, feature drift is synchronized to other node devices, improving fault handling and recovery capability and ensuring system stability.
In another embodiment, a micro cloud cluster is used, wherein three physical machines are in the cluster, each physical machine has 3 spare disks, the face image data is accessed, and the number of pod of the face micro service is 5:
When the service is created, 5 pods are created, pod0 to pod4; each pod is correspondingly mounted with a disk, and the pod and partition information is written into the partition file. The mounted directories are /home/podN (N is the alias number). When data is accessed, the face micro service monitors the offset in the corresponding topic in kafka using a specific group; if the offset is the latest, no operation is performed; if the offset is not the latest, the data is consumed into memory, the data in memory is persisted every 5 minutes or every 10000 records, and the file is written to both the ceph shared disk and the local disk (each pod persists according to its partitions and then deletes the data in memory). When a query request is received, the micro cloud interface is called to judge the running state of each pod; if the running states are normal, the query request is distributed to each pod to query its local disk, and the results are then summarized; if a running state is abnormal, the partition file under /home on the ceph shared disk is read according to the alias of the abnormal pod to determine the correspondence between pods and partitions, and the pod that received the query request directly reads the corresponding partition data on the ceph shared disk for query and comparison; for the other pods whose states are normal, the query request is distributed and the corresponding pods query their local disks. If the service is abnormal and is scheduled by the micro cloud to another node a second time, it is judged whether a failcount.aux file exists; if not, the failcount.aux file is created and the information is recorded in it, and since 0 + 1 is less than 3, the service drifts to another machine; if the file exists, the current server IP is first obtained and the file is read directly to judge whether the IP already exists in it; if it does, no count is added and the service drifts again without counting; if the IP does not exist in the file, it is judged whether the number of nodes is greater than 0 + 1; if so, initialization judges whether spare disk resources are available: the system starts if a spare disk is available, and if no disk resources are available the service drifts again and the count is increased. If the number of nodes is less than or equal to count + 1, the state is set to pending and the process ends.
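The patent does not spell out the formula that maps the 60 kafka partitions onto the 5 pods; a plausible round-robin sketch, consistent with "calculating the number and the partitions corresponding to each pod", is:

```python
def assign_partitions(num_pods, num_partitions=60):
    # pod aliases are pod0 .. pod(N-1), as in the embodiment above;
    # the round-robin rule itself is an assumption for illustration
    mapping = {f"pod{i}": [] for i in range(num_pods)}
    for p in range(num_partitions):
        mapping[f"pod{p % num_pods}"].append(p)
    return mapping
```

For 5 pods this gives each pod exactly 12 partitions, which is the even split the embodiment implies when every pod persists and queries only its own partitions.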
Referring to fig. 6, the data reading system provided by the present application includes:
the data storage module 1, configured to acquire data and store the data to a local disk and a ceph shared disk using micro services, wherein the ceph shared disk stores the full data using a distributed file system, and the partial data stored in the local disk is used exclusively by the corresponding micro service;
the query request module 2, configured to judge whether the service state of each pod is normal when the micro service receives a query request; when the service state of a pod is normal, distribute the query request to the pods with normal service state to query the local disk, then summarize and sort the results to obtain the query result; and when the service state of a pod is abnormal, query the partition data corresponding to that pod on the ceph shared disk, then summarize and sort the results to obtain the query result.
The timing synchronization module 3 is configured to monitor whether the background service has reached the preset time; when it has, detect whether the sizes of the files stored in the local disk corresponding to the micro service and in the ceph shared disk are consistent; when the sizes are inconsistent, synchronize the corresponding partition data from the ceph shared disk to the local disk; when the sizes are consistent, perform no processing.
The abnormal recovery module 4 is configured to create and initialize pods when it is detected that the server or the micro service hangs abnormally and drifts, and obtain the number of pods to be started from the configuration file; calculate the partitions corresponding to each pod and store them to the ceph shared disk; judge whether the current server has a spare disk, and if so, mount the pods to the corresponding disks; if not, implement pod mounting on other nodes by drifting the pods hung up on the current server.
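The dispatch logic of the query request module can be sketched as building a per-pod read plan. This is an illustration under assumptions: the state strings and the tuple shape are invented names, while the routing rule (healthy pods read their local disks, partitions of unhealthy pods are read from the ceph shared disk) comes from the module description above.

```python
def plan_query(pods, pod_state, partition_map):
    """Build a read plan for one query.

    pod_state: {pod: "normal" | "abnormal"}, as returned by the
    micro cloud interface (representation is illustrative).
    """
    plan = []
    for pod in pods:
        if pod_state[pod] == "normal":
            # healthy pod searches its own local disk
            plan.append((pod, "local_disk", partition_map[pod]))
        else:
            # its partitions are read from the ceph shared disk instead
            plan.append((pod, "ceph_shared_disk", partition_map[pod]))
    return plan
```

The results gathered from all entries of the plan would then be summarized and sorted into the final query result, as the module description states.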
It should be noted that the data reading method and the data reading system correspond one to one; the technical details and technical effects involved in steps S1-S4 and in the data reading system are the same and are not repeated here; please refer to the data reading method.
Referring now to fig. 7, a schematic diagram of an electronic device (e.g., a terminal device or server) 700 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet), a PMP (portable multimedia player) and a vehicle terminal (e.g., a car navigation terminal), and fixed terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes according to a program stored in a read only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this embodiment, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a data set, wherein the data set is a plurality of portrait images of the same person acquired under different scenes; distribute the portraits in the data set to corresponding checkpoint (bayonet) devices according to the number of checkpoint devices, their distributed geographic positions and a preset time period, to form images arranged in time sequence that represent the portrait track; and cluster and archive the images using the platform under test, and calculate the clustering and classification indexes.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In summary, mixing the ceph shared disk and the local disk improves the stability of the system; having each local disk used by a single micro service improves data reading efficiency; during a query, each pod queries part of the data instead of all of it, breaking the whole into parts and improving the overall query speed.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A data reading method, comprising:
step S1, acquiring data and storing the data to a local disk and a ceph shared disk by using micro-services, wherein the ceph shared disk stores full data by using a distributed file system, and part of data stored in the local disk is exclusively shared by the corresponding micro-services;
step S2, when the micro service receives a query request, judging whether the service state of each pod is normal; when the service state of a pod is normal, distributing the query request to the pods with normal service state to query the local disk, and summarizing and sorting to obtain a query result; and when the service state of a pod is abnormal, querying the partition data corresponding to that pod on the ceph shared disk, and summarizing and sorting to obtain a query result.
2. A data reading method according to claim 1, further comprising: monitoring whether the time of the background service reaches preset time, and when the time of the background service reaches the preset time, detecting whether the sizes of the files stored in the local disk corresponding to the micro service and the ceph shared disk are consistent; when the fact that the size of a file stored in a local disk is inconsistent with that of a file stored in a ceph shared disk is detected, synchronizing partition data corresponding to the ceph shared disk to the local disk; and when the sizes of the files stored in the local disk and the ceph shared disk are consistent, not processing the files.
3. A data reading method according to claim 1, wherein the step S1 further comprises:
when a corresponding theme in the middleware kafka is newly built, dividing each theme into 60 partitions;
starting the micro service, calling a micro cloud interface to confirm the number of pods expected to be started by the micro service, wherein each micro service is named as pod0 to pod (N-1), and calculating the number and the number of partitions corresponding to each pod, wherein N is a natural number greater than 1;
judging whether the local disk corresponding to the micro service has a spare disk; if so, formatting the spare disk to mount the pod on the corresponding disk, and synchronizing the data stored in the ceph shared disk to the disk; if not, calling the micro cloud interface to hang up the pod, so that the pod drifts to another server in secondary scheduling;
analyzing data corresponding to the micro service to obtain structured data and characteristic data, wherein the storage positions of the structured data and the characteristic data depend on the self load of the middleware kafka, and the offset of the corresponding partition is updated after the storage of the middleware kafka is finished;
monitoring the offset in the corresponding theme in the middleware kafka, and when the offset is not monitored to be the latest offset, persistently processing data according to the storage time or storage capacity of a memory, and storing the data to a local disk and a ceph shared disk; and when the offset is monitored to be the latest offset, not processing.
4. A data reading method according to claim 1, further comprising: when detecting that the server or the micro-service is hung abnormally and drifts, creating the pods and initializing to obtain the number of the pods needing to be started in the configuration file; calculating a partition corresponding to each pod and storing the partition to a ceph shared disk; judging whether a spare disk exists in the current server, and if the spare disk exists in the current server, mounting the pod to the corresponding disk; and if the current server does not have a spare disk, implementing pod mounting to other nodes according to pod drifting hung up by the current server.
5. The data reading method according to claim 4, wherein the step of mounting the pod to the corresponding disk comprises:
formatting the current spare disk, mounting the pod to the spare disk by the mount method according to the directory, acquiring the corresponding data from the ceph shared disk according to the correspondence between the pod and its partitions, completing synchronization, and realizing service initialization and state updating.
6. The data reading method according to claim 4, wherein the step of implementing pod mount to other nodes according to pod drift hung up by the current server comprises:
detecting whether a ceph shared disk has a failure information file or not; if the failure information file does not exist, calling a micro cloud interface to obtain the number of micro cloud nodes, corresponding IPs and abnormal-state IPs, recording the abnormal-state IPs into an abnormal IP list, adding one to the number of the nodes and writing the node into the created failure information file, and calling the micro cloud interface to delete the service so that the service drifts to other nodes; if the failure information file exists, acquiring a current server IP and reading the failure information file, judging whether the current server IP exists in the failure information file, if so, drifting to other nodes again and not counting; and if the current server IP does not exist in the failure information file, drifting to other nodes again by using idle spare disks according to the number of the nodes of the current micro cloud cluster.
7. A data reading method according to claim 6, further comprising:
judging whether idle spare disks exist in the node number range, if the idle spare disks exist in the node number range, calling a micro cloud interface to delete the service to enable the service to drift to other nodes, and mounting the pod to the corresponding disks; and if no idle disk exists in the node number range, starting and initializing the service, finishing drifting, and changing the pod state into a suspended state.
8. A data reading system, comprising:
the data storage module is used for acquiring data and storing the data to a local disk and a ceph shared disk by using micro-services, wherein the ceph shared disk stores full data by using a distributed file system, and part of data stored in the local disk is exclusively shared by the corresponding micro-services;
the query request module is used for judging whether the service state of each Pod is normal or not when the micro service receives a query request, distributing the query request to the Pod with the normal service state to query a local disk when the service state of the Pod is normal, and summarizing and sorting to obtain a query result; and when the service state of the pod is abnormal, inquiring partition data corresponding to the pod with the abnormal service state corresponding to the ceph shared disk, and summarizing and sequencing to obtain an inquiry result.
9. An electronic device, comprising:
one or more processing devices;
a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the data reading method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program causing a computer to execute the data reading method of any one of claims 1 to 7.
CN202011127396.8A 2020-10-20 2020-10-20 Data reading method, system, device and medium Active CN112199048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011127396.8A CN112199048B (en) 2020-10-20 2020-10-20 Data reading method, system, device and medium

Publications (2)

Publication Number Publication Date
CN112199048A CN112199048A (en) 2021-01-08
CN112199048B true CN112199048B (en) 2021-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant