WO2020258290A1 - Log data collection method, log data collection device, storage medium, and log data collection system - Google Patents

Log data collection method, log data collection device, storage medium, and log data collection system

Info

Publication number
WO2020258290A1
Authority
WO
WIPO (PCT)
Prior art keywords
log data
log
data collection
data
message queue
Prior art date
Application number
PCT/CN2019/093854
Other languages
English (en)
French (fr)
Inventor
樊林
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Priority to PCT/CN2019/093854 (WO2020258290A1)
Priority to CN201980000951.XA (CN112449750A)
Priority to US16/963,618 (US11755452B2)
Publication of WO2020258290A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1734Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0888Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/40Data acquisition and logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45591Monitoring or debugging support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/16General purpose computing application
    • G06F2212/163Server or database system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis

Definitions

  • the embodiments of the present disclosure relate to a log data collection method, a log data collection device, a storage medium, and a log data collection system.
  • Docker is an open-source application container engine that lets developers package an application and its dependencies into a portable container and then publish it to any popular Linux or Windows machine; it can also be used to achieve virtualization.
  • Containers use a sandbox mechanism: there are no interfaces between containers, and they are independent of one another.
  • At least one embodiment of the present disclosure provides a log data collection method, including: acquiring log data generated by at least one container in an application container environment; transmitting the log data to a log buffer unit for buffering; and collecting, through a log collection unit, the log data cached in the log buffer unit, and transmitting the log data to a log storage unit for storage.
  • For example, the log buffer unit includes a message queue component and the log collection unit includes a data flow migration component. The log data collection method then includes: transmitting the log data directly to the message queue component for buffering; collecting the log data buffered in the message queue component through the data flow migration component; and transmitting the log data to the log storage unit for storage.
  • Transmitting the log data to the log buffer unit for buffering includes: according to the log type of the log data, sending log data of different log types to different message queues in the message queue component for buffering.
  • Collecting the log data buffered in the log buffer unit through the log collection unit includes: the log collection unit reads the log data cached in the different message queues one by one, so as to collect the log data cached in the log buffer unit.
  • the log data includes error-level log data, warning-level log data, and information-level log data.
  • the log data is transmitted to the log storage unit for storage based on the system time and according to the first time range.
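Storing by system time and a time range can be pictured as mapping each record's timestamp to a per-slice target file. The following is only a minimal sketch under stated assumptions: the function name `time_slice_path`, the `/logs` root, the path layout, and the 60-minute default are all hypothetical and do not come from the disclosure.

```python
from datetime import datetime, timedelta


def time_slice_path(ts: datetime, slice_minutes: int = 60, root: str = "/logs") -> str:
    """Return a hypothetical storage path for the time slice that contains ts."""
    if slice_minutes <= 0:
        raise ValueError("slice_minutes must be positive")
    # Floor the timestamp to the start of its slice within the day.
    minutes_into_day = ts.hour * 60 + ts.minute
    start_minute = (minutes_into_day // slice_minutes) * slice_minutes
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(minutes=start_minute)
    # One directory per day, one file per slice (layout is an assumption).
    return f"{root}/{start:%Y-%m-%d}/{start:%H%M}.log"
```

Records whose timestamps fall in the same slice share one target path, so a later processing job can select its input by slice.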
  • The log storage unit includes a distributed file system; transmitting the log data to the log storage unit for storage includes: transmitting the log data collected by the log collection unit to the distributed file system for distributed storage.
  • the log data collection method provided by at least one embodiment of the present disclosure further includes: performing data processing on the log data stored in the log storage unit.
  • A time slice is used as a filter condition to determine the data range of the log data that needs to be processed; whether the log data in the data range is compliant is then determined, and if it is compliant, the log data is collected in a structured manner and output to a target file bearing the time slice for storage.
  • Determining whether the log data in the data range is compliant includes: reading at least one item of log data in the data range, one by one and in a distributed manner, to determine whether the at least one item of log data in the data range is compliant.
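The processing step above (select by time slice, check compliance, emit structured records to a slice-named target) might be sketched as follows. This is an illustration only: the JSON line format, the `REQUIRED_FIELDS` schema, the function names, and the target file naming are hypothetical assumptions, and the sketch runs in a single process rather than in a distributed manner.

```python
import json

# Hypothetical schema: the disclosure does not define what "compliant" means.
REQUIRED_FIELDS = ("timestamp", "level", "type", "message")


def parse_if_compliant(line: str):
    """Return a structured record if the line is compliant, otherwise None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in REQUIRED_FIELDS):
        return None
    # Keep only the known fields so every output record has the same structure.
    return {field: record[field] for field in REQUIRED_FIELDS}


def process_slice(store: dict, slice_key: str):
    """Process the log lines stored under one time slice.

    store maps a time-slice key to the raw log lines stored for that slice.
    Returns the (hypothetical) target file name and the structured records.
    """
    lines = store.get(slice_key, [])  # the time slice acts as the filter condition
    records = []
    for line in lines:  # read the log data one by one
        record = parse_if_compliant(line)
        if record is not None:
            records.append(record)
    return f"structured-{slice_key}.jsonl", records
```

Non-compliant lines are simply skipped here; a real deployment might instead count them or route them to a separate file.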
  • the log data is log data generated by an intelligent question answering system.
  • The type of the log data includes a first type of log data and a second type of log data; the first type of log data is sent to a first message queue in the message queue component for buffering, and the second type of log data is sent to a second message queue in the message queue component for buffering; the first message queue and the second message queue are different message queues.
  • The first type of log data is log data generated based on general question and answer, and the second type of log data is log data generated based on art question and answer.
  • The application container environment includes the at least one container, and the intelligent question answering system includes a natural language understanding subsystem. The natural language understanding subsystem runs on at least one container of the application container environment and generates the log data, and the at least one container outputs the log data in response to a business request.
  • the application container environment includes multiple containers, and different business modules of the natural language understanding subsystem run in different containers.
  • the application container environment is implemented by a docker container engine.
  • At least one embodiment of the present disclosure also provides a log data collection device, including a log acquisition unit, a log cache unit, a log collection unit, and a log storage unit.
  • The log acquisition unit is configured to acquire log data generated by at least one container in the application container environment; the log cache unit is configured to cache the log data; the log collection unit is configured to collect the log data cached in the log cache unit and transmit it; and the log storage unit is configured to store the log data.
  • The log cache unit includes a message queue component, the log collection unit includes a data flow migration component, and the log storage unit includes a distributed file system.
  • At least one embodiment of the present disclosure further provides a log data collection device, including: a processor; and a memory storing one or more computer program modules. The one or more computer program modules are configured to be executed by the processor and include instructions for executing the log data collection method provided by any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions; when the computer-readable instructions are executed by a computer, the log data collection method provided by any embodiment of the present disclosure can be executed.
  • At least one embodiment of the present disclosure further provides a log data collection system, including a terminal device and a server; the terminal device is configured to receive audio or text information and send the audio or text information to the server; the server is configured to receive the audio or text information sent by the terminal device, generate log data, and collect the log data based on the log data collection method provided by any embodiment of the present disclosure.
  • the terminal device includes an electronic picture frame.
  • The audio or text information includes general audio or text information and art audio or text information. The server includes a general application container, an art application container, a message queue component, a data flow migration component, and a distributed file system. The general application container is configured to output general log data in response to the general audio or text information; the art application container is configured to output art log data in response to the art audio or text information; the message queue component is configured to buffer the general log data and the art log data; the data flow migration component is configured to collect the general log data and the art log data cached in the message queue component and transmit them; and the distributed file system is configured to store the general log data and the art log data.
  • The message queue component includes a message queue of a general topic and a message queue of an art topic; the general log data is cached in the message queue of the general topic, and the art log data is cached in the message queue of the art topic.
  • The server is further configured to determine, according to a first principle, whether the general log data and the art log data stored on the distributed file system are compliant.
  • FIG. 1 is a flowchart of a log data collection method provided by at least one embodiment of the present disclosure
  • FIG. 2 is a flowchart of another log data collection method provided by at least one embodiment of the present disclosure.
  • FIG. 3 is a schematic block diagram of a log data collection method provided by at least one embodiment of the present disclosure.
  • FIG. 5 is a schematic block diagram of a log data collection device provided by at least one embodiment of the present disclosure.
  • FIG. 6 is a schematic block diagram of another log data collection device provided by at least one embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a log data collection system provided by at least one embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a terminal device provided by at least one embodiment of the present disclosure.
  • the intelligent question answering system based on the docker environment runs in a high-concurrency environment.
  • the use of a scalable application container environment can respond to high-concurrency business requests, but also generates a large amount of log data.
  • The log data may be incomplete because of the container environment's limitations on reading and writing files.
  • If the data generated in multiple docker containers is stored in the form of files at the same time, for example on a speed resistance machine, the containers compete for storage resources. Writes of log data may therefore fail during peak periods of log data generation, resulting in incomplete data storage or difficulty in reading the data.
  • At least one embodiment of the present disclosure provides a log data collection method, including: acquiring log data generated by at least one container in an application container environment; transmitting the log data to a log buffer unit for buffering; and collecting, through a log collection unit, the log data cached in the log buffer unit, and transmitting the log data to a log storage unit for storage.
  • At least one embodiment of the present disclosure also provides a log data collection device, storage medium, and log data collection system corresponding to the foregoing log data collection method.
  • the log data collection method provided by the above-mentioned embodiments of the present disclosure can solve the problem of incomplete storage of log data generated in the application container environment, thereby broadening the use environment of the application container and improving its market competitiveness.
  • Fig. 1 is a flowchart of a log data collection method provided by at least one embodiment of the present disclosure.
  • The log data collection method can be applied to various systems that run in an application container environment, such as intelligent question answering systems, and can also be applied to systems in other operating environments; the embodiments of the present disclosure do not limit this.
  • The log data collection method can be implemented in software that is loaded and executed by a processor in the intelligent question answering system, for example, by a central processing unit (CPU), or implemented at least in part in software, hardware, firmware, or any combination thereof. It can solve the problem of incomplete storage of log data generated in a high-concurrency environment, broaden the application field of the application container environment, and improve market competitiveness.
  • the log data collection method includes steps S110 to S130, and steps S110 to S130 of the log data collection method and their respective exemplary implementation manners are respectively introduced below.
  • Step S110: Obtain log data generated by at least one container in the application container environment.
  • Step S120: Transmit the log data to the log buffer unit for buffering.
  • Step S130: Collect, through the log collection unit, the log data buffered in the log buffer unit, and transmit the log data to the log storage unit for storage.
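Steps S110 to S130 can be sketched end to end with in-memory stand-ins. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, a standard-library `queue.Queue` stands in for the log buffer unit, and a plain list stands in for the log storage unit.

```python
import queue


def acquire(containers: dict):
    """S110: yield log records produced by each container (name -> list of lines)."""
    for name, lines in containers.items():
        for line in lines:
            yield {"container": name, "message": line}


def buffer_logs(records, buf: queue.Queue) -> None:
    """S120: transmit the records to the buffer unit instead of writing files directly."""
    for record in records:
        buf.put(record)


def collect_and_store(buf: queue.Queue, storage: list) -> None:
    """S130: drain the buffer and hand every record to the storage unit."""
    while True:
        try:
            record = buf.get_nowait()
        except queue.Empty:
            break
        storage.append(record)


def run_pipeline(containers: dict) -> list:
    buf = queue.Queue()  # stand-in for the log buffer unit
    storage = []         # stand-in for the log storage unit
    buffer_logs(acquire(containers), buf)
    collect_and_store(buf, storage)
    return storage
```

The point of the buffer in the middle is that containers never write to shared storage themselves; only the collector does.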
  • the log buffer unit and the log collection unit mentioned in the above steps can be implemented in the form of hardware (for example, circuit) modules or software modules, and any combination thereof.
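  • For example, these units may be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit with data processing capabilities and/or instruction execution capabilities, together with corresponding computer instructions.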
  • the processing unit may be a general-purpose processor or a special-purpose processor, and may be a processor based on the X86 or ARM architecture.
  • the application container environment is implemented by a docker container engine, and accordingly, the application container is, for example, a docker container.
  • each docker container is independent of each other.
  • When the number of services (for example, intelligent question answering services) increases, the number of docker containers can be increased accordingly, thereby improving processing efficiency; this is not limited in the embodiments of the present disclosure.
  • each docker container can be regarded as an independent host.
  • A docker container is usually created from an image that serves as its template. By analogy with a virtual machine, the image corresponds to the virtual machine image, and the docker container corresponds to the running virtual machine. For example, what software a docker container contains after it is created depends entirely on the image it uses.
  • the image can be created by a docker container (equivalent to saving the state of the docker container as a snapshot at this time), or it can be created by a Dockerfile (a text file that uses some rules specified by docker).
  • A registry is a place where image files are stored centrally. Each repository can contain multiple images, and each image has a different tag.
  • Repositories come in two forms: public repositories and private repositories.
  • the largest public repository is Docker Hub, which stores a large number of images for users to download.
  • The embodiments of the present disclosure do not impose restrictions on the creation and storage of images.
  • the log data is log data generated by an intelligent question answering system running based on the docker container environment, which is not limited in the embodiment of the disclosure.
  • The application container environment includes at least one container, and the natural language understanding (NLU) subsystem included in the intelligent question answering system (for example, a question and answer (Q&A) subsystem, a dialogue subsystem, etc.) runs on at least one container of the application container environment and generates the log data.
  • the at least one container outputs the log data in response to a business request.
  • The application container environment includes multiple containers, and different business modules of the natural language understanding subsystem, such as a first type of business module (e.g., a general business module) and a second type of business module (e.g., an art business module), run in different containers to respond to different business requests, thereby outputting different types of log data.
  • the general business module of the natural language understanding subsystem processes general business requests (for example, weather questions and answers, time questions and answers and other common terms in daily life), and runs in the first type of docker container (for example, the general-purpose docker container);
  • the art business module of the natural language understanding subsystem processes art business requests (for example, who painted the painting, etc.), and runs in the second type of docker container (for example, art docker container).
  • the type of the log data includes a first type of log data (for example, general type log data) and a second type of log data (for example, art type log data).
  • The general log data includes, for example, log data generated in response to business requests such as weather questions and time questions, that is, log data generated by a general-purpose docker container. The art log data includes, for example, log data generated in response to business requests such as questions about paintings, that is, log data generated by the art docker container. This is not limited in the embodiments of the present disclosure.
  • the type of log data may also include inference log data or more other types of log data.
  • The inference log data may be log data generated in the process of judging and processing the above business requests. The embodiments of the present disclosure do not limit this.
  • the log data may be divided into multiple levels, including, for example, error (error) level log data, warning (warn) level log data, and information (info) level log data.
  • Error-level log data includes error events that may still allow the application to continue running.
  • Warning-level log data includes potentially harmful situations.
  • Information-level log data includes informational events that describe the application's running process at a coarse granularity.
  • The log data may also include debug-level log data and fatal-level log data.
  • Debug-level log data includes finer-grained informational events that are useful for debugging the application, and its level is lower than that of information-level log data. Fatal-level log data includes very serious error events that may cause the application to terminate, and its level may be higher than that of error-level and warning-level log data. This is not limited by the embodiments of the present disclosure.
  • For example, only error-level, warning-level, and information-level log data may be collected. In this way, the amount of log data can be reduced and the system's working efficiency and accuracy can be improved.
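Collecting only the error, warning, and information levels can be sketched as a simple filter. The numeric values below are an assumption chosen only to reflect the ordering described above (debug lowest, fatal highest); the names `Level`, `COLLECTED`, and `should_collect` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Numeric values are illustrative; only their relative order matters.
    DEBUG = 10
    INFO = 20
    WARN = 30
    ERROR = 40
    FATAL = 50


# Levels collected in this example: error, warning, and information only.
COLLECTED = frozenset({Level.INFO, Level.WARN, Level.ERROR})


def should_collect(level_name: str) -> bool:
    """Return True if a record with this level name should be collected."""
    try:
        level = Level[level_name.upper()]
    except KeyError:
        return False  # unknown level names are dropped in this sketch
    return level in COLLECTED
```

Changing `COLLECTED` is enough to also pick up debug-level or fatal-level data if a deployment needs it.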
  • this step S110 may include step S210 to step S240.
  • Step S210: Receive a service request.
  • the service request may be a question received by the intelligent question answering system, for example, what is the weather today, what time is it, etc.
  • the intelligent question answering system is not limited to one, but may include multiple, and the log collection method may simultaneously collect log data generated by the multiple intelligent question answering systems.
  • Step S220: At least one application container processes the service request.
  • At least one application container outputs answers to corresponding questions in response to the business request.
  • different types of business requests are processed in different docker containers, and these different docker containers are created based on different images.
  • docker containers can be divided into general docker containers, art docker containers, etc.
  • Correspondingly, the log data generated is general log data and art log data. This can be set according to actual conditions and is not limited in the embodiments of the present disclosure.
  • Step S230: Generate relevant log data.
  • While a service request is processed, multiple items of log data are generated, such as user identification information, user question information, and device information.
  • These items of log data can be classified into the aforementioned types and levels and stored accordingly, to facilitate processing and retrieval in subsequent steps.
  • Step S240: Obtain the log data.
  • the aforementioned log data can be divided into log data that does not need to be persistently stored and log data that needs to be persistently stored according to actual needs.
  • The log data generated in the application container environment that does not need to be persisted can be transferred to the log cache unit for caching, and the log data that needs to be persisted can be stored directly in the form of a file; alternatively, both can be cached by the log cache unit. This can be set according to actual needs and is not limited in the embodiments of the present disclosure. For example, whether to store multiple copies of the log data can be decided according to its importance or actual needs.
  • very important log data may include error-level log data and warning-level log data, which can be used for problem tracking, error judgment, etc., for example.
  • relatively unimportant log data may include debug-level log data or information-level log data.
  • The relatively important log data can be stored in as many copies as needed, for example, two copies: one copy is transmitted to the log cache unit for caching, and the other copy is stored directly in the form of a file, for example, in the speed resistance machine, a hard disk, etc.
  • The log data that needs to be persistently stored may include log data of various levels, such as error-level log data and warning-level log data.
  • The log data that does not need to be persistently stored also includes log data of various levels, for example, log data at or above the information level (for example, error-level, warning-level, and information-level log data), which can be used, for example, for subsequent text analysis. This can be set according to actual needs and is not limited in the embodiments of the present disclosure.
  • the log data that needs to be transmitted and saved can be determined according to actual needs. For example, it can also include debug level log data or fatal level log data, which is not limited in the embodiment of the present disclosure.
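The level-based handling described above can be sketched as a small routing function. This is only an illustration under assumptions: the level names, the destination labels `cache` and `file`, and the choice of which levels count as important are hypothetical settings, since the disclosure leaves them to actual needs.

```python
# Hypothetical policy: which levels are cached or persisted is an assumption.
IMPORTANT_LEVELS = {"error", "warning"}        # stored as a second copy in a file
CACHED_LEVELS = {"error", "warning", "info"}   # sent to the log cache unit


def destinations(level):
    """Return the destinations for one log record of the given level."""
    level = level.lower()
    targets = []
    if level in CACHED_LEVELS:
        targets.append("cache")   # copy sent to the log cache unit
    if level in IMPORTANT_LEVELS:
        targets.append("file")    # copy stored directly in the form of a file
    return targets
```

With this policy an error-level record is stored in two copies, an information-level record is only cached, and a debug-level record is dropped; changing the two sets changes the policy without touching the function.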
  • a log acquisition unit for acquiring log data may be provided, and log data generated by at least one container in the application container environment may be acquired through the log acquisition unit; for example, the log acquisition unit may be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability, together with corresponding computer instructions.
  • the log data is transmitted to the log cache unit as a data stream for caching, rather than being transmitted directly to, for example, a speed resistance machine for storage as a file. This avoids resource contention and thereby avoids problems such as incomplete storage of log data in a high-concurrency environment.
  • the log cache unit includes a message queue component.
  • step S120 can be specifically implemented as step S250 as shown in FIG. 2: directly transmitting log data to the message queue component for buffering.
  • the message queue component is a distributed message queue component; for example, it can be implemented using a Kafka component, which is not limited in the embodiments of the present disclosure.
  • the distributed message queue component includes a plurality of different message queues, for example, a first message queue, a second message queue, ..., and an Nth message queue (N is an integer greater than 2).
  • the first message queue through the Nth message queue are different message queues, for example, message queues with different topics.
  • the log data of different log types can be sent to different message queues in the message queue component for buffering.
  • the general log data generated by the general docker container is sent to the first message queue in the message queue component for buffering; the art log data generated by the art docker container is sent to the second message queue in the message queue component for buffering. Therefore, the orderly transmission of data streams can be realized based on the concurrent throughput capability of the message queue component.
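Routing log data of different types to different message queues can be sketched as follows. The class below is an in-memory stand-in for a topic-based message queue, not a client for Kafka or any real component, and the topic names `general-logs` and `art-logs` are hypothetical.

```python
from collections import defaultdict, deque


class MessageQueueComponent:
    """In-memory stand-in for a topic-based message queue component."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def send(self, topic, message):
        # Appending keeps the arrival order within one topic.
        self._topics[topic].append(message)

    def poll(self, topic):
        # Return the oldest buffered message of the topic, or None if empty.
        queue = self._topics[topic]
        return queue.popleft() if queue else None


# Hypothetical mapping from log type to topic name.
TOPIC_BY_LOG_TYPE = {"general": "general-logs", "art": "art-logs"}


def cache_log(mq, log_type, line):
    """Send one log line to the message queue that matches its log type."""
    mq.send(TOPIC_BY_LOG_TYPE[log_type], line)
```

Because each log type has its own queue, general log data and art log data can be read back independently and in the order in which they were produced.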
  • the message queue component can be implemented as a distributed messaging system that supports partitions and multiple replicas and is based on a coordination mechanism such as ZooKeeper. Its most notable feature is that it can process large amounts of data in real time to meet various demand scenarios.
  • the message queue component classifies messages according to topics when they are saved; the sender of a message is called a producer, and the receiver of a message is called a consumer.
  • the message queue cluster includes multiple message queue instances, and each message queue instance is called a broker. The message queue cluster, the producers, and the consumers all rely on ZooKeeper to ensure system availability.
  • the object of publishing and subscribing in the message queue component is the message queue under the topic. You can create a topic for each type of log data.
  • the client that publishes messages to the message queue of each topic is called the producer, and the client that subscribes to the message from the message queue of each topic is called the consumer. Producers and consumers can simultaneously read and write data from multiple topic message queues.
  • a message queue cluster is composed of one or more agents (for example, servers), which are responsible for persisting and backing up specific queue messages.
  • the machines/services in the message queue cluster are called agents.
  • a node in the message queue component is an agent, and a message queue cluster includes multiple agents. It should be noted that a node can include multiple agents. The number of agents on a machine is determined by the number of servers.
  • a topic represents a type of message
  • the directory where the message is stored is the topic, such as page view logs, click logs, etc.
  • the message queue cluster can be responsible for the distribution of messages in message queues of multiple topics at the same time.
  • An agent can include multiple topics.
  • partitions represent physical groupings of topics.
  • a topic can be divided into multiple partitions, and each partition is an ordered queue.
  • the message queue component is responsible for associating the log data with the corresponding partition.
  • the message represents the data object to be transferred, which mainly includes four parts: offset, key, value, and insertion time.
  • the log data in the embodiment of the present disclosure is the message.
  • the producer produces messages and sends them to the message queue of the corresponding topic.
  • a consumer subscribes to a topic and consumes the messages stored in the message queue of the topic, and the consumer consumes as a thread.
  • a consumer group contains multiple consumers, this is pre-configured in the configuration file.
  • Consumers can form a consumer group.
  • Each message in the partition can only be consumed by one consumer (consumer thread) in the consumer group. If a message is to be consumed by multiple consumers (consumer threads), these consumers need to be in different groups.
  • the message queue component allows only one consumer thread to access a partition. If the efficiency is insufficient, the system can be expanded horizontally by increasing the number of partitions and then adding new consumer threads to consume them, so as to give full play to horizontal scalability and very high throughput. This also forms the concept of distributed consumption.
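The rule that each partition is read by exactly one consumer of a group can be illustrated with a simple assignment function. Round-robin assignment is an assumption chosen for brevity; a real message queue component applies its own assignment strategy, and the function name is hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of the group (round-robin)."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With four partitions and two consumers, each consumer owns two partitions. With one partition and two consumers, the second consumer stays idle, which is why throughput is raised by adding partitions first and consumer threads second.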
  • a message queue cluster contains several producers (which can be page views (PageView) generated by the web front end, server logs, or system CPU, storage, etc.), several agents (the message queue component supports horizontal expansion; the more agents there are, the higher the cluster throughput), several consumer groups, and a ZooKeeper cluster.
  • the message queue component manages the cluster configuration through ZooKeeper, elects a leader, and performs rebalancing operations when the consumer group changes. Producers use push mode to publish messages to agents, and consumers use pull mode to subscribe to and consume messages from agents.
  • the process from the producer to the agent is a push operation, that is, data is pushed to the agent, and the process from the consumer to the agent is a pull operation: the consumer actively pulls the data, rather than the agent actively sending the data to the consumer.
  • the log collection unit includes a data flow migration component.
  • the data flow migration component includes a distributed data flow migration component, for example, a big data ETL (extraction, transformation, and loading) component such as a Flume component, for example, a component having an interface corresponding to the log cache unit, which is not limited in the embodiments of the present disclosure.
  • this step S130 specifically includes step S260: collecting log data buffered in the message queue component through the data flow migration component, and transmitting the log data to the log storage unit for storage.
  • the log collection unit includes multiple data flow migration components, and different data flow migration components correspond one-to-one to message queues of different topics, to collect the log data buffered in the different message queues.
  • the log collection unit reads the log data buffered in the different message queues one by one to collect the log data cached in the log cache unit, that is, the data flow from the message queue component to the data flow migration component is transmitted by streaming.
  • the data flow migration component can be implemented as a distributed system for effectively collecting, aggregating and moving large amounts of log data from many different sources (for example, message queue components) to a centralized data storage area.
  • it is a tool/service, or data centralization mechanism, that can collect data resources such as logs and events and gather these large amounts of log data from various data resources.
  • the external structure of the data flow migration component may include a data generator.
  • the log data generated by the data generator (for example, a message queue component) is collected by a single agent running on the server where the data generator is located, and then the data receiver collects log data from each agent zone and stores the collected log data in the log storage unit.
  • the data flow migration component includes one or more agent zones; each agent zone is an independent daemon (JVM) that receives log data from the client (for example, the message queue component) or from other agent zones, and then quickly transmits the obtained log data to the next destination node, such as a sink, the log storage unit, or the next agent zone.
  • the agent zone mainly includes three components: data source (source), channel (channel) and sink (sink).
  • the data source receives log data from the data generator and transmits the received log data to one or more channels.
  • a channel is a short-lived storage container that caches log data received from the data source until it is consumed by the receiver. It acts as a bridge between the data source and the receiver.
  • the channel is fully transactional, which ensures the consistency of data when receiving and sending, and it can be connected to any number of data sources and receivers.
  • the types of channels include the JDBC channel, the File System channel, the Memory channel, etc.
  • the receiver stores log data in, for example, a log storage unit. It consumes the log data from the channel and delivers it to a destination.
  • the destination may be another receiver or a log storage unit.
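The interplay of data source, channel, and sink (the receiver) described above can be sketched with a toy agent. This is not Flume code: the class, its method names, and the capacity value are hypothetical, and the transactional behaviour is reduced to removing an event from the channel only after it has been delivered.

```python
from collections import deque


class Agent:
    """Toy agent: a source puts events into a channel, a sink drains them."""

    def __init__(self, sink_fn, capacity=1000):
        self.channel = deque()      # short-lived buffer between source and sink
        self.capacity = capacity
        self.sink_fn = sink_fn      # callable that delivers one event

    def source_receive(self, event):
        """Accept one event from the data generator and buffer it."""
        if len(self.channel) >= self.capacity:
            raise OverflowError("channel is full")
        self.channel.append(event)

    def sink_drain(self):
        """Deliver buffered events in order and return how many were delivered."""
        delivered = 0
        while self.channel:
            event = self.channel[0]
            self.sink_fn(event)       # if this raises, the event stays buffered
            self.channel.popleft()    # remove only after successful delivery
            delivered += 1
        return delivered
```

Passing `stored.append` of a list as `sink_fn` stands in for writing to the log storage unit; a failed delivery leaves the event in the channel so that it can be retried.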
  • the data flow migration component can be implemented by the Flume component.
  • the log storage unit includes a big data storage platform, for example, a distributed file system (HDFS, Hadoop Distributed File System), a database (for example, HBase (Hadoop Database), an open-source non-relational distributed database), or other common files (for example, Windows files, Linux files, etc.), which are not limited in the embodiments of the present disclosure.
  • transmitting log data to the log storage unit for storage includes: transmitting the log data collected by the data flow migration component to a distributed file system for distributed storage.
  • log data in different data flow migration components are stored on different distributed file systems.
  • based on the system time and according to a first time range, the log data is transmitted to a log storage unit (for example, a distributed file system) for storage.
  • the system time may be the time on the machine or system that executes the log data processing method.
  • folders and files can be created according to the topic, year, month, and first time range (for example, specific time ranges such as 00:00-12:00 and 12:00-24:00), so that the log data corresponding to a certain topic and time is stored in the corresponding file or folder, thereby realizing the distributed storage of the log data, which facilitates processing the log data within the corresponding range in subsequent steps.
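A possible way to derive the folder and file name from the topic and the system time is sketched here. The root directory `/logs`, the exact path layout, the inclusion of the day in the file name, and the two half-day ranges are assumptions used for illustration.

```python
from datetime import datetime


def target_path(topic, when, root="/logs"):
    """Build a path from topic, year, month and the first time range."""
    # Two hypothetical first time ranges: 00:00-12:00 and 12:00-24:00.
    time_range = "0000-1200" if when.hour < 12 else "1200-2400"
    return f"{root}/{topic}/{when:%Y}/{when:%m}/{when:%d}_{time_range}.log"
```

For a record of the art topic written at 13:05 on 28 June 2019, the function yields `/logs/art/2019/06/28_1200-2400.log`, so later processing can select a topic and time range by path alone.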
  • the log collection method further includes: performing data processing on the log data stored in the log storage unit, so as to ensure the accuracy and practicability of the stored log data.
  • Fig. 4 shows a flow chart of data processing provided by at least one embodiment of the present disclosure. As shown in Fig. 4, the data processing operation includes step S140-step S180. The data processing operation provided by at least one embodiment of the present disclosure will be described in detail below with reference to FIG. 4.
  • Step S140 Use the time slice as a filter condition to determine the data range of the log data to be processed.
  • the time slice represents a time range.
  • a time range is set according to actual needs to filter out log data within the time range for the following data processing.
  • the time slice may consist of a single first time range, that is, the time slice is one first time range (for example, 00:00-12:00), so that the log data in that first time range is filtered out for processing.
  • the time slice may also include multiple first time ranges, that is, the time slice covers multiple first time ranges (for example, 00:00-24:00, covering two first time ranges), so that the log data in those first time ranges can be filtered out for processing.
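Using a time slice as a filter condition can be sketched as follows. The record layout, a `(timestamp, text)` tuple, and the half-open interval convention are assumptions made for this example.

```python
from datetime import datetime


def in_time_slice(timestamp, start, end):
    """Return True when start <= timestamp < end (half-open interval)."""
    return start <= timestamp < end


def select_records(records, start, end):
    """Keep the (timestamp, text) records that fall inside the time slice."""
    return [record for record in records if in_time_slice(record[0], start, end)]
```

A half-open interval means that a record stamped exactly 12:00 belongs to the 12:00-24:00 range only, so adjacent first time ranges never select the same record twice.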
  • Step S150 Read in the log data of at least one data range one by one in a distributed manner.
  • At least one data range can be acquired based on different time slices in step S140.
  • the log data in the at least one data range can be processed simultaneously.
  • the log data in each data range is read in one by one to process the read log data one by one.
  • in some examples, the log data read in one by one is used to continue with step S160, that is, to determine whether it is compliant and to filter out the compliant data for subsequent processes; in other examples, the log data read in one by one can be used directly to execute step S170, that is, to perform structured processing.
  • the specific operation steps can be set according to actual conditions, and the embodiment of the present disclosure does not limit this.
  • Step S160 Determine whether the log data in the data range is compliant; if yes, perform step S170; if not, continue to perform step S160 to determine whether the remaining log data is compliant.
  • the log data in each distributed file system can be cleaned.
  • determining whether the log data within the data range is compliant may include determining whether the format, information (for example, user identification information, user question information, etc.), and time of the log data are compliant, which is not limited in the embodiment of the present disclosure. Based on this step, accurate log data can be filtered out for subsequent data analysis.
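A compliance check covering format, information, and time might look like the sketch below. The field names `user_id`, `question`, and `time`, as well as the timestamp format, are hypothetical; the disclosure does not fix them.

```python
from datetime import datetime

REQUIRED_FIELDS = ("user_id", "question", "time")   # hypothetical field names


def is_compliant(record):
    """Check that required fields are present and non-empty and the time parses."""
    if not isinstance(record, dict):
        return False                      # wrong format
    if any(not record.get(name) for name in REQUIRED_FIELDS):
        return False                      # missing or empty information
    try:
        datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        return False                      # time is not a valid timestamp
    return True
```

Records that fail any of the three checks are skipped, and only the compliant ones are passed on for structured collection.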
  • Step S170 Collect the log data in structured form.
  • the process of structured collection includes: converting, for example, log data in text form into a matrix form.
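Converting log data in text form into a matrix form can be sketched as splitting each line into a fixed set of columns. The column names and the `|` separator are assumptions about the log layout.

```python
def to_matrix(lines, columns=("time", "level", "message"), sep="|"):
    """Split each text line into a fixed number of columns (rows of a matrix)."""
    matrix = []
    for line in lines:
        # Split at most len(columns) - 1 times so the last column keeps the rest.
        parts = [part.strip() for part in line.split(sep, len(columns) - 1)]
        if len(parts) == len(columns):    # skip lines with missing columns
            matrix.append(parts)
    return matrix
```

Every row of the result has the same number of columns, so the output can be written to a target file or loaded into a table without further parsing.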
  • a big data processing program in this field can be used to clean the newly added log data in the distributed file system according to task scheduling.
  • Step S180 output the log data to a target file with a time slice for storage.
  • the log data structured in step S170 is stored in the target file corresponding to its time range, thereby completing the distributed storage of the log data.
  • the log data after the above data processing is collected into the result file, the indicators required for the report (for example, question and answer time, the number of questions and answers, etc.) are then calculated, and the calculation results of the indicators are displayed in the report display system, for example, as a histogram.
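The calculation of report indicators such as the number of questions and answers and the question and answer time can be sketched as follows. The input layout, pairs of question and answer times in seconds, and the key names of the result are hypothetical.

```python
def report_metrics(rows):
    """rows: iterable of (question_time_s, answer_time_s) pairs in seconds."""
    durations = [answer - question for question, answer in rows]
    count = len(durations)
    average = sum(durations) / count if count else 0.0
    return {"qa_count": count, "avg_qa_seconds": average}
```

The returned dictionary holds one value per indicator and could be passed to whatever charting facility the report display system provides.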
  • the log data collection method provided by the above-mentioned embodiments of the present disclosure can solve the problem of incomplete log data storage in a high-concurrency environment, thereby broadening the use environment of the application container and improving its market competitiveness.
  • the flow of the log data collection method may include more or fewer operations, and these operations may be executed sequentially or in parallel.
  • although the flow of the log data collection method described above includes multiple operations appearing in a specific order, it should be clearly understood that the order of the multiple operations is not limited.
  • the log data collection method described above can be executed once or multiple times according to predetermined conditions.
  • At least one embodiment of the present disclosure also provides a log data collection device.
  • Fig. 5 is a schematic block diagram of a log data collection device provided by at least one embodiment of the present disclosure.
  • the log data collection device 100 includes a log acquisition unit 110, a log cache unit 120, a log collection unit 130, and a log storage unit 140.
  • these units may be implemented in the form of hardware (for example, circuit) modules, software modules, and any combination thereof.
  • the log acquisition unit 110 is configured to obtain log data generated by at least one container in an application container environment.
  • the log acquisition unit 110 may implement step S110; for its specific implementation, refer to the related description of step S110, which will not be repeated here.
  • the log cache unit 120 is configured to cache log data.
  • the log cache unit 120 can implement step S120; for its specific implementation, refer to the related description of step S120, which will not be repeated here.
  • the log collection unit 130 is configured to collect and transmit log data cached in the log cache unit 120, and the log storage unit 140 is configured to store log data.
  • the log collection unit 130 and the log storage unit 140 can implement step S130, and the specific implementation method can refer to the related description of step S130, which will not be repeated here.
  • the log cache unit 120 includes a message queue component
  • the log collection unit 130 includes a data flow migration component
  • the log storage unit 140 includes a distributed file system.
  • for a specific description, refer to the description of the log data collection method, which will not be repeated here.
  • the log data collection device may include more or fewer circuits or units, and the connection relationship between the various circuits or units is not limited and can be determined according to actual requirements.
  • the specific structure of each circuit is not limited, and may be composed of analog devices according to the circuit principle, or may be composed of digital chips, or be composed in other suitable manners.
  • Fig. 6 is a schematic block diagram of another log data collection device provided by at least one embodiment of the present disclosure.
  • the log data collection device 200 includes a processor 210, a memory 220, and one or more computer program modules 221.
  • the processor 210 and the memory 220 are connected through a bus system 230.
  • one or more computer program modules 221 are stored in the memory 220.
  • one or more computer program modules 221 include instructions for executing the log data collection method provided by any embodiment of the present disclosure.
  • instructions in one or more computer program modules 221 may be executed by the processor 210.
  • the bus system 230 may be a commonly used serial or parallel communication bus, etc., which is not limited in the embodiments of the present disclosure.
  • the processor 210 may be a central processing unit (CPU), a field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability; it may be a general-purpose processor or a dedicated processor, and it can control other components in the log data collection device 200 to perform desired functions.
  • the memory 220 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include read-only memory (ROM), hard disk, flash memory, etc., for example.
  • One or more computer program instructions may be stored on a computer-readable storage medium, and the processor 210 may run the program instructions to implement the functions (implemented by the processor 210) and/or other desired functions in the embodiments of the present disclosure, for example, the log data collection method.
  • the computer-readable storage medium may also store various application programs and various data, such as log data generated in at least one application container and various data used and/or generated by the application program.
  • the embodiment of the present disclosure does not provide all the constituent units of the log data collection device 200.
  • those skilled in the art may provide and set other unshown component units according to specific needs, which are not limited in the embodiments of the present disclosure.
  • FIG. 7 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
  • the storage medium 300 non-transitorily stores computer-readable instructions 301, and when the computer-readable instructions 301 are executed by a computer (including a processor), the log data collection method provided by any embodiment of the present disclosure can be executed.
  • the storage medium may be any combination of one or more computer-readable storage media.
  • one computer-readable storage medium contains computer-readable program code for buffering log data
  • another computer-readable storage medium contains collected log data.
  • when the program code is read by a computer, the computer can execute the program code stored in the computer storage medium and perform, for example, the log data collection method provided in any embodiment of the present disclosure.
  • the program code can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable storage medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media, and can also be another suitable storage medium.
  • At least one embodiment of the present disclosure also provides a log data collection system.
  • the log data collection system 500 includes a terminal device 510 and a server 520.
  • the terminal device 510 is configured to receive audio or text information, and send the audio or text information to the server 520.
  • the terminal device may be an electronic device such as an electronic picture frame.
  • the terminal device will be described in detail in FIG. 9 and will not be repeated here.
  • the server 520 is configured to receive audio or text information sent by the terminal device 510 and generate log data, and collect the log data based on the log data collection method provided in any embodiment of the present disclosure.
  • the audio or text information includes general audio or text information and art audio or text information
  • the server 520 includes a general application container, an art application container, a message queue component, a data flow migration component, and a distributed file system.
  • the general application container is configured to output general log data in response to general audio or text information
  • the art application container is configured to output art log data in response to art audio or text information
  • the message queue component is configured to cache general log data and art log data
  • the data flow migration component is configured to collect and transmit general log data and art log data cached in the message queue component
  • the distributed file system is configured to store general log data and art log data.
  • the general application container, the art application container, the message queue component, the data flow migration component, and the distributed file system can refer to the specific description of the above log data collection method, which will not be repeated here.
  • the message queue component includes a message queue with a general topic and a message queue with an art topic.
  • General log data is cached in the message queue of the general topic
  • art log data is cached in the message queue of the art topic.
  • the server 520 is further configured to determine whether the general log data and the art log data stored on the distributed file system are compliant based on the first principle.
  • the first principle can be set according to the power-on time of the electronic picture frame, the screen orientation of the electronic picture frame, or the volume of the electronic picture frame.
  • for example, when judging the boot time of the electronic picture frame, the first principle can be set to the year 2019, so that a boot time showing 2099 is non-compliant; for example, the first principle can be set such that the electronic picture frame includes a horizontal screen and a vertical screen, so that a displayed inclined screen is non-compliant; for example, the first principle can be set such that the volume of the electronic picture frame ranges from 0 to 100, so that a displayed volume of 300 is non-compliant.
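The three example checks of the first principle can be combined into one function. The parameter names are hypothetical, and treating any boot year later than the reference year as non-compliant is an assumed reading of the 2019/2099 example.

```python
VALID_ORIENTATIONS = {"horizontal", "vertical"}


def frame_log_compliant(boot_year, orientation, volume, reference_year=2019):
    """Apply the three example checks for electronic picture frame log data."""
    if boot_year > reference_year:              # e.g. 2099 is rejected
        return False
    if orientation not in VALID_ORIENTATIONS:   # e.g. "inclined" is rejected
        return False
    return 0 <= volume <= 100                   # e.g. 300 is rejected
```

A record passes only when all three values are plausible, so a single impossible value is enough to exclude it from later analysis.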
  • the embodiment of the present disclosure does not limit this.
  • FIG. 9 shows a schematic diagram of a terminal device provided by at least one embodiment of the present disclosure.
  • the terminal device 600 may include, but is not limited to, mobile terminals such as electronic picture frames, mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the terminal device shown in FIG. 9 is only an example, and the embodiment of the present disclosure does not limit this.
  • the terminal device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processing, for example, the above-mentioned log data collection method, according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the terminal device 600 are also stored.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the devices connected to the I/O interface 605 include: input devices 606 such as touch screens, touch pads, keyboards, mice, cameras, microphones, accelerometers, and gyroscopes; output devices 607 such as liquid crystal displays (LCD), speakers, and vibrators; storage devices 608 such as magnetic tapes and hard disks; and a communication device 609.
  • the communication device 609 may allow the terminal device 600 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 9 shows a terminal device 600 including various devices, it should be understood that it is not required to implement or have all the illustrated devices; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a storage medium as shown in FIG. 7, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the log data collection method of the embodiment of the present disclosure are executed.
  • the above-mentioned storage medium may be included in the above-mentioned terminal device; or it may exist alone without being assembled into the terminal device.
  • the terminal device and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks (“LAN”), wide area networks (“WAN”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Debugging And Monitoring (AREA)

Abstract

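A log data collection method, a log data collection device, a storage medium, and a log data collection system. The log data collection method includes: obtaining log data generated by at least one container in an application container environment; transmitting the log data to a log cache unit for caching; and collecting, through a log collection unit, the log data cached in the log cache unit, and transmitting the log data to a log storage unit for storage. The log data collection method can solve the problem of incomplete storage of log data generated in an application container environment.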

Description

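Log data collection method, log data collection device, storage medium, and log data collection system
Technical Field
Embodiments of the present disclosure relate to a log data collection method, a log data collection device, a storage medium, and a log data collection system.
Background
Docker is an open-source application container engine that lets developers package their applications into a portable container and then publish it to any popular Linux or Windows machine; it can also be used for virtualization. Containers use a sandbox mechanism, have no interfaces to one another, and are independent of each other.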
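Summary
At least one embodiment of the present disclosure provides a log data collection method, including: acquiring log data generated by at least one container in an application container environment; transmitting the log data to a log cache unit for caching; and collecting, by a log collection unit, the log data cached in the log cache unit and transmitting the log data to a log storage unit for storage.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the log cache unit includes a message queue component and the log collection unit includes a data stream migration component; the log data collection method includes: transmitting the log data directly to the message queue component for caching; and collecting, by the data stream migration component, the log data cached in the message queue component and transmitting the log data to the log storage unit for storage.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, transmitting the log data to the log cache unit for caching includes: according to the log type of the log data, sending log data of different log types to different message queues in the message queue component for caching.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, collecting, by the log collection unit, the log data cached in the log cache unit includes: reading, by the log collection unit, the log data cached in the different message queues one by one, so as to collect the log data cached in the log cache unit.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the log data includes error-level log data, warning-level log data, and info-level log data.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the log data is transmitted to the log storage unit for storage based on the system time and according to a first time range.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the log storage unit includes a distributed file system; transmitting the log data to the log storage unit for storage includes: transmitting the log data collected by the log collection unit to the distributed file system for distributed storage.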
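For example, the log data collection method provided by at least one embodiment of the present disclosure further includes: performing data processing on the log data stored in the log storage unit.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, a time slice is used as a filter condition to determine the data range of the log data that needs the data processing; whether the log data within the data range is compliant is determined, and if it is compliant, the log data is collected in a structured manner and output to a target file carrying the time slice for storage.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, determining whether the log data within the data range is compliant includes: reading in the log data of at least one data range record by record in a distributed manner, so as to determine whether the log data within the at least one data range is compliant.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the log data is log data generated by an intelligent question-answering system.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the types of the log data include first-type log data and second-type log data; the first-type log data is sent to a first message queue in the message queue component for caching; the second-type log data is sent to a second message queue in the message queue component for caching; and the first message queue and the second message queue are different message queues.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the first-type log data is log data generated by general-type question answering, and the second-type log data is log data generated by art-type question answering.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the application container environment includes the at least one container, the intelligent question-answering system includes a natural language understanding subsystem, the natural language understanding subsystem runs on at least one container of the application container environment and generates the log data, and the at least one container outputs the log data in response to a service request.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the application container environment includes a plurality of containers, and different service modules of the natural language understanding subsystem run in different containers.
For example, in the log data collection method provided by at least one embodiment of the present disclosure, the application container environment is implemented with the docker container engine.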
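At least one embodiment of the present disclosure further provides a log data collection device, including a log acquisition unit, a log cache unit, a log collection unit, and a log storage unit. The log acquisition unit is configured to acquire log data generated by at least one container in an application container environment; the log cache unit is configured to cache the log data; the log collection unit is configured to collect the log data cached in the log cache unit and transmit it; and the log storage unit is configured to store the log data.
For example, in the log data collection device provided by at least one embodiment of the present disclosure, the log cache unit includes a message queue component, the log collection unit includes a data stream migration component, and the log storage unit includes a distributed file system.
At least one embodiment of the present disclosure further provides a log data collection device, including: a processor; and a memory storing one or more computer program modules; the one or more computer program modules are configured to be executed by the processor and include instructions for carrying out the log data collection method provided by any embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions; when the computer-readable instructions are executed by a computer, the log data collection method provided by any embodiment of the present disclosure can be carried out.
At least one embodiment of the present disclosure further provides a log data collection system, including a terminal device and a server; the terminal device is configured to receive audio or text information and send the audio or text information to the server; the server is configured to receive the audio or text information sent by the terminal device, generate log data, and collect the log data based on the log data collection method provided by any embodiment of the present disclosure.
For example, in the log data collection system provided by at least one embodiment of the present disclosure, the terminal device includes an electronic picture frame.
For example, in the log data collection system provided by at least one embodiment of the present disclosure, the audio or text information includes general-type audio or text information and art-type audio or text information, and the server includes a general-type application container, an art-type application container, a message queue component, a data stream migration component, and a distributed file system; the general-type application container is configured to output general-type log data in response to the general-type audio or text information; the art-type application container is configured to output art-type log data in response to the art-type audio or text information; the message queue component is configured to cache the general-type log data and the art-type log data; the data stream migration component is configured to collect the general-type log data and the art-type log data cached in the message queue component and transmit them; and the distributed file system is configured to store the general-type log data and the art-type log data.
For example, in the log data collection system provided by at least one embodiment of the present disclosure, the message queue component includes a message queue of a general-type topic and a message queue of an art-type topic; the general-type log data is cached in the message queue of the general-type topic, and the art-type log data is cached in the message queue of the art-type topic.
For example, in the log data collection system provided by at least one embodiment of the present disclosure, the server is further configured to determine, according to a first principle, whether the general-type log data and the art-type log data stored on the distributed file system are compliant.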
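Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present invention and do not limit the present invention.
Fig. 1 is a flowchart of a log data collection method provided by at least one embodiment of the present disclosure;
Fig. 2 is a flowchart of another log data collection method provided by at least one embodiment of the present disclosure;
Fig. 3 is a schematic block diagram of a log data collection method provided by at least one embodiment of the present disclosure;
Fig. 4 is a flowchart of a data processing operation provided by at least one embodiment of the present disclosure;
Fig. 5 is a schematic block diagram of a log data collection device provided by at least one embodiment of the present disclosure;
Fig. 6 is a schematic block diagram of another log data collection device provided by at least one embodiment of the present disclosure;
Fig. 7 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure;
Fig. 8 is a schematic diagram of a log data collection system provided by at least one embodiment of the present disclosure; and
Fig. 9 is a schematic diagram of a terminal device provided by at least one embodiment of the present disclosure.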
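Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the described embodiments without creative effort fall within the protection scope of the present invention.
Unless otherwise defined, the technical or scientific terms used in the present disclosure have the ordinary meaning understood by a person of ordinary skill in the field to which the present invention belongs. “First”, “second”, and similar words used in the present disclosure do not indicate any order, quantity, or importance, and are only used to distinguish different components. Likewise, “a”, “an”, “the”, and similar words do not indicate a limitation of quantity, but indicate that at least one exists. “Include”, “comprise”, and similar words mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. “Connected”, “coupled”, and similar words are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect. “Up”, “down”, “left”, “right”, and the like only indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
The present disclosure is described below through several specific embodiments. To keep the following description of the embodiments clear and concise, detailed descriptions of known functions and known components may be omitted. When any component of an embodiment appears in more than one drawing, the component is denoted by the same reference numeral in each drawing.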
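Applications deployed in containers can run automatically. During application development, development, deployment, and testing are carried out continuously, and compiling and packaging code into a docker image is the basis of application deployment, release, and operations management. In addition, the system can conveniently create different containers from different images to meet different service needs, and can conveniently reclaim the containers once the corresponding service need disappears. A scalable architecture can therefore be implemented conveniently with an application container engine.
An intelligent question-answering system based on, for example, a docker environment runs in a highly concurrent environment. A scalable application container environment can cope with highly concurrent service requests, but it also generates a large amount of log data. Moreover, in a highly concurrent environment, the docker containers run independently of one another, and online data is produced quickly and in bursts, so the file read/write limits of the container environment may cause log data to be saved incompletely. For example, when the data generated in the plurality of docker containers is all written as files at the same time to, for example, a storage machine (速阻机), the containers contend for storage resources. As a result, log writes may fail during peak periods of log generation, so that data is stored incompletely or is difficult to read.
At least one embodiment of the present disclosure provides a log data collection method, including: acquiring log data generated by at least one container in an application container environment; transmitting the log data to a log cache unit for caching; and collecting, by a log collection unit, the log data cached in the log cache unit and transmitting the log data to a log storage unit for storage.
At least one embodiment of the present disclosure further provides a log data collection device, a storage medium, and a log data collection system corresponding to the above log data collection method.
The log data collection method provided by the above embodiments of the present disclosure can solve the problem that log data generated in an application container environment is saved incompletely, thereby broadening the environments in which application containers can be used and improving their market competitiveness.
The embodiments of the present disclosure and some examples thereof are described in detail below with reference to the drawings.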
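Fig. 1 is a flowchart of a log data collection method provided by at least one embodiment of the present disclosure. For example, the log data collection method can be applied to various systems running in an application container environment, such as an intelligent question-answering system; it can of course also be used for systems in other operating environments, and the embodiments of the present disclosure are not limited in this respect. The log data collection method can be implemented in software and loaded and executed by a processor in the intelligent question-answering system, for example by a central processing unit (CPU); or it can be implemented at least partly in software, hardware, firmware, or any combination thereof. It can solve the problem that log data generated in a highly concurrent environment is saved incompletely, broaden the application fields of the application container environment, and improve market competitiveness.
The log data collection method provided by at least one embodiment of the present disclosure is described below with reference to Fig. 1. As shown in Fig. 1, the log data collection method includes steps S110 to S130; these steps and their respective exemplary implementations are introduced below.
Step S110: acquire log data generated by at least one container in an application container environment.
Step S120: transmit the log data to a log cache unit for caching.
Step S130: collect, by a log collection unit, the log data cached in the log cache unit, and transmit the log data to a log storage unit for storage.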
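As a rough illustration of how steps S110 to S130 fit together, the following Python sketch models the three units with standard-library objects. The function name `collect_logs`, the container identifiers, and the use of an in-memory `queue.Queue` and a list in place of the log cache unit and log storage unit are assumptions made for illustration; they do not come from the disclosure.

```python
import queue

def collect_logs(container_outputs, cache, storage):
    """Steps S110-S130 in miniature: acquire, cache, then collect and store."""
    # S110: acquire the log data produced by each container.
    for container_id, lines in container_outputs.items():
        for line in lines:
            # S120: transmit each record to the log cache unit.
            cache.put((container_id, line))
    # S130: the collection unit drains the cache into the storage unit.
    while not cache.empty():
        storage.append(cache.get())
    return storage

# Hypothetical container outputs, keyed by a made-up container identifier.
outputs = {"general-1": ["q: weather"], "art-1": ["q: who painted this"]}
stored = collect_logs(outputs, queue.Queue(), [])
```

The point of the intermediate cache is that containers never write to the storage unit themselves; only the single collection loop does.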
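For example, the log cache unit and the log collection unit mentioned in the above steps can be implemented as hardware (e.g., circuit) modules, software modules, or any combination thereof. For example, these units can be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit having data processing capability and/or instruction execution capability, together with corresponding computer instructions. For example, the processing unit may be a general-purpose or special-purpose processor, and may be a processor based on the X86 or ARM architecture.
For step S110, for example, in some embodiments of the present disclosure the application container environment is implemented with the docker container engine, and the application container is accordingly a docker container. For example, the docker containers are independent of one another; when the number of services (for example, intelligent question-answering services) increases, the number of docker containers can be increased accordingly to improve processing efficiency. The embodiments of the present disclosure are not limited in this respect.
For example, each docker container can be regarded as an independent host. A docker container is usually created from an image that serves as its template. By analogy with virtual machines, the image corresponds to a virtual machine image, and the docker containers correspond to running virtual machines. For example, what software a docker container contains after creation depends entirely on the image it uses. An image can be created from a docker container (equivalent to saving the current state of the container as a snapshot) or from a Dockerfile (a text file written according to rules defined by docker). A registry is a place where image files are stored centrally; each registry can contain multiple images, and each image has different tags. Registries are either public or private. The largest public registry is Docker Hub, which holds a huge number of images for users to download. The embodiments of the present disclosure place no limitation on how images are created or stored.
The following description takes a docker container applied to an intelligent question-answering system as an example. Accordingly, the log data is log data generated by an intelligent question-answering system running in the docker container environment; the embodiments of the present disclosure are not limited in this respect.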
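For example, the application container environment includes at least one container, and a natural language understanding (NLU) subsystem included in the intelligent question-answering system (for example, a question-and-answer (Q&A) subsystem, a dialogue subsystem, etc.) runs on at least one container of the application container environment and generates the log data. For example, the at least one container outputs the log data in response to a service request.
For example, in some examples, the application container environment includes a plurality of containers, and different service modules of the natural language understanding subsystem (for example, a first-type service module such as a general-type service module, and a second-type service module such as an art-type service module) run in different containers, so as to respond to different service requests and output different types of log data.
For example, as shown in Fig. 3, the general-type service module of the natural language understanding subsystem handles general-type service requests (for example, everyday questions such as weather and time queries) and runs in a first-type docker container (for example, a general-type docker container); the art-type service module of the natural language understanding subsystem handles art-type service requests (for example, questions such as who painted a given work) and runs in a second-type docker container (for example, an art-type docker container).
For example, the types of the log data include first-type log data (for example, general-type log data) and second-type log data (for example, art-type log data). For example, the general-type log data includes log data generated in response to service requests such as weather and time queries, that is, log data generated by the general-type docker container; the art-type log data includes log data generated in response to service requests such as questions about paintings, that is, log data generated by the art-type docker container. The embodiments of the present disclosure are not limited in this respect. It should be noted that the types of log data may also include inference-type log data or other types of log data; for example, the inference-type log data may be log data generated while judging and processing the above service requests.
For example, in some examples, the log data can be divided into multiple levels, including, for example, error-level log data, warning-level (warn) log data, and info-level log data. For example, error-level log data includes error events that may still allow the application to continue running, warning-level log data includes potentially harmful situations, and info-level log data includes coarse-grained informational events that indicate the progress of the application. It should be noted that the log data may also include debug-level and fatal-level log data. For example, debug-level log data includes fine-grained informational events that are useful for debugging the application, and its level is lower than the info level; fatal-level log data includes very severe error events that may cause the application to terminate, and its level is higher than the error and warning levels. The embodiments of the present disclosure are not limited in this respect.
For example, in some embodiments of the present disclosure, only error-level, warning-level, and info-level log data may be collected. Collecting only log data of the selected levels reduces the amount of log data and improves the efficiency and accuracy of the system.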
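Collecting only selected levels can be sketched with Python's standard `logging` module. This is a minimal sketch under stated assumptions: the logger name `nlu.sketch`, the `ListHandler` class, and the sample messages are hypothetical, and Python's `CRITICAL` level is treated here as the counterpart of the fatal level.

```python
import logging

# Levels kept by this sketch: info, warning, and error (debug and fatal are dropped).
KEPT_LEVELS = {logging.INFO, logging.WARNING, logging.ERROR}

class KeepSelectedLevels(logging.Filter):
    """Pass only records whose level is in KEPT_LEVELS."""
    def filter(self, record):
        return record.levelno in KEPT_LEVELS

class ListHandler(logging.Handler):
    """Collect (level name, message) pairs in memory instead of writing files."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))

logger = logging.getLogger("nlu.sketch")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
handler.addFilter(KeepSelectedLevels())
logger.addHandler(handler)

logger.debug("tokenized input")    # dropped by the filter
logger.info("request handled")     # kept
logger.warning("slow response")    # kept
logger.error("intent not found")   # kept
```

Filtering at the handler keeps the application code unchanged: modules still log at every level, and the choice of which levels to collect lives in one place.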
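For example, in some examples, as shown in Fig. 2, step S110 may include steps S210 to S240.
Step S210: receive a service request.
For example, the service request may be a question received by the intelligent question-answering system, such as “How is the weather today?” or “What time is it now?”. For example, there may be more than one intelligent question-answering system, and the log collection method can collect the log data generated by the multiple intelligent question-answering systems at the same time.
Step S220: at least one application container processes the service request.
For example, the at least one application container outputs an answer to the corresponding question in response to the service request. For example, different types of service requests are processed in different docker containers, which are created from different images. For example, as shown in Fig. 3, according to the service requests they process, docker containers can be divided into general-type docker containers, art-type docker containers, and so on; accordingly, the log data they generate is general-type log data, art-type log data, and so on. This can be set according to actual conditions, and the embodiments of the present disclosure are not limited in this respect.
Step S230: generate the related log data.
For example, while the at least one application container processes the service request, multiple items of log data are generated, such as user identification information, user question information, and device information. These items can be classified into the types and levels described above and stored according to their type and level, so that they can be processed and retrieved in subsequent processing.
Step S240: acquire the log data.
For example, the corresponding log data can be acquired as needed. For example, as described above, only the error-level, warning-level, and info-level log data among all levels may be acquired, so as to reduce the amount of log data to be processed and improve processing efficiency.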
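For example, in some examples, the log data may be divided, according to actual needs, into log data that does not need to be persisted and log data that needs to be persisted. For example, in some examples, the log data that is generated in the application container environment and does not need to be persisted may be transmitted to the log cache unit for caching, while the log data that needs to be persisted is stored directly as files; alternatively, both may be cached through the log cache unit. This can be set according to actual needs, and the embodiments of the present disclosure are not limited in this respect. For example, whether log data needs to be stored in multiple copies can be decided according to its importance or actual needs. For example, in some examples, the more important log data may include error-level and warning-level log data, which can be used, for example, for problem tracking and error diagnosis. For example, relatively unimportant log data may include debug-level or info-level log data. For example, in some embodiments, the more important log data may be stored in, for example, two copies as needed: one copy is transmitted to the log cache unit for caching, and the other is stored directly as a file, for example, on a storage machine (速阻机), a hard disk, or the like.
For example, in some examples, the log data that needs to be persisted may include log data of various levels, for example error-level and warning-level log data; the log data that does not need to be persisted may also include log data of various levels, for example log data at or above the info level (for example, error-level, warning-level, and info-level log data), which is used, for example, for subsequent text analysis. This can be set according to actual needs, and the embodiments of the present disclosure are not limited in this respect. It should be noted that the log data to be transmitted and saved can be determined according to actual needs; for example, it may also include debug-level or fatal-level log data.
For example, after the above log data is acquired, the method proceeds to the subsequent steps S120 and S130.
For example, a log acquisition unit for acquiring log data may be provided, and the log data generated by at least one container in the application container environment is acquired through this unit. For example, the log acquisition unit can be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit having data processing capability and/or instruction execution capability, together with corresponding computer instructions.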
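For step S120, for example, in this example the log data is transmitted to the log cache unit as a data stream for caching, instead of being transmitted directly to, for example, a storage machine (速阻机) and stored as files. This avoids resource contention and therefore avoids problems such as incomplete saving of log data generated in a highly concurrent environment.
For example, in some examples, the log cache unit includes a message queue component. For example, in this example, step S120 can be implemented as step S250 shown in Fig. 2: transmit the log data directly to the message queue component for caching.
For example, the message queue component is a distributed message queue component and can be implemented, for example, with a kafka component; the embodiments of the present disclosure are not limited in this respect. For example, the distributed message queue component includes a plurality of different message queues, for example a first message queue, a second message queue, ..., and an N-th message queue (N is an integer greater than 2). The first message queue, the second message queue, ..., and the N-th message queue are different message queues, for example, message queues of different topics.
For example, according to the log type of the log data, log data of different log types can be sent to different message queues in the message queue component for caching. For example, as shown in Fig. 3, the general-type log data generated by the general-type docker container is sent to the first message queue in the message queue component for caching, and the art-type log data generated by the art-type docker container is sent to the second message queue in the message queue component for caching. Ordered transmission of the data stream can therefore be achieved based on the concurrent throughput of the message queue component.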
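The routing of log types to separate queues can be sketched as follows. This is an illustrative in-memory stand-in, not a Kafka client: the `publish` function, the topic names `general` and `art`, and the sample messages are assumptions.

```python
import queue
from collections import defaultdict

# One in-memory queue per topic; stands in for the message queue component.
topic_queues = defaultdict(queue.Queue)

def publish(log_type, message):
    """Send a log record to the queue whose topic matches its log type."""
    topic_queues[log_type].put(message)

publish("general", "Q: what time is it")
publish("art", "Q: who painted this work")
publish("general", "Q: how is the weather today")

general_count = topic_queues["general"].qsize()
art_count = topic_queues["art"].qsize()
```

Because each type has its own queue, a burst of general-type records does not delay or interleave with art-type records, and each queue preserves the order in which its records arrived.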
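The structure and operating mode of the message queue component are briefly introduced below. It should be noted that the embodiments of the present disclosure are not limited to the following introduction, and other structures and operating modes in the art may also be used.
For example, the message queue component can be implemented as a distributed messaging system that supports partitions and multiple replicas and is based on a coordination mechanism such as zookeeper. Its most notable feature is that it can process large amounts of data in real time to meet the needs of various scenarios. For example, the message queue component classifies messages by topic when saving them; the sender of a message is called a producer, and the receiver of a message is called a consumer. A message queue cluster includes multiple message queue instances, each of which is called a broker. The message queue cluster, the producers, and the consumers all rely on zookeeper to ensure system availability.
The objects of publication and subscription in the message queue component are the message queues under topics. A topic can be created for each type of log data; a client that publishes messages to the message queue of a topic is called a producer, and a client that subscribes to messages from the message queue of a topic is called a consumer. Producers and consumers can read and write data in the message queues of multiple topics at the same time. A message queue cluster consists of one or more brokers (for example, servers), which are responsible for persisting and backing up the queued messages.
For example, the machines/services in a message queue cluster are called brokers. A node in the message queue component is a broker, and a message queue cluster includes multiple brokers. It should be noted that one node may include multiple brokers. The number of brokers on one machine is determined by the number of servers.
For example, a topic represents a category of messages, and the directory in which messages are stored is the topic; for example, page view logs and click logs can each exist as a topic. The message queue cluster can distribute the messages in the message queues of multiple topics at the same time. One broker can include multiple topics.
For example, a partition is a physical grouping of a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue. When log data is produced and consumed, there is no need to care which broker a given partition is stored on; only the topic needs to be specified, and the message queue component associates the log data with the corresponding partition.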
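For example, a message is the data object being transferred, and mainly includes four parts: an offset, a key, a value, and an insertion time. For example, the log data in the embodiments of the present disclosure is such a message.
For example, a producer produces messages and sends them to the message queue of the corresponding topic.
For example, a consumer subscribes to a topic and consumes the messages stored in the message queue of that topic; a consumer consumes as a thread.
For example, a consumer group contains multiple consumers, which is configured in advance in a configuration file. The consumers (consumer threads) can form a consumer group, and each message in a partition can be consumed by only one consumer (consumer thread) in the group. If a message is to be consumed by multiple consumers (consumer threads), those consumers need to be in different groups. To guarantee throughput, the message queue component allows only one consumer thread to access a given partition. If efficiency is insufficient, the system can be scaled horizontally by increasing the number of partitions and adding new consumer threads to consume them. This makes full use of horizontal scalability, yields very high throughput, and forms the concept of distributed consumption.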
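The rule that each partition is read by exactly one consumer in a group, and that adding partitions and consumers scales consumption horizontally, can be sketched as below. The round-robin policy and the name `assign_partitions` are illustrative assumptions; this is not a reproduction of the assignment strategy of any particular message queue product.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# Four partitions shared by two consumers in one group.
before = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
# After scaling out: more partitions and one more consumer thread.
after = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])
```

In this sketch no partition ever appears under two consumers, which is the property that lets the consumers of one group proceed in parallel without coordinating on individual messages.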
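For example, a message queue cluster contains several producers (for example, page views generated by a web front end, server logs, or system CPU and memory metrics), several brokers (the message queue component supports horizontal scaling; in general, the more brokers, the higher the cluster throughput), several consumer groups, and a Zookeeper cluster. The message queue component uses Zookeeper to manage the cluster configuration, elect a leader, and rebalance when a consumer group changes. Producers publish messages to brokers in push mode, and consumers subscribe to and consume messages from brokers in pull mode.
The path from producer to broker is a push operation, that is, data is pushed to the broker. The path between consumer and broker is a pull operation: the consumer actively pulls data, rather than the broker actively sending data to the consumer.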
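The push/pull asymmetry can be sketched with a toy broker that only appends on push and only returns data when a consumer asks for it. The `Broker` class, its method names, and the consumer-side `offset` variable are assumptions for illustration and do not reflect the API of a real message queue client.

```python
class Broker:
    """Keeps an append-only list of messages per topic."""

    def __init__(self):
        self.logs = {}

    def push(self, topic, message):
        # Producer side: data is pushed to the broker.
        self.logs.setdefault(topic, []).append(message)

    def pull(self, topic, offset, max_messages=10):
        # Consumer side: the consumer asks for data starting at its own offset.
        return self.logs.get(topic, [])[offset:offset + max_messages]

broker = Broker()
broker.push("general", "m0")
broker.push("general", "m1")

offset = 0                        # the consumer tracks its own position
batch = broker.pull("general", offset)
offset += len(batch)

broker.push("general", "m2")      # arrives later; the broker does not forward it
next_batch = broker.pull("general", offset)
```

Because the consumer decides when to pull and how much, a slow consumer simply falls behind instead of being overwhelmed by the broker.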
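For step S130, for example, in some examples the log collection unit includes a data stream migration component. For example, the data stream migration component includes a distributed data stream migration component, for example a big data ETL (Extraction-Transformation-Loading) component such as a flume component. It should be noted that any component having an interface corresponding to the log cache unit can serve as the log collection unit; the embodiments of the present disclosure are not limited in this respect.
For example, as shown in Fig. 2, step S130 specifically includes step S260: collect, by the data stream migration component, the log data cached in the message queue component, and transmit the log data to the log storage unit for storage. As shown in Fig. 3, the log collection unit includes multiple data stream migration components, and different data stream migration components correspond one-to-one to message queues of different topics, so as to collect the log data cached in the different message queues separately. For example, the log collection unit reads the log data cached in the different message queues one by one to collect the log data cached in the log cache unit; that is, the data stream is transmitted from the message queue component to the data stream migration component by streaming.
For example, the data stream migration component can be implemented as a distributed system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources (for example, the message queue component) to a centralized data store. It is a tool/service, or data concentration mechanism, that can collect data resources such as logs and events and gather these large volumes of log data from the various data resources into centralized storage.
For example, the external structure of the data stream migration component may include a data generator. The log data generated by the data generator (for example, the message queue component) is collected by individual agents running on the server where the data generator is located; a data collector then gathers the log data from the agents and saves the collected log data into the log storage unit.
For example, the data stream migration component internally includes one or more agents. Each agent is an independent daemon process (JVM). It receives log data from a client (for example, the message queue component) or from other agents, and then quickly passes the received log data to the next destination node, for example a sink, the log storage unit, or the next agent.
For example, an agent mainly includes three components: a source, a channel, and a sink. For example, the source receives log data from the data generator and passes the received log data to one or more channels. For example, the channel is a transient storage container that buffers the log data received from the source until it is consumed by a sink; it acts as a bridge between the source and the sink. The channel is fully transactional, which guarantees the consistency of data as it is sent and received, and it can be connected to any number of sources and sinks. For example, the channel types include JDBC channel, File System channel, Memory channel, and so on. For example, the sink stores the log data in, for example, the log storage unit: it consumes the log data from the channel and delivers it to the destination, which may be, for example, another sink or the log storage unit. For example, the data stream migration component can be implemented with a flume component.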
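The source, channel, and sink roles of an agent can be sketched in a few lines. This is a simplified model under stated assumptions: the `Agent` class and its methods are hypothetical, a bounded `queue.Queue` stands in for an in-memory channel, and the transactional behavior described above is not modeled.

```python
import queue

class Agent:
    """Minimal source -> channel -> sink pipeline."""

    def __init__(self, capacity=100):
        # The channel buffers records between the source and the sink.
        self.channel = queue.Queue(maxsize=capacity)

    def source(self, records):
        # The source receives records and puts them into the channel.
        for record in records:
            self.channel.put(record)

    def sink(self, destination):
        # The sink drains the channel and delivers records to the destination.
        while True:
            try:
                record = self.channel.get_nowait()
            except queue.Empty:
                break
            destination.append(record)

agent = Agent()
agent.source(["log-1", "log-2"])
store = []            # stands in for the log storage unit
agent.sink(store)
```

Decoupling the source from the sink through the channel lets the two sides run at different speeds, with the channel capacity bounding how far the sink may lag.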
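For example, in some examples, the log storage unit includes a big data storage platform, including, for example, a distributed file system (HDFS, Hadoop Distributed File System), a database (for example, HBase (Hadoop Database), an open-source non-relational distributed database), or other ordinary files (for example, Windows files, Linux files, etc.); the embodiments of the present disclosure are not limited in this respect.
For example, transmitting the log data to the log storage unit for storage includes: transmitting the log data collected by the data stream migration component to the distributed file system for distributed storage. For example, the log data in different data stream migration components is saved on different distributed file systems.
For example, the log data is transmitted to the log storage unit (for example, the distributed file system) for storage based on the system time and according to a first time range. For example, the system time may be the time on the machine or system that executes the log data collection method. For example, in some examples, folders and files can be created by topic, date (year, month, and day), and first time range (a specific time range such as 00:00-12:00 or 12:00-24:00), so that the log data corresponding to a given topic and time is stored in the corresponding file or folder. This achieves distributed storage of the log data and makes it easier to process the log data of a given range in subsequent steps.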
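One possible way to derive a storage location from the topic, the date, and the first time range is sketched below. The `/logs/...` directory layout, the file naming, and the function name `storage_path` are assumptions for illustration; the disclosure only states that folders and files are organized by topic, date, and first time range.

```python
from datetime import datetime

def storage_path(topic, ts, hours_per_range=12):
    """Build a path from topic, date, and the time range containing ts."""
    start = (ts.hour // hours_per_range) * hours_per_range
    end = start + hours_per_range
    return f"/logs/{topic}/{ts:%Y/%m/%d}/{start:02d}00-{end:02d}00.log"

# A record written at 14:30 falls into the 12:00-24:00 range.
path = storage_path("general", datetime(2019, 6, 28, 14, 30))
```

With this layout, selecting the data for one topic and one half-day is a matter of opening a single file rather than scanning all stored logs.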
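For example, in some examples, after the log data is stored in the log storage unit, the log collection method further includes: performing data processing on the log data stored in the log storage unit, so as to ensure the accuracy and usefulness of the stored log data.
Fig. 4 shows a flowchart of data processing provided by at least one embodiment of the present disclosure. As shown in Fig. 4, the data processing operation includes steps S140 to S180. The data processing operation is described in detail below with reference to Fig. 4.
Step S140: use a time slice as a filter condition to determine the data range of the log data that needs data processing.
For example, the time slice represents a time range. For example, a time range is set according to actual needs, and the log data within that time range is selected for the data processing below. For example, the time slice may cover one first time range, that is, the time slice is a single first time range (for example, 00:00-12:00), so that the log data of that first time range is filtered out for processing. Of course, the time slice may also cover multiple first time ranges (for example, 00:00-24:00, which covers two first time ranges), so that the log data within those first time ranges is filtered out for processing.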
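The relationship between a time slice and the first time ranges it covers can be sketched as a small function. The whole-hour granularity, the 12-hour default, and the name `ranges_in_slice` are assumptions chosen to match the 00:00-12:00 and 12:00-24:00 examples in the text.

```python
def ranges_in_slice(slice_start, slice_end, hours_per_range=12):
    """Return the (start, end) hour ranges fully covered by the time slice."""
    return [
        (hour, hour + hours_per_range)
        for hour in range(0, 24, hours_per_range)
        if slice_start <= hour and hour + hours_per_range <= slice_end
    ]

half_day = ranges_in_slice(0, 12)   # a slice equal to one first time range
full_day = ranges_in_slice(0, 24)   # a slice covering two first time ranges
```

Each returned range identifies one stored data range, so the processing of a full-day slice can be split into independent jobs, one per first time range.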
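Step S150: read in the log data of at least one data range record by record in a distributed manner.
For example, at least one data range can be obtained based on the different time slices in step S140, and the log data within the at least one data range can, for example, be processed separately and simultaneously. For example, the log data within each data range is read in record by record, so that the records are processed one at a time. For example, in some examples, the records read in are passed on to step S160, that is, they are checked for compliance so that compliant data is selected for the subsequent flow; in other examples, the records read in can be used directly in step S170, that is, structured processing. The specific steps can be set according to actual conditions, and the embodiments of the present disclosure are not limited in this respect.
Step S160: determine whether the log data within the data range is compliant; if so, perform step S170; if not, continue to perform step S160 to determine whether the remaining log data is compliant.
For example, in this step, data cleaning can be performed on the log data in each distributed file system. For example, determining whether the log data within the data range is compliant may include: determining whether the format, the information (for example, user identification information, user question information, etc.), and the time of the log data meet the requirements; the embodiments of the present disclosure are not limited in this respect. Based on this step, accurate log data can be selected for subsequent data analysis.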
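A compliance check on format, information, and time might look like the following sketch. The field names `user_id`, `question`, and `time`, and the timestamp format, are hypothetical; the disclosure names the kinds of checks but not a concrete record layout.

```python
from datetime import datetime

REQUIRED_FIELDS = ("user_id", "question", "time")  # hypothetical field names

def is_compliant(record):
    """Check that required fields are present and the time parses."""
    if not all(record.get(field) for field in REQUIRED_FIELDS):
        return False
    try:
        datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        return False
    return True

records = [
    {"user_id": "u1", "question": "what time is it", "time": "2019-06-28 09:15:00"},
    {"user_id": "u2", "question": "", "time": "2019-06-28 09:16:00"},
    {"user_id": "u3", "question": "who painted this", "time": "28/06/2019"},
]
clean = [r for r in records if is_compliant(r)]
```

Non-compliant records are simply skipped, and the loop moves on to the next record, mirroring the way step S160 continues with the remaining log data.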
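Step S170: collect the log data in a structured manner.
For example, the structured collection process includes converting log data in, for example, text form into matrix form.
For example, in the above steps, a big data processing program in the art can be used to clean the newly added log data in the distributed file system according to task scheduling.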
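Converting text-form log data into matrix form can be sketched as splitting each line into a fixed number of columns. The `|` separator, the three-column layout, and the name `to_matrix` are assumptions; the disclosure does not specify the text format of a log record.

```python
def to_matrix(lines, sep="|", columns=3):
    """Turn delimited text lines into rows with a fixed number of columns."""
    rows = []
    for line in lines:
        fields = [field.strip() for field in line.split(sep)]
        if len(fields) == columns:   # skip lines that do not fit the layout
            rows.append(fields)
    return rows

lines = [
    "2019-06-28 09:15:00 | u1 | what time is it",
    "malformed line",
]
matrix = to_matrix(lines)
```

Each row of the resulting matrix has the same columns, which is what allows later steps to compute per-column statistics for reports.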
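Step S180: output the log data to a target file carrying the time slice for storage.
For example, the log data structured in step S170 is stored in the target file corresponding to its time range, thereby completing the distributed storage of the log data.
For example, the log data processed as described above is gathered into a result file, the metrics required for reports (for example, question-answering time, number of questions answered, etc.) are then calculated, and the calculated results are displayed in a report display system, for example as a bar chart.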
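A report metric such as the number of questions answered can be computed from the structured rows with a simple count. The row layout (timestamp, topic, question) and the sample values below are hypothetical and serve only to show the calculation.

```python
from collections import Counter

# Hypothetical structured rows: [timestamp, topic, question].
rows = [
    ["2019-06-28 09:15:00", "general", "what time is it"],
    ["2019-06-28 09:40:00", "art", "who painted this work"],
    ["2019-06-28 10:05:00", "general", "how is the weather today"],
]

# Number of questions per topic.
questions_per_topic = Counter(row[1] for row in rows)
# Number of questions per hour of day (characters 11-12 of the timestamp).
questions_per_hour = Counter(row[0][11:13] for row in rows)
```

Counts of this kind map directly onto the bars of a bar chart, with one bar per topic or per hour.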
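The log data collection method provided by the above embodiments of the present disclosure can solve the problem that log data is saved incompletely in a highly concurrent environment, thereby broadening the environments in which application containers can be used and improving their market competitiveness.
It should be noted that the flow of the log data collection method provided by some embodiments of the present disclosure may include more or fewer operations, and these operations may be performed sequentially or in parallel. Although the flow described above includes multiple operations in a particular order, it should be clearly understood that the order of the operations is not limited. The log data collection method described above may be performed once, or multiple times according to predetermined conditions.
At least one embodiment of the present disclosure further provides a log data collection device. Fig. 5 is a schematic block diagram of a log data collection device provided by at least one embodiment of the present disclosure.
For example, as shown in Fig. 5, in some examples the log data collection device 100 includes a log acquisition unit 110, a log cache unit 120, a log collection unit 130, and a log storage unit 140. For example, these units can be implemented as hardware (e.g., circuit) modules, software modules, or any combination thereof.
The log acquisition unit 110 is configured to acquire log data generated by at least one container in an application container environment. For example, the log acquisition unit 110 can implement step S110; for its specific implementation, refer to the description of step S110, which is not repeated here.
The log cache unit 120 is configured to cache the log data. For example, the log cache unit 120 can implement step S120; for its specific implementation, refer to the description of step S120, which is not repeated here.
The log collection unit 130 is configured to collect the log data cached in the log cache unit 120 and transmit it, and the log storage unit 140 is configured to store the log data. For example, the log collection unit 130 and the log storage unit 140 can implement step S130; for the specific implementation, refer to the description of step S130, which is not repeated here.
For example, the log cache unit 120 includes a message queue component, the log collection unit 130 includes a data stream migration component, and the log storage unit 140 includes a distributed file system. For details, refer to the description of the log data collection method, which is not repeated here.
It should be noted that the log data collection device provided by the embodiments of the present disclosure may include more or fewer circuits or units, and the connections between the circuits or units are not limited and can be determined according to actual needs. The specific construction of each circuit is not limited: it may be built from analog devices according to circuit principles, from digital chips, or in any other suitable manner.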
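Fig. 6 is a schematic block diagram of another log data collection device provided by at least one embodiment of the present disclosure. As shown in Fig. 6, the log data collection device 200 includes a processor 210, a memory 220, and one or more computer program modules 221.
For example, the processor 210 and the memory 220 are connected through a bus system 230. For example, the one or more computer program modules 221 are stored in the memory 220. For example, the one or more computer program modules 221 include instructions for executing the log data collection method provided by any embodiment of the present disclosure. For example, the instructions in the one or more computer program modules 221 can be executed by the processor 210. For example, the bus system 230 may be a commonly used serial or parallel communication bus; the embodiments of the present disclosure are not limited in this respect.
For example, the processor 210 may be a central processing unit (CPU), a field-programmable gate array (FPGA), or another form of processing unit having data processing capability and/or instruction execution capability; it may be a general-purpose or special-purpose processor, and it can control other components in the log data collection device 200 to perform desired functions.
The memory 220 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions can be stored on the computer-readable storage medium, and the processor 210 can run the program instructions to implement the functions of the embodiments of the present disclosure (implemented by the processor 210) and/or other desired functions, such as the log data collection method. Various application programs and various data can also be stored in the computer-readable storage medium, such as the log data generated in at least one application container and various data used and/or generated by the application programs.
It should be noted that, for clarity and conciseness, the embodiments of the present disclosure do not show all the constituent units of the log data collection device 200. To implement the necessary functions of the log data collection device 200, a person skilled in the art can provide and configure other constituent units not shown according to specific needs; the embodiments of the present disclosure are not limited in this respect.
For the technical effects of the log data collection device 100 and the log data collection device 200 in the different embodiments, refer to the technical effects of the log data collection method provided by the embodiments of the present disclosure, which are not repeated here.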
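Some embodiments of the present disclosure further provide a storage medium. Fig. 7 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure. For example, the storage medium 300 non-transitorily stores computer-readable instructions 301; when the computer-readable instructions 301 are executed by a computer (including a processor), the log data collection method provided by any embodiment of the present disclosure can be carried out.
For example, the storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium contains computer-readable program code for caching log data, and another computer-readable storage medium contains program code for collecting log data. For example, when the program code is read by a computer, the computer can execute the program code stored in the computer storage medium to carry out, for example, the log data collection method provided by any embodiment of the present disclosure.
For example, the program code may be used by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted over any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.
For example, the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media, and may also be another suitable storage medium.
For the technical effects of the storage medium provided by the embodiments of the present disclosure, refer to the corresponding description of the log data collection method in the above embodiments, which is not repeated here.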
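At least one embodiment of the present disclosure further provides a log data collection system. As shown in Fig. 8, the log data collection system 500 includes a terminal device 510 and a server 520.
For example, the terminal device 510 is configured to receive audio or text information and send the audio or text information to the server 520. For example, the terminal device may be an electronic device such as an electronic picture frame. For example, the terminal device is described in detail with reference to Fig. 9 and is not described further here.
For example, the server 520 is configured to receive the audio or text information sent by the terminal device 510, generate log data, and collect the log data based on the log data collection method provided by any embodiment of the present disclosure.
For example, in some examples, the audio or text information includes general-type audio or text information and art-type audio or text information, and the server 520 includes a general-type application container, an art-type application container, a message queue component, a data stream migration component, and a distributed file system. For example, the general-type application container is configured to output general-type log data in response to the general-type audio or text information; the art-type application container is configured to output art-type log data in response to the art-type audio or text information; the message queue component is configured to cache the general-type log data and the art-type log data; the data stream migration component is configured to collect the general-type log data and the art-type log data cached in the message queue component and transmit them; and the distributed file system is configured to store the general-type log data and the art-type log data. For example, for the general-type application container, the art-type application container, the message queue component, the data stream migration component, and the distributed file system, refer to the specific description of the log data collection method above, which is not repeated here.
For example, the message queue component includes a message queue of a general-type topic and a message queue of an art-type topic. The general-type log data is cached in the message queue of the general-type topic, and the art-type log data is cached in the message queue of the art-type topic. For example, the server 520 is further configured to determine, based on a first principle, whether the general-type log data and the art-type log data stored on the distributed file system are compliant. For example, the first principle can be set according to the power-on time of the electronic picture frame, the screen orientation of the electronic picture frame, the volume of the electronic picture frame, or the like. For example, when checking the power-on time of the electronic picture frame, the first principle may be set to the year 2019, so that a power-on time showing the year 2099 is non-compliant; for example, the first principle may be set to the landscape and portrait orientations supported by the electronic picture frame, so that a displayed diagonal orientation is non-compliant; as another example, the first principle may be set to a volume of 0-100 for the electronic picture frame, so that a displayed volume of 300 is non-compliant. The embodiments of the present disclosure are not limited in this respect.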
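The three example rules for the first principle (power-on year, screen orientation, and volume range) can be sketched as a single check. The field names `power_on_year`, `orientation`, and `volume`, and the function name `check_frame_log`, are hypothetical; the thresholds follow the examples given in the text.

```python
def check_frame_log(record, expected_year=2019):
    """Return the names of the rules a record violates; empty means compliant."""
    problems = []
    if record.get("power_on_year") != expected_year:
        problems.append("power_on_year")
    if record.get("orientation") not in ("landscape", "portrait"):
        problems.append("orientation")
    volume = record.get("volume")
    if not isinstance(volume, (int, float)) or not 0 <= volume <= 100:
        problems.append("volume")
    return problems

ok = check_frame_log({"power_on_year": 2019, "orientation": "portrait", "volume": 35})
bad = check_frame_log({"power_on_year": 2099, "orientation": "diagonal", "volume": 300})
```

Returning the list of violated rules, rather than a single boolean, makes it possible to report why a record was rejected during data cleaning.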
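At least one embodiment of the present disclosure further provides a terminal device to which the above log data collection method is applied. Fig. 9 shows a schematic diagram of a terminal device provided by at least one embodiment of the present disclosure. As shown in Fig. 9, the terminal device 600 (the server or terminal device described above) may include, but is not limited to, mobile terminals such as an electronic picture frame, a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (for example, a vehicle navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The terminal device shown in Fig. 9 is only an example, and the embodiments of the present disclosure are not limited in this respect.
As shown in Fig. 9, the terminal device 600 may include a processing device (for example, a central processing unit, a graphics processing unit, etc.) 601, which can perform various appropriate actions and processing, for example the above log data collection method, according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the terminal device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
For example, the devices connected to the I/O interface 605 include: an input device 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 608 including, for example, a magnetic tape and a hard disk; and a communication device 609. The communication device 609 allows the terminal device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 9 shows the terminal device 600 with various devices, it should be understood that it is not required to implement or have all the devices shown, and more or fewer devices may alternatively be implemented or provided.
For example, in the embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product that includes a computer program carried on, for example, the storage medium shown in Fig. 7, and the computer program contains program code for executing the method shown in the flowchart. In this embodiment, the computer program may be downloaded and installed from a network through the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above functions defined in the log data collection method of the embodiments of the present disclosure are performed. For example, the storage medium may be included in the terminal device, or may exist separately without being assembled into the terminal device.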
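For example, in some implementations, the terminal device and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks (“LAN”), wide area networks (“WAN”), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
For the technical effects of the log data collection system provided by the embodiments of the present disclosure, refer to the corresponding description of the log data collection method in the above embodiments, which is not repeated here.
The following points should be noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, refer to common designs.
(2) Where there is no conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with one another to obtain new embodiments.
The above are only exemplary implementations of the present disclosure and are not intended to limit the protection scope of the present disclosure, which is determined by the appended claims.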

Claims (25)

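  1. A log data collection method, comprising:
    acquiring log data generated by at least one container in an application container environment;
    transmitting the log data to a log cache unit for caching; and
    collecting, by a log collection unit, the log data cached in the log cache unit, and transmitting the log data to a log storage unit for storage.
  2. The log data collection method according to claim 1, wherein the log cache unit comprises a message queue component and the log collection unit comprises a data stream migration component,
    wherein the log data collection method comprises:
    transmitting the log data directly to the message queue component for caching; and
    collecting, by the data stream migration component, the log data cached in the message queue component, and transmitting the log data to the log storage unit for storage.
  3. The log data collection method according to claim 2, wherein transmitting the log data to the log cache unit for caching comprises:
    according to a log type of the log data, sending log data of different log types to different message queues in the message queue component for caching.
  4. The log data collection method according to claim 3, wherein collecting, by the log collection unit, the log data cached in the log cache unit comprises:
    reading, by the log collection unit, the log data cached in the different message queues one by one, so as to collect the log data cached in the log cache unit.
  5. The log data collection method according to any one of claims 1-4, wherein the log data comprises error-level log data, warning-level log data, and info-level log data.
  6. The log data collection method according to any one of claims 1-5, wherein the log data is transmitted to the log storage unit for storage based on a system time and according to a first time range.
  7. The log data collection method according to any one of claims 1-6, wherein the log storage unit comprises a distributed file system;
    transmitting the log data to the log storage unit for storage comprises:
    transmitting the log data collected by the log collection unit to the distributed file system for distributed storage.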
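  8. The log data collection method according to any one of claims 1-7, further comprising:
    performing data processing on the log data stored in the log storage unit.
  9. The log data collection method according to claim 8, wherein
    a time slice is used as a filter condition to determine a data range of the log data that needs the data processing; and
    whether the log data within the data range is compliant is determined, and if it is compliant, the log data is collected in a structured manner and output to a target file carrying the time slice for storage.
  10. The log data collection method according to claim 9, wherein determining whether the log data within the data range is compliant comprises:
    reading in the log data of at least one data range record by record in a distributed manner, so as to determine whether the log data within the at least one data range is compliant.
  11. The log data collection method according to any one of claims 1-10, wherein the log data is log data generated by an intelligent question-answering system.
  12. The log data collection method according to claim 11, wherein types of the log data comprise first-type log data and second-type log data; wherein
    the first-type log data is sent to a first message queue in the message queue component for caching;
    the second-type log data is sent to a second message queue in the message queue component for caching; and
    the first message queue and the second message queue are different message queues.
  13. The log data collection method according to claim 12, wherein the first-type log data is log data generated by general-type question answering, and the second-type log data is log data generated by art-type question answering.
  14. The log data collection method according to claim 11, wherein the application container environment comprises the at least one container, the intelligent question-answering system comprises a natural language understanding subsystem, and the natural language understanding subsystem runs on at least one container of the application container environment and generates the log data,
    wherein the at least one container outputs the log data in response to a service request.
  15. The log data collection method according to claim 14, wherein the application container environment comprises a plurality of containers, and different service modules of the natural language understanding subsystem run in different containers.
  16. The log data collection method according to any one of claims 1-15, wherein the application container environment is implemented with the docker container engine.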
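  17. A log data collection device, comprising:
    a log acquisition unit, configured to acquire log data generated by at least one container in an application container environment;
    a log cache unit, configured to cache the log data;
    a log collection unit, configured to collect the log data cached in the log cache unit and transmit it; and
    a log storage unit, configured to store the log data.
  18. The log data collection device according to claim 17, wherein the log cache unit comprises a message queue component, the log collection unit comprises a data stream migration component, and the log storage unit comprises a distributed file system.
  19. A log data collection device, comprising:
    a processor; and
    a memory storing one or more computer program modules, wherein
    the one or more computer program modules are configured to be executed by the processor and comprise instructions for carrying out the log data collection method according to any one of claims 1-16.
  20. A storage medium non-transitorily storing computer-readable instructions, wherein when the computer-readable instructions are executed by a computer, the log data collection method according to any one of claims 1-16 can be carried out.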
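  21. A log data collection system, comprising a terminal device and a server; wherein
    the terminal device is configured to receive audio or text information and send the audio or text information to the server; and
    the server is configured to receive the audio or text information sent by the terminal device, generate log data, and collect the log data based on the log data collection method according to any one of claims 1-16.
  22. The log data collection system according to claim 21, wherein the terminal device comprises an electronic picture frame.
  23. The log data collection system according to claim 22, wherein the audio or text information comprises general-type audio or text information and art-type audio or text information, and the server comprises a general-type application container, an art-type application container, a message queue component, a data stream migration component, and a distributed file system;
    the general-type application container is configured to output general-type log data in response to the general-type audio or text information;
    the art-type application container is configured to output art-type log data in response to the art-type audio or text information;
    the message queue component is configured to cache the general-type log data and the art-type log data;
    the data stream migration component is configured to collect the general-type log data and the art-type log data cached in the message queue component and transmit them; and
    the distributed file system is configured to store the general-type log data and the art-type log data.
  24. The log data collection system according to claim 23, wherein the message queue component comprises a message queue of a general-type topic and a message queue of an art-type topic; wherein
    the general-type log data is cached in the message queue of the general-type topic, and the art-type log data is cached in the message queue of the art-type topic.
  25. The log data collection system according to claim 23 or 24, wherein the server is further configured to determine, according to a first principle, whether the general-type log data and the art-type log data stored on the distributed file system are compliant.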
PCT/CN2019/093854 2019-06-28 2019-06-28 Log data collection method, log data collection device, storage medium, and log data collection system WO2020258290A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
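PCT/CN2019/093854 WO2020258290A1 (zh) 2019-06-28 2019-06-28 Log data collection method, log data collection device, storage medium, and log data collection system
CN201980000951.XA CN112449750A (zh) 2019-06-28 2019-06-28 Log data collection method, log data collection device, storage medium, and log data collection system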
US16/963,618 US11755452B2 (en) 2019-06-28 2019-06-28 Log data collection method based on log data generated by container in application container environment, log data collection device, storage medium, and log data collection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/093854 WO2020258290A1 (zh) 2019-06-28 2019-06-28 Log data collection method, log data collection device, storage medium, and log data collection system

Publications (1)

Publication Number Publication Date
WO2020258290A1 (zh) 2020-12-30

Family

ID=74061459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093854 WO2020258290A1 (zh) 2019-06-28 2019-06-28 Log data collection method, log data collection device, storage medium, and log data collection system

Country Status (3)

Country Link
US (1) US11755452B2 (zh)
CN (1) CN112449750A (zh)
WO (1) WO2020258290A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
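CN111581173B (zh) * 2020-05-09 2023-10-20 深圳市卡数科技有限公司 Method, apparatus, server and storage medium for distributed storage of a log system
CN112948378A (zh) * 2021-02-04 2021-06-11 上海中通吉网络技术有限公司 HBase-based data processing method, apparatus and device
CN114490543A (zh) * 2022-01-12 2022-05-13 北京元年科技股份有限公司 Transaction log implementation method, apparatus, device and medium for an in-memory multidimensional database
CN115086296B (zh) * 2022-05-27 2024-04-05 阿里巴巴(中国)有限公司 Log transmission system, log transmission method and related apparatus
CN115168030B (zh) * 2022-06-24 2023-10-20 天翼爱音乐文化科技有限公司 Dynamically regulated log collection and processing method, apparatus and storage medium
CN115118779B (zh) * 2022-06-24 2024-09-24 济南浪潮数据技术有限公司 Method, system, device and medium for building a cluster based on centralized storage
CN116185921B (zh) * 2023-04-28 2023-07-21 湖北芯擎科技有限公司 Log output method, system, device and medium for heterogeneous multi-core multi-domain
CN117194549B (zh) * 2023-11-07 2024-01-26 上海柯林布瑞信息技术有限公司 Data transmission method and apparatus based on task data configuration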

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1642104A (zh) * 2004-01-05 2005-07-20 华为技术有限公司 Method and apparatus for implementing a system log
US7480672B2 (en) * 2005-03-31 2009-01-20 Sap Ag Multiple log queues in a database management system
CN109284251A (zh) * 2018-08-14 2019-01-29 平安普惠企业管理有限公司 Log management method and apparatus, computer device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8862813B2 (en) * 2005-12-29 2014-10-14 Datacore Software Corporation Method, computer program product and appartus for accelerating responses to requests for transactions involving data operations
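CN107239382A (zh) * 2017-06-23 2017-10-10 深圳市冬泉谷信息技术有限公司 Log processing method and system for container applications
KR102016238B1 (ko) 2017-12-05 2019-08-29 숭실대학교산학협력단 Docker container management system and method, and recording medium for performing the same
CN109063025A (zh) * 2018-07-13 2018-12-21 江苏满运软件科技有限公司 Log processing method and system, and intelligent terminal
CN109308329A (zh) * 2018-09-27 2019-02-05 深圳供电局有限公司 Cloud-platform-based log collection method and apparatus
CN109491859B (zh) 2018-10-16 2021-10-26 华南理工大学 Collection method for container logs in a Kubernetes cluster
CN109274556A (zh) 2018-11-09 2019-01-25 四川长虹电器股份有限公司 Web log collection and analysis system
CN109818934B (zh) 2018-12-29 2021-10-22 达闼机器人有限公司 Automated log processing method, apparatus and computing device
CN110111084A (zh) * 2019-05-16 2019-08-09 上饶市中科院云计算中心大数据研究院 Government service hotline analysis method and system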

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
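CN113179302A (zh) * 2021-04-19 2021-07-27 杭州海康威视系统技术有限公司 Log system, and log data collection method and collection device
CN113179302B (zh) * 2021-04-19 2022-09-16 杭州海康威视系统技术有限公司 Log system, and log data collection method and collection device
CN113297048A (zh) * 2021-05-24 2021-08-24 北京鼎事兴教育咨询有限公司 Log data processing method, electronic device and storage medium
CN113382071B (zh) * 2021-06-09 2022-09-06 北京猿力未来科技有限公司 Link creation method and apparatus based on hybrid cloud architecture
CN113382071A (zh) * 2021-06-09 2021-09-10 北京猿力未来科技有限公司 Link creation method and apparatus based on hybrid cloud architecture
CN113326004A (zh) * 2021-06-10 2021-08-31 深圳市移卡科技有限公司 Efficient log centralization method and device in a cloud computing environment
CN113326004B (zh) * 2021-06-10 2023-03-03 深圳市移卡科技有限公司 Efficient log centralization method and device in a cloud computing environment
CN113342564A (zh) * 2021-06-25 2021-09-03 阿波罗智联(北京)科技有限公司 Log auditing method and apparatus, electronic device and medium
CN113342564B (zh) * 2021-06-25 2023-12-12 阿波罗智联(北京)科技有限公司 Log auditing method and apparatus, electronic device and medium
CN113612816A (zh) * 2021-07-06 2021-11-05 深圳市酷开网络科技股份有限公司 Data acquisition method, system, terminal and computer-readable storage medium
CN114553866A (zh) * 2022-01-19 2022-05-27 深圳力维智联技术有限公司 Full data access method and apparatus, and computer-readable storage medium
CN114553866B (zh) * 2022-01-19 2024-09-17 深圳力维智联技术有限公司 Full data access method and apparatus, and computer-readable storage medium
CN114756301A (zh) * 2022-04-24 2022-07-15 北京百度网讯科技有限公司 Log processing method, apparatus and system
CN114756301B (zh) * 2022-04-24 2023-09-01 北京百度网讯科技有限公司 Log processing method, apparatus and system
CN115967607A (zh) * 2022-12-25 2023-04-14 西安电子科技大学 Template-based distributed Internet big data acquisition system and method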

Also Published As

Publication number Publication date
US11755452B2 (en) 2023-09-12
CN112449750A (zh) 2021-03-05
US20220004480A1 (en) 2022-01-06

Similar Documents

Publication Publication Date Title
WO2020258290A1 (zh) Log data collection method, log data collection device, storage medium, and log data collection system
US10447772B2 (en) Managed function execution for processing data streams in real time
US11836533B2 (en) Automated reconfiguration of real time data stream processing
US12007996B2 (en) Management of distributed computing framework components
US11711420B2 (en) Automated management of resource attributes across network-based services
US10768988B2 (en) Real-time partitioned processing streaming
US10127086B2 (en) Dynamic management of data stream processing
US11861405B2 (en) Multi-cluster container orchestration
US9367211B1 (en) Interface tab generation
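CN109117252B (zh) Container-based task processing method and system, and container cluster management system
CN111352903A (zh) Log management platform, log management method, medium and electronic device
CN111680799B (zh) Method and apparatus for processing model parameters
US10489179B1 (en) Virtual machine instance data aggregation based on work definition metadata
CN110782122A (zh) Data processing method and apparatus, and electronic device
CN110928732A (zh) Server cluster performance sampling and analysis method and apparatus, and electronic device
US20230222004A1 (en) Data locality for big data on kubernetes
CN113535673A (zh) Method and apparatus for generating a configuration file and processing data
CN113746685B (zh) Pulsar-based log collection stream processing method, processing apparatus and readable storage medium
US20220091866A1 (en) Containerized software discovery and identification
CN114090201A (zh) Resource scheduling method, apparatus, device and storage medium
CN115269130A (zh) Computing power resource scheduling method and apparatus, medium and electronic device
CN110866605B (zh) Data model training method and apparatus, electronic device and readable medium
US10733002B1 (en) Virtual machine instance data aggregation
CN113672200A (zh) Microservice processing method and apparatus, storage medium and electronic device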
Marian et al. Analysis of Different SaaS Architectures from a Trust Service Provider Perspective

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19935675

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19935675

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25/07/2022)