CN114201493B - Data access method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114201493B
Authority
CN
China
Prior art keywords
data
accessed
component
file
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111514637.9A
Other languages
Chinese (zh)
Other versions
CN114201493A (en
Inventor
崔雪霏
郝学峰
王维煜
孙莺萁
宋勋超
王志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111514637.9A
Publication of CN114201493A
Application granted
Publication of CN114201493B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present disclosure provides a data access method, apparatus, device, and storage medium, and relates to the field of artificial intelligence technologies such as natural language processing and knowledge graphs. The method comprises: obtaining data to be accessed that comprises at least one data type; verifying the data to be accessed based on the data protocol specification corresponding to each data type and, in response to the data to be accessed passing the verification, preprocessing the data to be accessed to obtain preprocessed data; collecting the preprocessed data to obtain collected data; and storing the collected data to complete the data access process. Because the data to be accessed is preprocessed, collected, and stored at the access side, the method improves the efficiency and performance of data access.

Description

Data access method, device, equipment and storage medium
Technical Field
The present disclosure relates to artificial intelligence technologies such as natural language processing and knowledge graphs, and in particular to a data access method, apparatus, device, and storage medium.
Background
With the advent of the big data and industrial internet era, massive volumes of data pose significant challenges: data of different types, formats, and sources must be brought into a system through various terminals. In multi-service, multi-modal data access scenarios, the data held by an enterprise's various systems must be integrated, and the multi-modal heterogeneous data scattered across those systems must be uniformly accessed into a knowledge base for subsequent knowledge production and application. Moreover, because an enterprise usually accumulates a large amount of internal data, sometimes on the order of hundreds of millions of records, high-performance access capability is also a necessary condition.
Disclosure of Invention
The disclosure provides a data access method, a device, equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a data access method, including: obtaining data to be accessed comprising at least one data type; verifying the data to be accessed based on the data protocol specification corresponding to each data type and, in response to the data to be accessed passing the verification, preprocessing the data to be accessed to obtain preprocessed data; collecting the preprocessed data to obtain collected data; and storing the collected data to complete the data access process.
According to a second aspect of the present disclosure, there is provided a data access apparatus, including: an obtaining module configured to obtain data to be accessed comprising at least one data type; a preprocessing module configured to verify the data to be accessed based on the data protocol specification corresponding to each data type and, in response to the data to be accessed passing the verification, preprocess the data to be accessed to obtain preprocessed data; a collection module configured to collect the preprocessed data to obtain collected data; and a storage module configured to store the collected data to complete the data access process.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a data access method according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of a data access method according to the present disclosure;
FIG. 4 is a schematic diagram of an application scenario of a data access method according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of a data access device according to the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a data access method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and the features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the data access method or data access apparatus of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or transmit information or the like. Various client applications may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may provide various services. For example, the server 105 may analyze and process the data to be accessed obtained from the terminal devices 101, 102, 103, and generate a processing result (e.g., complete the access of the data to be accessed).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the data access method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the data access device is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a data access method according to the present disclosure is shown. The data access method comprises the following steps:
Step 201, obtaining data to be accessed including at least one data type.
In this embodiment, the executing entity of the data access method (for example, the server 105 shown in FIG. 1) may obtain the data to be accessed from the access side. The data to be accessed may include one or more data types, including but not limited to structured data, video data, web page data, and document data. Structured data refers to data accessed from a database; video data may include local videos and online videos; and document data may include, but is not limited to, Office-series documents, PDF (Portable Document Format) files, and txt text documents. As an example, the data to be accessed may include structured data, video data, and web page data.
Step 202, verifying the data to be accessed based on the data protocol specification corresponding to each data type and, in response to the data to be accessed passing the verification, preprocessing the data to be accessed to obtain preprocessed data.
In this embodiment, the executing entity defines a data protocol specification for each data type in advance. After obtaining the data to be accessed, the executing entity may verify it against the pre-defined specification corresponding to each data type, that is, check the required fields, field data types, and the like of the data to be accessed. For example, if the data to be accessed includes structured data and video data, the executing entity verifies the structured data against the structured access specification and verifies the video data against the video access specification.
If the data to be accessed fails the verification, it cannot be accessed. This verification mechanism checks the data at the import stage, so that errors are intercepted at the earliest point.
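The verification described above can be illustrated with a minimal sketch, assuming each data type's protocol specification reduces to a set of required fields with expected types. The `SPECS` table, its field names, and the functions `validate` and `accept` are hypothetical and are not the specifications defined by the disclosure.

```python
from typing import Any

# Hypothetical protocol specifications: data type -> {required field: expected type}.
SPECS: dict[str, dict[str, type]] = {
    "structured": {"table": str, "rows": list},
    "video": {"title": str, "path": str},
    "webpage": {"url": str, "html": str},
    "document": {"title": str, "path": str},
}


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the record passes."""
    data_type = record.get("data_type")
    spec = SPECS.get(data_type)
    if spec is None:
        return [f"unknown data type: {data_type!r}"]
    errors = []
    for field, expected in spec.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"field {field} should be {expected.__name__}")
    return errors


def accept(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only records that pass verification; the rest are rejected at intake."""
    return [r for r in records if not validate(r)]
```

Returning the list of violations rather than a bare boolean lets the access side report why a record was intercepted.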
If the data to be accessed passes the verification, the executing entity preprocesses it to obtain preprocessed data. For example, when the data to be accessed includes video data, the preprocessing includes uploading the video in chunks to speed up the upload, then obtaining the streaming media file information and the meta information of the video file and converting them into data in JSON (JavaScript Object Notation) format. As another example, when the data to be accessed includes document data, the preprocessing includes converting the document data into web page data and converting the converted data together with the corresponding meta information into JSON-format data. As a further example, when the data to be accessed includes web page data, the web page data may be converted directly into JSON-format data.
Step 203, collecting the preprocessed data to obtain collected data.
In this embodiment, the executing entity may collect the preprocessed data to obtain collected data, choosing the collection mode according to the data types contained in the data to be accessed and the size of the data. For example, the JSON-format data obtained in step 202 may be collected via API (Application Programming Interface) push, API pull, FTP (File Transfer Protocol), manual upload, a direct MySQL connection, and the like, which covers importing a user's existing document data, incremental structured data, log files, and similar data into the system. Note that when the data to be accessed includes structured data, it may be collected directly over the MySQL direct connection.
Step 204, storing the collected data to complete the data access process.
In this embodiment, the executing entity may store the collected data obtained in step 203 to complete the access of the data to be accessed. The collected data may be stored in a storage mode chosen according to the requirements of the different data types and production modes, where the storage mode may include, but is not limited to, at least one of: a message queue, distributed storage, and local storage. This satisfies the data source requirements of different data types and production modes and supports data replay.
For example, when the volume of the data to be accessed is large, the API push component may be used to collect the preprocessed data, meeting the access requirement of large-volume structured data. When the volume is small, the API pull component and the file transfer protocol component may be used instead, flexibly meeting the access requirement of small-volume scenarios.
The data access method provided by this embodiment of the disclosure first obtains data to be accessed including at least one data type; then verifies the data to be accessed based on the data protocol specification corresponding to each data type and, in response to the data passing the verification, preprocesses it to obtain preprocessed data; then collects the preprocessed data to obtain collected data; and finally stores the collected data to complete the data access process. Because the data to be accessed is preprocessed, collected, and stored at the access side, the efficiency and performance of data access are improved, and complex downstream service requirements can be met.
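As a rough illustration of how the four steps compose, the following sketch runs one record through verification, preprocessing, collection, and storage. It assumes preprocessing is simply JSON serialization and uses a plain list as the storage backend; the function `ingest` and its parameters are hypothetical stand-ins rather than the disclosed implementation.

```python
import json
from typing import Callable, Optional


def ingest(
    record: dict,
    validate: Callable[[dict], bool],
    sink: list,
) -> Optional[str]:
    """Run one record through verify -> preprocess -> collect -> store.

    Returns the stored JSON string, or None if verification fails.
    """
    if not validate(record):  # step 202: reject invalid data up front
        return None
    preprocessed = json.dumps(record, sort_keys=True)  # step 202: normalize to JSON
    collected = preprocessed  # step 203: stand-in for a collection component
    sink.append(collected)  # step 204: stand-in for a storage backend
    return collected
```

Rejected records never reach the sink, which mirrors the point that errors are intercepted before any collection or storage work is done.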
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good morals.
With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of a data access method according to the present disclosure. The data access method comprises the following steps:
Step 301, obtaining data to be accessed including at least one data type.
Step 302, the data to be accessed is checked based on the data protocol specification corresponding to each data type, and the data to be accessed is preprocessed in response to the data to be accessed passing the check, so as to obtain preprocessed data.
Steps 301 to 302 are substantially the same as steps 201 to 202 in the foregoing embodiment, and the specific implementation manner may refer to the foregoing description of steps 201 to 202, which is not described herein again.
In some optional implementations of this embodiment, the data to be accessed includes video data, and preprocessing the data to be accessed in step 302 includes: segmenting the video data to obtain at least one sub-video file; and obtaining the streaming media file and the meta information of each of the at least one sub-video file, and converting the streaming media file and the meta information into data in JSON format.
In this implementation, when the data to be accessed includes video data, the executing entity may first verify the video data against the video access specification and, in response to the data passing the verification, segment the video data to obtain at least one sub-video file. Chunked uploading avoids having to restart the upload from the beginning of the file whenever a poor network interrupts the transfer, and it allows different chunks to be sent concurrently by multiple threads, which improves sending efficiency and shortens sending time.
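The two benefits of chunked uploading, resumability and concurrent sending, can be sketched as follows. This is an illustrative sketch only: the `send` callable stands in for whatever transport actually uploads a chunk, and the function names, the `done` set, and the default worker count are assumptions not taken from the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def split_into_chunks(data: bytes, chunk_size: int) -> list[tuple[int, bytes]]:
    """Split a byte string into (index, chunk) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (i // chunk_size, data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def upload_chunks(
    chunks: list[tuple[int, bytes]],
    send: Callable[[int, bytes], None],
    done: Optional[set[int]] = None,
    workers: int = 4,
) -> set[int]:
    """Send chunks concurrently, skipping indices already in `done` (resume)."""
    done = set(done or ())
    pending = [(i, c) for i, c in chunks if i not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every send to finish and re-raises any send error.
        list(pool.map(lambda ic: send(*ic), pending))
    done.update(i for i, _ in pending)
    return done
```

Passing the set of already-uploaded indices back in after a failure resumes the transfer without resending completed chunks.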
Then, the executing entity obtains the streaming media file and the meta information of each sub-video file obtained by the segmentation and converts them into data in JSON format, thereby obtaining the preprocessed data. JSON (JavaScript Object Notation) is a lightweight data interchange format. Meta information generally refers to parameters such as the title, author, and time of a video. Converting the streaming media file and the meta information of the video file into JSON-format data gives the different types of accessed data a uniform representation, so that complex downstream service requirements can be met.
In some optional implementations of this embodiment, the data to be accessed includes document data, and the preprocessing the data to be accessed in step 302 further includes: converting the document data into webpage data to obtain converted webpage data; and converting the converted webpage data and the meta information of the document data into data in a JSON format.
In this implementation, when the data to be accessed includes document data, the executing entity may first verify the document data against the data protocol specification corresponding to document data and, in response to the data passing the verification, parse and convert the document data to obtain converted web page data, and then convert the converted web page data and the meta information of the document data into data in JSON format. Meta information generally refers to parameters such as the title and author of a document. Structured docModel and para data are extracted from the original document text so that feature analysis and other processing can be performed on them later. Converting the web page data and the document meta information into JSON-format data gives the different types of accessed data a uniform representation, so that complex downstream business requirements can be met.
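The document path (document, then web page, then JSON) can be sketched for the simplest case of a plain-text document. The record layout, the key names `meta` and `html`, and both function names are hypothetical; parsing Office or PDF files would require a real document parser and is not shown.

```python
import html
import json


def text_to_html(title: str, text: str) -> str:
    """Render plain text as a minimal HTML page, one <p> per non-empty paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body>{body}</body></html>"
    )


def document_to_json(title: str, author: str, text: str) -> str:
    """Bundle the converted page and the document's meta information as JSON."""
    record = {
        "data_type": "document",
        "meta": {"title": title, "author": author},
        "html": text_to_html(title, text),
    }
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the output.
    return json.dumps(record, ensure_ascii=False)
```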
Step 303, collecting the data in JSON format based on a pre-customized interface component to obtain collected data.
In this embodiment, the executing entity may collect the JSON-format data based on a pre-customized interface component to obtain collected data, where the interface component includes at least one of the following: an interface push component, an interface pull component, and a file transfer protocol component. When the volume of the data to be accessed is large, the interface push component, namely the API push component, may be used to collect the preprocessed data, meeting the access requirement of large-volume structured data. When the volume of the data to be accessed is small, the interface pull component, namely the API pull component, and the file transfer protocol component may be used to collect the preprocessed data, flexibly meeting the access requirement of small-volume scenarios.
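The volume-based choice of collection component might be expressed as below. The numeric threshold is purely an assumption for illustration, since the disclosure does not state where "large" begins, and the `Component` enum and `choose_components` function are hypothetical names.

```python
from enum import Enum


class Component(Enum):
    API_PUSH = "api_push"
    API_PULL = "api_pull"
    FTP = "ftp"


# Hypothetical cut-off; the disclosure does not state a numeric threshold.
LARGE_VOLUME_THRESHOLD = 1_000_000


def choose_components(record_count: int) -> list[Component]:
    """Pick collection components: push for large volumes, pull or FTP otherwise."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count >= LARGE_VOLUME_THRESHOLD:
        return [Component.API_PUSH]
    return [Component.API_PULL, Component.FTP]
```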
In some optional implementations of this embodiment, the method further includes: and collecting the streaming media file and the converted webpage data based on the file uploading component.
In this implementation manner, the execution main body may further collect streaming media files and converted web data based on a pre-customized file uploading component, so that videos, documents, pictures and the like are stored in a local file system, and access services are encapsulated, so that a downstream demand party can obtain data through a service or a local reading manner.
Step 304, storing the collected data using at least one of a message queue, distributed storage, and local storage to complete the data access process.
In this embodiment, the executing entity may store the collected data using at least one of a message queue, distributed storage, and local storage to complete the access of the data to be accessed. For example, for statistical information with high timeliness requirements, the statistics are obtained by stream computing; that is, the collected data is stored in a message queue, which smooths traffic peaks and decouples data production from consumption. As another example, because the volume of structured data accessed may be extremely large, statistical information with low timeliness requirements is computed by asynchronous, scheduled batch jobs; that is, the collected data is stored in distributed storage, which provides distributed reads and writes for large-volume scenarios. In addition, for videos and documents, the source files are stored in local storage. Data of different data types is thereby processed and stored.
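The routing among the three storage modes can be sketched with in-memory stand-ins. The `realtime` flag used to mark high-timeliness data, the `Stores` container, and the `store` function are hypothetical; a real deployment would put a message broker, a distributed file system, and a local file system behind the same decision.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins for the three storage modes."""
    message_queue: queue.Queue = field(default_factory=queue.Queue)
    distributed: list = field(default_factory=list)
    local: list = field(default_factory=list)


def store(record: dict, stores: Stores) -> str:
    """Route a collected record to a storage mode and return the mode's name."""
    if record.get("data_type") in ("video", "document"):
        stores.local.append(record)  # source files are kept in local storage
        return "local"
    if record.get("realtime"):
        stores.message_queue.put(record)  # high timeliness: stream via the queue
        return "message_queue"
    stores.distributed.append(record)  # low timeliness: batch from distributed storage
    return "distributed"
```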
Step 305, outputting the stored data based on the pre-customized data output protocol.
In this embodiment, the executing entity may output the stored data based on a pre-customized data output protocol. To serve different production modes, such as full-text streaming production, full graph construction, and incremental graph construction, the data output protocol provides batch, streaming, full, and incremental data sources. For example, data with high timeliness requirements is output through a message queue, which smooths traffic peaks and decouples data production from consumption, while data in the distributed file system can be output in incremental or full mode to meet different business requirements.
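The difference between full and incremental output can be shown with a small sketch, assuming every stored record carries a monotonically increasing `version` number and the consumer remembers the last version it processed. Both the `version` field and the function names are assumptions introduced for illustration.

```python
from typing import Iterable, Iterator


def full_output(records: Iterable[dict]) -> Iterator[dict]:
    """Full mode: emit every stored record."""
    yield from records


def incremental_output(records: Iterable[dict], since_version: int) -> Iterator[dict]:
    """Incremental mode: emit only records newer than the consumer's checkpoint."""
    return (r for r in records if r["version"] > since_version)
```

A consumer building a graph from scratch would read the full output once and then switch to the incremental output with its latest checkpoint.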
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the data access method in this embodiment highlights the preprocessing, collection, storage, and output of the data to be accessed. Data of different data types is preprocessed, collected by different interface components according to data volume, and then stored and output according to the data scenario. Data of complex data types is thereby accessed, the efficiency and performance of data access are improved, and complex downstream service requirements can be met.
With continued reference to FIG. 4, a schematic diagram of an application scenario of the data access method according to the present disclosure is shown. In this scenario, the data to be accessed includes structured data, i.e., DB (database) data; video data, which may include local videos and online videos; web page data; and document data, which may include, but is not limited to, Office-series documents, PDF files, and txt text documents.
First, the executing entity verifies the data to be accessed based on the data protocol specification corresponding to each data type: the structured data is verified against the structured access specification, the video data against the video access specification, the web page data against the web page access specification, and the document data against the document access specification. If the data to be accessed fails the verification, it cannot be accessed.
If the data to be accessed passes the verification, the executing entity preprocesses it. For video data, the executing entity uploads the video in chunks, that is, segments the video data to obtain at least one sub-video file, then obtains the streaming media file and the meta information of each sub-video file and converts them into data in JSON format. For web page data, the executing entity may convert the web page data into JSON-format data. For document data, the executing entity converts the document data into web page data and then converts the converted web page data and the meta information of the document data into JSON-format data.
Next, the executing entity collects the preprocessed data. That is, it collects the resulting JSON-format data based on a pre-customized interface component, where the interface component includes at least one of: an API push component, an API pull component, an FTP component, and a file upload component. When the volume of the data to be accessed is large, the API push component is used to collect the preprocessed data, meeting the access requirement of large-volume structured data. When the volume is small, the API pull component and the FTP component are used, flexibly meeting the access requirement of small-volume scenarios. In addition, the executing entity collects the streaming media files and the converted web page data through the file upload component, so that videos, documents, pictures, and similar content are stored in a local file system. Note that structured data is accessed directly through the MySQL import component.
Then, the executing entity stores the collected data using at least one of a message queue (MQ), distributed storage, and local storage. For example, for statistical information with high timeliness requirements, the collected data is stored in a message queue; because the volume of structured data accessed may be extremely large, statistical information with low timeliness requirements may be stored in distributed storage; and for videos and documents, the source files are stored in local storage.
Finally, the executing entity outputs the stored data based on a pre-customized data output protocol. The data output protocol provides batch, streaming, full, and incremental data sources and can therefore support different production modes, for example full-text streaming production, FAQ (Frequently Asked Questions) stream construction, video construction, event construction, full graph construction, and incremental graph construction. For example, data with high timeliness requirements is output through the message queue (MQ), while data in HDFS (Hadoop Distributed File System) is output in incremental or full mode to meet different service requirements.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a data access apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be specifically applied to various electronic devices.
As shown in fig. 5, the data access apparatus 500 of the present embodiment includes: an obtaining module 501, a preprocessing module 502, an acquisition module 503, and a storage module 504. The obtaining module 501 is configured to obtain data to be accessed, which includes at least one data type; the preprocessing module 502 is configured to verify the data to be accessed based on a data protocol specification corresponding to each data type and, in response to the data to be accessed passing the verification, preprocess the data to be accessed to obtain preprocessed data; the acquisition module 503 is configured to collect the preprocessed data to obtain collected data; and the storage module 504 is configured to store the collected data to complete the data access process.
In this embodiment, for the specific processing of the obtaining module 501, the preprocessing module 502, the acquisition module 503, and the storage module 504 of the data access apparatus 500, and the technical effects thereof, reference may be made to the descriptions of steps 201 to 204 in the embodiment corresponding to fig. 2, which are not repeated herein.
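The four-module flow (obtain, verify and preprocess, collect, store) can be sketched end to end. This is only an illustration under stated assumptions: the `VALIDATORS` table is a hypothetical stand-in for the per-type data protocol specifications, and a plain Python list stands in for the storage back end.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]

# Hypothetical per-type protocol checks; the real specifications are not given.
VALIDATORS: Dict[str, Callable[[Item], bool]] = {
    "video": lambda d: isinstance(d.get("path"), str),
    "document": lambda d: isinstance(d.get("path"), str) and "title" in d,
}


def preprocess(item: Item) -> Optional[str]:
    """Verify an item against its type's protocol, then convert it to JSON."""
    validator = VALIDATORS.get(item.get("type"))
    if validator is None or not validator(item):
        return None  # verification failed, so the item is not accessed
    return json.dumps(item, sort_keys=True)


def access(items: List[Item], storage: List[str]) -> int:
    """Run verify/preprocess -> collect -> store; return the number stored."""
    collected = [p for p in map(preprocess, items) if p is not None]
    storage.extend(collected)
    return len(collected)
```

Items of an unknown type, or items that fail their type's check, are dropped before collection, which mirrors the "in response to passing the verification" condition.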
In some optional implementations of this embodiment, the data to be accessed includes video data, and the preprocessing module includes: a segmentation submodule configured to segment the video data to obtain at least one sub-video file; and a first conversion submodule configured to obtain the streaming media file and the meta information of each of the at least one sub-video file, and to convert the streaming media file and the meta information into data in the JSON format.
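A rough sketch of the segmentation and conversion steps follows. It only computes segment boundaries on a timeline and does not cut any media; the fixed segment length, the `.ts` file naming, and the JSON field names are all hypothetical, since the source does not describe how the video is split or how the records are laid out.

```python
import json
from typing import Any, Dict, List


def segment_video(duration_s: float, segment_s: float) -> List[Dict[str, Any]]:
    """Split a timeline into consecutive segments of at most segment_s seconds."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    segments = []
    start, index = 0.0, 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append({"index": index, "start_s": start, "end_s": end})
        start, index = end, index + 1
    return segments


def to_json_records(video_id: str, duration_s: float, segment_s: float) -> List[str]:
    """Pair each sub-video's (assumed) stream file name with its meta information."""
    return [
        json.dumps(
            {
                "video_id": video_id,
                "stream_file": f"{video_id}_{seg['index']:04d}.ts",
                "meta": seg,
            },
            sort_keys=True,
        )
        for seg in segment_video(duration_s, segment_s)
    ]
```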
In some optional implementations of this embodiment, the data to be accessed includes document data, and the preprocessing module further includes: a second conversion submodule configured to convert the document data into webpage data to obtain converted webpage data; and a third conversion submodule configured to convert the converted webpage data and the meta information of the document data into data in the JSON format.
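The two document conversions can be illustrated as below, assuming the document has already been parsed into a title and a list of paragraphs. The function names and the `meta` / `webpage` field names are hypothetical; the source does not specify the HTML structure or the JSON schema.

```python
import html
import json
from typing import Any, Dict, List


def document_to_html(title: str, paragraphs: List[str]) -> str:
    """Render parsed document text as a minimal web page, escaping markup."""
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body>\n{body}\n</body></html>"
    )


def to_json_record(title: str, paragraphs: List[str], meta: Dict[str, Any]) -> str:
    """Combine the converted web page with the document's meta information."""
    record = {"meta": meta, "webpage": document_to_html(title, paragraphs)}
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Escaping with `html.escape` keeps characters such as `<` and `&` in the document text from being interpreted as markup in the converted page.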
In some optional implementations of this embodiment, the acquisition module includes: a first acquisition submodule configured to collect data in the JSON format based on a pre-customized interface component, wherein the interface component includes at least one of: an interface push component, an interface pull component, and a file transfer protocol component.
In some optional implementations of this embodiment, the acquisition module further includes: a second acquisition submodule configured to collect the streaming media file and the converted webpage data based on the file upload component.
In some optional implementations of this embodiment, the storage module includes: a storage submodule configured to store the collected data based on at least one of the following storage modes: message queue, distributed storage, and local storage.
In some optional implementations of this embodiment, the data access apparatus 500 further includes: an output module configured to output the stored data based on a pre-customized data output protocol.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data necessary for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the data access method. For example, in some embodiments, the data access method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the data access method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the data access method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server that incorporates a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (14)

1. A data access method, comprising:
acquiring data to be accessed comprising at least one data type;
verifying the data to be accessed based on the data protocol specification corresponding to each data type, and in response to the data to be accessed passing the verification, preprocessing the data to be accessed to obtain preprocessed data;
collecting the preprocessed data to obtain collected data;
storing the collected data to complete the data access process;
wherein the collecting the preprocessed data comprises:
collecting data in a JSON format based on a pre-customized interface component, wherein the interface component comprises at least one of: an interface push component, an interface pull component, and a file transfer protocol component; and
in response to determining that the data volume of the data to be accessed is large, collecting the preprocessed data by using the interface push component;
and in response to determining that the data volume of the data to be accessed is small, collecting the preprocessed data by using the interface pull component and the file transfer protocol component.
2. The method of claim 1, wherein the data to be accessed comprises video data; and
the preprocessing the data to be accessed comprises:
segmenting the video data to obtain at least one sub-video file;
obtaining the streaming media file and the meta information of each of the at least one sub-video file, and converting the streaming media file and the meta information into data in a JavaScript Object Notation (JSON) format.
3. The method of claim 1, wherein the data to be accessed comprises document data; and
the preprocessing the data to be accessed further comprises:
converting the document data into webpage data to obtain converted webpage data;
and converting the converted webpage data and the meta information of the document data into data in a JSON format.
4. The method of claim 3, wherein the collecting the preprocessed data further comprises:
collecting the streaming media file and the converted webpage data based on the file upload component.
5. The method of any of claims 1-4, wherein the storing the collected data comprises:
storing the collected data based on at least one of the following storage modes: message queue, distributed storage, and local storage.
6. The method of any of claims 1-5, further comprising:
outputting the stored data based on a pre-customized data output protocol.
7. A data access apparatus, comprising:
an obtaining module configured to obtain data to be accessed comprising at least one data type;
a preprocessing module configured to verify the data to be accessed based on the data protocol specification corresponding to each data type, and in response to the data to be accessed passing the verification, preprocess the data to be accessed to obtain preprocessed data;
an acquisition module configured to collect the preprocessed data to obtain collected data;
a storage module configured to store the collected data to complete the data access process;
wherein the acquisition module comprises:
a first acquisition submodule configured to collect data in a JSON format based on a pre-customized interface component, wherein the interface component comprises at least one of: an interface push component, an interface pull component, and a file transfer protocol component; and
in response to determining that the data volume of the data to be accessed is large, collect the preprocessed data by using the interface push component;
and in response to determining that the data volume of the data to be accessed is small, collect the preprocessed data by using the interface pull component and the file transfer protocol component.
8. The apparatus of claim 7, wherein the data to be accessed comprises video data; and
the preprocessing module comprises:
a segmentation submodule configured to segment the video data to obtain at least one sub-video file;
a first conversion submodule configured to obtain the streaming media file and the meta information of each of the at least one sub-video file, and convert the streaming media file and the meta information into data in a JSON format.
9. The apparatus of claim 7, wherein the data to be accessed comprises document data; and
the preprocessing module further comprises:
a second conversion submodule configured to convert the document data into webpage data to obtain converted webpage data;
and a third conversion submodule configured to convert the converted webpage data and the meta information of the document data into data in a JSON format.
10. The apparatus of claim 9, wherein the acquisition module further comprises:
a second acquisition submodule configured to collect the streaming media file and the converted webpage data based on the file upload component.
11. The apparatus of any of claims 7-10, wherein the storage module comprises:
a storage submodule configured to store the collected data based on at least one of the following storage modes: message queue, distributed storage, and local storage.
12. The apparatus of any of claims 7-11, further comprising:
an output module configured to output the stored data based on a pre-customized data output protocol.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202111514637.9A 2021-12-13 2021-12-13 Data access method, device, equipment and storage medium Active CN114201493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111514637.9A CN114201493B (en) 2021-12-13 2021-12-13 Data access method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111514637.9A CN114201493B (en) 2021-12-13 2021-12-13 Data access method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114201493A CN114201493A (en) 2022-03-18
CN114201493B true CN114201493B (en) 2023-04-07

Family

ID=80652696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111514637.9A Active CN114201493B (en) 2021-12-13 2021-12-13 Data access method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114201493B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303999B2 (en) * 2011-02-22 2019-05-28 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and search engines
CN111737528A (en) * 2020-06-23 2020-10-02 Oppo(重庆)智能科技有限公司 Data acquisition and verification method and device, electronic equipment and storage medium
CN113254578B (en) * 2021-05-20 2023-07-28 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for data clustering

Also Published As

Publication number Publication date
CN114201493A (en) 2022-03-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant