WO2022267368A1 - Data processing method and apparatus, computing device, and medium - Google Patents

Data processing method and apparatus, computing device, and medium

Info

Publication number
WO2022267368A1
WO2022267368A1 PCT/CN2021/136561 CN2021136561W WO2022267368A1 WO 2022267368 A1 WO2022267368 A1 WO 2022267368A1 CN 2021136561 W CN2021136561 W CN 2021136561W WO 2022267368 A1 WO2022267368 A1 WO 2022267368A1
Authority
WO
WIPO (PCT)
Prior art keywords
data stream
database
data
storage information
storage
Prior art date
Application number
PCT/CN2021/136561
Other languages
English (en)
French (fr)
Inventor
田永生
汪婷
石然
朱良昌
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to KR1020227033396A (patent KR20220138867A)
Priority to JP2022562122A (patent JP2023534347A)
Priority to US17/921,620 (patent US20230306031A1)
Priority to EP21936262.1A (patent EP4152174A4)
Publication of WO2022267368A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Definitions

  • The present disclosure relates to the technical field of big data, in particular to data splicing, and more specifically to a data processing method, apparatus, computing device, computer-readable storage medium, and computer program product.
  • In data-driven fields such as big data and machine learning, data processing platforms (frameworks) play an important role. For example, common open source data processing frameworks such as Spark Streaming, Storm, and Apache Flink are widely used. From the perspective of data processing logic, data processing can be divided into single data stream processing (for example, filtering and transformation) and multiple data stream processing (for example, aggregation and splicing).
  • The present disclosure provides a data processing method, a device, an electronic device, a computer-readable storage medium, and a computer program product.
  • A data processing method is provided, including: acquiring a first data stream; querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and in response to finding the storage information of the second data stream corresponding to the first data stream in the lookup table, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream.
  • A data processing device is provided, including: an acquisition unit configured to acquire a first data stream; a query unit configured to query a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and a first determining unit configured to, in response to finding the storage information of the second data stream corresponding to the first data stream in the lookup table, determine in the database, based on the storage information, the second data stream to be spliced with the first data stream.
  • A computing device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to execute the above method.
  • A non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to execute the above method.
  • A computer program product is provided, including a computer program, wherein the computer program implements the above method when executed by a processor.
  • Invalid data searches can be avoided, and the efficiency of data processing can be improved.
  • FIG. 1 shows a schematic diagram of an exemplary system in which various methods described herein may be implemented according to an embodiment of the present disclosure.
  • FIG. 2 shows a flow chart of a data processing method according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a data processing method according to an embodiment of the present disclosure.
  • FIG. 4 shows a structural block diagram of a data processing device according to an embodiment of the present disclosure.
  • FIG. 5 shows a structural block diagram of an exemplary electronic device that can be used to implement the embodiments of the present disclosure.
  • The use of terms such as "first" and "second" to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are only used to distinguish one element from another.
  • The first element and the second element may refer to the same instance of the element; in some cases, based on the context, they may also refer to different instances.
  • Data splicing is a technique in which two or more data streams from different data sources are merged into a single piece of data based on business associations, for downstream processing.
  • For example, the first data stream from the first data source may be a user's click events, and the second data stream from the second data source may be the display data of a page. After the two data streams are spliced, downstream consumers can analyze the user's clicks on the page based on the spliced data.
  • Often, the corresponding second data stream is searched for directly in the database. Because the amount of data stored in the database is huge, the search takes a long time, which reduces splicing efficiency.
  • The present disclosure provides a data processing method that establishes a lookup table independent of the database and stores the storage information of valid second data streams in the lookup table. The database is searched only when the storage information of the required second data stream is found in the lookup table, thereby avoiding invalid searches and improving the efficiency of data processing.
  • FIG. 1 shows a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented according to an embodiment of the present disclosure.
  • The system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120.
  • Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
  • the server 120 may run one or more services or software applications that enable the method of data processing to be performed.
  • server 120 may also provide other services or software applications that may include non-virtualized environments and virtualized environments.
  • these services may be provided as web-based services or cloud services, such as under a software-as-a-service (SaaS) model to users of client devices 101, 102, 103, 104, 105, and/or 106 .
  • server 120 may include one or more components that implement the functions performed by server 120 . These components may include software components, hardware components or combinations thereof executable by one or more processors. Users operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client application programs to interact with server 120 to utilize the services provided by these components. It should be understood that various different system configurations are possible, which may differ from system 100 . Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein, and is not intended to be limiting.
  • A user may use a client device 101, 102, 103, 104, 105, and/or 106 to obtain the first data stream and/or the second data stream.
  • a client device may provide an interface that enables a user of the client device to interact with the client device. The client device can also output information to the user via the interface.
  • Although FIG. 1 depicts only six client devices, those skilled in the art will understand that the present disclosure can support any number of client devices.
  • Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computing devices, such as portable handheld devices, general-purpose computers (such as personal computers and laptops), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, and sensors or other sensing devices. These computing devices can run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, and Linux or Linux-like operating systems (such as GOOGLE Chrome OS), or include various mobile operating systems, such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android.
  • Portable handheld devices may include cellular phones, smart phones, tablet computers, personal digital assistants (PDAs), and the like.
  • Wearable devices can include head-mounted displays and other devices. Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices, and the like.
  • A client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
  • Network 110 can be any type of network known to those skilled in the art that can support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, and the like.
  • The one or more networks 110 may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.
  • Server 120 may include one or more general purpose computers, dedicated server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination.
  • Server 120 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain the server's virtual storage devices).
  • server 120 may run one or more services or software applications that provide the functionality described below.
  • Computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems.
  • Server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
  • Server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106.
  • Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
  • the server 120 may be a server of a distributed system, or a server combined with blockchain.
  • the server 120 can also be a cloud server, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology.
  • A cloud server is a host product in a cloud computing service system that addresses the difficult management and weak business scalability of traditional physical host and virtual private server (VPS) services.
  • System 100 may also include one or more databases 130 .
  • these databases may be used to store data and other information.
  • databases 130 may be used to store information such as audio files and video files.
  • Data repository 130 may reside in various locations.
  • the data store used by server 120 may be local to server 120, or may be remote from server 120 and may communicate with server 120 via a network-based or dedicated connection.
  • Data repository 130 can be of different types.
  • the data store used by server 120 may be a database, such as a relational database.
  • One or more of these databases may store, update, and retrieve data in response to commands.
  • databases 130 may also be used by applications to store application data.
  • Databases used by applications can be different types of databases such as key-value stores, object stores or regular stores backed by a file system.
  • the system 100 of FIG. 1 may be configured and operated in various ways to enable application of the various methods and apparatuses described in accordance with this disclosure.
  • Fig. 2 is a flowchart showing a data processing method according to an exemplary embodiment of the present disclosure.
  • As shown in FIG. 2, the method includes: step S201, acquiring a first data stream; step S202, querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores multiple second data streams, the multiple second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and step S203, in response to finding the storage information of the second data stream corresponding to the first data stream in the lookup table, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream. In this way, the time spent searching for the second data stream to be spliced can be effectively reduced, and splicing efficiency improved.
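  • Steps S201 to S203 can be illustrated with a minimal sketch. The in-memory dictionaries, the `find_second_stream` name, and the `"<timestamp>_<identifier>"` key format are assumptions made for illustration; the disclosure does not prescribe concrete stores or names.

```python
from typing import Optional

# Hypothetical stand-in for the database: storage index -> (payload, status flag).
database = {
    "001_A": ({"user": "A", "page": "home"}, 0),
    "002_B": ({"user": "B", "page": "news"}, 1),
}
# Hypothetical lookup table: second identifier -> storage information
# (timestamp, second identifier). Only currently valid streams are listed.
lookup_table = {"A": ("001", "A")}


def find_second_stream(first_stream: dict) -> Optional[dict]:
    """Consult the lookup table first; touch the database only on a hit."""
    info = lookup_table.get(first_stream["id"])  # S202: query the lookup table
    if info is None:
        return None  # miss: the database is never searched
    timestamp, second_id = info
    storage_index = f"{timestamp}_{second_id}"  # rebuild the storage index
    payload, _flag = database[storage_index]  # S203: direct access by index
    return payload
```

    A first data stream with identifier "A" reaches the database through its storage index, whereas one with identifier "B" stops at the lookup table because stream B is no longer listed as valid.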
  • the second data stream acquired earlier than the first data stream may be stored in the database, so as to wait for the corresponding first data stream acquired later to implement splicing processing.
  • Each second data stream in the database has a corresponding storage index. The storage index of each second data stream may include one or more items of attribute information of the second data stream, for example, the timestamp of the second data stream and/or the identification information of the second data stream.
  • the multiple second data streams in the database can be sorted based on the order indicated by their corresponding storage indexes.
  • the storage index of each second data stream in the database may be composed of one or more types of storage information corresponding to the second data stream.
  • the newly added second data stream may be acquired at the same time.
  • storage information corresponding to the newly added second data stream may be correspondingly stored in the lookup table.
  • The lookup table contains only the storage information of those valid second data streams, among the multiple second data streams stored in the database, that can currently be spliced; the storage information of invalid second data streams can be deleted through dynamic updating. For example, based on one or more factors such as the time validity of the second data stream, whether a splicing operation has been performed, and the attributes of the second data stream, the storage information of invalid second data streams may be deleted from the lookup table, to ensure that only the storage information of valid second data streams is kept in the lookup table.
  • The obtained second data streams are stored in the database for a long time, so that a second data stream can be traced back when needed. The second data streams stored in the database therefore include not only valid second data streams that can currently be spliced, but also a large number of invalid second data streams that cannot currently be spliced.
  • A lookup table independent of the database is established.
  • The lookup table can be updated in real time based on the validity of the second data streams, keeping only the storage information of currently valid second data streams. The database is searched only when the storage information of the second data stream corresponding to the first data stream is found in the lookup table, which greatly reduces the time spent searching.
  • the data in the lookup table may be stored in the form of data fragments (KeyGroup).
  • The first data stream may include a first identifier, and the storage information, in the database, of each of the at least one valid second data stream may include a second identifier of that second data stream. Querying the lookup table for the storage information, in the database, of the second data stream corresponding to the first data stream may include: querying the lookup table for the second identifier corresponding to the first identifier; and determining the storage information that includes the second identifier as the storage information of the second data stream corresponding to the first data stream. Therefore, the storage information, in the database, of the second data stream corresponding to the first data stream can be conveniently determined.
  • The first identifier of the first data stream may include at least one type of attribute information used to represent the first data stream, and the second identifier of the second data stream may include at least one type of attribute information used to represent the second data stream. For example, the first identifier and the second identifier may be the user IDs corresponding to the data streams, in which case a first data stream and a second data stream with the same user ID are determined to be the two data streams to be spliced.
  • In response to finding the storage information of the second data stream corresponding to the first data stream in the lookup table, that storage information is deleted from the lookup table. In this way, the storage information of matched second data streams is removed from the lookup table in time, so that only the storage information of unmatched second data streams is kept, which reduces invalid data in the lookup table and improves the efficiency of querying it.
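  • Deleting a matched entry can be sketched as a single "take" operation on the lookup table. The dictionary layout and the `take_storage_info` name are hypothetical and only illustrate the idea that a matched entry leaves the table.

```python
from typing import Optional, Tuple


def take_storage_info(lookup_table: dict, first_id: str) -> Optional[Tuple[int, str]]:
    """Return the matching storage information and remove it from the table.

    Removing the entry on a hit keeps only unmatched second data streams
    in the lookup table, so later queries scan less data.
    """
    return lookup_table.pop(first_id, None)
```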
  • The storage information of each second data stream in the lookup table may include the timestamp of the second data stream, and the method may further include: for the storage information of each second data stream in the lookup table, in response to the time difference between the historical moment indicated by the timestamp in the storage information and the current water level time exceeding a preset delay range, deleting the storage information. In this way, the storage information of second data streams that are no longer timely is removed from the lookup table in time, so that only the storage information of timely second data streams is kept, which reduces invalid data in the lookup table and improves the efficiency of querying it.
  • the water level time (watermark) is an indicator used to measure progress in the process of real-time calculation.
  • a centralized module in the system may be used to record the current water level time corresponding to the lookup table.
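  • The timeliness-based deletion can be sketched as follows. Integer timestamps, the dictionary layout, and the names `evict_expired` and `max_delay` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def evict_expired(
    lookup_table: Dict[str, Tuple[int, str]], watermark: int, max_delay: int
) -> List[str]:
    """Delete entries whose timestamp lags the watermark by more than max_delay.

    lookup_table maps a second identifier to (timestamp, second identifier).
    Returns the keys that were removed.
    """
    expired = [
        key
        for key, (timestamp, _second_id) in lookup_table.items()
        if watermark - timestamp > max_delay
    ]
    for key in expired:
        del lookup_table[key]
    return expired
```

    With a water level time of 200 and a delay range of 60, an entry stamped 100 is removed while an entry stamped 170 is kept.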
  • step S203 is executed, and based on the stored information, the second data stream used for splicing processing with the first data stream is determined in the database.
  • the spliced first data stream and second data stream can be delivered to downstream for corresponding processing and analysis.
  • the original data of the spliced second data stream is still stored in the database, and the second data stream stored in the database is not deleted due to splicing.
  • determining in the database the second data stream for splicing with the first data stream may include: determining a storage index in the database corresponding to the storage information; and based on the determined storage index , determining the second data stream corresponding to the storage index in the database.
  • The database further includes a status identifier corresponding to each of the plurality of second data streams, and the method further includes: after determining the second data stream to be spliced with the first data stream, setting the status identifier corresponding to that second data stream to indicate that its splicing process has been completed. The status identifier thus indicates whether each second data stream is a spliced or an unspliced data stream.
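  • The status identifier can be illustrated with a small sketch in which splicing marks the stored record but leaves it in the database. The `Record` class, the `splice` function, and the dictionary-merge form of splicing are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    payload: dict
    spliced: bool = False  # status identifier: False = not yet spliced


def splice(first_stream: dict, database: Dict[str, Record], storage_index: str) -> dict:
    """Merge the stored second data stream with the first and mark it spliced."""
    record = database[storage_index]
    merged = {**record.payload, **first_stream}  # simplistic field-level merge
    record.spliced = True  # the original record stays in the database
    return merged
```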
  • At least one second data stream to be searched is determined in the database based on a preset search period, wherein the historical moment indicated by the timestamp of each of the at least one second data stream to be searched is within the preset search period; and, for each second data stream to be searched among the at least one second data stream to be searched, in response to the status identifier corresponding to that second data stream indicating that its splicing process has not been completed, the second data stream is determined as a second data stream to be transmitted. In this way, unspliced second data streams can be efficiently retrieved from the database and then transmitted downstream for processing.
  • The time difference between the end time of the preset search period and the current water level time is greater than the preset waiting time range.
  • The preset waiting time range indicates the longest time that a second data stream stored in the database can wait for the first data stream. A second data stream that has not been successfully spliced within the preset waiting time range can be determined as a second data stream to be transmitted.
  • a centralized module in the system may be used to record the current water level time corresponding to the database.
  • the multiple second data streams included in the database are sorted based on the sequence of historical moments indicated by their corresponding timestamps.
  • the plurality of second data streams in the database can be arranged in chronological order, so as to facilitate searching for the second data streams to be transmitted in the database according to the time stamp information.
  • Table 1 shows an exemplary storage mode of the second data streams in the database.
  • five second data streams are taken as an example for illustration, and the present disclosure does not limit the number of second data streams stored in the database.
  • Table 1

    Storage index                Data column             Status flag
    Timestamp 001_Second ID A    Second data stream A    0
    Timestamp 002_Second ID B    Second data stream B    1
    Timestamp 003_Second ID C    Second data stream C    0
    Timestamp 003_Second ID D    Second data stream D    0
    Timestamp 004_Second ID E    Second data stream E    1
  • The second data streams stored in the database are arranged in order of the timestamp of each second data stream.
  • The timestamp 001 and the second identifier A of the second data stream A constitute the storage index of the second data stream A in the database; the timestamp 002 and the second identifier B of the second data stream B constitute the storage index of the second data stream B; the timestamp 003 and the second identifier C of the second data stream C constitute the storage index of the second data stream C; the timestamp 003 and the second identifier D of the second data stream D constitute the storage index of the second data stream D; and the timestamp 004 and the second identifier E of the second data stream E constitute the storage index of the second data stream E.
  • the state flag 1 may be used to indicate that the corresponding second data stream has completed splicing processing
  • the state flag 0 may be used to indicate that the corresponding second data stream has not completed splicing processing.
  • Determining at least one second data stream to be searched in the database may include: determining a start time and an end time corresponding to the preset search period; searching the database for the starting storage index corresponding to the start time and the ending storage index corresponding to the end time; and determining the second data streams within the storage range bounded by the starting storage index and the ending storage index as the at least one second data stream to be searched. Therefore, when searching the database, it is only necessary to determine the starting and ending storage indexes corresponding to the start time and the end time in order to determine all second data streams to be searched for the preset search period, which effectively reduces the time needed to find the second data streams to be transmitted.
  • For example, if the starting storage index corresponding to the start time is "timestamp 002_second identifier B" and the ending storage index corresponding to the end time is "timestamp 004_second identifier E", then the second data stream B, the second data stream C, the second data stream D, and the second data stream E, which lie within the storage range bounded by these two indexes, can be determined as the at least one second data stream to be searched. The status flags of the second data stream C and the second data stream D are 0, indicating that their splicing has not been completed, so the second data stream C and the second data stream D can be determined as second data streams to be transmitted.
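  • The range search over Table 1 can be sketched with a binary search on the sorted storage indexes. The abbreviated keys such as "002_B" and the function name `unspliced_in_range` are assumptions for illustration; a real database would expose its own range-scan interface.

```python
from bisect import bisect_left, bisect_right
from typing import List

# Rows of Table 1 as (storage index, data column, status flag), kept sorted
# by storage index so that timestamp order equals storage order.
rows = [
    ("001_A", "Second data stream A", 0),
    ("002_B", "Second data stream B", 1),
    ("003_C", "Second data stream C", 0),
    ("003_D", "Second data stream D", 0),
    ("004_E", "Second data stream E", 1),
]
keys = [index for index, _name, _flag in rows]


def unspliced_in_range(start_index: str, end_index: str) -> List[str]:
    """Return unspliced streams between two storage indexes (inclusive)."""
    lo = bisect_left(keys, start_index)  # first row at or after the start
    hi = bisect_right(keys, end_index)  # one past the last row at the end
    return [name for _index, name, flag in rows[lo:hi] if flag == 0]
```

    Only the two boundary positions are located by search; every row between them is read sequentially and filtered by its status flag.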
  • In response to no storage information of the second data stream corresponding to the first data stream being found in the lookup table, the first data stream is cached. In this way, a lag in obtaining the second data stream does not reduce the splicing success rate of the first data stream and the second data stream.
  • the first data stream may be cached in a lookup table.
  • the second data stream in response to the first data stream not being spliced successfully after a preset period of time, is deleted from the lookup table.
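  • Caching on a lookup miss can be sketched as below. This sketch makes two assumptions that go beyond the text: the cache is a separate dictionary named `pending` rather than part of the lookup table, and the entry that expires after the preset period is the cached first data stream. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical cache: first identifier -> (first data stream, arrival time).
pending: Dict[str, Tuple[dict, int]] = {}


def handle_first_stream(stream: dict, now: int, lookup_table: dict) -> str:
    """Cache the first data stream when the lookup table has no match."""
    if stream["id"] in lookup_table:
        return "splice"  # storage information found: go on to the database
    pending[stream["id"]] = (stream, now)
    return "cached"  # wait for a second data stream that arrives late


def drop_stale(now: int, max_age: int) -> List[str]:
    """Remove cached first data streams that waited longer than max_age."""
    stale = [key for key, (_s, arrived) in pending.items() if now - arrived > max_age]
    for key in stale:
        del pending[key]
    return stale
```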
  • Fig. 3 is a flowchart illustrating a data stream processing method according to an exemplary embodiment of the present disclosure.
  • the acquired second data stream is stored after being processed by the data stream processing module.
  • the data stream processing module extracts the timestamp and the second identifier of the second data stream, and constructs a storage index of the second data stream in the database based on the timestamp and the second identifier.
  • the second data stream is stored in the database at the storage location indicated by the storage index.
  • the time stamp and the second identifier of the second data stream are stored in the lookup table as storage information of the second data stream.
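  • The ingestion path for a second data stream can be sketched as follows. The field names `timestamp` and `id`, the zero-padded key format, and the function name `ingest_second_stream` are illustrative assumptions, not details given in the disclosure.

```python
def ingest_second_stream(stream: dict, database: dict, lookup_table: dict) -> str:
    """Store a second data stream and register its storage information."""
    timestamp, second_id = stream["timestamp"], stream["id"]
    # The storage index is built from the timestamp and the second identifier,
    # timestamp first, so that sorting by index also sorts by time.
    storage_index = f"{timestamp:03d}_{second_id}"
    database[storage_index] = {"data": stream, "spliced": False}
    # The same two fields are kept in the lookup table as storage information.
    lookup_table[second_id] = (timestamp, second_id)
    return storage_index
```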
  • the water level module stores current water level time information corresponding to the database, and the startup module uses the water level time information obtained from the water level module to start the data search module to determine at least one second data stream to be searched in the database.
  • For example, if the water level time acquired by the startup module from the water level module is 10:00, the preset waiting time range is 2 hours, and the searchable time window is 10 minutes, the preset search period can be set to 7:50-8:00.
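  • The arithmetic behind this example can be written out directly: the search period ends one waiting-time range before the water level time and spans one search window. The function name `search_period` and the calendar date used below are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def search_period(
    watermark: datetime, max_wait: timedelta, window: timedelta
) -> Tuple[datetime, datetime]:
    """Return (start, end) of the search period for a given water level time."""
    end = watermark - max_wait  # streams older than this have waited long enough
    start = end - window  # look back one search window from the end
    return start, end


start, end = search_period(
    datetime(2021, 12, 8, 10, 0), timedelta(hours=2), timedelta(minutes=10)
)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))
```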
  • For each second data stream to be searched among the at least one second data stream to be searched, in response to the status identifier corresponding to that second data stream indicating that its splicing process has not been completed, the second data stream is determined as a second data stream to be transmitted and is transmitted downstream through the transmission module for corresponding processing.
  • A data processing device 400 is provided, including: an acquisition unit 401 configured to acquire a first data stream; a query unit 402 configured to query a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores multiple second data streams, the multiple second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and a first determining unit 403 configured to, in response to finding the storage information of the second data stream corresponding to the first data stream in the lookup table, determine in the database, based on the storage information, the second data stream to be spliced with the first data stream.
  • the apparatus further includes: a cache unit configured to cache the first data stream in response to no storage information of the second data stream corresponding to the first data stream being found in the lookup table.
  • the first data stream includes a first identifier
  • the storage information of each second data stream in the database of at least one valid second data stream includes a second identifier of the second data stream
  • the query unit includes: a module for querying a second identifier corresponding to the first identifier in the lookup table; and a module for determining storage information including the second identifier as the storage information of the second data stream corresponding to the first data stream.
  • the device further includes: a first deletion unit configured to delete the storage information from the lookup table in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream.
  • the storage information of each second data stream in the lookup table includes the timestamp of the second data stream
  • the device further includes: a second deletion unit configured to, for the storage information of each second data stream in the lookup table, delete the storage information in response to the time difference between the historical moment indicated by the timestamp in the storage information and the current watermark time exceeding the preset delay range.
  • the database further includes a status identifier corresponding to each second data stream in the plurality of second data streams
  • the device further includes: a setting unit configured to, after the second data stream for splicing processing with the first data stream is determined, set the status identifier corresponding to the second data stream to indicate that the second data stream has completed splicing processing.
  • the device further includes: a second determining unit configured to determine at least one second data stream to be searched in the database based on a preset search period, wherein the historical moment indicated by the timestamp of each second data stream to be searched is within the preset search period; and a third determining unit configured to, for each second data stream to be searched, determine the second data stream as a second data stream to be transmitted in response to the status identifier corresponding to that second data stream indicating that it has not completed splicing processing.
  • the plurality of second data streams included in the database are sorted based on the sequence of historical moments indicated by their corresponding timestamps
  • the second determining unit includes: a module for determining the start time and the end time corresponding to the preset search period; a module for searching the database for the start storage index corresponding to the start time and the end storage index corresponding to the end time; and a module for determining the second data streams within the storage range defined in the database by the start storage index and the end storage index as the at least one second data stream to be searched.
  • a computing device including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute any one of the above methods.
  • a non-transitory computer-readable storage medium storing computer instructions is also disclosed, wherein the computer instructions are used to make a computer perform any of the above-mentioned methods.
  • a computer program product including a computer program, wherein the computer program implements any one of the above methods when executed by a processor.
  • An electronic device is intended to mean various forms of digital electronic computing equipment, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 500 includes a computing unit 501 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random-access memory (RAM) 503. The RAM 503 can also store various programs and data necessary for the operation of the device 500.
  • the computing unit 501, ROM 502, and RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504 .
  • the input unit 506 can be any type of equipment capable of inputting information to the device 500, the input unit 506 can receive input digital or character information, and generate key signal input related to user settings and/or function control of the electronic device, and can Including but not limited to mouse, keyboard, touch screen, trackpad, trackball, joystick, microphone and/or remote control.
  • the output unit 507 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer.
  • the storage unit 508 may include, but is not limited to, a magnetic disk and an optical disk.
  • the communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
  • the computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
  • the computing unit 501 executes various methods and processes described above, such as data processing methods.
  • the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508 .
  • part or all of the computer program may be loaded and/or installed on the device 500 via the ROM 502 and/or the communication unit 509.
  • when the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the data processing method described above may be performed.
  • the computing unit 501 may be configured to execute the data processing method in any other suitable manner (for example, by means of firmware).
  • various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • the programmable processor can be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media include an electrical connection based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, eg, a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN) and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • steps may be reordered, added, or deleted using the various forms of flow shown above.
  • the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

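A data processing method and apparatus, a computing device, and a medium, relating to the field of big data and in particular to data splicing. The implementation is: acquiring a first data stream (S201); querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream (S202); and in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream (S203).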

Description

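Data processing method and apparatus, computing device, and medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110700394.1, filed on June 23, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of big data technology, in particular to data splicing, and specifically to a data processing method and apparatus, a computing device, a computer-readable storage medium, and a computer program product.
Background
In data-driven fields such as big data and machine learning, data processing platforms (frameworks) play an important role; open-source data processing frameworks such as Spark Streaming, Storm, and Apache Flink are widely used. In terms of processing logic, data processing falls into two kinds: processing of a single data stream (for example, filtering and transformation) and processing of multiple data streams (for example, aggregation and splicing).
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any approach described in this section qualifies as prior art merely by virtue of its inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be assumed to have been recognized in any prior art.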
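Summary
The present disclosure provides a data processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to one aspect of the present disclosure, a data processing method is provided, including: acquiring a first data stream; querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream.
According to one aspect of the present disclosure, a data processing apparatus is provided, including: an acquisition unit configured to acquire a first data stream; a query unit configured to query a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and a first determining unit configured to, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determine in the database, based on the storage information, the second data stream to be spliced with the first data stream.
According to one aspect of the present disclosure, a computing device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above method.
According to one aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the above method.
According to one aspect of the present disclosure, a computer program product is provided, including a computer program, wherein the computer program implements the above method when executed by a processor.
According to one or more embodiments of the present disclosure, invalid data searches can be avoided and the efficiency of data processing improved.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.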
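Brief Description of the Drawings
The drawings illustrate embodiments by way of example and form part of the specification; together with the written description, they serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals denote similar, but not necessarily identical, elements.
FIG. 1 shows a schematic diagram of an exemplary system in which the various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 4 shows a structural block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.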
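Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
In the present disclosure, unless otherwise stated, the use of the terms "first", "second", and so on to describe various elements is not intended to limit the positional, temporal, or importance relationship of those elements; such terms are only used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of the element, while in other cases, depending on the context, they may refer to different instances.
The terminology used in the description of the various examples in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. In addition, the term "and/or" as used in the present disclosure covers any one of the listed items and all possible combinations of them.
Data splicing is a technique in which two or more different data streams from different data sources are merged into a single piece of data, based on a business-level association, and provided for downstream processing. For example, a first data stream from a first data source may be a user's click events and a second data stream from a second data source may be page display data; splicing the first data stream with the corresponding second data stream and providing the result downstream allows the downstream to analyze, from the spliced data, how the user clicked on the page.
In the related art, to obtain the second data stream corresponding to a first data stream so that the two can be spliced, the corresponding second data stream is usually searched for directly in the database. Because the amount of data stored in the database is very large, the search takes a long time and reduces splicing efficiency.
In view of this, the present disclosure provides a data processing method that establishes a lookup table independent of the database. The lookup table stores the storage information of the valid second data streams, and the database is searched only when the storage information of the required second data stream is found in the lookup table, which avoids invalid searches and improves data processing efficiency.
Embodiments of the present disclosure are described in detail below with reference to the drawings.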
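FIG. 1 shows a schematic diagram of an exemplary system 100 in which the various methods and apparatuses described herein may be implemented, according to an embodiment of the present disclosure. Referring to FIG. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. The client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more application programs.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the data processing method to be performed.
In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example to users of the client devices 101, 102, 103, 104, 105, and/or 106 under a software-as-a-service (SaaS) model.
In the configuration shown in FIG. 1, the server 120 may include one or more components that implement the functions performed by the server 120. These components may include software components executable by one or more processors, hardware components, or a combination thereof. Users operating the client devices 101, 102, 103, 104, 105, and/or 106 may in turn use one or more client applications to interact with the server 120 and use the services these components provide. It should be understood that various different system configurations are possible, which may differ from the system 100. FIG. 1 is therefore one example of a system for implementing the various methods described herein and is not intended to be limiting.
Users may use the client devices 101, 102, 103, 104, 105, and/or 106 to acquire the first data stream and/or the second data stream. A client device may provide an interface that enables its user to interact with it, and may also output information to the user via that interface. Although FIG. 1 depicts only six client devices, those skilled in the art will understand that the present disclosure can support any number of client devices.
The client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general-purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so on. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux or Linux-like operating systems (such as GOOGLE Chrome OS), or various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular phones, smartphones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head-mounted displays and other devices. Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices, and the like. Client devices can execute a variety of applications, such as various Internet-related applications, communication applications (such as e-mail applications), and Short Message Service (SMS) applications, and can use various communication protocols.
The network 110 may be any type of network well known to those skilled in the art that can support data communication using any of a variety of available protocols (including but not limited to TCP/IP, SNA, IPX, and so on). By way of example only, the one or more networks 110 may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (such as Bluetooth or WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general-purpose computers, dedicated server computers (for example, PC (personal computer) servers, UNIX servers, mid-range servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization (for example, one or more flexible pools of logical storage devices that can be virtualized to maintain virtual storage devices for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functions described below.
The computing unit in the server 120 may run one or more operating systems, including any of the operating systems mentioned above and any commercially available server operating system. The server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and so on.
In some implementations, the server 120 may include one or more applications to analyze and merge data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. The server 120 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of the client devices 101, 102, 103, 104, 105, and 106.
In some implementations, the server 120 may be a server of a distributed system or a server combined with a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and virtual private server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The data stores 130 may reside in various locations. For example, a data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with it via a network-based or dedicated connection. The data stores 130 may be of different types. In some embodiments, the data store used by the server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by applications may be of different types, such as key-value stores, object stores, or conventional stores backed by a file system.
The system 100 of FIG. 1 may be configured and operated in various ways so that the various methods and apparatuses described in the present disclosure can be applied.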
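FIG. 2 is a flowchart illustrating a data processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the method includes: step S201, acquiring a first data stream; step S202, querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and step S203, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream. This effectively reduces the time spent searching for the second data stream to be spliced and improves splicing efficiency.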
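Steps S202 and S203 can be pictured with a minimal Python sketch. It is illustrative only and not taken from the disclosure: the names `StorageInfo`, `find_stream_to_splice`, and the `timestamp_identifier` key layout are assumptions, and plain dictionaries stand in for the lookup table and the database.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StorageInfo:
    """Hypothetical lookup-table entry: enough to locate a second data stream."""
    timestamp: int
    second_id: str

    def storage_index(self) -> str:
        # Assumed key layout "timestamp_identifier"; a real system may differ.
        return f"{self.timestamp:03d}_{self.second_id}"


def find_stream_to_splice(first_id: str,
                          lookup_table: Dict[str, StorageInfo],
                          database: Dict[str, dict]) -> Optional[dict]:
    """S202 then S203: consult the lookup table, touch the database only on a hit."""
    info = lookup_table.get(first_id)        # S202: query the lookup table
    if info is None:
        return None                          # no valid second stream, skip the database
    return database[info.storage_index()]    # S203: locate the stream by its storage info


database = {
    "001_A": {"payload": "page view A"},
    "002_B": {"payload": "page view B"},
}
# Only valid second data streams have an entry in the lookup table.
lookup_table = {"A": StorageInfo(timestamp=1, second_id="A")}

hit = find_stream_to_splice("A", lookup_table, database)
miss = find_stream_to_splice("B", lookup_table, database)
```

In this sketch the miss for "B" never reaches the database, which is the saving the method aims for.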
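Regarding step S202, according to some embodiments, a second data stream acquired before its first data stream may be stored in the database to wait for the corresponding first data stream, acquired later, so that splicing can be performed.
According to some embodiments, each second data stream in the database has a corresponding storage index. The storage index of each second data stream may include one or more kinds of attribute information of that second data stream, for example its timestamp and/or its identification information. The plurality of second data streams in the database may be sorted in the order indicated by their respective storage indexes.
The storage index of each second data stream in the database may be composed of one or more kinds of storage information corresponding to that second data stream.
According to some embodiments, newly added second data streams may be acquired at the same time as first data streams are acquired.
According to some embodiments, in response to a newly added second data stream being stored in the database, the storage information corresponding to that second data stream may be stored in the lookup table. The lookup table contains the storage information of only those second data streams in the database that are valid, that is, that can currently take part in a splicing operation, and the storage information of second data streams that have become invalid can be removed through dynamic updates. For example, the storage information of invalid second data streams may be deleted from the lookup table based on one or more factors such as the time validity of the second data stream, whether a splicing operation has already been performed on it, and its attributes, so that the lookup table holds only the storage information of valid second data streams.
To ensure system reliability, acquired second data streams are kept in the database for a long time so that they can be traced back and inspected when necessary. The second data streams stored in the database therefore include both valid second data streams that can currently be spliced and a large number of invalid second data streams that cannot. To reduce the impact of this large volume of data on search efficiency, a lookup table independent of the database is established. The lookup table can be updated in real time based on the validity of the second data streams and retains only the storage information of currently valid second data streams, so that the database is searched only when the storage information of the second data stream corresponding to the first data stream is found in the lookup table, which greatly reduces search time.
According to some embodiments, the data in the lookup table may be stored in the form of data shards (KeyGroups).
According to some embodiments, the first data stream may include a first identifier, and the storage information, in the database, of each of the at least one valid second data stream may include a second identifier of that second data stream. Querying the lookup table for the storage information, in the database, of the second data stream corresponding to the first data stream may include: querying the lookup table for a second identifier corresponding to the first identifier; and determining the storage information that includes the second identifier as the storage information of the second data stream corresponding to the first data stream. The storage information, in the database, of the second data stream corresponding to the first data stream can thus be determined conveniently.
The first identifier of the first data stream may include at least one kind of attribute information representing the first data stream, and the second identifier of the second data stream may include at least one kind of attribute information representing the second data stream. For example, the first identifier and the second identifier may be the user ID corresponding to the data stream, so that a first data stream and a second data stream with the same user ID can be determined to be the two data streams to be spliced.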
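According to some embodiments, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, that storage information is deleted from the lookup table. The storage information of already matched second data streams can thus be removed from the lookup table promptly, so that the lookup table retains only the storage information of unmatched second data streams, which reduces invalid data in the lookup table and improves query efficiency.
According to some embodiments, the storage information of each second data stream in the lookup table may include the timestamp of that second data stream, and the method may further include: for the storage information of each second data stream in the lookup table, deleting the storage information in response to the time difference between the historical moment indicated by the timestamp in the storage information and the current watermark time exceeding a preset delay range. The storage information of second data streams that are no longer timely can thus be removed from the lookup table promptly, so that the lookup table retains only the storage information of timely second data streams, which reduces invalid data in the lookup table and improves query efficiency.
Here, the watermark time (watermark) is an indicator used to measure progress in real-time computation.
According to some embodiments, a centralized module in the system may be used to record the current watermark time corresponding to the lookup table.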
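The two clean-up rules for the lookup table (remove an entry once it has been matched, and remove an entry once it lags the watermark by more than the preset delay range) might look like the following Python sketch. The function names and the use of integer seconds for timestamps are assumptions made for illustration.

```python
def pop_matched(lookup_table, first_id):
    """Return the matching storage info and remove it, so it cannot match twice."""
    return lookup_table.pop(first_id, None)


def evict_expired(lookup_table, watermark, max_delay):
    """Remove entries whose timestamp lags the watermark by more than max_delay."""
    expired = [key for key, info in lookup_table.items()
               if watermark - info["timestamp"] > max_delay]
    for key in expired:
        del lookup_table[key]
    return expired


# second identifier -> storage info (timestamps in seconds, values are made up)
lookup_table = {
    "A": {"timestamp": 100},
    "B": {"timestamp": 160},
    "C": {"timestamp": 175},
}

matched = pop_matched(lookup_table, "C")                 # entry C is consumed by a match
removed = evict_expired(lookup_table, watermark=180, max_delay=30)
```

After both calls only entry "B" remains: "C" was matched, and "A" is 80 seconds behind the watermark, which exceeds the assumed 30-second delay range.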
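After the storage information of the second data stream corresponding to the first data stream is found, step S203 is performed: based on the storage information, the second data stream to be spliced with the first data stream is determined in the database. The spliced first and second data streams can be delivered downstream for corresponding processing and analysis.
The original data of a spliced second data stream remains in the database; second data streams stored in the database are not deleted because they have been spliced.
According to some embodiments, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream may include: determining the storage index in the database that corresponds to the storage information; and, based on the determined storage index, determining in the database the second data stream corresponding to that storage index.
During data splicing, if the splicing rate of first and second data streams is low, a large number of unspliced second data streams cannot be delivered downstream for further processing, which degrades subsequent data processing. It is therefore necessary to identify the unspliced second data streams in the database and transmit them downstream for processing.
According to some embodiments, the database further includes a status identifier corresponding to each of the plurality of second data streams, and the method further includes: after determining the second data stream to be spliced with the first data stream, setting the status identifier corresponding to that second data stream to indicate that it has completed splicing processing. The database can thus conveniently show, for each second data stream, whether it has been spliced.
According to some embodiments, at least one second data stream to be searched is determined in the database based on a preset search period, wherein the historical moment indicated by the timestamp of each second data stream to be searched is within the preset search period; and for each second data stream to be searched, in response to its status identifier indicating that it has not completed splicing processing, that second data stream is determined as a second data stream to be transmitted. Unspliced second data streams can thus be obtained efficiently from the database and then transmitted downstream for processing.
According to some embodiments, the time difference between the end time of the preset search period and the current watermark time is greater than a preset waiting time range. The preset waiting time range represents the longest time a second data stream stored in the database can wait for a first data stream; a second data stream that has still not been spliced after the preset waiting time range has elapsed can be determined as a second data stream to be transmitted.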
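According to some embodiments, a centralized module in the system may be used to record the current watermark time corresponding to the database.
According to some embodiments, the plurality of second data streams in the database are sorted in the chronological order of the historical moments indicated by their respective timestamps. The second data streams in the database are thus arranged in time order, which makes it easy to search the database for second data streams to be transmitted by timestamp.
Table 1 shows an exemplary storage layout of second data streams in the database. For brevity, only five second data streams are shown; the present disclosure does not limit the number of second data streams stored in the database.
Table 1
Storage index                        Data column             Status identifier
timestamp 001_second identifier A    second data stream A    0
timestamp 002_second identifier B    second data stream B    1
timestamp 003_second identifier C    second data stream C    0
timestamp 003_second identifier D    second data stream D    0
timestamp 004_second identifier E    second data stream E    1
As shown in Table 1, the second data streams stored in the database are arranged in ascending order of their timestamps.
Timestamp 001 and second identifier A of second data stream A form its storage index in the database; timestamp 002 and second identifier B of second data stream B form its storage index; timestamp 003 and second identifier C of second data stream C form its storage index; timestamp 003 and second identifier D of second data stream D form its storage index; and timestamp 004 and second identifier E of second data stream E form its storage index. A status identifier of 1 may be used to indicate that the corresponding second data stream has completed splicing processing, and a status identifier of 0 to indicate that it has not.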
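According to some embodiments, determining at least one second data stream to be searched in the database based on the preset search period may include: determining the start time and the end time of the preset search period; searching the database for the start storage index corresponding to the start time and the end storage index corresponding to the end time; and determining the second data streams within the storage range defined in the database by the start storage index and the end storage index as the at least one second data stream to be searched. When searching the database, it is therefore only necessary to find the start and end storage indexes that correspond to the start and end times in order to determine all second data streams to be searched for the preset search period, which effectively saves the time needed to search for and determine the final second data streams to be transmitted.
For example, as shown in Table 1, the start storage index corresponding to the start time is "timestamp 002_second identifier B" and the end storage index corresponding to the end time is "timestamp 004_second identifier E". It can therefore be determined that second data streams B, C, D, and E, which lie within the storage range defined by the start storage index "timestamp 002_second identifier B" and the end storage index "timestamp 004_second identifier E", are the at least one second data stream to be searched. The status identifiers of second data stream C and second data stream D are 0, indicating that they have not completed splicing processing, so second data stream C and second data stream D can be taken as the second data streams to be transmitted.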
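The range lookup on Table 1 can be reproduced with a short Python sketch. It assumes the rows are held in a list already sorted by storage index and uses the standard `bisect` module to find the two boundaries; the shortened index strings such as "002_B" and the function name `unspliced_in_range` are illustrative, not from the disclosure.

```python
from bisect import bisect_left, bisect_right

# Rows of Table 1 as (storage index, data column, status identifier), sorted by index.
rows = [
    ("001_A", "second data stream A", 0),
    ("002_B", "second data stream B", 1),
    ("003_C", "second data stream C", 0),
    ("003_D", "second data stream D", 0),
    ("004_E", "second data stream E", 1),
]
keys = [index for index, _, _ in rows]


def unspliced_in_range(start_index, end_index):
    """Locate the storage range with two binary searches, then keep status 0 only."""
    lo = bisect_left(keys, start_index)    # first row at or after the start index
    hi = bisect_right(keys, end_index)     # one past the last row at or before the end index
    candidates = rows[lo:hi]               # second data streams to be searched
    return [data for _, data, status in candidates if status == 0]


pending = unspliced_in_range("002_B", "004_E")
# pending == ["second data stream C", "second data stream D"]
```

Only the two boundary positions are searched for; every row between them is a candidate without any further per-row timestamp comparison.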
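According to some embodiments, in response to the storage information of the second data stream corresponding to the first data stream not being found in the lookup table, the first data stream is cached. This prevents late arrival of the second data stream from lowering the splicing success rate of first and second data streams.
According to some embodiments, the first data stream may be cached in the lookup table.
According to some embodiments, in response to the first data stream still not having been spliced after a preset duration, that first data stream is deleted from the lookup table.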
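A minimal Python sketch of caching unmatched first data streams and dropping them after a preset duration is given below. The class `PendingFirstStreams`, its method names, and the use of numeric timestamps are assumptions for illustration; the disclosure only states that the first data stream is cached (possibly in the lookup table) and removed if it is still unspliced after the preset duration.

```python
class PendingFirstStreams:
    """Hypothetical cache for first data streams whose second stream has not arrived."""

    def __init__(self, max_wait):
        self.max_wait = max_wait
        self._cache = {}                      # first identifier -> (arrival time, stream)

    def add(self, first_id, stream, now):
        self._cache[first_id] = (now, stream)

    def take(self, second_id):
        """Called when a second stream arrives; returns the waiting first stream, if any."""
        entry = self._cache.pop(second_id, None)
        return None if entry is None else entry[1]

    def expire(self, now):
        """Drop first streams that have waited longer than max_wait; return their ids."""
        stale = [key for key, (arrived, _) in self._cache.items()
                 if now - arrived > self.max_wait]
        for key in stale:
            del self._cache[key]
        return stale


cache = PendingFirstStreams(max_wait=60)
cache.add("A", "click A", now=0)
cache.add("B", "click B", now=50)

matched = cache.take("A")        # the late second stream for "A" finally arrives
stale = cache.expire(now=120)    # "B" has waited 70 time units, more than 60
```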
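FIG. 3 is a flowchart illustrating a data stream processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, an acquired second data stream is stored after being processed by the data stream processing module. Specifically, the data stream processing module extracts the timestamp and the second identifier of the second data stream and builds the storage index of that second data stream in the database from the timestamp and the second identifier. The second data stream is stored in the database at the storage location indicated by the storage index. At the same time, the timestamp and the second identifier of the second data stream are stored in the lookup table as the storage information of that second data stream.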
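The ingestion path of FIG. 3 can be sketched in Python as follows. The function `store_second_stream`, the field names, and the zero-padded `timestamp_identifier` key format are assumptions chosen to mirror Table 1; dictionaries stand in for the database and the lookup table.

```python
def store_second_stream(stream, database, lookup_table):
    """Build the storage index, store the stream, and register its storage info."""
    # Assumed index layout "timestamp_identifier", mirroring the keys of Table 1.
    storage_index = f"{stream['timestamp']:03d}_{stream['second_id']}"
    database[storage_index] = {"data": stream["data"], "status": 0}  # 0 = not yet spliced
    lookup_table[stream["second_id"]] = {
        "timestamp": stream["timestamp"],
        "second_id": stream["second_id"],
    }
    return storage_index


database, lookup_table = {}, {}
index = store_second_stream(
    {"timestamp": 3, "second_id": "C", "data": "page view C"},
    database,
    lookup_table,
)
```

Because the lookup-table entry holds the same timestamp and identifier that make up the storage index, a later hit in the lookup table is enough to rebuild the index and read the stream directly.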
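The watermark module stores the current watermark time corresponding to the database. Using the watermark time obtained from the watermark module, the startup module starts the data search module, which determines at least one second data stream to be searched in the database. For example, if the watermark time obtained by the startup module from the watermark module is 10:00, the preset waiting time range is 2 hours, and the searchable time window is 10 minutes, the preset search period can be set to 7:50-8:00.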
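The arithmetic behind the 7:50-8:00 example can be checked with a few lines of Python. The function name `search_period` is hypothetical, and the calendar date is arbitrary because only the time of day matters here.

```python
from datetime import datetime, timedelta


def search_period(watermark, max_wait, window):
    """End of the period trails the watermark by max_wait; the period is one window long."""
    end = watermark - max_wait
    start = end - window
    return start, end


start, end = search_period(
    watermark=datetime(2021, 6, 23, 10, 0),   # watermark time 10:00 (date is arbitrary)
    max_wait=timedelta(hours=2),              # preset waiting time range
    window=timedelta(minutes=10),             # searchable time window
)
period = f"{start:%H:%M}-{end:%H:%M}"
```

With these inputs `period` is "07:50-08:00", matching the example in the text.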
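For each second data stream to be searched among the at least one second data stream to be searched, in response to the status identifier corresponding to that second data stream indicating that it has not completed splicing processing, the second data stream is determined as a second data stream to be transmitted and is transmitted downstream through the transmission module for corresponding processing.
According to another aspect of the present disclosure, a data processing apparatus 400 is also disclosed, including: an acquisition unit 401 configured to acquire a first data stream; a query unit 402 configured to query a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams include at least one valid second data stream, and the lookup table includes the storage information, in the database, of each of the at least one valid second data stream; and a first determining unit 403 configured to, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determine in the database, based on the storage information, the second data stream to be spliced with the first data stream.
According to some embodiments, the apparatus further includes: a cache unit configured to cache the first data stream in response to the storage information of the second data stream corresponding to the first data stream not being found in the lookup table.
According to some embodiments, the first data stream includes a first identifier, the storage information, in the database, of each of the at least one valid second data stream includes a second identifier of that second data stream, and the query unit includes: a module for querying the lookup table for a second identifier corresponding to the first identifier; and a module for determining the storage information that includes the second identifier as the storage information of the second data stream corresponding to the first data stream.
According to some embodiments, the apparatus further includes: a first deletion unit configured to, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, delete that storage information from the lookup table.
According to some embodiments, the storage information of each second data stream in the lookup table includes the timestamp of that second data stream, and the apparatus further includes: a second deletion unit configured to, for the storage information of each second data stream in the lookup table, delete the storage information in response to the time difference between the historical moment indicated by the timestamp in the storage information and the current watermark time exceeding a preset delay range.
According to some embodiments, the database further includes a status identifier corresponding to each of the plurality of second data streams, and the apparatus further includes: a setting unit configured to, after the second data stream to be spliced with the first data stream is determined, set the status identifier corresponding to that second data stream to indicate that it has completed splicing processing.
According to some embodiments, the apparatus further includes: a second determining unit configured to determine at least one second data stream to be searched in the database based on a preset search period, wherein the historical moment indicated by the timestamp of each second data stream to be searched is within the preset search period; and a third determining unit configured to, for each second data stream to be searched, determine that second data stream as a second data stream to be transmitted in response to its status identifier indicating that it has not completed splicing processing.
According to some embodiments, the plurality of second data streams in the database are sorted in the chronological order of the historical moments indicated by their respective timestamps, and the second determining unit includes: a module for determining the start time and the end time of the preset search period; a module for searching the database for the start storage index corresponding to the start time and the end storage index corresponding to the end time; and a module for determining the second data streams within the storage range defined in the database by the start storage index and the end storage index as the at least one second data stream to be searched.
According to another aspect of the present disclosure, a computing device is also disclosed, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform any one of the above methods.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is also disclosed, wherein the computer instructions are used to cause a computer to perform any one of the above methods.
According to another aspect of the present disclosure, a computer program product is also disclosed, including a computer program, wherein the computer program implements any one of the above methods when executed by a processor.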
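Referring to FIG. 5, a structural block diagram of an electronic device 500 that can serve as a server or client of the present disclosure will now be described; it is an example of a hardware device that can be applied to aspects of the present disclosure. An electronic device is intended to represent various forms of digital electronic computer equipment, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 includes a computing unit 501 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Multiple components of the device 500 are connected to the I/O interface 505, including an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the device 500; it can receive input numeric or character information and generate key signal input related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 507 may be any type of device capable of presenting information and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 508 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so on. The computing unit 501 performs the methods and processing described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the data processing method in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described here can be implemented in a computing system that includes back-end components (for example, a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the result desired by the technical solution of the present disclosure can be achieved; no limitation is imposed herein.
Although embodiments or examples of the present disclosure have been described with reference to the drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and that the scope of the invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described here may be replaced by equivalent elements that appear after the present disclosure.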

Claims (20)

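  1. A data processing method, comprising:
    acquiring a first data stream;
    querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams comprise at least one valid second data stream, and the lookup table comprises the storage information, in the database, of each second data stream of the at least one valid second data stream; and
    in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determining in the database, based on the storage information, the second data stream to be spliced with the first data stream.
  2. The method according to claim 1, further comprising:
    in response to not finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, caching the first data stream.
  3. The method according to claim 1 or 2, wherein the first data stream comprises a first identifier, the storage information, in the database, of each second data stream of the at least one valid second data stream comprises a second identifier of that second data stream, and the querying a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream comprises:
    querying the lookup table for a second identifier corresponding to the first identifier; and
    determining the storage information comprising the second identifier as the storage information of the second data stream corresponding to the first data stream.
  4. The method according to claim 3, further comprising:
    in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, deleting that storage information from the lookup table.
  5. The method according to any one of claims 1 to 4, wherein the storage information of each second data stream in the lookup table comprises a timestamp of that second data stream, the method further comprising:
    for the storage information of each second data stream in the lookup table, deleting the storage information in response to a time difference between a historical moment indicated by the timestamp in the storage information and a current watermark time exceeding a preset delay range.
  6. The method according to claim 1, wherein the database further comprises a status identifier corresponding to each second data stream of the plurality of second data streams, the method further comprising:
    after the determining the second data stream to be spliced with the first data stream, setting the status identifier corresponding to that second data stream to indicate that the second data stream has completed splicing processing.
  7. The method according to claim 6, further comprising:
    determining, based on a preset search period, at least one second data stream to be searched in the database, wherein a historical moment indicated by a timestamp of each second data stream to be searched of the at least one second data stream to be searched is within the preset search period; and
    for each second data stream to be searched of the at least one second data stream to be searched, in response to the status identifier corresponding to that second data stream to be searched indicating that the second data stream has not completed splicing processing, determining the second data stream as a second data stream to be transmitted.
  8. The method according to claim 7, wherein the plurality of second data streams in the database are sorted in the chronological order of the historical moments indicated by their respective timestamps.
  9. The method according to claim 8, wherein the determining, based on a preset search period, at least one second data stream to be searched in the database comprises:
    determining a start time and an end time corresponding to the preset search period;
    searching the database for a start storage index corresponding to the start time and an end storage index corresponding to the end time; and
    determining the second data streams within a storage range defined in the database by the start storage index and the end storage index as the at least one second data stream to be searched.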
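  10. A data processing apparatus, comprising:
    an acquisition unit configured to acquire a first data stream;
    a query unit configured to query a lookup table for storage information, in a database, of a second data stream corresponding to the first data stream, wherein the database stores a plurality of second data streams, the plurality of second data streams comprise at least one valid second data stream, and the lookup table comprises the storage information, in the database, of each second data stream of the at least one valid second data stream; and
    a first determining unit configured to, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, determine in the database, based on the storage information, the second data stream to be spliced with the first data stream.
  11. The apparatus according to claim 10, further comprising:
    a cache unit configured to cache the first data stream in response to not finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream.
  12. The apparatus according to claim 10 or 11, wherein the first data stream comprises a first identifier, the storage information, in the database, of each second data stream of the at least one valid second data stream comprises a second identifier of that second data stream, and the query unit comprises:
    a module for querying the lookup table for a second identifier corresponding to the first identifier; and
    a module for determining the storage information comprising the second identifier as the storage information of the second data stream corresponding to the first data stream.
  13. The apparatus according to claim 12, further comprising:
    a first deletion unit configured to, in response to finding, in the lookup table, the storage information of the second data stream corresponding to the first data stream, delete that storage information from the lookup table.
  14. The apparatus according to any one of claims 10 to 13, wherein the storage information of each second data stream in the lookup table comprises a timestamp of that second data stream, the apparatus further comprising:
    a second deletion unit configured to, for the storage information of each second data stream in the lookup table, delete the storage information in response to a time difference between a historical moment indicated by the timestamp in the storage information and a current watermark time exceeding a preset delay range.
  15. The apparatus according to claim 10, wherein the database further comprises a status identifier corresponding to each second data stream of the plurality of second data streams, the apparatus further comprising:
    a setting unit configured to, after the second data stream to be spliced with the first data stream is determined, set the status identifier corresponding to that second data stream to indicate that the second data stream has completed splicing processing.
  16. The apparatus according to claim 15, further comprising:
    a second determining unit configured to determine, based on a preset search period, at least one second data stream to be searched in the database, wherein a historical moment indicated by a timestamp of each second data stream to be searched of the at least one second data stream to be searched is within the preset search period; and
    a third determining unit configured to, for each second data stream to be searched of the at least one second data stream to be searched, in response to the status identifier corresponding to that second data stream to be searched indicating that the second data stream has not completed splicing processing, determine the second data stream as a second data stream to be transmitted.
  17. The apparatus according to claim 16, wherein the plurality of second data streams in the database are sorted in the chronological order of the historical moments indicated by their respective timestamps, and the second determining unit comprises:
    a module for determining a start time and an end time corresponding to the preset search period;
    a module for searching the database for a start storage index corresponding to the start time and an end storage index corresponding to the end time; and
    a module for determining the second data streams within a storage range defined in the database by the start storage index and the end storage index as the at least one second data stream to be searched.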
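  18. A computing device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1-9.
  19. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-9.
  20. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-9.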
PCT/CN2021/136561 2021-06-23 2021-12-08 Data processing method and apparatus, computing device, and medium WO2022267368A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020227033396A KR20220138867A (ko) 2021-06-23 2021-12-08 Data processing method and apparatus, computing device, and medium
JP2022562122A JP2023534347A (ja) 2021-06-23 2021-12-08 Data processing method and apparatus, computing device, and medium
US17/921,620 US20230306031A1 (en) 2021-06-23 2021-12-08 Method for data processing, computing device, and storage medium
EP21936262.1A EP4152174A4 (en) 2021-06-23 2021-12-08 DATA PROCESSING METHOD AND APPARATUS, AND COMPUTER DEVICE AND MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110700394.1 2021-06-23
CN202110700394.1A CN113377809A (zh) 2021-06-23 2021-06-23 Data processing method and apparatus, computing device, and medium

Publications (1)

Publication Number Publication Date
WO2022267368A1 true WO2022267368A1 (zh) 2022-12-29

Family

ID=77578839

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136561 WO2022267368A1 (zh) 2021-06-23 2021-12-08 Data processing method and apparatus, computing device, and medium

Country Status (2)

Country Link
CN (1) CN113377809A (zh)
WO (1) WO2022267368A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
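CN113377809A (zh) * 2021-06-23 2021-09-10 北京百度网讯科技有限公司 Data processing method and apparatus, computing device, and medium
CN114125082B (zh) * 2021-11-05 2024-04-19 江西洪都航空工业集团有限责任公司 Telemetry data processing method, apparatus, and system
CN115629918B (zh) * 2022-10-24 2023-06-27 北京百度网讯科技有限公司 Data processing method and apparatus, electronic device, and storage medium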

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110184963A1 (en) * 2009-12-23 2011-07-28 Ratnesh Singh Thakur Systems and methods for rewriting a stream of data via intermediary
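CN105100819A (zh) * 2009-04-28 2015-11-25 Vubites India Private Limited Method and apparatus for coordinated splicing of multiple streams
CN110134702A (zh) * 2019-05-17 2019-08-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Data stream splicing method, apparatus, device, and storage medium
CN110287192A (zh) * 2019-06-26 2019-09-27 Zhejiang Dasouche Software Technology Co., Ltd. Search application data processing method and apparatus, computer device, and storage medium
CN110462616A (zh) * 2017-03-27 2019-11-15 Snap Inc. Generating a stitched data stream
CN113377809A (zh) * 2021-06-23 2021-09-10 Beijing Baidu Netcom Science and Technology Co., Ltd. Data processing method and apparatus, computing device, and medium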

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10057349B2 (en) * 2015-11-12 2018-08-21 Facebook, Inc. Data stream consolidation in a social networking system for near real-time analysis
US10956420B2 (en) * 2017-11-17 2021-03-23 International Business Machines Corporation Automatically connecting external data to business analytics process
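CN109241188A (zh) * 2018-09-05 2019-01-18 SAIC Motor Corporation Limited Trickle transmission method and apparatus for data, storage medium, and terminal
CN110276002B (zh) * 2019-06-26 2021-08-03 Zhejiang Dasouche Software Technology Co., Ltd. Search application data processing method and apparatus, computer device, and storage medium
CN111831383A (zh) * 2020-07-20 2020-10-27 Beijing Baidu Netcom Science and Technology Co., Ltd. Window splicing method, apparatus, device, and storage medium
CN112905645A (zh) * 2021-03-30 2021-06-04 China Construction Bank Corporation Bank data processing method and apparatus, electronic device, and storage medium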

Also Published As

Publication number Publication date
CN113377809A (zh) 2021-09-10

Legal Events

Date Code Title Description
ENP Entry into the national phase Ref document number: 20227033396; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase Ref document number: 2022562122; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase Ref document number: 2021936262; Country of ref document: EP; Effective date: 20221021
NENP Non-entry into the national phase Ref country code: DE