CN114116890A - Data processing method, device, equipment, storage medium and computer program product - Google Patents


Info

Publication number
CN114116890A
Authority
CN
China
Prior art keywords
data
stock
processed
server
cleaned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111300424.6A
Other languages
Chinese (zh)
Inventor
张朗淇
郑舒丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Futu Network Technology Co Ltd
Original Assignee
Shenzhen Futu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Futu Network Technology Co Ltd
Priority to CN202111300424.6A
Publication of CN114116890A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data processing method, apparatus, device, storage medium and computer program product. The method comprises: acquiring data to be processed in real time by means of a data synchronization method and storing the data to be processed in a single target database, the data to be processed coming from different stock databases; cleaning the data to be processed to obtain cleaned data; and applying a stream calculation method to the cleaned data to obtain first values corresponding to various stock indexes. Because stock data from different stock databases are processed in a unified way, the user no longer needs to switch among many APPs or pages, which improves the user experience.

Description

Data processing method, device, equipment, storage medium and computer program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, a device, a storage medium, and a computer program product.
Background
Currently, different servers may each perform statistical analysis on the stock data in their own stock databases to obtain values corresponding to stock indexes. For example, server 1 analyzes its stock data to obtain the values of stock indexes such as browsing volume and publication volume, and server 2 analyzes its stock data to obtain the values of stock indexes such as the total position.
From the user's perspective, the values of these stock indexes must be viewed through multiple stock applications (APPs), or through multiple pages of one APP, which forces the user to switch frequently between APPs or pages. How to perform unified data processing on stock data from different stock databases is therefore the technical problem to be solved by the present application.
Disclosure of Invention
The application provides a data processing method, apparatus, device, storage medium and computer program product that process stock data from different stock databases in a unified way, so that the user no longer needs to switch among many APPs or pages, which improves the user experience.
In a first aspect, a data processing method is provided, including: acquiring data to be processed in real time by means of a data synchronization method, and storing the data to be processed in a single target database, the data to be processed coming from different stock databases; cleaning the data to be processed to obtain cleaned data; and applying a stream calculation method to the cleaned data to obtain first values corresponding to various stock indexes.
In a second aspect, a data processing apparatus is provided, including a processing module configured to: acquire data to be processed in real time by means of a data synchronization method, and store the data to be processed in a single target database, the data to be processed coming from different stock databases; clean the data to be processed to obtain cleaned data; and apply a stream calculation method to the cleaned data to obtain first values corresponding to various stock indexes.
In a third aspect, an electronic device is provided, including: a processor and a memory, the memory being configured to store a computer program, the processor being configured to invoke and execute the computer program stored in the memory to perform a method as in the first aspect or its implementations.
In a fourth aspect, there is provided a computer readable storage medium for storing a computer program for causing a computer to perform the method as in the first aspect or its implementations.
In a fifth aspect, there is provided a computer program product comprising computer program instructions to cause a computer to perform the method as in the first aspect or its implementations.
A sixth aspect provides a computer program for causing a computer to perform a method as in the first aspect or implementations thereof.
According to this technical solution, the server acquires the data to be processed in real time by means of a data synchronization method, stores it in a single target database, cleans it, and then applies a stream calculation method to the cleaned data to obtain the values corresponding to various stock indexes. Because the server processes stock data from different stock databases in a unified way, the user no longer needs to switch among many APPs or pages, which improves the user experience. In addition, because the server acquires the data to be processed from the different stock databases in real time and computes the values of the stock indexes in real time by stream calculation, the user can view these values in real time, which further improves the user experience.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is an application scenario diagram provided in an embodiment of the present application;
Fig. 2 is an interaction flowchart of a data processing method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic block diagram of an electronic device 400 provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description, claims and drawings of the present invention are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
In this application, the words "exemplary" and "such as" are used to present examples; any embodiment or aspect described as "exemplary" or "such as" is not to be construed as preferred or more advantageous than other embodiments or aspects. Rather, these words are intended to present related concepts in a concrete fashion.
As described above, with existing data processing methods the user must view the values of stock indexes through multiple APPs, or through multiple pages of one APP, which forces frequent switching between APPs or pages. How to perform unified data processing on stock data from different stock databases is therefore the technical problem to be solved by the present application.
To solve this technical problem, the inventive concept of the present application is that a server performs unified data processing on stock data from different stock databases.
The technical solution of the present application may be applied to, but is not limited to, the following scenario:
Fig. 1 is a diagram of an application scenario provided in an embodiment of the present application. As shown in Fig. 1, the application scenario may include a terminal 110, a first server 120 and a second server 130. The terminal 110 can communicate with the first server 120, and the first server 120 can communicate with the second server 130.
In some implementations, the second server 130 may have its own stock database containing stock data, including but not limited to stock account information, positions held, stock page views, and likes and dislikes on stock pages.
It should be understood that the stock database corresponding to the second server 130 may be a database inside the second server 130, or may be a database outside the second server 130, and the present application is not limited thereto.
In some implementations, the second server 130 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing cloud computing services, which is not limited in this embodiment of the present application.
In some implementations, the first server 120 may obtain the stock data corresponding to the second server 130 from the second server 130, and perform unified processing on the stock data to obtain the numerical values corresponding to the stock indexes.
In some implementations, the first server 120 may store the obtained values corresponding to the stock indexes in its corresponding stock database, where the stock database may be a database inside the first server 120 or a database outside the first server 120, and this is not limited in this application.
In some implementations, the first server 120 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing cloud computing services, which is not limited in this embodiment of the present application.
In some implementations, a stock client may be installed on the terminal 110, and the user may view the value of each stock index computed by the first server 120 through the stock client. Alternatively, no stock client is installed on the terminal 110, and the user uses a browser to access a stock web page provided by the first server 120 to view the value of each stock index computed by the first server 120.
In some implementations, the terminal 110 may be a mobile phone, a tablet computer, a desktop computer, a laptop, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), or an augmented reality (AR)/virtual reality (VR) device, which is not limited in this application.
It should be understood that the numbers of terminals, first servers and second servers in Fig. 1 are merely illustrative; any number of each may be provided according to actual needs, which is not limited in this application.
Having introduced the application scenario of the embodiments of the present application, the technical solution of the present application is described in detail below:
Fig. 2 is an interaction flowchart of a data processing method provided in an embodiment of the present application. The method may be executed by, but is not limited to, the first server 120 and the second server 130 shown in Fig. 1. As shown in Fig. 2, the method may include the following steps:
S201: the first server acquires data to be processed from a plurality of second servers in real time by means of a data synchronization method, and stores the data to be processed in a single target database;
S202: the first server cleans the data to be processed to obtain cleaned data;
S203: the first server applies a stream calculation method to the cleaned data to obtain first values corresponding to various stock indexes.
The data to be processed come from different stock databases.
In some implementations, the data to be processed may include various items of stock data, including but not limited to stock account information, positions held, stock page views, and likes and dislikes on stock pages.
Example 1: the stock database of a second server may include a community browsing table recording the stock pages browsed by each account, as shown for example in Table 1. The community browsing table may be used to compute the browsing volume of stock pages.
Example 2: the stock database of a second server may include a community publication table recording the stock posts published by each account, as shown for example in Table 2. The community publication table may be used to compute the publication volume of stock posts.
Example 3: the stock database of a second server may include a community like table recording the stock posts liked by each account, as shown for example in Table 3. The community like table may be used to compute the number of likes received by stock posts.
TABLE 1
(Table images not reproduced in this text.)
In some implementations, the stock databases of the second servers may be of the same or different database types. For example, the stock database of second server 1 may be a MySQL database while that of second server 2 is a Redis database; alternatively, the stock databases of second server 1 and second server 2 may both be MySQL databases. This is not limited in this application.
In some implementations, the server may obtain the data to be processed in real time using a data synchronization method.
Illustratively, the server may use a data synchronization scheme based on Flink SQL CDC to acquire the data to be processed in real time. Flink is a computing framework that supports both stream computing and batch computing. Flink SQL is a development language provided by Flink that follows the semantics of standard Structured Query Language (SQL); it simplifies the computing model and lowers the barrier to real-time computing for users. Change Data Capture (CDC) captures data that has changed. Flink SQL CDC is the data synchronization scheme obtained by integrating CDC into Flink, and the Flink CDC Connectors are the set of source connectors used by Flink SQL CDC to extract change data from different databases via CDC. In the Flink SQL CDC scheme, a Flink CDC Connector reads both the full data and the incremental change data directly from databases such as MySQL, and Flink SQL then collects, computes and transmits the data, thereby achieving data synchronization. For example, the Flink CDC Connector may read each item of stock data, i.e., the data to be processed, from the stock database (a MySQL database) of second server 1, and Flink SQL may then collect, compute and transmit the data to be processed so as to synchronize it to the target database.
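To make the synchronization step more concrete, the following minimal Python sketch simulates what the receiving side of a CDC pipeline does: it applies a stream of change events (a full snapshot followed by incremental inserts, updates and deletes) from several source databases to one target store. It does not use Flink. The event shape, loosely modeled on Debezium-style operation codes (`r` snapshot read, `c` create, `u` update, `d` delete), and the names `ChangeEvent` and `apply_changes` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event, loosely modeled on Debezium-style op codes."""
    source: str                   # which second server's stock database emitted it
    table: str                    # source table, e.g. a community browsing table
    key: int                      # primary key of the changed row
    op: str                       # "r" snapshot read, "c" insert, "u" update, "d" delete
    after: Optional[dict] = None  # row image after the change (None for deletes)


# Rows are keyed by (source, table, key) so that rows coming from different
# stock databases cannot overwrite each other in the shared target database.
TargetKey = Tuple[str, str, int]


def apply_changes(events: Iterable[ChangeEvent],
                  target: Dict[TargetKey, dict]) -> Dict[TargetKey, dict]:
    """Apply full (snapshot) and incremental change events to one target store."""
    for ev in events:
        k = (ev.source, ev.table, ev.key)
        if ev.op in ("r", "c", "u"):
            # Snapshot reads, inserts and updates all upsert the latest row image.
            target[k] = dict(ev.after or {})
        elif ev.op == "d":
            target.pop(k, None)  # deleting a row that is not present is a no-op
        else:
            raise ValueError(f"unknown op code: {ev.op!r}")
    return target


# Usage: one snapshot row, then incremental changes from two source databases.
events = [
    ChangeEvent("mysql_1", "browse", 1, "r", {"account": "A", "page": "B"}),
    ChangeEvent("mysql_1", "browse", 2, "c", {"account": "A", "page": "C"}),
    ChangeEvent("mysql_1", "browse", 1, "u", {"account": "A", "page": "D"}),
    ChangeEvent("mysql_2", "post", 1, "c", {"account": "E", "stock": "stock 1"}),
    ChangeEvent("mysql_1", "browse", 2, "d"),
]
target = apply_changes(events, {})
```

Treating snapshot reads and updates identically as upserts is what lets one code path handle both the full data and the incremental change data.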
In some implementations, the server may place the data to be processed that is acquired in real time in an Operational Data Store (ODS) layer, which is not limited in this application.
It should be understood that the ODS layer is the layer of the data model closest to the data source: the original data from the data source is stored there essentially unchanged.
The server may clean the data to be processed in, but not limited to, the following implementations:
In some implementations, the server may delete data that does not conform to the data specification from the data to be processed; convert enumerated values in the data to be processed into corresponding data according to a first preset rule; and convert null values in the data to be processed into corresponding data according to a second preset rule.
It should be understood that this application does not limit the order in which the server performs these three cleaning operations.
The server may determine whether each piece of the data to be processed conforms to the data specification in any of the following implementations, but is not limited to them:
In the first implementation, let any piece of the data to be processed be called first data. If the first data does not conform to preset logic, it is determined that the first data does not conform to the data specification; if the first data conforms to the preset logic, it is determined that the first data conforms to the data specification.
It should be understood that this application does not limit the preset logic of the data.
Illustratively, suppose first data 1 records that account A browsed stock page B at 12:00 on October 25, 2021, while first data 2 records that account A browsed stock page C at 12:00 on October 25, 2021. Account A cannot browse stock pages B and C at the same time, so first data 1 and first data 2 contradict each other, and both may be regarded as data that does not conform to the preset logic.
In the second implementation, let any piece of the data to be processed be called first data. If the generation time of the first data is earlier than a preset time, it is determined that the first data does not conform to the data specification; if the generation time of the first data is equal to or later than the preset time, it is determined that the first data conforms to the data specification.
It should be understood that the preset time is earlier than the acquisition time of the data to be processed. For example, the preset time may be the early online-testing time of the product on any one of the second servers, which precedes the current data processing.
Illustratively, assume the preset time is 00:00 on October 25, 2021. If the generation time of first data 1 is 14:00 on October 25, 2021, it may be determined that first data 1 conforms to the data specification; if the generation time of first data 2 is 14:00 on October 24, 2021, it may be determined that first data 2 does not conform to the data specification.
In the third implementation, let any piece of the data to be processed be called first data. If the first data does not conform to the preset logic, or its generation time is earlier than the preset time, it may be determined that the first data does not conform to the data specification. If the first data conforms to the preset logic and its generation time is equal to or later than the preset time, it may be determined that the first data conforms to the data specification.
Illustratively, suppose first data 1 records that account A browsed stock page B at 12:00 on October 25, 2021, and first data 2 records that account A browsed stock page C at 13:00 on October 25, 2021; both conform to the preset logic of the data. Further suppose the preset time is 12:30 on October 25, 2021. First data 1 was generated earlier than the preset time and first data 2 later than it, so first data 1 does not conform to the data specification, while first data 2 does.
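The three ways of judging whether data conforms to the data specification can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: each record is taken to be a browsing event `(record_id, account, page, generated_at)`, and the "preset logic" is taken to be the example rule above (one account cannot browse two different pages at the same instant). The record layout and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (record_id, account, page, generated_at)
Record = Tuple[str, str, str, datetime]


def violates_preset_logic(records: List[Record]) -> Set[str]:
    """First implementation: ids of records that contradict each other under
    the example rule (one account browsing different pages at the same time)."""
    pages_at: Dict[Tuple[str, datetime], Set[str]] = defaultdict(set)
    for _, account, page, ts in records:
        pages_at[(account, ts)].add(page)
    return {rid for rid, account, _, ts in records
            if len(pages_at[(account, ts)]) > 1}


def meets_spec(records: List[Record], preset_time: datetime) -> List[Record]:
    """Third implementation: keep a record only if it conforms to the preset
    logic AND was generated at or after the preset time (second implementation)."""
    contradictory = violates_preset_logic(records)
    return [r for r in records
            if r[0] not in contradictory and r[3] >= preset_time]


# Usage, mirroring the example for the third implementation.
d1 = ("1", "A", "B", datetime(2021, 10, 25, 12, 0))
d2 = ("2", "A", "C", datetime(2021, 10, 25, 13, 0))
kept = meets_spec([d1, d2], preset_time=datetime(2021, 10, 25, 12, 30))
```

Here `kept` contains only `d2`: both records pass the logic check, but `d1` was generated before the preset time.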
It should be understood that the first preset rule may be a unified field specification for enumerated values defined by the first server; this application does not limit the first preset rule.
Illustratively, the first server may convert enumerated values in the data to be processed into corresponding data according to its unified field specification for enumerated values. For example, if the enumerated values 1001 and 1002 each correspond to a broker name, they may be converted into FT-H and FT-U, respectively. If the enumerated values 1, 2 and 13 each correspond to a market name, they may be converted into C_SECURITY, H_FUND and U_SECURITY, respectively.
It should be understood that the second preset rule may be a unified field specification for null values defined by the first server; this application does not limit the second preset rule. Illustratively, the first server may convert a null value into the character S according to its unified field specification for null values.
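The two preset rules can be sketched as lookup tables. The mappings below reuse the example values from the text (1001/1002 for broker names, 1/2/13 for market names, and S for null values); the field names `broker` and `market`, the function `normalize`, and the choice to pass unknown enumeration codes through unchanged are assumptions made for illustration, not part of the described method.

```python
from typing import Any, Dict, Optional

# First preset rule (hypothetical field names): unified field specification
# for enumerated values, using the example mappings from the text.
ENUM_RULES: Dict[str, Dict[int, str]] = {
    "broker": {1001: "FT-H", 1002: "FT-U"},
    "market": {1: "C_SECURITY", 2: "H_FUND", 13: "U_SECURITY"},
}

# Second preset rule: unified field specification for null values.
NULL_PLACEHOLDER = "S"


def normalize(record: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Convert enumerated values and nulls in one record per the preset rules."""
    out: Dict[str, Any] = {}
    for field, value in record.items():
        if value is None:
            out[field] = NULL_PLACEHOLDER
        elif field in ENUM_RULES:
            # Unknown enumeration codes are passed through unchanged (assumption).
            out[field] = ENUM_RULES[field].get(value, value)
        else:
            out[field] = value
    return out


cleaned = normalize({"broker": 1001, "market": 13, "remark": None, "views": 2})
```

Checking for null before the enumeration lookup means a missing broker or market code is also mapped to the null placeholder.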
In some implementations, the first server may place the cleaned data in a Data Warehouse Base (DWB) layer, which is not limited in this application. It should be understood that the DWB layer usually serves as an intermediate layer and may store some objective data.
In some implementations, before applying the stream calculation method to the cleaned data, the first server may determine the priority of the cleaned data and then apply the stream calculation method to the cleaned data in order of priority, so that important data is processed and displayed first, which improves the efficiency of data processing and display.
For example, the first server may determine the priority according to the importance of the cleaned data. The first server may preset a numerical importance for the cleaned data, where a higher value indicates higher importance, i.e., higher priority; for instance, the importance may be a value between 0 and 5. If data related to stock 1 is to have the highest priority, the first server may set the importance of the cleaned data related to stock 1 (such as each account's browsing of stock 1 pages, the number of stock 1 posts published by each account, and each account's likes on stock 1 posts) to 5, so that the server processes the cleaned data related to stock 1 first.
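Processing cleaned data in order of importance can be sketched with a priority queue. The sketch assumes each item carries an integer importance from 0 to 5, as described above; `by_priority` and the item strings are hypothetical. A sequence counter keeps items of equal importance in arrival order. In a real streaming job the queue would be drained continuously as data arrives rather than all at once.

```python
import heapq
import itertools
from typing import Iterable, List, Tuple


def by_priority(items: Iterable[Tuple[int, str]]) -> List[str]:
    """Return cleaned-data items ordered from highest to lowest importance.

    Each item is (importance, payload) with importance in 0..5, where a higher
    value means more important. Ties keep their arrival order.
    """
    heap: List[Tuple[int, int, str]] = []
    counter = itertools.count()
    for importance, payload in items:
        if not 0 <= importance <= 5:
            raise ValueError("importance must be between 0 and 5")
        # heapq is a min-heap, so negate importance to pop the largest first.
        heapq.heappush(heap, (-importance, next(counter), payload))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


order = by_priority([
    (2, "stock 2 page view"),
    (5, "stock 1 page view"),
    (5, "stock 1 post"),
    (0, "stock 3 like"),
])
```

With these inputs, both stock 1 items come out first, in their arrival order, followed by the stock 2 and stock 3 items.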
In some implementations, the first server may apply the stream calculation method to the cleaned data to obtain the first value corresponding to each stock index, and may output the first values to a Data Warehouse Service (DWS) layer, which is not limited in this application. It should be understood that the DWS layer integrates and summarizes the underlying data in the DWB layer to analyze the business data of a given subject domain, and is typically a wide table.
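The stream calculation step can be illustrated with a small incremental-aggregation sketch. It is plain Python, not Flink. The event tuple layout `(stock, event_type, timestamp)`, the mapping `INDEX_OF_EVENT` from event types to stock index names, and the one-hour time bucket are all assumptions chosen for illustration. Each incoming event updates running state and immediately emits the new value of the affected index, which is the essential difference from recomputing totals in a batch job.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical mapping from cleaned event types to stock index names.
INDEX_OF_EVENT = {"browse": "browse_volume", "post": "publication_volume", "like": "like_volume"}


def stream_indexes(events: Iterable[Tuple[str, str, datetime]]
                   ) -> Iterator[Tuple[str, str, str, int]]:
    """Incrementally aggregate events into per-stock, per-hour index values.

    For every incoming event the running value of the affected index is
    updated and emitted immediately, in the style of a continuous query.
    """
    state: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for stock, event_type, ts in events:
        index = INDEX_OF_EVENT.get(event_type)
        if index is None:
            continue  # ignore event types that feed no stock index
        bucket = ts.strftime("%Y-%m-%d %H:00")  # one-hour tumbling window
        state[(stock, bucket)][index] += 1
        yield stock, bucket, index, state[(stock, bucket)][index]


t = datetime(2021, 10, 25, 12, 5)
updates = list(stream_indexes([
    ("stock 1", "browse", t),
    ("stock 1", "browse", t.replace(minute=40)),
    ("stock 1", "post", t),
    ("stock 1", "unknown", t),
]))
```

Because the function is a generator, downstream consumers (for example a writer to the DWS layer) can act on each updated value as soon as it is produced.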
In some implementations, after obtaining the first value corresponding to each stock index, the first server may determine a row key for these first values and then store the first value of each stock index as a column under that row key, so as to obtain a target table.
In some implementations, the row key may include the stock name and time information, which is not limited in this application.
In some implementations, the first server may use an HBase database to store the first value corresponding to each stock index, with the row key set to the stock name and time information and each stock index stored as a column under that row key.
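A minimal sketch of this row-key layout follows, using an in-memory dictionary as a stand-in for HBase; no HBase client API is used or implied. The `#` separator between stock name and time information, the `idx` column family, and the helper names are hypothetical. Only the stock 1 / t1 values echo the description of Table 2; the stock 2 value is illustrative.

```python
from typing import Dict

# In-memory stand-in for an HBase-style wide table (no HBase client is used).
# Row key = stock name + "#" + time information; each stock index is stored as
# a column in a hypothetical column family "idx".
Table = Dict[str, Dict[str, int]]


def row_key(stock: str, time_info: str) -> str:
    """Compose the row key from stock name and time information."""
    return f"{stock}#{time_info}"


def upsert(table: Table, stock: str, time_info: str, index: str, value: int) -> None:
    """Write the latest first value of one stock index into its column."""
    table.setdefault(row_key(stock, time_info), {})[f"idx:{index}"] = value


target: Table = {}
upsert(target, "stock 1", "t1", "browse_volume", 2)
upsert(target, "stock 1", "t1", "publication_volume", 6)
upsert(target, "stock 2", "t1", "browse_volume", 3)  # illustrative value
```

Putting the stock name first in the row key is a deliberate choice: HBase stores rows sorted lexicographically by row key, so all time points of one stock are adjacent and can be read with a prefix scan. Including the separator in the prefix (`stock 1#`) keeps `stock 1` from also matching `stock 10`.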
In some implementations, Table 2 shows a target table obtained by storing, in the HBase database, the first values of some stock indexes for some stocks.
TABLE 2
(Table image not reproduced in this text.)
For example, the second row of Table 2 stores the first values of some indexes of stock 1 at t1: the browsing volume of stock 1 at t1 is 2 and the publication volume is 6. The first values stored for the other indexes are read in the same way and are not described again here.
It should be understood that Table 2 merely illustrates the stored first values of some stock indexes for some stocks at some time points, and is not limiting.
It should be understood that this application does not limit the method used to store the first value corresponding to each stock index.
In some implementations, the first server may receive a query request sent by the client, where the query request may include a key to be queried; the first server may then query the values in the target table according to the key to be queried to obtain a query result.
In some implementations, the key to be queried may be a stock name. For example, if the key to be queried is stock 1 and the target table is as shown in Table 2, the first server queries according to this key and obtains the query result shown in Table 3.
TABLE 3
(Table image not reproduced in this text.)
In other implementations, the key to be queried may be time information. For example, if the key to be queried is t1 and the target table is as shown in Table 2, the first server queries according to this key and obtains the query result shown in Table 4.
TABLE 4
(Table image not reproduced in this text.)
In still other implementations, the key to be queried may be a stock name and time information. For example, if the keys to be queried are stock 1 and t1 and the target table is as shown in Table 2, the first server queries according to these keys and obtains the query result shown in Table 5.
TABLE 5
Tables 3, 4, and 5 list only some example query results; query results are not limited to these.
This application does not limit the specific content of the key value to be queried.
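The three query modes above (by stock name, by time, or by both) can be sketched as a filter over a table keyed by (stock name, time). This is a simplification under stated assumptions: it uses exact equality, whereas the application matches key values by similarity, as explained in the following paragraphs. The table layout and the rows other than stock 1 at t1 are hypothetical illustration data.

```python
from typing import Optional

# Hypothetical target table: (stock name, time) -> {stock index: first value}.
# Only the stock 1 / t1 row comes from Table 2; the other rows are invented.
TARGET_TABLE = {
    ("stock 1", "t1"): {"browsing_volume": 2, "publication_volume": 6},
    ("stock 1", "t2"): {"browsing_volume": 5, "publication_volume": 7},
    ("stock 2", "t1"): {"browsing_volume": 3, "publication_volume": 1},
}


def query(table: dict, stock: Optional[str] = None,
          time_info: Optional[str] = None) -> dict:
    """Return the rows matching the stock name and/or time (None means any)."""
    return {
        key: columns
        for key, columns in table.items()
        if (stock is None or key[0] == stock)
        and (time_info is None or key[1] == time_info)
    }


print(sorted(query(TARGET_TABLE, stock="stock 1")))  # [('stock 1', 't1'), ('stock 1', 't2')]
print(sorted(query(TARGET_TABLE, time_info="t1")))   # [('stock 1', 't1'), ('stock 2', 't1')]
print(query(TARGET_TABLE, stock="stock 1", time_info="t1"))
```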
Querying works by matching: in this application, querying the target table according to the key value to be queried means computing the similarity between that key value and the data in the target table, and then obtaining the query result from these similarities.
In some implementations, the first server may compute this similarity as cosine similarity; this application does not limit the similarity measure.
Cosine similarity measures how similar two vectors are by the cosine of the angle between them: the closer the cosine is to 1, the closer the angle is to 0 degrees, and the more similar the two vectors are. To compute the cosine similarity of two character strings, first take the union of the characters in both strings, then build a vector for each string from how often each character of the union appears in that string, and finally compute the cosine of the angle between the two vectors with the following formula:
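$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$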
where A and B are the n-dimensional vectors of the two strings, $A = [A_1, A_2, \ldots, A_n]$, $B = [B_1, B_2, \ldots, B_n]$, n is a positive integer, and $\cos\theta$ is the cosine of the angle between A and B. The closer $\cos\theta$ is to 1, the higher the cosine similarity between the two strings, i.e., the more similar they are.
For example, suppose the key value to be queried is "stock 1" and the data in the target table is also "stock 1". The union of the characters of the two strings is [stock, ticket, 1] (the three characters of the original Chinese string 股票1). Each of these characters appears once in the key value, so its vector is A = [1, 1, 1]; each also appears once in the table data, so its vector is B = [1, 1, 1]. The cosine of the angle between the two vectors is then:
$$\cos\theta = \frac{1 \times 1 + 1 \times 1 + 1 \times 1}{\sqrt{1^2 + 1^2 + 1^2}\,\sqrt{1^2 + 1^2 + 1^2}} = \frac{3}{3} = 1$$
Similarly, the first server can compute the cosine between the vector of the key value "stock 1" and the vector of every other item of data in the target table. By comparing these cosines, it identifies the data in the target table with the largest cosine, i.e., the highest similarity, and determines the query result from it. In this example, the cosine between the key value "stock 1" and the data "stock 1" in the target table is the largest, so the two are the most similar, and the query result displays the data related to stock 1, as shown in Table 3 above.
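A minimal sketch of this matching step, assuming character-level frequency vectors over the union of the two strings (as in the worked example) and picking the candidate with the largest cosine. The function names and candidate strings are hypothetical; English strings are used so the characters are easy to read.

```python
from collections import Counter
from math import sqrt


def char_vectors(a: str, b: str) -> tuple[list[int], list[int]]:
    """Build frequency vectors for a and b over the union of their characters."""
    union = sorted(set(a) | set(b))
    count_a, count_b = Counter(a), Counter(b)
    return [count_a[ch] for ch in union], [count_b[ch] for ch in union]


def cosine_similarity(a: str, b: str) -> float:
    """cos(theta) = sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)))."""
    va, vb = char_vectors(a, b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm_sq_a = sum(x * x for x in va)
    norm_sq_b = sum(y * y for y in vb)
    if norm_sq_a == 0 or norm_sq_b == 0:
        return 0.0  # an empty string has no direction; treat it as dissimilar
    # sqrt(|A|^2 * |B|^2) equals |A| * |B| and avoids one rounding step.
    return dot / sqrt(norm_sq_a * norm_sq_b)


def best_match(key: str, candidates: list[str]) -> str:
    """Return the candidate whose cosine similarity to key is largest."""
    return max(candidates, key=lambda c: cosine_similarity(key, c))


print(cosine_similarity("stock 1", "stock 1"))                  # 1.0
print(best_match("stock 1", ["stock 1", "stock 2", "fund 3"]))  # stock 1
```

Here "stock 2" shares six of its seven characters with "stock 1" and scores 6/7, so the identical string still wins, mirroring how the query result for stock 1 is selected.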
In some implementations, after obtaining the query result for the key value in the query request, the first server may send the client a query response that includes the query result.
The first value of each stock index described above is obtained by the first server through the stream calculation method. The first server may also obtain a second value of each stock index through the batch calculation method, as follows:
In some implementations, the first server may obtain the data within a preset time period before the data to be processed, and then apply the batch calculation method to the data to be processed together with the data within the preset time period to obtain the second values of the stock indexes.
The batch calculation method processes data in bulk, has high latency, and is actively initiated. The computation job logic must be defined first and submitted to the batch computing system, and it cannot be changed while the job runs. The data to be computed must be loaded into the computing system in advance, and computation starts only after loading finishes. The stream calculation method, by contrast, emphasizes computing over data streams with low latency. It spreads a large volume of data across points in time and transmits it continuously in small batches; the data keeps flowing and is discarded once it has been computed. Results of the stream calculation method can be delivered to an online system immediately, enabling real-time display.
For example, suppose the first server acquires the data to be processed at 00:00 on October 26, 2020 and the preset time period is 24 hours. Under the batch calculation method, the first server acquires all data for October 25, 2020, that is, the data to be processed acquired at 00:00 on October 26, 2020 together with all data from the 24 hours before it, and applies the batch calculation method to this data to obtain the second value of each stock index. Under the stream calculation method, the first server has already calculated the values of the stock indexes before it acquires the data to be processed at 00:00 on October 26, 2020; when that data arrives, the first server only needs to update the values it has already calculated. For example, if the data to be processed at 00:00 on October 26, 2020 is "account A browses stock page B", the browsing volume of stock page B simply increases by one.
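The contrast in this example can be sketched for the browsing-volume index: the batch side recomputes counts from every event in the 24-hour window, while the stream side keeps running counts and adds one per incoming event. This is an illustration under assumptions, not the application's implementation: the event tuple layout, the earlier events from accounts C and D, and all names are hypothetical. The stream counter here counts everything it has seen, so it matches the batch result only when all events fall inside the batch window, as they do in the usage below.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event: (timestamp, account, stock page viewed).
Event = tuple[datetime, str, str]


def batch_browsing_volume(events: list[Event], now: datetime,
                          window: timedelta = timedelta(hours=24)) -> Counter:
    """Batch: recompute every page's browsing volume from all events in the window."""
    start = now - window
    return Counter(page for ts, _account, page in events if start <= ts <= now)


class StreamBrowsingVolume:
    """Stream: keep running counts and update them as each event arrives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> None:
        _ts, _account, page = event
        self.counts[page] += 1  # "account A browses stock page B" -> B's volume + 1


now = datetime(2020, 10, 26, 0, 0)
history = [  # hypothetical events from October 25, 2020
    (datetime(2020, 10, 25, 9, 30), "account C", "stock page B"),
    (datetime(2020, 10, 25, 15, 0), "account D", "stock page E"),
]
stream = StreamBrowsingVolume()
for event in history:
    stream.on_event(event)

new_event = (now, "account A", "stock page B")
stream.on_event(new_event)  # incremental update: only one counter changes
batch = batch_browsing_volume(history + [new_event], now)  # full recomputation
print(stream.counts["stock page B"], batch["stock page B"])  # 2 2
```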
In some implementations, before applying the batch calculation method, the first server may determine the priority of the cleaned data and then apply the batch calculation method to the cleaned data in order of priority. This is similar to how the first server determines the priority of the cleaned data before applying the stream calculation method, so the details are not repeated here.
In some implementations, the first server can determine whether the stream calculation method is accurate by checking, for each stock index, whether its first value and second value are the same.
In some implementations, the first server may use data snapshots to save a cross-section of the first values of the stock indexes computed by the stream calculation method and a cross-section of the second values computed by the batch calculation method, and then compare the first and second values to determine whether the stream calculation method is accurate.
In some implementations, the stream calculation method is determined to be accurate if the first and second values are the same for every stock index.
In some implementations, the stream calculation method is determined to be inaccurate if at least one stock index has different first and second values.
In some implementations, after determining that the stream calculation method is inaccurate, the first server may generate prompt information and push it to notify the user that the stream calculation method is inaccurate.
In some implementations, the prompt information may be sent as an email, a text message, or in another form; this application does not limit its form.
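The accuracy check above can be sketched as a comparison of two snapshots, each mapping a stock index to its value. This sketch assumes flat dict snapshots and treats an index present on only one side as a mismatch; the key naming scheme, function names, and prompt wording are hypothetical, and actually delivering the prompt (email, SMS, and so on) is left out.

```python
from typing import Optional


def check_stream_accuracy(first_values: dict[str, float],
                          second_values: dict[str, float]) -> list[str]:
    """Return the stock indexes whose stream (first) and batch (second) values differ.

    An empty list means the stream calculation method is judged accurate.
    An index missing from one snapshot compares as None and counts as a mismatch.
    """
    all_indexes = sorted(set(first_values) | set(second_values))
    return [name for name in all_indexes
            if first_values.get(name) != second_values.get(name)]


def build_prompt(mismatches: list[str]) -> Optional[str]:
    """Hypothetical prompt text to push when at least one index differs."""
    if not mismatches:
        return None
    return "Stream calculation inaccurate for: " + ", ".join(mismatches)


first = {"stock 1/t1/browsing_volume": 2, "stock 1/t1/publication_volume": 6}
second = {"stock 1/t1/browsing_volume": 2, "stock 1/t1/publication_volume": 5}
print(build_prompt(check_stream_accuracy(first, second)))
# Stream calculation inaccurate for: stock 1/t1/publication_volume
```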
In summary, the embodiments of the present application provide at least the following beneficial effects. First, the server can process stock data from different stock databases in a unified way, so users do not need to switch between many apps or pages, which improves the user experience. In addition, the server can acquire data to be processed from different stock databases in real time through a data synchronization method and obtain the values of the stock indexes in real time through the stream calculation method, so users can view these values in real time, which further improves the user experience.
Second, the first server can store the values of the stock indexes in a table keyed by stock name and time information, so users can quickly query the indexes of a given stock or the indexes at a given time, which improves query efficiency.
Third, because the batch calculation method is accurate and reliable, the first server can determine whether the stream calculation method is accurate by comparing the values obtained by the two methods, and can push prompt information to the user when the stream calculation method is inaccurate, thereby ensuring the accuracy of the values of the stock indexes.
Fig. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be the first server 120 shown in Fig. 1 and includes a processing module 310 configured to: acquire data to be processed in real time through a data synchronization method and store it in the same target database, where the data to be processed comes from different stock databases; clean the data to be processed to obtain cleaned data; and apply the stream calculation method to the cleaned data to obtain first values of the stock indexes.
In some implementations, the processing module 310 is specifically configured to: delete data in the data to be processed that does not meet the data specification; convert enumeration values in the data to be processed into corresponding data according to a first preset rule; and convert null values in the data to be processed into corresponding data according to a second preset rule.
In some implementations, the processing module 310 is further configured to: determine, for each item of the data to be processed, whether it violates preset logic and/or whether its generation time is earlier than a preset time; and if a first item of the data to be processed violates the preset logic and/or was generated earlier than the preset time, determine that the first item does not meet the data specification.
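The three cleaning steps can be sketched as follows. Everything concrete here is an assumption, because the application does not specify the rules: the enumeration mapping (first preset rule), the null defaults (second preset rule), the preset time, the "browsing volume cannot be negative" logic check, and the record field names are all hypothetical.

```python
from datetime import datetime
from typing import Any

# Hypothetical rules; the application leaves them unspecified.
ENUM_RULES = {"market": {1: "HK", 2: "US"}}                # first preset rule
NULL_DEFAULTS = {"browsing_volume": 0, "market": "UNKNOWN"}  # second preset rule
PRESET_TIME = datetime(2020, 10, 25, 0, 0)                  # older records are dropped


def violates_logic(record: dict[str, Any]) -> bool:
    """Example preset logic: a browsing volume cannot be negative."""
    volume = record.get("browsing_volume")
    return volume is not None and volume < 0


def clean(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    cleaned = []
    for record in records:
        # 1. Delete data that does not meet the data specification.
        if violates_logic(record) or record["generated_at"] < PRESET_TIME:
            continue
        out = dict(record)
        # 2. Convert enumeration values according to the first preset rule.
        for field, mapping in ENUM_RULES.items():
            if out.get(field) in mapping:
                out[field] = mapping[out[field]]
        # 3. Convert null values according to the second preset rule.
        for field, default in NULL_DEFAULTS.items():
            if out.get(field) is None:
                out[field] = default
        cleaned.append(out)
    return cleaned


records = [
    {"generated_at": datetime(2020, 10, 25, 9, 0), "market": 1, "browsing_volume": None},
    {"generated_at": datetime(2020, 10, 24, 9, 0), "market": 2, "browsing_volume": 3},      # too old
    {"generated_at": datetime(2020, 10, 25, 10, 0), "market": None, "browsing_volume": -1},  # bad logic
]
result = clean(records)
print(result)
```

Only the first record survives; its market code 1 becomes "HK" and its null browsing volume becomes 0.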
In some implementations, the data processing apparatus further includes a storage module 320. The processing module 310 is further configured to determine the row key value for the first value of each stock index, and the storage module 320 is configured to store the first value of each stock index in its own column according to that row key value, to obtain a target table.
In some implementations, the data processing apparatus further includes a transceiver module 330. The transceiver module 330 is configured to receive a query request from a client, where the query request includes a key value to be queried; the processing module 310 is further configured to query values in the target table according to the key value to be queried to obtain a query result; and the transceiver module 330 is further configured to send the client a query response that includes the query result.
In some implementations, the key values include the stock name and time information.
In some implementations, the processing module 310 is further configured to: obtain data within a preset duration before the data to be processed; apply the batch calculation method to the data to be processed and the data within the preset duration to obtain second values of the stock indexes; determine, for each stock index, whether its first value and second value are the same; and if they are the same for every stock index, determine that the stream calculation method is accurate, and otherwise determine that it is inaccurate.
In some implementations, the processing module 310 is further configured to generate prompt information, and the transceiver module 330 is further configured to push the prompt information to notify the user that the stream calculation method is inaccurate.
In some implementations, the processing module 310 is specifically configured to acquire the data to be processed in real time by using a data synchronization method.
Apparatus embodiments and method embodiments correspond to each other, so for similar descriptions refer to the method embodiments. To avoid repetition, further description is omitted here.
Specifically, the data processing apparatus shown in Fig. 3 can execute the method embodiment on the first server side, and the foregoing and other operations and/or functions of its modules implement the corresponding flows of the server-side methods; for brevity, they are not described again here.
The data processing apparatus of the embodiments of the present application is described above in connection with the drawings from the perspective of functional modules. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.
Fig. 4 is a schematic block diagram of an electronic device 400 provided in an embodiment of the present application, where the electronic device 400 may be the first server.
As shown in fig. 4, the electronic device 400 may include:
a memory 410 and a processor 420, the memory 410 being configured to store a computer program and to transfer the program code to the processor 420. In other words, the processor 420 may call and run a computer program from the memory 410 to implement the method in the embodiment of the present application.
For example, the processor 420 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the present application, the processor 420 may include, but is not limited to:
general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
In some embodiments of the present application, the memory 410 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile memory may be read-only memory (ROM), flash memory, or the like. The volatile memory may be random access memory (RAM). By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM).
In some embodiments of the present application, the computer program may be partitioned into one or more modules, which are stored in the memory 410 and executed by the processor 420 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, the instruction segments describing the execution of the computer program in the electronic device.
As shown in fig. 4, the electronic device may further include:
a transceiver 430, the transceiver 430 may be connected to the processor 420 or the memory 410.
The processor 420 may control the transceiver 430 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices. The transceiver 430 may include a transmitter and a receiver. The transceiver 430 may further include antennas, and the number of antennas may be one or more.
It should be understood that the various components in the electronic device are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments.
Embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer, cause the computer to perform the method of the above method embodiments.
When implemented in software, the embodiments may be implemented in whole or in part as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website to another. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes), and so forth.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and all the changes or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method, comprising:
acquiring data to be processed in real time by adopting a data synchronization method, and storing the data to be processed into the same target database, wherein the data to be processed come from different stock databases;
cleaning the data to be processed to obtain cleaned data;
and applying a stream calculation method to the cleaned data to obtain first numerical values corresponding to various stock indexes.
2. The method according to claim 1, wherein the cleaning the data to be processed to obtain cleaned data comprises:
deleting data which do not meet the data specification in the data to be processed;
converting enumeration values in the data to be processed into corresponding data according to a first preset rule;
and converting the null value in the data to be processed into corresponding data according to a second preset rule.
3. The method according to claim 2, wherein before deleting data that does not meet the data specification from the data to be processed, the method further comprises:
determining whether each item of the data to be processed violates preset logic and/or whether its generation time is earlier than a preset time;
and if first data in the data to be processed violates the preset logic and/or its generation time is earlier than the preset time, determining that the first data is data that does not meet the data specification.
4. The method according to any one of claims 1 to 3, wherein after the applying a stream calculation method to the cleaned data to obtain the first values corresponding to the stock indexes, the method further comprises:
determining the row key value of a first numerical value corresponding to each stock index;
and storing the first numerical values corresponding to the stock indexes into a column respectively according to the row key values of the first numerical values corresponding to the stock indexes so as to obtain a target table.
5. The method of claim 4, further comprising:
receiving a query request sent by a client, wherein the query request comprises: key values to be queried;
carrying out numerical value query in the target table according to the key value to be queried to obtain a query result;
and sending a query response to the client, wherein the query response comprises the query result.
6. The method according to any one of claims 1-3, further comprising:
acquiring data within a preset duration before the data to be processed;
adopting a batch calculation method for the data to be processed and the data in the preset duration to obtain second numerical values corresponding to various stock indexes;
for each stock index in the stock indexes, determining whether the first numerical value and the second numerical value corresponding to the stock index are the same;
if the first numerical value and the second numerical value corresponding to each stock index in the stock indexes are the same, determining that the stream calculation method is accurate;
otherwise, it is determined that the stream calculation method is inaccurate.
7. A data processing apparatus, comprising: a processing module to:
acquiring data to be processed in real time by adopting a data synchronization method, and storing the data to be processed into the same target database, wherein the data to be processed come from different stock databases;
cleaning the data to be processed to obtain cleaned data;
and applying a stream calculation method to the cleaned data to obtain first numerical values corresponding to various stock indexes.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data processing method of any of claims 1-6 via execution of the executable instructions.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 6.
10. A computer program product comprising instructions for causing an electronic device to perform the data processing method of any one of claims 1-6 when the computer program product is run on the electronic device.
CN202111300424.6A 2021-11-04 2021-11-04 Data processing method, device, equipment, storage medium and computer program product Pending CN114116890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111300424.6A CN114116890A (en) 2021-11-04 2021-11-04 Data processing method, device, equipment, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN114116890A 2022-03-01

Family

ID=80380427

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination