CN111427914B - Data acquisition method and device - Google Patents

Data acquisition method and device

Info

Publication number
CN111427914B
Authority
CN
China
Prior art keywords
data
content
attribute information
content attribute
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010199200.XA
Other languages
Chinese (zh)
Other versions
CN111427914A (en
Inventor
顾伟
付元宝
王玉东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010199200.XA priority Critical patent/CN111427914B/en
Publication of CN111427914A publication Critical patent/CN111427914A/en
Application granted granted Critical
Publication of CN111427914B publication Critical patent/CN111427914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention provide a data acquisition method and device. The method comprises: obtaining behavior data of a user for content data to be accessed; querying whether content attribute information of the content data to be accessed is stored in a data caching device, wherein the data caching device is used for storing content attribute information of popular content data and content attribute information of newly added content data; if yes, obtaining the content attribute information of the content data to be accessed from the data caching device; if not, obtaining the content attribute information of the content data to be accessed from a full-volume information base, wherein the full-volume information base is used for storing the content attribute information of each content data stored in a content database. Obtaining data with the scheme provided by the embodiments of the invention can improve data acquisition efficiency.

Description

Data acquisition method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for obtaining data.
Background
Service providers typically analyze the behavior of users accessing content data such as television series, movies, and entertainment programs to discover the patterns in that behavior, so that they can provide users with increasingly better services.
In the prior art, when analyzing the behavior of a user, the behavior of the user is generally analyzed in combination with content attribute information of content data accessed by the user. To facilitate data analysis, service providers typically store attribute information of various content data in a specialized database, so that when the content attribute information of the content data accessed by the user is required, the content attribute information of the content data accessed by the user can be obtained by querying the specialized database.
However, as service providers offer users more and more content data, the amount of content attribute information stored in the database keeps growing, and querying it takes longer and longer. In addition, because a service provider has a very large number of users, the server must also query a large amount of content attribute information for user-accessed content data within a short time. For these reasons, obtaining content attribute information by querying the database as in the prior art is inefficient.
Disclosure of Invention
The embodiment of the invention aims to provide a data acquisition method and device so as to improve the data acquisition efficiency. The specific technical scheme is as follows:
In a first aspect, the present invention provides a data acquisition method, the method comprising:
Obtaining behavior data of a user aiming at content data to be accessed;
Inquiring whether content attribute information of the content data to be accessed is stored in data caching equipment, wherein the data caching equipment is used for storing content attribute information of popular content data and content attribute information of newly added content data, the popular content data is content data with the inquiry times larger than preset times in a first preset time period, and the newly added content data is newly added content data in a content database in a second preset time period;
if yes, obtaining content attribute information of the content data to be accessed from the data caching device;
if not, obtaining the content attribute information of the content data to be accessed from a full-volume information base, wherein the full-volume information base is used for storing the content attribute information of each content data stored in the content database.
In one embodiment of the present invention, the method further includes:
Splicing the obtained content attribute information;
Adding the spliced content attribute information to a message queue corresponding to the data analysis service;
Carrying out data analysis on the obtained content attribute information according to the order in which the information is stored in the message queue.
In one embodiment of the present invention, after obtaining the content attribute information of the content data to be accessed from the full-volume information base, the method further includes:
Storing the obtained content attribute information into the data caching device.
In an embodiment of the present invention, the validity period of the data stored in the data caching device is a preset third duration.
In a second aspect, the present invention provides a data acquisition apparatus, the apparatus comprising:
the behavior data acquisition module is used for acquiring behavior data of a user aiming at the content data to be accessed;
The information inquiry module is used for inquiring whether the content attribute information of the content data to be accessed is stored in the data caching equipment, wherein the data caching equipment is used for storing the content attribute information of the popular content data and the content attribute information of newly added content data, the popular content data is the content data with the inquiry times larger than the preset times in the first preset time length, and the newly added content data is the newly added content data in the content database in the second preset time length; if yes, triggering a first information acquisition module; if not, triggering a second information acquisition module;
the first information obtaining module is used for obtaining content attribute information of the content data to be accessed from the data caching device;
The second information obtaining module is configured to obtain content attribute information of the content data to be accessed from a full-scale information base, where the full-scale information base is configured to store content attribute information of each content data stored in the content database.
In one embodiment of the present invention, the apparatus further includes:
the information splicing module is used for splicing the obtained content attribute information;
the information adding module is used for adding the spliced content attribute information to a message queue corresponding to the data analysis service;
And the data analysis module is used for carrying out data analysis on the obtained content attribute information according to the sequence of the information stored in the message queue.
In one embodiment of the present invention, the apparatus further includes:
and the information storage module is used for storing the obtained content attribute information into the data caching device after the second information obtaining module obtains the content attribute information.
In an embodiment of the present invention, the validity period of the data stored in the data caching device is a preset third duration.
In a third aspect, an embodiment of the present invention provides a server, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
A memory for storing a computer program;
and a processor, configured to implement the method steps described in the first aspect when executing the program stored in the memory.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the method steps of the first aspect described above.
From the above, when the scheme provided by the embodiment is applied to obtain data, behavior data of a user for the content data to be accessed is obtained, whether content attribute information of the content data to be accessed is stored in the data caching device is queried, and if the content attribute information of the content data to be accessed is queried, the content attribute information is obtained from the data caching device; and if the content attribute information of the content data to be accessed is not queried, acquiring the content attribute information from a full-scale information base. Because the content attribute information stored in the data caching device is the content attribute information of the popular content data and the content attribute information of the newly added content data, compared with the prior art, the data amount stored in the data caching device is smaller, and the time required for inquiring the content attribute information of the content data to be accessed is shorter, so that the efficiency of data acquisition is improved.
When the content attribute information is not found in the data caching device, it is obtained from the full-volume information base. Because the full-volume information base stores the content attribute information of every content data stored in the content database, the content attribute information of the content data to be accessed is guaranteed to be obtainable.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a first data obtaining method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a second data obtaining method according to an embodiment of the present invention;
Fig. 3 is a process block diagram of a data acquisition method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a first data obtaining apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a second data obtaining apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To solve the technical problem in the prior art that data acquisition efficiency is low when a service provider queries content attribute information from a database, the embodiments of the invention provide a data acquisition method and device.
In one embodiment of the present invention, there is provided a data obtaining method including:
Obtaining behavior data of a user aiming at content data to be accessed;
Inquiring whether content attribute information of content data to be accessed is stored in the data caching device or not, wherein the data caching device is used for storing content attribute information of hot content data and content attribute information of newly added content data, the hot content data is content data with the inquiry times larger than the preset times in a first preset duration, and the newly added content data is newly added content data in a content database in a second preset duration;
if yes, obtaining content attribute information of the content data to be accessed from the data caching device;
If not, obtaining the content attribute information of the content data to be accessed from a full-volume information base, wherein the full-volume information base is used for storing the content attribute information of each content data stored in the content database.
From the above, when the scheme provided by the embodiment is applied to obtain data, behavior data of a user for the content data to be accessed is obtained, whether content attribute information of the content data to be accessed is stored in the data caching device is queried, and if the content attribute information of the content data to be accessed is queried, the content attribute information is obtained from the data caching device; and if the content attribute information of the content data to be accessed is not queried, acquiring the content attribute information from a full-scale information base. Because the content attribute information stored in the data caching device is the content attribute information of the popular content data and the content attribute information of the newly added content data, compared with the prior art, the data amount stored in the data caching device is smaller, and the time required for inquiring the content attribute information of the content data to be accessed is shorter, so that the efficiency of data acquisition is improved.
When the content attribute information is not found in the data caching device, it is obtained from the full-volume information base. Because the full-volume information base stores the content attribute information of every content data stored in the content database, the content attribute information of the content data to be accessed is guaranteed to be obtainable.
Referring to fig. 1, fig. 1 is a flowchart of a first data obtaining method according to an embodiment of the present invention, where the method includes:
S101, obtaining behavior data of a user for the content data to be accessed.
The behavior data of the user for the content data to be accessed can be understood as data describing the user's operations on the content data to be accessed, for example: a user's click data on a news page, or a user's playback data on a video playing page.
The behavior data may carry behavior type data, behavior duration data, an identifier of content data to be accessed, and the like.
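As a rough illustration of what such a behavior record might look like, the sketch below models it as a small immutable structure. The class name `BehaviorEvent` and its field names are assumptions made for this example; the text only says that the behavior data may carry a behavior type, a behavior duration, and an identifier of the content data to be accessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorEvent:
    """Hypothetical shape of one piece of user behavior data."""
    behavior_type: str       # e.g. "click" or "play" (assumed values)
    duration_seconds: float  # how long the behavior lasted
    content_id: str          # identifier of the content data to be accessed


event = BehaviorEvent(behavior_type="play", duration_seconds=95.0, content_id="movie_X")

# The content identifier is what later steps would use to look up attributes.
lookup_key = event.content_id
```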
Specifically, the user's behavior data can be obtained from the behavior data recorded by the client. For example, the client can capture the user's behavior data when the user accesses a news page by means such as event tracking (instrumentation points embedded in the page).
S102, inquiring whether content attribute information of the content data to be accessed is stored in the data caching device, if yes, executing S103, and if not, executing S104.
The data caching device may be an electronic device based on Redis (Remote Dictionary Server). Redis is a data storage system that offers good query performance but has a relatively small storage capacity.
The data caching device is used for storing content attribute information of hot content data and content attribute information of newly added content data.
Specifically, the content data may include: movies, television shows, entertainment programs, etc. The content attribute information of the above content data can be understood as: information of respective content attributes of the content data.
For example: for movie X, the content attribute information of the video corresponding to movie X may be as shown in table 1 below.
TABLE 1
Duration | Release time | Starring | Director
90min | 2019-01-01 | Star A1, Star A2 | Director B
In Table 1, "duration", "release time", "starring", and "director" are the content attributes of movie X; "90min" is the information for the duration attribute, "2019-01-01" is the information for the release time attribute, "Star A1, Star A2" is the information for the starring attribute, and "Director B" is the information for the director attribute. Thus, "90min, 2019-01-01, Star A1, Star A2, Director B" is the content attribute information of movie X.
The hot content data are content data with the query times larger than the preset times in the first preset time.
The first preset duration may be one day, one week, or the like. The preset number of times may be 1000 or the like. For example: assume that the first preset duration is one day, specifically the period from 2019-01-01 00:00 to 2019-01-02 00:00, that the preset number of times is 1000, and that the content data comprise television series A and television series B. Statistics show that television series A was queried 2000 times between 2019-01-01 00:00 and 2019-01-02 00:00, and that television series B was queried 900 times in the same period. Since 2000 is greater than 1000 and 900 is less than 1000, television series A is popular content data and television series B is not.
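The popularity rule in this example can be sketched as a simple count over a query log. The function name `popular_content` and the log format (pairs of content identifier and query time) are assumptions made for illustration; the text does not specify how query counts are collected.

```python
from collections import Counter
from datetime import datetime


def popular_content(query_log, window_start, window_end, preset_times):
    """Return IDs queried more than `preset_times` within [window_start, window_end)."""
    counts = Counter(
        content_id
        for content_id, queried_at in query_log
        if window_start <= queried_at < window_end
    )
    # "Greater than the preset number of times" is a strict comparison.
    return {content_id for content_id, n in counts.items() if n > preset_times}


# Reproduce the example: series A queried 2000 times, series B 900 times.
window_start = datetime(2019, 1, 1, 0, 0)
window_end = datetime(2019, 1, 2, 0, 0)
noon = datetime(2019, 1, 1, 12, 0)
query_log = [("series_A", noon)] * 2000 + [("series_B", noon)] * 900

hot = popular_content(query_log, window_start, window_end, preset_times=1000)
print(hot)  # {'series_A'}
```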
When the content attribute information of the hot content data is obtained, the content attribute information of the hot content data within a preset first duration can be loaded into the data caching device in a cold loading mode, namely in an offline loading mode.
The newly added content data is newly added content data in the content database within a second preset time period.
In particular, the content database is used for storing content data, and the content data in the content database can be increased in real time.
Since content data is added to the content database in real time, the newly added content data is limited to the content data added to the content database within the second preset duration, so that the amount of newly added content data cached in the data caching device does not become excessive.
Specifically, the second preset duration may be one day, one week, or the like. For example: assuming that the second preset duration is one day, specifically from 2020-01-01 00:00 to 2020-01-02 00:00, the newly added content data may be the content data added to the content database in the period from 2020-01-01 00:00 to 2020-01-02 00:00.
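A minimal sketch of this time-window filter follows. It assumes, purely for illustration, that each content item has a recorded creation time available as a mapping from content identifier to timestamp; the name `newly_added` and the sample identifiers are hypothetical.

```python
from datetime import datetime


def newly_added(added_at_by_id, window_start, window_end):
    """Return IDs whose creation time falls within [window_start, window_end)."""
    return {
        content_id
        for content_id, added_at in added_at_by_id.items()
        if window_start <= added_at < window_end
    }


added_at_by_id = {
    "old_movie": datetime(2019, 6, 1, 9, 0),
    "new_series": datetime(2020, 1, 1, 8, 30),
    "new_show": datetime(2020, 1, 1, 23, 59),
}

recent = newly_added(
    added_at_by_id,
    window_start=datetime(2020, 1, 1, 0, 0),
    window_end=datetime(2020, 1, 2, 0, 0),
)
```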
Specifically, the data caching device may monitor the content data of the preset content database to obtain newly added content data in the preset content database.
Users are likely to request the content attribute information of popular content data, and, because users tend to be relatively interested in newly added content data, they are also likely to request the content attribute information of newly added content data. The content attribute information of the popular content data and of the newly added content data stored in the data caching device can therefore meet the needs of most users.
Specifically, in querying the above-mentioned data caching device, the content attribute information may be determined according to the identifier of the content data to be accessed.
The data caching device stores the content attribute information of popular content data and of newly added content data, but a user may also request the content attribute information of unpopular content data, that is, content data whose number of queries within the first preset duration does not exceed the preset number of times. Therefore, the content attribute information of the content data to be accessed may not be found in the data caching device. In this case, S104 is performed.
In one embodiment of the present invention, the validity period of the data stored in the data caching device in S102 may be a preset third duration.
Specifically, the preset third duration may be one day, two days, one week, or the like. When data has been stored in the data caching device for longer than the preset third duration, the stored data can be discarded and the data stored again.
In this way, by setting the validity period of the stored data in the data caching device, the data stored in the data caching device can be cleaned up at regular time, so that the data in the data caching device is the latest content data and the content attribute information of the popular content data, and therefore the efficiency of data acquisition is improved.
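The validity period can be illustrated with a small in-memory cache whose entries expire after a fixed time. This is only a sketch: the class name `ExpiringCache` and the injectable `clock` parameter are assumptions for the example, and a Redis-backed device could instead rely on Redis's own per-key expiry.

```python
import time


class ExpiringCache:
    """In-memory stand-in for a cache whose entries have a validity period."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds      # the "preset third duration", in seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # content_id -> (expires_at, attributes)

    def put(self, content_id, attributes):
        self._entries[content_id] = (self._clock() + self._ttl, attributes)

    def get(self, content_id):
        entry = self._entries.get(content_id)
        if entry is None:
            return None
        expires_at, attributes = entry
        if self._clock() >= expires_at:
            # Entry outlived its validity period: discard it.
            del self._entries[content_id]
            return None
        return attributes


# Drive the cache with a fake clock to show expiry deterministically.
now = [0.0]
cache = ExpiringCache(ttl_seconds=86400, clock=lambda: now[0])
cache.put("movie_X", {"duration": "90min"})
fresh = cache.get("movie_X")    # still within the validity period
now[0] = 86400.0                # one day later
expired = cache.get("movie_X")  # entry has been discarded
```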
S103, obtaining content attribute information of the content data to be accessed from the data caching device.
S104, obtaining content attribute information of the content data to be accessed from the full-volume information base.
The full-volume information base may be a distributed data storage system based on HBase tables, and the full-volume information base is used to store content attribute information of each content data stored in the content database.
Specifically, the full-volume information base may be synchronized with the content database in real time, and when the content data stored in the content database is updated, the content attribute information of each content data stored in the full-volume information base is updated accordingly. In this way, it is possible to ensure that the content attribute information stored in the full-size information base is the content attribute information of each piece of content data stored in the content database.
In addition, because the probability that the user generally accesses the popular data or the latest data is higher, and the content attribute information of the popular data or the latest data is stored in the data caching device, the possibility that the content attribute information of the content data to be accessed by the user is obtained from the data caching device is higher, and compared with the prior art, the data obtaining efficiency is improved.
Since the full-size information base is used to store the content attribute information of each content data stored in the content database, when the content attribute information of the content data to be accessed is not queried in the data caching device in S102, the content attribute information is necessarily stored in the full-size information base, and thus the content attribute information of the content data to be accessed can be obtained from the full-size information base.
Specifically, the full-size information base may store the content attribute information in the manner that the data caching device stores the content attribute information in S102, so that a detailed description thereof is omitted.
Similarly, the content attribute information may be queried from the full-volume information base in the same manner as it is queried from the data caching device in S102, so the description is not repeated here.
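Steps S102 to S104 can be sketched as a two-tier lookup. Plain dictionaries stand in for the data caching device and the full-volume information base here; the function name `get_content_attributes` and the returned source label are assumptions added for illustration only.

```python
def get_content_attributes(content_id, cache, full_store):
    """Look up attributes in the cache first, then in the full store."""
    attributes = cache.get(content_id)   # S102: is it in the caching device?
    if attributes is not None:
        return attributes, "cache"       # S103: served from the cache
    # S104: the full store holds attributes for every item in the content database.
    return full_store[content_id], "full_store"


# Dictionaries standing in for the two storage tiers (assumed data).
full_store = {
    "series_A": {"director": "Director B"},
    "series_B": {"director": "Director C"},
}
cache = {"series_A": full_store["series_A"]}  # only popular/new items are cached

hit = get_content_attributes("series_A", cache, full_store)
miss = get_content_attributes("series_B", cache, full_store)
```

Because only a small subset of items lives in the first tier, most lookups for popular or new content never reach the larger store.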
From the above, when the scheme provided by the embodiment is applied to obtain data, behavior data of a user for the content data to be accessed is obtained, whether content attribute information of the content data to be accessed is stored in the data caching device is queried, and if the content attribute information of the content data to be accessed is queried, the content attribute information is obtained from the data caching device; and if the content attribute information of the content data to be accessed is not queried, acquiring the content attribute information from a full-scale information base. Because the content attribute information stored in the data caching device is the content attribute information of the popular content data and the content attribute information of the newly added content data, compared with the prior art, the data amount stored in the data caching device is smaller, and the time required for inquiring the content attribute information of the content data to be accessed is shorter, so that the efficiency of data acquisition is improved.
When the content attribute information is not found in the data caching device, it is obtained from the full-volume information base. Because the full-volume information base stores the content attribute information of every content data stored in the content database, the content attribute information of the content data to be accessed is guaranteed to be obtainable.
In one embodiment of the present invention, after obtaining the content attribute information of the content data to be accessed from the full-size information base in S104, the following steps may be further included.
Storing the obtained content attribute information into the data caching device.
The content attribute information obtained from the full-size information base may also need to be used for a certain period of time after the content attribute information is obtained from the full-size information base, and thus, in order to enable the content attribute information to be queried from the data caching device later, the obtained content attribute information may be stored in the data caching device. Therefore, the data query time can be saved, and the data acquisition efficiency is improved.
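This write-back step can be sketched as follows, again with dictionaries standing in for the two stores. The function name `get_with_backfill` is hypothetical; the point is only that a miss populates the cache so that a later request for the same item is served from it.

```python
def get_with_backfill(content_id, cache, full_store):
    """On a cache miss, read from the full store and copy the result into the cache."""
    attributes = cache.get(content_id)
    if attributes is None:
        attributes = full_store[content_id]
        cache[content_id] = attributes  # store for subsequent queries
    return attributes


full_store = {"series_B": {"director": "Director C"}}
cache = {}

first = get_with_backfill("series_B", cache, full_store)   # miss, then backfilled
second = get_with_backfill("series_B", cache, full_store)  # now a cache hit
```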
In one embodiment of the present invention, after the data caching device in S102 obtains the content attribute information of the popular content data and the content attribute information of the newly added content data, the content attribute information may be stored in the following two ways.
In the first mode, content data to which each piece of content attribute information belongs is classified, and each classification includes content attribute information of the content data corresponding to the classification.
For example: assume that the content data obtained in the data caching device are content data 1 and content data 2, that the content attributes of content data 1 and 2 include content attribute a, content attribute b, and content attribute c, that "xx, yy, zz" is the content attribute information of content data 1, and that "XX, YY, ZZ" is the content attribute information of content data 2. The content attribute information may be stored in the storage manner shown in Table 2 and Table 3.
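TABLE 2
Content attribute a | Content attribute b | Content attribute c
xx | yy | zz
TABLE 3
Content attribute a | Content attribute b | Content attribute c
XX | YY | ZZ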
The table 2 stores content attribute information of the content data 1, and the table 3 stores content attribute information of the content data 2.
In the second mode, classification is performed according to the identification of each content attribute, and each classification includes content attribute information of each content data of the content attribute corresponding to the classification.
For example: continuing the example corresponding to Tables 2 and 3, the content attribute information of content data 1 and content data 2 may be stored in the storage manner shown in Tables 4, 5 and 6.
TABLE 4
Content attribute a | xx-content data 1 | XX-content data 2
TABLE 5
Content attribute b | yy-content data 1 | YY-content data 2
TABLE 6
Content attribute c | zz-content data 1 | ZZ-content data 2
Table 4 stores the information of content attribute a for content data 1 and content data 2; Table 5 stores the information of content attribute b for content data 1 and content data 2; Table 6 stores the information of content attribute c for content data 1 and content data 2.
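The two storage modes amount to grouping the same values by different keys. The sketch below uses nested dictionaries as an assumed in-memory representation: the first mode keys by content data, the second by content attribute, and the hypothetical helper `group_by_attribute` converts one into the other.

```python
def group_by_attribute(per_content):
    """Regroup {content_id: {attribute: value}} into {attribute: {content_id: value}}."""
    per_attribute = {}
    for content_id, attributes in per_content.items():
        for attribute, value in attributes.items():
            per_attribute.setdefault(attribute, {})[content_id] = value
    return per_attribute


# First mode: one group per content data.
per_content = {
    "content data 1": {"a": "xx", "b": "yy", "c": "zz"},
    "content data 2": {"a": "XX", "b": "YY", "c": "ZZ"},
}

# Second mode: one group per content attribute.
per_attribute = group_by_attribute(per_content)
```

Which grouping is preferable depends on the access pattern: the first suits fetching everything about one item, the second suits fetching one attribute across many items.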
Referring to fig. 2, fig. 2 is a schematic flow chart of a second data obtaining process according to an embodiment of the present invention, and after S103 or S104, the following S105-S107 may be further included.
S105, splicing the obtained content attribute information.
Since the obtained content attribute information may be information of a plurality of content attributes of the content data, the obtained content attribute information may be spliced to obtain content attribute information of the complete content data.
For example: continuing the example corresponding to Tables 2 and 3, assume that the obtained content attribute information is content attribute information xx and content attribute information yy. Splicing them may yield: content attribute information xx (content attribute a information of content data 1)-content attribute information yy (content attribute b information of content data 2), where the bracketed text after each piece of content attribute information indicates which content attribute of which content data it belongs to.
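The splicing in this example can be sketched as joining annotated pieces with a separator. The function name `splice`, the tuple layout, and the exact output format are assumptions modeled on the example wording; the text does not prescribe a concrete format.

```python
def splice(pieces, separator="-"):
    """Join (value, attribute, content_id) triples into one annotated string."""
    return separator.join(
        f"{value} (content attribute {attribute} information of {content_id})"
        for value, attribute, content_id in pieces
    )


pieces = [
    ("xx", "a", "content data 1"),
    ("yy", "b", "content data 2"),
]
spliced = splice(pieces)
print(spliced)
# xx (content attribute a information of content data 1)-yy (content attribute b information of content data 2)
```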
S106, adding the spliced content attribute information to a message queue corresponding to the data analysis service.
After the spliced content attribute information is obtained, it can be distributed according to the data required by each data analysis service; that is, the spliced content attribute information is added to the message queue corresponding to each data analysis service.
Specifically, the spliced content attribute information may be added to the message queue corresponding to the data analysis service in chronological order, for example in order of query time or splicing time.
S107, performing data analysis on the obtained content attribute information in the order in which the information is stored in the message queue.
In this way, the obtained content attribute information is added to the message queue corresponding to the data analysis service, and each service can perform data analysis on the obtained content attribute information according to the content attribute information stored in the message queue.
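Steps S106 and S107 can be sketched with one first-in-first-out queue per data analysis service. The service names, the `(timestamp, record)` tuple shape, and the helper functions are hypothetical; the sketch uses Python's in-process `queue.Queue` as a stand-in for whatever message queue a deployment would use.

```python
import queue

# One FIFO queue per data analysis service (service names are hypothetical).
service_queues = {
    "recommendation": queue.Queue(),
    "statistics": queue.Queue(),
}


def enqueue(spliced_records, services):
    """Add (timestamp, record) pairs to each service's queue in time order."""
    for timestamp, record in sorted(spliced_records):
        for name in services:
            service_queues[name].put((timestamp, record))


def drain(name):
    """Consume one service's queue in the order the records were stored."""
    q = service_queues[name]
    consumed = []
    while not q.empty():
        consumed.append(q.get())
    return consumed


enqueue([(2, "xx-yy"), (1, "zz")], ["recommendation", "statistics"])
analysed = drain("recommendation")
```

Because each service has its own queue, one service consuming its records does not affect what another service still has to process.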
In addition, as can be seen from steps S101 to S107, the execution body of the embodiment of the present invention not only receives the behavior data of the user, that is, the user traffic, but also queries, splices and splits content attribute information for the received user traffic. The execution body can therefore be regarded as acting as a traffic agent, and may be referred to as a traffic agent layer.
Especially when the amount of content attribute information is large, the traffic agent layer takes on the splicing of the content attribute information and the splitting of the traffic, so that neither the data caching device nor the full-scale information base needs to splice the content attribute information, which improves the efficiency of data acquisition.
The data acquisition method provided by the embodiment of the present invention will be described in detail below with a specific example.
Referring to fig. 3, fig. 3 is a process block diagram of a data obtaining method according to an embodiment of the present invention.
Fig. 3 includes a Redis cache layer, an HBase table, a content database, a traffic agent layer, behavior data of the user, and message queue 1, message queue 2, …, message queue n.
Wherein the content database is used for storing content data. The Redis cache layer and Hbase table can obtain content attribute information of the content data from the content database.
The Redis cache layer is used for storing the content attribute information of popular content data and of newly added content data among the content data. The Redis cache layer is the data caching device in S102 above.
The HBase table is used for storing the content attribute information of each piece of content data stored in the content database. The HBase table is the full-scale information base in S104 above.
The behavior data of the user is behavior data of the user for the content data to be accessed.
Message queue 1, message queue 2, …, message queue n correspond to the data analysis services; each data analysis service performs data analysis on the content attribute information stored in its message queue.
The traffic agent layer is the execution body of the embodiment of the present invention. After obtaining behavior data of a user for the content data to be accessed, the traffic agent layer queries the Redis cache layer for the content attribute information of the content data to be accessed. If the content attribute information is not found there, it is obtained from the HBase table. The obtained content attribute information is then spliced, and the spliced content attribute information is added to each message queue.
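The flow through the traffic agent layer can be sketched end to end as follows. Plain dictionaries and deques stand in for the Redis cache layer, the HBase table and the message queues, so no real client API is shown; the identifiers `video_1`, `video_2`, the `title` field and the function `handle_behavior` are hypothetical, and merging the content identifier with its attributes is only one assumed form of splicing.

```python
from collections import deque

# In-memory stand-ins (assumptions): a deployment would use Redis and HBase.
redis_cache = {"video_1": {"title": "hot show"}}       # popular / new content only
hbase_table = {
    "video_1": {"title": "hot show"},
    "video_2": {"title": "old film"},
}                                                      # every piece of content data
message_queues = [deque(), deque()]                    # one per analysis service


def handle_behavior(behavior):
    """Query the cache first, fall back to the full table, splice, enqueue."""
    content_id = behavior["content_id"]
    attrs = redis_cache.get(content_id)
    source = "cache"
    if attrs is None:
        attrs = hbase_table[content_id]  # the full table holds every item
        source = "full"
    spliced = {"content_id": content_id, **attrs}  # assumed splice format
    for q in message_queues:
        q.append(spliced)
    return source


first = handle_behavior({"user": "u1", "content_id": "video_1"})
second = handle_behavior({"user": "u1", "content_id": "video_2"})
```

The return value only reports which store answered the query, which makes the cache-hit and cache-miss paths easy to observe.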
Corresponding to the data obtaining method, the embodiment of the invention also provides a data obtaining device.
Referring to fig. 4, which is a schematic structural diagram of a first data obtaining apparatus according to an embodiment of the present invention, the apparatus includes the following modules 401 to 404.
A behavior data obtaining module 401, configured to obtain behavior data of a user for content data to be accessed;
An information query module 402, configured to query whether content attribute information of the content data to be accessed is stored in a data caching device, where the data caching device is configured to store content attribute information of popular content data and content attribute information of newly added content data, the popular content data is content data whose number of queries within a first preset duration is greater than a preset number of times, and the newly added content data is content data newly added to a content database within a second preset duration; if yes, trigger the first information obtaining module 403; if not, trigger the second information obtaining module 404;
A first information obtaining module 403, configured to obtain content attribute information of content data to be accessed from the data caching device;
A second information obtaining module 404, configured to obtain content attribute information of the content data to be accessed from a full-scale information base, where the full-scale information base is used to store content attribute information of each content data stored in the content database.
As can be seen from the above, when the scheme provided by this embodiment is applied to obtain data, behavior data of a user for the content data to be accessed is obtained, and the data caching device is queried for the content attribute information of the content data to be accessed. If the information is found, it is obtained from the data caching device; if it is not found, it is obtained from the full-scale information base. Because the data caching device stores only the content attribute information of popular content data and of newly added content data, it holds less data than in the prior art, so querying it for the content attribute information of the content data to be accessed takes less time, which improves the efficiency of data acquisition.
When the content attribute information is not found in the data caching device, it is obtained from the full-scale information base. Because the full-scale information base stores the content attribute information of each piece of content data stored in the content database, the content attribute information of the content data to be accessed is guaranteed to be obtainable.
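The criterion for popular content data (a number of queries greater than a preset number of times within a first preset duration) can be sketched with a sliding window of query timestamps. The class name `HotTracker`, the parameter names, and the choice of a sliding rather than fixed window are assumptions; the text does not say how the count is maintained.

```python
from collections import defaultdict, deque


class HotTracker:
    """Counts queries per content identifier inside a sliding time window."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds    # the "first preset duration"
        self.threshold = threshold      # the "preset number of times"
        self.hits = defaultdict(deque)  # content id -> query timestamps

    def record(self, content_id, now):
        self.hits[content_id].append(now)

    def is_hot(self, content_id, now):
        times = self.hits[content_id]
        # Drop timestamps that have fallen out of the window.
        while times and times[0] <= now - self.window:
            times.popleft()
        return len(times) > self.threshold


tracker = HotTracker(window_seconds=60, threshold=2)
for t in (0, 10, 20):
    tracker.record("video_1", t)
hot_now = tracker.is_hot("video_1", now=30)    # three queries inside the window
hot_later = tracker.is_hot("video_1", now=75)  # only the query at t=20 remains
```

Content that stops being queried falls below the threshold once its older timestamps leave the window, so it no longer qualifies as popular.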
Referring to fig. 5, which is a schematic structural diagram of a second data obtaining apparatus according to an embodiment of the present invention, the apparatus further includes the following modules 405 to 407.
An information splicing module 405, configured to splice the obtained content attribute information;
An information adding module 406, configured to add the spliced content attribute information to the message queue corresponding to the service requested by a data analysis service request;
A data analysis module 407, configured to perform data analysis on the obtained content attribute information in the order in which the information is stored in the message queue.
In this way, the obtained content attribute information is added to the message queue corresponding to the service for data analysis, and each service can perform data analysis on the obtained content attribute information based on the content attribute information stored in the message queue.
In one embodiment of the present invention, the apparatus further includes an information storage module, configured to store the obtained content attribute information in the data caching device after the second information obtaining module 404 obtains it.
Content attribute information obtained from the full-scale information base may be needed again for some time after it is obtained. To allow it to be queried from the data caching device later, the obtained content attribute information may therefore be stored in the data caching device. This saves data query time and improves the efficiency of data acquisition.
In an embodiment of the present invention, the validity period of the data stored in the data caching device is a preset third duration.
In this way, by setting a validity period for the data stored in the data caching device, that data can be cleaned up at regular intervals, so that the data caching device holds the content attribute information of the latest content data and of popular content data, thereby improving the efficiency of data acquisition.
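The write-back and validity period described above can be sketched as a small expiring cache. The class `TTLCache`, the function `lookup`, and the injectable clock are hypothetical; expiry is checked lazily on read here, whereas a Redis deployment would more likely rely on Redis's own per-key expiry.

```python
import time


class TTLCache:
    """Cache whose entries expire after a fixed validity period."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds  # the "preset third duration"
        self.clock = clock      # injectable so expiry can be tested
        self.store = {}         # key -> (expiry time, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.store[key]  # expired: evict lazily on read
            return None
        return value

    def put(self, key, value):
        self.store[key] = (self.clock() + self.ttl, value)


def lookup(cache, full_base, key):
    """Read through the cache, storing the value back on a miss."""
    value = cache.get(key)
    if value is None:
        value = full_base[key]
        cache.put(key, value)  # available from the cache for later queries
    return value


now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
full_base = {"video_2": "old film"}
fetched = lookup(cache, full_base, "video_2")
cached = cache.get("video_2")   # still inside the validity period
now[0] = 11.0
expired = cache.get("video_2")  # past the validity period
```

After expiry, the next `lookup` falls back to the full-scale information base again and refreshes the cached entry.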
Corresponding to the data obtaining method, the embodiment of the invention also provides a server.
As shown in fig. 6, which is a schematic structural diagram of the server provided in the embodiment of the present invention, the server includes a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with each other through the communication bus 604;
A memory 603 for storing a computer program;
The processor 601 is configured to implement the data obtaining method provided by the embodiment of the present invention when executing the program stored in the memory 603.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In still another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the data obtaining method provided by the embodiment of the present invention.
In yet another embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data acquisition method provided by the embodiment of the present invention is also provided.
As can be seen from the above, when the scheme provided by this embodiment is applied to obtain data, behavior data of a user for the content data to be accessed is obtained, and the data caching device is queried for the content attribute information of the content data to be accessed. If the information is found, it is obtained from the data caching device; if it is not found, it is obtained from the full-scale information base. Because the data caching device stores only the content attribute information of popular content data and of newly added content data, it holds less data than in the prior art, so querying it for the content attribute information of the content data to be accessed takes less time, which improves the efficiency of data acquisition.
When the content attribute information is not found in the data caching device, it is obtained from the full-scale information base. Because the full-scale information base stores the content attribute information of each piece of content data stored in the content database, the content attribute information of the content data to be accessed is guaranteed to be obtainable.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, the apparatus, server, and computer readable storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A method of data acquisition, the method comprising:
obtaining behavior data of a user for content data to be accessed;
querying whether content attribute information of the content data to be accessed is stored in a data caching device, wherein the data caching device is used for storing content attribute information of popular content data and content attribute information of newly added content data, the popular content data is content data whose number of queries within a first preset duration is greater than a preset number of times, and the newly added content data is content data newly added to a content database within a second preset duration;
if yes, obtaining the content attribute information of the content data to be accessed from the data caching device;
if not, obtaining the content attribute information of the content data to be accessed from a full-scale information base, wherein the full-scale information base is used for storing the content attribute information of each piece of content data stored in the content database;
the method further comprises:
splicing the obtained content attribute information, wherein there are a plurality of pieces of content attribute information;
adding the spliced content attribute information to message queues corresponding to data analysis services, wherein different data analysis services correspond to different message queues;
performing data analysis on the obtained content attribute information in the order in which the information is stored in the message queue.
2. The method according to claim 1, further comprising, after said obtaining the content attribute information of said content data to be accessed from a full-scale information base:
storing the obtained content attribute information into the data caching device.
3. The method according to claim 1 or 2, wherein the validity period of the stored data in the data caching device is a preset third duration.
4. A data acquisition device, the device comprising:
a behavior data obtaining module, used for obtaining behavior data of a user for content data to be accessed;
an information query module, used for querying whether content attribute information of the content data to be accessed is stored in a data caching device, wherein the data caching device is used for storing content attribute information of popular content data and content attribute information of newly added content data, the popular content data is content data whose number of queries within a first preset duration is greater than a preset number of times, and the newly added content data is content data newly added to a content database within a second preset duration; if yes, triggering a first information obtaining module; if not, triggering a second information obtaining module;
the first information obtaining module, used for obtaining the content attribute information of the content data to be accessed from the data caching device;
the second information obtaining module, used for obtaining the content attribute information of the content data to be accessed from a full-scale information base, wherein the full-scale information base is used for storing the content attribute information of each piece of content data stored in the content database;
the apparatus further comprises:
an information splicing module, used for splicing the obtained content attribute information, wherein there are a plurality of pieces of content attribute information;
an information adding module, used for adding the spliced content attribute information to the message queue corresponding to the service requested by a data analysis service request, wherein different data analysis services correspond to different message queues;
a data analysis module, used for performing data analysis on the obtained content attribute information in the order in which the information is stored in the message queue.
5. The apparatus of claim 4, wherein the apparatus further comprises:
an information storage module, used for storing the obtained content attribute information into the data caching device after the second information obtaining module obtains it.
6. The apparatus according to claim 4 or 5, wherein the validity period of the stored data in the data caching device is a preset third duration.
7. A server, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing the method steps of any one of claims 1-3 when executing the program stored in the memory.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-3.
CN202010199200.XA 2020-03-20 2020-03-20 Data acquisition method and device Active CN111427914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199200.XA CN111427914B (en) 2020-03-20 2020-03-20 Data acquisition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199200.XA CN111427914B (en) 2020-03-20 2020-03-20 Data acquisition method and device

Publications (2)

Publication Number Publication Date
CN111427914A CN111427914A (en) 2020-07-17
CN111427914B true CN111427914B (en) 2024-04-19

Family

ID=71548272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199200.XA Active CN111427914B (en) 2020-03-20 2020-03-20 Data acquisition method and device

Country Status (1)

Country Link
CN (1) CN111427914B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132958A (en) * 2016-12-01 2018-06-08 阿里巴巴集团控股有限公司 A kind of multi-level buffer data storage, inquiry, scheduling and processing method and processing device
CN110795457A (en) * 2019-09-24 2020-02-14 苏宁云计算有限公司 Data caching processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111427914A (en) 2020-07-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant