CN108536759B

CN108536759B - Sample playback data access method and device

Info

Publication number: CN108536759B
Application number: CN201810230627.4A
Authority: CN
Inventors: 魏宏; 张晓明
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-03-20
Filing date: 2018-03-20
Publication date: 2020-08-04
Anticipated expiration: 2038-03-20
Also published as: CN108536759A; TW201941124A; WO2019179252A1; TWI706343B

Abstract

A sample playback data access method and apparatus are disclosed. A record information table, a batch information table and a data content table are configured. For any piece of data to be stored, the following storage operations are performed: allocating a record identifier to the data to be stored according to the record information table; allocating a batch identifier to the data to be stored according to the batch information table; splicing the allocated record identifier, the allocated batch identifier and the content of the data to be stored according to the storage structure of the data content table, and writing the splicing result into the data content table; and updating the record information table and the batch information table.

Description

Sample playback data access method and device

Technical Field

Embodiments of this specification relate to the technical field of machine learning, and in particular to a sample playback data access method and device.

Background

Artificial intelligence has become a research focus across industries, and machine learning (or deep learning) algorithms are a key technology for realizing it; some of these algorithms are already being applied to actual business requirements. Meanwhile, researchers have found that peripheral problems beyond the algorithms themselves, such as data access and hardware resource occupation, raise new requirements in the new application scenarios, and some traditional, mature schemes are no longer suitable.

Take the sample playback requirement in reinforcement learning as an example. To train a model in reinforcement learning, previous behavior samples need to be played back as input to model learning. Sample playback acts as a bridge between behavior rewards and iterative training, and to improve the learning effect it can adopt various playback strategies, such as sequential playback, random playback, batch playback, and sampling playback according to a specified probability. These strategies are supported by the theoretical algorithms and can each be implemented without difficulty in an experimental environment. In practical applications, however, a service scenario needs to switch flexibly among playback strategies, and practical problems such as a distributed service environment and huge data throughput sometimes have to be considered; no existing scheme meets these requirements.

Disclosure of Invention

In view of the above technical problems, embodiments of this specification provide a sample playback data access method and device. The technical solution is as follows:

According to a first aspect of the embodiments of this specification, there is provided a sample playback data storage method, in which a record information table, a batch information table and a data content table are configured;

the record information table is used for storing the record identifier of the most recently written sample playback data;

the batch information table is used for storing the batch identifier of the most recently written sample playback data;

the data content table is used for storing sample playback data, and the identification fields of each piece of sample playback data consist of a record identifier and a batch identifier;

for any piece of data to be stored, the following operations are performed:

allocating a record identifier to the data to be stored according to the record information table;

allocating a batch identifier to the data to be stored according to the batch information table;

splicing the allocated record identifier, the allocated batch identifier and the content of the data to be stored according to the storage structure of the data content table, and writing the splicing result into the data content table;

and updating the record information table and the batch information table.

According to a second aspect of the embodiments of this specification, there is provided a sample playback data reading method, including:

determining that the playback requirement is: randomly selecting records for playback;

obtaining the total number of records sum of the written sample playback data according to the record information table;

generating a random number array, wherein the random number array comprises n random values selected from the sum record identifiers, and n is the number of sample records required for playback;

traversing the random number array and performing the following step for each value to obtain n sample playback data records: taking the value as a record identifier, and reading the sample playback data with that record identifier from the data content table.

According to a third aspect of the embodiments of this specification, there is provided a sample playback data reading method, including:

determining that the playback requirement is: randomly selecting batches for playback;

obtaining the total number of batches batch_sum of the written sample playback data according to the batch information table;

generating a random number array, wherein the random number array comprises n random values selected from the batch_sum batch identifiers, and n is the number of sample batches required for playback;

traversing the random number array and performing the following step for each value to obtain n sample playback data batches: taking the value as a batch identifier, and reading the sample playback data with that batch identifier from the data content table.

According to a fourth aspect of the embodiments of this specification, there is provided a sample playback data storage device configured with a record information table, a batch information table and a data content table;

the device comprises an identifier allocation module, a content writing module and an information updating module, wherein, for any piece of data to be stored:

the identifier allocation module is used for allocating a record identifier to the data to be stored according to the record information table, and allocating a batch identifier to the data to be stored according to the batch information table;

the content writing module is used for splicing the allocated record identifier, the allocated batch identifier and the content of the data to be stored according to the storage structure of the data content table, and writing the splicing result into the data content table;

and the information updating module is used for updating the record information table and the batch information table.

According to a fifth aspect of the embodiments of this specification, there is provided a sample playback data reading apparatus, including:

a playback requirement determining module, configured to determine that the playback requirement is: randomly selecting records for playback;

a total record number determining module, configured to obtain the total number of records sum of the written sample playback data according to the record information table;

a data reading module, configured to generate a random number array, wherein the random number array comprises n random values selected from the sum record identifiers, and n is the number of sample records required for playback; and to traverse the random number array and perform the following step for each value to obtain n sample playback data records: taking the value as a record identifier, and reading the sample playback data with that record identifier from the data content table.

According to a sixth aspect of the embodiments of this specification, there is provided a sample playback data reading apparatus, including:

a playback requirement determining module, configured to determine that the playback requirement is: randomly selecting batches for playback;

a total batch number determining module, configured to obtain the total number of batches batch_sum of the written sample playback data according to the batch information table;

a data reading module, configured to generate a random number array, wherein the random number array comprises n random values selected from the batch_sum batch identifiers, and n is the number of sample batches required for playback; and to traverse the random number array and perform the following step for each value to obtain n sample playback data batches: taking the value as a batch identifier, and reading the sample playback data with that batch identifier from the data content table.

According to the technical scheme provided by the embodiments of this specification, the record information and the batch information of the sample playback data are extracted, and dedicated entries are configured to store them separately; when sample playback is needed, the common sample playback strategies can be implemented flexibly, so that the application requirements of actual services are better met.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.

In addition, any one of the embodiments in the present specification is not required to achieve all of the effects described above.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present specification, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1a and FIG. 1b are schematic flow diagrams of a sample playback data storage method according to an embodiment of the present disclosure;

FIG. 2 is a block diagram of an overall architecture of a sample playback data access system according to an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of a sample playback data reading method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a sample playback data storage device according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a first sample playback data reading apparatus according to an embodiment of the present specification;

FIG. 6 is a schematic structural diagram of a second sample playback data reading apparatus according to an embodiment of the present specification;

FIG. 7 is a schematic structural diagram of a third sample playback data reading apparatus according to an embodiment of the present specification;

FIG. 8 is a schematic structural diagram of an apparatus for configuring a device according to an embodiment of the present disclosure.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of protection.

Reinforcement learning (also called reinjection learning, evaluation learning, etc.) is an important machine learning method, and has many applications in the fields of intelligent robot control, analysis and prediction, etc. In the reinforcement learning process, a computer tries to select a series of behaviors without any prompt and obtains a corresponding result, the previous behaviors are evaluated by judging the superiority and inferiority of the result, the evaluation is used for feeding back to a behavior party to adjust the previous behaviors, the algorithm aims to adjust the behaviors to obtain the best evaluation, and the computer can learn what behavior is selected under what condition to obtain the best result by continuous adjustment.

The reinforcement learning sample playback uses behavior data as original samples and performs different playback strategies according to different reinforcement learning algorithms, for example, sequential playback, random playback, batch playback, sampling playback with a specified probability, and the like.

Although the prior art scheme can realize the multiple playback strategies, each strategy is realized independently, which cannot meet the actual requirement of flexible strategy switching, and brings higher development and maintenance cost. In addition, in the prior art, a single-machine memory queue is used as a carrier to realize sample playback, and after the memory queue is full, the earliest added sample record is deleted, which is similar to a FIFO (first in first out) queue. However, the memory queue can only realize multithread sharing in a single machine and cannot be applied to an actual distributed service environment; and due to the limitation of factors such as memory capacity and non-persistent storage, the requirements of large data volume, delayed data use and the like in practical application cannot be met.

In view of the above problems, embodiments of the present specification provide a sample playback data access method. On one hand, the method takes a database as a carrier, thereby ensuring the storage capacity and the persistent and reliable storage of sample playback data. On the other hand, a corresponding data storage structure and a data access method are also provided for the actual requirement of sample playback.

In the reinforcement learning process, each behavior of the computer generates a corresponding piece of behavior data. The specific content of a piece of data may include, for example, a state, an action, a reward and a next state (the fields used in the DQN example later in this specification); the specific content may vary by algorithm, and this specification does not limit which content fields a piece of data contains.

To distinguish between multiple behaviors, 1 piece of data as described above is called 1 record, and different records are distinguished by a "record identifier" field.

In addition, to facilitate batch playback in the reinforcement learning process, multiple behaviors need to be divided into different sets. The division may be made by count (for example, every 1000 pieces of behavior data form one set) or by actual application logic (for example, the behavior data generated by each policy forms one batch, or the behavior data generated by each environment forms one batch); this specification does not limit which logic is used to divide the sets.

To distinguish between multiple batches, 1 data set as described above is called 1 batch, and different batches are distinguished by a "batch identifier" field.

It can be seen that in the sample playback requirement, a piece of sample playback data should at least contain the following fields:

record identification, batch identification and behavior content

In this specification, a table recording the data structure is referred to as a "data content table," where "record identifier" and "batch identifier" are both identifier fields, and different records may correspond to the same batch or different batches; the "behavior content" is a content field, there may be a plurality of content fields, and the specific content fields corresponding to different algorithms may be different.

The data storage structure provided in the present specification includes, in addition to the data content table, "record information table" and "batch information table":

Record information table: used for storing the record identifier of the most recently written piece of sample playback data; optionally, the total number of records of sample playback data written so far may also be stored in the record information table. The total number of records is optional because, in some cases, it can be determined directly from the most recently written record identifier. For example, if record identifiers start from 0 and increment by 1, and the total number of records is not capped, then the total number of records equals the most recently written record identifier + 1.

Batch information table: used for storing the batch identifier of the most recently written piece of sample playback data; optionally, the total number of batches of sample playback data written so far may also be stored in the batch information table. The total number of batches is optional for the same reason as the total number of records, which is not repeated here.

On the basis of defining the storage structure, a sample playback data storage method is further provided, as shown in fig. 1a and fig. 1b, for any piece of sample playback data to be stored, the storage method may include the following steps:

S101, allocating a record identifier to the data to be stored according to the record information table, and allocating a batch identifier to the data to be stored according to the batch information table;

The most basic allocation method is numbering by natural counting. Taking the record identifier as an example, and assuming that the first written record is numbered 0, subsequently written records are numbered 1, 2, 3, … in sequence. If the "record identifier of the most recently written sample playback data" is denoted cur, then before each write the record identifier for the data to be written is calculated as:

cur = cur + 1

If a maximum number max of sample playback data records allowed to be stored is preset for the data content table, this maximum can be used as the counting period when allocating record identifiers, for example with the following formula:

cur = (cur + 1) % max

Batch identifiers are allocated in much the same way as record identifiers, except that one batch may contain multiple different records. When allocating a batch identifier, it may therefore be necessary to judge whether the data currently being stored belongs to the same batch as the previously stored piece of data; if so, the data to be stored is allocated the same batch identifier as the previously stored piece of data; otherwise, a new batch identifier is allocated to the data to be stored.
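The two counting formulas and the batch rule can be sketched in a few lines of Python. This is only an illustration under assumptions that the specification does not state: the function names are hypothetical, the cap parameters are optional, and the counter is assumed to start at -1 so that the first record or batch receives the number 0.

```python
from typing import Optional


def next_record_id(cur: int, max_records: Optional[int] = None) -> int:
    """Return the identifier for the next record to be written."""
    nxt = cur + 1                  # cur = cur + 1
    if max_records is not None:
        nxt %= max_records         # cur = (cur + 1) % max
    return nxt


def next_batch_id(batch_cur: int, same_batch: bool,
                  max_batches: Optional[int] = None) -> int:
    """Reuse the previous batch identifier, or allocate a new one."""
    if same_batch:
        return batch_cur           # same batch as the previously stored data
    nxt = batch_cur + 1            # a new batch gets the next number
    if max_batches is not None:
        nxt %= max_batches
    return nxt
```

With a cap of 5 records, for instance, the identifier following 4 wraps around to 0, which is how the counting period reuses identifiers.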

It should be noted that the above-mentioned scheme of identifying and numbering records or batches by natural counts is only a specific example, and should not be construed as a limitation to the scheme of this specification, and for example, other specific algorithms may be used to generate identification information for each record or each batch, which do not affect the implementation of the scheme of this specification.

S102, splicing the allocated record identifier, the allocated batch identifier and the content of the data to be stored according to the storage structure of the data content table, and writing the splicing result into the data content table;

As described above, the data content table comprises 3 basic parts: record identifier, batch identifier and behavior content.

For the current data to be stored, the "record identifier" and "batch identifier" have already been determined in S101, and the "behavior content" is obtained from an external application. The three are spliced into a triple [record identifier, batch identifier, behavior content], which is written into the data content table as a new data row. It is understood that the "behavior content" here generally corresponds to multiple specific fields, and that in actual storage some conversion may be applied to the data obtained from external applications; neither affects the implementation of the scheme of this specification.

S103, updating the record information table and the batch information table.

According to the definition of the record information table and the batch information table, after the data content table is written, the record information table and the batch information table are updated correspondingly.

The most basic update operation is to update the "record identifier of the most recently written sample playback data" and the "batch identifier of the most recently written sample playback data". The updated values are the record identifier and the batch identifier allocated to the data to be stored in S101.

In fact, with the calculation formulas in S101, the update can be considered complete as soon as the new identifier is calculated, and this approach can also be adopted in practice. Strictly speaking, however, the identifiers should be updated only after the data is confirmed to have been written successfully into the data content table, so the method steps in this specification follow that strict flow; those skilled in the art will understand that these steps should not be construed as limiting the scheme.

In addition, if the total number of records of playback data is also configured in the record information table, the update operation should further include updating the total number of records, i.e., the original total number of records + 1.

Similarly, if the total number of batches of playback data is also configured in the batch information table, the update operation should further include updating the total number of batches. Specifically, if the newly written record belongs to a new batch, the original total number of batches + 1; if the batch of the newly written record is unchanged from the previous record, the total number of batches stays unchanged.
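Steps S101 to S103 can be put together in a small in-memory sketch. Everything here is illustrative: the class name ReplayStore, the dictionary keys, and the use of Python containers in place of database tables are assumptions, and the caller is assumed to signal a batch change through the new_batch flag.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ReplayStore:
    # Assumption: counters start at -1 so the first record and batch get 0.
    record_info: Dict[str, int] = field(
        default_factory=lambda: {"cur": -1, "sum": 0})
    batch_info: Dict[str, int] = field(
        default_factory=lambda: {"batch_cur": -1, "batch_sum": 0})
    data_content: List[Tuple[int, int, Any]] = field(default_factory=list)

    def store(self, content: Any, new_batch: bool) -> Tuple[int, int]:
        # The very first write always opens a batch.
        starts_batch = new_batch or self.batch_info["batch_cur"] < 0
        # S101: allocate identifiers from the two information tables.
        record_id = self.record_info["cur"] + 1
        batch_id = self.batch_info["batch_cur"] + (1 if starts_batch else 0)
        # S102: splice the triple and write it as a new row.
        self.data_content.append((record_id, batch_id, content))
        # S103: update the information tables only after the row is written.
        self.record_info["cur"] = record_id
        self.record_info["sum"] += 1
        self.batch_info["batch_cur"] = batch_id
        if starts_batch:
            self.batch_info["batch_sum"] += 1
        return record_id, batch_id
```

Updating the information tables last mirrors the strict flow described above: if the row write fails, the identifiers still point at the last record that was actually stored.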

Common sample playback requirements can be divided by playback object into two categories: designated playback and random playback. Designated playback covers playback of designated records and/or designated batches; for such a requirement, a query condition is constructed directly from the designated record identifiers and/or batch identifiers, and the corresponding data is read from the data content table. For example, if records 0 to 99 in batch 1 need to be played back sequentially, a sequence set list = {0, 1, 2, …, 99} may be generated first; the sequence set is then traversed, and query requests are assembled with batch = 1 and record = {list} as conditions, so as to read the corresponding data from the data content table.
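The designated playback example (records 0 to 99 of batch 1) might look as follows in Python. The in-memory list of triples and the function name read_designated are assumptions made for illustration; a real deployment would issue the queries against the database instead.

```python
from typing import Any, Iterable, List, Tuple

# (record identifier, batch identifier, behavior content)
Row = Tuple[int, int, Any]


def read_designated(table: Iterable[Row], batch: int,
                    record_ids: Iterable[int]) -> List[Row]:
    """Read the designated records of one batch, in the order requested."""
    index = {(b, r): (r, b, content) for r, b, content in table}
    return [index[(batch, r)] for r in record_ids if (batch, r) in index]


# Illustrative data: records 0..99 belong to batch 1, records 100..149 to batch 2.
table = [(i, 1 if i < 100 else 2, f"behavior-{i}") for i in range(150)]
rows = read_designated(table, batch=1, record_ids=range(100))
```

Identifiers that do not exist under the given batch are skipped rather than raising an error, which is a design choice of this sketch and not something the specification prescribes.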

For the random playback requirement, the basic idea is as follows:

1) determining the playback requirement, including the range of random reads (e.g., global playback, playback within a specified batch, etc.) and the object type (record or batch);

2) determining the total number of objects of that type within the range according to the record information table or the batch information table;

3) generating a random number array comprising n random values selected from the total number, wherein n is the number of sample records or batches required for playback;

4) traversing the random number array, assembling query requests, and reading the corresponding data from the data content table.

For example, if n records need to be played back randomly from all global records, the total number of global records sum is obtained from the record information table, and a random number array list = n_random(sum) containing n values is generated; the random number array is then traversed, query requests are assembled with record = {list} as the condition, and the corresponding data is read from the data content table.
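A minimal Python sketch of steps 2) to 4) follows. The specification does not say whether n_random draws with or without replacement; this sketch assumes sampling without replacement and caps n at the total. The fetch callback and the dictionary standing in for the data content table are likewise assumptions.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def n_random(total: int, n: int,
             rng: Optional[random.Random] = None) -> List[int]:
    """Pick up to n distinct values from [0, total - 1]."""
    rng = rng or random.Random()
    return rng.sample(range(total), min(n, total))


def random_playback(total: int, n: int, fetch: Callable[[int], T],
                    rng: Optional[random.Random] = None) -> List[T]:
    """Draw identifiers from the total, then fetch each object."""
    return [fetch(identifier) for identifier in n_random(total, n, rng)]


# Usage: global random playback of 3 records out of 10.
record_info = {"sum": 10}
data_content = {i: f"behavior-{i}" for i in range(10)}
samples = random_playback(record_info["sum"], 3,
                          data_content.__getitem__, random.Random(0))
```

Because the object type only enters through the total and the fetch callback, the same function serves both record playback and batch playback.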

It should be noted that, the data access method described above is applicable to sample playback data of the same service, and if there are multiple services, in order to implement multiplexing storage of multiple service data, service identification fields (which may be one or more, such as application names, application versions, and the like) may be further added to distinguish different services. In this case, a piece of sample playback data should also contain at least one service identification field, and the service identification field needs to be configured in the record information table, the batch information table, and the data content table at the same time to establish an association between the three tables. From the perspective of data maintenance convenience, generally, different services will use independent record identifiers and batch identifier systems respectively, and theoretically, different services are allowed to share the same identifier system, which does not affect the implementation of the scheme in this specification.

In addition, the record information table, the batch information table, and the data content table in the embodiment of the present specification only represent basic division manners in a logical sense, and in practical applications, one or more of the three tables may be merged or further split without departing from the logic, which all belong to the protection scope of the scheme of the present specification.

The sample playback data access scheme provided in the present specification will be described in detail below with reference to specific examples.

In the field of intelligent decision making, deep reinforcement learning can be used to solve many practical problems, such as issuing different reward amounts according to the attention level of different users, applying differentiated fatigue control, and automatically scaling capacity out and in according to system pressure. In a real business environment, both the data producers (business application ends) and the data consumers (model training ends) are distributed, and the data consumers need a system that supports multiple playback strategies. This specification provides a sample playback data access scheme that uses HBase as the storage medium, so as to suit a large-scale, distributed, real production environment.

FIG. 2 is a block diagram of a sample playback data access system.

The online business system generates a large number of business logs, which are processed and then written to the sample playback component. The component includes a write end (write), a read end (read), and HBase as the persistent carrier; a specific table structure and Rowkey are designed on HBase to realize all functions of sample playback. The reinforcement learning training system can receive playback data in cluster mode for training, or consume it for stand-alone training.

In HBase, Rowkey is the primary key of a row, and data is looked up using a Rowkey or a Rowkey range or scan. There are two concepts of "columns": family and Qualifier, there may be many qualifiers under a Family, so it can be simply understood that the column in HBase is a second-level column, that is Family is a first-level column, Qualifier is a second-level column, and two are parent-child relationships.

According to the basic characteristics of HBase, the structure design of a record information table, a batch information table and a data content table is shown in table 1:

TABLE 1

a) Record information table record meta:

version represents a version number;

app represents a business name;

sum represents the total number of currently available samples, with value range [0, max];

cur represents the number of the current record (i.e., the most recently written record); the cur value is increased by 1 each time 1 record is added, and when max is reached, counting restarts from 0;

b) Batch information table batch meta:

version represents a version number;

app represents a business name;

batch_sum represents the total number of currently available batches, with value range [0, maxbatch];

batch_cur represents the number of the current batch (i.e., the batch of the most recently written record); the batch_cur value is increased by 1 each time the batch changes, and when maxbatch is reached, counting restarts from 0;

c) Data content table record data:

version represents a version number;

app represents a business name;

batch represents a batch number;

cur represents a record number;

data is a kv list field that can be extended dynamically; for example, for the DQN algorithm, the following form can be used:

[state:xxx][action:xxx][reward:xxxx][next_state:xxx]

The data field can also include other information, such as write time, so that sample playback data can be selected by time range.

In addition, for "randomly selecting records for playback according to a specified probability", the data field can also store the selection probability specified for each sample record. The stored content is not limited to a probability value in [0, 1]; other forms may be used as long as they allow different records to be prioritized. The specified probability information may also be stored elsewhere, and this specification does not limit where it is obtained from.
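As an illustration of the kv list form and of probability-based selection, the following Python sketch encodes and decodes the bracketed format and then draws records in proportion to a stored weight. The field name priority, the default weight of 1, and the rule that values contain no square brackets are all assumptions of this sketch; random.choices samples with replacement.

```python
import random
import re
from typing import Dict, List


def encode_kv(fields: Dict[str, str]) -> str:
    """Serialize fields as [key:value][key:value]..."""
    return "".join(f"[{key}:{value}]" for key, value in fields.items())


# A key has no colon or bracket; a value has no bracket (assumption).
_KV = re.compile(r"\[([^:\[\]]+):([^\[\]]*)\]")


def decode_kv(text: str) -> Dict[str, str]:
    """Parse the bracketed kv list back into a dictionary."""
    return dict(_KV.findall(text))


def weighted_pick(rows: List[str], n: int, rng: random.Random) -> List[str]:
    """Draw n rows with probability proportional to their stored priority."""
    weights = [float(decode_kv(row).get("priority", "1")) for row in rows]
    return rng.choices(rows, weights=weights, k=n)


row = encode_kv({"state": "s0", "action": "a1",
                 "reward": "0.5", "next_state": "s1"})
```

Since the weights only need to rank records relative to one another, they do not have to sum to 1, which matches the remark that forms other than a probability in [0, 1] are acceptable.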

It can be seen that the above storage structure limits both the maximum number of records allowed to be stored, max, and the maximum number of batches allowed to be stored, maxbatch. The version and app fields are added to the Rowkey to associate the three tables, thereby realizing multiplexed storage for multiple services and multiple versions. Time information can also be written into version, so as to meet the requirement of selecting sample playback data by time range.

Based on the above data structure, for any piece of data to be stored, the sample playback component write logic is as follows:

s201 reads the record meta table to obtain the current total record number cur:

cur＝(cur+1)％max；

s202, reading a batch meta table to obtain a current batch number batch _ cur:

if it is a new batch write, the batch _ cur ═ batch _ cur + 1)% maxbatch,

if not, the batch _ cur is equal to batch _ cur.

S203, splicing data and writing the data into record data:

assembling Rowkey ═ version: app: batch _ cur: cur, the content is the behavior content quadruple, state, action, reward, netx _ state of the current data to be stored, and kv list is written into the row corresponding to the Rowkey in record data.

Assembling Rowkey ═ version: app: null: cur content is behavior content quadruple, state, action, reward and netx _ state of the current data to be stored, and kv list is written into the row corresponding to the Rowkey in record data.

In this embodiment, two rows are written for each piece of behavior data: one row carries the batch number and serves batch-by-batch playback; the other carries no batch number and serves global playback. The reason is that an HBase Rowkey must be assembled field by field in order, so during global playback (i.e., when the batch number is not used as a query condition) it is not possible to skip batch_cur and assemble cur directly.

S204, updating a record meta table and a batch meta table:

updating the current total number of records in the record meta table, sum = min(sum + 1, max), and updating the current record number cur in the record meta table;

if the record is the last record in its batch, updating the current total number of batches in the batch meta table row for version:app, batch_sum = min(batch_sum + 1, maxbatch). Meanwhile, the current value batch_cur in the batch meta table is updated.
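The write logic of S201 to S204 can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dict stands in for the HBase record data table, the class name ReplayStore and the flags new_batch and last_in_batch are hypothetical, and the counters start at -1 so that the first record and the first batch receive the number 0.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayStore:
    """In-memory stand-in for the three tables (hypothetical helper)."""
    version: str
    app: str
    max_records: int          # "max" in the text
    max_batches: int          # "maxbatch" in the text
    cur: int = -1             # assumption: -1 so the first record gets 0
    total: int = 0            # "sum" in the text
    batch_cur: int = -1       # assumption: -1 so the first batch gets 0
    batch_sum: int = 0
    record_data: dict = field(default_factory=dict)

    def write(self, content, new_batch, last_in_batch):
        # S201: advance the cyclic record number
        self.cur = (self.cur + 1) % self.max_records
        # S202: advance the cyclic batch number only for a new batch
        if new_batch:
            self.batch_cur = (self.batch_cur + 1) % self.max_batches
        # S203: write two rows, one with and one without the batch number
        prefix = f"{self.version}:{self.app}"
        self.record_data[f"{prefix}:{self.batch_cur}:{self.cur}"] = content
        self.record_data[f"{prefix}:null:{self.cur}"] = content
        # S204: update the totals, capped at the configured maxima
        self.total = min(self.total + 1, self.max_records)
        if last_in_batch:
            self.batch_sum = min(self.batch_sum + 1, self.max_batches)


store = ReplayStore("v1", "demo", max_records=3, max_batches=2)
store.write(("s0", "a0", 1.0, "s1"), new_batch=True, last_in_batch=False)
store.write(("s1", "a1", 0.0, "s2"), new_batch=False, last_in_batch=True)
store.write(("s2", "a2", 0.5, "s3"), new_batch=True, last_in_batch=True)
store.write(("s3", "a3", 2.0, "s4"), new_batch=True, last_in_batch=True)
```

In this sketch the fourth write wraps both counters back to 0 and overwrites the rows of the oldest record, which is how the cyclic count keeps the stored data within the configured limits.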

The data writing logic of the sample playback component is described above, and several reading logics are also schematically described below:

Random reading of records:

S301, reading the record meta table to obtain the current total number of records sum;

S302, computing a random number array by selecting n random values from [0, sum-1]:

list = n_random(sum);

S303, traversing the list, taking each value as cur, assembling Rowkey = version:app:null:cur, and reading the data from the record data table.
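A minimal sketch of S301 to S303, assuming a dict in place of the record data table. The function name read_random_records is hypothetical, and drawing with replacement is an assumption, since the text does not specify how n_random handles repeated values.

```python
import random


def read_random_records(record_data, version, app, total, n, rng=random):
    """Pick n record numbers from [0, total-1] and read the global rows."""
    # S302: n random values; drawn with replacement (an assumption)
    picks = [rng.randrange(total) for _ in range(n)]
    # S303: assemble the batch-free Rowkey for each value and read the row
    return [record_data[f"{version}:{app}:null:{cur}"] for cur in picks]


table = {f"v1:demo:null:{i}": f"rec{i}" for i in range(5)}
sample = read_random_records(table, "v1", "demo", total=5, n=8,
                             rng=random.Random(0))
```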

Random reading of batches:

S401, reading the batch meta table to obtain the current total number of batches batch_sum;

S402, computing a random number array by selecting n random values from [0, batch_sum-1]:

list = n_random(batch_sum);

S403, traversing the list, taking each value as batch_cur, and assembling Rowkey = version:app:batch_cur. Because the batch as a whole is the object being read, the record number cur does not need to be appended, and the whole batch can be read with the scan method in HBase.
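The batch read of S401 to S403 can be sketched in the same way. Here a prefix match over a dict imitates an HBase scan; scan_prefix and read_random_batches are hypothetical names, and the trailing colon in the prefix is a design choice of this sketch so that batch 1 does not also match batch 10.

```python
import random


def scan_prefix(record_data, prefix):
    """Return the values of all rows whose key starts with prefix,
    in lexicographic key order."""
    return [value for key, value in sorted(record_data.items())
            if key.startswith(prefix)]


def read_random_batches(record_data, version, app, batch_sum, n, rng=random):
    # S402: n random batch numbers from [0, batch_sum-1], with replacement
    picks = [rng.randrange(batch_sum) for _ in range(n)]
    # S403: the Rowkey prefix stops at batch_cur; cur is not appended
    return [scan_prefix(record_data, f"{version}:{app}:{b}:") for b in picks]


table = {
    "v1:demo:0:0": "a", "v1:demo:0:1": "b",
    "v1:demo:1:2": "c", "v1:demo:10:3": "d",
    "v1:demo:null:0": "a",
}
batches = read_random_batches(table, "v1", "demo", batch_sum=2, n=3,
                              rng=random.Random(0))
```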

Random record read with assigned probability:

Probability-based priority sampling is an important technique in reinforcement learning. Sample priorities can be designed for different scenarios, for example by how recent a sample is or by its business importance, and assigning priorities in this way can improve training efficiency. This specification provides a scheme for consuming data from a queue in a streaming manner and selecting samples according to a specified probability.

Assume that the probability of each sample record being selected is determined by how recent its batch is, and that the batches are numbered 1, 2, 3, …, N in the order in which they were written, where N is the largest batch number. Then it is possible to define:

probability that record i is selected = (batch number of record i / N) × α

where α is a probability correction parameter that may take a value in (0,1), such as 0.5 or 0.8.

It should be noted that, because the maximum number of batches allowed to be stored, maxbatch, has been specified in this embodiment, batch_cur is a cyclic count. If maxbatch is not specified, batch_cur may be used directly to calculate the selection probability here.

Referring to FIG. 3, the probability-assigned random record read logic is as follows:

S501, reading the record meta table to obtain the current total number of records sum;

S502, computing a random number array by selecting n random values from [0, sum-1]:

list = n_random(sum);

S503, traversing the list, taking each value as cur, assembling Rowkey = version:app:null:cur, and reading the data from the record data table.

It can be seen that S501 to S503 are identical to S301 to S303, and the implementation of random playback according to a specified probability is further described below:

S504, for each record read in S503, determining whether to keep it according to its selection probability:

for any record i:

on the one hand, the selection probability Pi of record i is determined:

Pi = (batch number of record i / N) × α

on the other hand, a random probability value u is generated:

u = random(1), (u ∈ [0,1]);

Pi and u are then compared: if u < Pi, record i is kept; otherwise, if u ≥ Pi, record i is discarded. It is understood that the case u = Pi can be handled either way, and this embodiment is only illustrative.

After S504 is executed for the first time, some of the n records in the list are discarded by chance, so the number of retained records is less than n; S502 to S504 may therefore be repeated until the number of retained records reaches n. During repeated selection, the same record is generally allowed to be retained more than once; if this must be avoided, a de-duplication condition can be added to the screening.
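The repeated draw-and-filter loop of S502 to S504 can be sketched as follows. This is an illustration only: records are modeled as (batch number, content) pairs held in a list rather than read from HBase, the function names are hypothetical, and Python's random() returns u in [0, 1) rather than [0, 1].

```python
import random


def selection_probability(batch_number, n_batches, alpha):
    """Pi = (batch number of record i / N) * alpha."""
    return batch_number / n_batches * alpha


def sample_with_probability(records, n, n_batches, alpha, rng=random):
    """records: list of (batch_number, content), batch_number in 1..N."""
    kept = []
    while len(kept) < n:
        # S502-S503: draw n candidate records at random
        candidates = [records[rng.randrange(len(records))] for _ in range(n)]
        # S504: keep a candidate only if u < Pi
        for batch_number, content in candidates:
            u = rng.random()
            if u < selection_probability(batch_number, n_batches, alpha):
                kept.append((batch_number, content))
                if len(kept) == n:
                    break
    return kept


records = [(1, "old"), (2, "mid"), (4, "new")]
chosen = sample_with_probability(records, n=5, n_batches=4, alpha=0.8,
                                 rng=random.Random(1))
```

With N = 4 and α = 0.8, a record in batch 4 is kept with probability 0.8 on each draw and a record in batch 1 with probability 0.2, so newer batches dominate the result. The same record may appear more than once, consistent with the text above.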

Compared with the prior art, the scheme at least comprises the following advantages:

1. By relying on the high reliability of the HBase cluster, sample playback data is not lost even if some machines go down or restart;

2. By relying on the high throughput of the HBase cluster, massive log samples from an actual online production system can be collected back;

3. Cluster deployment avoids the memory limit of a single machine; the upper limit of the queue can be set according to the capacity of the whole cluster, so very large sample sets are supported. Because of the large capacity, the read and write speeds of producers and consumers can also be matched, and samples do not pile up in the queue when the producer is too fast;

4. The schema of the sample playback data structure can be freely defined and dynamically extended; when more complex reinforcement learning later requires more information to be recorded in a sample, the whole sample playback component supports dynamic extension of the data fields;

5. Multiple playback strategies are supported, such as global playback, partial playback, sequential playback and random playback.

Corresponding to the above method embodiment, this specification further provides a sample playback data storage apparatus. Referring to the drawings, the apparatus may include: the identifier allocating module 110, the content writing module 120, and the information updating module 130. For any piece of data to be stored:

the identifier allocating module 110 is configured to allocate a record identifier to the data to be stored according to a record information table; distributing batch identification for the data to be stored according to a batch information table;

the content writing module 120 is configured to splice the allocated record identifier, batch identifier, and content of the data to be stored according to a storage structure of a data content table, and write a splicing result into the data content table;

the information updating module 130 is configured to update the record information table and the batch information table.

According to a specific embodiment provided in the present specification, the identifier allocating module 110 may be specifically configured to:

judging whether the data to be stored belongs to the same batch as the last piece of stored data;

if so, allocating a batch identifier which is the same as the last piece of stored data to the data to be stored;

otherwise, distributing a new batch identifier for the data to be stored.

According to a specific implementation manner provided in this specification, the content writing module 120 may be specifically configured to, for a piece of data to be stored, splice two records and write into a data content table, where the two records are respectively:

records carrying batch identifiers are used for realizing the requirement of batch-by-batch playback;

and the record which does not carry the batch identifier is used for realizing the global playback requirement.

According to an embodiment provided in the present specification, the record information table may be further configured to store a total number of records of the written sample playback data;

the information update module 130 may also be configured to: the total number of records is updated.

According to an embodiment provided in the present specification, the batch information table may be further configured to store a total number of batches of the written sample playback data;

the information update module 130 may also be configured to: the total number of batches is updated.

According to a specific embodiment provided by the present specification, a maximum value of the number of records of sample playback data allowed to be stored is preconfigured for the data content table;

the identifier allocating module 110 may be specifically configured to: allocate the record identifier to the data to be stored by taking the maximum value of the number of records as a counting period.

According to a specific embodiment provided by the present specification, a maximum value of the number of batches of sample playback data allowed to be stored is preconfigured for the data content table;

the identifier allocating module 110 may be specifically configured to: allocate the batch identifier to the data to be stored by taking the maximum value of the number of batches as a counting period.

Referring to fig. 5, the present specification also provides a sample playback data reading apparatus, which may include:

a playback requirement determining module 210, configured to determine that the playback requirement is: randomly selecting record playback;

a total record number determining module 220, configured to obtain a total record number sum of the written sample playback data according to the record information table;

a data reading module 230, configured to generate a random number array, where the random number array includes n random values selected from the sum record identifiers, where n is a number of sample records required for playback; traversing the random number array to execute the following steps to obtain n sample playback data records: and taking any numerical value in the array as a record identifier, and reading the sample playback data with the record identifier from the data content table.

Referring to fig. 6, according to an embodiment provided in the present specification, if the playback requirement is specifically: randomly selecting record playback according to the designated probability; the sample playback data reading apparatus may further include:

a data selecting module 240, configured to determine, for each sample playback data record obtained by the data reading module, a designated probability of being selected for the record;

a loop control module 250 for repeatedly performing the following steps until the number of remaining records reaches n:

and generating a random value, if the random value is less than the designated selection probability of the record, keeping the record, and otherwise, discarding the record.

Referring to fig. 7, the present specification also provides a sample playback data reading apparatus, which may include:

a playback requirement determining module 310, configured to determine that the playback requirement is: randomly selecting batches for playback;

a batch total number determining module 320, configured to obtain a batch total number batch_sum of the written sample playback data according to the batch information table;

the data reading module 330 is configured to generate a random number array, where the random number array includes n random values selected from the batch_sum batch identifiers, where n is the number of sample batches required for playback; traversing the random number array to execute the following steps to obtain n sample playback data batches: taking any numerical value in the array as a batch identifier, and reading the sample playback data with the batch identifier from the data content table.

Embodiments of the present specification also provide a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the aforementioned sample playback data storage or reading method when executing the program.

Fig. 8 is a schematic diagram illustrating a more specific hardware structure of a computing device according to an embodiment of the present disclosure, where the computing device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.

The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

Embodiments of the present specification also provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the aforementioned sample playback data storage or reading method.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.

From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, and the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more software and/or hardware when implementing the embodiments of the present disclosure. And part or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The foregoing is only a specific embodiment of the embodiments of the present disclosure, and it should be noted that, for those skilled in the art, a plurality of modifications and decorations can be made without departing from the principle of the embodiments of the present disclosure, and these modifications and decorations should also be regarded as the protection scope of the embodiments of the present disclosure.

Claims

1. A sample playback data storage method is provided, a recording information table, a batch information table and a data content table are configured;

for any piece of data to be stored, the following operations are performed:

and updating the record information table and the batch information table.

2. The method of claim 1, wherein assigning a batch identification to the data to be stored comprises:

and otherwise, distributing a new batch identifier for the data to be stored.

3. The method according to claim 1, wherein for a piece of data to be stored, two records are spliced and written into the data content table, and the two records are respectively:

4. The method of claim 1, the recording information table further for storing a total number of recordings of the written sample playback data;

the updating the record information table further comprises: and updating the total number of records.

5. The method of claim 1, the batch information table further for storing a total number of batches of written sample playback data;

the updating the batch information table further comprises: updating the total number of batches.

6. The method of claim 1, wherein a maximum value of a recording number of sample playback data allowed to be stored is preconfigured for the data content table;

the allocating record identification for the data to be stored includes: and allocating record identification for the data to be stored by taking the maximum value of the record number as a counting period.

7. The method of claim 1, wherein a maximum number of batches of sample playback data allowed to be stored is preconfigured for the data content table;

the allocating the batch identifier for the data to be stored includes: and allocating the batch identification to the data to be stored by taking the maximum value of the batch number as a counting period.

8. The method of claim 1, wherein the record information table, the batch information table, and the data content table utilize a service identification field as an association field to support the multiplexing storage of multiple service data.

9. A sample playback data reading method, the method comprising:

determining playback requirements as: randomly selecting record playback; wherein, 1 record corresponds to 1 behavior data;

10. The method of claim 9, wherein the playback requirements are in particular: randomly selecting record playback according to the designated probability;

the method further comprises the following steps:

for each obtained sample playback data record, determining a designated probability of being selected for the record;

the following steps are repeatedly performed until the number of retained records reaches n:

11. A sample playback data reading method, the method comprising:

determining playback requirements as: randomly selecting batches for playback; wherein 1 batch corresponds to 1 behavior data set;

12. A sample playback data storage device is provided with a record information table, a batch information table and a data content table;

13. The apparatus according to claim 12, wherein the identifier assigning module is specifically configured to:

and otherwise, distributing a new batch identifier for the data to be stored.

14. The apparatus according to claim 12, wherein the content writing module is specifically configured to, for a piece of data to be stored, splice two records and write the two records into a data content table, where the two records are respectively:

15. The apparatus of claim 12, the recording information table further for storing a total number of recordings of the written sample playback data;

the information update module is further configured to: and updating the total number of records.

16. The apparatus of claim 12, the batch information table further configured to store a total number of batches of the written sample playback data;

the information update module is further configured to: updating the total number of batches.

17. The apparatus according to claim 12, wherein a maximum value of the number of recordings of sample playback data allowed to be stored is preconfigured for the data content table;

the identifier assignment module is specifically configured to: and allocating record identification for the data to be stored by taking the maximum value of the record number as a counting period.

18. The apparatus of claim 12, wherein a maximum number of batches of sample playback data allowed to be stored is preconfigured for the data content table;

the identifier assignment module is specifically configured to: and allocating the batch identification to the data to be stored by taking the maximum value of the batch number as a counting period.

19. The apparatus of claim 12, wherein the record information table, the batch information table, and the data content table utilize a service identification field as an association field to support a multiplexing storage of multiple service data.

20. A sample playback data reading apparatus, the apparatus comprising:

a playback requirement determining module, configured to determine that the playback requirement is: randomly selecting record playback; wherein, 1 record corresponds to 1 behavior data;

21. The apparatus of claim 20, wherein the playback requirement is specifically: randomly selecting record playback according to the designated probability;

the device further comprises:

the data selection module is used for determining the designated selection probability for each sample playback data record obtained by the data reading module;

a loop control module for repeatedly performing the following steps until the number of remaining records reaches n:

22. A sample playback data reading apparatus, the apparatus comprising:

a playback requirement determining module, configured to determine that the playback requirement is: randomly selecting batches for playback; wherein 1 batch corresponds to 1 behavior data set;

23. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the sample playback data storage method of any of claims 1 to 8.

24. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the sample playback data reading method of any one of claims 9 to 11 when executing the program.