CN110737747A

CN110737747A - Data operation method, device and system

Info

Publication number: CN110737747A
Application number: CN201910862672.6A
Authority: CN
Inventors: 何庆安; 李晶晶
Original assignee: Suning Cloud Computing Co Ltd
Current assignee: Suning Cloud Computing Co Ltd
Priority date: 2019-09-12
Filing date: 2019-09-12
Publication date: 2020-01-31
Also published as: CA3154763A1; WO2021047323A1

Abstract

The embodiments of the application disclose a data processing method, device and system. The method comprises: receiving a data operation request sent by a requester, the data operation request comprising a query word and an operation instruction; querying memory index data according to the query word to determine first target data containing a target document identifier, the memory index data being established based on the correspondence between document identifiers and some of the keywords in the disk index data; and performing the corresponding operation on the first target data according to the operation instruction.

Description

Data operation method, device and system

Technical Field

The present application relates to the field of data manipulation, and in particular to a data manipulation method, apparatus, and system.

Background

To address the problem of searching data at big-data scale, search engines build indexes over the data, which greatly improves query efficiency.

Indexing is usually done with a forward index or an inverted index, both of which are queried by keyword.

The forward index takes document IDs as keys, and each document ID maps to the number and positions of occurrences of the keywords contained in that document. To query a keyword, the keyword information of every document must be scanned until all documents containing the queried keyword have been found.

The inverted index takes keywords as keys, and each keyword maps to the IDs of all documents that contain it. A single query on the keyword therefore directly returns all document IDs containing that keyword.
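The contrast between the two structures can be made concrete with a small Python sketch. It uses a hypothetical three-document corpus, and the documents and keywords are invented for illustration. Searching the forward index for a keyword scans every document, while the inverted index answers with a single dictionary lookup.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical toy corpus: document ID -> keywords in the document, in order.
docs: Dict[int, List[str]] = {
    0: ["phone", "apple", "phone"],
    1: ["phone", "xiaomi"],
    2: ["laptop", "apple"],
}


def build_forward_index(docs: Dict[int, List[str]]) -> Dict[int, Dict[str, List[int]]]:
    """Forward index: document ID -> {keyword: positions of its occurrences}."""
    forward = {}
    for doc_id, words in docs.items():
        positions = defaultdict(list)
        for pos, word in enumerate(words):
            positions[word].append(pos)
        forward[doc_id] = dict(positions)
    return forward


def build_inverted_index(docs: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Inverted index: keyword -> sorted IDs of the documents containing it."""
    inverted = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            inverted[word].add(doc_id)
    return {word: sorted(ids) for word, ids in inverted.items()}


def search_forward(forward: Dict[int, Dict[str, List[int]]], keyword: str) -> List[int]:
    """A keyword query on a forward index has to scan every document."""
    return [doc_id for doc_id, entry in forward.items() if keyword in entry]


forward = build_forward_index(docs)
inverted = build_inverted_index(docs)
apple_by_scan = search_forward(forward, "apple")   # full scan
apple_by_lookup = inverted["apple"]                # single lookup
```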

Currently, index data is stored on disk, and reads, updates, and other operations are carried out by querying the disk by keyword and operating on the results there.

For example, an e-commerce platform holds a large amount of business data. Some of it, such as price and stock, changes frequently, and in practice it often needs to be read, for example to read prices or to sort by stock data.

However, the current approach has poor timeliness:

First, these fields change frequently. Frequent changes to price, stock, and similar fields make incremental updates inefficient and consume a large amount of index space.

Second, some fields can only be updated by rebuilding the index for the full set of commodities. A coupon rule is one example. A single change to a coupon rule may affect commodities on the scale of millions or more, so when it is applied by rebuilding the full commodity index, the rule takes effect slowly, usually after several hours.

Therefore, ensuring timeliness is a problem that currently needs to be solved.

Disclosure of Invention

The application provides a method of data manipulation, the method comprising:

receiving a data operation request sent by a requester; the data operation request comprises a query word and an operation instruction;

querying the memory index data according to the query word and determining first target data containing a target document identifier, wherein the memory index data is established based on the correspondence between document identifiers and some of the keywords in the disk index data;

and performing the corresponding operation on the first target data according to the operation instruction.
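The three steps can be sketched in Python. The sketch assumes a single memory-resident field per document and only two instructions, "get" and "update". The class names, the instruction strings, and the shape of the request are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DataOperationRequest:
    query_word: str         # keyword used for the lookup, e.g. a commodity
    instruction: str        # hypothetical instruction names: "get" or "update"
    new_value: Any = None   # payload used only by "update" requests


class MemoryIndexService:
    """Hypothetical sketch: receive a request, query the memory index, apply the instruction."""

    def __init__(self, key_to_doc_id: Dict[str, int], memory_data: Dict[int, Any]):
        # Assumed to be derived from the key/document-ID correspondence in the disk index.
        self.key_to_doc_id = key_to_doc_id
        # Document ID -> frequently changing field value kept in memory.
        self.memory_data = memory_data

    def handle(self, request: DataOperationRequest) -> Optional[Any]:
        # Query the memory index: resolve the query word to the target document ID.
        doc_id = self.key_to_doc_id[request.query_word]
        # Perform the requested operation on the first target data.
        if request.instruction == "get":
            return self.memory_data[doc_id]
        if request.instruction == "update":
            self.memory_data[doc_id] = request.new_value
            return None
        raise ValueError(f"unsupported instruction: {request.instruction}")


service = MemoryIndexService({"iphone8": 0}, {0: 5000.0})
service.handle(DataOperationRequest("iphone8", "update", 4800.0))
current_price = service.handle(DataOperationRequest("iphone8", "get"))
```

The update never touches a disk structure, which is the point the description makes about frequently changing fields.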

Preferably, the data operation request is a data acquisition request;

the method further comprises the following steps:

querying the disk index data according to the target document identifier to obtain second target data;

performing the corresponding operation on the first target data according to the operation instruction comprises:

replacing the corresponding part of the second target data with the first target data to generate final target data, and returning the final target data to the data requester.
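The replacement step can be sketched as a dictionary overlay. This assumes that both the disk document (second target data) and the memory fields (first target data) are key-value maps keyed by field name. The field names and values below are placeholders.

```python
from typing import Any, Dict


def merge_with_memory(disk_doc: Dict[str, Any], memory_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay fresher memory-index fields on a full document read from the disk index."""
    final_doc = dict(disk_doc)       # copy, so the record read from disk is left untouched
    final_doc.update(memory_fields)  # replace only the fields maintained in memory
    return final_doc


disk_doc = {"doc_id": 0, "name": "iphone8", "price": 5000.0}  # price may be stale on disk
memory_fields = {"price": 4800.0}                              # latest value from memory
final_doc = merge_with_memory(disk_doc, memory_fields)
```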

Preferably, the memory index comprises memory forward index data formed by a two-dimensional array, wherein the first dimension of the array is the document identifier corresponding to a first-class keyword, and the second dimension is a second-class keyword corresponding to the first-class keyword.

Preferably, the data operation request is a data update request;

querying the memory index data according to the query word and determining the first target data comprises:

determining a target array in the memory forward index data according to the document identifier corresponding to the first-class keyword in the query word;

determining, from the target array, the first target data corresponding to the second-class keyword in the query word;

and updating the first target data.

Preferably, the data operation request is a data acquisition request;

and obtaining the first target data and sending it to the request sender.

Preferably, the two-dimensional array consists of a document identifier array corresponding to commodities and commodity price arrays for cities across the country, where the subscript of each commodity price in an array corresponds to a city;

determining the corresponding target document identifier according to the target commodity;

determining the target commodity price array, covering cities across the country, that corresponds to the target document identifier;

determining the target subscript of the target city in the target commodity price array;

and determining the price at the target subscript position in the target commodity price array as the first target data.
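The lookup steps can be sketched in Python with plain dictionaries and nested lists. The mappings and prices are hypothetical. iphone8 at document ID 0, Redmi Note 3 at document ID 3, and Shanghai at subscript 2 follow the example in the first embodiment. Nanjing at subscript 0 and every price value are assumptions.

```python
from typing import Dict, List

# Hypothetical base mappings (in the application these derive from the disk index).
commodity_to_doc_id: Dict[str, int] = {"iphone8": 0, "redmi_note3": 3}
city_to_subscript: Dict[str, int] = {"Nanjing": 0, "Beijing": 1, "Shanghai": 2}

# Forward index: first dimension = document ID, second = per-city price.
# All prices are placeholders for illustration only.
price_forward: List[List[float]] = [
    [5000.0, 5100.0, 5050.0],  # doc 0
    [0.0, 0.0, 0.0],           # doc 1 (unused slot)
    [0.0, 0.0, 0.0],           # doc 2 (unused slot)
    [999.0, 1099.0, 1049.0],   # doc 3
]


def get_city_price(commodity: str, city: str) -> float:
    doc_id = commodity_to_doc_id[commodity]   # target document identifier
    price_array = price_forward[doc_id]       # target commodity price array
    subscript = city_to_subscript[city]       # target subscript of the city
    return price_array[subscript]             # first target data


shanghai_price = get_city_price("iphone8", "Shanghai")
```

Each lookup is a constant number of dictionary and list accesses, with no scan over documents.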

Preferably, the memory index includes memory inverted index data; the method further comprises the following steps:

the process of establishing the memory inverted index data comprises:

acquiring a fourth-class keyword and the set of fifth-class keywords corresponding to the fourth-class keyword;

and establishing a correspondence between the fourth-class keyword and a set of document identifiers according to the pre-stored correspondence between document identifiers and fifth-class keywords, to form the memory inverted index data.

Preferably, the fourth-class keyword is a coupon rule, and the fifth-class keywords are commodities;

the process for establishing the memory inverted index data comprises the following steps:

acquiring a coupon rule and a commodity set corresponding to the coupon rule;

and establishing a corresponding relation between a coupon rule and a corresponding document identifier set according to the prestored corresponding relation between the document identifier and the commodity to form the memory inverted index data.
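The construction can be sketched in Python, assuming the pre-stored correspondence is a map from commodity to its document IDs. One business key may map to several document IDs, as the detailed description states. The rule name and data are hypothetical.

```python
from typing import Dict, Iterable, List


def build_coupon_inverted_index(
    rule_to_commodities: Dict[str, Iterable[str]],
    commodity_to_doc_ids: Dict[str, List[int]],
) -> Dict[str, List[int]]:
    """Map each coupon rule to the sorted document IDs of the commodities it applies to."""
    inverted: Dict[str, List[int]] = {}
    for rule, commodities in rule_to_commodities.items():
        doc_ids = set()
        for commodity in commodities:
            # A commodity (business key) may correspond to several document IDs.
            doc_ids.update(commodity_to_doc_ids.get(commodity, []))
        inverted[rule] = sorted(doc_ids)  # ordered doc IDs make later intersection cheap
    return inverted


coupon_index = build_coupon_inverted_index(
    {"30-30": ["iphone8", "redmi_note3"]},
    {"iphone8": [0], "redmi_note3": [3, 5]},
)
```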

In another aspect, the present application further discloses a data manipulation apparatus, comprising:

a request receiving unit, configured to receive a data operation request sent by a requester, the data operation request comprising a query word and an operation instruction;

a first target data determining unit, configured to query the memory index data according to the query word and determine first target data containing a target document identifier, wherein the memory index data is established based on the correspondence between document identifiers and some of the keywords in the disk index data;

and an operation execution unit, configured to perform the corresponding operation on the first target data according to the operation instruction.

In a further aspect, the present application discloses a computer system, comprising:

one or more processors; and

a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:

According to the specific embodiments provided herein, the present application discloses the following technical effects:

With this technical solution, memory index data is established alongside the disk index for some of its keywords, based on the correspondence between those keywords and the document identifiers in the disk index. Update and read operations on those keywords can then be performed directly in the memory index. Keywords with a high update frequency can therefore be updated independently in the memory index and read from it afterwards. There is no need to operate on the disk frequently or to update all of the data on disk, which improves efficiency and avoids excessive use of the disk.

Further, the correspondence between document IDs and keywords is consistent between the disk index and the memory index. When more detailed information is needed, the full document data can be obtained from the disk index and combined with the latest data in the memory index to obtain the final data.

Drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application. Those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a block diagram of a system provided in an embodiment of the present application;

FIGS. 2-6 are schematic diagrams of the first embodiment;

FIGS. 7-9 are schematic diagrams of the second embodiment;

FIG. 10 is a flow chart of a method provided by an embodiment of the present application;

FIG. 11 is a block diagram of an apparatus according to an embodiment of the present disclosure;

FIG. 12 is a diagram of a computer system architecture provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, rather than all, of the embodiments of the present application.

The method is characterized in that the data format of the disk index is left unchanged. Instead, some of its fields, particularly those with a high update frequency, are placed in a memory index. The inverted index relationship between these fields (business primary keys) and document identifiers is established in advance in the memory index. On that basis, forward index and inverted index data linking each keyword field to document identifiers are further built in memory. Updates and reads of these fields are performed directly in the in-memory forward index, and inverted indexes are likewise built and read in memory. Because the memory index is built on the same correspondence between document identifiers and fields as the disk index, the disk index data can be read in a further step, and the memory data and disk index data can be combined seamlessly.

As shown in FIG. 1, a user's data operation request is sent to the memory index for querying. When detailed data is needed, the request is also sent to the disk index. The result is then combined with the memory index data, with the memory data replacing the corresponding fields, to obtain the final data.

The following describes the establishment and operation of the forward index data and the reverse index data in the memory index, taking the business data of the e-commerce platform as an example:

Example one

In e-commerce platform data, the per-city prices of commodities change frequently. In the first embodiment, the price of each commodity in each city is therefore used to create the forward index data in the memory index.

As shown in FIGS. 2 and 3, two pieces of basic data are first established in the memory index database. One is the inverted index relationship between indexed document IDs and the business primary key (commodity). The other is the mapping between cities and array subscripts. Both relationships are derived from the relationships in the disk index.

Based on these two mappings, a two-dimensional array is created whose length equals the maximum number of documents in the relevant data segment, as shown in FIG. 4. Each position of the array holds the price information of one commodity and is addressed by document ID. The data at each position is an array of per-city prices, in which each subscript stores the price for the corresponding city. For example, the entry at document ID 3 and subscript 2 represents the Shanghai price of the Redmi Note 3.
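Building the array from the two mappings can be sketched in Python. The sketch assumes city subscripts run contiguously from 0 and that a price not yet loaded is stored as None. The mappings and the single price are hypothetical, apart from the document ID 3 / subscript 2 pairing taken from the example.

```python
from typing import Dict, List, Optional, Tuple


def build_price_forward_index(
    commodity_to_doc_id: Dict[str, int],
    city_to_subscript: Dict[str, int],
    prices: Dict[Tuple[str, str], float],
) -> List[List[Optional[float]]]:
    """Allocate one per-city price array for every document ID and fill in known prices."""
    # Length = highest document ID + 1, so every document ID is a valid position.
    num_docs = max(commodity_to_doc_id.values()) + 1
    num_cities = len(city_to_subscript)  # assumes subscripts 0..num_cities-1
    index: List[List[Optional[float]]] = [[None] * num_cities for _ in range(num_docs)]
    for (commodity, city), price in prices.items():
        index[commodity_to_doc_id[commodity]][city_to_subscript[city]] = price
    return index


price_index = build_price_forward_index(
    {"iphone8": 0, "redmi_note3": 3},
    {"Nanjing": 0, "Beijing": 1, "Shanghai": 2},
    {("redmi_note3", "Shanghai"): 1049.0},  # placeholder price
)
```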

In scenario one, shown in FIG. 5, the Beijing price of iphone8 is updated. Only the corresponding document ID (0) and city subscript (1) need to be looked up from these mappings. The city price array is then fetched from the forward index by document ID, and the price at the corresponding position is replaced directly.
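The scenario-one update can be sketched in Python. The iphone8 → 0 and Beijing → 1 mappings follow the scenario. The Nanjing subscript and all price values are hypothetical placeholders.

```python
from typing import Dict, List, Optional

commodity_to_doc_id: Dict[str, int] = {"iphone8": 0}
city_to_subscript: Dict[str, int] = {"Nanjing": 0, "Beijing": 1, "Shanghai": 2}
# Forward index with one placeholder price array for document 0.
price_forward: List[List[Optional[float]]] = [[5000.0, 5100.0, 5050.0]]


def update_city_price(commodity: str, city: str, new_price: float) -> None:
    """Overwrite one city's price in place; nothing is written to the disk index."""
    doc_id = commodity_to_doc_id[commodity]  # iphone8 -> 0
    subscript = city_to_subscript[city]      # Beijing -> 1
    price_array = price_forward[doc_id]      # city price array for this document
    price_array[subscript] = new_price


update_city_price("iphone8", "Beijing", 4999.0)
```

Only one list slot changes. The other cities' prices and every other document are untouched.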

Similarly, in scenario two, the Shanghai price of iphone8 is queried. The corresponding document ID (0) and city subscript (2) are looked up, the city price array is fetched from the forward index by document ID, and the price at the city's subscript position in that array is read.

In scenario three, shown in FIG. 6, the query is for mobile phones whose price in Nanjing lies in the range 2000-3000. A price post-filter is applied: for the document ID set of the recalled mobile phones, the corresponding price data is fetched from the forward index, and the Nanjing price of each commodity is checked against the range.
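The scenario-three post-filter can be sketched in Python. The sketch assumes the range bounds are inclusive and that a document with no price for the city is dropped. The description does not specify either point. The recalled IDs and prices are hypothetical.

```python
from typing import List, Optional


def post_filter_by_price(
    recalled_doc_ids: List[int],
    price_forward: List[List[Optional[float]]],
    city_subscript: int,
    low: float,
    high: float,
) -> List[int]:
    """Keep recalled documents whose price in the given city lies in [low, high]."""
    kept = []
    for doc_id in recalled_doc_ids:
        price = price_forward[doc_id][city_subscript]  # direct array access per document
        if price is not None and low <= price <= high:
            kept.append(doc_id)
    return kept


# One-city price arrays for four recalled phones (Nanjing assumed at subscript 0).
phones_in_range = post_filter_by_price(
    [0, 1, 2, 3], [[2500.0], [3500.0], [None], [2000.0]], 0, 2000.0, 3000.0
)
```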

Example two

Consider the issuing of coupon rules on an e-commerce platform. Each coupon rule takes effect on many commodities, and the set of affected commodities differs each time. The index is therefore built in inverted index format, with each issued rule as a key mapping to all of the commodities it takes effect on.

FIGS. 7-9 illustrate the implementation of an inverted index for a 30-30 coupon. The set of eligible commodities, calculated from business rules or big data, is converted using the mapping between inverted-index document IDs and business keys (commodities) shown in FIG. 7. The result is the inverted array shown in FIG. 8. The first dimension of this array is the coupon rule, and the second dimension is an ordered list of document IDs, meaning that the coupon rule takes effect for those document IDs. This inverted index set is then bound to the current search engine.

When a query is performed, this inverted index table can be intersected with the inverted indexes of other query conditions to obtain the final result set, as shown in FIG. 9.
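The description does not say how the intersection is computed. Because the document IDs are stored in order, a two-pointer merge is one standard option. The following Python sketch uses that technique, with hypothetical ID lists standing in for the coupon table and another query condition.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Two-pointer intersection of two ascending document-ID lists, O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


coupon_doc_ids = [0, 3, 5, 8]      # documents the coupon rule applies to
phone_doc_ids = [0, 1, 3, 4, 8]    # documents matching another query condition
result_set = intersect_sorted(coupon_doc_ids, phone_doc_ids)
```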

The user can then obtain the corresponding data from the disk index by document ID. That part of the disk index data is not updated, so the corresponding parts of the data obtained from the disk index must first be replaced with the memory index data. The final data is then returned to the user.

Take scenario two of the first embodiment as an example. After querying the Shanghai price of iphone8, the user may need other information about the product. The disk index can then be queried with the document ID of iphone8 to obtain all of its information.

Thus, with the above method, fields that are updated or accessed frequently can be updated and queried quickly in the memory index.

Example three

The above are specific embodiments of the present application, and the approach is also applicable to other fields and similar situations. Correspondingly, the present application provides a data manipulation method, shown in FIG. 10, the method comprising:

S101: receiving a data operation request sent by a requester, the data operation request comprising a query word and an operation instruction. The query words are keywords used for querying, such as commodity, price, inventory, or city, and there may be one or more of them. The operation instruction specifies the operation on the data, such as query, update, or delete.

S102: querying the memory index data according to the query word and determining first target data containing a target document identifier, wherein the memory index data is established based on the correspondence between document identifiers and some of the keywords in the disk index data.

Specifically, the inverted index relationship between some of the keywords (business keys such as commodities) and the document identifiers may be pre-stored in the memory index database. This is based on the inverted index relationship between keywords and document identifiers in the disk index data.

The memory index data is then further built on the pre-stored inverted index relationship in the memory index database.

S103: performing the corresponding operation on the first target data according to the operation instruction.

When the data operation request is a data update request, the operation replaces and updates the acquired first target data.

Further, when the data operation request is a data acquisition request and the user wishes to obtain more detailed data, the method further comprises:

Of course, in the present application, multiple pieces of data may also be obtained from memory for intersection calculation to determine the first target data.

To meet different indexing requirements, both memory forward index data and memory inverted index data are created in the memory index, and both may be represented as arrays. The memory forward index data consists of document IDs and their corresponding keyword sets. The memory inverted index data consists of keywords and their corresponding document ID sets. Document IDs correspond to business primary keys such as commodities: one business primary key may correspond to several document IDs, while each document ID corresponds to only one business primary key.

The first dimension of the array is the document ID corresponding to the first-class keyword (such as a commodity). The second dimension is the second-class keyword (such as price) corresponding to that first-class keyword. Preferably, the subscript positions of the array may also correspond to a third-class keyword (such as city).

When data is updated in the memory forward index data, querying the memory index data according to the query word and determining the first target data comprises:

updating the first target data.

When data is acquired from the memory forward index data, querying the memory index data according to the query word and determining the first target data comprises:

obtaining the first target data and sending it to the request sender.

For the memory inverted index data, the method further comprises a process of establishing the memory inverted index data:

acquiring a fourth-class keyword (such as a coupon) and the set of fifth-class keywords (such as commodities) corresponding to it;

and establishing a correspondence between the fourth-class keyword and a set of document identifiers according to the pre-stored correspondence between document identifiers and fifth-class keywords (such as commodities), to form the memory inverted index data.

When the fourth-class keyword is a coupon rule, the fifth-class keywords are commodities;

the process of establishing the memory inverted index data comprises:

acquiring a coupon rule and the commodity set corresponding to the coupon rule;

When retrieving from the memory inverted index data, the corresponding inverted data can be obtained simply by querying the memory inverted index data with the fourth-class keyword.

When the fifth-class keywords corresponding to a fourth-class keyword change, the inverted index table can be rebuilt directly using the above process.

For invalidated data, such as an expired coupon rule, the inverted index table obtained by querying the memory inverted index data with the fourth-class keyword can be deleted.
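The lookup, rebuild, and delete operations on the inverted tables can be sketched as a small Python class. The class and method names are hypothetical. The sketch assumes a rebuild simply replaces the whole table for a rule, as the paragraph above describes, rather than patching it incrementally.

```python
from typing import Dict, Iterable, List


class CouponInvertedIndex:
    """Hypothetical in-memory store: coupon rule -> sorted document IDs."""

    def __init__(self, commodity_to_doc_ids: Dict[str, List[int]]):
        self.commodity_to_doc_ids = commodity_to_doc_ids  # pre-stored correspondence
        self.tables: Dict[str, List[int]] = {}

    def rebuild(self, rule: str, commodities: Iterable[str]) -> None:
        """(Re)create the whole table for a rule whose commodity set changed."""
        doc_ids = set()
        for commodity in commodities:
            doc_ids.update(self.commodity_to_doc_ids.get(commodity, []))
        self.tables[rule] = sorted(doc_ids)

    def lookup(self, rule: str) -> List[int]:
        """Retrieve the inverted data by the fourth-class keyword (rule); empty if absent."""
        return list(self.tables.get(rule, []))

    def invalidate(self, rule: str) -> None:
        """Delete the table of an expired rule."""
        self.tables.pop(rule, None)


index = CouponInvertedIndex({"iphone8": [0], "redmi_note3": [3, 5]})
index.rebuild("30-30", ["iphone8"])
before_change = index.lookup("30-30")
index.rebuild("30-30", ["iphone8", "redmi_note3"])
after_change = index.lookup("30-30")
index.invalidate("30-30")
after_expiry = index.lookup("30-30")
```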

Example four

The present application further discloses a data manipulation apparatus corresponding to the method of the third embodiment. As shown in FIG. 11, the apparatus comprises:

a request receiving unit 11, configured to receive a data operation request sent by a requester; the data operation request comprises a query word and an operation instruction;

a first target data determining unit 12, configured to query the memory index data according to the query word and determine first target data containing a target document identifier, where the memory index data is established based on the correspondence between document identifiers and some of the keywords in the disk index data;

and an operation execution unit 13, configured to perform the corresponding operation on the first target data according to the operation instruction.

When the data operation request is a data acquisition request, the operation execution unit 13 is configured to return the acquired first target data to the data requester. When the data operation request is a data update request, the operation execution unit 13 is configured to replace and update the acquired first target data.

Further, when the data operation request is a data acquisition request and the user wishes to obtain more detailed data, the apparatus further comprises:

a second target data determining unit 14, configured to query the disk index data according to the target document identifier to obtain second target data;

and a final data determining unit 15, configured to replace the corresponding part of the second target data with the first target data, generate final target data, and return it to the data requester.

Of course, the first target data determining unit 12 in the present application may also be configured to obtain multiple pieces of data from memory for intersection calculation to determine the first target data.

When data is updated in the memory forward index data, the first target data determining unit 12 comprises:

a target array determining unit, configured to determine a target array in the memory forward index data according to the document identifier corresponding to the first-class keyword in the query word;

a target data determining subunit, configured to determine the first target data from the target array according to the second-class keyword in the query word;

and the operation execution unit 13 is configured to update the first target data.

When data is acquired from the memory forward index data, the first target data determining unit 12 is specifically configured to:

determine a target array in the memory forward index data according to the document identifier corresponding to the first-class keyword in the query word, and determine the corresponding first target data from the target array according to the second-class keyword in the query word;

and the operation execution unit 13 is configured to obtain the first target data and send it to the request sender.

For the memory inverted index data, the apparatus further comprises an inverted index creating unit, configured to:

acquire a fourth-class keyword (such as a coupon) and the set of fifth-class keywords (such as commodities) corresponding to it, and establish a correspondence between the fourth-class keyword and a set of document identifiers according to the pre-stored correspondence between document identifiers and fifth-class keywords (such as commodities), to form the memory inverted index data.

The inverted index creating unit is specifically configured to acquire a coupon rule and the corresponding commodity set. It then establishes a correspondence between the coupon rule and its document identifier set, according to the pre-stored correspondence between document identifiers and commodities, to form the memory inverted index data.

Example five

Corresponding to the above method and apparatus, the present application further discloses, in another aspect, a computer system, comprising:

one or more processors; and

Fig. 12 illustrates an architecture of a computer system, which may include, in particular, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514, and memory 1520 may be communicatively coupled via a communication bus 1530.

The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits. It is configured to execute related programs to implement the technical solution provided by the present application.

The memory 1520 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500 and a Basic Input Output System (BIOS) for controlling low-level operations of the computer system 1500. A web browser 1523, a data storage management system 1524, an icon font processing system 1525, and the like may also be stored. The icon font processing system 1525 may be an application program that implements the operations of the foregoing steps in this embodiment of the application. In short, when the technical solution provided by the present application is implemented in software or firmware, the relevant program code is stored in the memory 1520 and called for execution by the processor 1510.

The input/output interface 1513 is used to connect an input/output module for information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide the corresponding function. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc. Output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The network interface 1514 is used to connect a communication module (not shown) to enable the device to communicatively interact with other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

The bus 1530 includes paths for communicating information between the various components of the device, such as the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520.

In addition, the computer system 1500 may also obtain information of specific extraction conditions from the virtual resource object extraction condition information database 1541 for performing condition judgment, and the like.

It should be noted that although the above devices only show the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation, the devices may also include other components necessary for proper operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.

Based on this understanding, the technical solutions of the present application, or the portions thereof that contribute to the prior art, can be embodied in the form of a software product. The software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disk. It includes instructions for causing a computer device (which may be a personal computer, a cloud server, a network device, etc.) to execute the methods described in the embodiments, or in some portions of the embodiments, of the present application.

The system embodiments described above are merely illustrative. Elements described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network elements.

The data processing method, apparatus, and device provided by the present application have been described in detail above. Specific examples are used to explain the principles and implementation of the present application. The description of the embodiments is only intended to help understand the method and core idea of the present application. Those skilled in the art may make changes to the specific implementation and scope of application based on the idea of the present application.

Claims

1. A data manipulation method, the method comprising:

2. The data manipulation method of claim 1 wherein the data manipulation request is a data acquisition request;

the method further comprises the following steps:

3. The data manipulation method of claim 1, wherein the memory index comprises memory forward index data formed by a two-dimensional array, the first dimension of which is the document identifier corresponding to a first-class keyword and the second dimension of which is a second-class keyword corresponding to the first-class keyword.

4. The data manipulation method of claim 3 wherein the data manipulation request is a data update request;

updating the first target data.

5. The data manipulation method of claim 3 wherein the data manipulation request is a data acquisition request;

obtaining the first target data and sending it to the request sender.

6. The data manipulation method of claim 3, wherein the two-dimensional array consists of a document identifier array corresponding to commodities and commodity price arrays for cities across the country, the subscript of each commodity price in an array corresponding to a city;

7. The data manipulation method of any one of claims 1 to 6, wherein the memory index comprises memory inverted index data, the method further comprising:

a process of establishing the memory inverted index data, comprising:

8. The data manipulation method of claim 7, wherein the fourth-class keyword is a coupon rule and the fifth-class keywords are commodities;

acquiring a coupon rule and a commodity set corresponding to the coupon rule;

9. A data manipulation device, comprising:

10. A computer system, comprising:

one or more processors; and