WO2017186049A1 - Information processing method and apparatus (信息处理方法和装置) - Google Patents

Information processing method and apparatus

Info

Publication number
WO2017186049A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
tag
sequence
binary number
frequency
Application number
PCT/CN2017/081200
Other languages
English (en)
French (fr)
Inventor
徐峰 (Xu Feng)
Original Assignee
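北京京东尚科信息技术有限公司 (Beijing Jingdong Shangke Information Technology Co., Ltd.)
北京京东世纪贸易有限公司 (Beijing Jingdong Century Trading Co., Ltd.)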
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2017186049A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present application relates to the field of computer technology, in particular to Internet technology, and more specifically to an information processing method and apparatus.
  • The purpose of the present application is to propose an improved information processing method and apparatus that solve the technical problems mentioned in the background section.
  • The present application provides an information processing method comprising: acquiring an object information set to be processed, where each piece of object information in the set describes the labels contained in a respective object of a preset object set, and every label contained in any object belongs to a preset label set; for each piece of object information in the set, generating in order a 0 or 1 for each label of a preset label sequence according to whether the object described by that object information contains the label, thereby forming a binary number whose length equals the length of the label sequence, where 1 is generated when the label is contained by the object and 0 when it is not, and the label sequence is formed by arranging the labels in the label set in a set order; performing a compression step on the binary number to form compressed data, the compression step comprising dividing the binary number into at least one segment of a preset length, adding a separator between adjacent segments of the binary number, and removing the consecutive 0s at the end of each segment; and storing the compressed data.
  • The compression step further comprises: removing the consecutive 0s at the end of the binary number before dividing the binary number into at least one segment of the preset length; or, after adding separators between adjacent segments of the binary number and removing the consecutive 0s at the end of each segment, removing the consecutive separators at the end of the result.
  • The preset length may be 64.
  • The object information in the object information set is variable, and the method further includes: after a preset time point is reached, acquiring the frequency with which each label in the label set occurs in the object set at the current time; updating the positions of the labels in the label sequence according to the acquired frequencies; and updating the compressed data according to the updated label positions.
  • Acquiring the frequency with which each label in the label set occurs in the object set at the current time includes: acquiring the stored frequencies of occurrence of each label in the object set at at least one historical time point; and performing data fitting on the acquired frequencies to predict the frequency with which each label in the label set occurs in the object set at the current time.
  • Updating the positions of the labels in the label sequence according to the acquired frequencies includes: generating an ideal label sequence according to the frequencies of occurrence of the labels in the label set, where the ideal label sequence is the label sequence that minimizes the storage space of the compressed data; determining, for each label, a weight indicating how much that label's position change benefits storage optimization when the label sequence is changed to the ideal label sequence to reduce the storage space occupied by the compressed data; and selecting at least one label with the largest weight as a label whose position is to be changed, and moving the selected label to the position indicated by the ideal label sequence.
  • For two labels whose positions in the label sequence are swapped, updating the compressed data according to the updated label positions comprises three steps. First, the digit of the first of the two labels is copied to a new location; during the copy, writes to the first label's digit go to both the new location and the first label's original location, and after the copy completes the digit at the first label's original location is cleared and reads and writes of the first label's digit are switched to the new location. Second, the digit of the second label is copied to the first label's original location; during the copy, writes to the second label's digit go to both the second label's original location and the first label's original location, and after the copy completes the digit at the second label's original location is cleared and reads and writes of the second label's digit are switched to the first label's original location. Third, the first label's digit stored at the new location is copied to the second label's original location; during the copy, writes to the first label's digit go to both the new location and the second label's original location, and after the copy completes the digit at the new location is cleared and reads and writes of the first label's digit are switched to the second label's original location.
  • The present application provides an information processing apparatus, including: an acquisition unit configured to acquire an object information set to be processed, where each piece of object information in the set describes the labels contained in a respective object of a preset object set, and every label contained in any object belongs to a preset label set; a generation unit configured to, for each piece of object information in the set, generate in order a 0 or 1 for each label of a preset label sequence according to whether the object described by that object information contains the label, forming a binary number whose length equals the length of the label sequence, where 1 is generated when the label is contained by the object and 0 when it is not, and the label sequence is formed by arranging the labels in the label set in a set order; a compression unit configured to perform a compression step on the binary number to form compressed data, the compression step including dividing the binary number into at least one segment of a preset length, adding a separator between adjacent segments of the binary number, and removing the consecutive 0s at the end of each segment; and a storage unit configured to store the compressed data.
  • The compression step performed by the compression unit specifically includes: removing the consecutive 0s at the end of the binary number before dividing the binary number into at least one segment of the preset length; or, after adding separators between adjacent segments of the binary number and removing the consecutive 0s at the end of each segment, removing the consecutive separators at the end of the result.
  • The preset length may be 64.
  • The object information in the object information set is variable, and the apparatus further includes: a frequency acquisition unit configured to acquire, after a preset time point is reached, the frequency with which each label in the label set occurs in the object set at the current time; a sequence update unit configured to update the positions of the labels in the label sequence according to the acquired frequencies; and a data update unit configured to update the compressed data according to the updated label positions.
  • The frequency acquisition unit includes: a historical frequency acquisition subunit configured to acquire the stored frequencies of occurrence of each label in the object set at at least one historical time point; and a prediction subunit configured to perform data fitting on the acquired frequencies to predict the frequency with which each label in the label set occurs in the object set at the current time.
  • The sequence update unit is further configured to: generate an ideal label sequence according to the frequencies of occurrence of the labels in the label set, where the ideal label sequence is the label sequence that minimizes the storage space of the compressed data; determine, for each label, a weight indicating how much its position change benefits storage optimization when the label sequence is changed to the ideal label sequence to reduce the storage space occupied by the compressed data; and select at least one label with the largest weight as a label whose position is to be changed, and move the selected label to the position indicated by the ideal label sequence.
  • The data update unit is further configured to, for two labels whose positions in the label sequence are swapped: copy the digit of the first label to a new location, writing the first label's digit to both the new location and the first label's original location during the copy, then clear the first label's original location and switch reads and writes of the first label's digit to the new location; copy the digit of the second label to the first label's original location, writing the second label's digit to both the second label's original location and the first label's original location during the copy, then clear the second label's original location and switch reads and writes of the second label's digit to the first label's original location; and copy the first label's digit stored at the new location to the second label's original location, writing the first label's digit to both the new location and the second label's original location during the copy, then clear the new location and switch reads and writes of the first label's digit to the second label's original location.
  • With the information processing method and apparatus provided by the present application, a binary number is generated according to the set label order, the digits of the binary number are segmented, and the 0s at the end of each segment are removed. This effectively reduces the length of the digits that are finally stored, so that a large amount of data can be stored with little memory.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flow chart of one embodiment of an information processing method according to the present application.
  • FIG. 3 is a flow chart of still another embodiment of an information processing method according to the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of an information processing apparatus according to the present application.
  • FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 in which an embodiment of an information processing method or information processing apparatus of the present application may be applied.
  • As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links and fiber optic cables.
  • A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, 103 may be various electronic devices, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
  • The server 105 may be a server that provides various services, such as a cache server that provides data support to the terminal devices 101, 102, 103.
  • The cache server may perform processing such as read and write operations on received data requests and return the processing result (for example, the data that was read) to the terminal devices.
  • The information processing method provided by the embodiments of the present application is generally performed by the server 105; accordingly, the information processing apparatus is generally disposed in the server 105.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; any number of each may be provided as the implementation requires.
  • As shown in FIG. 2, the information processing method includes the following steps:
  • Step 201: acquire the object information set to be processed.
  • In this embodiment, the electronic device on which the information processing method runs (for example, the server shown in FIG. 1) may acquire the object information set to be processed from another device (for example, a database server) through a wired or wireless connection.
  • Each piece of object information in the object information set describes the labels contained in a respective object of the object set. The content of an object is represented by labels, and the labels it contains all belong to the preset label set.
  • The wireless connection above may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra-wideband), and other wireless connection methods now known or developed in the future.
  • Step 202: for each piece of object information in the object information set, generate in order a 0 or 1 for each label of the preset label sequence, according to whether the object described by that object information contains the label, to form a binary number whose length equals the length of the label sequence.
  • In this embodiment, the electronic device performs the following processing for each piece of object information: it generates, in order, a 0 or 1 for each label of the preset label sequence according to whether the object described by the object information contains that label.
  • The label sequence is formed by arranging the labels in the label set in a preset order.
  • For example, the order may be obtained by sorting the labels by their frequency of occurrence in the object set from highest to lowest, so that labels near the front of the sequence generally occur more often than labels further back.
  • Optionally, the order may follow descending frequency strictly.
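A minimal sketch of building a label sequence by descending frequency, assuming each object is represented as a Python set of label strings and that ties are broken alphabetically (the source does not specify a tie-break rule). The function name is hypothetical. It counts how many objects contain each label and sorts the preset label set by that count:

```python
from collections import Counter


def build_label_sequence(label_set: set[str], objects: list[set[str]]) -> list[str]:
    """Arrange the preset label set by descending frequency of occurrence.

    The frequency of a label is the number of objects that contain it.
    Labels that occur in no object get a count of 0 and go to the end.
    Ties are broken alphabetically (an assumption, for determinism).
    """
    counts = Counter(label for labels in objects for label in labels)
    return sorted(label_set, key=lambda label: (-counts[label], label))


objects = [{"red", "cotton"}, {"red"}, {"red", "wool"}, {"cotton"}]
label_set = {"red", "cotton", "wool", "linen"}
print(build_label_sequence(label_set, objects))  # ['red', 'cotton', 'wool', 'linen']
```

Sorting the whole preset label set, rather than only the labels seen in the data, keeps every label in the sequence even when its current count is zero.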
  • The electronic device may check, from Label1 through LabelN in order, whether the current object contains each label, and generate 0 or 1 according to the result: 1 if the label is contained and 0 if it is not. For example, if the label sequence contains 500 labels, label001 through label500, and an object has the two labels label001 and label130, the generated binary number is a 1, followed by 128 zeros, then a 1, followed by 370 zeros (500 digits in total).
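The digit-generation rule can be sketched as follows, again treating an object as a set of label strings (an illustrative representation and a hypothetical function name, not taken from the source). The usage lines reproduce the 500-label example: the two 1s land at positions 1 and 130, with 128 zeros between them and 370 zeros after the second.

```python
def encode_object(labels: set[str], label_sequence: list[str]) -> str:
    """Generate one digit per label, in sequence order:
    '1' if the object contains the label, '0' otherwise."""
    return "".join("1" if label in labels else "0" for label in label_sequence)


# The example from the text: labels label001 ... label500,
# and an object that contains only label001 and label130.
label_sequence = [f"label{i:03d}" for i in range(1, 501)]
bits = encode_object({"label001", "label130"}, label_sequence)

assert len(bits) == len(label_sequence) == 500
assert bits == "1" + "0" * 128 + "1" + "0" * 370
print(bits.index("1", 1) + 1)  # 130, the 1-based position of label130
```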
  • Step 203: perform a compression step on the binary number to form compressed data.
  • In this embodiment, the electronic device performs the compression step on the binary number.
  • The compression step includes dividing the binary number into at least one segment of a preset length, adding a separator between adjacent segments of the binary number, and removing the consecutive 0s at the end of each segment.
  • The electronic device may divide the binary number into segments of the preset length, for example 16 or 32 bits, and then remove the 0s at the end of each segment.
  • In some implementations, the preset length may be 64 bits. Taking the binary number above (a 1, 128 zeros, a 1, 370 zeros) as an example: it is split into 64-bit segments, so the first 1 falls in the first segment and the second 1 (position 130) in the third; separators are added between the segments and the 0s at the end of each segment are removed. The resulting compressed data is "1,,01,,,,," (eight segments, of which only the first and third are non-empty).
  • Optionally, the consecutive 0s at the end of the binary number may be removed before the binary number is divided into segments of the preset length; alternatively, after separators are added between adjacent segments and the 0s at the end of each segment are removed, the consecutive separators at the end of the result may be removed. Either way, the binary number above is compressed to "1,,01", which further reduces the storage space occupied.
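The compression step, including the optional removal of trailing separators, can be sketched as below. The comma separator and the representation of the binary number as a string of '0'/'1' characters are assumptions made for readability; a real implementation would likely pack the digits more tightly. The usage lines reproduce the 500-digit example.

```python
def compress(bits: str, segment_length: int = 64, separator: str = ",",
             strip_trailing_separators: bool = False) -> str:
    """Compress a binary number given as a string of '0'/'1' digits.

    1. Split the digits into segments of `segment_length`
       (the last segment may be shorter).
    2. Remove the consecutive 0s at the end of every segment.
    3. Join the segments with `separator` (assumed to be one character).
    4. Optionally drop the separators left at the very end; this gives the
       same result as stripping the trailing 0s of the whole number before
       splitting it.
    """
    segments = [bits[i:i + segment_length] for i in range(0, len(bits), segment_length)]
    data = separator.join(segment.rstrip("0") for segment in segments)
    if strip_trailing_separators:
        data = data.rstrip(separator)
    return data


bits = "1" + "0" * 128 + "1" + "0" * 370                 # label001 and label130 of 500
print(compress(bits))                                    # 1,,01,,,,,
print(compress(bits, strip_trailing_separators=True))    # 1,,01
print(compress(bits.rstrip("0")))                        # 1,,01 (strip first, then split)
```

Because segments are always cut from the front, stripping the number's trailing 0s first never shifts a segment boundary, which is why both optional variants produce the same output.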
  • Step 204: store the compressed data.
  • In this embodiment, the electronic device may store the compressed data formed in step 203.
  • The above embodiment of the present application generates a binary number according to the set label order, segments its digits, and removes the 0s at the end of each segment, which effectively reduces the length of the digits that are finally stored, so that a large amount of data can be stored with little memory.
  • Referring to FIG. 3, a flow 300 of yet another embodiment of the information processing method is illustrated.
  • In this embodiment, the object information in the object information set is variable.
  • The flow 300 of the information processing method includes the following steps:
  • Step 301: acquire the object information set to be processed.
  • The object information in the object information set is variable.
  • For the specific processing of step 301, refer to step 201 in the embodiment corresponding to FIG. 2.
  • Step 302: for each piece of object information in the object information set, generate in order a 0 or 1 for each label of the preset label sequence, according to whether the object described by that object information contains the label, to form a binary number whose length equals the length of the label sequence.
  • For step 302, refer to step 202 in the embodiment corresponding to FIG. 2.
  • Step 303: perform a compression step on the binary number to form compressed data.
  • For step 303, refer to step 203 in the embodiment corresponding to FIG. 2.
  • Step 304: store the compressed data.
  • For step 304, refer to step 204 in the embodiment corresponding to FIG. 2.
  • Step 305: after the preset time point is reached, obtain the frequency with which each label in the label set occurs in the object set at the current time.
  • In this embodiment, the electronic device may determine whether the current time has reached a preset time point; when it has, the electronic device may perform step 305 and the subsequent steps. For example, a start time and a time period may be set in advance, so that the preset time points are determined by the start time and the time period.
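One way to derive the preset time points from a start time and a period, as suggested above, is sketched below. The function names and the fixed-period schedule (time points at start, start + period, start + 2 × period, ...) are illustrative assumptions.

```python
from datetime import datetime, timedelta


def next_time_point(start: datetime, period: timedelta, now: datetime) -> datetime:
    """Earliest preset time point start + k * period (k >= 0) not before `now`."""
    if now <= start:
        return start
    # Floor division of a negative timedelta, negated, gives the ceiling
    # of (now - start) / period as an int.
    periods = -((start - now) // period)
    return start + periods * period


def is_time_point(start: datetime, period: timedelta, now: datetime) -> bool:
    """True when `now` falls exactly on one of the preset time points."""
    return now >= start and (now - start) % period == timedelta(0)


start = datetime(2024, 1, 1)          # hypothetical start time
period = timedelta(hours=6)           # hypothetical period
print(next_time_point(start, period, datetime(2024, 1, 1, 7, 30)))  # 2024-01-01 12:00:00
print(is_time_point(start, period, datetime(2024, 1, 1, 18, 0)))    # True
```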
  • The electronic device may obtain, in various ways, the frequency with which each label in the label set occurs in the object set at the current time; for example, Label1 occurs X times and Label2 occurs Y times.
  • Optionally, the frequencies in step 305 may be obtained as follows: first, acquire the stored frequencies of occurrence of each label in the object set at at least one historical time point; then, perform data fitting on the acquired frequencies to predict the frequency with which each label in the label set occurs in the object set at the current time.
  • This implementation predicts the current label frequencies by fitting the frequencies of each label in the object set at historical time points, so the frequencies can be obtained quickly and the overall running time is reduced.
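The source does not name a fitting method. A minimal sketch, assuming a straight-line trend fitted per label by ordinary least squares over (time point, frequency) pairs and extrapolated to the current time; the function names, the data layout, and the clamp at zero are hypothetical choices:

```python
def predict_frequency(history: list[tuple[float, float]], t_now: float) -> float:
    """Fit f(t) = a + b * t to one label's (time point, frequency) history by
    ordinary least squares and evaluate the fitted line at t_now."""
    if not history:
        raise ValueError("need at least one historical observation")
    n = len(history)
    mean_t = sum(t for t, _ in history) / n
    mean_f = sum(f for _, f in history) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in history)
    if var_t == 0:
        # All observations share one time point: no trend, reuse the mean.
        return mean_f
    cov = sum((t - mean_t) * (f - mean_f) for t, f in history)
    slope = cov / var_t
    # A frequency cannot be negative, so clamp the extrapolated value at 0.
    return max(0.0, mean_f + slope * (t_now - mean_t))


def predict_all(histories: dict[str, list[tuple[float, float]]],
                t_now: float) -> dict[str, float]:
    """Predict the current frequency of every label from its history."""
    return {label: predict_frequency(h, t_now) for label, h in histories.items()}


histories = {
    "Label1": [(0, 100), (1, 110), (2, 120)],
    "Label2": [(0, 50), (1, 45), (2, 40)],
}
print(predict_all(histories, t_now=3))  # {'Label1': 130.0, 'Label2': 35.0}
```

A higher-order or weighted fit could be substituted without changing the surrounding flow; only the per-label prediction function would differ.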
  • Step 306: update the positions of the labels in the label sequence according to the acquired frequencies of occurrence.
  • The labels in the label sequence were arranged in advance according to their frequencies of occurrence; arranging them from highest to lowest frequency helps save space when the 0s at the end of each segment are deleted.
  • Because the object information in the object information set is variable, that is, the labels contained in each object may change, after some time the labels in the label sequence may no longer be arranged from high to low frequency. It is therefore necessary to update the positions of the labels in the label sequence according to each label's frequency at the current time point. Note that the positions of only some of the labels may be changed according to the frequencies, or all labels may be rearranged from highest to lowest according to the latest frequencies.
  • Step 306 may specifically include the following steps:
  • First, an ideal label sequence is generated by arranging the labels in the label set from highest to lowest frequency of occurrence, where the ideal label sequence is the label sequence that minimizes the storage space of the compressed data.
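The effect of label order on the compressed size can be checked on a toy example. This sketch uses a segment length of 4, chosen only to keep the example small, and measures storage as characters of the comma-separated form; both are assumptions, as are the function names. It totals the compressed size of a few objects under two label orders. It compares orderings rather than proving that the descending-frequency order is always minimal.

```python
def compress(bits: str, segment_length: int, separator: str = ",") -> str:
    """Segment, strip each segment's trailing 0s, join, drop trailing separators."""
    segments = [bits[i:i + segment_length] for i in range(0, len(bits), segment_length)]
    return separator.join(s.rstrip("0") for s in segments).rstrip(separator)


def total_compressed_size(objects: list[set[str]], label_sequence: list[str],
                          segment_length: int) -> int:
    """Total number of characters needed to store every object's compressed data."""
    total = 0
    for labels in objects:
        bits = "".join("1" if label in labels else "0" for label in label_sequence)
        total += len(compress(bits, segment_length))
    return total


objects = [{"a"}, {"a"}, {"a", "b"}, {"b"}]
by_frequency = ["a", "b", "c", "d", "e", "f", "g", "h"]   # a occurs 3 times, b twice
reversed_order = list(reversed(by_frequency))
print(total_compressed_size(objects, by_frequency, segment_length=4))    # 6
print(total_compressed_size(objects, reversed_order, segment_length=4))  # 19
```

Placing the frequent labels first keeps their 1s in the leading segments, so most segments end in 0s that can be stripped.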
  • Next, when the label sequence is changed to the ideal label sequence so as to reduce the storage space occupied by the compressed data, a weight is determined for each label's position change, indicating how much that change benefits storage optimization. The weight may be determined from the label's position in the ideal label sequence.
  • The position may be the label's serial number in the ideal label sequence; alternatively, after the ideal label sequence is divided into segments of the preset length, the position may be given by the index of the segment containing the label together with the label's position within that segment. Generally, the nearer the front the position, the higher the weight.
  • Finally, at least one label with the largest weight is selected as a label whose position is to be changed, and each selected label is moved to the corresponding position in the ideal label sequence.
  • In this way, only the labels with larger weights are repositioned, which optimizes storage while avoiding the excessive running time that moving many labels would cause, striking a balance between space optimization and time optimization.
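The weighting and selection steps can be sketched as follows. The weight formula is an assumption: a misplaced label gets weight n − i, where i is its 0-based index in the ideal sequence, so labels that belong nearer the front weigh more, in line with the rule that earlier positions carry higher weights. Moving a label is modelled as swapping it with whichever label occupies its target slot. All names here are hypothetical.

```python
def move_weights(current: list[str], ideal: list[str]) -> dict[str, int]:
    """Weight every label whose current position differs from its ideal one.

    Assumption: weight = n - i, with i the label's 0-based ideal index.
    """
    n = len(ideal)
    position = {label: i for i, label in enumerate(current)}
    return {label: n - i for i, label in enumerate(ideal) if position[label] != i}


def move_top_labels(current: list[str], ideal: list[str],
                    k: int) -> tuple[list[str], list[tuple[str, str]]]:
    """Move up to k of the highest-weight labels to their ideal positions.

    Each move swaps the label with the label occupying its target slot, so it
    maps onto a two-label swap. Returns the new sequence and the swapped pairs.
    """
    sequence = list(current)
    weights = move_weights(sequence, ideal)
    chosen = sorted(weights, key=lambda label: weights[label], reverse=True)[:k]
    target = {label: i for i, label in enumerate(ideal)}
    swaps = []
    for label in chosen:
        here, there = sequence.index(label), target[label]
        if here != there:  # an earlier swap may already have placed it
            other = sequence[there]
            sequence[here], sequence[there] = other, label
            swaps.append((label, other))
    return sequence, swaps


current = ["c", "a", "b", "d"]
ideal = ["a", "b", "c", "d"]
print(move_top_labels(current, ideal, k=1))  # (['a', 'c', 'b', 'd'], [('a', 'c')])
print(move_top_labels(current, ideal, k=3))  # (['a', 'b', 'c', 'd'], [('a', 'c'), ('b', 'c')])
```

The parameter k is the time/space knob: a small k limits how many swaps (and therefore how much data movement) each update performs.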
  • Step 307: update the compressed data according to the updated positions of the labels in the label sequence.
  • In this embodiment, the electronic device may update the compressed data according to the position changes of the labels in the label sequence. Because the binary digits of each generated binary number correspond, in order, to the labels in the label sequence, when a label's position in the sequence changes, the values of the binary digits must be adjusted accordingly to keep the data correct.
  • Specifically, step 307 includes, for two labels whose positions in the label sequence are swapped: copying the digit of the first label to a new location, writing the first label's digit to both the new location and the first label's original location during the copy, and, after the copy completes, clearing the digit at the first label's original location and switching reads and writes of the first label's digit to the new location; then copying the digit of the second label to the first label's original location, writing the second label's digit to both the second label's original location and the first label's original location during the copy, and, after the copy completes, clearing the digit at the second label's original location and switching reads and writes of the second label's digit to the first label's original location; and finally copying the first label's digit from the new location to the second label's original location in the same way, after which the new location is cleared and reads and writes of the first label's digit are switched to the second label's original location.
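The copy-and-switch swap can be simulated in memory. A minimal sketch, assuming the uncompressed digits are kept as a matrix with one column per label position plus one spare column playing the role of the "new location"; the class and method names are hypothetical, re-compressing after the swap is omitted, and concurrency is only simulated with a callback rather than real threads. It follows the three steps given in the summary of the method: move the first label to the new location, the second label to the first label's original slot, then the first label on to the second label's original slot, writing to both slots during each copy.

```python
from typing import Callable, Optional


class LabelDigitStore:
    """Digits of many objects, one column (slot) per label position,
    plus a spare column used as the temporary new location."""

    def __init__(self, rows: list[list[int]], label_sequence: list[str]):
        self.rows = [list(row) + [0] for row in rows]   # append the spare slot
        self.spare = len(label_sequence)
        self.read_slot = {label: i for i, label in enumerate(label_sequence)}
        self.write_slots = {label: [i] for i, label in enumerate(label_sequence)}

    def read(self, obj: int, label: str) -> int:
        return self.rows[obj][self.read_slot[label]]

    def write(self, obj: int, label: str, digit: int) -> None:
        # While a label is being moved, its digit is written to both slots.
        for slot in self.write_slots[label]:
            self.rows[obj][slot] = digit

    def move(self, label: str, dst: int,
             during_copy: Optional[Callable[[int], None]] = None) -> None:
        src = self.read_slot[label]
        self.write_slots[label] = [src, dst]        # start dual writes
        for obj, row in enumerate(self.rows):
            row[dst] = row[src]                     # copy this object's digit
            if during_copy is not None:
                during_copy(obj)                    # simulated concurrent write
        for row in self.rows:
            row[src] = 0                            # clear the original slot
        self.read_slot[label] = dst                 # switch reads ...
        self.write_slots[label] = [dst]             # ... and writes

    def swap(self, first: str, second: str) -> None:
        a, b = self.read_slot[first], self.read_slot[second]
        self.move(first, self.spare)   # 1: first label -> new location
        self.move(second, a)           # 2: second label -> first's original slot
        self.move(first, b)            # 3: first label -> second's original slot


store = LabelDigitStore([[1, 0, 0], [0, 1, 1]], ["x", "y", "z"])
store.swap("x", "z")
print(store.rows)                               # [[0, 0, 1, 0], [1, 1, 0, 0]]
print(store.read(0, "x"), store.read(1, "z"))   # 1 1
```

The dual write is what lets reads and writes continue during a move: a write that lands after an object's digit has been copied still reaches the new slot, so nothing is lost when reads switch over.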
  • The flow 300 of the information processing method in this embodiment can thus continuously optimize storage as the data changes.
  • The present application provides an embodiment of an information processing apparatus. This apparatus embodiment corresponds to the method embodiment described above, and the apparatus may be applied in various electronic devices.
  • As shown in FIG. 4, the information processing apparatus 400 of this embodiment includes an acquisition unit 401, a generation unit 402, a compression unit 403, and a storage unit 404.
  • The acquisition unit 401 is configured to acquire an object information set to be processed, where each piece of object information in the set describes the labels contained in a respective object of a preset object set, and every label contained in any object belongs to a preset label set;
  • the generation unit 402 is configured to, for each piece of object information in the set, generate in order a 0 or 1 for each label of the preset label sequence according to whether the object described by that object information contains the label, forming a binary number whose length equals the length of the label sequence, where 1 is generated when the label is contained by the object, 0 is generated when it is not, and the label sequence is formed by arranging the labels in the label set in a set order;
  • the compression unit 403 is configured to perform a compression step on the binary number to form compressed data, the compression step including dividing the binary number into at least one segment of a preset length, adding a separator between adjacent segments of the binary number, and removing the consecutive 0s at the end of each segment; and the storage unit 404 is configured to store the compressed data.
  • For the specific processing of the acquisition unit 401, generation unit 402, compression unit 403, and storage unit 404 of the information processing apparatus 400, refer to steps 201, 202, 203, and 204 of the embodiment corresponding to FIG. 2; details are not repeated here.
  • Optionally, the compression step performed by the compression unit 403 specifically includes: removing the consecutive 0s at the end of the binary number before dividing it into at least one segment of the preset length; or, after adding separators between adjacent segments of the binary number and removing the consecutive 0s at the end of each segment, removing the consecutive separators at the end of the result.
  • Optionally, the preset length is 64.
  • Optionally, the object information in the object information set is variable, and the information processing apparatus further includes: a frequency acquisition unit (not shown) configured to acquire, after the preset time point is reached, the frequency with which each label in the label set occurs in the object set at the current time; a sequence update unit (not shown) configured to update the positions of the labels in the label sequence according to the acquired frequencies; and a data update unit (not shown) configured to update the compressed data according to the updated label positions.
  • Optionally, the frequency acquisition unit includes: a historical frequency acquisition subunit (not shown) configured to acquire the stored frequencies of occurrence of each label in the object set at at least one historical time point; and a prediction subunit (not shown) configured to perform data fitting on the acquired frequencies to predict the frequency with which each label in the label set occurs in the object set at the current time.
  • Optionally, the sequence update unit is further configured to: generate an ideal label sequence according to the frequencies of occurrence of the labels in the label set, where the ideal label sequence is the label sequence that minimizes the storage space of the compressed data; determine, for each label, a weight indicating how much its position change benefits storage optimization; and select at least one label with the largest weight as a label whose position is to be changed, and move the selected label to the position indicated by the ideal label sequence.
  • Optionally, the data update unit is further configured to, for two labels whose positions in the label sequence are swapped, copy the digit of the first label to a new location, writing the first label's digit to both the new location and the first label's original location during the copy, and, after the copy is completed, clear the original location and switch reads and writes of the first label's digit to the new location.
  • Referring to FIG. 5, there is shown a schematic structural diagram of a computer system 500 suitable for implementing a terminal device or server of an embodiment of the present application.
  • the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503.
  • the RAM 503 also stores various programs and data required for the operation of the system 500.
  • the CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also coupled to bus 504.
  • the following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse and the like; an output portion 507 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet.
  • A drive 510 is also connected to the I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage portion 508 as needed.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511.
  • each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function.
  • In some alternative implementations, the functions noted in the blocks can also occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented by software or by hardware.
  • the described units may also be provided in a processor, which may, for example, be described as a processor including an acquisition unit, a generation unit, a compression unit and a storage unit.
  • the names of these units do not, in some cases, constitute a limitation on the units themselves.
  • for example, the storage unit may also be described as "a unit that stores compressed data".
  • the present application further provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus described in the foregoing embodiments, or a stand-alone non-volatile computer storage medium that is not assembled into a terminal.
  • the non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: acquire each piece of object information in a set of object information to be processed, each piece of object information describing the tags contained in a respective object of a preset object set, and every tag contained in each object of the object set belonging to a preset tag set; for each piece of object information, generate 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence, where the digit 1 is generated when the tag is contained in the object and the digit 0 when it is not, and the tag sequence is formed by arranging the tags in the tag set in a set order; and perform a compression step on the binary number to form compressed data,
  • the compression step including: dividing the binary number into at least one segment by a preset length; and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment; the device then stores the compressed data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

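An information processing method and apparatus. A specific implementation of the method includes: acquiring a set of object information to be processed (201); for each piece of object information in the object information set, generating 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence (202); performing a compression step on the binary number to form compressed data (203), the compression step including: dividing the binary number into at least one segment by a preset length, and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment; and storing the compressed data (204). This implementation optimizes storage.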

Description

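Information processing method and apparatus
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201610274281.9, filed on April 27, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to an information processing method and apparatus.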
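Background
In the era of big data, massive numbers of objects need to be stored in storage media, for example in caches that support high-speed access. For objects whose information consists entirely of tags, making good use of storage space requires storing the information that describes the objects in as little memory as possible. In the prior art, the information generated to describe such tag-only objects still occupies a relatively large amount of storage space and needs to be compressed further.
Summary
The purpose of the present application is to provide an improved information processing method and apparatus that solve the technical problem mentioned in the Background section above.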
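In a first aspect, the present application provides an information processing method, the method including: acquiring a set of object information to be processed, each piece of object information in the set describing the tags contained in a respective object of a preset object set, and every tag contained in each object of the object set belonging to a preset tag set; for each piece of object information in the object information set, generating 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence, where the digit 1 is generated when the tag is contained in the object and the digit 0 when it is not, and the tag sequence is formed by arranging the tags in the tag set in a set order; performing a compression step on the binary number to form compressed data, the compression step including: dividing the binary number into at least one segment by a preset length, and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment; and storing the compressed data.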
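In some embodiments, the compression step further includes: before dividing the binary number into at least one segment by the preset length, removing the trailing consecutive zeros of the binary number; or, after adding the separators between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment, removing the trailing consecutive separators of the binary number.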
In some embodiments, the preset length is 64.
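In some embodiments, the object information in the object information set is variable, and the method further includes: after a preset time point is reached, acquiring the current frequency of occurrence in the object set of each tag in the tag set; updating the positions of the tags in the tag sequence according to the acquired frequencies; and updating the compressed data according to the updated positions of the tags in the tag sequence.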
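In some embodiments, acquiring the current frequency of occurrence in the object set of each tag in the tag set includes: acquiring the frequency of occurrence of each tag in the object set as stored at at least one historical time point; and performing data fitting on the acquired frequencies to predict the current frequency of occurrence in the object set of each tag in the tag set.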
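In some embodiments, updating the positions of the tags in the tag sequence according to the acquired frequencies includes: generating an ideal tag sequence from the tags in the tag set according to their frequencies of occurrence, the ideal tag sequence being the tag sequence for which the compressed data occupies the least storage space; determining, for the change from the tag sequence to the ideal tag sequence that reduces the storage space occupied by the compressed data, a weight indicating how much each tag's change of position benefits storage optimization; and selecting at least one tag with the largest weight as a tag whose position is to be changed, and moving the selected tag to the position indicated by the ideal tag sequence.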
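In some embodiments, updating the compressed data according to the updated positions of the tags in the tag sequence includes: for two tags in the tag sequence whose positions are swapped with each other, copying the digits of the first of the two tags to a new location, writing the first tag's digits at both the new location and the first tag's original position during the copy, and, once the copy is complete, clearing the digits at the first tag's original position and switching reads and writes of the first tag's digits to the new location; copying the digits of the second tag to the first tag's original position, writing the second tag's digits at both the second tag's original position and the first tag's original position during the copy, and, once the copy is complete, clearing the digits at the second tag's original position and switching reads and writes of the second tag's digits to the first tag's original position; and copying the first tag's digits stored at the new location to the second tag's original position, writing the first tag's digits at both the new location and the second tag's original position during the copy, and, once the copy is complete, clearing the digits at the new location and switching reads and writes of the first tag's digits to the second tag's original position.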
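In a second aspect, the present application provides an information processing apparatus, the apparatus including: an acquisition unit, configured to acquire a set of object information to be processed, each piece of object information in the set describing the tags contained in a respective object of a preset object set, and every tag contained in each object of the object set belonging to a preset tag set; a generation unit, configured to, for each piece of object information in the object information set, generate 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence, where the digit 1 is generated when the tag is contained in the object and the digit 0 when it is not, and the tag sequence is formed by arranging the tags in the tag set in a set order; a compression unit, configured to perform a compression step on the binary number to form compressed data, the compression step including: dividing the binary number into at least one segment by a preset length, and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment; and a storage unit, configured to store the compressed data.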
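In some embodiments, the compression step performed by the compression unit specifically includes: before dividing the binary number into at least one segment by the preset length, removing the trailing consecutive zeros of the binary number; or, after adding the separators between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment, removing the trailing consecutive separators of the binary number.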
In some embodiments, the preset length is 64.
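In some embodiments, the object information in the object information set is variable, and the apparatus further includes: a frequency acquisition unit, configured to acquire, after a preset time point is reached, the current frequency of occurrence in the object set of each tag in the tag set; a sequence update unit, configured to update the positions of the tags in the tag sequence according to the acquired frequencies; and a data update unit, configured to update the compressed data according to the updated positions of the tags in the tag sequence.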
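In some embodiments, the frequency acquisition unit includes: a historical frequency acquisition subunit, configured to acquire the frequency of occurrence of each tag in the object set as stored at at least one historical time point; and a prediction subunit, configured to perform data fitting on the acquired frequencies to predict the current frequency of occurrence in the object set of each tag in the tag set.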
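In some embodiments, the sequence update unit is further configured to: generate an ideal tag sequence from the tags in the tag set according to their frequencies of occurrence, the ideal tag sequence being the tag sequence for which the compressed data occupies the least storage space; determine, for the change from the tag sequence to the ideal tag sequence that reduces the storage space occupied by the compressed data, a weight indicating how much each tag's change of position benefits storage optimization; and select at least one tag with the largest weight as a tag whose position is to be changed, and move the selected tag to the position indicated by the ideal tag sequence.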
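In some embodiments, the data update unit is further configured to: for two tags in the tag sequence whose positions are swapped with each other, copy the digits of the first of the two tags to a new location, writing the first tag's digits at both the new location and the first tag's original position during the copy, and, once the copy is complete, clear the digits at the first tag's original position and switch reads and writes of the first tag's digits to the new location; copy the digits of the second tag to the first tag's original position, writing the second tag's digits at both the second tag's original position and the first tag's original position during the copy, and, once the copy is complete, clear the digits at the second tag's original position and switch reads and writes of the second tag's digits to the first tag's original position; and copy the first tag's digits stored at the new location to the second tag's original position, writing the first tag's digits at both the new location and the second tag's original position during the copy, and, once the copy is complete, clear the digits at the new location and switch reads and writes of the first tag's digits to the second tag's original position.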
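The information processing method and apparatus provided by the present application generate the binary numbers according to a set tag order, divide the digits of each binary number into segments, and remove the zeros at the end of each segment. This effectively reduces the number of digits that are finally stored, so that large amounts of data can be stored with relatively little memory.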
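Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flowchart of an embodiment of the information processing method according to the present application;
FIG. 3 is a flowchart of another embodiment of the information processing method according to the present application;
FIG. 4 is a schematic structural diagram of an embodiment of the information processing apparatus according to the present application;
FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present application.
Detailed description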
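The present application is described in further detail below with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the relevant invention.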
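It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments can be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in combination with the embodiments.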
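FIG. 1 shows an exemplary system architecture 100 to which embodiments of the information processing method or information processing apparatus of the present application can be applied.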
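As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.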
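Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications can be installed on the terminal devices 101, 102, 103.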
The terminal devices 101, 102 and 103 can be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 (MPEG Audio Layer III) players, MP4 players, laptop computers and desktop computers.
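The server 105 may be a server providing various services, for example a cache server that provides data support for the terminal devices 101, 102, 103. The cache server can read, write and otherwise process received data requests, and return the processing results (for example, the data that was read) to the terminal devices.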
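It should be noted that the information processing method provided by the embodiments of the present application is generally executed by the server 105; accordingly, the information processing apparatus is generally provided in the server 105.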
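It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.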
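Continuing to refer to FIG. 2, a flow 200 of an embodiment of the information processing method according to the present application is shown. The information processing method includes the following steps:
Step 201: acquire the set of object information to be processed.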
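In this embodiment, the electronic device on which the information processing method runs (for example, the server shown in FIG. 1) can acquire each piece of object information in the object information set to be processed from other devices (for example, a database server) through a wired or wireless connection. Each piece of object information in the set describes the tags contained in a respective object of the object set. The content of each object consists entirely of tags, and all of the tags it contains belong to the preset tag set. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra-wideband) and other wireless connections now known or developed in the future.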
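Step 202: for each piece of object information in the object information set, generate 0 or 1 in turn according to whether each tag in the preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence.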
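In this embodiment, based on the object information set obtained in step 201, the electronic device processes each piece of object information as follows. For a piece of object information, the electronic device generates 0 or 1 in turn according to whether each tag in the preset tag sequence is contained in the object described by that object information. The tag sequence is formed by arranging the tags in the tag set in an order preset for the tags. Optionally, the order may be generated by sorting the tags in descending order of their frequency of occurrence in the object set, so that tags near the front of the sequence generally occur more often than tags near the back. Optionally, the order may be strictly by descending frequency. For example, if the tag sequence contains N tags, Label1, Label2, Label3, ..., LabelN from first to last, then Label1 occurs in the object set at least as often as Label2, Label2 at least as often as Label3, and so on. The electronic device checks, from Label1 through LabelN in turn, whether each tag is contained in the current object and generates 0 or 1 accordingly: 1 if the tag is contained and 0 if it is not. For example, if the tag sequence contains 500 tags, label001 to label500, and the object has the two tags label001 and label130, the generated binary number is 1, followed by 128 zeros, then 1, followed by 370 zeros (500 digits in total).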
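Step 202 can be sketched in Python as follows. This is an illustrative sketch, not the applicant's implementation. It assumes that the tag sequence is a list of tag names, that an object's tags arrive as a collection of names, and that the binary number is kept as a string of '0'/'1' characters. `encode_tags` is a hypothetical name.

```python
from typing import Iterable, Sequence


def encode_tags(tag_sequence: Sequence[str], object_tags: Iterable[str]) -> str:
    """Return one '1'/'0' digit per tag in tag_sequence, in sequence order."""
    present = set(object_tags)  # set membership keeps each lookup O(1)
    return "".join("1" if tag in present else "0" for tag in tag_sequence)


# The example from the text: 500 tags, and the object holds label001 and label130.
sequence = [f"label{i:03d}" for i in range(1, 501)]
bits = encode_tags(sequence, {"label001", "label130"})
print(len(bits), bits.count("1"), bits.index("1", 1))  # 500 2 129
```

The value 129 is the 0-based index of the second 1, which makes it the 130th digit.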
Step 203: perform the compression step on the binary number to form compressed data.
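In this embodiment, based on the binary number generated in step 202, the electronic device performs the compression step on it. The compression step includes: dividing the binary number into at least one segment by a preset length, and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment. When performing the compression step, the electronic device splits the binary number into segments of the preset length, for example 16 or 32 bits. It then removes the trailing zeros from each segment.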
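In some optional implementations of this embodiment, the preset length may be 64 bits. Take the binary number above (1, followed by 128 zeros, then 1, followed by 370 zeros) as an example. Splitting it into 64-bit segments, inserting separators between the segments and then removing the trailing zeros of each segment yields the compressed data 1,,01,,,,,. The 500 digits form eight segments, the last of which holds 52 digits. After trimming, only the first segment (1) and the third segment (01, because label130 is the second digit of the third segment) are non-empty, and all seven separators remain.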
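A minimal Python sketch of the compression step follows. It assumes the same '0'/'1' string representation and a comma as the separator, as in the example above. `compress` is a hypothetical helper. The final segment may be shorter than the preset length.

```python
def compress(bits: str, seg_len: int = 64, sep: str = ",") -> str:
    """Split into seg_len-digit segments, strip each segment's trailing zeros,
    and join the trimmed segments with the separator."""
    segments = [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]
    return sep.join(segment.rstrip("0") for segment in segments)


bits = "1" + "0" * 128 + "1" + "0" * 370  # the 500-digit example
print(compress(bits))                     # 1,,01,,,,,
```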
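In some optional implementations of this embodiment, the trailing consecutive zeros of the binary number are removed before it is divided into segments by the preset length. Alternatively, the trailing consecutive separators of the binary number are removed after the separators have been added between adjacent segments and the trailing zeros of each segment have been removed. Either way, the binary number above (1, followed by 128 zeros, then 1, followed by 370 zeros) becomes the compressed data 1,,01, which occupies even less storage space.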
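The two optional variants can be sketched the same way. The text does not describe decoding, so the `decompress` helper is an assumption. It pads each segment back to the preset length and then trims or pads the result to the tag-sequence length. This shows that trimming loses no information, provided that length is known and the separator is neither 0 nor 1. All names are hypothetical.

```python
def compress(bits: str, seg_len: int = 64, sep: str = ",") -> str:
    segments = [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]
    return sep.join(segment.rstrip("0") for segment in segments)


def compress_trim_first(bits: str, seg_len: int = 64, sep: str = ",") -> str:
    # Variant 1: drop the binary number's trailing zeros before segmenting.
    return compress(bits.rstrip("0"), seg_len, sep)


def compress_trim_last(bits: str, seg_len: int = 64, sep: str = ",") -> str:
    # Variant 2: compress as usual, then drop the trailing separators.
    return compress(bits, seg_len, sep).rstrip(sep)


def decompress(data: str, n_tags: int, seg_len: int = 64, sep: str = ",") -> str:
    # Pad every segment back to seg_len, then trim or pad to the sequence length.
    bits = "".join(seg.ljust(seg_len, "0") for seg in data.split(sep))
    return bits[:n_tags].ljust(n_tags, "0")


bits = "1" + "0" * 128 + "1" + "0" * 370
print(compress_trim_first(bits), compress_trim_last(bits))  # 1,,01 1,,01
print(decompress("1,,01", 500) == bits)                     # True
```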
Step 204: store the compressed data.
In this embodiment, the electronic device can store the compressed data formed in step 203.
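The above embodiment of the present application generates the binary numbers according to the set tag order, divides the digits of each binary number into segments and removes the zeros at the end of each segment. This effectively reduces the number of digits finally stored, so that large amounts of data can be stored with relatively little memory.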
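Further referring to FIG. 3, a flow 300 of another embodiment of the information processing method is shown, in which the object information in the object information set is variable. The flow 300 of the information processing method includes the following steps:
Step 301: acquire the set of object information to be processed.
In this embodiment, the object information in the object information set is variable. For the specific processing of step 301, refer to step 201 in the embodiment corresponding to FIG. 2.
Step 302: for each piece of object information in the object information set, generate 0 or 1 in turn according to whether each tag in the preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence.
In this embodiment, for the specific processing of step 302, refer to step 202 in the embodiment corresponding to FIG. 2.
Step 303: perform the compression step on the binary number to form compressed data.
In this embodiment, for the specific processing of step 303, refer to step 203 in the embodiment corresponding to FIG. 2.
Step 304: store the compressed data.
In this embodiment, for the specific processing of step 304, refer to step 204 in the embodiment corresponding to FIG. 2.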
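Step 305: after a preset time point is reached, acquire the current frequency of occurrence in the object set of each tag in the tag set.
In this embodiment, the electronic device can determine whether the current time has reached a preset time point. When it has, the electronic device can execute step 305 and the subsequent steps. For example, a start time and a time period can be set in advance, and the preset time points are then determined from the start time and the period.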
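One way to derive the preset time points from a start time and a period is sketched below. It assumes the points are start + k * period for k = 0, 1, 2, and so on. `latest_preset_point` is a hypothetical name. A scheduler could remember the last point it handled and run steps 305 to 307 whenever this function returns a newer one.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_preset_point(start: datetime, period: timedelta, now: datetime) -> Optional[datetime]:
    """Most recent preset time point start + k * period (k >= 0) not later than now."""
    if now < start:
        return None                 # no preset time point has been reached yet
    k = (now - start) // period     # timedelta // timedelta yields an int
    return start + k * period


start, period = datetime(2016, 4, 27), timedelta(hours=6)
print(latest_preset_point(start, period, datetime(2016, 4, 27, 7, 30)))  # 2016-04-27 06:00:00
```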
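After the preset time point is reached, the electronic device can acquire, by various methods, the current frequency of occurrence in the object set of each tag in the tag set. For example, Label1 may occur X times and Label2 Y times.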
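In some optional implementations of this embodiment, the frequency of occurrence in step 305 of each tag of the tag set in the object set can be acquired as follows. First, acquire the frequency of occurrence of each tag in the object set as stored at at least one historical time point. Then perform data fitting on the acquired frequencies to predict the current frequency of occurrence in the object set of each tag in the tag set. Because this implementation predicts the current tag frequencies from the frequencies at historical time points, it obtains the frequencies quickly and thereby reduces the overall operation time.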
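The text does not say which fitting method is used. The sketch below assumes an ordinary least-squares straight line per tag, extrapolated to the current time, and any other model could be substituted. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_and_predict(history: List[Tuple[float, float]], t_now: float) -> float:
    """Least-squares line through (time, count) points, evaluated at t_now."""
    n = len(history)
    mean_t = sum(t for t, _ in history) / n
    mean_c = sum(c for _, c in history) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in history)
    if sxx == 0:  # one point (or all at the same time): fall back to the mean
        return mean_c
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in history)
    return mean_c + (sxy / sxx) * (t_now - mean_t)


def predict_frequencies(histories: Dict[str, List[Tuple[float, float]]],
                        t_now: float) -> Dict[str, float]:
    # Counts cannot be negative, so clamp the extrapolation at zero.
    return {tag: max(0.0, fit_and_predict(h, t_now)) for tag, h in histories.items()}


print(fit_and_predict([(0, 10), (1, 20), (2, 30)], 3))  # 40.0
```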
Step 306: update the positions of the tags in the tag sequence according to the acquired frequencies.
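In this embodiment, the tags in the tag sequence were arranged in advance in descending order of frequency, and this descending arrangement helps save space when the zeros at the ends of segments are removed. Because the object information in the object information set is variable, that is, the tags contained in each object can change, after some time the tags in the tag sequence may no longer be arranged from highest to lowest frequency. The tag positions in the tag sequence therefore need to be updated according to each tag's frequency at the current time point. It should be noted that the positions of only some tags may be changed according to the frequencies, or all tags may be rearranged in descending order of their latest frequencies.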
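A small Python experiment shows why descending-frequency order helps. The data is toy data, and a segment length of 4 is used instead of 64 so that the effect is visible on eight tags. Both choices are assumptions for illustration. Compressed size is measured in characters.

```python
from collections import Counter
from typing import List, Sequence, Set


def compress(bits: str, seg_len: int, sep: str = ",") -> str:
    segments = [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]
    return sep.join(s.rstrip("0") for s in segments)


def total_size(order: Sequence[str], objects: List[Set[str]], seg_len: int) -> int:
    """Total characters of compressed data for all objects under one tag order."""
    return sum(
        len(compress("".join("1" if t in obj else "0" for t in order), seg_len))
        for obj in objects
    )


tags = [f"t{i}" for i in range(8)]
objects = [{"t0", "t1"}, {"t0"}, {"t0", "t2"}, {"t1"}]
freq = Counter(t for obj in objects for t in obj)
by_freq = sorted(tags, key=lambda t: (-freq[t], t))  # most frequent first
print(total_size(by_freq, objects, 4), total_size(by_freq[::-1], objects, 4))  # 12 19
```

With the frequent tags at the front, most digits that are 1 land in the first segment, so later segments shrink to nothing.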
In some optional implementations of this embodiment, step 306 may specifically include the following steps:
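First, arrange the tags in the tag set in descending order of frequency to generate an ideal tag sequence, the ideal tag sequence being the tag sequence for which the compressed data occupies the least storage space. Typically, the ideal tag sequence is generated by arranging the tags in descending order of frequency.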
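Second, for the change from the tag sequence to the ideal tag sequence that reduces the storage space occupied by the compressed data, determine a weight for each tag. The weight indicates how much that tag's change of position benefits storage optimization. Optionally, the weight can be determined from the tag's position in the ideal tag sequence. This position may be the tag's index in the ideal tag sequence. Alternatively, the ideal tag sequence may first be divided into at least one segment by the preset length; the position is then the position of the tag's segment among all segments, together with the tag's position within that segment. Generally, the further forward the position, the higher the weight.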
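Then select at least one tag with the largest weight as a tag whose position is to be changed, and move each selected tag to the corresponding position indicated by the ideal tag sequence.
In this implementation, only the tags with larger weights are moved, according to each tag's weight for storage optimization. Storage is thus optimized while avoiding the long operation time that moving many tags would cause, which balances space optimization against time optimization.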
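The three steps above can be sketched in Python as follows. The weight formula is an assumption. The text only says that an earlier position in the ideal tag sequence means a higher weight, so here a tag's weight is simply N minus its index. Ties in frequency are broken by tag name for determinism. All names are hypothetical, and `freq` is assumed to contain every tag in the sequence.

```python
from typing import Dict, List, Sequence


def ideal_sequence(freq: Dict[str, float]) -> List[str]:
    # Descending frequency; ties broken by tag name (an assumption, for determinism).
    return sorted(freq, key=lambda t: (-freq[t], t))


def position_weights(ideal: Sequence[str]) -> Dict[str, int]:
    # Assumed weight: the earlier a tag sits in the ideal sequence, the larger its weight.
    n = len(ideal)
    return {tag: n - i for i, tag in enumerate(ideal)}


def partial_reorder(current: Sequence[str], freq: Dict[str, float], k: int) -> List[str]:
    """Move at most k misplaced tags, highest weight first, into their ideal slots."""
    ideal = ideal_sequence(freq)
    weight = position_weights(ideal)
    target = {tag: i for i, tag in enumerate(ideal)}
    seq = list(current)
    moved = 0
    for tag in sorted(ideal, key=lambda t: -weight[t]):
        if moved == k:
            break
        src, dst = seq.index(tag), target[tag]
        if src != dst:
            seq[src], seq[dst] = seq[dst], seq[src]  # swap with the current occupant
            moved += 1
    return seq


print(partial_reorder(["a", "b", "c", "d"], {"a": 1, "b": 5, "c": 3, "d": 0}, 1))
# ['b', 'a', 'c', 'd']
```

Each swap moves one tag into its ideal slot and displaces exactly one other tag. Each swap therefore corresponds to one pair of mutually swapped tags, as handled in step 307.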
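Step 307: update the compressed data according to the updated positions of the tags in the tag sequence.
In this embodiment, the electronic device can update the compressed data according to the changes of tag positions in the tag sequence. The bits of each generated binary number correspond one-to-one, in order, to the tags in the tag sequence. Therefore, when tag positions in the tag sequence change, the bit values must be adjusted to match in order to keep the data reliable.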
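The simplest way to apply a new tag order to already-compressed data is to decode, permute and re-encode each object's data. The sketch below does that under the same string and comma assumptions as before; `recompress` and its helpers are hypothetical. It is a stop-the-world alternative, whereas the procedure in the following paragraph keeps the data readable and writable during the update.

```python
from typing import Sequence


def compress(bits: str, seg_len: int = 64, sep: str = ",") -> str:
    segments = [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]
    return sep.join(s.rstrip("0") for s in segments)


def decompress(data: str, n_tags: int, seg_len: int = 64, sep: str = ",") -> str:
    bits = "".join(s.ljust(seg_len, "0") for s in data.split(sep))
    return bits[:n_tags].ljust(n_tags, "0")


def recompress(data: str, old_seq: Sequence[str], new_seq: Sequence[str],
               seg_len: int = 64, sep: str = ",") -> str:
    """Re-encode one object's compressed data after the tag sequence changes."""
    bits = decompress(data, len(old_seq), seg_len, sep)
    present = {tag for tag, bit in zip(old_seq, bits) if bit == "1"}  # the object's tags
    new_bits = "".join("1" if tag in present else "0" for tag in new_seq)
    return compress(new_bits, seg_len, sep)


print(recompress("01,1", ["a", "b", "c", "d"], ["b", "c", "a", "d"], seg_len=2))  # 11,
```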
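In some optional implementations of this embodiment, step 307 specifically includes: for two tags in the tag sequence whose positions are swapped with each other, copying the digits of the first of the two tags to a new location, writing the first tag's digits at both the new location and the first tag's original position during the copy, and, once the copy is complete, clearing the digits at the first tag's original position and switching reads and writes of the first tag's digits to the new location; copying the digits of the second tag to the first tag's original position, writing the second tag's digits at both the second tag's original position and the first tag's original position during the copy, and, once the copy is complete, clearing the digits at the second tag's original position and switching reads and writes of the second tag's digits to the first tag's original position; and copying the first tag's digits stored at the new location to the second tag's original position, writing the first tag's digits at both the new location and the second tag's original position during the copy, and, once the copy is complete, clearing the digits at the new location and switching reads and writes of the first tag's digits to the second tag's original position. In this implementation, normal reads and writes of the data continue during the update, so external use of the data is not affected.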
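The three-phase swap can be simulated in Python to show why the dual writes matter. This is a single-threaded sketch under stated assumptions. It works on uncompressed bit rows rather than compressed data, uses a spare slot as the "new location", and models concurrent writes with an optional hook instead of real threads or locks. It also switches routing before clearing the old slot, so that no read can observe a cleared slot; the text lists these two actions together without ordering them. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class BitColumnStore:
    """Hypothetical store: one row of tag bits per object, one column (slot) per tag."""

    def __init__(self, rows: List[List[int]], slots: Dict[str, int]):
        self.rows = rows
        # tag -> active slots; reads use the first slot, writes go to every slot.
        self.routes: Dict[str, List[int]] = {tag: [s] for tag, s in slots.items()}
        # Optional hook called after each row is copied, to simulate concurrent traffic.
        self.on_row_copied: Optional[Callable[..., None]] = None

    def read(self, obj: int, tag: str) -> int:
        return self.rows[obj][self.routes[tag][0]]

    def write(self, obj: int, tag: str, bit: int) -> None:
        for slot in self.routes[tag]:
            self.rows[obj][slot] = bit

    def _move(self, tag: str, src: int, dst: int) -> None:
        self.routes[tag] = [src, dst]  # open the dual-write window
        for obj, row in enumerate(self.rows):
            row[dst] = row[src]
            if self.on_row_copied:
                self.on_row_copied(self, tag, obj)
        self.routes[tag] = [dst]       # switch reads and writes to dst first,
        for row in self.rows:
            row[src] = 0               # then clear the old slot

    def swap(self, tag_a: str, tag_b: str, spare: int) -> None:
        p, q = self.routes[tag_a][0], self.routes[tag_b][0]
        self._move(tag_a, p, spare)  # phase 1: first tag -> new (spare) location
        self._move(tag_b, q, p)      # phase 2: second tag -> first tag's original slot
        self._move(tag_a, spare, q)  # phase 3: first tag -> second tag's original slot


store = BitColumnStore([[1, 0, 0], [0, 1, 0]], {"A": 0, "B": 1})


def concurrent_write(s, tag, obj):
    if tag == "B" and obj == 0:  # a write to object 0 arrives mid-copy of tag B
        s.write(0, "B", 1)


store.on_row_copied = concurrent_write
store.swap("A", "B", spare=2)
print(store.routes, store.read(0, "A"), store.read(0, "B"))  # {'A': [1], 'B': [0]} 1 1
```

The mid-copy write goes to both of tag B's slots, so it survives when the old slot is cleared.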
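As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the flow 300 of the information processing method in this embodiment continuously optimizes storage as the data keeps changing.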
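Further referring to FIG. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an information processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.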
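As shown in FIG. 4, the information processing apparatus 400 of this embodiment includes an acquisition unit 401, a generation unit 402, a compression unit 403 and a storage unit 404. The acquisition unit 401 is configured to acquire a set of object information to be processed, each piece of object information in the set describing the tags contained in a respective object of a preset object set, and every tag contained in each object of the object set belonging to a preset tag set. The generation unit 402 is configured to, for each piece of object information in the object information set, generate 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence, where the digit 1 is generated when the tag is contained in the object and the digit 0 when it is not, and the tag sequence is formed by arranging the tags in the tag set in a set order. The compression unit 403 is configured to perform a compression step on the binary number to form compressed data, the compression step including: dividing the binary number into at least one segment by a preset length; and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment. The storage unit 404 is configured to store the compressed data.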
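In this embodiment, for the specific processing of the acquisition unit 401, generation unit 402, compression unit 403 and storage unit 404 of the information processing apparatus 400, refer to steps 201, 202, 203 and 204 in the embodiment corresponding to FIG. 2; details are not repeated here.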
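In some optional implementations of this embodiment, the compression step performed by the compression unit 403 specifically includes: before dividing the binary number into at least one segment by the preset length, removing the trailing consecutive zeros of the binary number; or, after adding the separators between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment, removing the trailing consecutive separators of the binary number. For the specific processing of this implementation, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 2; details are not repeated here.
In some optional implementations of this embodiment, the preset length is 64.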
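In some optional implementations of this embodiment, the object information in the object information set is variable, and the information processing apparatus further includes: a frequency acquisition unit (not shown), configured to acquire, after a preset time point is reached, the current frequency of occurrence in the object set of each tag in the tag set; a sequence update unit (not shown), configured to update the positions of the tags in the tag sequence according to the acquired frequencies; and a data update unit (not shown), configured to update the compressed data according to the updated positions of the tags in the tag sequence. For the specific processing of this implementation, refer to the description in the embodiment corresponding to FIG. 3.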
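In some optional implementations of this embodiment, the frequency acquisition unit includes: a historical frequency acquisition subunit (not shown), configured to acquire the frequency of occurrence of each tag in the object set as stored at at least one historical time point; and a prediction subunit (not shown), configured to perform data fitting on the acquired frequencies to predict the current frequency of occurrence in the object set of each tag in the tag set. For the specific processing of this implementation, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 3.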
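In some optional implementations of this embodiment, the sequence update unit is further configured to: generate an ideal tag sequence from the tags in the tag set according to their frequencies of occurrence, the ideal tag sequence being the tag sequence for which the compressed data occupies the least storage space; determine, for the change from the tag sequence to the ideal tag sequence that reduces the storage space occupied by the compressed data, a weight indicating how much each tag's change of position benefits storage optimization; and select at least one tag with the largest weight as a tag whose position is to be changed, and move the selected tag to the position indicated by the ideal tag sequence. For the specific processing of this implementation, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 3; details are not repeated here.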
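In some optional implementations of this embodiment, the data update unit is further configured to: for two tags in the tag sequence whose positions are swapped with each other, copy the digits of the first of the two tags to a new location, writing the first tag's digits at both the new location and the first tag's original position during the copy, and, once the copy is complete, clear the digits at the first tag's original position and switch reads and writes of the first tag's digits to the new location; copy the digits of the second tag to the first tag's original position, writing the second tag's digits at both the second tag's original position and the first tag's original position during the copy, and, once the copy is complete, clear the digits at the second tag's original position and switch reads and writes of the second tag's digits to the first tag's original position; and copy the first tag's digits stored at the new location to the second tag's original position, writing the first tag's digits at both the new location and the second tag's original position during the copy, and, once the copy is complete, clear the digits at the new location and switch reads and writes of the first tag's digits to the second tag's original position.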
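Referring now to FIG. 5, there is shown a schematic structural diagram of a computer system 500 suitable for implementing a terminal device or server of an embodiment of the present application.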
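As shown in FIG. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.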
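The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse and the like; an output portion 507 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.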
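In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 509, and/or installed from the removable medium 511.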
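The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams can represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks can occur in an order different from that noted in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.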
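The units involved in the embodiments of the present application can be implemented in software or in hardware. The described units can also be provided in a processor, which can, for example, be described as a processor including an acquisition unit, a generation unit, a compression unit and a storage unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the storage unit can also be described as "a unit that stores compressed data".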
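In another aspect, the present application further provides a non-volatile computer storage medium. It may be the non-volatile computer storage medium included in the apparatus described in the above embodiments, or a stand-alone non-volatile computer storage medium that is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: acquire each piece of object information in a set of object information to be processed, each piece of object information describing the tags contained in a respective object of a preset object set, and every tag contained in each object of the object set belonging to a preset tag set; for each piece of object information, generate 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by that object information, so as to form a binary number whose length equals the length of the tag sequence, where the digit 1 is generated when the tag is contained in the object and the digit 0 when it is not, and the tag sequence is formed by arranging the tags in the tag set in a set order; perform a compression step on the binary number to form compressed data, the compression step including: dividing the binary number into at least one segment by a preset length, and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment; and store the compressed data.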
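The above description covers only preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) the present application.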

Claims (14)

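  1. An information processing method, characterized in that the method comprises:
    acquiring a set of object information to be processed, each piece of object information in the object information set being used to describe the tags contained in a respective object of a preset object set, and each tag contained in each object of the object set belonging to a preset tag set;
    for each piece of object information in the object information set, generating 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by the object information, so as to form a binary number whose length equals the length of the tag sequence, wherein the digit 1 is generated when a tag is contained in the object and the digit 0 is generated when a tag is not contained in the object, and the tag sequence is formed by arranging the tags in the tag set in a set order;
    performing a compression step on the binary number to form compressed data, the compression step comprising: dividing the binary number into at least one segment by a preset length; and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment;
    storing the compressed data.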
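  2. The method according to claim 1, characterized in that the compression step further comprises:
    before dividing the binary number into at least one segment by the preset length, removing the trailing consecutive zeros of the binary number; or
    after adding the separators between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment, removing the trailing consecutive separators of the binary number.
  3. The method according to claim 1 or 2, characterized in that the preset length is 64.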
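  4. The method according to claim 1, characterized in that the object information in the object information set is variable, and
    the method further comprises:
    after a preset time point is reached, acquiring the current frequency of occurrence in the object set of each tag in the tag set;
    updating the positions of the tags in the tag sequence according to the acquired frequency of occurrence of each tag;
    updating the compressed data according to the updated positions of the tags in the tag sequence.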
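  5. The method according to claim 4, characterized in that acquiring the current frequency of occurrence in the object set of each tag in the tag set comprises:
    acquiring the frequency of occurrence of each tag in the object set as stored at at least one historical time point;
    performing data fitting on the acquired frequencies of occurrence to predict the current frequency of occurrence in the object set of each tag in the tag set.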
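  6. The method according to claim 4, characterized in that updating the positions of the tags in the tag sequence according to the acquired frequency of occurrence of each tag comprises:
    arranging the tags in the tag set in descending order of frequency of occurrence to generate an ideal tag sequence, wherein the ideal tag sequence is the tag sequence for which the compressed data occupies the least storage space;
    determining, for the change from the tag sequence to the ideal tag sequence so as to reduce the storage space occupied by the compressed data, a weight indicating how much the change in position of each tag benefits storage optimization;
    selecting at least one tag with the largest weight as a tag whose position is to be changed, and moving the selected tag to the position indicated by the ideal tag sequence.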
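  7. The method according to claim 4, characterized in that updating the compressed data according to the updated positions of the tags in the tag sequence comprises:
    for two tags in the tag sequence whose positions are swapped with each other, copying the digits of the first tag of the two tags to a new location, during the copying writing the digits of the first tag at both the new location and the original position of the first tag, and after the copying is completed clearing the digits at the original position of the first tag and switching read and write operations on the digits of the first tag to the new location;
    copying the digits of the second tag of the two tags to the original position of the first tag, during the copying writing the digits of the second tag at both the original position of the second tag and the original position of the first tag, and after the copying is completed clearing the digits at the original position of the second tag and switching read and write operations on the digits of the second tag to the original position of the first tag;
    copying the digits of the first tag stored at the new location to the original position of the second tag, during the copying writing the digits of the first tag at both the new location and the original position of the second tag, and after the copying is completed clearing the digits at the new location and switching read and write operations on the digits of the first tag to the original position of the second tag.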
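  8. An information processing apparatus, characterized in that the apparatus comprises:
    an acquisition unit, configured to acquire a set of object information to be processed, each piece of object information in the object information set being used to describe the tags contained in a respective object of a preset object set, and each tag contained in each object of the object set belonging to a preset tag set;
    a generation unit, configured to, for each piece of object information in the object information set, generate 0 or 1 in turn according to whether each tag in a preset tag sequence is contained in the object described by the object information, so as to form a binary number whose length equals the length of the tag sequence, wherein the digit 1 is generated when a tag is contained in the object and the digit 0 is generated when a tag is not contained in the object, and the tag sequence is formed by arranging the tags in the tag set in a set order;
    a compression unit, configured to perform a compression step on the binary number to form compressed data, the compression step comprising: dividing the binary number into at least one segment by a preset length; and adding a separator between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment;
    a storage unit, configured to store the compressed data.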
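  9. The apparatus according to claim 8, characterized in that the compression step further comprises: before dividing the binary number into at least one segment by the preset length, removing the trailing consecutive zeros of the binary number; or, after adding the separators between adjacent segments of the binary number and removing the trailing consecutive zeros from each segment, removing the trailing consecutive separators of the binary number.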
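  10. The apparatus according to claim 8, characterized in that the object information in the object information set is variable, and
    the apparatus further comprises:
    a frequency acquisition unit, configured to acquire, after a preset time point is reached, the current frequency of occurrence in the object set of each tag in the tag set;
    a sequence update unit, configured to update the positions of the tags in the tag sequence according to the acquired frequency of occurrence of each tag;
    a data update unit, configured to update the compressed data according to the updated positions of the tags in the tag sequence.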
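  11. The apparatus according to claim 10, characterized in that the frequency acquisition unit comprises:
    a historical frequency acquisition subunit, configured to acquire the frequency of occurrence of each tag in the object set as stored at at least one historical time point;
    a prediction subunit, configured to perform data fitting on the acquired frequencies of occurrence to predict the current frequency of occurrence in the object set of each tag in the tag set.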
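  12. The apparatus according to claim 11, characterized in that the sequence update unit is further configured to:
    generate an ideal tag sequence from the tags in the tag set according to their frequency of occurrence, wherein the ideal tag sequence is the tag sequence for which the compressed data occupies the least storage space;
    determine, for the change from the tag sequence to the ideal tag sequence so as to reduce the storage space occupied by the compressed data, a weight indicating how much the change in position of each tag benefits storage optimization;
    select at least one tag with the largest weight as a tag whose position is to be changed, and move the selected tag to the position indicated by the ideal tag sequence.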
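  13. A device, comprising:
    a processor; and
    a memory,
    the memory storing computer-readable instructions executable by the processor, wherein, when the computer-readable instructions are executed, the processor performs the method according to any one of claims 1 to 7.
  14. A non-volatile computer storage medium storing computer-readable instructions executable by a processor, wherein, when the computer-readable instructions are executed by the processor, the processor performs the method according to any one of claims 1 to 7.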
PCT/CN2017/081200 2016-04-27 2017-04-20 Information processing method and apparatus WO2017186049A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610274281.9A CN107315535B (zh) 2016-04-27 2016-04-27 Information processing method and apparatus
CN201610274281.9 2016-04-27

Publications (1)

Publication Number Publication Date
WO2017186049A1 true WO2017186049A1 (zh) 2017-11-02

Family

ID=60160750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/081200 WO2017186049A1 (zh) 2016-04-27 2017-04-20 Information processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107315535B (zh)
WO (1) WO2017186049A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102265937B1 (ko) * 2020-12-21 2021-06-17 주식회사 모비젠 Method and apparatus for analyzing sequence data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5918225A (en) * 1993-04-16 1999-06-29 Sybase, Inc. SQL-based database system with improved indexing methodology
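CN101036141A (zh) * 2004-03-26 2007-09-12 Oracle International Corporation Database management system with persistent, user-accessible bitmap values
CN103995887A (zh) * 2014-05-30 2014-08-20 Shanghai Dameng Database Co., Ltd. Bitmap index compression method and bitmap index decompression method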

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120170648A1 (en) * 2011-01-05 2012-07-05 Qualcomm Incorporated Frame splitting in video coding
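CN102790656B (zh) * 2012-05-30 2015-10-28 New Postcom Equipment Co., Ltd. IQ data compression method and system
CN103840839B (zh) * 2014-03-21 2017-06-27 Institute of Acoustics, Chinese Academy of Sciences Real-time compression method for downhole acoustic imaging logging data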

Also Published As

Publication number Publication date
CN107315535A (zh) 2017-11-03
CN107315535B (zh) 2019-09-20

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17788701

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.03.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17788701

Country of ref document: EP

Kind code of ref document: A1