US12424211B2 - Method and device for compressing finite-state transducers data

Method and device for compressing finite-state transducers data

Info

Publication number
US12424211B2
Authority
US
United States
Prior art keywords
data
arrangement
decomposition
state
fst
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US17/782,152
Other versions
US20230005474A1 (en
Inventor
Zhenxing Liang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ziipin Network Technology Co Ltd
Original Assignee
Guangzhou Ziipin Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ziipin Network Technology Co Ltd
Assigned to GUANGZHOU ZIIPIN NETWORK TECHNOLOGY CO., LTD (assignment of assignors interest; see document for details). Assignors: LIANG, ZHENXING
Publication of US20230005474A1
Application granted
Publication of US12424211B2

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/70Type of the data to be coded, other than image and sound
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/193Formal grammars, e.g. finite state automata, context free grammars or word networks

Definitions

  • the present disclosure relates to the technical field of natural language processing, and in particular to a method and device for compressing finite-state transducer (FST) data.
  • FST: finite-state transducer.
  • a search engine searches dictionary indexes according to the input information, and then outputs some matching results. Since the dictionary indexes are used as a target library for search, the search algorithm depends on the data structure of the dictionary indexes, and involves a search speed and the matching results.
  • the finite-state transducer is a data structure of the dictionary indexes.
  • FIG. 1 (A) is a schematic diagram showing a simple FST structure, which mainly includes states S 1 to S 4 and state transitions (arcs) a 1 to a 5 .
  • the states include a start state mark and some final state marks.
  • FIG. 1 (B) is a schematic diagram showing a simple data structure of FST.
  • State transition data includes signal label data (label), weight data (weight) and next state identifier data (next state).
  • State data includes attached state transition identifier data.
  • State data of a state marked as final further includes final weight data.
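  • As a minimal sketch of the uncompressed data structure described above, the records might be modelled as follows. The class names `Arc` and `State`, the field names, and all numeric values are hypothetical; only the shape (four states, five state transitions, final states S 2 and S 4) follows the description of FIG. 1 (A).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Arc:
    label: int       # signal label
    weight: float    # weight of the state transition
    next_state: int  # next state identifier (0-based index, an assumption)


@dataclass
class State:
    arcs: List[Arc] = field(default_factory=list)  # attached state transitions
    final_weight: Optional[float] = None           # present only for final states

    @property
    def is_final(self) -> bool:
        return self.final_weight is not None


# Hypothetical toy FST: S1 owns a1 and a2, S2 owns a3 and a4, S3 owns a5.
states = [
    State(arcs=[Arc(1, 0.5, 1), Arc(2, 1.0, 2)]),                       # S1 (start)
    State(arcs=[Arc(3, 0.25, 2), Arc(4, 0.75, 3)], final_weight=0.125),  # S2 (final)
    State(arcs=[Arc(5, 0.5, 3)]),                                        # S3
    State(final_weight=0.5),                                             # S4 (final)
]
```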
  • the FST data includes a large amount of redundant data, and is generally stored in a client device for a long time. In a case of limited resources in the client device, the large amount of redundant data may result in a shortage of memory resources in the client device. Therefore, it is important to optimize the method for compressing the FST data.
  • FIG. 4 is a schematic diagram showing a data structure of the FST data compressed by the conventional data compression method.
  • the data may be compressed to some extent.
  • a next state identifier that originally does not exist in the state is packaged inside the compressed data, resulting in a waste of data space.
  • a weight that originally does not exist in the state transition is still packaged inside the compressed data by the above method, resulting in a further waste of the data space.
  • a method and device for compressing FST data are provided according to the present disclosure, to effectively reduce space occupied by the FST data, thereby solving the technical problem of a waste of data space.
  • a method for compressing FST data includes: acquiring to-be-compressed FST data, where the FST data includes state transition data and state data; decomposing the state transition data based on first data categories to acquire first decomposition data; decomposing the state data based on second data categories to acquire second decomposition data; sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category; alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data; performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
  • the decomposing the state transition data based on first data categories to acquire first decomposition data includes: decomposing the state transition data based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data, weight decomposition data and next state identifier decomposition data.
  • the method for compressing FST data further includes: setting data types of the first decomposition data based on a maximum value of signal label and a total number of all states in the to-be-compressed FST data.
  • the method for compressing FST data further includes: removing output signal label decomposition data from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by FSA data; and removing the weight decomposition data in a case that the information presented by the FST data is suitable to be presented by Trie data.
  • the decomposing the state data based on second data categories to acquire second decomposition data includes: decomposing state data of each final state based on data categories of null label and final weight, to acquire null label decomposition data and final weight decomposition data, where the final state is a state marked as final.
  • the sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category includes: sequentially arranging the signal label decomposition data in a sequential order of state identifiers corresponding to the signal label decomposition data, to acquire signal label arrangement data; sequentially arranging the weight decomposition data in a sequential order of state identifiers corresponding to the weight decomposition data, to acquire weight arrangement data; and sequentially arranging the next state identifier decomposition data in a sequential order of state identifiers corresponding to the next state identifier decomposition data, to acquire next state identifier arrangement data.
  • the alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data includes: alternately arranging the signal label arrangement data and the null label decomposition data in a sequential order of state identifiers corresponding to the signal label arrangement data and the null label decomposition data, to acquire signal label mixed arrangement data; and sequentially arranging the final weight decomposition data in a sequential order of state identifiers corresponding to the final weight decomposition data, to acquire final weight arrangement data.
  • the performing classification statistics on the first arrangement data and the second arrangement data to acquire index data includes: performing classification statistics on the first arrangement data based on state identifiers corresponding to the first arrangement data to acquire first index data; and performing classification statistics on the second arrangement data based on state identifiers corresponding to the second arrangement data to acquire second index data.
  • the method for compressing FST data further includes: setting a data type of the index data based on a maximum count of state transitions belonging to a same state, wherein the maximum count is a total number of state transitions belonging to a state with most transitions among all states in the to-be-compressed FST data.
  • the combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data includes: combining the weight arrangement data, the next state identifier arrangement data, the signal label mixed arrangement data, the final weight arrangement data and the index data, to obtain the compressed FST data.
  • a device for compressing FST data includes an acquisition unit, a first decomposition unit, a second decomposition unit, a first arrangement unit, a second arrangement unit, a classification statistics unit, and a combination unit.
  • the acquisition unit is configured to acquire to-be-compressed FST data.
  • the FST data includes state transition data and state data.
  • the first decomposition unit is configured to decompose the state transition data based on first data categories to acquire first decomposition data.
  • the second decomposition unit is configured to decompose the state data based on second data categories to acquire second decomposition data.
  • the first arrangement unit is configured to sequentially arrange, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category.
  • the second arrangement unit is configured to alternately arrange the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data.
  • the classification statistics unit is configured to perform classification statistics on the first arrangement data and the second arrangement data to acquire index data.
  • the combination unit is configured to combine the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
  • the method for compressing FST data includes: acquiring to-be-compressed FST data, where the FST data includes state transition data and state data; decomposing the state transition data based on first data categories to acquire first decomposition data; decomposing the state data based on second data categories to acquire second decomposition data; sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category; alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data; performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
  • the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data
  • the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data.
  • the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category.
  • the first arrangement data and the second decomposition data are alternately arranged according to a sequential order used in the first arrangement data to acquire second arrangement data.
  • classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data.
  • the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
  • the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data.
  • the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
  • FIG. 1 (A) is a schematic diagram showing a simple FST structure
  • FIG. 1 (B) is a schematic diagram showing a data structure of FST data shown in FIG. 1 (A) ;
  • FIG. 2 is a schematic flowchart of a method for compressing FST data according to a first embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of a method for compressing FST data according to a second embodiment of the present disclosure
  • FIG. 4 is a schematic diagram showing a data structure of FST data compressed by the conventional data compression method
  • FIG. 5 is a schematic diagram showing a data structure of FST data compressed by the compression method according to embodiments of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a device for compressing FST data according to an embodiment of the present disclosure.
  • a method and device for compressing FST data are provided according to embodiments of the present disclosure, to effectively reduce space occupied by the FST data, thereby solving the technical problem of the waste of data space.
  • A Trie is an ordered prefix tree, in which entries sharing the same prefix share the same path from the root.
  • An FSA is a finite-state automaton, which includes no output signal label.
  • A data type constrains the values that an expression may take, and defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.
  • FIG. 2 is a schematic flowchart of a method for compressing FST data according to a first embodiment of the present disclosure.
  • the method for compressing FST data according to the first embodiment includes the following steps 201 to 207 .
  • In step 201, to-be-compressed FST data is acquired.
  • the FST data includes state transition data and state data.
  • the to-be-compressed FST data is acquired first in order to compress the FST data.
  • each FST structure includes states, state transitions, signal labels, weights, next state identifiers, a start state mark and final state marks, as shown in FIG. 1 (A) .
  • FST data corresponding to the above FST structure includes state transition data and state data.
  • the state transition data includes signal label data, weight data and next state identifier data.
  • the state data includes final weight data and attached state transition identifier data. For each state transition, a state to which the state transition is attached may be determined based on the attached state transition identifier data in the state data.
  • In FIG. 1 (A), a sign such as S 1 represents a state, a sign a i represents a state transition, a circle sign in bold represents a start state, and a double-circle sign represents a final state.
  • The sign for the final state also represents a final weight, and the sign for the state transition also represents the signal label and the weight.
  • In step 202, the state transition data is decomposed based on first data categories to acquire first decomposition data.
  • The state transition data structure is different from the state data structure.
  • In the conventional data compression method, the state transition data and the state data are unified in a format, resulting in redundant data.
  • In this embodiment, data is decomposed instead of being unified, and then is arranged separately or arranged in a mixed manner based on categories of the decomposed data. Therefore, the state transition data in the FST data is first decomposed in a fine-grained manner based on the first data categories, to acquire fine-grained first decomposition data.
  • the FST data corresponding to the FST structure may be processed as follows in step 202 .
  • Data of the state transitions a 1 to a 5 is decomposed based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data of the state transitions a 1 to a 5 , weight decomposition data of the state transitions a 1 to a 5 and next state identifier decomposition data of the state transitions a 1 to a 5 .
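  • The decomposition in step 202 can be sketched as splitting each state transition record into one list per data category. The tuple layout, the function name `decompose_arcs`, and all numeric values are assumptions made for illustration; each value keeps the identifier of the state that owns the transition so that later steps can order and count by state.

```python
# Each record: (owning state id, signal label, weight, next state id).
# The values for a1..a5 are hypothetical.
arcs = [
    (0, 1, 0.5, 1),   # a1, attached to S1
    (0, 2, 1.0, 2),   # a2, attached to S1
    (1, 3, 0.25, 2),  # a3, attached to S2
    (1, 4, 0.75, 3),  # a4, attached to S2
    (2, 5, 0.5, 3),   # a5, attached to S3
]


def decompose_arcs(arcs):
    """Split arc records into one (state id, value) list per data category."""
    labels = [(state, label) for state, label, _, _ in arcs]
    weights = [(state, weight) for state, _, weight, _ in arcs]
    next_states = [(state, nxt) for state, _, _, nxt in arcs]
    return labels, weights, next_states


labels, weights, next_states = decompose_arcs(arcs)
```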
  • In step 203, the state data is decomposed based on second data categories to acquire second decomposition data.
  • While the state transition data is decomposed, the state data is also decomposed in the fine-grained manner based on the second data categories, to acquire fine-grained second decomposition data.
  • the FST data corresponding to the FST structure may be processed as follows in step 203 .
  • Data of the states S 2 and S 4 marked as final in the state data is decomposed based on data categories of null label and final weight, to acquire null label decomposition data of the states S 2 and S 4 and final weight decomposition data of the states S 2 and S 4 .
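  • A minimal sketch of step 203, assuming the final states are given as a mapping from state identifier to final weight. The reserved value `NULL_LABEL = 0`, the function name, and the weights are hypothetical; the source only states that each final state yields a null label and a final weight.

```python
NULL_LABEL = 0  # hypothetical reserved label value marking "this state is final"

# Final states S2 and S4 (0-based ids 1 and 3) with hypothetical final weights.
final_weights = {1: 0.125, 3: 0.5}


def decompose_final_states(final_weights):
    """Return (null label list, final weight list), both keyed by state id."""
    ordered = sorted(final_weights)
    null_labels = [(state, NULL_LABEL) for state in ordered]
    weights = [(state, final_weights[state]) for state in ordered]
    return null_labels, weights


null_labels, final_weight_data = decompose_final_states(final_weights)
```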
  • In step 204, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category.
  • the first decomposition data acquired from the above step 202 may be processed as follows in step 204 .
  • the signal label decomposition data of the state transitions a 1 to a 5 is sequentially arranged in a sequential order of state identifiers corresponding to the signal label decomposition data, to acquire signal label arrangement data in an order of a signal label of the state transition a 1 , a signal label of the state transition a 2 , a signal label of the state transition a 3 , a signal label of the state transition a 4 , and a signal label of the state transition a 5 .
  • the weight decomposition data of the state transitions a 1 to a 5 is sequentially arranged in a sequential order of state identifiers corresponding to the weight decomposition data, to acquire weight arrangement data in an order of a weight of the state transition a 1 , a weight of the state transition a 2 , a weight of the state transition a 3 , a weight of the state transition a 4 , and a weight of the state transition a 5 .
  • the next state identifier decomposition data of the state transitions a 1 to a 5 is sequentially arranged in a sequential order of state identifiers corresponding to the next state identifier decomposition data, to acquire next state identifier arrangement data in an order of a next state identifier of the state transition a 1 , a next state identifier of the state transition a 2 , a next state identifier of the state transition a 3 , a next state identifier of the state transition a 4 , and a next state identifier of the state transition a 5 .
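  • The sequential arrangement in step 204 can be sketched as a stable sort by owning state identifier, applied to each category independently. The function name `arrange` and the sample values are assumptions; the input is deliberately shuffled to show that transitions of the same state keep their relative order.

```python
def arrange(decomposed):
    """Order (state id, value) pairs by state id and keep only the values.

    Python's sorted() is stable, so values that belong to the same state
    stay in their original relative order.
    """
    return [value for _, value in sorted(decomposed, key=lambda pair: pair[0])]


# Hypothetical signal label decomposition data, not yet ordered by state.
label_data = [(1, 3), (0, 1), (2, 5), (0, 2), (1, 4)]
label_arrangement = arrange(label_data)
```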
  • In step 205, the first arrangement data and the second decomposition data are alternately arranged according to a sequential order used in the first arrangement data to acquire second arrangement data.
  • the first arrangement data acquired from the above step 204 and the second decomposition data acquired from the above step 203 may be processed as follows in step 205 .
  • the signal label arrangement data of the state transitions a 1 to a 5 in the first arrangement data and the null label decomposition data of the states S 2 and S 4 in the second decomposition data are mixed, and are alternately arranged in a sequential order of state identifiers corresponding to the signal label arrangement data and the null label decomposition data, to acquire signal label mixed arrangement data in an order of the signal label of the state transition a 1 , the signal label of the state transition a 2 , a null label of the state S 2 , the signal label of the state transition a 3 , the signal label of the state transition a 4 , the signal label of the state transition a 5 , and a null label of the state S 4 .
  • the final weight decomposition data of the states S 2 and S 4 is sequentially arranged in a sequential order of state identifiers corresponding to the final weight decomposition data, to acquire final weight arrangement data in an order of a final weight of the state S 2 , and a final weight of the state S 4 .
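  • The mixed arrangement above can be sketched by grouping entries per state. Placing the null label ahead of the signal labels of the same state follows the order listed in the text; the function name `mix_labels` and the string placeholders standing in for real label values are hypothetical.

```python
def mix_labels(arc_labels, null_labels, num_states):
    """Interleave arc labels and null labels in order of owning state id.

    Within one state, the null label (if any) comes before the arc labels.
    """
    groups = [[] for _ in range(num_states)]
    for state, entry in null_labels:
        groups[state].append(entry)
    for state, entry in arc_labels:
        groups[state].append(entry)
    return [entry for group in groups for entry in group]


arc_labels = [
    (0, "label(a1)"), (0, "label(a2)"),
    (1, "label(a3)"), (1, "label(a4)"),
    (2, "label(a5)"),
]
null_labels = [(1, "null(S2)"), (3, "null(S4)")]
mixed = mix_labels(arc_labels, null_labels, num_states=4)
```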
  • In step 206, classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data.
  • the first arrangement data acquired from the above step 204 and the second arrangement data acquired from the above step 205 may be processed as follows in step 206 .
  • Classification statistics is performed on the signal label arrangement data in the first arrangement data based on state identifiers corresponding to the signal label arrangement data to acquire first index data having index values of S 1 :2, S 2 :2, S 3 :1, S 4 :0.
  • Classification statistics is performed on the signal label mixed arrangement data in the second arrangement data based on state identifiers corresponding to the signal label mixed arrangement data, to acquire second index data having index values of S 1 :2, S 2 :3, S 3 :1, S 4 :1.
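  • The classification statistics amount to counting entries per owning state. The sketch below reproduces the two sets of index values given above; the function name `count_per_state` and the 0-based state identifiers are assumptions.

```python
from collections import Counter


def count_per_state(owner_ids, num_states):
    """Count how many arranged entries belong to each state."""
    counts = Counter(owner_ids)
    return [counts[state] for state in range(num_states)]


# Owning state of each entry in the signal label arrangement data (a1..a5).
arc_owner = [0, 0, 1, 1, 2]
# Owning state of each entry in the signal label mixed arrangement data
# (a1, a2, null of S2, a3, a4, a5, null of S4).
mixed_owner = [0, 0, 1, 1, 1, 2, 3]

first_index = count_per_state(arc_owner, 4)
second_index = count_per_state(mixed_owner, 4)
```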
  • In step 207, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
  • the first arrangement data acquired from the above step 204 , the second arrangement data acquired from the above step 205 , and the index data acquired from the above step 206 may be processed as follows in step 207 .
  • the weight arrangement data and the next state identifier arrangement data in the first arrangement data, the signal label mixed arrangement data and the final weight arrangement data in the second arrangement data, and the index data are combined, to obtain the compressed FST data.
  • the finally obtained compressed data is arranged as follows: S 1 :2, S 2 :3, S 3 :1, S 4 :1, the signal label of the state transition a 1 , the signal label of the state transition a 2 , the null label of the state S 2 , the signal label of the state transition a 3 , the signal label of the state transition a 4 , the signal label of the state transition a 5 , the null label of the state S 4 , the weight of the state transition a 1 , the weight of the state transition a 2 , the weight of the state transition a 3 , the weight of the state transition a 4 , the weight of the state transition a 5 , the next state identifier of the state transition a 1 , the next state identifier of the state transition a 2 , the next state identifier of the state transition a 3 , the next state identifier of the state transition a 4 , the next state identifier of the state transition a 5 , the final weight of the state S 2 , and
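  • As a rough sketch of step 207, the sections can be concatenated in the order index data, signal label mixed arrangement data, weight arrangement data, next state identifier arrangement data, and final weight arrangement data. All numeric values and the function name `combine` are hypothetical, and a real encoder would presumably pack each section with its own data type rather than build a flat Python list.

```python
def combine(index, mixed_labels, weights, next_states, final_weights):
    """Concatenate the sections into one flat sequence."""
    sections = [index, mixed_labels, weights, next_states, final_weights]
    return [item for section in sections for item in section]


index = [2, 3, 1, 1]                  # second index data (S1..S4)
mixed_labels = [1, 2, 0, 3, 4, 5, 0]  # 0 is a hypothetical null label
weights = [5, 10, 2, 7, 3]            # weights of a1..a5
next_states = [1, 2, 2, 3, 3]         # next state ids of a1..a5
final_weights = [1, 4]                # final weights of S2 and S4

compressed = combine(index, mixed_labels, weights, next_states, final_weights)
```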
  • the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data
  • the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data.
  • the first decomposition data of the first data category is sequentially arranged to acquire the first arrangement data of the first data category.
  • the first arrangement data and the second decomposition data are alternately arranged according to the sequential order used in the first arrangement data to acquire the second arrangement data.
  • classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data.
  • the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
  • the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data.
  • the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
  • FIG. 3 is a schematic flowchart of the method for compressing FST data according to the second embodiment of the present disclosure.
  • the method for compressing FST data according to the second embodiment includes the following steps 301 to 311 .
  • In step 301, to-be-compressed FST data is acquired.
  • The FST data includes state transition data and state data.
  • Step 301 is the same as step 201 in the first embodiment. For details of step 301, one may refer to the description of step 201, which is not repeated here.
  • In step 302, data types of the first decomposition data are set based on a maximum value of signal label and a total number of all states in the to-be-compressed FST data.
  • In the conventional method, the state transition data in the FST data has a unified data type which requires a large space.
  • For example, the data type of the signal label is 32-bit Integer, the data type of the next state identifier is 32-bit Integer, and the data type of the weight is 32-bit Float, which may result in a waste of data space.
  • In this embodiment, an appropriate data type is set for each category of the state transition data in the FST data.
  • a numerical range of each category of the state transition data is first evaluated. That is, the numerical range of the signal label data, the numerical range of the weight data, and the numerical range of the next state identifier data are evaluated.
  • an appropriate data type is determined based on the maximum value in the numerical range of the category of the state transition data, so that any values of the signal label data, the weight data, and the next state identifier data in the state transition data in their respective numerical ranges have corresponding values of their respective data types.
  • the data types are set as follows in step 302 . If the signal label has a maximum value of 127, and has a numerical range of 0 to 127, then the data type of the signal label is set to be 7-bit Integer. If the total number of all states is 4, and the numerical range of the state identifiers is from 0 to 3, then the data type of the next state identifier is set to be 2-bit Integer. If the weight has a numerical range of 0 to 255, then the data type of the weight is set to be 8-bit Integer.
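  • The bit widths in this example follow from the largest value each category must represent. A minimal sketch, where the helper name `bits_for_max` is hypothetical:

```python
def bits_for_max(max_value: int) -> int:
    """Smallest number of bits that can represent every integer in 0..max_value."""
    return max(1, max_value.bit_length())


label_bits = bits_for_max(127)         # signal labels in 0..127
next_state_bits = bits_for_max(4 - 1)  # 4 states, identifiers 0..3
weight_bits = bits_for_max(255)        # weights in 0..255
```

  • These evaluate to 7, 2 and 8 bits respectively, matching the data types chosen in the text.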
  • In step 303, the state transition data is decomposed based on first data categories to acquire first decomposition data.
  • Step 303 is the same as step 202 in the first embodiment. For details of step 303, one may refer to the description of step 202, which is not repeated here.
  • In step 304, output signal label decomposition data is removed from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by FSA data.
  • In the FST data, each piece of signal label data includes input signal label data and output signal label data.
  • In a case that the information is suitable to be presented by FSA data, the value of the input signal label data is equal to the value of the output signal label data.
  • Therefore, the output signal label decomposition data in the signal label decomposition data may be removed as redundant data so as to reduce the space occupied by the FST data.
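  • One possible way to realise step 304 is to check that every input label equals its output label and, only then, keep the input labels alone. The function name and the representation of labels as (input, output) pairs are assumptions, not taken from the source.

```python
def drop_output_labels(label_pairs):
    """Return input labels only when every (input, output) pair is identical.

    If any pair differs, the data is a genuine transducer and is returned
    unchanged.
    """
    if all(inp == out for inp, out in label_pairs):
        return [inp for inp, _ in label_pairs]
    return label_pairs


acceptor_like = [(1, 1), (2, 2), (3, 3)]
transducer = [(1, 1), (2, 7)]
```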
  • In step 305, the weight decomposition data is removed in a case that the information presented by the FST data is suitable to be presented by Trie data.
  • In a Trie structure, the path from the start state to any given state is unique; that is, the set of state transitions on the path is unique. Consequently, the value obtained by adding the final weight of a target state to the sum of the weights of all state transitions on the path from the start state to the target state is fixed. Therefore, the weights of all the state transitions may be transferred and added to the final weight of the target state.
  • In a case that the FST data presents a Trie structure, the weight data may then be removed as redundant data so as to further reduce the space occupied by the FST data.
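  • The weight transfer described for step 305 can be sketched on a small tree-shaped example. The trie below, its integer weights, and the function name `push_weights_to_finals` are hypothetical; the sketch assumes the arcs form a tree rooted at the start state, so each state is reached by exactly one path.

```python
def push_weights_to_finals(arcs, final_weights, start=0):
    """Fold the weights along each unique path into the final weights.

    arcs: list of (source state, weight, destination state) forming a tree.
    final_weights: mapping from final state id to its final weight.
    """
    children = {}
    for src, weight, dst in arcs:
        children.setdefault(src, []).append((weight, dst))

    path_weight = {start: 0}
    stack = [start]
    while stack:
        state = stack.pop()
        for weight, dst in children.get(state, []):
            path_weight[dst] = path_weight[state] + weight
            stack.append(dst)

    return {s: fw + path_weight[s] for s, fw in final_weights.items()}


# Hypothetical trie: 0 -> 1 -> 2 and 0 -> 3, with final states 2 and 3.
trie_arcs = [(0, 5, 1), (1, 2, 2), (0, 10, 3)]
trie_finals = {2: 1, 3: 4}
new_finals = push_weights_to_finals(trie_arcs, trie_finals)
```

  • After the transfer, the total weight of each complete path is carried by the final weight alone (8 for state 2 and 14 for state 3 in this example), so the per-transition weight list no longer needs to be stored.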
  • In step 306, the state data is decomposed based on second data categories to acquire second decomposition data.
  • Step 306 is the same as step 203 in the first embodiment. For details of step 306, one may refer to the description of step 203, which is not repeated here.
  • In step 307, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category.
  • Step 307 is the same as step 204 in the first embodiment. For details of step 307, one may refer to the description of step 204, which is not repeated here.
  • In step 308, the first arrangement data and the second decomposition data are alternately arranged according to a sequential order used in the first arrangement data to acquire second arrangement data.
  • Step 308 is the same as step 205 in the first embodiment. For details of step 308, one may refer to the description of step 205, which is not repeated here.
  • In step 309, a data type of the index data is set based on a maximum count of state transitions belonging to a same state.
  • The maximum count is the total number of state transitions belonging to the state with the most transitions among all states in the to-be-compressed FST data.
  • In the conventional data compression method, the index data of the FST data has an absolute address offset data type, which has a large numerical range and requires a large space, as shown in FIG. 4.
  • This data type of the index data is generally 8-bit Integer, 16-bit Integer, or 32-bit Integer, resulting in a waste of data space.
  • In this embodiment, an appropriate data type is set for the index data of the FST data based on the condition of the FST data. Since the maximum number of state transitions belonging to a single state is limited, and generally does not exceed the maximum value of signal label, a relative address offset data type which has a small numerical range and requires a small space is determined, as shown in FIG. 5.
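  • The source does not spell out how a reader of the compressed data locates the entries of a state. One plausible approach, sketched here as an assumption, is to recover absolute positions from the small per-state counts with a running sum. The function names and the string placeholders are hypothetical.

```python
from itertools import accumulate


def offsets_from_counts(counts):
    """Turn per-state counts into absolute start offsets (with an end sentinel)."""
    return [0] + list(accumulate(counts))


def entries_of(state, counts, mixed):
    """Return the slice of the mixed arrangement that belongs to one state."""
    offsets = offsets_from_counts(counts)
    return mixed[offsets[state]:offsets[state + 1]]


counts = [2, 3, 1, 1]  # S1..S4, as in the second index data
mixed = ["label(a1)", "label(a2)", "null(S2)", "label(a3)",
         "label(a4)", "label(a5)", "null(S4)"]
```

  • The trade-off in this sketch is that only small counts are stored, while the larger absolute offsets are recomputed when needed.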
  • the data type of the index data may be set as follows in step 309 .
  • the number of attached state transitions of each state is as follows.
  • the state S 1 has 2 attached state transitions.
  • the state S 2 has 2 attached state transitions.
  • the state S 3 has 1 attached state transition.
  • The state S 4 has no attached state transition. Therefore, the maximum number of attached state transitions of a single state among all the states is 2. Because a null label may also need to be counted for a final state, the maximum count is determined to be 3. Therefore, the index data has a numerical range from 0 to 3, and the data type of the index data is 2-bit Integer.
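  • The calculation in this example can be sketched as follows. The function name `index_bits` and the boolean list of final-state marks are assumptions; the counts are those listed above.

```python
def index_bits(arc_counts, is_final):
    """Bits needed for a per-state count that also counts a null label
    for each final state."""
    max_count = max(
        count + (1 if final else 0)
        for count, final in zip(arc_counts, is_final)
    )
    return max(1, max_count.bit_length())


arc_counts = [2, 2, 1, 0]               # attached transitions of S1..S4
is_final = [False, True, False, True]   # S2 and S4 are final
bits = index_bits(arc_counts, is_final)
```

  • With a maximum count of 3, the index values fit in the range 0 to 3, so `bits` is 2.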
  • in step 310, classification statistics is performed on the first arrangement data and the second arrangement data to acquire the index data.
  • step 310 is the same as step 206 in the first embodiment.
  • for details of step 310, one may refer to the description of step 206, which is not repeated here.
  • in step 311, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
  • step 311 is the same as step 207 in the first embodiment.
  • for details of step 311, one may refer to the description of step 207, which is not repeated here.
  • FIG. 4 is a schematic diagram showing an arrangement structure of FST data compressed by the conventional data compression method.
  • FIG. 5 is a schematic diagram showing an arrangement structure of FST data compressed by the data compression method according to this embodiment. Comparing the compressed data acquired by the different compression methods, it can be found that, the data space occupied by the data compressed by this embodiment is less than that compressed by the conventional method by two data units of the next state identifier, and is further reduced by applying appropriate data types. In a case of complex data structure of the FST data, the space saved by the technical solutions of the present disclosure is considerable.
  • the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data.
  • the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data.
  • the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category.
  • the first arrangement data and the second decomposition data are alternately arranged according to the sequential order used in the first arrangement data to acquire the second arrangement data.
  • classification statistics is performed on the first arrangement data and the second arrangement data to acquire the index data.
  • the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
  • the data types of the first decomposition data may be set based on the maximum value of signal label and the total number of all states, and the output signal label decomposition data and the weight decomposition data are removed depending on the appropriate data structure of the FST data.
  • the data type of the index data may be set based on the maximum count of state transitions belonging to a same state.
  • the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data. Compared with the conventional data compression method in which redundant data is required in order to maintain the consistent format of the compressed data, the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
  • FIG. 6 is a schematic structural diagram of a device for compressing FST data according to an embodiment of the present disclosure.
  • the device for compressing FST data includes an acquisition unit 601, a first decomposition unit 602, a second decomposition unit 603, a first arrangement unit 604, a second arrangement unit 605, a classification statistics unit 606, and a combination unit 607.
  • the acquisition unit 601 is configured to acquire to-be-compressed FST data.
  • the FST data includes state transition data and state data.
  • the first decomposition unit 602 is configured to decompose the state transition data based on first data categories to acquire first decomposition data.
  • the second decomposition unit 603 is configured to decompose the state data based on second data categories to acquire second decomposition data.
  • the first arrangement unit 604 is configured to sequentially arrange, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category.
  • the second arrangement unit 605 is configured to alternately arrange the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data.
  • the classification statistics unit 606 is configured to perform classification statistics on the first arrangement data and the second arrangement data to acquire index data.
  • the combination unit 607 is configured to combine the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
  • the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data.
  • the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data.
  • the first decomposition data of the first data category is sequentially arranged to acquire the first arrangement data of the first data category.
  • the first arrangement data and the second decomposition data are alternately arranged according to the sequential order used in the first arrangement data to acquire the second arrangement data.
  • classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data.
  • the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
  • the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data.
  • the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
  • the device and method disclosed herein may be implemented in other manners.
  • the device embodiments described above are illustrative only.
  • the units are divided merely in logical functions, and may be divided in other manners in actual implementation.
  • multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed.
  • the shown or discussed coupling, direct coupling or communication connection between parts may be via some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • Units described as separate components may or may not be physically separated.
  • Components shown as units may or may not be physical units. That is, these components may be located in the same place or may be distributed on multiple network units.
  • the object of the technical solutions of the embodiment may be achieved by selecting a part or all of the units according to actual requirements.
  • functional units in embodiments of the present disclosure may be separate physical units or may be integrated into one processing unit.
  • two or more units may be integrated into one unit.
  • the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit.
  • the integrated unit may be stored in a computer readable storage medium.
  • the software product may be stored in a storage medium.
  • the software product includes a number of instructions that control a computer device (which may be a personal computer, a server, or a network device and the like) to execute all or part of the steps of the methods according to the embodiments of the present disclosure.
  • the above storage medium includes various mediums capable of storing program code, for example, a U disk, a mobile hard disk, a read only memory (ROM), a random access memory (RAM), a magnetic disk, or a compact disc.

Abstract

A method and device for compressing FST data are provided. The method includes: acquiring to-be-compressed FST data, where the FST data includes state transition data and state data; decomposing the state transition data based on first data categories to acquire first decomposition data; decomposing the state data based on second data categories to acquire second decomposition data; sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category; alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data; performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.

Description

This application is the national phase of International Application No. PCT/CN2021/078808, titled “METHOD AND DEVICE FOR COMPRESSING FINITE-STATE TRANSDUCERS DATA”, filed on Mar. 3, 2021, which claims priority to Chinese Patent Application No. 202010737012.8, titled “METHOD AND DEVICE FOR COMPRESSING FINITE-STATE TRANSDUCERS DATA”, filed on Jul. 28, 2020 with the China National Intellectual Property Administration (CNIPA), both of which are incorporated herein by reference in their entirety.
FIELD
The present disclosure relates to the technical field of natural language processing, and in particular to a method and device for compressing finite-state transducer (FST) data.
BACKGROUND
In applications such as speech recognition, full-text retrieval, and input methods in the technical field of natural language processing, after acquiring input information, a search engine searches dictionary indexes according to the input information, and then outputs some matching results. Since the dictionary indexes are used as a target library for search, the search algorithm depends on the data structure of the dictionary indexes, which in turn affects the search speed and the matching results.
The finite-state transducer (FST) is a data structure of the dictionary indexes. FIG. 1(A) is a schematic diagram showing a simple FST structure, which mainly includes states S1 to S4 and state transitions (arcs) a1 to a5. The states include a start state mark and some final state marks. FIG. 1(B) is a schematic diagram showing a simple data structure of FST. State transition data includes signal label data (label), weight data (weight) and next state identifier data (next state). State data includes attached state transition identifier data. State data of a state marked as final further includes final weight data. The FST data includes a large amount of redundant data, and is generally stored in a client device for a long time. In a case of limited resources in the client device, the large amount of redundant data may result in a shortage of memory resources in the client device. Therefore, it is important to optimize the method for compressing the FST data.
In the conventional data compression method, a final weight of the state marked as final is wrapped as a weight of the state transition, to generate compressed data in a unified format to be stored. FIG. 4 is a schematic diagram showing a data structure of the FST data compressed by the conventional data compression method. By the above method, the data may be compressed to some extent. However, in the process of wrapping the final weight of the final state, in order to maintain the consistent format of the compressed data, a next state identifier that originally does not exist in the state is packaged inside the compressed data, resulting in a waste of data space. In addition, in a case that none of the state transitions includes a weight, a weight that originally does not exist in the state transition is still packaged inside the compressed data by the above method, resulting in a further waste of the data space.
Therefore, it is desired to provide an efficient method for compressing FST data.
SUMMARY
In view of this, a method and device for compressing FST data are provided according to the present disclosure, to effectively reduce space occupied by the FST data, thereby solving the technical problem of a waste of data space.
A method for compressing FST data is provided according to a first aspect of the present disclosure. The method for compressing FST data includes: acquiring to-be-compressed FST data, where the FST data includes state transition data and state data; decomposing the state transition data based on first data categories to acquire first decomposition data; decomposing the state data based on second data categories to acquire second decomposition data; sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category; alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data; performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
Optionally, the decomposing the state transition data based on first data categories to acquire first decomposition data includes: decomposing the state transition data based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data, weight decomposition data and next state identifier decomposition data.
Optionally, before the decomposing the state transition data based on first data categories to acquire first decomposition data, the method for compressing FST data further includes: setting data types of the first decomposition data based on a maximum value of signal label and a total number of all states in the to-be-compressed FST data.
Optionally, after the decomposing the state transition data based on first data categories to acquire first decomposition data, the method for compressing FST data further includes: removing output signal label decomposition data from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by FSA data; and removing the weight decomposition data in a case that the information presented by the FST data is suitable to be presented by Trie data.
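As a non-limiting illustration of this optional removal, the following Python sketch drops whole categories of decomposition data. The criteria used here (every output label repeating the input label for the FSA case, and no transition carrying a weight for the Trie case) are assumptions made for illustration only, as are the function name `prune_columns`, the column names and the sample values; the disclosure itself only states that the data is removed when FSA data or Trie data is suitable.

```python
from typing import Dict, List


def prune_columns(columns: Dict[str, List[object]]) -> Dict[str, List[object]]:
    """Return a copy of the per-category columns without redundant ones."""
    kept = dict(columns)
    # Assumed FSA criterion: every output label repeats the input label.
    if kept.get("output_labels") == kept.get("input_labels"):
        kept.pop("output_labels", None)
    # Assumed Trie criterion: no transition carries a weight.
    if all(w is None for w in kept.get("weights", [])):
        kept.pop("weights", None)
    return kept


# Hypothetical sample data, not taken from the figures.
columns = {
    "input_labels": ["x", "y", "z"],
    "output_labels": ["x", "y", "z"],
    "weights": [None, None, None],
    "next_states": [1, 2, 3],
}
pruned = prune_columns(columns)
weighted = prune_columns({**columns, "weights": [0.5, None, None]})
```

In this sketch the pruned result keeps only the input labels and the next state identifiers, whereas the weighted variant keeps its weight column.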
Optionally, the decomposing the state data based on second data categories to acquire second decomposition data includes: decomposing state data of each final state based on data categories of null label and final weight, to acquire null label decomposition data and final weight decomposition data, where the final state is a state marked as final.
Optionally, the sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category includes: sequentially arranging the signal label decomposition data in a sequential order of state identifiers corresponding to the signal label decomposition data, to acquire signal label arrangement data; sequentially arranging the weight decomposition data in a sequential order of state identifiers corresponding to the weight decomposition data, to acquire weight arrangement data; and sequentially arranging the next state identifier decomposition data in a sequential order of state identifiers corresponding to the next state identifier decomposition data, to acquire next state identifier arrangement data.
Optionally, the alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data includes: alternately arranging the signal label arrangement data and the null label decomposition data in a sequential order of state identifiers corresponding to the signal label arrangement data and the null label decomposition data, to acquire signal label mixed arrangement data; and sequentially arranging the final weight decomposition data in a sequential order of state identifiers corresponding to the final weight decomposition data, to acquire final weight arrangement data.
Optionally, the performing classification statistics on the first arrangement data and the second arrangement data to acquire index data includes: performing classification statistics on the first arrangement data based on state identifiers corresponding to the first arrangement data to acquire first index data; and performing classification statistics on the second arrangement data based on state identifiers corresponding to the second arrangement data to acquire second index data.
Optionally, before the performing classification statistics on the first arrangement data and the second arrangement data to acquire index data, the method for compressing FST data further includes: setting a data type of the index data based on a maximum count of state transitions belonging to a same state, wherein the maximum count is a total number of state transitions belonging to a state with most transitions among all states in the to-be-compressed FST data.
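As a non-limiting illustration, the width of the index data may be derived from the per-state transition counts as sketched below in Python. The counts 2, 2, 1 and 0 for the states S1 to S4 are those of the example of FIG. 1(A), and one extra entry is reserved for a possible null label, as in the worked example of the second embodiment. The function name `index_bit_width` and the flag `reserve_null_label` are hypothetical.

```python
from typing import Iterable


def index_bit_width(arc_counts: Iterable[int], reserve_null_label: bool = True) -> int:
    """Smallest number of bits able to hold every per-state entry count."""
    largest = max(arc_counts, default=0)
    if reserve_null_label:
        largest += 1  # a final state may store one extra null-label entry
    return max(1, largest.bit_length())


arc_counts = [2, 2, 1, 0]  # states S1 to S4
width = index_bit_width(arc_counts)
```

With these counts the largest index value is 3, so `width` evaluates to 2, matching the 2-bit integer of the worked example.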
Optionally, the combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data includes: combining the weight arrangement data, the next state identifier arrangement data, the signal label mixed arrangement data, the final weight arrangement data and the index data, to obtain the compressed FST data.
A device for compressing FST data is provided according to a second aspect of the present disclosure. The device for compressing FST data includes an acquisition unit, a first decomposition unit, a second decomposition unit, a first arrangement unit, a second arrangement unit, a classification statistics unit, and a combination unit. The acquisition unit is configured to acquire to-be-compressed FST data. The FST data includes state transition data and state data. The first decomposition unit is configured to decompose the state transition data based on first data categories to acquire first decomposition data. The second decomposition unit is configured to decompose the state data based on second data categories to acquire second decomposition data. The first arrangement unit is configured to sequentially arrange, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category. The second arrangement unit is configured to alternately arrange the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data. The classification statistics unit is configured to perform classification statistics on the first arrangement data and the second arrangement data to acquire index data. The combination unit is configured to combine the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
It can be seen from the above technical solutions that the method for compressing FST data according to the present disclosure has the following advantages.
The method for compressing FST data according to the present disclosure includes: acquiring to-be-compressed FST data, where the FST data includes state transition data and state data; decomposing the state transition data based on first data categories to acquire first decomposition data; decomposing the state data based on second data categories to acquire second decomposition data; sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category; alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data; performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
According to the present disclosure, the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data, and the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data. Then, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category. Then, the first arrangement data and the second decomposition data are alternately arranged according to a sequential order used in the first arrangement data to acquire second arrangement data. Then, classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data. Finally, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data. In the process, the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data. Compared with the conventional data compression method in which redundant data is required in order to maintain the consistent format of the compressed data, the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to more clearly illustrate technical solutions in embodiments of the present disclosure, the drawings to be used in the description of the embodiments are briefly described below. It is apparent that the drawings in the following description are only drawings used in some embodiments of the present disclosure, and other drawings may be acquired by those skilled in the art from the drawings without any creative work.
FIG. 1(A) is a schematic diagram showing a simple FST structure;
FIG. 1(B) is a schematic diagram showing a data structure of FST data shown in FIG. 1(A);
FIG. 2 is a schematic flowchart of a method for compressing FST data according to a first embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a method for compressing FST data according to a second embodiment of the present disclosure;
FIG. 4 is a schematic diagram showing a data structure of FST data compressed by the conventional data compression method;
FIG. 5 is a schematic diagram showing a data structure of FST data compressed by the compression method according to embodiments of the present disclosure; and
FIG. 6 is a schematic structural diagram of a device for compressing FST data according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
A method and device for compressing FST data are provided according to embodiments of the present disclosure, to effectively reduce space occupied by the FST data, thereby solving the technical problem of the waste of data space.
The technical solutions in the embodiments of the present disclosure are described clearly and completely in conjunction with the drawings in the embodiments of the present disclosure hereinafter, so that those skilled in the art can better understand the technical solutions of the present disclosure. It is apparent that the described embodiments are only some rather than all embodiments of the present disclosure. All other embodiments acquired by those skilled in the art based on the embodiments of the present disclosure without any creative work fall within the protection scope of the present disclosure.
First, some terms used in the description of the embodiments of the present disclosure are explained as follows.
Trie is an ordered prefix tree in which keys sharing a common prefix share the same path from the root.
FSA is short for Finite-State Automaton that includes no output signal label.
Data type constrains the values that an expression can take, and defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.
Reference is made to FIG. 2 , which is a schematic flowchart of a method for compressing FST data according to a first embodiment of the present disclosure.
The method for compressing FST data according to the first embodiment includes the following steps 201 to 207.
In step 201, to-be-compressed FST data is acquired. The FST data includes state transition data and state data.
The to-be-compressed FST data is acquired first in order to compress the FST data.
It should be noted that although different FST data may have different structures, each FST structure includes states, state transitions, signal labels, weights, next state identifiers, a start state mark and final state marks, as shown in FIG. 1(A). As shown in FIG. 1(B), FST data corresponding to the above FST structure includes state transition data and state data. The state transition data includes signal label data, weight data and next state identifier data. The state data includes final weight data and attached state transition identifier data. For each state transition, a state to which the state transition is attached may be determined based on the attached state transition identifier data in the state data.
For example, in FIG. 1(A), a sign Si represents a state, a sign ai represents a state transition, a circle sign in bold represents a start state, and a double-circle sign represents a final state. The sign for the final state also represents a final weight, and the sign for the state transition also represents the signal label and the weight.
In step 202, the state transition data is decomposed based on first data categories to acquire first decomposition data.
The state transition data structure is different from the state data structure. In the conventional data compression method, the state transition data and the state data are unified in a format, resulting in redundant data. In this embodiment, data is decomposed instead of being unified, and then is arranged separately or arranged in a mixed manner based on categories of the decomposed data. Therefore, the state transition data in the FST data is first decomposed in a fine-grained manner based on the first data categories, to acquire fine-grained first decomposition data.
For example, referring to the FST structure shown in FIG. 1(A), the FST data corresponding to the FST structure may be processed as follows in step 202.
Data of the state transitions a1 to a5 is decomposed based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data of the state transitions a1 to a5, weight decomposition data of the state transitions a1 to a5 and next state identifier decomposition data of the state transitions a1 to a5.
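As a non-limiting illustration, the decomposition of step 202 may be sketched in Python as splitting a list of transition records into one list per data category. The record type `Arc`, the function `decompose_arcs` and the symbolic placeholder values such as `a1.label` are hypothetical; the attachment of a1 and a2 to S1, a3 and a4 to S2, and a5 to S3 is inferred from the index values given for this example, since the figure values themselves are not reproduced here.

```python
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    state: str       # state to which the transition is attached
    label: str       # signal label
    weight: str      # weight
    next_state: str  # next state identifier


# Symbolic placeholders stand in for the actual values of FIG. 1(A).
owners = {"a1": "S1", "a2": "S1", "a3": "S2", "a4": "S2", "a5": "S3"}
arcs: List[Arc] = [
    Arc(state, f"{name}.label", f"{name}.weight", f"{name}.next")
    for name, state in owners.items()
]


def decompose_arcs(arcs: List[Arc]) -> Dict[str, List[str]]:
    """Split the transition records into one list per data category."""
    return {
        "label": [a.label for a in arcs],
        "weight": [a.weight for a in arcs],
        "next_state": [a.next_state for a in arcs],
    }


first_decomposition = decompose_arcs(arcs)
```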
In step 203, the state data is decomposed based on second data categories to acquire second decomposition data.
In this embodiment, when the state transition data is decomposed, the state data is also decomposed in the fine-grained manner based on the second data categories, to acquire fine-grained second decomposition data.
For example, referring to the FST structure shown in FIG. 1(A), the FST data corresponding to the FST structure may be processed as follows in step 203.
Data of the states S2 and S4 marked as final in the state data is decomposed based on data categories of null label and final weight, to acquire null label decomposition data of the states S2 and S4 and final weight decomposition data of the states S2 and S4.
In step 204, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category.
For example, the first decomposition data acquired from the above step 202 may be processed as follows in step 204. The signal label decomposition data of the state transitions a1 to a5 is sequentially arranged in a sequential order of state identifiers corresponding to the signal label decomposition data, to acquire signal label arrangement data in an order of a signal label of the state transition a1, a signal label of the state transition a2, a signal label of the state transition a3, a signal label of the state transition a4, and a signal label of the state transition a5. The weight decomposition data of the state transitions a1 to a5 is sequentially arranged in a sequential order of state identifiers corresponding to the weight decomposition data, to acquire weight arrangement data in an order of a weight of the state transition a1, a weight of the state transition a2, a weight of the state transition a3, a weight of the state transition a4, and a weight of the state transition a5. The next state identifier decomposition data of the state transitions a1 to a5 is sequentially arranged in a sequential order of state identifiers corresponding to the next state identifier decomposition data, to acquire next state identifier arrangement data in an order of a next state identifier of the state transition a1, a next state identifier of the state transition a2, a next state identifier of the state transition a3, a next state identifier of the state transition a4, and a next state identifier of the state transition a5.
In step 205, the first arrangement data and the second decomposition data are alternately arranged according to a sequential order used in the first arrangement data to acquire second arrangement data.
For example, the first arrangement data acquired from the above step 204 and the second decomposition data acquired from the above step 203 may be processed as follows in step 205. The signal label arrangement data of the state transitions a1 to a5 in the first arrangement data and the null label decomposition data of the states S2 and S4 in the second decomposition data are mixed, and are alternately arranged in a sequential order of state identifiers corresponding to the signal label arrangement data and the null label decomposition data, to acquire signal label mixed arrangement data in an order of the signal label of the state transition a1, the signal label of the state transition a2, a null label of the state S2, the signal label of the state transition a3, the signal label of the state transition a4, the signal label of the state transition a5, and a null label of the state S4. The final weight decomposition data of the states S2 and S4 is sequentially arranged in a sequential order of state identifiers corresponding to the final weight decomposition data, to acquire final weight arrangement data in an order of a final weight of the state S2, and a final weight of the state S4.
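As a non-limiting illustration, the mixed arrangement of step 205 may be sketched in Python as follows. The placement of the null label before the signal labels of the same state follows the order listed above; the names `arcs_by_state`, `mixed_label_arrangement` and `final_weight_arrangement`, and the symbolic strings such as `label(a1)`, are hypothetical.

```python
from typing import Dict, List, Set

# Transitions attached to each state, in state order (S1 to S4).
arcs_by_state: Dict[str, List[str]] = {
    "S1": ["a1", "a2"],
    "S2": ["a3", "a4"],
    "S3": ["a5"],
    "S4": [],
}
final_states: Set[str] = {"S2", "S4"}


def mixed_label_arrangement(arcs_by_state, final_states) -> List[str]:
    """Interleave null labels of final states with signal labels, state by state."""
    mixed = []
    for state, arc_names in arcs_by_state.items():  # dicts keep insertion order
        if state in final_states:
            mixed.append(f"null({state})")
        mixed.extend(f"label({name})" for name in arc_names)
    return mixed


def final_weight_arrangement(arcs_by_state, final_states) -> List[str]:
    """List the final weights in state order."""
    return [f"final_weight({s})" for s in arcs_by_state if s in final_states]


mixed = mixed_label_arrangement(arcs_by_state, final_states)
final_weights = final_weight_arrangement(arcs_by_state, final_states)
```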
In step 206, classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data.
For example, the first arrangement data acquired from the above step 204 and the second arrangement data acquired from the above step 205 may be processed as follows in step 206. Classification statistics is performed on the signal label arrangement data in the first arrangement data based on state identifiers corresponding to the signal label arrangement data to acquire first index data having index values of S1:2, S2:2, S3:1, S4:0. Classification statistics is performed on the signal label mixed arrangement data in the second arrangement data based on state identifiers corresponding to the signal label mixed arrangement data, to acquire second index data having index values of S1:2, S2:3, S3:1, S4:1.
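As a non-limiting illustration, the classification statistics of step 206 amount to counting entries per state, which may be sketched in Python as follows. The variable names are hypothetical, and the attachment of transitions to states is inferred from the index values stated above.

```python
from typing import Dict, List, Set

arcs_by_state: Dict[str, List[str]] = {
    "S1": ["a1", "a2"],
    "S2": ["a3", "a4"],
    "S3": ["a5"],
    "S4": [],
}
final_states: Set[str] = {"S2", "S4"}

# First index data: number of signal labels (transitions) per state.
first_index = {state: len(names) for state, names in arcs_by_state.items()}

# Second index data: a final state contributes one additional null-label entry.
second_index = {
    state: len(names) + (1 if state in final_states else 0)
    for state, names in arcs_by_state.items()
}
```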
In step 207, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
For example, the first arrangement data acquired from the above step 204, the second arrangement data acquired from the above step 205, and the index data acquired from the above step 206 may be processed as follows in step 207. The weight arrangement data and the next state identifier arrangement data in the first arrangement data, the signal label mixed arrangement data and the final weight arrangement data in the second arrangement data, and the index data are combined, to obtain the compressed FST data.
The finally obtained compressed data is arranged as follows: S1:2, S2:3, S3:1, S4:1, the signal label of the state transition a1, the signal label of the state transition a2, the null label of the state S2, the signal label of the state transition a3, the signal label of the state transition a4, the signal label of the state transition a5, the null label of the state S4, the weight of the state transition a1, the weight of the state transition a2, the weight of the state transition a3, the weight of the state transition a4, the weight of the state transition a5, the next state identifier of the state transition a1, the next state identifier of the state transition a2, the next state identifier of the state transition a3, the next state identifier of the state transition a4, the next state identifier of the state transition a5, the final weight of the state S2, and the final weight of the state S4.
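The arrangement listed above can be sketched as a plain concatenation of sections. The following Python snippet is only an illustration under stated assumptions: the function name `combine` and the string placeholders are hypothetical, and a real implementation would pack each section with its own data type rather than build a list.

```python
def combine(index_data, mixed_labels, weights, next_states, final_weights):
    """Concatenate the sections in the order given in the example arrangement."""
    return [*index_data, *mixed_labels, *weights, *next_states, *final_weights]


transitions = ["a1", "a2", "a3", "a4", "a5"]
index_data = [2, 3, 1, 1]  # S1:2, S2:3, S3:1, S4:1
mixed_labels = ["label(a1)", "label(a2)", "null(S2)", "label(a3)",
                "label(a4)", "label(a5)", "null(S4)"]
weights = [f"weight({a})" for a in transitions]
next_states = [f"next_state({a})" for a in transitions]
final_weights = ["final_weight(S2)", "final_weight(S4)"]

compressed = combine(index_data, mixed_labels, weights, next_states, final_weights)
print(len(compressed))
```

No padding entry appears in any section: the states S1 and S3, which are not final, contribute no null label and no final weight.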
In this embodiment, the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data, and the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data. Then, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire the first arrangement data of the first data category. Then, the first arrangement data and the second decomposition data are alternately arranged according to the sequential order used in the first arrangement data to acquire the second arrangement data. Then, classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data. Finally, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data. In the process, the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data. Compared with the conventional data compression method in which redundant data is required in order to maintain the consistent format of the compressed data, the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
The method for compressing FST data according to the first embodiment of the present disclosure is described above, and a method for compressing FST data according to a second embodiment of the present disclosure is described below.
Reference is made to FIG. 3 , which is a schematic flowchart of the method for compressing FST data according to the second embodiment of the present disclosure.
The method for compressing FST data according to the second embodiment includes the following steps 301 to 311.
In step 301, to-be-compressed FST data is acquired. The FST data includes state transition data and state data.
It should be noted that the step 301 is the same as the step 201 in the first embodiment. For details of step 301, one may refer to the description of step 201, which is not repeated here.
In step 302, data types of the first decomposition data are set based on a maximum value of signal label and a total number of all states in the to-be-compressed FST data.
In the conventional technical solutions, the state transition data in the FST data has a unified data type which requires a large space. A data type of the signal label is 32-bit Integer, a data type of the next state identifier is 32-bit Integer, and a data type of the weight is 32-bit Float, which may result in a waste of data space. In this embodiment, an appropriate data type is set for each category of the state transition data in the FST data. In this embodiment, a numerical range of each category of the state transition data is first evaluated. That is, the numerical range of the signal label data, the numerical range of the weight data, and the numerical range of the next state identifier data are evaluated. Then, for each category of the state transition data, an appropriate data type is determined based on the maximum value in the numerical range of the category of the state transition data, so that any values of the signal label data, the weight data, and the next state identifier data in the state transition data in their respective numerical ranges have corresponding values of their respective data types.
For example, the data types are set as follows in step 302. If the signal label has a maximum value of 127, and has a numerical range of 0 to 127, then the data type of the signal label is set to be 7-bit Integer. If the total number of all states is 4, and the numerical range of the state identifiers is from 0 to 3, then the data type of the next state identifier is set to be 2-bit Integer. If the weight has a numerical range of 0 to 255, then the data type of the weight is set to be 8-bit Integer.
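The bit widths in this example follow from the largest value each category must hold. A minimal Python sketch, in which the helper name `bits_for_max` is an assumption and byte alignment is ignored:

```python
def bits_for_max(max_value: int) -> int:
    """Smallest number of bits able to represent every integer in 0..max_value."""
    return max(1, max_value.bit_length())


max_signal_label = 127
total_states = 4          # state identifiers range from 0 to 3
max_weight = 255

label_bits = bits_for_max(max_signal_label)
next_state_bits = bits_for_max(total_states - 1)
weight_bits = bits_for_max(max_weight)

# Bits per state transition, compared with three 32-bit fields.
compact_bits = label_bits + next_state_bits + weight_bits
conventional_bits = 32 + 32 + 32
print(label_bits, next_state_bits, weight_bits, compact_bits, conventional_bits)
```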
In step 303, the state transition data is decomposed based on first data categories to acquire first decomposition data.
It should be noted that the step 303 is the same as the step 202 in the first embodiment. For details of step 303, one may refer to the description of step 202, which is not repeated here.
In step 304, output signal label decomposition data is removed from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by FSA data.
In the FST data, each of the signal label data includes input signal label data and output signal label data. In a case that the FSA structure is presented according to the FST data structure, a value of the input signal label data is equal to a value of the output signal label data.
For example, in a case that the appropriate data structure of the FST data is simplified or changes to the data structure of FSA data, in data of each state transition, a value of the input signal label data is equal to a value of the output signal label data. In this case, the output signal label decomposition data in the signal label decomposition data may be removed as redundant data so as to reduce the space occupied by the FST data.
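The check behind step 304 can be sketched as follows in Python. The function name `drop_output_labels` and the label values are hypothetical; the sketch only illustrates that the output labels carry no extra information when they equal the input labels for every state transition.

```python
def drop_output_labels(input_labels, output_labels):
    """Return (input_labels, output_labels or None).

    The output labels are dropped (None) when every state transition has an
    output label equal to its input label, that is, when the data behaves as an FSA.
    """
    if list(input_labels) == list(output_labels):
        return list(input_labels), None
    return list(input_labels), list(output_labels)


# Hypothetical labels of five state transitions.
fsa_like = drop_output_labels([3, 7, 7, 1, 4], [3, 7, 7, 1, 4])
true_fst = drop_output_labels([3, 7, 7, 1, 4], [3, 9, 7, 1, 4])
print(fsa_like)
print(true_fst)
```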
In step 305, the weight decomposition data is removed in a case that the information presented by the FST data is suitable to be presented by Trie data.
In a case that the Trie structure is presented according to the FST data structure, for any state, the path from the start state to this state is unique; that is, the set of state transitions on the path is unique. Consequently, the value obtained by adding the final weight of a target state to the sum of the weights of all state transitions on the path from the start state to the target state is fixed. Therefore, the weights of all the state transitions may be transferred and added to the final weight of the target state.
For example, in a case that the Trie structure is presented according to the FST data structure, it is possible that none of the state transition data in the FST data includes meaningful weight data. In this case, the weight data may be removed as redundant data so as to further reduce the space occupied by the FST data.
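The transfer of weights described for step 305 can be sketched in Python as follows. This is an illustration under assumptions rather than the disclosed implementation: the function name `push_weights_to_finals`, the toy trie, and its integer weights are hypothetical, weights are combined by addition as in the description above, and every final state is assumed to be reachable from the start state.

```python
def push_weights_to_finals(transitions, final_weights, start):
    """Fold the transition weights on each unique path into the final weights.

    transitions: dict mapping a state to a list of (weight, next_state) pairs.
    final_weights: dict mapping each final state to its final weight.
    """
    path_weight = {start: 0}
    stack = [start]
    while stack:
        state = stack.pop()
        for weight, next_state in transitions.get(state, []):
            # In a trie each state is reached by exactly one path,
            # so this assignment happens once per state.
            path_weight[next_state] = path_weight[state] + weight
            stack.append(next_state)
    return {s: fw + path_weight[s] for s, fw in final_weights.items()}


# Hypothetical trie: T0 -> T1 -> T2 and T0 -> T3.
trie = {"T0": [(1, "T1"), (4, "T3")], "T1": [(2, "T2")]}
finals = {"T2": 10, "T3": 0}

new_finals = push_weights_to_finals(trie, finals, "T0")
print(new_finals)
```

After this transfer, the transition weights no longer carry information and the weight decomposition data may be omitted.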
In step 306, the state data is decomposed based on second data categories to acquire second decomposition data.
It should be noted that the step 306 is the same as the step 203 in the first embodiment. For details of step 306, one may refer to the description of step 203, which is not repeated here.
In step 307, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category.
It should be noted that the step 307 is the same as the step 204 in the first embodiment. For details of step 307, one may refer to the description of step 204, which is not repeated here.
In step 308, the first arrangement data and the second decomposition data are alternately arranged according to a sequential order used in the first arrangement data to acquire second arrangement data.
It should be noted that the step 308 is the same as the step 205 in the first embodiment. For details of step 308, one may refer to the description of step 205, which is not repeated here.
In step 309, a data type of the index data is set based on a maximum count of state transitions belonging to a same state.
It should be noted that the maximum count is the total number of state transitions belonging to the state that has the most state transitions among all states in the to-be-compressed FST data.
In the conventional technical solutions, the index data of the FST data has an absolute address offset data type which has a large numerical range and requires a large space, as shown in FIG. 4 . This data type of the index data is generally 8-bit Integer, 16-bit Integer, or 32-bit Integer, resulting in a waste of data space. In this embodiment, an appropriate data type is set for the index data of the FST data based on the condition of the FST data. Since the maximum number of state transitions belonging to a single state is limited, and generally does not exceed the maximum value of signal label, a relative address offset data type which has a small numerical range and requires a small space is determined, as shown in FIG. 5 .
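To illustrate why small per-state values can stand in for absolute address offsets, the following Python sketch recovers the position of a state's entries by accumulating the per-state counts. It assumes that the per-state counts of the first embodiment play the role of the relative offsets; the function name `group_bounds` is hypothetical.

```python
from itertools import accumulate


def group_bounds(counts, state_position):
    """Return the (start, end) slice of a state's entries from per-state counts."""
    starts = [0, *accumulate(counts)]  # running total gives absolute offsets
    return starts[state_position], starts[state_position + 1]


counts = [2, 3, 1, 1]  # S1:2, S2:3, S3:1, S4:1
mixed_labels = ["label(a1)", "label(a2)", "null(S2)", "label(a3)",
                "label(a4)", "label(a5)", "null(S4)"]

start, end = group_bounds(counts, 1)  # position 1 corresponds to state S2
s2_entries = mixed_labels[start:end]
print(start, end, s2_entries)
```

The stored values never exceed the largest per-state count, whereas an absolute offset grows with the total number of entries.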
For example, the data type of the index data may be set as follows in step 309. The number of attached state transitions of each state is as follows. The state S1 has 2 attached state transitions. The state S2 has 2 attached state transitions. The state S3 has 1 attached state transition. The state S4 has no attached state transition. Therefore, the maximum number of attached state transitions of a state among all the states is 2. Considering that there may be a null label to be counted, the maximum number of the attached state transitions is determined to be 3. Therefore, the index data has a numerical range from 0 to 3, and the data type of the index data is 2-bit Integer.
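The arithmetic of this example can be written out as a short Python sketch. The variable names are illustrative assumptions.

```python
# Number of attached state transitions of each state in the example.
transition_counts = {"S1": 2, "S2": 2, "S3": 1, "S4": 0}

max_transitions = max(transition_counts.values())
max_index_value = max_transitions + 1  # leave room for one null label
index_bits = max(1, max_index_value.bit_length())

print(max_transitions, max_index_value, index_bits)
```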
In step 310, classification statistics is performed on the first arrangement data and the second arrangement data to acquire the index data.
It should be noted that the step 310 is the same as the step 206 in the first embodiment. For details of step 310, one may refer to the description of step 206, which is not repeated here.
In step 311, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data.
It should be noted that the step 311 is the same as the step 207 in the first embodiment. For details of step 311, one may refer to the description of step 207, which is not repeated here.
FIG. 4 is a schematic diagram showing an arrangement structure of FST data compressed by the conventional data compression method. FIG. 5 is a schematic diagram showing an arrangement structure of FST data compressed by the data compression method according to this embodiment. A comparison of the compressed data acquired by the two methods shows that the data compressed according to this embodiment occupies less space than the data compressed by the conventional method, by two data units of the next state identifier, and that the occupied space is further reduced by applying appropriate data types. In a case that the FST data has a complex data structure, the space saved by the technical solutions of the present disclosure is considerable.
In this embodiment, the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data, and the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data. Then, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire first arrangement data of the first data category. Then, the first arrangement data and the second decomposition data are alternately arranged according to the sequential order used in the first arrangement data to acquire the second arrangement data. Then, classification statistics is performed on the first arrangement data and the second arrangement data to acquire the index data. Finally, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data. In the process of decomposing the state transition data based on the first data categories to acquire the first decomposition data, the data types of the first decomposition data may be set based on the maximum value of signal label and the total number of all states, and the output signal label decomposition data and the weight decomposition data are removed depending on the appropriate data structure of the FST data. In the process of performing classification statistics on the first arrangement data and the second arrangement data to acquire index data, the data type of the index data may be set based on the maximum count of state transitions belonging to a same state. In the process, the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data. 
Compared with the conventional data compression method in which redundant data is required in order to maintain the consistent format of the compressed data, the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
The method for compressing FST data according to the second embodiment of the present disclosure is described above, and a device for compressing FST data is described below according to an embodiment of the present disclosure.
Reference is made to FIG. 6 , which is a schematic structural diagram of a device for compressing FST data according to an embodiment of the present disclosure. The device for compressing FST data includes an acquisition unit 601, a first decomposition unit 602, a second decomposition unit 603, a first arrangement unit 604, a second arrangement unit 605, a classification statistics unit 606, and a combination unit 607.
The acquisition unit 601 is configured to acquire to-be-compressed FST data. The FST data includes state transition data and state data.
The first decomposition unit 602 is configured to decompose the state transition data based on first data categories to acquire first decomposition data.
The second decomposition unit 603 is configured to decompose the state data based on second data categories to acquire second decomposition data.
The first arrangement unit 604 is configured to sequentially arrange, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category.
The second arrangement unit 605 is configured to alternately arrange the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data.
The classification statistics unit 606 is configured to perform classification statistics on the first arrangement data and the second arrangement data to acquire index data.
The combination unit 607 is configured to combine the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data.
In this embodiment, the state transition data of the FST data is decomposed based on the first data categories to acquire the first decomposition data, and the state data of the FST data is decomposed based on the second data categories to acquire the second decomposition data. Then, for each of the first data categories, the first decomposition data of the first data category is sequentially arranged to acquire the first arrangement data of the first data category. Then, the first arrangement data and the second decomposition data are alternately arranged according to the sequential order used in the first arrangement data to acquire the second arrangement data. Then, classification statistics is performed on the first arrangement data and the second arrangement data to acquire index data. Finally, the first arrangement data, the second arrangement data, and the index data are combined to obtain the compressed FST data. In the process, the FST data is decomposed and arranged in a fine-grained manner, without filling redundant data. Compared with the conventional data compression method in which redundant data is required in order to maintain the consistent format of the compressed data, the space occupied by the FST data is effectively reduced, thereby solving the technical problem of the waste of data space.
Those skilled in the art should clearly understand that the detailed operating processes of the above device and units correspond to the processing in the foregoing method embodiments, and are not repeated here for convenience and conciseness of the description.
In the embodiments of the present disclosure, it should be understood that the device and method disclosed herein may be implemented in other manners. For example, the device embodiments described above are illustrative only. For example, the units are divided merely in logical functions, and may be divided in other manners in actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed. In addition, the shown or discussed coupling, direct coupling or communication connection between parts may be via some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
Units described as separate components may or may not be physically separated. Components shown as units may or may not be physical units. That is, these components may be located in the same place or may be distributed on multiple network units. The object of the technical solutions of the embodiment may be achieved by selecting a part or all of the units according to actual requirements.
Furthermore, functional units in embodiments of the present disclosure may be separate physical units or may be integrated into one processing unit. Alternatively, two or more units may be integrated into one unit. The integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit.
In a case that the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, the integrated unit may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present disclosure, or the part of the technical solutions that contributes to the conventional technology, or all or part of the technical solutions, may be embodied in the form of a software product. The software product may be stored in a storage medium. The software product includes a number of instructions that control a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The above storage medium includes various media capable of storing program code, for example, a USB flash drive, a mobile hard disk, a read only memory (ROM), a random access memory (RAM), a magnetic disk, or a compact disc.
The above embodiments are only intended for describing the technical solutions of the present application, and should not be interpreted as a limitation to the technical solutions. Although the technical solutions are described in detail with reference to the embodiments above, those skilled in the art should understand that the technical solutions according to the above embodiments may be modified, or some technical features may be substituted with equivalents. Such modifications or substitutions do not cause the essence of the technical solutions to deviate from the spirit and scope of the technical solutions according to the embodiments of the present disclosure.

Claims (9)

The invention claimed is:
1. A method for compressing finite-state transducer (FST) data to reduce memory usage in a computing device, comprising:
acquiring to-be-compressed FST data, wherein the FST data comprises state transition data and state data, and wherein the FST data is used in at least one of text retrieval, search engine, natural language processing, machine translation, speech recognition, signal processing and automated control;
decomposing the state transition data based on first data categories to acquire first decomposition data, comprising:
decomposing the state transition data based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data, weight decomposition data and next state identifier decomposition data;
after decomposing the state transition data based on the first data categories to acquire the first decomposition data, removing output signal label decomposition data from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by finite-state automaton (FSA) data; and removing the weight decomposition data in a case that the information presented by the FST data is suitable to be presented by Trie data;
decomposing the state data based on second data categories to acquire second decomposition data;
sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category;
alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data;
performing classification statistics on the first arrangement data and the second arrangement data to acquire index data; and
combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data, wherein the compressed FST data is stored in a memory of the computing device and reduces memory resource consumption during the at least one of text retrieval, search engine, natural language processing, machine translation, speech recognition, signal processing and automated control.
2. The method for compressing FST data according to claim 1, wherein the sequentially arranging, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category comprises:
sequentially arranging the signal label decomposition data in a sequential order of state identifiers corresponding to the signal label decomposition data, to acquire signal label arrangement data;
sequentially arranging the weight decomposition data in a sequential order of state identifiers corresponding to the weight decomposition data, to acquire weight arrangement data; and
sequentially arranging the next state identifier decomposition data in a sequential order of state identifiers corresponding to the next state identifier decomposition data, to acquire next state identifier arrangement data.
3. The method for compressing FST data according to claim 2, wherein the decomposing the state data based on second data categories to acquire second decomposition data comprises:
decomposing state data of each final state based on data categories of null label and final weight, to acquire null label decomposition data and final weight decomposition data, wherein the final state is a state marked as final.
4. The method for compressing FST data according to claim 3, wherein the alternately arranging the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data comprises:
alternately arranging the signal label arrangement data and the null label decomposition data in a sequential order of state identifiers corresponding to the signal label arrangement data and the null label decomposition data, to acquire signal label mixed arrangement data; and
sequentially arranging the final weight decomposition data in a sequential order of state identifiers corresponding to the final weight decomposition data, to acquire final weight arrangement data.
5. The method for compressing FST data according to claim 4, wherein the combining the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data comprises:
combining the weight arrangement data, the next state identifier arrangement data, the signal label mixed arrangement data, the final weight arrangement data and the index data, to obtain the compressed FST data.
6. The method for compressing FST data according to claim 1, wherein before the decomposing the state transition data based on first data categories to acquire first decomposition data, the method for compressing FST data further comprises:
setting data types of the first decomposition data based on a maximum value of signal label and a total number of all states in the to-be-compressed FST data.
7. The method for compressing FST data according to claim 1, wherein the performing classification statistics on the first arrangement data and the second arrangement data to acquire index data comprises:
performing classification statistics on the first arrangement data based on state identifiers corresponding to the first arrangement data to acquire first index data; and
performing classification statistics on the second arrangement data based on state identifiers corresponding to the second arrangement data to acquire second index data.
8. The method for compressing FST data according to claim 7, wherein before the performing classification statistics on the first arrangement data and the second arrangement data to acquire index data, the method for compressing FST data further comprises:
setting a data type of the index data based on a maximum count of state transitions belonging to a same state, wherein the maximum count is a total number of state transitions belonging to a state with most transitions among all states in the to-be-compressed FST data.
9. A device for compressing finite-state transducer (FST) data to reduce memory usage in a computing device, comprising:
an acquisition unit configured to acquire to-be-compressed FST data, wherein the FST data comprises state transition data and state data, and wherein the FST data is used in at least one of text retrieval, search engine, natural language processing, machine translation, speech recognition, signal processing and automated control;
a first decomposition unit configured to decompose the state transition data based on first data categories to acquire first decomposition data, comprising:
decomposing the state transition data based on data categories of signal label, weight and next state identifier, to acquire signal label decomposition data, weight decomposition data and next state identifier decomposition data;
the first decomposition unit further configured to, after decomposing the state transition data based on the first data categories to acquire the first decomposition data, remove output signal label decomposition data from the signal label decomposition data in a case that information presented by the FST data is suitable to be presented by finite-state automaton (FSA) data; and remove the weight decomposition data in a case that the information presented by the FST data is suitable to be presented by Trie data;
a second decomposition unit configured to decompose the state data based on second data categories to acquire second decomposition data;
a first arrangement unit configured to sequentially arrange, for each of the first data categories, the first decomposition data of the first data category, to acquire first arrangement data of the first data category;
a second arrangement unit configured to alternately arrange the first arrangement data and the second decomposition data according to a sequential order used in the first arrangement data, to acquire second arrangement data;
a classification statistics unit configured to perform classification statistics on the first arrangement data and the second arrangement data to acquire index data; and
a combination unit configured to combine the first arrangement data, the second arrangement data, and the index data, to obtain the compressed FST data, wherein the compressed FST data is stored in a memory of the computing device and reduces memory resource consumption during the at least one of text retrieval, search engine, natural language processing, machine translation, speech recognition, signal processing and automated control.
US17/782,152 2020-07-28 2021-03-03 Method and device for compressing finite-state transducers data Active 2042-10-27 US12424211B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202010737012.8 2020-07-28
CN202010737012.8A CN111884659B (en) 2020-07-28 2020-07-28 Compression method and device of FST data
PCT/CN2021/078808 WO2022021876A1 (en) 2020-07-28 2021-03-03 Method and device for compressing finite-state transducers data

Publications (2)

Publication Number Publication Date
US20230005474A1 (en) 2023-01-05
US12424211B2 (en) 2025-09-23

Family

ID=73200387

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/782,152 Active 2042-10-27 US12424211B2 (en) 2020-07-28 2021-03-03 Method and device for compressing finite-state transducers data

Country Status (3)

Country Link
US (1) US12424211B2 (en)
CN (1) CN111884659B (en)
WO (1) WO2022021876A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111884659B (en) * 2020-07-28 2021-09-10 广州智品网络科技有限公司 Compression method and device of FST data

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1124889A (en) 1993-12-23 1996-06-19 株式会社理光 Method and apparatus for parallel encoding and decoding of data
JPH11266162A (en) 1998-01-05 1999-09-28 Ricoh Co Ltd Coding method, coding apparatus, compression / decompression system, FSM coder and coder
US20060218161A1 (en) 2005-03-23 2006-09-28 Qian Zhang Systems and methods for efficiently compressing and decompressing markup language
US20070192104A1 (en) 2006-02-16 2007-08-16 At&T Corp. A system and method for providing large vocabulary speech processing based on fixed-point arithmetic
US20090228502A1 (en) 2008-03-05 2009-09-10 International Business Machines Corporation Efficient storage for finite state machines
US7661138B1 (en) 2005-08-31 2010-02-09 Jupiter Networks, Inc. Finite state automaton compression
CN204272112U (en) 2014-12-11 2015-04-15 河北大学 Power carrier communication voice compression coding device
CN104636349A (en) 2013-11-07 2015-05-20 阿里巴巴集团控股有限公司 Method and equipment for compression and searching of index data
CN104866981A (en) 2015-06-12 2015-08-26 武汉理工大学 Modeling method based on business process management of extended finite state machine
CN106651658A (en) 2016-12-30 2017-05-10 合肥工业大学 Non-intruding type dwelling electrical load decomposition method based on finite-state machine
US9865254B1 (en) * 2016-02-29 2018-01-09 Amazon Technologies, Inc. Compressed finite state transducers for automatic speech recognition
US9966066B1 (en) * 2016-02-03 2018-05-08 Nvoq Incorporated System and methods for combining finite state transducer based speech recognizers
US20180233134A1 (en) * 2017-02-10 2018-08-16 Samsung Electronics Co., Ltd. Wfst decoding system, speech recognition system including the same and method for storing wfst data
US10121467B1 (en) * 2016-06-30 2018-11-06 Amazon Technologies, Inc. Automatic speech recognition incorporating word usage information
US20190268361A1 (en) * 2018-02-23 2019-08-29 Crowdstrike, Inc. Computer-security event analysis
CN110704199A (en) 2019-09-06 2020-01-17 深圳平安通信科技有限公司 Data compression method and device, computer equipment and storage medium
CN111884659A (en) 2020-07-28 2020-11-03 广州智品网络科技有限公司 A kind of compression method and device of FST data
US20200357388A1 (en) * 2019-05-10 2020-11-12 Google Llc Using Context Information With End-to-End Models for Speech Recognition
US10943583B1 (en) * 2017-07-20 2021-03-09 Amazon Technologies, Inc. Creation of language models for speech recognition
US11056098B1 (en) * 2018-11-28 2021-07-06 Amazon Technologies, Inc. Silent phonemes for tracking end of speech
US11158307B1 (en) * 2019-03-25 2021-10-26 Amazon Technologies, Inc. Alternate utterance generation
US11270687B2 (en) * 2019-05-03 2022-03-08 Google Llc Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN101256624B (en) * 2007-02-28 2012-10-10 微软公司 Method and system for establishing HMM topological structure being suitable for recognizing hand-written East Asia character

Patent Citations (27)

Publication number Priority date Publication date Assignee Title
US5583500A (en) 1993-02-10 1996-12-10 Ricoh Corporation Method and apparatus for parallel encoding and decoding of data
CN1124889A (en) 1993-12-23 1996-06-19 株式会社理光 Method and apparatus for parallel encoding and decoding of data
JPH11266162A (en) 1998-01-05 1999-09-28 Ricoh Co Ltd Coding method, coding apparatus, compression / decompression system, FSM coder and coder
US6094151A (en) 1998-01-05 2000-07-25 Ricoh Company, Ltd. Apparatus and method for finite state machine coding of information selecting most probable state subintervals
US20060218161A1 (en) 2005-03-23 2006-09-28 Qian Zhang Systems and methods for efficiently compressing and decompressing markup language
US7661138B1 (en) 2005-08-31 2010-02-09 Juniper Networks, Inc. Finite state automaton compression
US20070192104A1 (en) 2006-02-16 2007-08-16 At&T Corp. A system and method for providing large vocabulary speech processing based on fixed-point arithmetic
US20090228502A1 (en) 2008-03-05 2009-09-10 International Business Machines Corporation Efficient storage for finite state machines
CN104636349A (en) 2013-11-07 2015-05-20 阿里巴巴集团控股有限公司 Method and equipment for compression and searching of index data
CN204272112U (en) 2014-12-11 2015-04-15 河北大学 Power carrier communication voice compression coding device
CN104866981A (en) 2015-06-12 2015-08-26 武汉理工大学 Modeling method based on business process management of extended finite state machine
US9966066B1 (en) * 2016-02-03 2018-05-08 Nvoq Incorporated System and methods for combining finite state transducer based speech recognizers
US9865254B1 (en) * 2016-02-29 2018-01-09 Amazon Technologies, Inc. Compressed finite state transducers for automatic speech recognition
US10381000B1 (en) 2016-02-29 2019-08-13 Amazon Technologies, Inc. Compressed finite state transducers for automatic speech recognition
US10121467B1 (en) * 2016-06-30 2018-11-06 Amazon Technologies, Inc. Automatic speech recognition incorporating word usage information
CN106651658A (en) 2016-12-30 2017-05-10 合肥工业大学 Non-intruding type dwelling electrical load decomposition method based on finite-state machine
US20180233134A1 (en) * 2017-02-10 2018-08-16 Samsung Electronics Co., Ltd. Wfst decoding system, speech recognition system including the same and method for storing wfst data
CN108417222A (en) 2017-02-10 2018-08-17 三星电子株式会社 Weighted finite state converter decodes system and speech recognition system
US10943583B1 (en) * 2017-07-20 2021-03-09 Amazon Technologies, Inc. Creation of language models for speech recognition
US20190268361A1 (en) * 2018-02-23 2019-08-29 Crowdstrike, Inc. Computer-security event analysis
US11056098B1 (en) * 2018-11-28 2021-07-06 Amazon Technologies, Inc. Silent phonemes for tracking end of speech
US11158307B1 (en) * 2019-03-25 2021-10-26 Amazon Technologies, Inc. Alternate utterance generation
US11270687B2 (en) * 2019-05-03 2022-03-08 Google Llc Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models
US20200357388A1 (en) * 2019-05-10 2020-11-12 Google Llc Using Context Information With End-to-End Models for Speech Recognition
CN110704199A (en) 2019-09-06 2020-01-17 深圳平安通信科技有限公司 Data compression method and device, computer equipment and storage medium
CN111884659A (en) 2020-07-28 2020-11-03 广州智品网络科技有限公司 A kind of compression method and device of FST data
WO2022021876A1 (en) 2020-07-28 2022-02-03 Guangzhou Ziipin Network Technology Co., Ltd Method and device for compressing finite-state transducers data

Non-Patent Citations (3)

Title
International Search Report for PCT/CN2021/078808 mailed May 31, 2021, ISA/CN.
Tan En-min, etc., A Hybrid Prefix Encoding Method of Test Data Compression, Microelectronics & Computer vol. 33 No. 9 Sep. 2016.
Wen Hou, etc., Analysis of Compressed Sensing Based CT Reconstruction with Low Radiation, 2014 IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) Dec. 1-4, 2014.

Also Published As

Publication number Publication date
CN111884659B (en) 2021-09-10
CN111884659A (en) 2020-11-03
US20230005474A1 (en) 2023-01-05
WO2022021876A1 (en) 2022-02-03

Similar Documents

Publication Publication Date Title
CN110532347B (en) Log data processing method, device, equipment and storage medium
US6904430B1 (en) Method and system for efficiently identifying differences between large files
JP3152868B2 (en) Search device and dictionary / text search method
US8694474B2 (en) Block entropy encoding for word compression
US20230078918A1 (en) Devices and methods for efficient execution of rules using pre-compiled directed acyclic graphs
US20130141259A1 (en) Method and system for data compression
US12530407B2 (en) Data filtering methods and apparatuses for data queries
CN114238257A (en) Log processing method, log processing device and electronic equipment
US12424211B2 (en) Method and device for compressing finite-state transducers data
CN116680258B (en) Data processing method and system based on PDM system and readable storage medium
US8463759B2 (en) Method and system for compressing data
CN113392124B (en) Structured language-based data query method and device
CN117010358A (en) Message card generation method, device, computer equipment and storage medium
CN119226380B (en) Database code extraction method and system based on fast screening of large language model
CN111339378A (en) Character command auditing method and system in operation and maintenance management
US12353742B2 (en) Method, device, and computer program product for data deduplication
CN112527753B (en) DNS analysis record lossless compression method and device, electronic equipment and storage medium
CN116090413A (en) Serialization-based general RDF data compression method
US20250335484A1 (en) Tokenized text for efficient searching by machine learning (ml) applications
US12461918B2 (en) Data compression, store, and search system
TR2022014836T2 (en) METHOD AND DEVICE FOR COMPRESSING DATA OF FINITE STATE CONVERTERS
CN116910053B (en) Data timeout management method and device, electronic equipment and readable storage medium
US20250363074A1 (en) System and Method for Arithmetic Operations on Compacted Data Files
KR20080026772A (en) Lempel-Ziv Compression method that complements the restoration speed of compression method
US9009200B1 (en) Method of searching text based on two computer hardware processing properties: indirect memory addressing and ASCII encoding

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS Assignment

Owner name: GUANGZHOU ZIIPIN NETWORK TECHNOLOGY CO., LTD, CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LIANG, ZHENXING;REEL/FRAME:060269/0517

Effective date: 20220510

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE