CN110096693B - Data processing method and device for data processing


Info

Publication number
CN110096693B
Authority
CN
China
Prior art keywords
data
grammar
model
rollback
field
Prior art date
Legal status
Active
Application number
CN201810084097.7A
Other languages
Chinese (zh)
Other versions
CN110096693A
Inventor
姚光超
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201810084097.7A
Publication of CN110096693A
Application granted
Publication of CN110096693B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications


Abstract

The embodiments of the invention provide a data processing method, a data processing apparatus, and a device for data processing. The method specifically includes: determining first data and second data from the data of an n-gram model, where the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; and the second data includes: the word number (word_id) of the grammar; and storing the first data in the high-order bits of a target data field and the second data in the low-order bits of the target data field. The embodiments of the invention can reduce the memory space occupied by the n-gram model and thereby increase the speed of speech recognition.

Description

Data processing method and device for data processing
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a data processing method and apparatus, and a device for data processing.
Background
Speech recognition refers to converting the lexical content of human speech into computer-readable input, for example converting a speech signal into text. As speech recognition technology has developed, its application scenarios have become increasingly broad, including voice dialing, voice navigation, indoor device control, voice document retrieval, and simple dictation entry.
An n-gram model is a language model commonly used in speech recognition, where n is generally a positive integer greater than 1. In general, the larger n is, the better the language model performs and the more accurate the speech recognition result is.
Online speech recognition stores the resources required for speech recognition, such as the language model, on a server; a user accesses the server through a network to obtain the speech recognition result. To make recognition more accurate, the language model should be as large as possible. However, a language model placed entirely in the server occupies substantial storage resources, possibly tens or even hundreds of gigabytes (GB) of memory, which not only affects recognition speed but also wastes resources.
Disclosure of Invention
The embodiments of the invention provide a data processing method, a data processing apparatus, and a device for data processing, so as to solve the prior-art problem that online speech recognition occupies excessive memory.
In order to solve the above problem, an embodiment of the invention discloses a data processing method, including:
determining first data and second data from the data of an n-gram model; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
storing the first data in the high-order bits of a target data field, and storing the second data in the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the method further includes:
determining third data from the data of the n-gram model;
and deleting the third data from the data of the n-gram model.
Optionally, the third data includes: the starting position of the next-layer grammars prefixed by the grammar.
Optionally, the third data includes: the backoff weight corresponding to the highest-layer grammar.
Optionally, the method further includes:
determining the data type corresponding to fourth data from the data of the n-gram model;
and storing the fourth data according to the data type.
Optionally, the fourth data includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type includes: double-byte integer (short).
In another aspect, an embodiment of the invention discloses a speech recognition method, including:
loading an n-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
performing speech recognition on grammars according to the n-gram model;
where performing speech recognition according to the n-gram model includes:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the n-gram model does not include: the starting position of the next-layer grammars prefixed by a grammar; and performing speech recognition according to the n-gram model includes:
determining the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
Optionally, the n-gram model does not include: the backoff weight corresponding to the highest-layer grammar.
Optionally, the n-gram model includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type corresponding to the conditional probability of the grammar and/or the backoff weight of the grammar includes: double-byte integer (short).
In yet another aspect, an embodiment of the invention discloses a data processing apparatus, including:
a first determining module, configured to determine first data and second data from the data of an n-gram model; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
and a first storage module, configured to store the first data in the high-order bits of a target data field and store the second data in the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the apparatus further includes:
a second determining module, configured to determine third data from the data of the n-gram model;
and a data deleting module, configured to delete the third data from the data of the n-gram model.
Optionally, the third data includes: the starting position of the next-layer grammars prefixed by the grammar.
Optionally, the third data includes: the backoff weight corresponding to the highest-layer grammar.
Optionally, the apparatus further includes:
a third determining module, configured to determine the data type corresponding to fourth data from the data of the n-gram model;
and a second storage module, configured to store the fourth data according to the data type.
Optionally, the fourth data includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type includes: double-byte integer (short).
In still another aspect, an embodiment of the invention discloses a speech recognition apparatus, including:
a loading module, configured to load an n-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
a recognition module, configured to perform speech recognition on grammars according to the n-gram model;
where the recognition module includes:
a data acquisition module, configured to acquire the first data from the high-order bits of the target data field and acquire the second data from the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the n-gram model does not include: the starting position of the next-layer grammars prefixed by a grammar; and the recognition module includes:
a position determining module, configured to determine the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
Optionally, the n-gram model does not include: the backoff weight corresponding to the highest-layer grammar.
Optionally, the n-gram model includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type corresponding to the conditional probability of the grammar and/or the backoff weight of the grammar includes: double-byte integer (short).
In yet another aspect, an embodiment of the invention discloses a device for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
determining first data and second data from the data of an n-gram model; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
storing the first data in the high-order bits of a target data field, and storing the second data in the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the device is also configured to execute the one or more programs by the one or more processors, the one or more programs including instructions for:
determining third data from the data of the n-gram model;
and deleting the third data from the data of the n-gram model.
Optionally, the third data includes: the starting position of the next-layer grammars prefixed by the grammar.
Optionally, the third data includes: the backoff weight corresponding to the highest-layer grammar.
Optionally, the device is also configured to execute the one or more programs by the one or more processors, the one or more programs including instructions for:
determining the data type corresponding to fourth data from the data of the n-gram model;
and storing the fourth data according to the data type.
Optionally, the fourth data includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type includes: double-byte integer (short).
In yet another aspect, an embodiment of the invention discloses a device for speech recognition, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
loading an n-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
performing speech recognition on grammars according to the n-gram model;
where performing speech recognition according to the n-gram model includes:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the n-gram model does not include: the starting position of the next-layer grammars prefixed by a grammar; and performing speech recognition according to the n-gram model includes:
determining the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
Optionally, the n-gram model does not include: the backoff weight corresponding to the highest-layer grammar.
Optionally, the n-gram model includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type corresponding to the conditional probability of the grammar and/or the backoff weight of the grammar includes: double-byte integer (short).
In yet another aspect, an embodiment of the invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method described in one or more of the foregoing.
In yet another aspect, an embodiment of the invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the speech recognition method described in one or more of the foregoing.
The embodiments of the invention have the following advantages:
According to the data processing method provided by the embodiments of the invention, first data and second data can be determined from the data of the n-gram model; the first data is stored in the high-order bits of a target data field and the second data in the low-order bits of the target data field. The target data field may specifically be a field with idle bits in the data of the n-gram model; for example, the field corresponding to the word number of a grammar may be used as the target data field. The first data may specifically include: the backoff weight of the grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar.
In an application example of the invention, the field word_id corresponding to the word number of a grammar may be used as the target data field: the word number of the grammar is stored, as second data, in the low 17 bits of word_id; the layer number backoff_level corresponding to the backoff position of the grammar is stored, as first data, in the top 3 bits of word_id; and the backoff weight backoff_prob of the grammar is stored, as first data, in the remaining upper 12 bits of word_id. By using the idle bits of the word-number field in this way, each grammar entry saves 8B of memory. For 10G grammar entries, up to 80GB of storage can be saved, reducing the memory occupied by the n-gram model and thereby increasing the speed of speech recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an embodiment of a data processing method of the present invention;
FIG. 2 is a flow chart of steps of an embodiment of a speech recognition method of the present invention;
FIG. 3 is a block diagram of an embodiment of a data processing apparatus of the present invention;
FIG. 4 is a block diagram of an embodiment of a speech recognition apparatus of the present invention;
FIG. 5 is a block diagram of an apparatus 800 for data processing according to the present invention; and
FIG. 6 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the field of language models, an n-gram model is usually stored as a tree. Each node in each layer of the tree represents a grammar; here a grammar refers to the writing rules by which words, terms, phrases and sentences are arranged into complete sentences and well-organized text. The first layer of the tree holds the 1-grams, the second layer the 2-grams, and so on, with the nth layer holding the n-grams. Each layer of grammars in the tree may be stored in an array, and the array may be sorted so that a grammar stored in it can be located by binary search. A node in the tree, i.e., a grammar in the n-gram model, may use the following data structure:
struct lm_node1 {
    float prob;
    float backoff_prob;
    int word_id;
    int low_id;
    int high_id;
    int backoff_id;
    int backoff_level;
};
Here prob denotes the conditional probability of the grammar; backoff_prob denotes the backoff weight of the grammar; word_id denotes the word number of the grammar, i.e., the numeric id of the grammar in the vocabulary; low_id denotes the starting position of the next-layer grammars prefixed by the grammar; high_id denotes the ending position of the next-layer grammars prefixed by the grammar; backoff_id denotes the backoff position of the grammar; and backoff_level denotes the layer number corresponding to the backoff position of the grammar.
In an application example of the invention, assume that the word_ids of the two words "Beijing" and "weather" are 345 and 9835, that the 2-gram start and end positions corresponding to "Beijing" are 103534 and 113543, that the 2-gram start and end positions corresponding to "weather" are 303534 and 313543, and that the 3-gram start and end positions corresponding to "Beijing weather" are 1303534 and 1313543. For the two words "Beijing" and "weather", the 1-grams that may exist are:
-2.34 Beijing -0.12
-3.32 weather -0.32
and the 2-gram that may exist is:
-2.12 Beijing weather -0.24
The lm_node1 corresponding to the 1-gram "Beijing" may specifically be as follows:
struct lm_node1 {
    float prob: -2.34;
    float backoff_prob: -0.12;
    int word_id: 345;
    int low_id: 103534;
    int high_id: 113543;
    int backoff_id: -1;
    int backoff_level: -1;
};
The lm_node1 corresponding to the 1-gram "weather" may specifically be as follows:
struct lm_node1 {
    float prob: -3.32;
    float backoff_prob: -0.32;
    int word_id: 9835;
    int low_id: 303534;
    int high_id: 313543;
    int backoff_id: -1;
    int backoff_level: -1;
};
The lm_node1 corresponding to the 2-gram "Beijing weather" may specifically be as follows:
struct lm_node1 {
    float prob: -2.12;
    float backoff_prob: -0.24;
    int word_id: 9835;
    int low_id: 1303534;
    int high_id: 1313543;
    int backoff_id: 9835;
    int backoff_level: 0;    /* layer subscripts start from 0 */
};
The data type of prob and backoff_prob is usually float (floating point), and the data type of word_id, low_id, high_id, backoff_id and backoff_level is usually int (integer). One float occupies 4B (bytes) of storage and one int also occupies 4B, so one grammar entry occupies 28B of storage, where a grammar entry refers to an instance of a specific n-gram stored in the n-gram model in a specific application; for example, "I" is a 1-gram entry and "there is a hospital nearby" is a 3-gram entry. If an n-gram model contains 10G grammar entries, it needs 280GB of storage, far more than the memory of most current servers.
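The 28B figure can be checked directly; a minimal sketch, assuming a C11 compiler and the common ABI in which float and int each occupy 4 bytes:
#include <assert.h>   /* C11: static_assert */

/* All seven members of lm_node1 are 4-byte aligned, so the compiler
 * inserts no padding: 7 fields x 4B = 28B per grammar entry, and
 * 10G entries x 28B = 280GB. */
static_assert(sizeof(struct lm_node1) == 28, "one grammar entry occupies 28B");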
In order to solve the problem that the n-gram model occupies a large amount of storage during speech recognition, the embodiments of the invention compress the existing n-gram model so as to reduce the storage space it occupies.
Method Embodiment One
Referring to fig. 1, a flowchart illustrating the steps of an embodiment of the data processing method of the present invention is shown; the method may specifically include:
Step 101, determining first data and second data from the data of an n-gram model; the first data may include: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data may include: the word number of the grammar;
Step 102, storing the first data in the high-order bits of a target data field, and storing the second data in the low-order bits of the target data field.
In the n-gram model, n is a positive integer greater than 1; it can be understood that the embodiments of the invention do not limit the specific value of n.
The target data field may be a field with idle bits in the data of the n-gram model. For example, word_id is the word number of a grammar, i.e., it represents the number of the grammar in the vocabulary; its data type is int, and one int occupies 4B of storage, i.e., 32 bits. The inventor found that a vocabulary for online speech recognition generally contains at most about 100,000 words, so word_id occupies at most the low 17 bits of the int field, leaving the upper 15 bits of the word_id field idle; word_id can therefore be used as the target data field.
The first data and the second data may specifically be data in the n-gram model that can share the storage space occupied by the target data field. For example, in the data of the n-gram model, backoff_level denotes the layer number corresponding to the backoff position of a grammar. For online speech recognition, n is usually 4, 5 or 6, so backoff_level usually takes values from 1 to 5 and needs only 3 bits of storage, yet its data type is int and it currently occupies 32 bits, which wastes storage. Specifically, for the target data field word_id, the low 17 bits may store the second data, such as the word number of the grammar, and the high bits may store the first data, such as the layer number corresponding to the backoff position of the grammar. Since the layer number needs only 3 bits, 12 of the upper 15 bits remain free; the embodiments of the invention can therefore also store the backoff weight backoff_prob of the grammar, as first data, in the remaining upper 12 bits of the word_id field.
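As an illustration, the packing can be written as a few bit operations; a minimal sketch, assuming one possible layout (word number in bits 0-16, a 12-bit quantized backoff weight in bits 17-28, the backoff layer number in bits 29-31) rather than the patent's exact encoding:
#include <stdint.h>

/* Pack the word number, a quantized backoff weight and the backoff
 * layer number into one 32-bit word_id field. backoff_prob_q is
 * assumed to be a 12-bit code produced by a quantizer such as the
 * one sketched later in this description. */
static uint32_t pack_word_field(uint32_t word_id, uint32_t backoff_prob_q,
                                uint32_t backoff_level)
{
    return (word_id & 0x1FFFFu)              /* low 17 bits    */
         | ((backoff_prob_q & 0xFFFu) << 17) /* middle 12 bits */
         | ((backoff_level & 0x7u) << 29);   /* top 3 bits     */
}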
Thus, by using the idle bits of the target data field, the embodiments of the invention compress the existing n-gram model into a compressed n-gram model in which each grammar entry saves 8B of storage.
In an alternative embodiment of the present invention, the target data field may specifically be the field corresponding to the first data or the field corresponding to the second data.
It can be understood that using the word-number field word_id as the target data field, storing the word number of the grammar as second data in the low 17 bits of word_id, storing the layer number backoff_level corresponding to the backoff position as first data in the top 3 bits, and storing the backoff weight backoff_prob as first data in the remaining upper 12 bits is only one application example of the embodiments of the invention. In practice, a person skilled in the art may choose the target data field according to the actual application; any field with idle bits falls within the scope of the target data field of the embodiments of the invention. Likewise, the positions of the first data and the second data within the target data field may be chosen flexibly: the first data may occupy the high-order or the low-order bits, the second data the low-order or the high-order bits, and so on.
In addition, the embodiments of the invention do not limit the specific manner in which the first data and the second data are stored in the target data field. For example, the field corresponding to the first data may be used as the target data field, or the field corresponding to the second data may be used as the target data field.
In an alternative embodiment of the present invention, the method may further comprise the steps of:
determining third data from the data of the n-gram model;
and deleting the third data from the data of the n-gram model.
The embodiments of the invention may also determine third data from the data of the n-gram model and delete the third data from it, so as to compress the n-gram model further. The third data may be data whose deletion still leaves speech recognition realizable.
In an alternative embodiment of the present invention, the third data may specifically include: the starting position of the next-layer grammars prefixed by the grammar.
The starting position of the next-layer grammars prefixed by a grammar is low_id. In practice, low_id can be determined from the ending position high_id of the adjacent preceding grammar, so the embodiments of the invention can delete low_id and keep only high_id; deleting low_id saves 4B per grammar entry.
In an alternative embodiment of the present invention, the starting position of the next-layer grammars prefixed by a grammar may be determined in the following manner:
determining the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
In an application example of the invention, suppose a grammar has low_id 12345 and high_id 23456, i.e., the 2-grams prefixed by it are stored at positions 12345 to 23456, and suppose the next grammar is "my"; adding 1 to the high_id of the preceding grammar gives the low_id of "my", so the low_id of the grammar "my" is 23457.
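A minimal sketch of this reconstruction, assuming the grammars of one layer are held in a sorted array nodes[] of the compressed node type lm_node2 shown below; the function name and the convention that the first block starts at position 0 are illustrative assumptions:
/* low_id is implied by the neighbour: next-layer blocks are stored
 * contiguously, so the block prefixed by grammar i starts right after
 * the block prefixed by grammar i-1 ends. */
static int low_id_of(const struct lm_node2 *nodes, int i)
{
    return i == 0 ? 0 : nodes[i - 1].high_id + 1;
}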
After the above compression, i.e., storing the first data in the high-order bits of the target data field, storing the second data in the low-order bits, and deleting low_id, a grammar in the n-gram model may use the following data structure:
struct lm_node2 {
    float prob;
    unsigned int word_id;
    int high_id;
    unsigned int backoff_id;
};
In an alternative embodiment of the present invention, the third data may specifically include: the backoff weight corresponding to the highest-layer grammar.
In practical applications, the highest-layer grammar usually has neither a backoff weight nor next-layer grammars. For example, a 3-gram model has three layers of grammars: 1-grams, 2-grams and 3-grams; the layer-3 grammars have no corresponding next layer, i.e., there are no 4-grams. The backoff weight is defined with respect to the next-layer grammars, and since the highest-layer grammars have no next layer, they may have no backoff weight.
Therefore, the embodiments of the invention can delete, from the structured data of the n-gram model, the next-layer grammar positions and the backoff weights corresponding to the highest-layer grammars. Specifically, high_id can be further removed from the lm_node2 structure (the backoff weight packed into word_id is simply left unused at the highest layer), giving the following data structure for the highest-layer (i.e., nth-layer) grammars:
struct lm_higram_node1 {
    float prob;
    unsigned int word_id;
    unsigned int backoff_id;
};
In the embodiments of the invention, struct lm_node1 is the data structure used by the layer-1 to layer-n grammars of the original n-gram model. After the data processing of the embodiments of the invention, the layer-1 to layer-(n-1) grammars may use the data structure struct lm_node2, and the layer-n grammars may use the data structure struct lm_higram_node1; giving the nth layer its own data structure saves a further 4B of storage per highest-layer entry.
In an alternative embodiment of the present invention, the method may further comprise the steps of:
determining the data type corresponding to fourth data from the data of the n-gram model;
and storing the fourth data according to the data type.
In order to further reduce the storage space occupied by the n-gram model, the embodiments of the invention can also compress the data type corresponding to the fourth data in the n-gram model.
In an alternative embodiment of the present invention, the fourth data may specifically include: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type may specifically include: double-byte integer (short).
In an application example of the invention, analysis of the conditional probabilities and backoff weights of the grammars in practical n-gram models shows that these values usually lie in a small range, for example between -10 and 0, whereas in the existing n-gram model the data types of the conditional probability prob and the backoff weight backoff_prob are both float, which wastes a large amount of space. The fourth data may therefore specifically include the conditional probability and/or the backoff weight of the grammar. One short occupies only 2B of storage, so after the data types of prob and backoff_prob are compressed to short, each grammar entry saves another 2B. It can be understood that, in practice, the invention does not limit the compressed data type of the conditional probability and backoff weight of the grammar; for example, they could also be compressed to the character type char.
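A minimal quantization sketch under these assumptions; the scale factor is chosen here only so that the range [-10, 0] fills a signed 16-bit short, and is not fixed by the patent:
#include <stdint.h>

#define PROB_SCALE 3276.0f   /* assumed scale: 10 x 3276 = 32760 < 32767 */

/* Compress a log-probability in [-10, 0] from float to short. */
static int16_t quantize_prob(float log_prob)
{
    return (int16_t)(log_prob * PROB_SCALE);
}

A 12-bit variant of the same idea (codes 0..4095, scale about 409.5) can produce the quantized backoff weight packed into the word_id field sketched above.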
In an optional embodiment of the present invention, the storing of the first data in the high-order bits of the target data field may specifically include: storing the type-compressed backoff_prob in the upper 12 bits of the target data field, e.g., of the word-number field of the grammar.
The embodiments of the invention can compress the data types corresponding to the fourth data in the data structures lm_node2 and lm_higram_node1. For an n-gram model, the data structure of the layer-1 to layer-(n-1) grammars can be compressed from lm_node2 into lm_node3 as follows:
struct lm_node3 {
    short prob;
    unsigned int word_id;
    int high_id;
    unsigned int backoff_id;
};
and the data structure of the nth-layer grammars can be compressed from lm_higram_node1 into lm_higram_node2 as follows, occupying 4B less than lm_node3:
struct lm_higram_node2 {
    short prob;
    unsigned int word_id;
    unsigned int backoff_id;
};
In practical applications, compressing the n-gram model with the data processing method of the embodiments of the invention can shrink an n-gram model containing 10G grammar entries from 280GB to below 128GB, reducing the storage space occupied by the n-gram model without affecting its recognition quality.
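A rough accounting of where the savings come from; the packed attribute is a GCC/Clang-specific assumption used here so that sizeof matches the field sums (without it, alignment would pad lm_node3 to 16B):
/* 2 + 4 + 4 + 4 = 14B per entry for layers 1..n-1 (was 28B) */
struct __attribute__((packed)) lm_node3_packed {
    short prob;
    unsigned int word_id;
    int high_id;
    unsigned int backoff_id;
};

/* 2 + 4 + 4 = 10B per entry for the highest layer */
struct __attribute__((packed)) lm_higram_node2_packed {
    short prob;
    unsigned int word_id;
    unsigned int backoff_id;
};

Since in a large model the highest layer typically holds the most entries, the average per-entry size lands between 10B and 14B, which is how 10G entries drop from 280GB to under 128GB.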
In summary, in the data processing method of the embodiments of the invention, first data and second data may be determined from the data of the n-gram model, the first data being stored in the high-order bits of a target data field and the second data in the low-order bits. The target data field may specifically be a field with idle bits in the data of the n-gram model; for example, the word-number field of a grammar may be used as the target data field, the first data may specifically include: the backoff weight of the grammar and/or the layer number corresponding to the backoff position of the grammar; and the second data includes: the word number of the grammar. In an application example of the invention, the field word_id corresponding to the word number of a grammar may be used as the target data field: the word number is stored, as second data, in the low 17 bits of word_id; the layer number backoff_level corresponding to the backoff position is stored, as first data, in the top 3 bits; and the backoff weight backoff_prob is stored, as first data, in the remaining upper 12 bits. By using the idle bits of the word-number field, each grammar entry saves 8B of storage; for 10G grammar entries, up to 80GB can be saved, reducing the memory occupied by the n-gram model and thereby increasing the speed of speech recognition.
Method Embodiment Two
Referring to fig. 2, a flowchart illustrating the steps of an embodiment of the speech recognition method of the present invention is shown; the method may specifically include:
Step 201, loading an n-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data may include: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data may include: the word number of the grammar;
Step 202, performing speech recognition on grammars according to the n-gram model;
where performing speech recognition according to the n-gram model may include:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field.
The n-gram model here may specifically be one obtained by the (compression) processing of the data processing method shown in fig. 1. The embodiments of the invention can load the compressed n-gram model during speech recognition, reducing the memory it occupies and thereby increasing recognition speed. In particular, for online speech recognition, n is usually large (for example 4, 5 or 6) to improve recognition quality, so the n-gram model occupies a large amount of memory; an n-gram model compressed by the above data processing method saves memory without affecting recognition quality.
In the embodiments of the invention, the first data of the n-gram model used for speech recognition is stored in the high-order bits of the target data field and the second data in the low-order bits; the first data may include: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data may include: the word number of the grammar.
The target data field may specifically be a field with idle bits in the data of the n-gram model, for example the word-number field word_id of a grammar. The first data and the second data may specifically be data in the n-gram model that can share the storage space occupied by the target data field. In an application example of the invention, word_id may be used as the target data field: the word number of the grammar is stored, as second data, in the low 17 bits of word_id; the layer number backoff_level corresponding to the backoff position is stored, as first data, in the top 3 bits; and the backoff weight backoff_prob is stored, as first data, in the remaining upper 12 bits. Compressing the existing n-gram model by using the idle bits of the target data field word_id in this way saves 8B of storage per grammar entry.
In the embodiments of the invention, performing speech recognition according to the n-gram model may specifically include:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field.
Specifically, the word number of the grammar (second data) may be read from the low 17 bits of the target data field word_id, the layer number corresponding to the backoff position of the grammar (first data) from the top 3 bits, and the backoff weight of the grammar (first data) from the remaining upper 12 bits.
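The inverse of the packing sketch from the first method embodiment, under the same assumed bit layout:
#include <stdint.h>

/* Split the 32-bit word_id field back into its three parts. */
static void unpack_word_field(uint32_t field, uint32_t *word_id,
                              uint32_t *backoff_prob_q, uint32_t *backoff_level)
{
    *word_id        = field & 0x1FFFFu;         /* low 17 bits    */
    *backoff_prob_q = (field >> 17) & 0xFFFu;   /* middle 12 bits */
    *backoff_level  = field >> 29;              /* top 3 bits     */
}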
In an alternative embodiment of the present invention, the target data field may specifically be the field corresponding to the first data or the field corresponding to the second data.
It will be appreciated that the embodiments of the invention do not limit the specific contents of the target data field, the first data and the second data, nor the specific manner in which the first data and the second data are stored in the target data field. For example, the field of the first data may serve as the target data field, or the field of the second data may serve as the target data field.
In an alternative embodiment of the present invention, the n-gram model may not include: the starting position of the next-layer grammars prefixed by a grammar; and performing speech recognition according to the n-gram model may specifically further include:
determining the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
The starting position of the next-layer grammars prefixed by a grammar is low_id. In practice, low_id can be determined from the ending position high_id of the adjacent preceding grammar, so the embodiments of the invention can delete low_id and keep only high_id; deleting low_id saves 4B per grammar entry.
In an alternative embodiment of the present invention, the n-gram model may not include: the backoff weight corresponding to the highest-layer grammar.
In practical applications, the highest-layer grammar generally has neither a backoff weight nor next-layer grammars, so the n-gram model of the embodiments of the invention need not include the backoff weight corresponding to the highest-layer grammar, allowing the n-gram model to save a further 5B of storage.
In an alternative embodiment of the present invention, the n-gram model may include: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type corresponding to the conditional probability of the grammar and/or the backoff weight of the grammar is: double-byte integer (short).
Because the conditional probability and backoff weight of a grammar usually lie in a small range, the data type corresponding to the conditional probability and/or backoff weight in the n-gram model of the embodiments of the invention can be compressed from the original floating-point float to the double-byte integer short, so each grammar entry saves another 2B of storage.
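On the recognition side the stored short must be mapped back to a log-probability; a minimal sketch, inverse of the quantizer sketched in the first method embodiment, where the scale factor must match the one assumed at compression time:
#include <stdint.h>

#define PROB_SCALE 3276.0f   /* must match the compression-time scale */

/* Recover an approximate log-probability from the stored short. */
static float dequantize_prob(int16_t q)
{
    return (float)q / PROB_SCALE;
}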
In summary, the speech recognition method of the embodiments of the invention can perform speech recognition according to the loaded n-gram model, in which the first data is stored in the high-order bits of the target data field and the second data in the low-order bits; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar. By using the idle bits of the word-number field, each grammar entry saves 8B of storage; for 10G grammar entries, up to 80GB can be saved, reducing the memory occupied by the n-gram model and thereby increasing the speed of speech recognition.
Device Embodiment One
Referring to fig. 3, a block diagram of an embodiment of the data processing apparatus of the present invention is shown; the apparatus may specifically include:
a first determining module 301, configured to determine first data and second data from the data of the n-gram model; the first data may specifically include: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data may specifically include: the word number of the grammar;
a first storage module 302, configured to store the first data in the high-order bits of the target data field and store the second data in the low-order bits of the target data field.
Optionally, the target data field may specifically be the field corresponding to the first data or the field corresponding to the second data.
Optionally, the apparatus may further include:
a second determining module, configured to determine third data from the data of the n-gram model;
and a data deleting module, configured to delete the third data from the data of the n-gram model.
Optionally, the third data may specifically include: the starting position of the next-layer grammars prefixed by the grammar.
Optionally, the third data may specifically include: the backoff weight corresponding to the highest-layer grammar.
Optionally, the apparatus may further include:
a third determining module, configured to determine the data type corresponding to fourth data from the data of the n-gram model;
and a second storage module, configured to store the fourth data according to the data type.
Optionally, the fourth data may specifically include: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type includes: double-byte integer (short).
Device Embodiment Two
Referring to fig. 4, a block diagram of an embodiment of the speech recognition apparatus of the present invention is shown; the apparatus may specifically include:
a loading module 401, configured to load an n-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data may specifically include: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data may specifically include: the word number of the grammar;
a recognition module 402, configured to perform speech recognition on grammars according to the n-gram model;
where the recognition module 402 may specifically include:
a data acquisition module, configured to acquire the first data from the high-order bits of the target data field and acquire the second data from the low-order bits of the target data field.
Optionally, the target data field may specifically be the field corresponding to the first data or the field corresponding to the second data.
Optionally, the n-gram model may not include: the starting position of the next-layer grammars prefixed by a grammar; and the recognition module 402 may specifically include:
a position determining module, configured to determine the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
Optionally, the n-gram model may not include: the backoff weight corresponding to the highest-layer grammar.
Optionally, the n-gram model may include: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type corresponding to the conditional probability of the grammar and/or the backoff weight of the grammar may specifically include: double-byte integer (short).
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts between the embodiments, reference may be made to one another.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in the method embodiments and will not be detailed here.
The embodiment of the invention also discloses a device for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
determining first data and second data from the data of an n-gram model; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
storing the first data in the high-order bits of a target data field, and storing the second data in the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the device is also configured to execute the one or more programs by the one or more processors, the one or more programs including instructions for:
determining third data from the data of the n-gram model;
and deleting the third data from the data of the n-gram model.
Optionally, the third data includes: the starting position of the next-layer grammars prefixed by the grammar.
Optionally, the third data includes: the next grammar corresponding to the highest grammar.
Optionally, the device is also configured to execute the one or more programs by the one or more processors, the one or more programs including instructions for:
determining the data type corresponding to fourth data from the data of the n-gram model;
and storing the fourth data according to the data type.
Optionally, the fourth data includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type includes: double-byte integer (short).
The embodiment of the invention also discloses a device for speech recognition, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
loading an n-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data includes: the backoff weight of a grammar and/or the layer number corresponding to the backoff position of the grammar; the second data includes: the word number of the grammar;
performing speech recognition on grammars according to the n-gram model;
where performing speech recognition according to the n-gram model includes:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field.
Optionally, the target data field is the field corresponding to the first data or the field corresponding to the second data.
Optionally, the n-gram model does not include: the starting position of the next-layer grammars prefixed by a grammar; and performing speech recognition according to the n-gram model includes:
determining the starting position of the next-layer grammars prefixed by the grammar according to the ending position of the preceding grammar adjacent to the grammar.
Optionally, the n-gram model does not include: the backoff weight corresponding to the highest-layer grammar.
Optionally, the n-gram model includes: the conditional probability of the grammar, and/or the backoff weight of the grammar; the data type corresponding to the conditional probability of the grammar and/or the backoff weight of the grammar includes: double-byte integer (short).
Fig. 5 is a block diagram illustrating an apparatus 800 for data processing according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 5, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing element 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen between the device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and an offline speech recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 1900 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Still further, the central processing unit 1922 may be configured to communicate with the storage medium 1930, and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer-readable storage medium stores instructions that, when executed by a processor of an apparatus (terminal or server), enable the apparatus to perform the method shown in fig. 1 or fig. 2.
A method of data processing comprises: determining first data and second data from the data of the multi-gram model, wherein the first data includes a rollback weight of a grammar and/or a layer number corresponding to a rollback position of the grammar, and the second data includes a word sequence of the grammar; and storing the first data as the high-order bits of a target data field and the second data as the low-order bits of the target data field.
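By way of illustration only, the following C sketch shows one way such packing might look. The field widths — 8 high-order bits for the first data (here, a rollback layer number) and 24 low-order bits for the second data (here, an index identifying the word sequence) — are assumptions chosen for the example, not widths prescribed by the patent; the point is simply that two values share a single 32-bit target data field whose idle bits would otherwise be wasted.

#include <stdint.h>
#include <assert.h>

/* Assumed layout (illustrative only, not from the patent):
   high 8 bits = first data  (e.g., rollback layer number)
   low 24 bits = second data (e.g., word-sequence index)   */
#define FIRST_DATA_BITS   8u
#define SECOND_DATA_BITS  24u
#define SECOND_DATA_MASK  ((1u << SECOND_DATA_BITS) - 1u)

/* Pack both values into one 32-bit target data field. */
static uint32_t pack_target_field(uint32_t first_data, uint32_t second_data)
{
    assert(first_data < (1u << FIRST_DATA_BITS));
    assert(second_data <= SECOND_DATA_MASK);
    return (first_data << SECOND_DATA_BITS) | second_data;
}

Because the two values share one field, the model needs a single 32-bit slot where an unpacked layout would need two fields, which is where the memory saving comes from.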
A method of speech recognition comprises:
loading a multi-gram model, in which first data is stored in the high-order bits of a target data field and second data is stored in the low-order bits of the target data field; the first data includes: a rollback position of a grammar and/or a layer number corresponding to the rollback position of the grammar; and the second data includes: a word sequence of the grammar; and
performing speech recognition on the grammar according to the multi-gram model;
wherein performing speech recognition on the grammar according to the multi-gram model comprises:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field.
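Continuing the assumed 8/24-bit layout from the packing sketch above (again an illustrative assumption, not a layout fixed by the patent), decoding-time access reduces to one shift and one mask:

#include <stdint.h>

/* Assumed layout: high 8 bits = first data, low 24 bits = second data. */
#define SECOND_DATA_BITS  24u
#define SECOND_DATA_MASK  ((1u << SECOND_DATA_BITS) - 1u)

static uint32_t get_first_data(uint32_t field)   /* high-order bits */
{
    return field >> SECOND_DATA_BITS;
}

static uint32_t get_second_data(uint32_t field)  /* low-order bits */
{
    return field & SECOND_DATA_MASK;
}

Both reads are constant-time bit operations, so the compact storage does not slow down model lookups during recognition.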
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
The foregoing has described in detail a data processing method, a data processing apparatus, and a device for data processing, as well as a speech recognition method, a speech recognition apparatus, and a device for speech recognition according to the present invention. Specific examples have been used to explain the principles and embodiments of the invention, and the above descriptions are provided only to help understand the method of the invention and its core ideas. Meanwhile, those skilled in the art may make variations to the specific embodiments and the application scope in accordance with the ideas of the invention. In view of the above, the contents of this specification should not be construed as limiting the present invention.

Claims (38)

1. A method of data processing, the method comprising:
determining first data and second data from the data of a multi-gram model, wherein the first data includes: a rollback weight of a grammar and/or a layer number corresponding to a rollback position of the grammar; and the second data includes: a word sequence of the grammar;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field;
wherein the target data field is a field having idle bits in the data of the multi-gram model, and the first data and the second data are data in the multi-gram model that can share the storage space occupied by the target data field.
2. The method of claim 1, wherein the target data field comprises: the first data or the second data.
3. The method according to claim 1, wherein the method further comprises:
determining third data from the data of the multi-gram model; and
deleting the third data from the data of the multi-gram model.
4. The method according to claim 3, wherein the third data comprises: a starting position of a next-layer grammar prefixed by the grammar.
5. The method according to claim 3, wherein the third data comprises: a rollback weight corresponding to a highest-layer grammar.
6. The method according to claim 1, wherein the method further comprises:
determining a data type corresponding to fourth data from the data of the multi-gram model; and
storing the fourth data according to the data type.
7. The method of claim 6, wherein the fourth data comprises: a conditional probability of the grammar and/or a rollback weight of the grammar; and the data type comprises: a double-byte integer.
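As a non-authoritative illustration of claim 7, the sketch below stores a log-domain conditional probability or rollback weight in a double-byte (16-bit) integer by linear quantization. The value range [-20.0, 0.0] and the linear scheme are assumptions made for the example; the claim itself only fixes the double-byte integer type, which halves the space of a 4-byte float at a small cost in precision.

#include <stdint.h>

/* Assumed representable range of log-domain values (illustrative). */
#define LOGPROB_MIN  (-20.0f)
#define LOGPROB_MAX  (0.0f)

/* Map a float in [LOGPROB_MIN, LOGPROB_MAX] onto a 16-bit integer. */
static uint16_t quantize_logprob(float lp)
{
    if (lp < LOGPROB_MIN) lp = LOGPROB_MIN;
    if (lp > LOGPROB_MAX) lp = LOGPROB_MAX;
    return (uint16_t)((lp - LOGPROB_MIN) / (LOGPROB_MAX - LOGPROB_MIN) * 65535.0f + 0.5f);
}

/* Recover an approximation of the original value. */
static float dequantize_logprob(uint16_t q)
{
    return LOGPROB_MIN + (float)q / 65535.0f * (LOGPROB_MAX - LOGPROB_MIN);
}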
8. A method of speech recognition, the method comprising:
loading a multi-gram model, wherein first data in the multi-gram model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a rollback position of a grammar and/or a layer number corresponding to the rollback position of the grammar; and the second data includes: a word sequence of the grammar;
performing speech recognition on the grammar according to the multi-gram model;
wherein performing speech recognition on the grammar according to the multi-gram model comprises:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field;
wherein the target data field is a field having idle bits in the data of the multi-gram model, and the first data and the second data are data in the multi-gram model that share the storage space occupied by the target data field.
9. The method of claim 8, wherein the target data field comprises: the first data or the second data.
10. The method of claim 8, wherein the multi-gram model does not include: a starting position of a next-layer grammar prefixed by the grammar; and performing speech recognition on the grammar according to the multi-gram model comprises:
determining the starting position of the next-layer grammar prefixed by the grammar according to an ending position of the previous grammar adjacent to the grammar.
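A minimal sketch of how the start-position field deleted per claims 4 and 10 might be recovered at recognition time follows, assuming the entries of each layer are sorted so that every grammar's next-layer successors occupy one contiguous block; the struct layout and names are hypothetical, not taken from the patent.

#include <stddef.h>
#include <stdint.h>

/* One entry of a layer. Only the (exclusive) end of its successor
   block in the next layer is stored; the start is implied by the
   end of the adjacent previous entry, so the per-entry start
   field can be deleted from the model.                           */
typedef struct {
    uint32_t word_id;       /* last word of this grammar          */
    uint32_t children_end;  /* end of successor block (exclusive) */
} gram_entry;

/* Recover [start, end) of entry i's successors in the next layer. */
static void successor_range(const gram_entry *layer, size_t i,
                            uint32_t *start, uint32_t *end)
{
    *start = (i == 0) ? 0u : layer[i - 1].children_end;
    *end = layer[i].children_end;
}

Dropping the per-entry start position removes one field per grammar while keeping successor lookup a constant-time operation.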
11. The method of claim 8, wherein the multi-gram model does not include: a rollback weight corresponding to a highest-layer grammar.
12. The method of claim 8, wherein the multi-gram model comprises: a conditional probability of the grammar and/or a rollback weight of the grammar; and the data type corresponding to the conditional probability of the grammar and/or the rollback weight of the grammar comprises: a double-byte integer.
13. A data processing apparatus, the apparatus comprising:
a first determining module, configured to determine first data and second data from the data of a multi-gram model, wherein the first data includes: a rollback weight of a grammar and/or a layer number corresponding to a rollback position of the grammar; and the second data includes: a word sequence of the grammar; and
a first storage module, configured to store the first data as the high-order bits of a target data field, and store the second data as the low-order bits of the target data field;
wherein the target data field is a field having idle bits in the data of the multi-gram model, and the first data and the second data are data in the multi-gram model that can share the storage space occupied by the target data field.
14. The apparatus of claim 13, wherein the target data field comprises: the first data or the second data.
15. The apparatus of claim 13, wherein the apparatus further comprises:
a second determining module, configured to determine third data from the data of the multi-gram model; and
a data deleting module, configured to delete the third data from the data of the multi-gram model.
16. The apparatus of claim 15, wherein the third data comprises: a starting position of a next-layer grammar prefixed by the grammar.
17. The apparatus of claim 15, wherein the third data comprises: a rollback weight corresponding to a highest-layer grammar.
18. The apparatus of claim 13, wherein the apparatus further comprises:
a third determining module, configured to determine a data type corresponding to fourth data from the data of the multi-gram model; and
a second storage module, configured to store the fourth data according to the data type.
19. The apparatus of claim 18, wherein the fourth data comprises: a conditional probability of the grammar and/or a rollback weight of the grammar; and the data type comprises: a double-byte integer.
20. A speech recognition apparatus, the apparatus comprising:
a loading module, configured to load a multi-gram model, wherein first data in the multi-gram model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a rollback position of a grammar and/or a layer number corresponding to the rollback position of the grammar; and the second data includes: a word sequence of the grammar; and
a recognition module, configured to perform speech recognition on the grammar according to the multi-gram model;
wherein the recognition module comprises:
a data acquisition module, configured to acquire the first data from the high-order bits of the target data field and acquire the second data from the low-order bits of the target data field;
wherein the target data field is a field having idle bits in the data of the multi-gram model, and the first data and the second data are data in the multi-gram model that can share the storage space occupied by the target data field.
21. The apparatus of claim 20, wherein the target data field comprises: the first data or the second data.
22. The apparatus of claim 20, wherein the multi-gram model does not include: a starting position of a next-layer grammar prefixed by the grammar; and the recognition module comprises:
a position determining module, configured to determine the starting position of the next-layer grammar prefixed by the grammar according to an ending position of the previous grammar adjacent to the grammar.
23. The apparatus of claim 20, wherein the multi-gram model does not include: a rollback weight corresponding to a highest-layer grammar.
24. The apparatus of claim 20, wherein the multi-gram model comprises: a conditional probability of the grammar and/or a rollback weight of the grammar; and the data type corresponding to the conditional probability of the grammar and/or the rollback weight of the grammar comprises: a double-byte integer.
25. An apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
determining first data and second data from the data of a multi-gram model, wherein the first data includes: a rollback weight of a grammar and/or a layer number corresponding to a rollback position of the grammar; and the second data includes: a word sequence of the grammar;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field;
wherein the target data field is a field having idle bits in the data of the multi-gram model, and the first data and the second data are data in the multi-gram model that can share the storage space occupied by the target data field.
26. The apparatus of claim 25, wherein the target data field comprises: the first data or the second data.
27. The apparatus of claim 25, wherein the one or more programs further include instructions for:
determining third data from the data of the multi-gram model; and
deleting the third data from the data of the multi-gram model.
28. The apparatus of claim 27, wherein the third data comprises: a starting position of a next-layer grammar prefixed by the grammar.
29. The apparatus of claim 27, wherein the third data comprises: a rollback weight corresponding to a highest-layer grammar.
30. The apparatus of claim 25, wherein the one or more programs further include instructions for:
determining a data type corresponding to fourth data from the data of the multi-gram model; and
storing the fourth data according to the data type.
31. The apparatus of claim 30, wherein the fourth data comprises: a conditional probability of the grammar and/or a rollback weight of the grammar; and the data type comprises: a double-byte integer.
32. An apparatus for speech recognition, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
loading a multi-gram model, wherein first data in the multi-gram model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a rollback position of a grammar and/or a layer number corresponding to the rollback position of the grammar; and the second data includes: a word sequence of the grammar;
performing speech recognition on the grammar according to the multi-gram model;
wherein performing speech recognition on the grammar according to the multi-gram model comprises:
acquiring the first data from the high-order bits of the target data field, and acquiring the second data from the low-order bits of the target data field;
wherein the target data field is a field having idle bits in the data of the multi-gram model, and the first data and the second data are data in the multi-gram model that share the storage space occupied by the target data field.
33. The apparatus of claim 32, wherein the target data field comprises: the first data or the second data.
34. The apparatus of claim 32, wherein the multi-gram model does not include: a starting position of a next-layer grammar prefixed by the grammar; and performing speech recognition on the grammar according to the multi-gram model comprises:
determining the starting position of the next-layer grammar prefixed by the grammar according to an ending position of the previous grammar adjacent to the grammar.
35. The apparatus of claim 32, wherein the multi-gram model does not include: a rollback weight corresponding to a highest-layer grammar.
36. The apparatus of claim 32, wherein the multi-gram model comprises: a conditional probability of the grammar and/or a rollback weight of the grammar; and the data type corresponding to the conditional probability of the grammar and/or the rollback weight of the grammar comprises: a double-byte integer.
37. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method according to one or more of claims 1 to 7.
38. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the speech recognition method according to one or more of claims 8 to 12.
CN201810084097.7A 2018-01-29 2018-01-29 Data processing method and device for data processing Active CN110096693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810084097.7A CN110096693B (en) 2018-01-29 2018-01-29 Data processing method and device for data processing

Publications (2)

Publication Number Publication Date
CN110096693A (en) 2019-08-06
CN110096693B (en) 2024-05-28

Family

ID=67441882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810084097.7A Active CN110096693B (en) 2018-01-29 2018-01-29 Data processing method and device for data processing

Country Status (1)

Country Link
CN (1) CN110096693B (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8346555B2 (en) * 2006-08-22 2013-01-01 Nuance Communications, Inc. Automatic grammar tuning using statistical language model generation
US8572126B2 (en) * 2010-06-25 2013-10-29 Educational Testing Service Systems and methods for optimizing very large n-gram collections for speed and memory
US9892053B2 (en) * 2015-03-24 2018-02-13 Intel Corporation Compaction for memory hierarchies
US9990917B2 (en) * 2015-04-13 2018-06-05 Intel Corporation Method and system of random access compression of transducer data for automatic speech recognition decoding

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1378130A (en) * 2002-05-24 2002-11-06 郑方 Initial four-stroke Chinese sentence input method for computer
CN1682448A (en) * 2002-07-12 2005-10-12 斯利普斯特里姆数据公司 Method for lossless data compression using greedy sequential context-dependent grammar transform
US8725509B1 (en) * 2009-06-17 2014-05-13 Google Inc. Back-off language model compression
JP2011033806A (en) * 2009-07-31 2011-02-17 Nippon Telegr & Teleph Corp <Ntt> Language model compression device, access device of language model, language model compression method, access method of language model, language model compression program, and access program of language model
CN101770367A (en) * 2009-12-30 2010-07-07 北京飞天诚信科技有限公司 Compressing method and compressing device of .NET file
CN101923569A (en) * 2010-07-09 2010-12-22 南京朗坤软件有限公司 Storage method of structure type data of real-time database
CN102361458A (en) * 2011-08-16 2012-02-22 北京首钢自动化信息技术有限公司 Method for realizing high efficient data compression in rapid data management system
CN102682096A (en) * 2012-04-27 2012-09-19 北京航空航天大学 Collaborative management device and collaborative management method for simulation resource information and model source codes
CN104346298A (en) * 2013-08-06 2015-02-11 北京数码视讯软件技术发展有限公司 Device and method for data processing based on smart card and smart card
CN104572655A (en) * 2013-10-12 2015-04-29 腾讯科技(北京)有限公司 Data processing method, device and system
CN104918048A (en) * 2015-06-03 2015-09-16 复旦大学 Entropy coding context probability model modeling module design method suitable for HEVC(high efficiency video coding) standards
CN106961510A (en) * 2016-01-12 2017-07-18 北京搜狗科技发展有限公司 A kind of voice communication processing method and processing device
CN107145493A (en) * 2016-03-01 2017-09-08 阿里巴巴集团控股有限公司 Information processing method and device
CN106384332A (en) * 2016-09-09 2017-02-08 中山大学 Method for fusing unmanned aerial vehicle image and multispectral image based on Gram-Schmidt
CN107291704A (en) * 2017-05-26 2017-10-24 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Lossless compression of language model structure and word identifiers; Raj, B. et al.; 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (Cat. No. 03CH37404); pp. I-388-91 *
A method of compressing a database using binary attributes; Yao Zhipeng; Sun Rencheng; Shao Fengjing; Zhou Wenpeng; Journal of Qingdao University (Natural Science Edition); 2013-05-15 (02); pp. 66-70 *
A fast language model look-ahead algorithm based on an extended N-gram model; Shan Yuxiang; Chen Xie; Shi Yongzhe; Liu Jia; Acta Automatica Sinica (10); pp. 64-72 *
Research on pruning algorithms for speech recognition; Lu Yao; China Masters' Theses Full-text Database, Information Science and Technology; pp. I136-128 *


Similar Documents

Publication Publication Date Title
CN107621886B (en) Input recommendation method and device and electronic equipment
CN107608532A (en) A kind of association-feeding method, device and electronic equipment
CN107291704B (en) Processing method and device for processing
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN110633017B (en) Input method, device and device for inputting
CN110069624B (en) Text processing method and device
CN107424612B (en) Processing method, apparatus and machine-readable medium
CN109992790B (en) Data processing method and device for data processing
CN111324214B (en) Statement error correction method and device
CN110110292B (en) Data processing method and device for data processing
CN111381685B (en) Sentence association method and sentence association device
CN110096693B (en) Data processing method and device for data processing
CN109144286B (en) Input method and device
CN106959970B (en) Word bank, processing method and device of word bank and device for processing word bank
CN108073566B (en) Word segmentation method and device and word segmentation device
CN108345590B (en) Translation method, translation device, electronic equipment and storage medium
CN108108356A (en) A kind of character translation method, apparatus and equipment
CN112905023A (en) Input error correction method and device for input error correction
CN111460836B (en) Data processing method and device for data processing
CN112181163A (en) Input method, input device and input device
CN112015281A (en) Cloud association method and related device
CN111208910A (en) Cloud association method and related device
CN113268978B (en) Information generation method and device and electronic equipment
CN111354348B (en) Data processing method and device for data processing
CN112668340B (en) Information processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220802

Address after: Room 01, 9th Floor, Cyber Building, Building 9, Yard 1, Zhongguancun East Road, Haidian District, Beijing 100084

Applicant after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: Room 01, 9th Floor, Cyber Building, Building 9, Yard 1, Zhongguancun East Road, Haidian District, Beijing 100084

Applicant before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Applicant before: SOGOU (HANGZHOU) INTELLIGENT TECHNOLOGY Co.,Ltd.

GR01 Patent grant