CN110096693A - Data processing method and apparatus, and apparatus for data processing - Google Patents
Data processing method and apparatus, and apparatus for data processing
- Publication number
- CN110096693A CN110096693A CN201810084097.7A CN201810084097A CN110096693A CN 110096693 A CN110096693 A CN 110096693A CN 201810084097 A CN201810084097 A CN 201810084097A CN 110096693 A CN110096693 A CN 110096693A
- Authority
- CN
- China
- Prior art keywords
- data
- syntax
- grammar model
- stored
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the present invention provide a data processing method and apparatus, and an apparatus for data processing. The method includes: determining first data and second data from the data of an n-gram model, where the first data include the backoff weight of a gram and/or the layer number corresponding to the backoff position of the gram, and the second data include the word index of the gram; storing the first data in the high-order bits of a target data field, and storing the second data in the low-order bits of the target data field. The embodiments of the present invention can reduce the memory occupied by the n-gram model and can thereby improve the speed of speech recognition.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a data processing method and apparatus, and an apparatus for data processing.
Background art
Speech recognition converts the vocabulary content of human speech into computer-readable input, for example converting a speech signal into text. With the continuous development of speech recognition technology, its application scenarios have become increasingly broad, for example: voice dialing, voice navigation, indoor device control, voice document retrieval, simple dictation data entry, and so on.
The n-gram model is a language model commonly used in speech recognition, where n is typically a positive integer greater than 1; in general, the larger n is, the better the performance of the language model and the more accurate the recognition result.
Online speech recognition stores the resources required for speech recognition, such as the language model, on a server, and the user obtains the speech recognition result by accessing the server over a network. To make the recognition result more accurate, the language model is usually made as large as possible. However, when the entire language model is kept on the server, it occupies a large amount of storage, for example tens or even hundreds of GB (gigabytes) of memory, which not only slows recognition but also wastes resources considerably.
Summary of the invention
The embodiments of the present invention provide a data processing method and apparatus, and an apparatus for data processing, to solve the problem in the prior art that online speech recognition occupies too much memory.
To solve the above problem, an embodiment of the present invention discloses a data processing method, including:
determining first data and second data from the data of an n-gram model, where the first data include: the backoff weight of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the method further includes:
determining third data from the data of the n-gram model;
deleting the third data from the data of the n-gram model.
Optionally, the third data include: the start position of the next-layer grams prefixed by the gram.
Optionally, the third data include: the backoff weight corresponding to the top-layer grams.
Optionally, the method further includes:
determining the data type corresponding to fourth data from the data of the n-gram model;
storing the fourth data according to the data type.
Optionally, the fourth data include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type includes: a two-byte integer.
In another aspect, an embodiment of the present invention discloses a speech recognition method, including:
loading an n-gram model, where first data in the n-gram model are stored in the high-order bits of a target data field and second data are stored in the low-order bits of the target data field; the first data include: the backoff position of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
performing speech recognition on the gram according to the n-gram model;
where performing speech recognition on the gram according to the n-gram model includes:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the n-gram model does not include: the start position of the next-layer grams prefixed by the gram; and performing speech recognition on the gram according to the n-gram model includes:
determining the start position of the next-layer grams prefixed by the gram according to the end position of the previous gram adjacent to the gram.
Optionally, the n-gram model does not include: the backoff weight corresponding to the top-layer grams.
Optionally, the n-gram model includes: the conditional probability of the gram, and/or the backoff weight of the gram; the data type corresponding to the conditional probability of the gram and/or the backoff weight of the gram includes: a two-byte integer.
In another aspect, an embodiment of the present invention discloses a data processing apparatus, including:
a first determining module, configured to determine first data and second data from the data of an n-gram model, where the first data include: the backoff weight of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
a first storage module, configured to store the first data as the high-order bits of a target data field and store the second data as the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the apparatus further includes:
a second determining module, configured to determine third data from the data of the n-gram model;
a data deletion module, configured to delete the third data from the data of the n-gram model.
Optionally, the third data include: the start position of the next-layer grams prefixed by the gram.
Optionally, the third data include: the backoff weight corresponding to the top-layer grams.
Optionally, the apparatus further includes:
a third determining module, configured to determine the data type corresponding to fourth data from the data of the n-gram model;
a second storage module, configured to store the fourth data according to the data type.
Optionally, the fourth data include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type includes: a two-byte integer.
In yet another aspect, an embodiment of the present invention discloses a speech recognition apparatus, including:
a loading module, configured to load an n-gram model, where first data in the n-gram model are stored in the high-order bits of a target data field and second data are stored in the low-order bits of the target data field; the first data include: the backoff position of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
a recognition module, configured to perform speech recognition on the gram according to the n-gram model;
where the recognition module includes:
a data acquisition module, configured to obtain the first data from the high-order bits of the target data field and obtain the second data from the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the n-gram model does not include: the start position of the next-layer grams prefixed by the gram; and the recognition module includes:
a position determining module, configured to determine the start position of the next-layer grams prefixed by the gram according to the end position of the previous gram adjacent to the gram.
Optionally, the n-gram model does not include: the backoff weight corresponding to the top-layer grams.
Optionally, the n-gram model includes: the conditional probability of the gram, and/or the backoff weight of the gram; the data type corresponding to the conditional probability of the gram and/or the backoff weight of the gram includes: a two-byte integer.
In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
determining first data and second data from the data of an n-gram model, where the first data include: the backoff weight of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the one or more processors are further configured to execute the one or more programs including instructions for performing the following operations:
determining third data from the data of the n-gram model;
deleting the third data from the data of the n-gram model.
Optionally, the third data include: the start position of the next-layer grams prefixed by the gram.
Optionally, the third data include: the backoff weight corresponding to the top-layer grams.
Optionally, the one or more processors are further configured to execute the one or more programs including instructions for performing the following operations:
determining the data type corresponding to fourth data from the data of the n-gram model;
storing the fourth data according to the data type.
Optionally, the fourth data include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type includes: a two-byte integer.
In yet another aspect, an embodiment of the present invention discloses an apparatus for speech recognition, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
loading an n-gram model, where first data in the n-gram model are stored in the high-order bits of a target data field and second data are stored in the low-order bits of the target data field; the first data include: the backoff position of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
performing speech recognition on the gram according to the n-gram model;
where performing speech recognition on the gram according to the n-gram model includes:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the n-gram model does not include: the start position of the next-layer grams prefixed by the gram; and performing speech recognition on the gram according to the n-gram model includes:
determining the start position of the next-layer grams prefixed by the gram according to the end position of the previous gram adjacent to the gram.
Optionally, the n-gram model does not include: the backoff weight corresponding to the top-layer grams.
Optionally, the n-gram model includes: the conditional probability of the gram, and/or the backoff weight of the gram; the data type corresponding to the conditional probability of the gram and/or the backoff weight of the gram includes: a two-byte integer.
In yet another aspect, an embodiment of the present invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method described in one or more of the foregoing.
In yet another aspect, an embodiment of the present invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the speech recognition method described in one or more of the foregoing.
The embodiments of the present invention include the following advantages:
The data processing method of the embodiments of the present invention can determine first data and second data from the data of an n-gram model, store the first data as the high-order bits of a target data field, and store the second data as the low-order bits of the target data field. The target data field may specifically be a field with spare bits in the data of the n-gram model; for example, the field corresponding to the word index of a gram may be used as the target data field. The first data may specifically include: the backoff weight of the gram, and/or the layer number corresponding to the backoff position of the gram; the second data include: the word index of the gram.
In one application example of the present invention, the field word_id corresponding to the word index of a gram may be used as the target data field: the word index of the gram is stored in the low 17 bits of word_id, the layer number backoff_level corresponding to the backoff position of the gram is stored as part of the first data in the high 3 bits of word_id, and the backoff weight backoff_prob of the gram is stored as part of the first data in the remaining 12 high bits of word_id. By exploiting the spare bits of the word index field of the gram, each gram entry thus saves 8 B of storage. For 10G gram entries, up to 80 GB of storage can be saved, which reduces the memory occupied by the n-gram model and can thereby improve the speed of speech recognition.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the steps of an embodiment of a data processing method of the present invention;
Fig. 2 is a flowchart of the steps of an embodiment of a speech recognition method of the present invention;
Fig. 3 is a structural block diagram of an embodiment of a data processing apparatus of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a speech recognition apparatus of the present invention;
Fig. 5 is a block diagram of an apparatus 800 for data processing of the present invention; and
Fig. 6 is a schematic structural diagram of a server in some embodiments of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the technical field of language models, an n-gram model is usually stored as a tree, where each node in each layer of the tree represents a gram. A gram refers to a writing rule of text, and is generally used to refer to the layout of characters, words, short phrases, and sentences, and to the reasonable organization of complete sentences and passages. The first layer of the tree holds the 1-grams, the second layer holds the 2-grams, and so on, with the n-th layer holding the n-grams. The grams of each layer of the tree can be stored in an array, and the array can be sorted so that the grams stored in it can be located by binary search. A node in the tree, namely a gram in the n-gram model, may use the following data structure:
struct lm_node1 {
    float prob;
    float backoff_prob;
    int word_id;
    int low_id;
    int high_id;
    int backoff_id;
    int backoff_level;
};
Here, prob denotes the conditional probability of the gram; backoff_prob denotes the backoff weight of the gram; word_id denotes the word index of the gram, namely the number id of the gram in the lexicon; low_id denotes the start position of the next-layer grams prefixed by the gram; high_id denotes the end position of the next-layer grams prefixed by the gram; backoff_id denotes the backoff position of the gram; and backoff_level denotes the layer number corresponding to the backoff position of the gram.
In one application example of the present invention, suppose that the word_id of the two words "Beijing" and "weather" are 345 and 9835 respectively, that the start and end positions of the 2-grams prefixed by "Beijing" are 103534 and 113543, that the start and end positions of the 2-grams prefixed by "weather" are 303534 and 313543, and that the start and end positions of the next-layer grams prefixed by "Beijing weather" are 1303534 and 1313543. For the two words "Beijing" and "weather", the 1-grams that may exist are:
-2.34 Beijing -0.12
-3.32 weather -0.32
and the 2-gram that may exist is:
-2.12 Beijing weather -0.24
The lm_node1 corresponding to the 1-gram "Beijing" may specifically be as follows:
struct lm_node1 {
    float prob: -2.34;
    float backoff_prob: -0.12;
    int word_id: 345;
    int low_id: 103534;
    int high_id: 113543;
    int backoff_id: -1;
    int backoff_level: -1;
};
The lm_node1 corresponding to the 1-gram "weather" may be as follows:
struct lm_node1 {
    float prob: -3.32;
    float backoff_prob: -0.32;
    int word_id: 9835;
    int low_id: 303534;
    int high_id: 313543;
    int backoff_id: -1;
    int backoff_level: -1;
};
The lm_node1 corresponding to the 2-gram "Beijing weather" may specifically be as follows:
struct lm_node1 {
    float prob: -2.12;
    float backoff_prob: -0.24;
    int word_id: 9835;
    int low_id: 1303534;
    int high_id: 1313543;
    int backoff_id: 9835;
    int backoff_level: 0;  // index starts from 0
};
Here, the data type of prob and backoff_prob is usually float (floating point), and the data type of word_id, low_id, high_id, backoff_id and backoff_level is usually int (integer). A float value occupies 4 B (bytes) of storage and an int value occupies 4 B, so one gram entry occupies 28 B of storage. A gram entry refers, in a concrete application, to a specific instance of an n-gram stored in the n-gram model; for example, "I" is a 1-gram entry, and "nearby | what has | hospital" is a 3-gram entry. If an n-gram model contains 10G gram entries, the model needs to occupy 280 GB of storage, far beyond the memory size of most current servers.
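As a quick sanity check of the figures above, the following minimal C sketch (an illustration assuming the lm_node1 layout listed earlier; the printout itself is not part of the original disclosure) confirms the 28 B per-entry size and the roughly 280 GB total for 10G entries:
#include <stdio.h>

/* Mirror of the uncompressed lm_node1 layout described above. */
struct lm_node1 {
    float prob;           /* conditional probability of the gram                 */
    float backoff_prob;   /* backoff weight of the gram                          */
    int   word_id;        /* word index of the gram in the lexicon               */
    int   low_id;         /* start of the next-layer grams prefixed by this gram */
    int   high_id;        /* end of the next-layer grams prefixed by this gram   */
    int   backoff_id;     /* backoff position of the gram                        */
    int   backoff_level;  /* layer number corresponding to the backoff position  */
};

int main(void) {
    /* Seven 4-byte members with 4-byte alignment: 28 bytes per gram entry. */
    printf("bytes per entry: %zu\n", sizeof(struct lm_node1));
    /* 10G entries (1e10) * 28 B = 2.8e11 B, i.e. about 280 GB. */
    printf("total for 1e10 entries: %.0f GB\n", 1e10 * sizeof(struct lm_node1) / 1e9);
    return 0;
}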
To solve the problem that the n-gram model occupies a large amount of memory during speech recognition, the embodiments of the present invention compress the existing n-gram model to reduce the storage it occupies.
Method embodiment 1
Referring to Fig. 1, a flowchart of the steps of an embodiment of a data processing method of the present invention is shown, which may specifically include:
Step 101: determine first data and second data from the data of an n-gram model; the first data may include: the backoff weight of a gram, and/or the layer number corresponding to the backoff position of the gram; the second data may include: the word index of the gram;
Step 102: store the first data as the high-order bits of a target data field, and store the second data as the low-order bits of the target data field.
The n-gram model here refers specifically to an n-gram language model, where n is a positive integer greater than 1; it can be understood that the embodiments of the present invention place no restriction on the specific value of n.
The target data field may specifically be a field with spare bits in the data of the n-gram model. For example, word_id is the word index of a gram, namely the number of the gram in the lexicon; the data type of word_id is int, and an int value occupies 4 B of storage, namely 32 bits. The inventors found in research that the lexicon used for online speech recognition generally contains at most about 100,000 words, so word_id occupies at most the low 17 bits of the int field, and the high 15 bits of the word_id field are generally idle; therefore, word_id can be used as the target data field.
The first data and the second data may specifically be: data in the n-gram model that can share the storage occupied by the target data field. For example, in the data of the n-gram model, backoff_level denotes the layer number corresponding to the backoff position of a gram. For online speech recognition, n in the n-gram model is usually 4, 5 or 6, so the value of backoff_level is usually 1 to 5 and needs only 3 bits of storage, whereas the data type of backoff_level is currently int, occupying 32 bits and wasting storage. Therefore, the embodiments of the present invention may use the word index word_id of the gram as the target data field, use the layer number backoff_level corresponding to the backoff position of the gram as the first data, and use the word index word_id of the gram as the second data. Specifically, the low 17 bits of the target data field word_id may be used to store the second data, such as the word index of the gram, and the high 15 bits may be used to store the first data, such as the layer number corresponding to the backoff position of the gram. Since the layer number corresponding to the backoff position of the gram needs only 3 bits of storage, 12 of the high 15 bits remain free; therefore, the embodiments of the present invention may also use the backoff weight backoff_prob of the gram as first data and store it in the remaining 12 high bits of the word_id field.
Thus, by exploiting the spare bits of the target data field, the embodiments of the present invention compress the existing n-gram model to obtain a compressed n-gram model, allowing each gram entry to save 8 B of storage.
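As an illustrative sketch only (the helper names, the 12-bit quantization of the backoff weight, and the exact bit order are assumptions and are not part of the original disclosure), the packing and unpacking of the 32-bit word_id target field described above might look as follows in C:
#include <stdint.h>

/* Assumed bit layout of the packed 32-bit word_id field:
 *   bits  0..16 (low 17 bits) : word index of the gram           (second data)
 *   bits 17..28 (12 bits)     : quantized backoff weight         (first data)
 *   bits 29..31 (high 3 bits) : layer number of backoff position (first data) */
#define WORD_ID_BITS 17u
#define BACKOFF_BITS 12u

static inline uint32_t pack_field(uint32_t word_id, uint32_t backoff_q, uint32_t level)
{
    /* word_id < 2^17, backoff_q < 2^12, level < 2^3 are assumed to hold. */
    return (level << (WORD_ID_BITS + BACKOFF_BITS))
         | (backoff_q << WORD_ID_BITS)
         | (word_id & ((1u << WORD_ID_BITS) - 1u));
}

static inline uint32_t unpack_word_id(uint32_t f) { return f & ((1u << WORD_ID_BITS) - 1u); }
static inline uint32_t unpack_backoff(uint32_t f) { return (f >> WORD_ID_BITS) & ((1u << BACKOFF_BITS) - 1u); }
static inline uint32_t unpack_level(uint32_t f)   { return f >> (WORD_ID_BITS + BACKOFF_BITS); }
With such a layout the layer number and the backoff weight share the 4 B previously occupied by word_id alone, which is where the 8 B per-entry saving (4 B for backoff_prob plus 4 B for backoff_level) comes from.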
In an optional embodiment of the present invention, the target data field may specifically include: the field corresponding to the first data or to the second data.
It can be understood that using the word index word_id of a gram as the target data field, storing the word index of the gram as the second data in the low 17 bits of word_id, storing the layer number backoff_level corresponding to the backoff position of the gram as first data in the high 3 bits of word_id, and storing the backoff weight backoff_prob of the gram as first data in the remaining 12 high bits of word_id, is only one application example of the embodiments of the present invention. In practice, those skilled in the art can determine the target data field according to actual application requirements, and any field with spare bits falls within the protection scope of the target data field of the embodiments of the present invention. In addition, those skilled in the art can flexibly determine the positions of the first data and the second data within the target data field; that is, the first data may be located in the high-order or low-order bits of the target data field, and the second data may be located in the low-order or high-order bits, and so on.
Furthermore, the embodiments of the present invention place no restriction on the specific manner in which the first data and the second data are stored in the target data field. For example, the field corresponding to the first data may be used as the target data field, or the field corresponding to the second data may be used as the target data field.
In an optional embodiment of the present invention, the method may further include the following steps:
determining third data from the data of the n-gram model;
deleting the third data from the data of the n-gram model.
The embodiments of the present invention may also determine third data from the data of the n-gram model and delete the third data from the data of the n-gram model, so as to further compress the n-gram model. Here, the third data may refer to data whose deletion does not prevent speech recognition from being carried out.
In an optional embodiment of the present invention, the third data may specifically include: the start position of the next-layer grams prefixed by the gram.
The start position of the next-layer grams prefixed by the gram is specifically low_id. In practical applications, low_id can be determined from the end position high_id of the adjacent gram; therefore, the embodiments of the present invention may delete low_id and keep only high_id. By deleting low_id, each gram entry can save 4 B of storage.
In an optional embodiment of the present invention, the start position of the next-layer grams prefixed by the gram may be determined as follows:
determining the start position of the next-layer grams prefixed by the gram according to the end position of the previous gram adjacent to the gram.
In one application example of the present invention, suppose a gram has low_id 12345 and end position high_id 23456; that is, storage positions 12345 to 23456 hold the 2-grams prefixed by that gram. Suppose the gram adjacent to it is "I"; then the low_id of the gram "I" is the high_id of the previous gram plus 1, so the low_id of "I" is 23457.
After the above compression processing, in which the first data are stored as the high-order bits of the target data field, the second data are stored as the low-order bits of the target data field, and low_id is deleted, the grams in the n-gram model may use the following data structure:
struct lm_node2 {
    float prob;
    unsigned int word_id;
    int high_id;
    unsigned int backoff_id;
};
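To illustrate how a deleted low_id is recovered and how the child range is then searched, the following sketch (helper names assumed; it reuses unpack_word_id from the packing sketch above and assumes that the first node of a layer has its range start at 0; not part of the original disclosure) may be considered:
/* Start of the next-layer range of node i: the left neighbour's high_id + 1. */
static inline int child_range_start(const struct lm_node2 *layer, int i)
{
    return (i == 0) ? 0 : layer[i - 1].high_id + 1;   /* e.g. 23456 + 1 = 23457 */
}

/* Binary search for a word index inside the inclusive range [start, end]
 * of the next layer; returns the node index, or -1 to trigger backoff. */
static int find_child(const struct lm_node2 *next_layer, int start, int end, uint32_t word)
{
    while (start <= end) {
        int mid = start + (end - start) / 2;
        uint32_t w = unpack_word_id(next_layer[mid].word_id);
        if (w == word)     return mid;
        else if (w < word) start = mid + 1;
        else               end = mid - 1;
    }
    return -1;
}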
In an optional embodiment of the present invention, the third data may specifically include: the backoff weight corresponding to the top-layer grams.
In practical applications, the top-layer grams usually have neither a backoff weight nor lower-layer grams. For example, a 3-gram model has three types of grams in total, namely 1-grams, 2-grams and 3-grams; for the 3-grams there is no corresponding next layer, that is, there are no 4-grams. Since the backoff weight is defined for the next-layer grams, and the top-layer grams have no next layer, they may also have no backoff weight.
Therefore, the embodiments of the present invention may delete, from the structured data of the n-gram model, the next-layer grams corresponding to the top-layer grams and the backoff weight corresponding to the top-layer grams. Specifically, the embodiments of the present invention may further delete backoff_prob and high_id from the above lm_node2 structure, obtaining the following data structure for the top-layer grams (namely the n-th-layer grams):
struct lm_higram_node1 {
    float prob;
    unsigned int word_id;
    unsigned int backoff_id;
};
In the embodiments of the present invention, struct lm_node1 is the data structure used by the layer-1 to layer-n grams in the original n-gram model. After the data processing of the embodiments of the present invention, the layer-1 to layer-(n-1) grams may use the data structure struct lm_node2, and the n-th-layer grams may use the data structure struct lm_higram_node1. In this way the n-th-layer grams use a separate data structure and can save another 4 B of storage.
In an optional embodiment of the present invention, the method may further include the following steps:
determining the data type corresponding to fourth data from the data of the n-gram model;
storing the fourth data according to the data type.
To further reduce the storage occupied by the n-gram model, the embodiments of the present invention may also compress the data type corresponding to the fourth data in the n-gram model.
In an optional embodiment of the present invention, the fourth data may specifically include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type may specifically include: a two-byte integer.
In one application example of the present invention, by analyzing the conditional probability and backoff weight of every gram in an n-gram model used in practice, it can be learned that the probability values corresponding to the conditional probability and the backoff weight of a gram usually lie in a very small range, for example usually between -10 and 0, whereas in existing n-gram models the data type of the conditional probability prob of the gram and of the backoff weight backoff_prob of the gram is float, which occupies a large amount of useless space. Therefore, the fourth data may specifically include the conditional probability and/or backoff weight of the gram, and the embodiments of the present invention may compress the data type of the conditional probability prob and/or the backoff weight backoff_prob of the gram. Since the values corresponding to prob and/or backoff_prob usually lie between -10 and 0, a short (two-byte integer) can cover the value range of prob and/or backoff_prob; therefore, the embodiments of the present invention may determine that the data type corresponding to the conditional probability and/or backoff weight of the gram is a two-byte integer and store the fourth data according to that two-byte integer, that is, compress the data type corresponding to prob and/or backoff_prob from the original float to short. A short value needs only 2 B of storage, so after the data types corresponding to prob and backoff_prob are compressed to short, each gram entry can save another 2 B of storage. It can be understood that in practical applications the present invention places no restriction on the data type to which the conditional probability and backoff weight of the gram are compressed; for example, they may be compressed to the character type char.
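One possible way to realise the float-to-short compression is a fixed-point quantization. The sketch below is only an illustrative assumption: the disclosure states only that the roughly -10..0 range fits into a two-byte integer, so the scale factor and helper names are not part of the original text.
#include <stdint.h>

#define PROB_SCALE 3000.0f   /* assumed scale: representable range is about -10.9 .. 10.9 */

/* Quantize a log probability (or backoff weight) into the stored short. */
static inline int16_t prob_to_short(float log_prob)
{
    float v = log_prob * PROB_SCALE;
    if (v >  32767.0f) v =  32767.0f;   /* clamp to the int16 range */
    if (v < -32768.0f) v = -32768.0f;
    return (int16_t)v;
}

/* Recover an approximate float value when the model is used for scoring. */
static inline float short_to_prob(int16_t q)
{
    return (float)q / PROB_SCALE;
}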
In an optional embodiment of the present invention, storing the first data as the high-order bits of the target data field may specifically include: storing the data-type-compressed backoff_prob in the 12 spare high-order bits of the target data field, such as the word index field of the gram.
The embodiments of the present invention may compress the data types corresponding to the fourth data in the above data structures lm_node2 and lm_higram_node1. For an n-gram model, the data structure of the layer-1 to layer-(n-1) grams may be compressed from lm_node2 to the following lm_node3:
struct lm_node3 {
    short prob;
    unsigned int word_id;
    int high_id;
    unsigned int backoff_id;
};
And the data structure of the n-th-layer grams may be compressed from lm_higram_node1 to the following lm_higram_node2, again reducing the storage by 4 B:
struct lm_higram_node2 {
    short prob;
    unsigned int word_id;
    unsigned int backoff_id;
};
In practical applications, by compressing an n-gram model with the above data processing method of the embodiments of the present invention, an n-gram model containing 10G gram entries can be compressed from 280 GB to below 128 GB, reducing the storage occupied by the n-gram model without affecting the recognition performance of the model.
In summary, the data processing method of the embodiments of the present invention can determine first data and second data from the data of an n-gram model, store the first data as the high-order bits of a target data field, and store the second data as the low-order bits of the target data field. The target data field may specifically be a field with spare bits in the data of the n-gram model; for example, the word index of a gram may be used as the target data field. The first data may specifically include: the backoff weight of the gram, and/or the layer number corresponding to the backoff position of the gram; the second data include: the word index of the gram. In one application example of the present invention, the field corresponding to the word index word_id of a gram may be used as the target data field: the word index word_id of the gram is stored as the second data in the low 17 bits of word_id, the layer number backoff_level corresponding to the backoff position of the gram is stored as first data in the high 3 bits of word_id, and the backoff weight backoff_prob of the gram is stored as first data in the remaining 12 high bits of word_id. By exploiting the spare bits of the word index of the gram, each gram entry can thus save 8 B of storage. For 10G gram entries, up to 80 GB of storage can be saved, which reduces the memory occupied by the n-gram model and can thereby improve the speed of speech recognition.
Method embodiment 2
Referring to Fig. 2, a flowchart of the steps of an embodiment of a speech recognition method of the present invention is shown, which may specifically include:
Step 201: load an n-gram model, where first data in the n-gram model are stored in the high-order bits of a target data field and second data are stored in the low-order bits of the target data field; the first data may include: the backoff position of a gram, and/or the layer number corresponding to the backoff position of the gram; the second data may include: the word index of the gram;
Step 202: perform speech recognition on the gram according to the n-gram model;
where performing speech recognition on the gram according to the n-gram model may include:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
The n-gram model here may specifically be the n-gram model obtained after the processing (compression) of the data processing method shown in Fig. 1. The embodiments of the present invention can load the compressed n-gram model in the speech recognition process, thereby reducing the memory occupied by the n-gram model and improving the speed of speech recognition. In particular, for online speech recognition, in order to improve the recognition performance, the n of the n-gram model is usually large, for example 4, 5 or 6, which causes the n-gram model to occupy a large amount of memory; the n-gram model obtained after compression by the data processing method of the embodiments of the present invention can save memory without affecting the recognition performance.
In the embodiments of the present invention, the first data in the n-gram model used for speech recognition are stored in the high-order bits of the target data field, and the second data are stored in the low-order bits of the target data field; the first data may include: the backoff position of the gram, and/or the layer number corresponding to the backoff position of the gram; the second data may include: the word index of the gram.
The target data field may specifically be a field with spare bits in the data of the n-gram model, for example word_id, the word index of the gram. The first data and the second data may specifically be data in the n-gram model that can share the storage occupied by the target data field. In one application example of the present invention, the word index word_id of the gram may be used as the target data field: the word index word_id of the gram is stored as the second data in the low 17 bits of word_id, the layer number backoff_level corresponding to the backoff position of the gram is stored as first data in the high 3 bits of word_id, and the backoff weight backoff_prob of the gram is stored as first data in the remaining 12 high bits of word_id. Thus, by exploiting the spare bits of the target data field word_id, the existing n-gram model is compressed to obtain a compressed n-gram model, allowing each gram entry to save 8 B of storage.
In the embodiments of the present invention, performing speech recognition on the gram according to the n-gram model may specifically include:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
Specifically, the second data, such as the word index of the gram, may be obtained from the low 17 bits of the target data field word_id; the layer number corresponding to the backoff position of the gram in the first data may be obtained from the high 3 bits of the target data field word_id; and the backoff weight of the gram in the first data may be obtained from the remaining 12 high bits of the target data field word_id.
In an optional embodiment of the present invention, the target data field may specifically include: the field corresponding to the first data or to the second data.
It can be understood that the embodiments of the present invention place no restriction on the specific contents of the target data field, the first data and the second data; in addition, the embodiments of the present invention place no restriction on the specific manner in which the first data and the second data are stored in the target data field. For example, the field corresponding to the first data may be used as the target data field, or the field corresponding to the second data may be used as the target data field.
In an optional embodiment of the present invention, the n-gram model may not include: the start position of the next-layer grams prefixed by the gram; performing speech recognition on the gram according to the n-gram model may then further include:
determining the start position of the next-layer grams prefixed by the gram according to the end position of the previous gram adjacent to the gram.
The start position of the next-layer grams prefixed by the gram is specifically low_id. In practical applications, low_id can be determined from the end position high_id of the adjacent gram; therefore, the embodiments of the present invention may delete low_id and keep only high_id. By deleting low_id, each gram entry can save another 4 B of storage.
In an optional embodiment of the present invention, the n-gram model may not include: the backoff weight corresponding to the top-layer grams.
In practical applications, the top-layer grams usually have neither a backoff weight nor next-layer grams. Therefore, the n-gram model of the embodiments of the present invention may not include the backoff weight corresponding to the top-layer grams, whereby the n-gram model can again reduce the storage by 4 B.
In an optional embodiment of the present invention, the n-gram model may include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type corresponding to the conditional probability of the gram and/or the backoff weight of the gram is: a two-byte integer.
Since the probability values corresponding to the conditional probability of the gram and the backoff weight of the gram usually lie in a very small range, the target data type corresponding to the conditional probability and/or backoff weight of the gram in the n-gram model of the embodiments of the present invention can be compressed from the original floating-point type float to the two-byte integer short; each gram entry can thereby save another 2 B of storage.
In summary, the speech recognition method of the embodiments of the present invention can perform speech recognition according to the loaded n-gram model, where the first data in the n-gram model are stored in the high-order bits of the target data field and the second data are stored in the low-order bits of the target data field; the first data include: the backoff position of the gram, and/or the layer number corresponding to the backoff position of the gram; the second data include: the word index of the gram. Thus, by exploiting the spare bits of the word index of the gram, each gram entry can save 8 B of storage. For 10G gram entries, up to 80 GB of storage can be saved, which reduces the memory occupied by the n-gram model and can thereby improve the speed of speech recognition.
Apparatus embodiment 1
Referring to Fig. 3, a structural block diagram of an embodiment of a data processing apparatus of the present invention is shown, which may specifically include:
a first determining module 301, configured to determine first data and second data from the data of an n-gram model, where the first data may specifically include: the backoff weight of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data may specifically include: the word index of the gram;
a first storage module 302, configured to store the first data as the high-order bits of a target data field and store the second data as the low-order bits of the target data field.
Optionally, the target data field may specifically include: the field corresponding to the first data or to the second data.
Optionally, the apparatus may further include:
a second determining module, configured to determine third data from the data of the n-gram model;
a data deletion module, configured to delete the third data from the data of the n-gram model.
Optionally, the third data may specifically include: the start position of the next-layer grams prefixed by the gram.
Optionally, the third data may specifically include: the backoff weight corresponding to the top-layer grams.
Optionally, the apparatus may further include:
a third determining module, configured to determine the data type corresponding to fourth data from the data of the n-gram model;
a second storage module, configured to store the fourth data according to the data type.
Optionally, the fourth data may specifically include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type includes: a two-byte integer.
Apparatus embodiment 2
Referring to Fig. 4, a structural block diagram of an embodiment of a speech recognition apparatus of the present invention is shown, which may specifically include:
a loading module 401, configured to load an n-gram model, where first data in the n-gram model are stored in the high-order bits of a target data field and second data are stored in the low-order bits of the target data field; the first data may specifically include: the backoff position of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data may specifically include: the word index of the gram;
a recognition module 402, configured to perform speech recognition on the gram according to the n-gram model;
where the recognition module 402 may specifically include:
a data acquisition module, configured to obtain the first data from the high-order bits of the target data field and obtain the second data from the low-order bits of the target data field.
Optionally, the target data field may specifically include: the field corresponding to the first data or to the second data.
Optionally, the n-gram model may not include: the start position of the next-layer grams prefixed by the gram; the recognition module 402 may then specifically include:
a position determining module, configured to determine the start position of the next-layer grams prefixed by the gram according to the end position of the previous gram adjacent to the gram.
Optionally, the n-gram model may not include: the backoff weight corresponding to the top-layer grams.
Optionally, the n-gram model may include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type corresponding to the conditional probability of the gram and/or the backoff weight of the gram may specifically include: a two-byte integer.
As for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
An embodiment of the present invention also discloses an apparatus for data processing, which includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
determining first data and second data from the data of an n-gram model, where the first data include: the backoff weight of a gram, and/or the layer number corresponding to the backoff position of the gram; and the second data include: the word index of the gram;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or to the second data.
Optionally, the one or more processors are further configured to execute the one or more programs including instructions for performing the following operations:
determining third data from the data of the n-gram model;
deleting the third data from the data of the n-gram model.
Optionally, the third data include: the start position of the next-layer grams prefixed by the gram.
Optionally, the third data include: the backoff weight corresponding to the top-layer grams.
Optionally, the one or more processors are further configured to execute the one or more programs including instructions for performing the following operations:
determining the data type corresponding to fourth data from the data of the n-gram model;
storing the fourth data according to the data type.
Optionally, the fourth data include: the conditional probability of the gram, and/or the backoff weight of the gram; the data type includes: a two-byte integer.
An embodiment of the present invention further discloses a device for speech recognition, which includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
loading a multi-component grammar model, where first data in the multi-component grammar model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a backoff position of a gram, and/or a layer count corresponding to the backoff position of the gram; the second data includes: a word sequence of the gram;
performing speech recognition on grams according to the multi-component grammar model;
where the performing speech recognition on grams according to the multi-component grammar model includes:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
Optionally, the target data field includes: the field corresponding to the first data or the second data.
Optionally, the multi-component grammar model does not include: the start position of the next-layer gram that takes a gram as its prefix; and the performing speech recognition on grams according to the multi-component grammar model includes:
determining, according to the end position of the previous gram adjacent to the gram, the start position of the next-layer gram that takes the gram as its prefix.
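The following sketch illustrates how such a start position can be recovered when it is not stored: if the next-layer (child) grams of each gram occupy a contiguous block, each gram only needs to record where its block ends, and the block of a given gram starts where the block of the adjacent previous gram ends. The array layout and field names are assumptions made for illustration, not the data structure defined by this disclosure.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: grams of one layer are stored in an array, and each entry
 * keeps only the END index (exclusive) of its next-layer block.  The START
 * index is derived from the neighbouring entry, so it need not be stored. */
typedef struct {
    uint32_t child_end;   /* end of this gram's next-layer block */
} GramEntry;

static uint32_t child_start(const GramEntry *layer, uint32_t idx) {
    /* the first gram's children start at 0; every other gram's children
     * start where the previous gram's children end */
    return (idx == 0) ? 0 : layer[idx - 1].child_end;
}

int main(void) {
    /* three grams on one layer: 5, 0 and 4 children respectively */
    GramEntry layer[3] = { {5}, {5}, {9} };
    for (uint32_t i = 0; i < 3; i++) {
        printf("gram %u: next-layer block [%u, %u)\n",
               i, child_start(layer, i), layer[i].child_end);
    }
    return 0;
}
```

Dropping the explicit start positions in this way removes one stored field per gram while the lookup cost stays constant.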
Optionally, the multi-component grammar model does not include: the backoff weight corresponding to the highest-layer gram.
Optionally, the multi-component grammar model includes: a conditional probability of a gram, and/or a backoff weight of a gram; the data type corresponding to the conditional probability of the gram and/or the backoff weight of the gram includes: a two-byte integer.
Fig. 5 is a block diagram of a device 800 for data processing according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 5, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operations of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions, so as to complete all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the device 800. Examples of such data include instructions for any application or method operated on the device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disc.
The power supply component 806 provides power to the various components of the device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen providing an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operation mode, such as a call mode, a recording mode, or an offline speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components (for example, the display and the keypad of the device 800); the sensor component 814 can also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions can be executed by the processor 820 of the device 800 to complete the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 6 is a structural schematic diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (such as one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Furthermore, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a device (a terminal or a server), the device is enabled to perform the methods shown in Fig. 1 or Fig. 2.
A data processing method, the method including: determining first data and second data from the data of a multi-component grammar model, where the first data includes: a backoff weight of a gram, and/or a layer count corresponding to a backoff position of the gram, and the second data includes: a word sequence of the gram; and storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field.
A speech recognition method, including:
loading a multi-component grammar model, where first data in the multi-component grammar model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a backoff position of a gram, and/or a layer count corresponding to the backoff position of the gram; the second data includes: a word sequence of the gram;
performing speech recognition on grams according to the multi-component grammar model;
where the performing speech recognition on grams according to the multi-component grammar model includes:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
Those skilled in the art, after considering the specification and practicing the invention disclosed herein, will readily conceive of other embodiments of the present invention. The present invention is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and embodiments are to be regarded as illustrative only, and the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
The data processing method, the data processing apparatus and the device for data processing, as well as the speech recognition method, the speech recognition apparatus and the device for speech recognition provided by the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those skilled in the art, changes may be made to the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A data processing method, characterized in that the method includes:
determining first data and second data from the data of a multi-component grammar model, where the first data includes: a backoff weight of a gram, and/or a layer count corresponding to a backoff position of the gram; and the second data includes: a word sequence of the gram;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field.
2. The method according to claim 1, characterized in that the target data field includes: the field corresponding to the first data or the second data.
3. The method according to claim 1, characterized in that the method further includes:
determining third data from the data of the multi-component grammar model;
deleting the third data from the data of the multi-component grammar model.
4. A speech recognition method, characterized in that the method includes:
loading a multi-component grammar model, where first data in the multi-component grammar model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a backoff position of a gram, and/or a layer count corresponding to the backoff position of the gram; the second data includes: a word sequence of the gram;
performing speech recognition on grams according to the multi-component grammar model;
where the performing speech recognition on grams according to the multi-component grammar model includes:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
5. A data processing apparatus, characterized in that the apparatus includes:
a first determining module, configured to determine first data and second data from the data of a multi-component grammar model, where the first data includes: a backoff weight of a gram, and/or a layer count corresponding to a backoff position of the gram; and the second data includes: a word sequence of the gram;
a first storage module, configured to store the first data as the high-order bits of a target data field, and store the second data as the low-order bits of the target data field.
6. A speech recognition apparatus, characterized in that the apparatus includes:
a loading module, configured to load a multi-component grammar model, where first data in the multi-component grammar model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a backoff position of a gram, and/or a layer count corresponding to the backoff position of the gram; the second data includes: a word sequence of the gram;
a recognition module, configured to perform speech recognition on grams according to the multi-component grammar model;
where the recognition module includes:
a data acquisition module, configured to obtain the first data from the high-order bits of the target data field, and obtain the second data from the low-order bits of the target data field.
7. A device for data processing, characterized in that the device includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
determining first data and second data from the data of a multi-component grammar model, where the first data includes: a backoff weight of a gram, and/or a layer count corresponding to a backoff position of the gram; and the second data includes: a word sequence of the gram;
storing the first data as the high-order bits of a target data field, and storing the second data as the low-order bits of the target data field.
8. A device for speech recognition, characterized in that the device includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
loading a multi-component grammar model, where first data in the multi-component grammar model is stored in the high-order bits of a target data field, and second data is stored in the low-order bits of the target data field; the first data includes: a backoff position of a gram, and/or a layer count corresponding to the backoff position of the gram; the second data includes: a word sequence of the gram;
performing speech recognition on grams according to the multi-component grammar model;
where the performing speech recognition on grams according to the multi-component grammar model includes:
obtaining the first data from the high-order bits of the target data field, and obtaining the second data from the low-order bits of the target data field.
9. A machine-readable medium having instructions stored thereon, where the instructions, when executed by one or more processors, cause a device to perform the data processing method according to one or more of claims 1 to 3.
10. A machine-readable medium having instructions stored thereon, where the instructions, when executed by one or more processors, cause a device to perform the speech recognition method according to claim 4.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201810084097.7A (CN110096693B) | 2018-01-29 | 2018-01-29 | Data processing method and device for data processing
Publications (2)

Publication Number | Publication Date
---|---
CN110096693A | 2019-08-06
CN110096693B | 2024-05-28
Family
ID=67441882

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201810084097.7A (CN110096693B, Active) | Data processing method and device for data processing | 2018-01-29 | 2018-01-29

Country Status (1)

Country | Link
---|---
CN | CN110096693B (en)
Citations (19)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN1378130A * | 2002-05-24 | 2002-11-06 | 郑方 | Initial four-stroke Chinese sentence input method for computer
CN1682448A * | 2002-07-12 | 2005-10-12 | 斯利普斯特里姆数据公司 | Method for lossless data compression using greedy sequential context-dependent grammar transform
US20080052076A1 * | 2006-08-22 | 2008-02-28 | International Business Machines Corporation | Automatic grammar tuning using statistical language model generation
CN101770367A * | 2009-12-30 | 2010-07-07 | 北京飞天诚信科技有限公司 | Compressing method and compressing device of .NET file
CN101923569A * | 2010-07-09 | 2010-12-22 | 南京朗坤软件有限公司 | Storage method of structure type data of real-time database
JP2011033806A * | 2009-07-31 | 2011-02-17 | Nippon Telegr & Teleph Corp <Ntt> | Language model compression device, access device of language model, language model compression method, access method of language model, language model compression program, and access program of language model
US20110320498A1 * | 2010-06-25 | 2011-12-29 | Educational Testing Service | Systems and Methods for Optimizing Very Large N-Gram Collections for Speed and Memory
CN102361458A * | 2011-08-16 | 2012-02-22 | 北京首钢自动化信息技术有限公司 | Method for realizing high efficient data compression in rapid data management system
CN102682096A * | 2012-04-27 | 2012-09-19 | 北京航空航天大学 | Collaborative management device and collaborative management method for simulation resource information and model source codes
US8725509B1 * | 2009-06-17 | 2014-05-13 | Google Inc. | Back-off language model compression
CN104346298A * | 2013-08-06 | 2015-02-11 | 北京数码视讯软件技术发展有限公司 | Device and method for data processing based on smart card and smart card
CN104572655A * | 2013-10-12 | 2015-04-29 | 腾讯科技(北京)有限公司 | Data processing method, device and system
CN104918048A * | 2015-06-03 | 2015-09-16 | 复旦大学 | Entropy coding context probability model modeling module design method suitable for HEVC (high efficiency video coding) standards
US20160283391A1 * | 2015-03-24 | 2016-09-29 | Jim K. Nilsson | Compaction for Memory Hierarchies
US20160300566A1 * | 2015-04-13 | 2016-10-13 | Intel Corporation | Method and system of random access compression of transducer data for automatic speech recognition decoding
CN106384332A * | 2016-09-09 | 2017-02-08 | 中山大学 | Method for fusing unmanned aerial vehicle image and multispectral image based on Gram-Schmidt
CN106961510A * | 2016-01-12 | 2017-07-18 | 北京搜狗科技发展有限公司 | A kind of voice communication processing method and processing device
CN107145493A * | 2016-03-01 | 2017-09-08 | 阿里巴巴集团控股有限公司 | Information processing method and device
CN107291704A * | 2017-05-26 | 2017-10-24 | 北京搜狗科技发展有限公司 | Treating method and apparatus, the device for processing
Non-Patent Citations (4)

Title
---
Raj, B. et al.: "Lossless compression of language model structure and word identifiers", 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (Cat. No. 03CH37404), pages 388-391 *
单煜翔; 陈谐; 史永哲; 刘加: "Fast language model look-ahead algorithm based on extended N-gram model" (基于扩展N元文法模型的快速语言模型预测算法), Acta Automatica Sinica (自动化学报), no. 10, pages 64-72 *
姚志鹏; 孙仁诚; 邵峰晶; 周文鹏: "A method of compressing databases using binary attributes" (一种利用二进制属性压缩数据库的方法), Journal of Qingdao University (Natural Science Edition) (青岛大学学报(自然科学版)), no. 02, 15 May 2013, pages 66-70 *
陆遥: "Research on pruning algorithms for speech recognition" (语音识别剪枝算法研究), China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), pages 136-128 *
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
2022-08-02 | TA01 | Transfer of patent application right | Applicant after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd., Room 9, Floor 01, Cyber Building, Building 9, Building 1, Zhongguancun East Road, Haidian District, Beijing, 100084. Applicants before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.; SOGOU (HANGZHOU) INTELLIGENT TECHNOLOGY Co.,Ltd.
 | GR01 | Patent grant | 