CN111354348B

CN111354348B - Data processing method and device for data processing

Info

Publication number: CN111354348B
Application number: CN201811574155.0A
Authority: CN
Inventors: 姚光超
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2024-04-26
Anticipated expiration: 2038-12-21
Also published as: CN111354348A

Abstract

The embodiments of the invention provide a data processing method, a data processing apparatus, and a device for data processing. The method specifically comprises: determining the active node corresponding to the current voice frame in a decoding network; determining the ordered linked list corresponding to each precursor node of the active node according to the node index that the active node receives from that precursor node and the mapping relation between node indexes and ordered linked lists; and merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked list corresponding to at least one precursor node of the active node. The embodiments of the invention can reduce the complexity of merging a plurality of ordered linked lists, and can thereby improve the decoding speed and the voice recognition efficiency.

Description

Data processing method and device for data processing

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and a device for data processing.

Background

Speech recognition, also known as ASR (Automatic Speech Recognition), aims to convert the lexical content in speech into computer-readable inputs, such as keys, binary codes, or character sequences.

Specifically, knowledge sources such as an acoustic model, a language model, a pronunciation dictionary and the like can be compiled into a decoding network, and voice recognition is a process of searching an optimal path in the decoding network.

However, since the decoding network is generally huge, there is a problem that the decoding speed is slow in the process of traversing the decoding network to decode the voice information, and thus the voice recognition efficiency is affected.

Disclosure of Invention

The embodiments of the invention provide a data processing method, a data processing apparatus, and a device for data processing, which can improve the decoding speed and the voice recognition efficiency.

In order to solve the above problems, an embodiment of the present invention discloses a data processing method, including:

determining a corresponding active node of the current voice frame in a decoding network;

determining an ordered linked list corresponding to the precursor node of the active node according to the node index of the precursor node received by the active node from the precursor node of the active node and the mapping relation between the node index and the ordered linked list;

merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node.

In another aspect, an embodiment of the present invention discloses a data processing apparatus, including:

The node determining module is used for determining the corresponding active node of the current voice frame in the decoding network;

The linked list determining module is used for determining an ordered linked list corresponding to the precursor node of the active node according to the node index of the precursor node received by the active node from the precursor node of the active node and the mapping relation between the node index and the ordered linked list;

The linked list merging module is used for merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: the ordered linked lists corresponding to the plurality of precursor nodes of the active node.

In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:

In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform a data processing method as described in one or more of the preceding.

The embodiment of the invention has the following advantages:

After determining the active node corresponding to the current voice frame in the decoding network, the embodiment of the invention can learn which precursor nodes the active node has from the node indexes that the active node receives from its precursor nodes, and can then determine the ordered linked list corresponding to each precursor node according to those node indexes and the mapping relation between node indexes and ordered linked lists, so that the ordered linked lists corresponding to the plurality of precursor nodes of the active node can be merged in a single pass according to a preset merging algorithm. Therefore, when the active node has many precursor nodes, the preset merging algorithm can be used to merge the ordered linked lists corresponding to those precursor nodes; compared with merging the ordered linked lists one after another over multiple passes, this reduces the complexity of merging the ordered linked lists and can thereby improve the decoding speed and the voice recognition efficiency.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of steps of an embodiment of a data processing method of the present invention;

FIG. 2 is a schematic diagram of a decoding network according to an embodiment of the present invention;

FIG. 3 is a block diagram of an embodiment of a data processing apparatus of the present invention;

FIG. 4 is a block diagram of an apparatus 800 for data processing according to the present invention; and

FIG. 5 is a schematic diagram of a server in some embodiments of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Method embodiment

Referring to FIG. 1, a flowchart of the steps of an embodiment of a data processing method of the present invention is shown. The method may specifically include the following steps:

Step 101, determining a corresponding active node of a current voice frame in a decoding network;

Step 102, determining an ordered linked list corresponding to a precursor node of the active node according to a node index of the precursor node received by the active node from the precursor node of the active node and a mapping relation between the node index and the ordered linked list;

Step 103, merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: the ordered linked lists corresponding to the plurality of precursor nodes of the active node.

The data processing method of the embodiment of the invention can be applied to a voice recognition scenario and can run on electronic equipment, where the electronic equipment includes but is not limited to: servers, smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, car computers, desktop computers, set top boxes, smart televisions, wearable devices, and the like.

It can be understood that the embodiment of the present invention does not limit how the voice information to be recognized is obtained. For example, the electronic device may obtain the voice information to be recognized from a client or a network over a wired or wireless connection, may record it in real time on the electronic device, or may obtain it from an instant messaging message received in an instant messaging application.

In the embodiment of the invention, the voice information to be recognized can be segmented into a plurality of voice frames according to the preset window length and frame shift, wherein each voice frame can be a voice fragment, and then the voice information can be decoded frame by frame. If the voice information to be recognized is analog voice information (such as a recording of a user call), the analog voice information needs to be converted into digital voice information, and then the voice information is segmented.

The window length represents the duration of each voice frame, and the frame shift represents the time difference between the starts of adjacent frames. For example, when the window length is 25 ms and the frame shift is 15 ms, the first voice frame covers 0-25 ms, the second voice frame covers 15-40 ms, and so on, so that the voice information to be recognized is segmented. It will be appreciated that the specific window length and frame shift may be set according to the actual requirements, and embodiments of the present invention are not limited in this respect.
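The segmentation described above can be sketched as follows. This is only an illustration: the function name `frame_bounds` is hypothetical, and dropping a trailing partial frame is an assumption, since the text does not say how the end of the signal is handled.

```python
def frame_bounds(total_ms, window_ms=25, shift_ms=15):
    """Return (start, end) times in ms of every full frame in the signal.

    Assumption: a trailing segment shorter than the window is dropped.
    """
    bounds = []
    start = 0
    while start + window_ms <= total_ms:
        bounds.append((start, start + window_ms))
        start += shift_ms  # adjacent frames start shift_ms apart
    return bounds


# A 60 ms signal with a 25 ms window and a 15 ms frame shift.
print(frame_bounds(60))
```

With these values the first two frames are 0-25 ms and 15-40 ms, matching the example in the text.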

Optionally, before the voice information to be recognized is segmented, the electronic device may further perform noise reduction processing on the voice information to be recognized, so as to improve a subsequent processing capability on the voice information.

In the embodiment of the present invention, the electronic device may have a decoding network stored therein in advance. The decoding network is a search network constructed from a plurality of knowledge sources, which may include acoustic models, acoustic contexts, pronunciation dictionaries, language models, and the like. The decoding process is to search in the decoding network and select one or more optimal paths as the voice recognition result.

In an alternative embodiment of the present invention, the decoding network may specifically include: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

In the embodiment of the present invention, the decoding network may be a dynamically constructed decoding network or a statically constructed decoding network, and it is understood that the embodiment of the present invention does not limit the specific manner of constructing the decoding network.

A dynamically constructed decoding network uses a prefix-tree pronunciation dictionary as the search network and integrates the other knowledge sources dynamically during decoding. Because the dynamically constructed decoding network contains only the pronunciation dictionary, using it saves memory space.

The statically constructed decoding network may be a decoding network based on a WFST (Weighted Finite State Transducer). The statically constructed decoding network compiles knowledge sources such as an acoustic model, an acoustic context, a pronunciation dictionary, and a language model into a single search network. Although this search network occupies a large amount of memory, using a statically constructed decoding network can improve the decoding speed.

In a specific application, the basic structure of the decoding network is a directed graph, which consists of nodes and edges. The directed graph contains effective information (such as acoustic information and language information) related to voice recognition, and the effective information can be on the edge or the node. Speech recognition is the process of finding an optimal path in the directed graph based on the input speech information.

It is to be understood that the embodiments of the present invention are applicable both to decoding networks in which the valid information is on the nodes and to decoding networks in which the valid information is on the edges. For convenience of description, the embodiment of the present invention takes a decoding network with the valid information on the nodes as an example; the decoding process for a decoding network with the valid information on the edges is similar, and the two descriptions may be referred to each other.

Since the decoding network for speech recognition is huge, traversing all nodes in the decoding network would make recognition extremely slow, and thus the decoding process is a breadth-first search with pruning. For a decoding network in which the valid information is on the nodes, the nodes that survive pruning when a voice frame is decoded are called active nodes. Likewise, for a decoding network in which the valid information is on the edges, the edges that survive pruning are called active edges.
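A minimal sketch of how pruning leaves a set of active nodes is given below. The text does not state the pruning criterion; keeping every node whose score lies within a fixed margin (`beam`) of the best score is an assumption made here for illustration, and the function name `prune` and the scores are hypothetical.

```python
def prune(scores, beam):
    """Keep the nodes whose score is within `beam` of the best score.

    `scores` maps a node index to the best path score reaching that node.
    The beam criterion is an assumed, commonly used pruning rule.
    """
    if not scores:
        return {}
    best = max(scores.values())
    return {node: s for node, s in scores.items() if s >= best - beam}


# Nodes 2 and 3 survive and become active nodes; node 4 is pruned.
active = prune({2: 10.0, 3: 9.0, 4: 3.0}, beam=5.0)
print(sorted(active))
```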

In an optional embodiment of the present invention, the active node may specifically include any one of the following information: triphones, acoustic states, words, syllables, phonemes.

It will be appreciated that embodiments of the present invention do not limit the granularity represented by the nodes of the decoding network. Optionally, the active node may specifically include any one of the following information: triphones, acoustic states, words, syllables, phonemes; that is, the granularity represented by the nodes of the decoding network may be triphones, acoustic states, words, syllables, or phonemes. Taking Chinese as an example, "你" ("you") is a word, "ni" is a phoneme, "ni(3)" is a syllable, and "sil-n+i" is a triphone. The acoustic state may specifically be an HMM (Hidden Markov Model) state.

For example, the decoding network may be a search network constructed from HMM models, triphone models, pronunciation dictionaries, and language models, where HMM models, triphone models, pronunciation dictionaries, and language models actually describe the possible search spaces of the decoding network at different granularities.

Wherein, the HMM model defines an HMM state sequence corresponding to each triphone. By assuming the state corresponding to each frame of speech segment, a search can be made over the state sequence of the HMM, resulting in a possible triphone sequence. The triphone model defines the correspondence from triphones to phonemes, and possible phoneme sequences can be obtained from the triphone sequences generated by the HMM model. The pronunciation dictionary defines words represented by the phoneme sequences, and corresponding word sequences can be obtained according to possible phoneme sequences generated by the triphone model. The language model defines the probability of occurrence of a word sequence, and the probability score of the word sequence can be obtained according to the word sequence generated by the pronunciation dictionary.

In the embodiment of the invention, in the process of decoding voice information frame by frame according to a decoding network, the active node corresponding to the current voice frame in the decoding network is determined first; then, the ordered linked list corresponding to the precursor node of the active node is determined according to the node index of the precursor node received by the active node from the precursor node of the active node and the mapping relation between the node index and the ordered linked list; finally, the ordered linked lists corresponding to the precursor nodes of the active node are merged according to a preset merging algorithm. The ordered linked list may specifically include: candidate paths from the start node of the decoding network to the precursor node.

In a specific application, the decoding process typically employs the token passing algorithm. A token refers to an optimal path from the starting node to the current node, and an ordered linked list can be used to store a set of optimal paths from the starting node to the current node. That is, the candidate paths in the ordered linked list may specifically be the n highest-scoring paths from the start node to the precursor node (n is a natural number), and the candidate paths in the ordered linked list may be ordered from high score to low score.

In the embodiment of the invention, the larger the value of n, the more candidate paths the ordered linked list holds and the higher the decoding accuracy; however, the decoding speed decreases as n increases, so in practical applications the value of n can be set by balancing accuracy against speed. For example, the value of n may be set to 32.
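The ordered list of the n highest-scoring candidate paths can be sketched as below. For brevity the sketch uses a Python list of `(score, path)` pairs instead of an actual linked list; the function name `insert_candidate` and the scores are hypothetical.

```python
def insert_candidate(ordered, score, path, n=32):
    """Insert (score, path) into a list sorted by descending score.

    At most n entries are kept, so the lowest-scoring candidate is
    dropped once the list is full.
    """
    pos = 0
    while pos < len(ordered) and ordered[pos][0] >= score:
        pos += 1
    ordered.insert(pos, (score, path))
    del ordered[n:]  # keep only the n best candidates
    return ordered


candidates = []
insert_candidate(candidates, 5, [0, 1], n=2)
insert_candidate(candidates, 9, [0, 2], n=2)
insert_candidate(candidates, 7, [0, 3], n=2)
print(candidates)
```

A small n keeps every insertion and merge cheap, which is the speed side of the trade-off described above.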

Referring to FIG. 2, a schematic diagram of a decoding network according to an embodiment of the present invention is shown. The decoding process traverses the decoding network onward from the starting node (e.g., node 0) in FIG. 2. Assume that the current time is t1 and that the active nodes determined for the current voice frame at time t1 include: node 2, node 3, node 4 and node 5.

Nodes 2, 3, 4 and 5 each store an ordered linked list passed from their respective predecessor nodes, namely list2, list3, list4 and list5. Each of node 2, node 3, node 4 and node 5 generates its own ordered linked list according to the received ordered linked list of its predecessor node and the acoustic model score or language model score of the node itself, and then passes the generated ordered linked list to each of its successor nodes.

For example, according to list2 and the acoustic model score or language model score of node 2, node 2 generates its own ordered linked list list2', which includes candidate paths from node 0 to node 2, and node 2 passes list2' to its successor node (node 6). Similarly, node 3 generates an ordered linked list list3', which includes candidate paths from node 0 to node 3, and passes list3' to its successor node (node 6); node 4 generates an ordered linked list list4', which includes candidate paths from node 0 to node 4, and passes list4' to its successor node (node 6); node 5 generates an ordered linked list list5', which includes candidate paths from node 0 to node 5, and passes list5' to its successor node (node 7). The active nodes for the next moment (such as time t2) are then generated according to the token passing algorithm. The decoding of the voice frame at time t1 is then finished, the decoding of the voice frame at time t2 begins, and the voice frame at time t2 can be recorded as the current voice frame.
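How a node might derive its own list (for example list2') from the list it received (list2) can be sketched as follows. The text only says that the node's acoustic model score or language model score is used; treating scores as additive, as with log-probabilities, is an assumption, and the function name `extend_list` and all numbers are hypothetical.

```python
def extend_list(received, node_id, node_score):
    """Build a node's ordered list from the list received from a precursor.

    Each candidate path is extended by node_id and its score is increased
    by node_score (assumed additive). Adding the same value to every entry
    preserves the descending order of the list.
    """
    return [(score + node_score, path + [node_id]) for score, path in received]


list2 = [(10.0, [0]), (8.0, [0])]          # received by node 2 (made-up scores)
list2_prime = extend_list(list2, 2, -1.5)  # node 2 contributes a score of -1.5
print(list2_prime)
```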

In decoding the voice frame at time t2, when the traversal reaches node 6, node 6 needs to merge list2' from node 2, list3' from node 3, and list4' from node 4, because node 6 has 3 predecessor nodes (node 2, node 3, and node 4). Since node 6 does not know which precursor nodes it has, node 6 merges the ordered linked lists of its precursor nodes one after another as they are received.

For example, when node 6 receives list2' from node 2, it merges list2' with its own ordered linked list. Since its own list is empty, the result of the merge is still list2', which is stored. When node 6 receives list3' from node 3, it merges list2' and list3'. When it receives list4' from node 4, it merges the result of merging list2' and list3' with list4'.

However, in practical applications, since the decoding network is generally huge, the number of ordered linked lists to be merged is generally large. Assuming that there are k ordered linked lists to be merged, that each has length n, and that the merged ordered linked list retains only the n highest-scoring candidate paths, merging the k ordered linked lists one after another has complexity O(nk). When k is large, merging is slow, which affects the voice recognition efficiency.
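The sequential merging described above can be sketched as follows. Only path scores are kept in the lists to keep the example short; the function names and the scores are hypothetical.

```python
def merge_two(a, b, n):
    """Merge two descending score lists and keep the n highest scores."""
    out, i, j = [], 0, 0
    while len(out) < n and (i < len(a) or j < len(b)):
        if j >= len(b) or (i < len(a) and a[i] >= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def merge_sequential(lists, n):
    """Fold the lists into a running result one at a time, as they arrive.

    Each pairwise merge costs O(n), and there is one merge per list,
    which gives the O(nk) total mentioned in the text.
    """
    merged = []
    for lst in lists:
        merged = merge_two(merged, lst, n)
    return merged


print(merge_sequential([[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]], 4))
```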

In order to reduce the complexity of merging a plurality of ordered linked lists and improve the decoding speed and the voice recognition efficiency, the embodiment of the invention can establish the mapping relation between the node index and the ordered linked list and transfer the node index in the token transfer process.

In an alternative embodiment of the present invention, the method may further include: and transmitting the node index of the active node to a successor node of the active node.

Thus, after determining the corresponding active nodes of the current voice frame in the decoding network, each active node has received the node indexes of all the precursor nodes, that is, the active node can know which precursor nodes exist, and further, the ordered linked list corresponding to the precursor nodes of the active node can be determined according to the node indexes of the precursor nodes and the mapping relation between the node indexes and the ordered linked list, so that the ordered linked lists corresponding to the precursor nodes of the active node can be combined according to a preset combination algorithm, and the combination complexity is reduced.

For example, take the decoding network in FIG. 2. A mapping relationship may be established between the node index of node 2 (e.g., node2) and the ordered linked list list2', between the node index of node 3 (e.g., node3) and the ordered linked list list3', between the node index of node 4 (e.g., node4) and the ordered linked list list4', and between the node index of node 5 (e.g., node5) and the ordered linked list list5'.

During decoding, node 2, node 3, node 4, and node 5 each pass their own node index to their respective successor nodes. Specifically, node 2 passes the index node2 to its successor node (node 6), node 3 passes the index node3 to its successor node (node 6), node 4 passes the index node4 to its successor node (node 6), and node 5 passes the index node5 to its successor node (node 7).

After the voice frame at time t1 has been decoded, the voice frame at time t2 becomes the current voice frame and is decoded. Assume that the active nodes determined for the current voice frame at time t2 include node 6 and node 7. According to the node indexes of the precursor nodes received by node 6 (those of node 2, node 3 and node 4) and the mapping relation between node indexes and ordered linked lists, the ordered linked lists (list2', list3' and list4') of the precursor nodes of node 6 can be obtained, and a preset merging algorithm can then be used to merge list2', list3' and list4'. The active nodes at time t2 then pass their node indexes on to their successor nodes, and the process continues until all voice frames have been decoded; the voice recognition result is returned after the last voice frame is decoded.
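The index passing and lookup described above can be sketched as follows. The class `IndexTable`, its method names, and the scores are hypothetical and serve only to illustrate the mapping between node indexes and ordered lists; integers stand in for the node indexes.

```python
from collections import defaultdict


class IndexTable:
    """Hypothetical helper: maps node indexes to ordered lists and
    records which precursor indexes each node has received."""

    def __init__(self):
        self.index_to_list = {}            # node index -> ordered list of that node
        self.received = defaultdict(list)  # node index -> indexes received from precursors

    def emit(self, node, ordered, successors):
        """Register the node's list and pass only its index to each successor."""
        self.index_to_list[node] = ordered
        for succ in successors:
            self.received[succ].append(node)

    def lists_to_merge(self, node):
        """Look up the ordered lists of all precursors whose index was received."""
        return [self.index_to_list[p] for p in self.received[node]]


table = IndexTable()
table.emit(2, [10, 8, 5, 2], successors=[6])  # list2'
table.emit(3, [12, 7, 2, 0], successors=[6])  # list3'
table.emit(4, [9, 6, 3, 1], successors=[6])   # list4'
table.emit(5, [11, 4], successors=[7])        # list5' (made-up scores)
print(table.lists_to_merge(6))
```

Because node 6 sees all three lists at once, they can be handed to a single merge call rather than merged as each one arrives.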

It can be seen that the node index of a node can be passed along during token passing. After the active node corresponding to the current voice frame is determined, the ordered linked lists corresponding to the precursor nodes of the active node can be determined according to the node indexes that the active node has received from its precursor nodes and the mapping relationship between node indexes and ordered linked lists. The ordered linked lists corresponding to the plurality of precursor nodes of the active node can then be merged in a single pass according to a preset merging algorithm, instead of being merged one after another over multiple passes, so that the complexity of merging the ordered linked lists is reduced and the voice recognition efficiency is improved.

It will be appreciated that the embodiment of the present invention does not limit the preset merging algorithm. For example, the merging algorithm may include a max-heap method, a divide-and-conquer method, or the like.

In an optional embodiment of the present invention, the merging at least one ordered linked list to be merged according to a preset merging algorithm may specifically include:

step S11, establishing a maximum heap according to the number of ordered linked lists to be combined;

step S12, storing the linked list head of the ordered linked list to be combined into the maximum heap;

Step S13, deleting the top of the maximum heap to obtain the data element at the top of the heap, and adding the next data element of the ordered linked list to which that data element belongs into the maximum heap, until the merging termination condition is met.

In the embodiment of the invention, the ordered linked list can be ordered from high to low according to the path scores of the candidate paths. Therefore, the embodiment of the invention can merge a plurality of ordered linked lists quickly by using a max-heap method.

For example, assuming that the number of ordered linked lists to be merged is k, a maximum heap with a capacity of k can be established according to that number, and the head of each ordered linked list to be merged is stored in the maximum heap. Each time, the top of the heap is deleted to obtain the data element at the top of the heap, and the next data element of the ordered linked list to which that data element belongs is added into the maximum heap. This process is repeated until the merging termination condition is met, which completes the merging of the k ordered linked lists. The merging termination condition may specifically include: the maximum heap is empty, or the length of the merged linked list reaches a preset length.

Still taking FIG. 2 as an example, node 6 needs to merge list2' from node 2, list3' from node 3, and list4' from node 4. Assume that list2' is 10->8->5->2, where 10 indicates that the score of the first candidate path in the ordered linked list list2' is 10, 8 indicates that the score of the second candidate path is 8, and so on. Assuming further that list3' is 12->7->2->0 and list4' is 9->6->3->1, the result of merging list2', list3', and list4' is 12->10->9->8.

Because each execution of step S13 has complexity O(log k), merging k ordered linked lists of length n has complexity O(n log k). Compared with the complexity O(nk) of sequential merging, the max-heap merging algorithm can greatly reduce the complexity of merging a plurality of ordered linked lists, so that the decoding speed and the efficiency of voice recognition can be improved; the speed-up is more pronounced when k is large.
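Steps S11 to S13 can be sketched as follows. Python's standard `heapq` module implements a min-heap, so the sketch negates the scores to emulate a maximum heap; that choice, the function name `merge_max_heap`, and the use of plain Python lists of scores in place of linked lists are illustrative assumptions.

```python
import heapq


def merge_max_heap(lists, n):
    """Merge descending score lists with a heap and keep the n highest scores."""
    # S11/S12: build a heap holding the head of every non-empty list.
    # Each entry is (negated score, list number, position in that list).
    heap = [(-lst[0], k, 0) for k, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    merged = []
    # S13: pop the best element, then push its successor from the same list.
    # Termination: the heap is empty or the merged list has reached length n.
    while heap and len(merged) < n:
        neg_score, k, pos = heapq.heappop(heap)
        merged.append(-neg_score)
        if pos + 1 < len(lists[k]):
            heapq.heappush(heap, (-lists[k][pos + 1], k, pos + 1))
    return merged


# list2', list3' and list4' from the example above.
print(merge_max_heap([[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]], 4))
```

The heap never holds more than one element per list, so each pop and push works on at most k entries.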

In another optional embodiment of the present invention, the merging at least one ordered linked list to be merged according to a preset merging algorithm may specifically include:

Step S21, equally dividing the at least one ordered linked list to be merged into two groups, and equally dividing the ordered linked lists in each group into two groups again, until each group contains 1 ordered linked list;

Step S22, merging the grouped ordered linked lists pairwise, group by group, until all groups have been merged into one ordered linked list.

The embodiment of the invention can also adopt a divide-and-conquer method to rapidly combine a plurality of ordered linked lists so as to reduce the complexity of combination.

For example, to merge k ordered linked lists, the k ordered linked lists can first be equally divided into two groups, each containing k/2 ordered linked lists, and the ordered linked lists in each group are then equally divided into two groups again, until each group contains 1 ordered linked list. Merging then begins: the divided groups are merged pairwise until they have been merged into one ordered linked list. The complexity of merging k ordered linked lists of length n is O(n log k), so merging a plurality of ordered linked lists with the divide-and-conquer method can greatly reduce the merging complexity and can thereby improve the decoding speed and the voice recognition efficiency.

It can be understood that the max-heap method and the divide-and-conquer method described above are only examples of applications of the present invention. In practical applications, those skilled in the art can flexibly select the merging algorithm according to the actual situation, which is not limited by the embodiment of the present invention.
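Steps S21 and S22 can be sketched recursively as follows. Truncating every intermediate result to n entries is an assumption that follows from keeping only the n highest-scoring paths; the function names and the use of plain Python lists of scores are illustrative.

```python
def merge_two(a, b, n):
    """Merge two descending score lists and keep the n highest scores."""
    out, i, j = [], 0, 0
    while len(out) < n and (i < len(a) or j < len(b)):
        if j >= len(b) or (i < len(a) and a[i] >= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def merge_divide_conquer(lists, n):
    """S21: halve the group until it holds one list; S22: merge pairwise."""
    if not lists:
        return []
    if len(lists) == 1:
        return lists[0][:n]
    mid = len(lists) // 2
    left = merge_divide_conquer(lists[:mid], n)
    right = merge_divide_conquer(lists[mid:], n)
    return merge_two(left, right, n)


# Same lists as in the FIG. 2 example.
print(merge_divide_conquer([[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]], 4))
```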

In an alternative embodiment of the present invention, the method may further include:

Step S31, determining the number of ordered linked lists to be merged;

Step S32, determining the active nodes of which the number of ordered linked lists to be merged exceeds a preset threshold as target active nodes;

the at least one ordered linked list to be merged may specifically include: the ordered linked lists corresponding to the plurality of precursor nodes of the target active node.

In the embodiment of the invention, the number of the ordered linked lists to be combined can be judged, and the active node with the number of the ordered linked lists to be combined exceeding the preset threshold is taken as the target active node, wherein the target active node can refer to the active node with more precursor nodes, and a great deal of time is required to be consumed for combining the ordered linked lists of the plurality of precursor nodes.

It may be appreciated that the specific value of the preset threshold is not limited in the embodiment of the present invention, for example, the preset threshold may be set to 8, etc.

Optionally, if the active node has few precursor nodes, for example only 2, the complexity of the merge is not high even if the ordered linked lists of the precursor nodes are merged one after another, and the efficiency of voice recognition is not affected.
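The threshold check in steps S31 and S32 can be sketched as a simple dispatch. The function names are hypothetical; `heapq.merge` from the Python standard library is used for both branches only to keep the sketch short, and the default threshold of 8 is the example value given above.

```python
import heapq
from itertools import islice


def merge_one_shot(lists, n):
    """Merge all descending lists in one pass and keep the n highest scores."""
    return list(islice(heapq.merge(*lists, reverse=True), n))


def merge_sequential(lists, n):
    """Merge the lists into a running result one after another."""
    merged = []
    for lst in lists:
        merged = list(islice(heapq.merge(merged, lst, reverse=True), n))
    return merged


def merge_for_node(lists, n, threshold=8):
    """S31: count the lists; S32: treat the node as a target active node
    only if the count exceeds the preset threshold."""
    if len(lists) > threshold:
        return merge_one_shot(lists, n), "one-shot"
    return merge_sequential(lists, n), "sequential"


lists = [[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]]
print(merge_for_node(lists, 4))               # 3 lists, threshold 8
print(merge_for_node(lists, 4, threshold=2))  # 3 lists, threshold 2
```

Both branches return the same merged list; only the strategy differs.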

In summary, after determining the active node corresponding to the current voice frame in the decoding network, the embodiment of the invention can learn which precursor nodes the active node has from the node indexes that the active node receives from its precursor nodes, and can then determine the ordered linked list corresponding to each precursor node according to those node indexes and the mapping relation between node indexes and ordered linked lists, so that the ordered linked lists corresponding to the precursor nodes of the active node can be merged according to a preset merging algorithm. Therefore, when the active node has many precursor nodes, the preset merging algorithm can be used to merge the ordered linked lists corresponding to those precursor nodes; compared with merging the ordered linked lists one after another, this reduces the complexity of merging the ordered linked lists and can thereby improve the decoding speed and the voice recognition efficiency.

It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.

Device embodiment

With reference to FIG. 3, there is shown a block diagram of an embodiment of a data processing apparatus of the present invention, which may include in particular:

A node determining module 301, configured to determine an active node corresponding to a current speech frame in a decoding network;

The linked list determining module 302 is configured to determine an ordered linked list corresponding to a precursor node of the active node according to a node index of the precursor node received by the active node from the precursor node thereof and a mapping relationship between the node index and the ordered linked list;

The linked list merging module 303 is configured to merge at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node.

Optionally, the apparatus may further include:

the number determining module is used for determining the number of precursor nodes of the active node;

The target determining module is used for determining that an active node whose number of precursor nodes exceeds a preset threshold value is a target active node.

Optionally, the apparatus may further include:

And the index transmission module is used for transmitting the node index of the active node to a successor node of the active node.
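The interplay between index transmission and the lookup of ordered linked lists can be sketched as below. All names (`index_to_list`, `received`, `transmit_index`, `lists_to_merge`) and the integer node indexes are hypothetical, and each ordered linked list is modelled as a Python list of (score, label) pairs sorted by descending score, which is an assumption about the element type.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Token = Tuple[float, str]  # hypothetical (score, label) element

# Assumed mapping relation between a node index and that node's ordered linked list.
index_to_list: Dict[int, List[Token]] = {
    3: [(0.9, "a"), (0.4, "b")],
    5: [(0.7, "c")],
}

# Node indexes that each active node has received from its precursor nodes.
received: DefaultDict[int, List[int]] = defaultdict(list)


def transmit_index(node: int, successors: List[int]) -> None:
    """Send this node's index to each of its successor nodes."""
    for succ in successors:
        received[succ].append(node)


def lists_to_merge(active_node: int) -> List[List[Token]]:
    """Look up the ordered list of every precursor that reported to active_node."""
    return [index_to_list[i] for i in received[active_node]]


transmit_index(3, successors=[8])
transmit_index(5, successors=[8])
pending = lists_to_merge(8)  # the two lists that node 8 has to merge
```

Passing only the index keeps the message between nodes small; the list itself is fetched through the mapping when the merge is actually performed.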

Optionally, the linked list merging module may specifically include:

A heap building sub-module, configured to build a maximum heap according to the number of ordered linked lists to be merged;

A storage sub-module, configured to store the head of each ordered linked list to be merged into the maximum heap;

A deleting sub-module, configured to delete the top of the maximum heap to obtain the data element at the top of the heap, and to add the next data element of the ordered linked list to which that data element belongs into the maximum heap, until the merging termination condition is met.
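The three sub-modules above correspond to a heap-based multi-way merge, which can be sketched as follows. This is a minimal sketch, not the patented implementation: lists are assumed to be sorted in descending order of a numeric score, the termination condition is assumed to be that all lists are exhausted, and the name `heap_merge_descending` is hypothetical. Python's `heapq` is a min-heap, so scores are negated to obtain maximum-heap behaviour.

```python
import heapq
from typing import List, Tuple


def heap_merge_descending(lists: List[List[float]]) -> List[float]:
    """Merge lists that are each sorted in descending order into one such list."""
    # Heap entries are (negated score, list id, position in that list).
    heap: List[Tuple[float, int, int]] = []
    # Store the head of every non-empty list in the heap.
    for list_id, lst in enumerate(lists):
        if lst:
            heap.append((-lst[0], list_id, 0))
    heapq.heapify(heap)

    merged: List[float] = []
    while heap:  # assumed termination condition: every list is exhausted
        neg_score, list_id, pos = heapq.heappop(heap)  # delete the heap top
        merged.append(-neg_score)
        nxt = pos + 1
        if nxt < len(lists[list_id]):
            # Add the next element of the list the popped element came from.
            heapq.heappush(heap, (-lists[list_id][nxt], list_id, nxt))
    return merged


result = heap_merge_descending([[9.0, 4.0, 1.0], [8.0, 3.0], [], [7.0, 7.0, 2.0]])
# result == [9.0, 8.0, 7.0, 7.0, 4.0, 3.0, 2.0, 1.0]
```

The heap never holds more than one element per list, so each pop or push costs on the order of log k for k lists. A different termination condition, such as stopping once a fixed number of elements has been produced, could replace the `while heap` test.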

Optionally, the linked list merging module may specifically include:

A grouping sub-module, configured to divide the at least one ordered linked list to be merged evenly into two groups, and to divide the ordered linked lists in each group evenly into two groups again, until each group contains exactly one ordered linked list;

A merging sub-module, configured to merge the grouped ordered linked lists pairwise, group by group, until all groups have been merged into a single ordered linked list.
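The grouping and pairwise merging described above can be sketched as a recursive divide-and-conquer merge. As before, this is an illustrative sketch: the function names `merge_two` and `merge_groups` are hypothetical, and the lists are assumed to be sorted in descending order of a numeric score.

```python
from typing import List


def merge_two(a: List[float], b: List[float]) -> List[float]:
    """Merge two lists that are each sorted in descending order."""
    out: List[float] = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] >= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def merge_groups(lists: List[List[float]]) -> List[float]:
    """Halve the group until each group holds one list, then merge the groups pairwise."""
    if not lists:
        return []
    if len(lists) == 1:
        return lists[0]
    mid = len(lists) // 2
    return merge_two(merge_groups(lists[:mid]), merge_groups(lists[mid:]))


merged = merge_groups([[9.0, 4.0], [8.0], [7.0, 2.0], [6.0, 5.0, 1.0]])
# merged == [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 2.0, 1.0]
```

With k lists the recursion has about log2 k levels, and each level passes over every element once, which is where the saving over one-by-one merging comes from.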

Optionally, the active node may include any one of the following information: triphones, acoustic states, words, syllables, phonemes.

Optionally, the decoding network may specifically include: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments and will not be described in detail here.

An embodiment of the present invention provides an apparatus for data processing, including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for: determining the active node corresponding to the current voice frame in a decoding network; determining an ordered linked list corresponding to a precursor node of the active node according to the node index of the precursor node received by the active node from its precursor node and the mapping relation between the node index and the ordered linked list; and merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node.

Fig. 4 is a block diagram illustrating an apparatus 800 for data processing according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.

Referring to fig. 4, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.

The multimedia component 808 includes a screen between the device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.

The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, an orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communication between the apparatus 800 and other devices, either in a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of apparatus 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Fig. 5 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) that store applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transitory or persistent. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data processing method shown in fig. 1.

A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an apparatus (server or terminal), cause the apparatus to perform a data processing method, the method comprising: determining the active node corresponding to the current voice frame in a decoding network; determining an ordered linked list corresponding to a precursor node of the active node according to the node index of the precursor node received by the active node from its precursor node and the mapping relation between the node index and the ordered linked list; and merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node.

The embodiment of the invention discloses A1, a data processing method, which comprises the following steps: determining a corresponding active node of the current voice frame in a decoding network;

A2, the method of A1, the method further comprising:

Determining the number of precursor nodes of the active node;

determining that the active nodes of which the precursor node numbers exceed a preset threshold value are target active nodes;

The at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the target active node.

A3, the method of A1, the method further comprising:

and transmitting the node index of the active node to a successor node of the active node.

A4, according to the method of A1, the merging of at least one ordered linked list to be merged according to a preset merging algorithm comprises the following steps:

Establishing a maximum heap according to the number of ordered linked lists to be merged;

Storing the head of each ordered linked list to be merged into the maximum heap;

Deleting the top of the maximum heap to obtain the data element at the top of the heap, and adding the next data element of the ordered linked list to which that data element belongs into the maximum heap, until the merging termination condition is met.

A5, according to the method of A1, the merging of at least one ordered linked list to be merged according to a preset merging algorithm comprises the following steps:

Dividing the at least one ordered linked list to be merged evenly into two groups, and dividing the ordered linked lists in each group evenly into two groups again, until each group contains exactly one ordered linked list;

Merging the grouped ordered linked lists pairwise, group by group, until all groups have been merged into a single ordered linked list.

A6, the method according to any of A1 to A5, wherein the active node comprises any one of the following information: triphones, acoustic states, words, syllables, phonemes.

A7, the method according to any of A1 to A5, the decoding network comprising: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

The embodiment of the invention discloses B8, a data processing device, which comprises:

B9, the apparatus of B8, the apparatus further comprising:

The at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the target active node.

B10, the apparatus of B8, the apparatus further comprising:

B11, the apparatus of B8, the merge module comprising:

B12, the apparatus of B8, the merge module comprising:

B13, the apparatus of any one of B8 to B12, the active node comprising any one of the following information: triphones, acoustic states, words, syllables, phonemes.

B14, the apparatus of any one of B8 to B12, the decoding network comprising: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

The embodiment of the invention discloses C15, a device for data processing, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for:

Merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node.

C16, the device of C15, wherein the one or more programs, configured to be executed by the one or more processors, further comprise instructions for:

Determining the number of precursor nodes of the active node;

C17, the device of C15, wherein the one or more programs, configured to be executed by the one or more processors, further comprise instructions for:

C18, according to the apparatus of C15, the merging of the at least one ordered linked list to be merged according to a preset merging algorithm includes:

C19, according to the apparatus of C15, the merging at least one ordered linked list to be merged according to a preset merging algorithm, including:

C20, the apparatus of any one of C15 to C19, the active node comprising any one of the following information: triphones, acoustic states, words, syllables, phonemes.

C21, the apparatus of any one of C15 to C19, the decoding network comprising: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

Embodiments of the invention disclose D22, a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform a data processing method as described in one or more of A1 to A7.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

The data processing method, the data processing device and the device for data processing provided by the invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the contents of this description should not be construed as limiting the invention.

Claims

1. A method of data processing, the method comprising:

Merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be combined comprises: an ordered linked list corresponding to at least one precursor node of the active nodes;

the merging of the at least one ordered linked list to be merged according to a preset merging algorithm comprises the following steps:

Deleting the top of the maximum heap to obtain the data element at the top of the heap, and adding the next data element of the ordered linked list to which that data element belongs into the maximum heap, until a merging termination condition is met;

or, the merging of the at least one ordered linked list to be merged according to a preset merging algorithm comprises:

2. The method according to claim 1, wherein the method further comprises:

Determining the number of precursor nodes of the active node;

3. The method according to claim 1, wherein the method further comprises:

4. A method according to any of claims 1 to 3, wherein the active node comprises any of the following information: triphones, acoustic states, words, syllables, phonemes.

5. A method according to any one of claims 1 to 3, wherein the decoding network comprises: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

6. A data processing apparatus, the apparatus comprising:

The linked list merging module is used for merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node;

The merging module comprises:

A deleting sub-module, configured to delete the top of the maximum heap to obtain the data element at the top of the heap, and to add the next data element of the ordered linked list to which that data element belongs into the maximum heap, until a merging termination condition is met;

Or, the merging module comprises:

7. The apparatus of claim 6, wherein the apparatus further comprises:

8. The apparatus of claim 6, wherein the apparatus further comprises:

9. The apparatus according to any of claims 6 to 8, wherein the active node comprises any of the following information: triphones, acoustic states, words, syllables, phonemes.

10. The apparatus according to any one of claims 6 to 8, wherein the decoding network comprises: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

11. An apparatus for data processing comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:

Merging at least one ordered linked list to be merged according to a preset merging algorithm; wherein the at least one ordered linked list to be merged comprises: an ordered linked list corresponding to at least one precursor node of the active node;

12. The device of claim 11, wherein the one or more programs, configured to be executed by the one or more processors, further comprise instructions for:

Determining the number of precursor nodes of the active node;

13. The device of claim 11, wherein the one or more programs, configured to be executed by the one or more processors, further comprise instructions for:

14. The apparatus according to any of claims 11 to 13, wherein the active node comprises any of the following information: triphones, acoustic states, words, syllables, phonemes.

15. The apparatus according to any one of claims 11 to 13, wherein the decoding network comprises: a static decoding network based on a weighted finite state machine, or a dynamic decoding network based on a prefix tree.

16. A machine readable medium having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the data processing method of one or more of claims 1 to 5.