CN111354348A

CN111354348A - Data processing method and device and data processing device

Info

Publication number: CN111354348A
Application number: CN201811574155.0A
Authority: CN
Inventors: 姚光超
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2020-06-30
Anticipated expiration: 2038-12-21
Also published as: CN111354348B

Abstract

The embodiments of the invention provide a data processing method and device, and a device for data processing. The method specifically comprises: determining an active node corresponding to a current speech frame in a decoding network; determining the ordered linked lists corresponding to the predecessor nodes of the active node according to the node indexes that the active node has received from its predecessor nodes and a mapping relationship between node indexes and ordered linked lists; and merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked list corresponding to at least one predecessor node of the active node. The embodiments of the invention can reduce the complexity of merging multiple ordered linked lists, thereby improving the decoding speed and the efficiency of speech recognition.

Description

Data processing method and device and data processing device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and an apparatus for data processing.

Background

Speech recognition, also known as ASR (Automatic Speech Recognition), aims to convert the lexical content of speech into computer-readable input, such as keystrokes, binary codes, or character sequences.

Specifically, knowledge sources such as an acoustic model, a language model, and a pronunciation dictionary can be compiled into a decoding network, and speech recognition is then the process of finding an optimal path in that decoding network.

However, because the decoding network is usually huge, decoding speech by traversing it is often slow, which reduces the efficiency of speech recognition.

Disclosure of Invention

The embodiments of the invention provide a data processing method and device, and a device for data processing, which can improve the decoding speed and the efficiency of speech recognition.

In order to solve the above problem, an embodiment of the present invention discloses a data processing method, where the method includes:

determining an active node corresponding to a current voice frame in a decoding network;

determining the ordered linked lists corresponding to the predecessor nodes of the active node according to the node indexes that the active node has received from its predecessor nodes and the mapping relationship between node indexes and ordered linked lists;

merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked list corresponding to at least one predecessor node of the active node.

In another aspect, an embodiment of the present invention discloses a data processing apparatus, where the apparatus includes:

a node determining module, configured to determine an active node corresponding to a current voice frame in a decoding network;

a linked list determining module, configured to determine the ordered linked lists corresponding to the predecessor nodes of the active node according to the node indexes that the active node has received from its predecessor nodes and a mapping relationship between node indexes and ordered linked lists;

a linked list merging module, configured to merge at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.

In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for:

In yet another aspect, an embodiment of the invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method described in one or more of the preceding aspects.

The embodiment of the invention has the following advantages:

After determining the active node corresponding to the current speech frame in the decoding network, the embodiments of the invention can learn which predecessor nodes the active node has from the node indexes it received from those predecessor nodes. The ordered linked lists corresponding to the predecessor nodes can then be determined from these node indexes and the mapping relationship between node indexes and ordered linked lists, so that the ordered linked lists of a plurality of predecessor nodes can be merged in a single pass with a preset merging algorithm. Consequently, when an active node has many predecessor nodes, merging their ordered linked lists with the preset merging algorithm, rather than merging them one after another, reduces the complexity of the merge and improves the decoding speed and the efficiency of speech recognition.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of the steps of one data processing method embodiment of the present invention;

FIG. 2 is a schematic diagram of a decoding network according to an embodiment of the present invention;

FIG. 3 is a block diagram of an embodiment of a data processing apparatus according to the present invention;

FIG. 4 is a block diagram of an apparatus 800 for data processing of the present invention; and

FIG. 5 is a schematic diagram of a server in some embodiments of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Method embodiment

Referring to FIG. 1, which shows a flowchart of the steps of an embodiment of a data processing method according to the present invention, the method may specifically include the following steps:

step 101, determining an active node corresponding to a current voice frame in a decoding network;

step 102, determining the ordered linked lists corresponding to the predecessor nodes of the active node according to the node indexes that the active node has received from its predecessor nodes and the mapping relationship between node indexes and ordered linked lists;

step 103, merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.

The data processing method of the embodiment of the invention can be applied to a voice recognition scene, and the data processing method can be operated in electronic equipment, wherein the electronic equipment comprises but is not limited to: a server, a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a car computer, a desktop computer, a set-top box, a smart tv, a wearable device, and so on.

It can be understood that the embodiments of the present invention do not limit how the speech information to be recognized is obtained. For example, the electronic device may obtain it from a client or a network over a wired or wireless connection, record it in real time, or obtain it from an instant messaging message received in an instant messaging application.

In the embodiments of the present invention, the speech information to be recognized may be segmented into a plurality of speech frames according to a preset window length and frame shift, each speech frame being a speech segment, and the speech information may then be decoded frame by frame. If the speech information to be recognized is analog (for example, a recording of a user's call), it first needs to be converted into digital speech information before it is segmented.

The window length represents the duration of each speech frame, and the frame shift represents the time offset between adjacent frames. For example, with a window length of 25 ms and a frame shift of 15 ms, the first speech frame covers 0 to 25 ms, the second covers 15 to 40 ms, and so on, which segments the speech information to be recognized. The specific window length and frame shift can be set according to actual requirements, and the embodiments of the present invention are not limited thereto.
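The segmentation rule can be sketched as follows. This is a minimal illustration, assuming times in milliseconds and that a trailing partial window is simply dropped; the function name `frame_boundaries` is hypothetical, and real front ends often pad the last frame or work in samples instead.

```python
def frame_boundaries(total_ms, window_ms=25, shift_ms=15):
    """Return (start_ms, end_ms) for every full window that fits in total_ms.

    A new frame starts every shift_ms and lasts window_ms; a trailing
    partial window is dropped (an assumption made for this sketch).
    """
    frames = []
    start = 0
    while start + window_ms <= total_ms:
        frames.append((start, start + window_ms))
        start += shift_ms
    return frames


print(frame_boundaries(70))  # [(0, 25), (15, 40), (30, 55), (45, 70)]
```

Adjacent frames overlap by window_ms - shift_ms = 10 ms, which is why the second frame starts before the first one ends.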

Optionally, before segmenting the speech information to be recognized, the electronic device may further perform noise reduction processing on the speech information to be recognized, so as to improve subsequent processing capability on the speech information.

In the embodiment of the present invention, the electronic device may store a decoding network in advance. The decoding network is a search network constructed from a plurality of knowledge sources, which may include acoustic models, acoustic contexts, pronunciation dictionaries, language models, and the like. The decoding process is to search in the decoding network and select one or more optimal paths as the voice recognition result.

In an optional embodiment of the present invention, the decoding network may specifically include a static decoding network based on a weighted finite-state transducer (WFST), or a dynamic decoding network based on a prefix tree.

In the embodiment of the present invention, the decoding network may be a dynamically constructed decoding network or a statically constructed decoding network, and it should be understood that the embodiment of the present invention does not limit the specific manner of constructing the decoding network.

A dynamically constructed decoding network uses a pronunciation dictionary organized as a prefix tree as its search network, and the other knowledge sources are integrated dynamically during decoding. Because such a network contains only the pronunciation dictionary, using it saves memory.

The statically constructed decoding network may be a decoding network based on a WFST (Weighted Finite-State Transducer). It is a search network compiled in advance from knowledge sources such as an acoustic model, an acoustic context, a pronunciation dictionary, and a language model; although it occupies a large amount of memory, it can improve the decoding speed.

In a specific application, the basic structure of the decoding network is a directed graph, which is composed of nodes and edges. The directed graph contains valid information (such as acoustic information and language information) related to voice recognition, and the valid information can be on an edge or a node. Speech recognition is the process of finding an optimal path in the directed graph based on the input speech information.

It can be understood that the embodiments of the present invention apply both to decoding networks whose valid information is on nodes and to decoding networks whose valid information is on edges. For convenience of description, a decoding network with valid information on nodes is used as the example below; the decoding process for a network with valid information on edges is similar, and the two descriptions may be cross-referenced.

Because the decoding network for speech recognition is huge, traversing all of its nodes would make recognition extremely slow, so decoding is a breadth-first search with pruning. For a decoding network with valid information on nodes, the nodes that survive pruning while a speech frame is decoded are called active nodes. Likewise, for a decoding network with valid information on edges, the edges that survive pruning are called active edges.
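The text does not specify the pruning criterion. One common choice is a score beam, and the sketch below assumes it purely for illustration: it keeps every node whose accumulated score lies within a hypothetical `beam` of the best score, with higher scores being better, matching the high-to-low ordering used later in this description. The function name `prune`, the node numbers, and the scores are illustrative.

```python
def prune(scores, beam):
    """Return the active nodes: those within `beam` of the best score.

    scores: dict mapping node index -> accumulated path score (higher is better).
    """
    if not scores:
        return {}
    best = max(scores.values())
    return {node: s for node, s in scores.items() if s >= best - beam}


active = prune({2: 10.0, 3: 12.0, 4: 9.0, 5: 7.5, 9: 1.0}, beam=5.0)
print(sorted(active))  # [2, 3, 4, 5]
```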

In an optional embodiment of the present invention, the active node may specifically include any one of the following information: triphones, acoustic states, words, syllables, phonemes.

It is to be understood that the embodiments of the present invention do not limit the granularity represented by the nodes of the decoding network: it may be triphones, acoustic states, words, syllables, or phonemes. For example, in Chinese, "hello" is a word, "ni" is a phoneme, "ni(3)" is a syllable, and "sil-n+i" is a triphone. The acoustic states may specifically be HMM (Hidden Markov Model) states.

For example, the decoding network may be a search network constructed from HMM models, triphone models, pronunciation dictionaries, and language models, where HMM models, triphone models, pronunciation dictionaries, and language models actually describe the possible search spaces of the decoding network at different granularities.

The HMM model defines the sequence of HMM states corresponding to each triphone. By hypothesizing the HMM state of each speech frame, a search can be performed over HMM state sequences to generate possible triphone sequences. The triphone model defines the mapping from triphones to phonemes, so possible phoneme sequences can be obtained from the triphone sequences generated by the HMM model. The pronunciation dictionary defines the words represented by phoneme sequences, so the corresponding word sequences can be obtained from the possible phoneme sequences generated by the triphone model. The language model defines the probability of a word sequence, so a probability score can be obtained for each word sequence generated by the pronunciation dictionary.

When decoding speech information frame by frame with the decoding network, the embodiments of the invention first determine the active nodes corresponding to the current speech frame in the decoding network; then determine the ordered linked lists corresponding to the predecessor nodes of each active node according to the node indexes that the active node has received from its predecessor nodes and the mapping relationship between node indexes and ordered linked lists; and finally merge the ordered linked lists corresponding to the plurality of predecessor nodes of the active node according to a preset merging algorithm. The ordered linked list of a predecessor node may specifically include candidate paths from the start node of the decoding network to that predecessor node.

In specific applications, decoding typically uses a token passing algorithm. A token represents an optimal path from the start node to the current node, and the ordered linked list can store a set of optimal paths from the start node to the current node. That is, the candidate paths in the ordered linked list may specifically be the n highest-scoring paths (n being a natural number) from the start node to the predecessor node, sorted by score from high to low.

The length of the ordered linked list is therefore n. The embodiments of the present invention do not limit the specific value of n: a larger n keeps more candidate paths in the ordered linked list and yields higher decoding accuracy, but also slows decoding, so in practice n can be chosen to balance accuracy and speed. For example, n may be set to 32.
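The bounded, high-to-low ordering can be sketched with a plain Python list standing in for the ordered linked list. This is a minimal sketch; the function name `insert_top_n` and the `(score, path)` tuple layout are assumptions made for illustration.

```python
def insert_top_n(paths, score, path, n=32):
    """Insert (score, path) into `paths`, kept sorted high-to-low and capped at n."""
    # Find the first entry with a lower score, so equal scores keep arrival order.
    i = 0
    while i < len(paths) and paths[i][0] >= score:
        i += 1
    if i >= n:
        return paths  # would fall outside the top n, so discard it
    paths.insert(i, (score, path))
    del paths[n:]  # drop whatever was pushed past position n
    return paths


paths = []
for score, path in [(5, "a"), (8, "b"), (2, "c"), (6, "d"), (1, "e")]:
    insert_top_n(paths, score, path, n=3)
print(paths)  # [(8, 'b'), (6, 'd'), (5, 'a')]
```

Capping the list at n is what keeps every later merge bounded by n entries per list.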

Referring to FIG. 2, which shows a schematic diagram of a decoding network according to an embodiment of the present invention, decoding starts from the start node (e.g., node 0) in FIG. 2 and proceeds forward through the decoding network. Assume that the current time is t1 and that the active nodes corresponding to the current speech frame at time t1 are node2, node3, node4, and node5.

Node2, node3, node4, and node5 each store the ordered linked lists passed from their respective predecessor nodes, namely list2, list3, list4, and list5. Each of these nodes generates its own ordered linked list from the received ordered linked list of its predecessor nodes and its own acoustic model score or language model score, and then passes the generated ordered linked list to its successor nodes.

For example, node2 generates its ordered linked list list2' from list2 and the acoustic model score or language model score of node2; list2' contains candidate paths from node 0 to node2, and node2 passes list2' to its successor node (node 6). Similarly, node3 generates the ordered linked list list3', containing candidate paths from node 0 to node3, and passes it to its successor node (node 6); node4 generates list4', containing candidate paths from node 0 to node4, and passes it to its successor node (node 6); and node5 generates list5', containing candidate paths from node 0 to node5, and passes it to its successor node (node 7). The active nodes for the next time (for example, time t2) are then generated according to the token passing algorithm, which marks the end of decoding for the speech frame at time t1 and the start of decoding for the speech frame at time t2; the speech frame at time t2 then becomes the current speech frame.

While the speech frame at time t2 is being decoded, node 6 is reached. Because node 6 has 3 predecessor nodes (node2, node3, and node4), it must merge list2' from node2, list3' from node3, and list4' from node4. Since node 6 does not know which predecessor nodes it has, it merges the ordered linked lists of its predecessor nodes one by one as they are received.

For example, when node 6 receives list2' from node2, it first merges it with its own ordered linked list; since its own list is empty, the result is still list2', which is saved. When list3' arrives from node3, list2' and list3' are merged, and when list4' arrives from node4, the result of merging list2' and list3' is merged with list4'.

In practice, however, the decoding network is usually large, so the number of ordered linked lists to be merged is typically large as well. Suppose there are k ordered linked lists to be merged, each of length n, and the merged ordered linked list keeps only the n highest-scoring candidate paths. Merging the k ordered linked lists one after another then costs O(nk). When k is large, merging becomes slow and the efficiency of speech recognition suffers.
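How a node turns an incoming ordered linked list into its own can be sketched as below, under the assumption (not stated in the text) that scores are additive log-domain scores, so the node's acoustic or language model score is simply added to every candidate path. Because the same constant is added to every entry, the high-to-low order is preserved and no re-sorting is needed. The name `extend_list`, the `(score, path)` layout, and the example paths are hypothetical.

```python
def extend_list(pred_list, node, node_score):
    """Derive a node's outgoing ordered list (e.g. list2') from an incoming one (e.g. list2).

    pred_list: [(score, path_tuple), ...] sorted from high to low score.
    node_score: this node's acoustic or language model score, assumed additive.
    """
    # Adding the same score to every entry keeps the list sorted high-to-low.
    return [(score + node_score, path + (node,)) for score, path in pred_list]


list2 = [(10.0, (0, 1)), (8.0, (0, 8))]  # illustrative incoming candidate paths
list2_prime = extend_list(list2, node=2, node_score=-0.5)
print(list2_prime)  # [(9.5, (0, 1, 2)), (7.5, (0, 8, 2))]
# list2_prime would then be passed on to node 2's successor, node 6.
```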
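The sequential scheme can be sketched as follows, using bare scores in place of full candidate paths for brevity; the function names are hypothetical. Each pairwise merge stops after n entries, so it costs O(n), and k such merges give the O(nk) total stated above.

```python
def merge_two(a, b, n):
    """Merge two high-to-low lists, keeping at most n entries (O(n))."""
    out = []
    i = j = 0
    while len(out) < n and (i < len(a) or j < len(b)):
        if j >= len(b) or (i < len(a) and a[i] >= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def merge_sequentially(lists, n):
    """Merge lists one by one as they arrive, starting from an empty list."""
    merged = []
    for lst in lists:
        merged = merge_two(merged, lst, n)
    return merged


print(merge_sequentially([[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]], n=4))  # [12, 10, 9, 8]
```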

To reduce the complexity of merging multiple ordered linked lists and to improve the decoding speed and the efficiency of speech recognition, the embodiments of the invention can establish a mapping relationship between node indexes and ordered linked lists and pass node indexes during token passing.

In an optional embodiment of the invention, the method may further include: passing the node index of the active node to the successor nodes of the active node.

In this way, once the active nodes corresponding to the current speech frame have been determined, each active node has already received the node indexes of all of its predecessor nodes, that is, it knows which predecessor nodes it has. The ordered linked lists corresponding to its predecessor nodes can then be determined from these node indexes and the mapping relationship between node indexes and ordered linked lists, and merged with a preset merging algorithm, which reduces the merging complexity.

Take the decoding network in FIG. 2 as an example. Mapping relationships can be established between the node index of node2 (e.g., node2) and the ordered linked list list2', between the node index of node3 (e.g., node3) and list3', between the node index of node4 (e.g., node4) and list4', and between the node index of node5 (e.g., node5) and list5'.

During decoding, node2, node3, node4, and node5 each pass their node indexes to their successor nodes. Specifically, node2 passes node2 to its successor node (node 6), node3 passes node3 to node 6, node4 passes node4 to node 6, and node5 passes node5 to its successor node (node 7).

After decoding of the speech frame at time t1 is completed, the current speech frame at time t2 can be decoded. Assume that the active nodes corresponding to the current speech frame at time t2 are node 6 and node 7. Node 6 can obtain the ordered linked lists (list2', list3', and list4') of its predecessor nodes (node2, node3, and node4) from the node indexes it received (node2, node3, and node4) and the mapping relationship between node indexes and ordered linked lists, and can then merge list2', list3', and list4' with a preset merging algorithm. The active nodes at time t2 then pass their node indexes onward in the same way, and so on until all speech frames have been decoded; after the last speech frame is decoded, the speech recognition result can be returned.

It can be seen that, in the token passing process of the embodiments of the present invention, node indexes are passed along with the tokens. After the active node corresponding to the current speech frame is determined, the ordered linked lists corresponding to its predecessor nodes can be determined from the node indexes it received from those predecessor nodes and the mapping relationship between node indexes and ordered linked lists. The ordered linked lists of the plurality of predecessor nodes can then be merged once with a preset merging algorithm, instead of being merged pairwise one after another, which reduces the complexity of merging multiple ordered linked lists and improves the efficiency of speech recognition.

It is to be understood that the embodiments of the present invention do not limit the preset merging algorithm. For example, the merging algorithm may be a max-heap method, a divide-and-conquer method, or the like.
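The index-passing scheme can be sketched as below. This is an illustration under stated assumptions: the successor table mirrors FIG. 2 as described in the text, the scores of list2', list3', and list4' are illustrative and those of list5' are made up, and `heapq.merge` from the Python standard library stands in for the preset merging algorithm. The helper names `pass_indexes` and `merge_for` are hypothetical.

```python
from collections import defaultdict
from heapq import merge
from itertools import islice


def pass_indexes(successors, active_nodes):
    """Time t1: every active node sends its own node index to each successor."""
    received = defaultdict(list)
    for node in active_nodes:
        for succ in successors.get(node, []):
            received[succ].append(node)
    return received


def merge_for(node, received, lists_by_index, n):
    """Time t2: resolve received indexes to ordered lists, then merge them once."""
    lists = [lists_by_index[i] for i in received[node]]
    # heapq.merge(..., reverse=True) expects each input sorted high-to-low.
    return list(islice(merge(*lists, reverse=True), n))


successors = {2: [6], 3: [6], 4: [6], 5: [7]}  # edges of FIG. 2 leaving t1's active nodes
lists_by_index = {       # mapping: node index -> ordered list
    2: [10, 8, 5, 2],    # list2'
    3: [12, 7, 2, 0],    # list3'
    4: [9, 6, 3, 1],     # list4'
    5: [11, 4],          # list5' (made-up scores)
}
received = pass_indexes(successors, [2, 3, 4, 5])
print(received[6])                                  # [2, 3, 4]
print(merge_for(6, received, lists_by_index, n=4))  # [12, 10, 9, 8]
```

Node 6 never needs to wait for lists to trickle in: by the time it is processed, `received[6]` already names every predecessor, so a single k-way merge is possible.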

In an optional embodiment of the present invention, the merging, according to a preset merging algorithm, at least one ordered linked list to be merged may specifically include:

step S11, building a max-heap according to the number of ordered linked lists to be merged;

step S12, storing the head of each ordered linked list to be merged in the max-heap;

and step S13, removing the top of the max-heap to obtain the data element at the top, and adding the next data element of the ordered linked list that element came from to the max-heap, until a merging termination condition is met.

In the embodiments of the present invention, each ordered linked list may be sorted from high to low by the path scores of its candidate paths. The embodiments of the invention can therefore use the max-heap method to merge the ordered linked lists quickly.

For example, if the number of ordered linked lists to be merged is k, a max-heap with capacity k can be built and the head of each ordered linked list stored in it. Each time, the top of the heap is removed to obtain its data element, and the next data element of the ordered linked list that element came from is added to the heap; this is repeated until the merging termination condition is met, which completes the merging of the k ordered linked lists. The merging termination condition may specifically be that the max-heap is empty, or that the merged linked list has reached a preset length.

Still taking FIG. 2 as an example, node 6 needs to merge list2' from node2, list3' from node3, and list4' from node4. Suppose list2' is 10->8->5->2, where 10 is the score of the first candidate path in list2', 8 is the score of the second candidate path, and so on; and suppose list3' is 12->7->2->0 and list4' is 9->6->3->1. With the merged list limited to 4 candidate paths, merging list2', list3', and list4' gives 12->10->9->8.

Each execution of step S13 costs O(log k), so extracting the n highest-scoring paths from k ordered linked lists of length n costs O(n log k), plus the cost of building the initial heap (O(k) with bottom-up heap construction). Compared with the O(nk) cost of sequential merging, the max-heap merging algorithm greatly reduces the complexity of merging multiple ordered linked lists and thus improves the decoding speed and the efficiency of speech recognition; the speed-up is most pronounced when k is large.
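Steps S11 to S13 can be sketched with Python's `heapq` module. Since `heapq` provides a min-heap, scores are negated to emulate the max-heap; this, the bare-score representation, and the name `heap_merge` are assumptions of the sketch. Each heap entry records which list an element came from, so that step S13 can push that list's next element.

```python
import heapq


def heap_merge(lists, n):
    """Merge high-to-low ordered lists, keeping the n highest scores."""
    # S11/S12: one heap entry (negated score, list id, position) per non-empty list head.
    heap = [(-lst[0], li, 0) for li, lst in enumerate(lists) if lst]
    heapq.heapify(heap)  # bottom-up construction, O(k)
    out = []
    # S13: pop the top, then push the next element of the same list,
    # until the heap is empty or n elements have been produced.
    while heap and len(out) < n:
        neg_score, li, pos = heapq.heappop(heap)
        out.append(-neg_score)
        if pos + 1 < len(lists[li]):
            heapq.heappush(heap, (-lists[li][pos + 1], li, pos + 1))
    return out


print(heap_merge([[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]], n=4))  # [12, 10, 9, 8]
```

In a decoder the heap entries would carry the candidate paths along with their scores; only the scores are kept here to match the worked example.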

In another optional embodiment of the present invention, merging at least one ordered linked list to be merged according to a preset merging algorithm may specifically include:

step S21, evenly dividing the at least one ordered linked list to be merged into two groups, and evenly dividing each resulting group into two groups again, until each group contains a single ordered linked list;

and step S22, merging the grouped ordered linked lists pairwise, level by level, until all of them have been merged into one ordered linked list.

The embodiment of the invention can also adopt a divide-and-conquer method to rapidly merge a plurality of ordered linked lists so as to reduce the complexity of merging.

For example, to merge k ordered linked lists, the k lists can first be divided evenly into two groups of k/2 lists each, and each group divided evenly into two again, until each group contains one ordered linked list. Merging then begins: the groups are merged pairwise until a single ordered linked list remains. Because there are about log2 k levels of pairwise merging and each level touches each of the nk elements at most once, merging k ordered linked lists of length n in full costs O(nk log k), compared with O(nk^2) when they are merged in full one after another. Merging with the divide-and-conquer method can therefore greatly reduce the complexity of merging multiple ordered linked lists and improve the decoding speed and the efficiency of speech recognition.

It can be understood that the max-heap method and the divide-and-conquer method are only examples of the preset merging algorithm; in practical applications, a person skilled in the art can choose the merging algorithm flexibly according to the actual situation, and the embodiments of the present invention are not limited thereto.
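Steps S21 and S22 can be sketched recursively: split the lists into halves until each group holds one list, then merge pairs on the way back up. The optional `limit` parameter, which truncates every intermediate result to a fixed number of entries, and the function names are assumptions of this sketch.

```python
def merge_two(a, b, limit=None):
    """Merge two high-to-low lists; stop after `limit` items if one is given."""
    out = []
    i = j = 0
    while (i < len(a) or j < len(b)) and (limit is None or len(out) < limit):
        if j == len(b) or (i < len(a) and a[i] >= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def merge_divide_and_conquer(lists, limit=None):
    """S21: halve the group until one list remains; S22: merge the halves pairwise."""
    if not lists:
        return []
    if len(lists) == 1:
        return list(lists[0]) if limit is None else lists[0][:limit]
    mid = len(lists) // 2
    left = merge_divide_and_conquer(lists[:mid], limit)
    right = merge_divide_and_conquer(lists[mid:], limit)
    return merge_two(left, right, limit)


print(merge_divide_and_conquer([[10, 8, 5, 2], [12, 7, 2, 0], [9, 6, 3, 1]], limit=4))  # [12, 10, 9, 8]
```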

In an optional embodiment of the invention, the method may further comprise:

step S31, determining the number of the ordered linked lists to be merged;

step S32, determining, as target active nodes, the active nodes whose number of ordered linked lists to be merged exceeds a preset threshold;

the at least one ordered linked list to be merged may then specifically include the ordered linked lists corresponding to a plurality of predecessor nodes of the target active node.

In the embodiments of the invention, the number of ordered linked lists to be merged can be checked, and the active nodes for which this number exceeds the preset threshold are taken as target active nodes. Target active nodes are active nodes with many predecessor nodes, and merging the ordered linked lists of many predecessor nodes is time-consuming; using the preset merging algorithm in this case reduces the complexity of the merge and improves the efficiency of speech recognition.

It can be understood that the specific value of the preset threshold is not limited in the embodiment of the present invention, and for example, the preset threshold may be set to 8.

Optionally, if the number of predecessor nodes of the active node is small, for example, only 2 predecessor nodes are provided, even if the ordered linked lists of the predecessor nodes are combined in sequence, the complexity of combination is not high, and the efficiency of voice recognition is not affected.
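The choice between the two strategies can be sketched as follows. The threshold of 8 is the example value from the text; the function names are hypothetical, and `heapq.merge` is used both for the single k-way merge and for each pairwise step of the sequential fallback, purely for brevity.

```python
from heapq import merge
from itertools import islice

PRESET_THRESHOLD = 8  # example value given in the text


def is_target_active_node(num_lists, threshold=PRESET_THRESHOLD):
    """Step S32: a target active node has more lists to merge than the threshold."""
    return num_lists > threshold


def merge_predecessor_lists(lists, n, threshold=PRESET_THRESHOLD):
    """Merge high-to-low ordered lists, keeping the n highest scores."""
    if is_target_active_node(len(lists), threshold):
        # Many predecessors: one k-way merge (heapq.merge is heap-based).
        return list(islice(merge(*lists, reverse=True), n))
    # Few predecessors: merge one list at a time, keeping the top n each step.
    merged = []
    for lst in lists:
        merged = list(islice(merge(merged, lst, reverse=True), n))
    return merged


print(merge_predecessor_lists([[10, 8, 5, 2], [12, 7, 2, 0]], n=4))  # [12, 10, 8, 7]
```

Both branches return the same list; the threshold only decides which cost profile is paid.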

In summary, after an active node corresponding to the current speech frame in the decoding network is determined, the embodiment of the present invention can learn which predecessor nodes the active node has from the node indexes that the active node receives from those predecessor nodes. The ordered linked list corresponding to each predecessor node can then be determined from its node index and the mapping relationship between node indexes and ordered linked lists, and the ordered linked lists corresponding to a plurality of predecessor nodes of the active node can be merged according to the preset merging algorithm. Thus, when an active node has many predecessor nodes, merging their ordered linked lists with the preset merging algorithm, rather than one after another, reduces the complexity of the merge, which increases the decoding speed and improves the efficiency of speech recognition.
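The flow summarized above can be sketched in Python as follows. The dictionary `index_to_list`, which stands in for the mapping relationship between node indexes and ordered linked lists, and the use of Python lists sorted in descending score order are assumptions made for illustration; `heapq.merge` is simply one heap-based choice of preset merging algorithm, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List


def lists_for_active_node(received_indices: List[int],
                          index_to_list: Dict[int, List[float]]) -> List[List[float]]:
    """Map each predecessor node index received by the active node to its ordered list.

    A KeyError signals an index with no registered list, which this sketch
    treats as a caller error.
    """
    return [index_to_list[idx] for idx in received_indices]


def process_active_node(received_indices: List[int],
                        index_to_list: Dict[int, List[float]]) -> List[float]:
    """Determine the predecessors' ordered lists, then merge them into one."""
    lists_to_merge = lists_for_active_node(received_indices, index_to_list)
    return list(heapq.merge(*lists_to_merge, reverse=True))


# Hypothetical mapping: node index -> ordered list of scores (descending).
index_to_list = {3: [0.9, 0.2], 7: [0.8, 0.5], 11: [0.7]}
print(process_active_node([3, 11], index_to_list))  # [0.9, 0.7, 0.2]
```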

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, since according to the embodiments of the present invention some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily all required by the present invention.

Device embodiment

Referring to fig. 3, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, where the apparatus may specifically include:

a node determining module 301, configured to determine an active node corresponding to a current speech frame in a decoding network;

a linked list determining module 302, configured to determine the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and the mapping relationship between node indexes and ordered linked lists;

a linked list merging module 303, configured to merge at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.

Optionally, the apparatus may further include:

a number determining module, configured to determine the number of predecessor nodes of the active node;

a target determining module, configured to determine, as a target active node, each active node whose number of predecessor nodes exceeds a preset threshold.

Optionally, the apparatus may further include:

an index transmission module, configured to transmit the node index of the active node to the successor nodes of the active node.
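A minimal Python sketch of the index transmission step, assuming the arcs of the decoding network are available as a hypothetical adjacency dictionary `successors` that maps a node index to its successor node indexes. Each active node sends its own index to every successor, so each receiving node ends up with the predecessor node indexes that the linked list determining module 302 can then map to ordered linked lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def transmit_node_indices(active_nodes: Iterable[int],
                          successors: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Send each active node's index to its successor nodes.

    Returns, for every node that received at least one index, the list of
    predecessor node indexes it received (in ascending order of sender).
    """
    received: Dict[int, List[int]] = defaultdict(list)
    for node in sorted(active_nodes):
        for succ in successors.get(node, []):
            received[succ].append(node)
    return dict(received)


successors = {1: [4, 5], 2: [4], 3: [6]}
print(transmit_node_indices({1, 2}, successors))  # {4: [1, 2], 5: [1]}
```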

Optionally, the linked list merging module 303 may specifically include:

a heap building submodule, configured to build a maximum heap according to the number of ordered linked lists to be merged;

a storage submodule, configured to store the head of each ordered linked list to be merged into the maximum heap;

a deletion submodule, configured to delete the top of the maximum heap to obtain the data element stored there, and to add the next data element of the ordered linked list containing that element into the maximum heap, until a merging termination condition is met.
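The three submodules above describe a k-way merge with a heap. The Python sketch below is one way to implement it under stated assumptions: list nodes carry a hypothetical `score` field, each list is sorted in descending order of score (which is what makes a maximum heap the appropriate choice), and the merging termination condition is taken to be either exhaustion of all lists or an optional hypothetical `limit` on the number of output elements. Python's `heapq` is a min-heap, so scores are negated.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListNode:
    """Hypothetical ordered-linked-list node: a score and a pointer to the next node."""
    score: float
    next: Optional["ListNode"] = None


def from_scores(scores: List[float]) -> Optional[ListNode]:
    """Build a linked list from scores that are already sorted in descending order."""
    head: Optional[ListNode] = None
    for s in reversed(scores):
        head = ListNode(s, head)
    return head


def to_scores(head: Optional[ListNode]) -> List[float]:
    """Collect the scores of a linked list into a Python list."""
    out: List[float] = []
    while head is not None:
        out.append(head.score)
        head = head.next
    return out


def merge_with_max_heap(lists: List[Optional[ListNode]],
                        limit: Optional[int] = None) -> Optional[ListNode]:
    """Merge descending-ordered linked lists using a maximum heap.

    The heap never holds more than one entry per input list. `limit` is a
    hypothetical termination condition (stop after that many elements);
    with limit=None the merge ends when every list is exhausted.
    """
    # Store the head of every non-empty list. Scores are negated because heapq
    # is a min-heap; the list index breaks ties so nodes are never compared.
    heap = [(-head.score, idx, head) for idx, head in enumerate(lists) if head is not None]
    heapq.heapify(heap)

    dummy = ListNode(0.0)
    tail = dummy
    count = 0
    while heap and (limit is None or count < limit):
        _, idx, node = heapq.heappop(heap)  # delete the heap top
        tail.next = node                     # append its element to the output
        tail = node
        count += 1
        if node.next is not None:            # add the next element of the same list
            heapq.heappush(heap, (-node.next.score, idx, node.next))
    tail.next = None                         # detach anything left after an early stop
    return dummy.next


lists = [from_scores([9, 5, 1]), from_scores([8, 7]), None, from_scores([6, 2])]
print(to_scores(merge_with_max_heap(lists)))  # [9, 8, 7, 6, 5, 2, 1]
```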

Optionally, the linked list merging module 303 may specifically include:

a grouping submodule, configured to evenly divide the at least one ordered linked list to be merged into two groups, and to keep evenly dividing the ordered linked lists in each resulting group into two groups until each group contains one ordered linked list;

a merging submodule, configured to merge the grouped ordered linked lists pairwise until they are merged into a single ordered linked list.
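Similarly, the grouping and merging submodules describe a divide-and-conquer merge. The Python sketch below reuses the same assumptions (a hypothetical `ListNode` with a `score` field, lists sorted in descending order); it halves the range of lists recursively until a group holds one list, then merges the groups pairwise on the way back up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListNode:
    """Hypothetical ordered-linked-list node: a score and a pointer to the next node."""
    score: float
    next: Optional["ListNode"] = None


def from_scores(scores: List[float]) -> Optional[ListNode]:
    head: Optional[ListNode] = None
    for s in reversed(scores):
        head = ListNode(s, head)
    return head


def to_scores(head: Optional[ListNode]) -> List[float]:
    out: List[float] = []
    while head is not None:
        out.append(head.score)
        head = head.next
    return out


def merge_two(a: Optional[ListNode], b: Optional[ListNode]) -> Optional[ListNode]:
    """Merge two descending-ordered linked lists by splicing their nodes."""
    dummy = ListNode(0.0)
    tail = dummy
    while a is not None and b is not None:
        if a.score >= b.score:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a is not None else b  # append whichever list remains
    return dummy.next


def merge_divide_and_conquer(lists: List[Optional[ListNode]],
                             lo: int = 0, hi: Optional[int] = None) -> Optional[ListNode]:
    """Split lists[lo:hi] in half until each group holds one list,
    then merge the groups pairwise on the way back up."""
    if hi is None:
        hi = len(lists)
    if hi - lo == 0:
        return None
    if hi - lo == 1:
        return lists[lo]
    mid = (lo + hi) // 2
    left = merge_divide_and_conquer(lists, lo, mid)
    right = merge_divide_and_conquer(lists, mid, hi)
    return merge_two(left, right)


lists = [from_scores([9, 5, 1]), from_scores([8, 7]), None, from_scores([6, 2])]
print(to_scores(merge_divide_and_conquer(lists)))  # [9, 8, 7, 6, 5, 2, 1]
```

Both this sketch and the heap version do O(kn log k) work for k lists of n elements; the heap version can stop early once a termination condition is reached, whereas this version needs no auxiliary heap.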

Optionally, the active node may include any one of the following information: triphones, acoustic states, words, syllables, phonemes.

Optionally, the decoding network may specifically include: a weighted finite state machine based static decoding network, or a prefix tree based dynamic decoding network.

Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present invention provides an apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for: determining an active node corresponding to a current speech frame in a decoding network; determining the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists; and merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.

Fig. 4 is a block diagram illustrating an apparatus 800 for data processing in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 4, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or of a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 5 is a schematic diagram of a server in some embodiments of the invention. The server 1900, which may vary widely in configuration or performance, may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

A non-transitory computer-readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data processing method shown in fig. 1.

A non-transitory computer-readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a data processing method, the method comprising: determining an active node corresponding to a current speech frame in a decoding network; determining the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists; and merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.

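The embodiment of the invention discloses A1, a data processing method, comprising: determining an active node corresponding to a current speech frame in a decoding network;

determining the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists;

merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.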

A2, the method of A1, the method further comprising:

determining a number of predecessor nodes of the active node;

determining, as a target active node, each active node whose number of predecessor nodes exceeds a preset threshold;

wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to at least one predecessor node of the target active node.

A3, the method of A1, the method further comprising:

transmitting the node index of the active node to the successor nodes of the active node.

A4, the method of A1, wherein merging the at least one ordered linked list to be merged according to the preset merging algorithm comprises:

building a maximum heap according to the number of ordered linked lists to be merged;

storing the head of each ordered linked list to be merged into the maximum heap;

deleting the top of the maximum heap to obtain the data element stored there, and adding the next data element of the ordered linked list containing that element into the maximum heap, until a merging termination condition is met.

A5, the method of A1, wherein merging the at least one ordered linked list to be merged according to the preset merging algorithm comprises:

evenly dividing the at least one ordered linked list to be merged into two groups, and evenly dividing the ordered linked lists in each resulting group into two groups again, until each group contains one ordered linked list;

merging the grouped ordered linked lists pairwise until all groups are merged into one ordered linked list.

A6, the method according to any of A1 to A5, the active node comprising any one of the following information: triphones, acoustic states, words, syllables, phonemes.

A7, the method according to any of A1 to A5, the decoding network comprising: a weighted finite state machine based static decoding network, or a prefix tree based dynamic decoding network.

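The embodiment of the invention discloses B8, a data processing apparatus, comprising:

a node determining module, configured to determine an active node corresponding to a current speech frame in a decoding network;

a linked list determining module, configured to determine the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists;

a linked list merging module, configured to merge at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.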

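B9, the apparatus of B8, further comprising:

a number determining module, configured to determine the number of predecessor nodes of the active node;

a target determining module, configured to determine, as a target active node, each active node whose number of predecessor nodes exceeds a preset threshold;

wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the target active node.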

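B10, the apparatus of B8, further comprising:

an index transmission module, configured to transmit the node index of the active node to the successor nodes of the active node.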

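B11, the apparatus of B8, wherein the linked list merging module comprises:

a heap building submodule, configured to build a maximum heap according to the number of ordered linked lists to be merged;

a storage submodule, configured to store the head of each ordered linked list to be merged into the maximum heap;

a deletion submodule, configured to delete the top of the maximum heap to obtain the data element stored there, and to add the next data element of the ordered linked list containing that element into the maximum heap, until a merging termination condition is met.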

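B12, the apparatus of B8, wherein the linked list merging module comprises:

a grouping submodule, configured to evenly divide the at least one ordered linked list to be merged into two groups, and to keep evenly dividing the ordered linked lists in each resulting group into two groups until each group contains one ordered linked list;

a merging submodule, configured to merge the grouped ordered linked lists pairwise until they are merged into one ordered linked list.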

B13, the apparatus according to any of B8 to B12, the active node comprising any of the following information: triphones, acoustic states, words, syllables, phonemes.

B14, the apparatus according to any of B8 to B12, the decoding network comprising: a weighted finite state machine based static decoding network, or a prefix tree based dynamic decoding network.

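The embodiment of the invention discloses C15, an apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for:

determining an active node corresponding to a current speech frame in a decoding network;

determining the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists;

merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.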

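C16, the apparatus of C15, wherein the one or more programs configured to be executed by the one or more processors further include instructions for:

determining the number of predecessor nodes of the active node;

determining, as a target active node, each active node whose number of predecessor nodes exceeds a preset threshold;

wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the target active node.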

C17, the apparatus of C15, wherein the one or more programs configured to be executed by the one or more processors further include instructions for:

transmitting the node index of the active node to the successor nodes of the active node.

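C18, the apparatus of C15, wherein merging the at least one ordered linked list to be merged according to the preset merging algorithm comprises:

building a maximum heap according to the number of ordered linked lists to be merged;

storing the head of each ordered linked list to be merged into the maximum heap;

deleting the top of the maximum heap to obtain the data element stored there, and adding the next data element of the ordered linked list containing that element into the maximum heap, until a merging termination condition is met.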

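C19, the apparatus of C15, wherein merging the at least one ordered linked list to be merged according to the preset merging algorithm comprises:

evenly dividing the at least one ordered linked list to be merged into two groups, and evenly dividing the ordered linked lists in each resulting group into two groups again, until each group contains one ordered linked list;

merging the grouped ordered linked lists pairwise until all groups are merged into one ordered linked list.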

C20, the apparatus according to any of C15 to C19, the active node comprising any of the following information: triphones, acoustic states, words, syllables, phonemes.

C21, the apparatus according to any of C15 to C19, the decoding network comprising: a weighted finite state machine based static decoding network, or a prefix tree based dynamic decoding network.

Embodiments of the present invention disclose D22, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method described in one or more of A1 to A7.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The data processing method, the data processing apparatus and the apparatus for data processing provided by the present invention are described in detail above, and specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

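1. A method of data processing, the method comprising:

determining an active node corresponding to a current speech frame in a decoding network;

determining the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists;

merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.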

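2. The method of claim 1, further comprising:

determining the number of predecessor nodes of the active node;

determining, as a target active node, each active node whose number of predecessor nodes exceeds a preset threshold;

wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to at least one predecessor node of the target active node.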

3. The method of claim 1, further comprising:

transmitting the node index of the active node to the successor nodes of the active node.

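4. The method according to claim 1, wherein said merging the at least one ordered linked list to be merged according to a preset merging algorithm comprises:

building a maximum heap according to the number of ordered linked lists to be merged;

storing the head of each ordered linked list to be merged into the maximum heap;

deleting the top of the maximum heap to obtain the data element stored there, and adding the next data element of the ordered linked list containing that element into the maximum heap, until a merging termination condition is met.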

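5. The method according to claim 1, wherein said merging the at least one ordered linked list to be merged according to a preset merging algorithm comprises:

evenly dividing the at least one ordered linked list to be merged into two groups, and evenly dividing the ordered linked lists in each resulting group into two groups again, until each group contains one ordered linked list;

merging the grouped ordered linked lists pairwise until all groups are merged into one ordered linked list.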

6. The method according to any of claims 1 to 5, wherein the active node comprises any of the following information: triphones, acoustic states, words, syllables, phonemes.

7. The method according to any of claims 1 to 5, wherein the decoding network comprises: a weighted finite state machine based static decoding network, or a prefix tree based dynamic decoding network.

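8. A data processing apparatus, characterized in that the apparatus comprises:

a node determining module, configured to determine an active node corresponding to a current speech frame in a decoding network;

a linked list determining module, configured to determine the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists;

a linked list merging module, configured to merge at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.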

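9. An apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for:

determining an active node corresponding to a current speech frame in a decoding network;

determining the ordered linked list corresponding to a predecessor node of the active node according to the node index of the predecessor node received by the active node from that predecessor node and a mapping relationship between node indexes and ordered linked lists;

merging at least one ordered linked list to be merged according to a preset merging algorithm, wherein the at least one ordered linked list to be merged comprises the ordered linked lists corresponding to a plurality of predecessor nodes of the active node.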

10. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a data processing method as claimed in one or more of claims 1 to 7.