WO2017024908A1 - Voice processing method and apparatus - Google Patents

Voice processing method and apparatus

Info

Publication number
WO2017024908A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
data packet
voice data
connection
recognition
Prior art date
Application number
PCT/CN2016/088896
Other languages
English (en)
French (fr)
Inventor
王育军
Original Assignee
乐视控股(北京)有限公司
乐视致新电子科技(天津)有限公司
Priority date
Filing date
Publication date
Application filed by 乐视控股(北京)有限公司 and 乐视致新电子科技(天津)有限公司
Priority to US 15/243,839 (published as US20170047069A1)
Publication of WO2017024908A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems

Definitions

  • the present invention relates to the field of voice processing technologies, and in particular, to a voice processing method and apparatus.
  • the Internet-based speech recognition cloud service can help users implement voice search, voice input, voice dialogue and so on.
  • the voice to be processed is transmitted from the user terminal to the voice recognition server through the Internet, and the recognition result of the voice recognition server is returned to the user terminal along the original link and displayed to the user.
  • however, the number of user requests for Internet-based speech recognition services is huge and growing, placing increasing strain on already limited recognition resources. It can be seen that fully utilizing voice recognition server resources for processing voice requests, so that the voice recognition server can recognize more voice requests per unit time, is an urgent problem for those skilled in the art.
  • in the existing first solution, a speech recognition thread is created for each physical core of the recognition server.
  • while the user inputs voice, the requested voice is divided into voice data packets of 1 second of voice duration each and sent to the voice recognition server (request 1 packet 1, request 1 packet 2, and so on, as shown in Figure 2).
  • a voice connection is established for the voice packets from a client, and a voice recognition thread and a voice recognition instance are bound to that voice connection; that is, a speech recognition thread serves only the voice packets requested over one voice connection.
  • when a speech recognition thread processes a voice packet of 1 second of voice duration, the real-time rate is usually between 0.6 and 0.9; that is, processing 1 second of voice takes 0.6 to 0.9 seconds to complete.
  • the recognition speed of the speech recognition thread is therefore faster than the speed at which the user inputs speech. As a result, after processing one voice packet of its bound voice connection, the voice recognition thread is idle while the next voice packet of that connection is still pending.
  • an idle voice recognition thread means that voice recognition server resources are not fully utilized, i.e. the resource utilization rate is low.
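The idle time implied by these real-time rates can be made concrete with a small calculation. This is an illustrative sketch; `idle_fraction` is an assumed helper name, not from the patent:

```python
# Illustration of the idle time implied by the real-time rates above,
# for a thread bound to one connection streaming 1-second packets.
def idle_fraction(real_time_rate: float) -> float:
    """Fraction of wall-clock time a bound recognition thread sits idle:
    it needs `real_time_rate` seconds to process each 1-second packet,
    then waits the rest of the second for the next packet to arrive."""
    return 1.0 - real_time_rate

for rate in (0.6, 0.9):
    print(f"real-time rate {rate}: thread idle {idle_fraction(rate):.0%} of the time")
```

So at the best real-time rate in the stated range (0.9), the bound thread still wastes a tenth of its core's time; at 0.6 it wastes 40%.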
  • in the existing second solution, more thread groups than cores are created: for example, a recognition server with 12 physical cores may create 48 sets of voice connections, voice recognition threads, and voice recognition instances.
  • in this second solution, when the voice data packet currently handled by a voice recognition thread has been processed, the physical core corresponding to that thread becomes idle, and other voice recognition threads that still have voice data packets to process grab the unused core. Since the switching time of the voice threads is not controlled, multiple voice threads often compete for one physical core, causing the core's processing time to rotate between threads. Thus, although the physical core is not idle, part of its resources is consumed by thread rotation and still not used to process voice requests. It can be seen that the existing second solution merely increases the number of voice recognition thread groups, and voice recognition server resources still cannot be fully utilized for processing voice requests.
  • the speech recognition server resources are not fully utilized for processing voice requests, and the present invention has been made to provide a speech processing method and apparatus that overcomes the above problems or at least partially solves the above problems.
  • a voice processing method includes: a current voice recognition thread acquires a voice data packet stored according to a setting rule; invokes a voice recognition instance to perform voice recognition processing on the acquired voice data packet; after processing the acquired voice data packet, determines whether there are still other voice data packets to be processed; and if so, returns to the acquiring step to perform voice recognition processing on the other voice data packets, where the acquired voice data packet and the other voice data packets belong to different voice connections.
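The claimed loop can be sketched as follows. This is a minimal Python illustration with hypothetical names (`recognition_thread`, stub instances), not the patent's implementation; it shows one thread serving packets from different connections as they become available:

```python
import queue

def recognition_thread(pending: "queue.Queue", instances: dict, results: list):
    """One recognition thread: take whatever pending packet is next,
    regardless of which voice connection sent it."""
    while True:
        try:
            packet = pending.get_nowait()   # acquire a stored voice data packet
        except queue.Empty:
            break                           # no other packets to be processed
        # invoke the recognition instance bound to this packet's connection
        instance = instances[packet["conn_id"]]
        results.append((packet["conn_id"], instance(packet["audio"])))

# usage with stub "instances" that just tag the audio they receive
pending = queue.Queue()
for conn, audio in [("c1", "hello"), ("c2", "world"), ("c1", "again")]:
    pending.put({"conn_id": conn, "audio": audio})
instances = {"c1": lambda a: "c1:" + a, "c2": lambda a: "c2:" + a}
results = []
recognition_thread(pending, instances, results)
print(results)  # one thread handled packets from two different connections
```

The key property is that consecutive packets handled by the thread belong to different connections, which is exactly what the bound-thread schemes above cannot do.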
  • correspondingly, the present invention further provides a voice processing apparatus, comprising: an obtaining module, configured so that a current voice recognition thread acquires a voice data packet stored according to a setting rule; a processing module, configured to invoke a voice recognition instance to perform voice recognition processing on the acquired voice data packet; a determining module, configured to determine, after the acquired voice data packet is processed, whether there are still other voice data packets to be processed; and a returning module, configured to return to the obtaining module if so, to perform voice recognition processing on the other voice data packets, where the acquired voice data packet and the other voice data packets belong to different voice connections.
  • a computer program comprising computer readable code that, when executed on a server, causes the server to perform the voice processing method described above.
  • a computer readable medium wherein the computer program described above is stored.
  • the current voice recognition thread does not provide a voice processing service for only one voice connection. After the current voice data packet of a voice connection is processed, that is, when the current voice recognition thread would otherwise be idle, it actively determines whether there are other requested voice data packets to be processed, and if so, directly acquires one of those voice data packets for processing.
  • on the one hand, the voice processing solution provided by the embodiment of the present invention actively acquires other voice data packets to be processed when the voice recognition thread is idle; therefore, the resource waste caused in the prior art by a voice recognition thread waiting for the next voice packet of its bound voice connection does not occur.
  • on the other hand, when the voice recognition thread in the embodiment of the present invention is idle, it changes the idle state of its corresponding physical core by actively acquiring other voice data packets to be processed, so that the resources of the physical core are fully utilized. There is no switching between voice recognition threads in the whole process, and all the resources of the physical core corresponding to a voice recognition thread are used for processing voice data packets, so the consumption of physical core resources by thread rotation in the existing processing scheme is effectively avoided.
  • in the existing scheme, rotation between voice threads is performed by the processor according to its own clock cycle; that is, while processing one voice data packet, a thread may be interrupted for a period of time so that the voice packet of another voice recognition thread can be processed. Rotation between voice recognition threads therefore slows the processing of a single voice request.
  • in the embodiment of the present invention there is no rotation between voice threads; therefore, the slowdown of single-request processing caused by voice recognition thread rotation is avoided.
  • FIG. 1 is a schematic diagram of a conventional voice interaction cloud
  • FIG. 3 is a flow chart showing the steps of a voice processing method according to Embodiment 1 of the present invention.
  • FIG. 4 is a flow chart showing the steps of a voice processing method according to Embodiment 2 of the present invention.
  • FIG. 5 is a view showing a performance comparison between the voice processing method shown in Embodiment 2 and the existing first voice processing method when processing a voice request;
  • FIG. 6 is a flow chart showing the process of starting and interrupting a voice processing system of the voice processing method described in Embodiment 2;
  • FIG. 7 is a schematic diagram showing the life cycle of a voice data packet during voice processing
  • FIG. 8 is a structural block diagram of a voice processing apparatus according to Embodiment 3 of the present invention.
  • FIG. 9 is a block diagram showing the structure of a voice processing apparatus according to Embodiment 4 of the present invention.
  • Figure 10 schematically shows a block diagram of a server for performing the method according to the invention.
  • Fig. 11 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • Referring to FIG. 3, a flow chart of steps of a voice processing method according to Embodiment 1 of the present invention is shown.
  • Step S102 The current voice recognition thread acquires a voice data packet stored according to the setting rule.
  • the setting rules for storing voice data packets can be set by a person skilled in the art according to actual needs, which is not specifically limited in the embodiment of the present invention.
  • for example, the rule may be set such that the next voice data packet requested by a voice connection is not stored until the previous voice data packet requested by that voice connection has been processed; once the previous packet has been processed, the next voice data packet requested by the voice connection is stored.
  • the stored voice data packets can be stored in any suitable manner.
  • a queue is set in the identification server, and voice packets are sequentially stored in the set queue in the order of storage time.
  • a storage pool is set, and each voice data packet to be stored is stored in the storage pool.
  • Step S104 Calling the voice recognition instance to perform voice recognition processing on the obtained voice data packet.
  • upon recognizing speech, the speech recognition thread assigns a voice recognition instance to each established voice connection, and each voice connection processes all of the voice data packets of one voice request. Here, a successfully established voice connection is a connection that has been assigned to process a voice request.
  • the speech recognition instance stores the current recognition status and the recognition history of the voice request, so that, through the recognition history, the currently recognized voice data packet can be processed in semantic association with the previously processed voice data packets of the same request.
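A sketch of how an instance might carry per-request state across packets; the class and its string-joining "decoder" are illustrative stand-ins for a real recognizer, not the patent's implementation:

```python
class RecognitionInstance:
    """Toy recognition instance: keeps the recognition history of one
    voice request so successive packets are decoded in context."""

    def __init__(self):
        self.history = []          # packets recognized so far for this request

    def recognize(self, packet_text: str) -> str:
        # a real decoder would condition on self.history (e.g. language-model
        # context); here we just join it to show the accumulated state
        self.history.append(packet_text)
        return " ".join(self.history)

inst = RecognitionInstance()
print(inst.recognize("turn on"))       # hypothesis from the first packet
print(inst.recognize("the lights"))    # refined using the stored history
```

Because the state lives in the instance rather than the thread, any thread can pick up the next packet of a request as long as it invokes the right instance.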
  • Step S106: After the processing of the acquired voice data packet is completed, it is determined whether there are still other voice data packets to be processed; if so, the process returns to step S102, and if not, a setting operation is performed.
  • a person skilled in the art can understand that a server usually processes multiple voice connections at the same time, and each voice connection corresponds to multiple voice data packets; therefore, in a specific implementation, multiple voice data packets to be processed exist simultaneously in the recognition server.
  • in the embodiment of the present invention, after the current voice recognition thread finishes the current packet, it actively acquires another voice data packet to be processed for voice processing, until all the voice data packets to be processed have been processed.
  • the setting operation can be set by a person skilled in the art according to actual needs. For example, it is set to stop and release the voice recognition thread when it is determined that there is no voice data packet to be processed.
  • the current voice recognition thread does not provide a voice processing service only for one voice connection.
  • after the voice data packet currently being processed is completed, the current voice recognition thread actively determines whether there are still other voice data packets to be processed, and if so, directly acquires one of the other requested voice data packets for processing.
  • on the one hand, the voice processing method provided by the embodiment of the present invention actively acquires other voice data packets to be processed when the voice recognition thread is idle; therefore, the resource waste caused in the prior art by a voice recognition thread waiting for the next voice packet of its bound voice connection does not occur.
  • on the other hand, when the voice recognition thread in the embodiment of the present invention is idle, it changes the idle state of its corresponding physical core by actively acquiring other voice data packets to be processed, so that the resources of the physical core are fully utilized. There is no switching between voice recognition threads in the whole process, and all the resources of the physical core corresponding to a voice recognition thread are used for processing voice data packets, so the consumption of physical core resources by thread rotation in the existing processing methods is effectively avoided.
  • in the existing scheme, rotation between voice threads is performed by the processor according to its own clock cycle; that is, while processing one voice data packet, a thread may be interrupted for a period of time so that the voice packet of another voice recognition thread can be processed. Rotation between voice recognition threads therefore slows the processing of a single voice request.
  • in the embodiment of the present invention there is no rotation between voice threads; therefore, the slowdown of single-request processing caused by voice recognition thread rotation is avoided.
  • Referring to FIG. 4, a flow chart of steps of a voice processing method according to Embodiment 2 of the present invention is shown.
  • Step S202: The main thread of the server creates a speech recognition thread for each physical core, according to the number of physical cores included in the server.
  • for example, the server contains 12 physical cores. After the voice processing system is started, the server main thread creates a corresponding voice recognition thread for each of the 12 physical cores. To improve the concurrency of processing voice requests, more than 12 voice connections and speech recognition instances can be established; in this way, the server can process more than 12 voice requests simultaneously, where each voice connection processes all the voice packets of one voice request.
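The thread-per-core arrangement draining a shared packet store can be sketched as below. Names are assumed, real packets are replaced by integers, and recognition is reduced to a list append:

```python
import queue
import threading

def make_workers(pending: "queue.Queue", handled: list, n_cores: int):
    """One recognition worker per physical core; all workers drain a
    single shared queue of pending voice data packets."""
    def worker():
        while True:
            try:
                pkt = pending.get_nowait()
            except queue.Empty:
                return                 # no pending packets left
            handled.append(pkt)        # stand-in for voice recognition
    return [threading.Thread(target=worker) for _ in range(n_cores)]

pending = queue.Queue()
for i in range(100):                   # 100 packets from many connections
    pending.put(i)
handled = []
workers = make_workers(pending, handled, n_cores=4)  # e.g. a 4-core server
for t in workers:
    t.start()
for t in workers:
    t.join()
print(len(handled))                    # every packet processed exactly once
```

Because thread count equals core count, no two workers ever rotate on one core; concurrency beyond the core count comes from having more connections feeding the queue, not more threads.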
  • after a voice recognition thread finishes recognizing a voice data packet, it actively acquires another voice data packet to be processed and invokes the voice recognition instance corresponding to the acquired voice data packet to perform voice recognition processing.
  • each voice recognition thread processes the voice data packets in parallel, and each voice recognition thread processes the voice data packets in the same manner. Therefore, in the embodiment of the present invention, only one voice recognition thread processes the voice data packet as an example for description. For the process of processing the voice data packet by the other voice recognition thread, refer to the embodiment of the present invention, and details are not described herein again.
  • since the number of voice recognition threads is equal to the number of physical cores included in the server, the voice recognition threads do not take turns on, or randomly compete for, processor resources.
  • a voice recognition thread is created for each physical core, and each voice recognition thread can establish a voice connection for a voice connection request, and establish a successful voice connection to process each voice data packet sent by the corresponding voice connection request.
  • Step S204 The current voice recognition thread acquires a voice data packet stored according to the setting rule.
  • a preferred way to store voice packets in accordance with a set rule is as follows:
  • according to the voice connection to which a voice data packet belongs, it is determined whether the to-be-processed voice data packet storage space, the currently-processing voice data packet storage space, or the processed voice data packet storage space stores a voice data packet corresponding to that voice connection;
  • the three storage spaces may be preset in advance: a to-be-processed voice data packet storage space, a currently-processing voice data packet storage space, and a processed voice data packet storage space. Each of the three storage spaces may be a storage queue or a storage pool.
  • if no packet of that voice connection is stored in any of the three spaces, the to-be-processed voice data packet is stored in the to-be-processed voice data packet storage space.
  • a voice recognition thread can then obtain the voice data packet from that storage space and move it into the currently-processing voice data packet storage space, where it undergoes voice recognition.
  • after recognition, the voice recognition thread stores the processed voice data packet into the processed voice data packet storage space.
  • the voice connection to which the voice data packet belongs scans the processed voice data packet storage space and takes away the recognition result corresponding to the processed voice data packet. At this time, if the next voice data packet has arrived, the voice connection stores it in the to-be-processed voice data packet storage space.
  • this rule ensures that, for each voice connection, the previously requested voice data packet is fully processed before the next voice data packet is submitted to the server; that is, the next voice data packet is stored in the server only when none of the three storage spaces holds a voice data packet corresponding to that voice connection.
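The three storage spaces and the one-packet-in-flight-per-connection rule can be sketched as follows. Class and method names are illustrative assumptions, and recognition is reduced to moving a packet between spaces:

```python
class PacketStores:
    """Toy model of the three storage spaces described above."""

    def __init__(self):
        self.pending, self.processing, self.done = [], [], []

    def has_packet(self, conn_id) -> bool:
        # is any packet of this connection in any of the three spaces?
        return any(p["conn"] == conn_id
                   for space in (self.pending, self.processing, self.done)
                   for p in space)

    def submit(self, packet) -> bool:
        # the next packet of a connection is stored only if no packet of
        # that connection sits in any of the three spaces
        if self.has_packet(packet["conn"]):
            return False
        self.pending.append(packet)
        return True

    def step(self):
        # a recognition thread takes one pending packet, "recognizes" it,
        # and moves it into the processed space
        pkt = self.pending.pop(0)
        self.processing.append(pkt)
        self.processing.remove(pkt)    # recognition finished
        self.done.append(pkt)

    def collect(self, conn_id):
        # the connection scans the processed space and takes its result away
        for p in list(self.done):
            if p["conn"] == conn_id:
                self.done.remove(p)
                return p
        return None

stores = PacketStores()
assert stores.submit({"conn": "c1", "seq": 1})
assert not stores.submit({"conn": "c1", "seq": 2})   # previous not done yet
stores.step()
assert stores.collect("c1")["seq"] == 1
assert stores.submit({"conn": "c1", "seq": 2})       # now allowed
```

The rejected second `submit` is the point of the rule: a connection never has two packets in flight, so recognition history stays consistent per request.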
  • Step S206 The current voice recognition thread invokes a voice recognition instance corresponding to the voice data packet.
  • when the voice processing system is started, the speech recognition instances are initialized, and the initialized speech recognition instances can be stored in a preset recognition instance list.
  • when a voice data packet needs to be recognized, a speech recognition instance can be called from this preset list.
  • a preferred method of invoking a speech recognition instance corresponding to a voice data packet is as follows:
  • S1 determining whether a voice recognition instance corresponding to the voice connection to which the voice data packet belongs is stored in the identification instance list; if yes, executing S2, if not, executing S3.
  • Each voice connection established with the server is given a globally unique identifier (UID) by the server.
  • when the first voice data packet of a voice connection arrives, the voice recognition thread allocates a voice recognition instance to the voice connection and associates the identifier of the voice recognition instance with the globally unique identifier of the voice connection.
  • every voice data packet sent over the voice connection carries the globally unique identifier of that connection, so the voice recognition thread can determine, from the identifier carried in the packet, the connection to which the packet belongs, and then, through the established correspondence between the voice recognition instance identifier and the connection's globally unique identifier, determine the voice recognition instance corresponding to that connection.
  • if the voice data packet is the first one sent over its voice connection, the voice recognition thread has not yet assigned a voice recognition instance to that connection; therefore, no voice recognition instance corresponding to the connection exists in the recognition instance list.
  • since each voice connection corresponds to only one voice request, the voice connection and voice request to which a voice data packet belongs can equivalently be identified by assigning a globally unique identifier to each voice request.
  • if a voice recognition instance corresponding to the voice connection to which the voice data packet belongs is stored in the recognition instance list, the current voice data packet is not the first voice data packet sent over that connection; in this case, directly invoking the voice recognition instance already assigned to the connection performs voice recognition on the current voice data packet.
  • in S3, an idle speech recognition instance is called, an idle speech recognition instance being one that has no correspondence with any voice connection.
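Steps S1 to S3 can be sketched as a lookup-or-bind function over the recognition instance list; the dictionaries and state strings below are assumed simplifications, not the patent's data structures:

```python
def get_instance(instance_list: dict, bindings: dict, conn_uid: str):
    """instance_list maps instance id -> "idle"/"busy";
    bindings maps connection UID -> instance id."""
    if conn_uid in bindings:                      # S1/S2: instance already bound
        return bindings[conn_uid]
    for inst_id, state in instance_list.items():  # S3: bind an idle instance
        if state == "idle":
            instance_list[inst_id] = "busy"
            bindings[conn_uid] = inst_id
            return inst_id
    return None                                   # no idle instance available

instances = {"i1": "idle", "i2": "idle"}
bindings = {}
assert get_instance(instances, bindings, "conn-A") == "i1"  # first packet binds
assert get_instance(instances, bindings, "conn-A") == "i1"  # later packets reuse
assert get_instance(instances, bindings, "conn-B") == "i2"
```

Releasing an instance (step S212 below) would amount to deleting the binding and marking the instance "idle" again.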
  • Step S208 The current voice recognition thread performs voice recognition processing on the voice data packet by using the invoked voice recognition instance, and pushes the processed voice data packet into the processed voice data packet storage space.
  • the current voice recognition thread pushes the processed voice data packet into the processed voice data packet storage space for the voice connection to which the voice data packet belongs.
  • the voice connection to which the voice data packet belongs scans the processed voice data packet storage space according to the setting rule to determine whether its requested voice data packet has been processed; once the packet is found in that storage space, the voice connection takes away the corresponding recognition result.
  • for example, a voice data packet list may be set in the processed voice data packet storage space, recording the UID of the voice connection to which each processed voice data packet belongs. A voice connection determines whether its requested voice data packet has been processed by scanning the list for its UID; if its UID is in the list, the packet has been processed and the corresponding recognition result can be taken away.
  • Step S210 The current voice recognition thread determines whether the voice data packet includes an end identifier; if yes, step S212 is performed, and if not, the setting operation is performed.
  • the end identifier is used to indicate that the current voice data packet is the last voice data packet corresponding to the associated voice connection.
  • the current voice recognition thread determines whether the voice data packet includes an end identifier in order to determine whether all voice data packets requested by the voice connection have been processed. If so, the voice recognition instance corresponding to the voice connection may be released for use by other voice connections.
  • the setting operation can be set by a person skilled in the art according to actual needs, and the embodiment does not specifically limit this. For example: set to directly obtain other pending voice packets.
  • the voice data requested by a voice connection is divided into a plurality of voice data packets, and the last data packet carries an end identifier.
  • for example, the voice data requested by the current voice connection is divided into six voice data packets. The first five packets respectively carry identifiers indicating that the current packet is not the last voice data packet (for example, the identifiers carried in the first to fifth packets are set to 1, 2, 3, 4, 5), while the sixth packet, being the last voice data packet, carries an end identifier indicating that it is the last one (for example, the end identifier is set to -6).
  • a person skilled in the art may set the end identifier in any suitable manner, for example by carrying a setting identifier only in the last voice data packet; when a data packet carries the setting identifier, it can be determined to be the last voice data packet requested by its voice connection.
  • alternatively, identifier 1 may be carried in every non-last voice data packet and identifier 0 in the last voice data packet, with identifier 0 defined as the end identifier.
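One of the conventions above, sequence numbers with a negated identifier on the last packet, might be sketched like this (the helper names are hypothetical):

```python
def split_request(audio_chunks):
    """Split one request's audio into packets: packets 1..n-1 carry their
    sequence number; the last packet carries the negated number as the
    end identifier."""
    packets = []
    for i, chunk in enumerate(audio_chunks, start=1):
        last = i == len(audio_chunks)
        packets.append({"id": -i if last else i, "audio": chunk})
    return packets

def is_end(packet) -> bool:
    return packet["id"] < 0    # a negative identifier marks the last packet

pkts = split_request(["a", "b", "c"])
print([p["id"] for p in pkts])          # [1, 2, -3]
assert is_end(pkts[-1]) and not is_end(pkts[0])
```

On seeing `is_end` return true, the recognition thread would release the connection's instance (step S212).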
  • Step S212 If the voice data packet includes the end identifier, the voice recognition thread releases the voice recognition instance corresponding to the voice data packet, and updates the state of the released voice recognition instance to idle.
  • Step S214 After the processing of the acquired voice data packet is completed, the voice recognition thread determines whether there are still other voice data packets to be processed; if yes, returns to step S204, and if not, performs a setting operation.
  • a server usually processes multiple voice connection requests at the same time, and each voice connection request corresponds to multiple voice data packets; therefore, in a specific implementation, multiple voice data packets to be processed exist simultaneously in the server's to-be-processed voice data packet storage space. In the embodiment of the present invention, after the current voice recognition thread processes the voice data packet requested by the current voice connection, it actively acquires another voice data packet to be processed for voice processing, until all the voice data packets to be processed have been processed.
  • the setting operation can be set by a person skilled in the art according to actual needs. For example, it is set to stop and release the voice recognition thread when it is determined that there is no voice data packet to be processed.
  • FIG. 5 shows a performance comparison, when processing voice requests, between the voice processing method of this embodiment and the existing first voice processing method (which creates a voice recognition thread for each physical core of the server and binds the voice recognition thread to a voice recognition instance, so that each voice recognition thread processes only the voice request of one voice connection).
  • the comparison chart was generated by feeding the same batch of test data through stress tests of the two voice processing systems separately; the test server in both tested systems includes 12 physical cores.
  • in FIG. 5, the abscissa indicates the maximum number of concurrent recognitions, that is, the number of voice connections processed by the server at the same time; the ordinate indicates the voice recognition real-time rate, that is, the time the voice thread takes to process a voice packet corresponding to 1 second of voice. The real-time rate reflects the performance of the server in processing voice requests.
  • the line connecting the dots indicates the performance curve of the voice processing method in this embodiment when processing voice requests; the other line indicates the performance curve of the existing first, thread-bound voice processing method.
  • as the maximum number of concurrent recognitions increases, the speech recognition real-time rates of both speech recognition systems increase.
  • as concurrency grows further, the advantages of the present invention begin to manifest: although the real-time rate of the present invention also increases, its upward trend is far less pronounced. It can be seen that the voice processing method provided by the embodiment of the present invention improves the throughput of the voice processing system.
  • in practice, the voice processing system usually faces the problem of updating the voice recognition model stored on the server, at which point the server requires an external interrupt.
  • in the existing solution, the voice connections connected to the server are directly interrupted, which harms the user experience.
  • with the voice processing method in the embodiment of the present invention, after the server receives an external interrupt request, the server main thread maintains the voice connections that have already been established and continues to process the voice data packets they request; a voice connection is interrupted only after all the voice data packets it requested have been processed. Connection requests from unestablished voice connections are no longer accepted.
  • specifically, when the main thread of the server receives an external interrupt request, it generates an interrupt identifier. According to the interrupt identifier, for each voice connection already established with the server, the corresponding voice data packets continue to be processed, and the connection is interrupted only after all of its voice packets have been recognized; for voice connections not yet established with the server, the connection requests are directly rejected and no new voice connections are established.
  • the external-interrupt handling scheme provided by the embodiment of the present invention continues to serve users who have already established a voice connection until all the voice data packets requested over that connection are processed; from the user's perspective the voice recognition request is never suddenly interrupted, which enhances the user experience.
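The graceful-interrupt behaviour can be sketched as below; the `Server` class and its methods are illustrative assumptions that show only the accept/refuse and drain logic, not the patent's implementation:

```python
class Server:
    """Toy server: after the interrupt flag is set, established
    connections drain their pending packets but new ones are refused."""

    def __init__(self):
        self.interrupted = False
        self.connections = {}          # conn_id -> list of pending packets

    def accept(self, conn_id) -> bool:
        if self.interrupted:
            return False               # refuse new connections after interrupt
        self.connections[conn_id] = []
        return True

    def submit(self, conn_id, packet):
        self.connections[conn_id].append(packet)

    def drain(self):
        # keep processing packets of established connections, then close them
        processed = []
        for conn_id, pkts in list(self.connections.items()):
            processed.extend(pkts)             # stand-in for recognition
            del self.connections[conn_id]      # interrupt after packets finish
        return processed

srv = Server()
srv.accept("c1")
srv.submit("c1", "pkt1")
srv.interrupted = True                 # external interrupt request arrives
assert not srv.accept("c2")            # new connection refused
assert srv.drain() == ["pkt1"]         # established connection fully served
```

Once `connections` is empty, the server can safely stop for the model update without having cut off any in-flight request.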
  • The current voice recognition thread does not provide voice processing service for only one voice connection. Once the voice data packet it is handling has been processed, i.e., when the current voice recognition thread is idle, the thread actively determines whether other voice data packets remain to be processed, and if so, directly fetches one pending voice data packet of another request for processing.
  • With the voice processing method provided by the embodiment of the present invention, on the one hand, an idle voice recognition thread actively fetches other pending voice data packets for voice processing; therefore the thread never wastes resources, as in the prior art, by waiting for the next voice data packet corresponding to a particular voice connection.
  • On the other hand, when a voice recognition thread of the embodiment of the present invention is idle, it actively fetches other pending voice data packets, thereby ending the idle state of its corresponding physical core, so that the core's resources can be fully used. No switching between voice recognition threads occurs at any point, and all resources of the physical core corresponding to a voice recognition thread are used for processing voice data packets, which effectively avoids the consumption of physical-core resources by the rotation among voice recognition threads in the existing processing methods.
  • Furthermore, rotation between voice threads is driven by the processor according to its own clock cycle; that is, while one voice data packet is being processed, processing may be suspended for a while to handle the voice data packet of another voice recognition thread. Rotation between voice recognition threads therefore slows down the processing of an individual voice request.
  • In the embodiment of the present invention there is no rotation between voice threads, so the slowdown of individual voice request processing caused by voice recognition thread rotation can be avoided.
  • FIG. 6 is a flow chart of the startup and interruption of a voice processing system that executes the voice processing method described in Embodiment 2.
  • When the voice processing system, i.e., the server, starts, the following operations are performed in advance:
  • S1: Load the recognition resources. The server uses the corresponding recognition resources during speech recognition; therefore, when the server starts, the recognition resources must be preloaded.
  • The recognition resources are stored on the hard disk, and the purpose of loading is to bring the resources stored on the hard disk into server memory, so as to avoid frequent access to the hard disk.
  • S2: Initialize the recognition instances, i.e., establish a set number of speech recognition instances and store them in the speech recognition instance table. Before any voice connection is attached, the table may store only the initialized speech recognition instances; however, once the server has attached voice connections, the table stores the correspondence between the globally unique identifier of each voice request and its speech recognition instance.
  • S3: Initialize the task queues. In this specific example the server contains M physical cores, and the initialized task queues comprise a queue of data packets waiting to be processed, a list of data packets being processed, a list of processed data packets, and a recognition instance table. After initialization, each task queue can store N entries, and the number of voice connections that can be processed concurrently is K (usually K = N).
  • S4: Create and start the worker threads, i.e., the voice recognition threads.
  • The server main thread creates one voice recognition thread for each physical core, that is, M voice recognition threads in total. Since the number of voice recognition threads equals the number of physical cores contained in the server, the voice recognition threads neither take turns on nor randomly compete for processor resources.
  • After the above operations, the recognition server can execute S5 to establish voice connections with clients. In this specific example, so that the resources of the recognition server can be fully used, each voice recognition thread created in S4 processes the voice data packets corresponding to the voice connections, i.e., executes S6.
  • One speech recognition request corresponds to one voice connection. In the concrete implementation, each voice request is given a globally unique identifier, i.e., a UID; that is, one voice connection corresponds to one UID, and each voice connection receives the data packets of one voice request.
  • A voice data packet contained in the voice request corresponding to a UID is pushed into a first-in, first-out queue of data packets waiting to be processed.
  • Pushing a new voice data packet into the queue of packets waiting to be processed requires four conditions to be met simultaneously. The first is that the queue is not full; the remaining three are:
  • The UID of the voice request to which the packet to be pushed belongs is not present in the queue of data packets waiting to be processed; if it is present, the previous voice data packet of the voice request corresponding to that UID has not yet been processed.
  • The UID of the voice request to which the packet to be pushed belongs is not present in the list of data packets being processed; if it is present, the previous voice packet of the voice request corresponding to that UID has not finished processing.
  • The UID of the voice request to which the packet to be pushed belongs is not present in the list of processed data packets; if it is present, the previous voice packet of the voice request corresponding to that UID has been processed but has not yet been taken away by the voice connection to which it belongs.
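The four push conditions above can be sketched as a small gate object. `PacketGate` and its member names are illustrative, and the three stages are modelled as plain Python containers rather than the patent's server-side storage spaces.

```python
from collections import deque

class PacketGate:
    """Sketch of the four push conditions: a packet for a UID may enter
    the waiting queue only if the queue is not full and the UID is
    absent from all three stages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.waiting = deque()     # queue of packets waiting to be processed
        self.processing = set()    # UIDs in the being-processed list
        self.processed = set()     # UIDs in the processed list

    def can_push(self, uid):
        waiting_uids = {u for u, _ in self.waiting}
        return (len(self.waiting) < self.capacity      # 1. queue not full
                and uid not in waiting_uids            # 2. not still waiting
                and uid not in self.processing         # 3. not in progress
                and uid not in self.processed)         # 4. result taken away

    def push(self, uid, packet):
        if not self.can_push(uid):
            return False           # connection must wait and retry
        self.waiting.append((uid, packet))
        return True
```

A failed `push` models the voice connection entering its waiting state until the previous packet has cleared all three stages.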
  • Step 1: When processing voice data packets, the voice recognition thread fetches one voice data packet from the queue of data packets waiting to be processed and processes it.
  • Step 2: Find the corresponding speech recognition instance according to the UID, carried in the voice data packet, of the voice request to which the packet belongs. If the packet is the first voice data packet of the voice request corresponding to that UID, an idle speech recognition instance is found and bound to the UID; a successful binding means the speech recognition instance serves only the voice request bound to it.
  • Step 3: After the voice data packet to be recognized and its corresponding speech recognition instance have been obtained, the UID carried by the packet is moved from the queue of data packets waiting to be processed to the list of data packets being processed, indicating that a voice packet of the request corresponding to that UID is being processed.
  • Step 4: After the voice recognition thread finishes processing the voice data packet corresponding to the UID, it moves the UID from the list of data packets being processed to the list of processed data packets, indicating that the voice packet has been processed, and places the recognition feedback in the feedback area of the voice packet.
  • The voice connection waiting for recognition feedback continuously scans the list of processed data packets. Once it finds its corresponding UID, it takes the recognition result away and removes the UID from the processed packet list. If the packet just processed is the last voice packet of a voice request, then besides taking the recognition result, the UID must be unbound from the previously bound speech recognition instance, and the status of that recognition instance is marked idle.
  • Each voice processing thread fetches a pending voice data packet from the queue of data packets waiting to be processed, moves the UID carried by the packet to the list of data packets being processed while processing it, and moves the UID to the list of processed data packets once the packet has been processed.
  • Pending voice data packets are fetched from the queue of data packets waiting to be processed for parallel speech recognition.
  • With the voice processing method provided by this specific example, on the one hand, because the number of voice recognition threads created equals the number of physical cores contained in the server, the voice recognition threads neither take turns on nor randomly compete for processor resources, which improves the throughput of the speech recognition system.
  • On the other hand, this specific example loosens the coupling between the server's computing resources and the speech recognition instances: a voice recognition thread does not serve only one voice request (i.e., one voice recognition thread processes the voice data packets requested over more than one voice connection), which further improves the throughput of the speech recognition system.
  • Referring to FIG. 8, a structural block diagram of a voice processing apparatus according to the third embodiment of the present invention is shown.
  • The voice processing apparatus of the embodiment of the present invention includes: an obtaining module 802, configured for the current voice recognition thread to acquire a voice data packet stored according to the set rule; a processing module 804, configured to invoke a speech recognition instance to perform voice recognition processing on the acquired voice data packet; a determining module 806, configured to determine, after processing of the acquired voice data packet is complete, whether other voice data packets remain to be processed; and a returning module 808, configured to, if so, return to the obtaining module to continue voice recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.
  • The current voice recognition thread does not provide voice processing service for only one voice connection. After the current voice data packet corresponding to a voice connection has been processed, i.e., when the current voice recognition thread is idle, the thread actively determines whether other voice data packets remain to be processed, and if so, directly fetches one pending voice data packet of another request for processing.
  • With the voice processing apparatus provided by the embodiment of the present invention, on the one hand, an idle voice recognition thread actively fetches other pending voice data packets for voice processing; therefore the thread never wastes resources, as in the prior art, by waiting for the next voice data packet corresponding to a particular voice connection.
  • On the other hand, when a voice recognition thread of the embodiment of the present invention is idle, it actively fetches other pending voice data packets, thereby ending the idle state of its corresponding physical core, so that the core's resources can be fully used. No switching between voice recognition threads occurs at any point, and all resources of the physical core corresponding to a voice recognition thread are used for processing voice data packets, which effectively avoids the consumption of physical-core resources by the rotation among voice recognition threads in the existing processing methods.
  • Furthermore, rotation between voice threads is driven by the processor according to its own clock cycle; that is, while one voice data packet is being processed, processing may be suspended for a while to handle the voice data packet of another voice recognition thread. Rotation between voice recognition threads therefore slows down the processing of an individual voice request.
  • In the embodiment of the present invention there is no rotation between voice threads, so the slowdown of individual voice request processing caused by voice recognition thread rotation can be avoided.
  • Referring to FIG. 9, a structural block diagram of a voice processing apparatus according to the fourth embodiment of the present invention is shown.
  • The embodiment of the present invention further optimizes the voice processing apparatus of the third embodiment.
  • The optimized voice processing apparatus includes: an obtaining module 902, configured for the current voice recognition thread to acquire a voice data packet stored according to the set rule; a processing module 904, configured to invoke a speech recognition instance to perform voice recognition processing on the acquired voice data packet; a determining module 906, configured to determine, after processing of the acquired voice data packet is complete, whether other voice data packets remain to be processed; and a returning module 908, configured to, if so, return to the obtaining module to continue voice recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.
  • Preferably, the processing module 904 includes: a speech recognition instance invoking module 9042, configured to invoke the speech recognition instance corresponding to the voice data packet; and a voice recognition module 9044, configured to perform voice recognition processing on the voice data packet with the invoked speech recognition instance and to push the processed voice data packet into the processed voice data packet storage space, from which it is retrieved by the voice connection to which the packet belongs.
  • Preferably, when the speech recognition instance invoking module 9042 invokes the speech recognition instance corresponding to the voice data packet, it determines whether a speech recognition instance corresponding to the voice connection to which the packet belongs is stored in the recognition instance list; if so, it directly invokes the speech recognition instance corresponding to that voice connection; if not, it invokes a speech recognition instance currently in the idle state from the speech recognition instance list and establishes a correspondence between the invoked instance and the voice connection to which the packet belongs, an idle speech recognition instance being one that has no established correspondence with any voice connection.
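The get-or-bind logic of the invoking module 9042 can be sketched as below; `InstancePool` and its method names are illustrative, with recognition instances modelled as opaque objects.

```python
class InstancePool:
    """Sketch of the instance-invoking logic: reuse the instance already
    bound to a connection's UID, otherwise bind an idle instance; an
    idle instance is one bound to no connection."""

    def __init__(self, instances):
        self.idle = list(instances)    # recognition instance list
        self.bound = {}                # uid -> instance

    def acquire(self, uid):
        if uid in self.bound:          # correspondence already stored
            return self.bound[uid]
        if not self.idle:
            return None                # no idle instance available
        inst = self.idle.pop()
        self.bound[uid] = inst         # establish the correspondence
        return inst

    def release(self, uid):
        # Unbind after the connection's last packet; status becomes idle.
        inst = self.bound.pop(uid, None)
        if inst is not None:
            self.idle.append(inst)
```

Releasing an instance after a connection's final packet is what makes it available for binding by other voice connections.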
  • Preferably, a voice data packet stored according to the set rule is stored in the following manner: the voice connection to which the packet belongs determines whether a voice data packet corresponding to this connection is stored in the to-be-processed voice data packet storage space, the currently-being-processed voice data packet storage space, or the processed voice data packet storage space; if none of them stores such a packet, the voice data packet is stored into the to-be-processed voice data packet storage space.
  • Preferably, the voice processing apparatus of the embodiment of the present invention further includes: an identifier determining module 910, configured to determine, after the voice recognition module 9044 pushes a processed voice data packet into the processed voice data packet queue, whether the packet contains an end identifier, the end identifier indicating that the current voice data packet is the last voice data packet corresponding to the voice connection to which it belongs; and a speech recognition instance release module 912, configured to, when the determination result of the identifier determining module is yes, release the speech recognition instance corresponding to the voice data packet and update the status of the released instance to idle.
  • Preferably, the voice processing apparatus of the embodiment of the present invention further includes: a creating module 914, configured so that, before the current voice recognition thread acquires, via the obtaining module 902, a voice data packet stored according to the set rule, the server main thread creates one voice recognition thread for each physical core according to the number of physical cores the server contains.
  • Preferably, the voice processing apparatus of the embodiment of the present invention further includes: a generating module 916, configured to generate an interrupt identifier when the server main thread receives an external interrupt request to the server; and a connection processing module 918, configured to, according to the interrupt identifier, continue voice recognition processing on the voice data packets corresponding to each voice connection already established in the server and interrupt such a connection only after all of its voice data packets have undergone voice recognition processing, and, for voice connections not established with the server, directly cancel the request to establish a connection with the server and establish no voice connection with the server.
  • The voice processing apparatus of this embodiment is used to implement the corresponding voice processing methods of the foregoing Embodiments 1 and 2, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
  • The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor may be used in practice to implement some or all of the functionality of some or all of the components in a server in accordance with an embodiment of the present invention.
  • the invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • FIG. 10 illustrates a server, such as an application server, that can implement the voice processing method in accordance with the present invention.
  • the server conventionally includes a processor 1010 and a computer program product or computer readable medium in the form of a memory 1020.
  • the memory 1020 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • The memory 1020 has a memory space 1030 for program code 1031 for executing any of the above method steps.
  • storage space 1030 for program code may include various program code 1031 for implementing various steps in the above methods, respectively.
  • Such program code may be read from, or written to, one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 11.
  • The storage unit may have storage segments, storage space, and the like arranged similarly to the memory 1020 in the server of FIG. 10.
  • the program code can be compressed, for example, in an appropriate form.
  • the storage unit includes computer readable code 1031', i.e., code that can be read by, for example, a processor such as 1010, which when executed by the server causes the server to perform various steps in the methods described above.

Abstract

A voice processing method and apparatus, wherein the method includes: a current voice recognition thread acquires a voice data packet stored according to a set rule (S102); a speech recognition instance is invoked to perform voice recognition processing on the acquired voice data packet (S104); after processing of the acquired voice data packet is complete, it is determined whether other voice data packets remain to be processed (S106); if so, the method returns to the step in which the current voice recognition thread acquires a voice data packet stored according to the set rule, and continues voice recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections. The method enables speech recognition server resources to be fully used for processing voice requests.

Description

Voice processing method and apparatus

This application claims priority to Chinese patent application No. 201510497684.5, filed with the Chinese Patent Office on August 12, 2015 and entitled "Voice processing method and apparatus", the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of voice processing technologies, and in particular to a voice processing method and apparatus.
Background

Internet-based speech recognition cloud services help users perform voice search, voice input, voice dialogue, and so on. As shown in FIG. 1, in these services the voice to be processed is transmitted from the user terminal over the Internet to the speech recognition server, and the recognition result from the server is returned along the original link to the user terminal and displayed to the user. The volume of user requests for Internet-based speech recognition services is huge and growing daily, making the already limited recognition resources ever more strained. For those skilled in the art, enabling speech recognition server resources to be fully used for processing voice requests, so that the server can recognize more voice requests per unit time, is therefore an urgent problem to be solved.

In the course of implementing the present invention, the inventor found the two existing solutions to be as follows:

First existing solution:

As shown in FIG. 2, in the speech recognition service one voice recognition thread is created for each physical core of the recognition server. To feed recognition results back to the user in real time and reduce the time the user waits for them, while the user is speaking, the voice to be recognized is split into voice data packets of 1 second of speech each and sent to the speech recognition server (request 1 packet 1, request 1 packet 2 in FIG. 2). In this solution a voice connection is established for the voice data packets coming from one client, and a voice recognition thread and a speech recognition instance are bound to that connection. In other words, one voice recognition thread serves only the voice data packets requested over one voice connection.

In fact, when a voice recognition thread processes a voice data packet of 1 second of speech, the real-time rate is usually between 0.6 and 0.9; that is, a packet holding 1 second of speech is recognized in 0.6 to 0.9 seconds. The thread thus recognizes faster than the user speaks. As a result, after the thread finishes one voice data packet requested over a connection, it sits idle while waiting for that connection's next packet. Those skilled in the art will appreciate that an idle voice thread means the speech recognition server's resources are not fully used, i.e., resource utilization is low.

Second existing solution:

Multiple voice recognition threads are created for each physical core of the speech recognition server, together with the same number of speech recognition instances; that is, one physical core of the recognition server must serve multiple voice recognition threads. For example, a recognition server with 12 physical cores may create 48 groups of voice connections, voice recognition threads, and speech recognition instances.

In the second existing solution, when the voice data packet currently handled by a voice recognition thread is finished, the physical core of that thread becomes idle, and other threads that have pending voice data packets grab the idle core. Because the instants at which the voice threads switch are uncontrolled, multiple threads often contend for one physical core, causing the core's processing time to rotate among the threads. Although the core is then never idle, part of its resources is consumed by the thread rotation and is still not used for processing voice requests. The second existing solution, which simply increases the number of voice-recognition thread groups, therefore still leaves the problem that speech recognition server resources cannot be fully used for processing voice requests.

It can thus be seen that neither existing scheme allows speech recognition server resources to be fully used for processing voice requests.
Summary of the invention

In view of the above problem that none of the existing schemes can make speech recognition server resources fully used for processing voice requests, the present invention is proposed to provide a voice processing method and apparatus that overcome the above problem or at least partially solve it.

According to one aspect of the present invention, a voice processing method is disclosed, including: a current voice recognition thread acquires a voice data packet stored according to a set rule; a speech recognition instance is invoked to perform voice recognition processing on the acquired voice data packet; after processing of the acquired voice data packet is complete, it is determined whether other voice data packets remain to be processed; if so, the method returns to the step in which the current voice recognition thread acquires a voice data packet stored according to the set rule, and continues voice recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.

According to another aspect of the present invention, a voice processing apparatus is also disclosed, including: an obtaining module for the current voice recognition thread to acquire a voice data packet stored according to the set rule; a processing module for invoking a speech recognition instance to perform voice recognition processing on the acquired voice data packet; a determining module for determining, after processing of the acquired voice data packet is complete, whether other voice data packets remain to be processed; and a returning module for, if so, returning to the obtaining module to continue voice recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.

According to yet another aspect of the present invention, a computer program is provided, including computer-readable code which, when run on a server, causes the server to execute the voice processing method described above.

According to still another aspect of the present invention, a computer-readable medium is provided, in which the above computer program is stored.

The beneficial effects of the present invention are:

With the voice processing scheme provided by the embodiments of the present invention, the current voice recognition thread does not provide voice processing service for only one voice connection: after the voice data packet of the current voice connection is processed, i.e., when the current voice recognition thread is idle, the thread actively determines whether voice data packets of other requests remain to be processed, and if so, directly fetches one pending packet of another request for processing. On the one hand, an idle voice recognition thread actively fetches other pending voice data packets for processing, so the thread never wastes resources, as in the prior art, by waiting for the next voice data packet of a particular voice connection. On the other hand, when idle, the thread actively fetches other pending packets and thereby ends the idle state of its physical core, so that the core's resources are fully used; no switching between voice recognition threads occurs at any point, and all resources of the physical core bound to a thread are spent on processing voice data packets, effectively avoiding the consumption of physical-core resources by the rotation among voice recognition threads in the existing schemes. Furthermore, rotation between voice threads is driven by the processor according to its own clock cycle; that is, while one voice data packet is being processed, processing may be suspended for a while to handle the voice data packet of another voice recognition thread, so rotation among voice recognition threads slows down the processing of an individual voice request. Since the embodiments of the present invention involve no rotation between voice threads, the slowdown of individual voice request processing caused by voice recognition thread rotation is avoided.

The above description is merely an overview of the technical solutions of the present invention. To make the technical means of the present invention clearer so that it can be implemented according to the contents of the specification, and to make the above and other objects, features, and advantages of the present invention more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an existing voice interaction cloud;

FIG. 2 is a schematic diagram of existing speech recognition interaction;

FIG. 3 is a flow chart of the steps of a voice processing method according to Embodiment 1 of the present invention;

FIG. 4 is a flow chart of the steps of a voice processing method according to Embodiment 2 of the present invention;

FIG. 5 shows a performance comparison, when processing voice requests, between the voice processing method shown in Embodiment 2 and the first existing voice processing method;

FIG. 6 is a schematic flow chart of the startup and interruption of a voice processing system executing the voice processing method described in Embodiment 2;

FIG. 7 is a schematic diagram of the life cycle of a voice data packet during voice processing;

FIG. 8 is a structural block diagram of a voice processing apparatus according to Embodiment 3 of the present invention;

FIG. 9 is a structural block diagram of a voice processing apparatus according to Embodiment 4 of the present invention. FIG. 10 schematically shows a block diagram of a server for executing the method according to the present invention; and

FIG. 11 schematically shows a storage unit for holding or carrying program code implementing the method according to the present invention.
Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1

Referring to FIG. 3, a flow chart of the steps of a voice processing method according to Embodiment 1 of the present invention is shown.

The voice processing method of the embodiment of the present invention includes the following steps:

Step S102: The current voice recognition thread acquires a voice data packet stored according to a set rule.

The set rule for storing voice data packets may be set by those skilled in the art according to actual needs, and the embodiment of the present invention imposes no specific limitation on it. For example: the next voice data packet requested by a voice connection is not stored before the previous packet requested by that connection has been fully processed. As another example: the next packet requested by the connection is stored only after the previous packet has been processed and a set time has elapsed.

The stored voice data packets may be kept in any appropriate manner. For example: a queue is set up in the recognition server, and the voice data packets are stored in it in order of storage time. As another example: a storage pool is set up, and the voice data packets to be stored are kept in it.

Step S104: A speech recognition instance is invoked to perform voice recognition processing on the acquired voice data packet.

During recognition, the voice recognition thread assigns one speech recognition instance to each successfully established voice connection, and each voice connection handles all the voice data packets requested by one voice request. A successfully established voice connection is one that has been assigned a voice request to process. The speech recognition instance holds the current recognition state and the recognition history of the voice request; through the history, the data of the packet currently being recognized can be semantically linked, in context, with the packets already processed, so that the current voice data packet is processed successfully.
Step S106: After processing of the acquired voice data packet is complete, determine whether other voice data packets remain to be processed; if so, return to step S102; if not, perform a set operation.

Those skilled in the art will appreciate that a server usually handles multiple voice connections simultaneously, and each voice connection corresponds to multiple voice data packets; hence in a concrete implementation multiple pending voice data packets exist in the recognition server at the same time.

In the embodiment of the present invention, after the current voice recognition thread finishes the voice data packet requested by the current voice connection, it actively fetches other pending voice data packets for processing until all pending packets are processed.

It should be noted that the set operation may be configured by those skilled in the art according to actual needs. For example: when it is determined that no pending voice data packet exists, the voice recognition thread is stopped and released.

With the voice processing method provided by the embodiment of the present invention, the current voice recognition thread does not provide voice processing service for only one voice connection: after the voice data packet of the current voice connection is processed, i.e., when the current voice recognition thread is idle, the thread actively determines whether voice data packets of other requests remain to be processed, and if so, directly fetches one pending packet of another request for processing. On the one hand, an idle voice recognition thread actively fetches other pending voice data packets for processing, so the thread never wastes resources, as in the prior art, by waiting for the next voice data packet of a particular voice connection. On the other hand, when idle, the thread actively fetches other pending packets and thereby ends the idle state of its physical core, so that the core's resources are fully used; no switching between voice recognition threads occurs at any point, and all resources of the physical core bound to a thread are spent on processing voice data packets, effectively avoiding the consumption of physical-core resources by the rotation among voice recognition threads in the existing processing methods. Furthermore, rotation between voice threads is driven by the processor according to its own clock cycle; that is, while one voice data packet is being processed, processing may be suspended for a while to handle the voice data packet of another voice recognition thread, so rotation among voice recognition threads slows down the processing of an individual voice request. Since the embodiment of the present invention involves no rotation between voice threads, the slowdown of individual voice request processing caused by voice recognition thread rotation is avoided.
Embodiment 2

Referring to FIG. 4, a flow chart of the steps of a voice processing method according to Embodiment 2 of the present invention is shown.

The voice processing method of the embodiment of the present invention specifically includes the following steps:

Step S202: The server main thread creates one voice recognition thread for each physical core according to the number of physical cores the server contains.

For example: the server contains 12 physical cores; after the voice processing system starts, the server main thread creates one corresponding voice recognition thread for each of the 12 physical cores. To raise the concurrency of voice request processing, more than 12 voice connections and speech recognition instances may be established, so that the server can process more than 12 voice requests at the same time, each voice connection handling all the voice data packets requested over that connection. When a voice recognition thread has recognized one voice data packet, it automatically fetches another pending packet and invokes the speech recognition instance corresponding to the fetched packet for voice recognition processing.

In the embodiment of the present invention, the voice recognition threads process voice data packets in parallel and all process them in the same way; therefore only the processing of voice data packets by one voice recognition thread is described here as an example. For the processing flow of the other created threads, refer to this embodiment; it is not repeated.

In the embodiment of the present invention, because the number of voice recognition threads equals the number of physical cores contained in the server, the voice recognition threads neither take turns on nor randomly compete for processor resources.

In the embodiment of the present invention one voice recognition thread is created for each physical core; each voice recognition thread can establish a voice connection for one connection request, and a successfully established voice connection handles the voice data packets sent by its corresponding connection request.
Step S204: The current voice recognition thread acquires a voice data packet stored according to the set rule.

Because the server must simultaneously process the voice data packets requested over multiple voice connections, multiple pending voice data packets are stored in the server, and these packets must follow the set rule when stored into the server.

A preferred way of storing voice data packets according to the set rule is as follows:

S1: The voice connection to which a voice data packet belongs determines whether a voice data packet of this connection is stored in the to-be-processed voice data packet storage space, the currently-being-processed voice data packet storage space, or the processed voice data packet storage space;

Three storage spaces may be preset in the server: the to-be-processed voice data packet storage space, the being-processed voice data packet storage space, and the processed voice data packet storage space. Each of the three may be a storage queue or a storage pool.

Pending voice data packets are stored in the to-be-processed storage space. An idle voice recognition thread fetches a pending packet from this space, stores it into the currently-being-processed storage space, and performs voice processing on it. When the packet has been processed, the thread stores it into the processed storage space. The voice connection to which the packet belongs scans the processed storage space and takes the recognition result of the finished packet. At that point, if the next packet has already arrived, the connection stores the next packet into the to-be-processed storage space.

S2: If none of them stores a voice data packet of this connection, the voice data packet is stored into the to-be-processed voice data packet storage space.

When storing a pending voice data packet into the server, the voice connection to which the packet belongs must ensure that the previous packet it requested has been completely processed before requesting processing of the next one. Hence the next voice data packet is stored into the server only when none of the three storage spaces holds a packet of this connection.
Step S206: The current voice recognition thread invokes the speech recognition instance corresponding to the voice data packet.

During server initialization, the speech recognition instances are initialized; the initialized instances may be stored in a preset recognition instance list, and when one must be invoked during recognition, it is invoked from the preset list.

A preferred method of invoking the speech recognition instance corresponding to a voice data packet is as follows:

S1: Determine whether a speech recognition instance corresponding to the voice connection to which the packet belongs is stored in the recognition instance list; if so, execute S2; if not, execute S3.

Each voice connection established with the server is given a globally unique identifier by the server; the identifier may be a UID, i.e., a unique identification code. When a voice connection sends a voice data packet for the first time, the voice recognition thread assigns a speech recognition instance to the connection and establishes a correspondence between the instance's identifier and the connection's globally unique identifier.

When the connection sends its next voice data packet, the packet carries the globally unique identifier of the voice connection it belongs to. Through the unique identifier carried in the packet, the voice processing thread determines the connection to which the packet belongs; then, through the established correspondence between speech recognition instance identifiers and the global unique identifiers of voice connections, it determines the speech recognition instance corresponding to the connection to which the packet belongs.

Of course, if the current voice data packet is the first packet sent by the connection it belongs to, the voice recognition thread has not yet assigned a speech recognition instance to that connection, so no corresponding speech recognition instance exists in the recognition instance list.

It should be noted that, since each voice connection corresponds to only one voice request, the voice connection and the voice request to which a packet belongs may equally be identified by assigning a globally unique identifier to each voice request.

S2: If it exists, directly invoke the speech recognition instance corresponding to the voice connection.

If the recognition instance list stores a speech recognition instance corresponding to the connection to which the packet belongs, the current packet is proven not to be the first packet sent over that connection; in this case, the instance already assigned to the connection is simply invoked to perform voice recognition on the current packet.

S3: If it does not exist, invoke a speech recognition instance whose current status is idle from the speech recognition instance list, and establish a correspondence between the invoked instance and the voice connection to which the packet belongs.

An idle speech recognition instance is a speech recognition instance that has no established correspondence with any voice connection.
Step S208: The current voice recognition thread performs voice recognition processing on the voice data packet with the invoked speech recognition instance, and pushes the processed packet into the processed voice data packet storage space.

In this step, the current voice recognition thread pushes the processed voice data packet into the processed storage space so that the voice connection to which the packet belongs can retrieve it.

The voice connection to which the packet belongs scans the processed storage space according to the set rule to determine whether the packet it requested has been processed; when it finds the packet in that space, processing is proven finished, and the connection can take away the recognition result of the processed packet.

Of course, in a concrete implementation a voice data packet list may be set up in the processed storage space, and the UID of the voice connection to which a finished packet belongs is added to that list. By scanning the list and checking whether it contains its UID, a voice connection determines whether the packet it requested has been processed; if its UID is in the list, the packet is proven finished, and the recognition result of the processed packet can be taken away.
Step S210: The current voice recognition thread determines whether the voice data packet contains an end identifier; if it does, execute step S212; if not, perform a set operation.

The end identifier indicates that the current voice data packet is the last voice data packet of the voice connection to which it belongs.

The purpose of this check is to determine whether all the voice data packets requested over the voice connection have been processed; if they have, the speech recognition instance corresponding to the connection can be released for use by other voice connections.

The set operation may be configured by those skilled in the art according to actual needs, and this embodiment imposes no specific limitation on it; for example, it may be configured to directly fetch other pending voice data packets.

The voice data requested over one voice connection is split into multiple voice data packets, and the last packet carries the end identifier. For example: the voice data requested over the current connection is split into six voice data packets in total; each of the first five carries an identifier indicating that it is not the last packet (e.g., the identifiers of the first to fifth packets are set to 1, 2, 3, 4, and 5), while the sixth packet, being the last one, carries an end identifier indicating that the current packet is the last voice data packet (e.g., the end identifier is set to -6).

The above is merely one way, positive and negative numbering of the packets, to determine whether a packet is the last one. In a concrete implementation, those skilled in the art may set the end identifier in any appropriate way. For example: only the last packet carries a set identifier, and when a packet carries it, the packet is determined to be the last one requested over its connection. As another example: every packet other than the last one carries the identifier 1, the last packet carries the identifier 0, and 0 is defined as the end identifier.
Step S212: If the voice data packet contains the end identifier, the voice recognition thread releases the speech recognition instance corresponding to the packet and updates the status of the released instance to idle.

Once the status of a speech recognition instance is updated to idle, the instance can be bound by other voice connections.

Step S214: After processing of the acquired voice data packet is complete, the voice recognition thread determines whether other voice data packets remain to be processed; if so, return to step S204; if not, perform a set operation.

Those skilled in the art will appreciate that a server usually handles multiple voice connection requests simultaneously, and each voice connection request corresponds to multiple voice data packets; hence in a concrete implementation multiple pending packets exist at the same time in the server's to-be-processed voice data packet storage space. In the embodiment of the present invention, after the current voice recognition thread finishes the packet requested by the current voice connection, it actively fetches other pending packets for processing until all pending packets are processed.

The set operation may be configured by those skilled in the art according to actual needs, for example: when it is determined that no pending voice data packet exists, the voice recognition thread is stopped and released.
FIG. 5 shows a performance comparison, when processing voice requests, between the voice processing method shown in this embodiment and the first existing voice processing method (one voice recognition thread is created per physical core of the server and bound to a speech recognition instance, and each thread handles only the voice request corresponding to one voice connection).

The comparison chart was generated from statistics of stress-test data obtained by pushing the same batch of data into the two voice processing systems; in both systems under test the server contains 12 physical cores.

The abscissa is the maximum recognition concurrency, i.e., the number of voice connections the server processes simultaneously; the ordinate is the speech recognition real-time rate, i.e., the time a voice thread takes to process the voice data packets corresponding to 1 second of speech. The real-time rate reflects the server's performance in processing voice requests.

In FIG. 5, the line of dots is the performance curve when voice requests are processed with the voice processing method of this embodiment; the line of squares is the performance curve of the first existing, thread-bound, voice processing method. As FIG. 5 shows, as the maximum recognition concurrency increases, the speech recognition real-time rate of both systems rises. When the maximum concurrency exceeds 24, the advantage of the present invention becomes apparent: although the real-time rate of the invention also rises, the upward trend is slight. The voice processing method provided by the embodiment of the present invention can thus improve the throughput of the voice processing system.
在具体实现过程中,通常会面临对服务中存储的语音识别的语音模型进行更新的问题,当需要对服务器中用于进行语音识别的语音模型进行更新时,则需要对服务器进行外部中断。现有的技术方案在处理该中断时,则直接将服务器连接的各语音连接中断,影响用户的使用体验。本发明实施例中的语音处理方法,当服务器主线程接收到对服务器的外部中断请求后,会维持已经建立语音连接并继续处理这些已经建立的语音连接所请求处理的语音数据包,直至语音连接所请求处理的全部语音数据包处理完毕后,再中断该语音连接。而对于未建立的语音连接,则不再接受其连接请求。
具体实现如下:
当服务器主线程接收到对服务器的外部中断请求时,生成中断标识;根据中断标识,针对服务器中已建立的语音连接,继续对语音连接对应的语音数据包进行语音识别处理,待语音连接对应的全部语音数据包进行语音识别处理后,中断语音连接;针对未与服务器建立连接的语音连接,直接取消与服务器建立连接的请求,不再建立于服务器之间的语音连接。
With this scheme for handling external interrupt requests provided by this embodiment of the present invention, users whose voice connections are already established continue to be served until all voice data packets requested through the connection have been processed. From the user's perspective the speech recognition request is unaffected and is never cut off abruptly, so the user experience is improved.
With the voice processing method provided by this embodiment of the present invention, the current speech recognition thread does not serve only a single voice connection: once the voice data packets of the current connection are finished, i.e. once the thread is idle, it proactively checks whether pending voice data packets of other requests exist and, if so, directly fetches one for processing. First, because an idle speech recognition thread proactively fetches other pending voice data packets, the resource waste of the prior art, in which a thread waits for the next packet of its bound voice connection, does not arise. Second, in this embodiment an idle speech recognition thread ends the idle state of its physical core by proactively fetching pending packets, so the core's resources are fully utilized; since no switching between speech recognition threads occurs at any point, all resources of the core are devoted to processing voice data packets, which effectively avoids the consumption of core resources by thread rotation found in existing processing methods. Third, rotation between speech threads is driven by the processor's own clock cycles: while one voice data packet is being processed, processing may be suspended for a while to handle a packet of another speech recognition thread, so rotation slows down the processing of an individual voice request. In this embodiment there is no rotation between speech threads, so the slowdown of individual voice requests caused by thread rotation is avoided.
A specific example of the voice processing method of this embodiment of the present invention is described below with reference to FIG. 6.
FIG. 6 is a schematic flowchart of the startup and interruption of a voice processing system that performs the voice processing method described in Embodiment 2.
As shown in FIG. 6, the voice processing system, i.e. the server, performs the following operations in advance at startup:
S1: Load recognition resources.
The server uses the corresponding recognition resources during speech recognition processing, so the recognition resources must be loaded in advance at server startup. The recognition resources are stored on the hard disk; loading copies them into server memory so as to avoid frequent disk access.
S2: Initialize recognition instances.
Initializing recognition instances means creating a preset number of speech recognition instances and storing them in a speech recognition instance table. Before any voice connection is made, the table may store only the initialized instances; once the server has accepted voice connections, the table also stores the correspondence between the globally unique identifiers of voice requests and the speech recognition instances.
S3: Initialize task queues.
The subsequent flow is described for an example in which the server contains M physical cores; the initialized task queues comprise a queue of packets awaiting processing, a list of packets being processed, a list of finished packets, and the recognition instance table; each initialized queue can store N entries; and K voice connections can be processed concurrently (typically K = N).
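The task structures of step S3 can be sketched as plain in-memory containers. This is an illustrative assumption about representation (a FIFO deque plus dictionaries keyed by UID); the key names are invented for the sketch.

```python
from collections import deque

def init_task_queues(n: int) -> dict:
    # Task structures from step S3: a FIFO queue of packets awaiting
    # processing, a list of packets being processed, a list of finished
    # packets, and the recognition instance table; capacity N each.
    return {
        "capacity": n,
        "waiting": deque(),    # FIFO queue of (uid, packet) entries
        "processing": {},      # uid -> packet currently being recognised
        "finished": {},        # uid -> recognition result awaiting pickup
        "instances": {},       # uid -> bound speech recognition instance
    }
```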
S4: Create and start worker threads.
The worker threads are the speech recognition threads.
Since the server in this example contains M physical cores, the server main thread creates one speech recognition thread per physical core, i.e. M speech recognition threads in total. Because the number of speech recognition threads equals the number of the server's logical cores, i.e. the number of physical cores the server contains, the speech recognition threads do not take turns on, or arbitrarily contend for, processor resources.
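The one-thread-per-core creation in S4 can be sketched as follows. Note that `os.cpu_count()` reports logical cores on many platforms, whereas the text counts physical cores; it is used here only for illustration, and the function name is an assumption.

```python
import os
import threading

def start_workers(worker_fn):
    # Create exactly one speech recognition thread per core reported by
    # the OS, so threads need not rotate on or contend for processor
    # resources. (os.cpu_count() may count logical, not physical, cores.)
    n_cores = os.cpu_count() or 1
    threads = [threading.Thread(target=worker_fn, daemon=True)
               for _ in range(n_cores)]
    for t in threads:
        t.start()
    return threads
```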
After the above operations, the recognition server can perform S5 to establish voice connections with clients. In this example, so that the recognition server's resources can be fully utilized, each of the speech recognition threads created in S4 processes the voice data packets corresponding to the voice connections, i.e. performs S6.
S6: Process speech recognition requests.
The processing of a single voice data packet of one speech recognition request is described below with reference to FIG. 3.
As can be seen from FIG. 3, one speech recognition request corresponds to one voice connection. In a specific implementation, each voice request is assigned a globally unique identifier, a UID; that is, one voice connection corresponds to one UID, and each voice connection receives the data packets of one voice request. A voice data packet of the voice request corresponding to a UID is pushed into a first-in-first-out queue of packets awaiting processing.
Pushing a new voice data packet into the waiting queue requires all four of the following conditions to be met:
1. The queue is not full.
2. The UID of the voice request to which the packet to be pushed belongs is not present in the waiting queue; if it is, the previous packet of that UID's voice request has not yet been processed.
3. The UID of the voice request to which the packet to be pushed belongs is not present in the list of packets being processed; if it is, the previous packet of that UID's voice request has not yet finished being processed.
4. The UID of the voice request to which the packet to be pushed belongs is not present in the list of finished packets; if it is, the previous packet of that UID's voice request has been processed but has not yet been collected by the voice connection it belongs to.
If any one of these four conditions is not met, the voice connection enters a waiting state until the conditions are met, and only then pushes the next voice data packet of the voice request it serves. A voice data packet of a speech recognition request is processed as follows:
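The four admission conditions above can be checked in one predicate. This is a sketch under the representation assumed earlier (a deque of `(uid, packet)` tuples and UID-keyed dictionaries); the function name is invented for illustration.

```python
from collections import deque

def can_enqueue(uid, waiting: deque, processing: dict, finished: dict,
                capacity: int) -> bool:
    # All four conditions from the text: the queue is not full, and the
    # UID is absent from the waiting queue, the in-progress list, and
    # the finished list (its previous packet is fully dealt with).
    return (len(waiting) < capacity
            and all(pkt_uid != uid for pkt_uid, _ in waiting)
            and uid not in processing
            and uid not in finished)
```

If the predicate is false, the connection waits and retries before pushing its next packet.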
Step 1: To process a voice data packet, a speech recognition thread fetches one packet from the queue of packets awaiting processing.
Step 2: Using the UID of the voice request carried in the voice data packet, the corresponding speech recognition instance is looked up in the recognition instance table.
If the UID carried in the voice data packet is not yet bound to any speech recognition instance, the packet is the first packet requested by the voice request corresponding to that UID; in that case an idle speech recognition instance is found and bound to the UID. A successful binding means that the speech recognition instance serves only the bound voice request.
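The lookup-or-bind logic of step 2 can be sketched as follows; the function name and the idle-pool representation are assumptions made for the sketch.

```python
def get_instance(uid, instance_table: dict, idle_pool: list):
    # Return the instance already bound to this UID; otherwise this is
    # the request's first packet, so bind an idle instance to the UID.
    if uid in instance_table:
        return instance_table[uid]
    if not idle_pool:
        return None  # no idle recognition instance available
    instance = idle_pool.pop()
    instance_table[uid] = instance
    return instance
```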
Step 3: After the voice data packet to be recognized and its corresponding speech recognition instance have been obtained, the UID carried by the packet is moved from the waiting queue to the list of packets being processed.
Moving the UID from the waiting queue to the in-progress list indicates that a voice data packet requested by the request corresponding to that UID is being processed.
Step 4: After the speech recognition thread has processed the voice data packet corresponding to that UID, it moves the UID from the in-progress list to the list of finished packets.
Moving the UID from the in-progress list to the finished list indicates that the voice data packet has been processed, and the recognition feedback is placed in the packet's feedback area. A voice connection waiting for recognition feedback continuously scans the finished list; as soon as it finds its UID, it takes the recognition result and removes the UID from the finished list. If the packet being handled is the last packet of a voice request, then in addition to taking the recognition result, the UID must be unbound from the previously bound speech recognition instance, marking that instance's state as idle.
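The end of a packet's lifecycle in step 4 — moving the UID to the finished list and, on the last packet, unbinding the instance — can be sketched as one function. Names and the dictionary representation are assumptions carried over from the earlier sketches.

```python
def finish_packet(uid, processing: dict, finished: dict,
                  instance_table: dict, idle_pool: list,
                  result, is_last: bool):
    # Move the UID from the in-progress list to the finished list and
    # attach the recognition result; on the request's last packet, also
    # unbind the recognition instance so it returns to the idle pool.
    del processing[uid]
    finished[uid] = result
    if is_last:
        idle_pool.append(instance_table.pop(uid))
```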
It should be noted that the above describes only a single voice processing thread handling one voice data packet. In a specific implementation, every voice processing thread fetches pending voice data packets from the waiting queue for processing, moves the UID carried by a packet being processed into the in-progress list, and moves the UID carried by a finished packet into the finished list. Whenever a voice processing thread becomes idle, it fetches another pending voice data packet from the waiting queue, thereby achieving parallel speech recognition.
When the speech model used for speech recognition on the server needs to be updated, the server must be externally interrupted. In this example, after the external interrupt request of S7 is received, S8 is performed.
S8: Close the service connections.
In this example, before the service connections are closed, all established speech recognition tasks must be completed and their final recognition results returned (i.e., all voice data packets requested through the established voice connections must be processed); only then are all voice connections closed, the speech recognition instances released, the global model resources released, and the process exited. Specifically, after all established speech recognition tasks are completed, the following are performed in sequence: S9, stop and release the speech recognition threads; S10, release the task queues; S11, release the speech recognition instances; S12, release the recognition resources.
With the voice processing method provided by this example, first, since the number of speech recognition threads created equals the number of physical cores the server contains, the speech recognition threads do not take turns on, or arbitrarily contend for, processor resources, which increases the throughput of the speech recognition system. Second, this example decouples the server's computing resources from the speech recognition instances: one speech recognition thread serves more than one voice request (i.e., one speech recognition thread processes the voice data packets requested through more than one voice connection), which also increases the throughput of the speech recognition system.
Embodiment 3
Referring to FIG. 8, a structural block diagram of a voice processing apparatus according to Embodiment 3 of the present invention is shown.
The voice processing apparatus of this embodiment of the present invention comprises: an acquisition module 802 for the current speech recognition thread to acquire a voice data packet stored according to a preset rule; a processing module 804 for invoking a speech recognition instance to perform speech recognition processing on the acquired voice data packet; a determination module 806 for determining, after the acquired voice data packet has been processed, whether other pending voice data packets exist; and a return module 808 for returning, if they exist, to the acquisition module to continue speech recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.
With the voice processing apparatus provided by this embodiment of the present invention, the current speech recognition thread does not serve only a single voice connection: once the voice data packets of the current connection are finished, i.e. once the thread is idle, it proactively checks whether pending voice data packets of other requests exist and, if so, directly fetches one for processing. First, because an idle speech recognition thread proactively fetches other pending voice data packets, the resource waste of the prior art, in which a thread waits for the next packet of its bound voice connection, does not arise. Second, in this embodiment an idle speech recognition thread ends the idle state of its physical core by proactively fetching pending packets, so the core's resources are fully utilized; since no switching between speech recognition threads occurs at any point, all resources of the core are devoted to processing voice data packets, which effectively avoids the consumption of core resources by thread rotation found in existing processing methods. Third, rotation between speech threads is driven by the processor's own clock cycles: while one voice data packet is being processed, processing may be suspended for a while to handle a packet of another speech recognition thread, so rotation slows down the processing of an individual voice request. In this embodiment there is no rotation between speech threads, so the slowdown of individual voice requests caused by thread rotation is avoided.
Embodiment 4
Referring to FIG. 9, a structural block diagram of a voice processing apparatus according to Embodiment 4 of the present invention is shown.
This embodiment of the present invention further optimizes the voice processing apparatus of Embodiment 3. The optimized apparatus comprises: an acquisition module 902 for the current speech recognition thread to acquire a voice data packet stored according to a preset rule; a processing module 904 for invoking a speech recognition instance to perform speech recognition processing on the acquired voice data packet; a determination module 906 for determining, after the acquired voice data packet has been processed, whether other pending voice data packets exist; and a return module 908 for returning, if they exist, to the acquisition module to continue speech recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.
Preferably, the processing module 904 comprises: a speech recognition instance invoking module 9042 for invoking the speech recognition instance corresponding to the voice data packet; and a speech recognition module 9044 for performing speech recognition processing on the voice data packet with the invoked speech recognition instance and pushing the processed voice data packet into the storage space for finished voice data packets, where it can be collected by the voice connection to which the voice data packet belongs.
Preferably, when the speech recognition instance invoking module 9042 invokes the speech recognition instance corresponding to the voice data packet, it determines whether the recognition instance list stores a speech recognition instance corresponding to the voice connection to which the packet belongs; if so, it directly invokes the speech recognition instance corresponding to the voice connection; if not, it invokes a speech recognition instance whose current state is idle from the speech recognition instance list and establishes a correspondence between the invoked instance and the voice connection to which the packet belongs, wherein an idle speech recognition instance is one that has no established correspondence with any voice connection.
Preferably, the voice data packets stored according to the preset rule are stored as follows: the voice connection to which a voice data packet belongs determines whether the storage space for pending voice data packets, the storage space for voice data packets currently being processed, or the storage space for finished voice data packets stores a voice data packet corresponding to this voice connection; if none of them does, the voice data packet is stored into the storage space for pending voice data packets.
Preferably, the voice processing apparatus of this embodiment of the present invention further comprises: a flag determination module 910 for determining, after the speech recognition module 9044 has pushed the processed voice data packet into the queue of finished voice data packets, whether the packet contains an end flag, wherein the end flag indicates that the current voice data packet is the last voice data packet of the voice connection to which it belongs; and a speech recognition instance release module 912 for releasing, if the determination result of the flag determination module is yes, the speech recognition instance corresponding to the packet and updating the state of the released instance to idle.
Preferably, the voice processing apparatus of this embodiment of the present invention further comprises: a creation module 914 for the server main thread to create, before the acquisition module 902 lets the current speech recognition thread acquire a voice data packet stored according to the preset rule, one speech recognition thread for each physical core according to the number of physical cores the server contains.
Preferably, the voice processing apparatus of this embodiment of the present invention further comprises: a generation module 916 for generating an interrupt flag when the server main thread receives an external interrupt request for the server; and a connection processing module 918 for, according to the interrupt flag, continuing speech recognition processing on the voice data packets corresponding to the voice connections already established with the server and interrupting each such connection after all of its voice data packets have undergone speech recognition processing, and for, with respect to voice connections not established with the server, directly cancelling the requests to establish a connection with the server and establishing no voice connection with the server.
The voice processing apparatus of this embodiment is used to implement the corresponding voice processing methods of Embodiments 1 and 2 and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of a server according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 10 shows a server, such as an application server, on which the voice processing method according to the present invention can be implemented. The server conventionally comprises a processor 1010 and a computer program product or computer-readable medium in the form of a memory 1020. The memory 1020 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 1020 has a storage space 1030 for program code 1031 for performing any of the method steps described above. For example, the storage space 1030 for program code may comprise individual program codes 1031 for implementing the various steps of the above methods. The program code may be read from, or written into, one or more computer program products. Such computer program products comprise program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 11. The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 1020 of the server of FIG. 10. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit comprises computer-readable code 1031', i.e. code readable by a processor such as the processor 1010, which, when run by the server, causes the server to perform the steps of the methods described above.
References herein to "one embodiment", "an embodiment", or "one or more embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Note also that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
Numerous specific details are set forth in the description provided here. However, it is understood that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.
Furthermore, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, rather than to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is illustrative rather than restrictive of the scope of the present invention, which is defined by the appended claims.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (16)

  1. A voice processing method, comprising:
    acquiring, by a current speech recognition thread, a voice data packet stored according to a preset rule;
    invoking a speech recognition instance to perform speech recognition processing on the acquired voice data packet;
    after the acquired voice data packet has been processed, determining whether other pending voice data packets exist;
    if they exist, returning to the step of acquiring, by the current speech recognition thread, a voice data packet stored according to the preset rule, and continuing speech recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.
  2. The method according to claim 1, wherein the step of invoking a speech recognition instance to perform speech recognition processing on the acquired voice data packet comprises:
    invoking the speech recognition instance corresponding to the voice data packet;
    performing speech recognition processing on the voice data packet with the invoked speech recognition instance, and pushing the processed voice data packet into a storage space for finished voice data packets, where it can be collected by the voice connection to which the voice data packet belongs.
  3. The method according to claim 2, wherein the step of invoking the speech recognition instance corresponding to the voice data packet comprises:
    determining whether a recognition instance list stores a speech recognition instance corresponding to the voice connection to which the voice data packet belongs;
    if it does, directly invoking the speech recognition instance corresponding to the voice connection;
    if it does not, invoking a speech recognition instance whose current state is idle from the speech recognition instance list, and establishing a correspondence between the invoked speech recognition instance and the voice connection to which the voice data packet belongs, wherein an idle speech recognition instance is a speech recognition instance that has no established correspondence with any voice connection.
  4. The method according to claim 1, wherein voice data packets stored according to the preset rule are stored as follows:
    the voice connection to which a voice data packet belongs determines whether a storage space for pending voice data packets, a storage space for voice data packets currently being processed, or a storage space for finished voice data packets stores a voice data packet corresponding to this voice connection;
    if none of them stores a voice data packet corresponding to this voice connection, storing the voice data packet into the storage space for pending voice data packets.
  5. The method according to claim 2, wherein after the step of pushing the processed voice data packet into the queue of finished voice data packets, the method further comprises:
    determining whether the voice data packet contains an end flag, wherein the end flag indicates that the current voice data packet is the last voice data packet of the voice connection to which it belongs;
    if it does, releasing the speech recognition instance corresponding to the voice data packet, and updating the state of the released speech recognition instance to idle.
  6. The method according to claim 1, wherein before the step of acquiring, by the current speech recognition thread, a voice data packet stored according to the preset rule, the method further comprises:
    creating, by a server main thread, one speech recognition thread for each physical core according to the number of physical cores the server contains.
  7. The method according to claim 1, further comprising:
    generating an interrupt flag when the server main thread receives an external interrupt request for the server;
    according to the interrupt flag, for voice connections already established with the server, continuing speech recognition processing on the voice data packets corresponding to the voice connections, and interrupting each voice connection after all of its voice data packets have undergone speech recognition processing;
    for voice connections not established with the server, directly cancelling the requests to establish a connection with the server, and establishing no voice connection with the server.
  8. A voice processing apparatus, comprising:
    an acquisition module for a current speech recognition thread to acquire a voice data packet stored according to a preset rule;
    a processing module for invoking a speech recognition instance to perform speech recognition processing on the acquired voice data packet;
    a determination module for determining, after the acquired voice data packet has been processed, whether other pending voice data packets exist;
    a return module for returning, if they exist, to the acquisition module to continue speech recognition processing on the other voice data packets, wherein the acquired voice data packet and the other voice data packets belong to different voice connections.
  9. The apparatus according to claim 8, wherein the processing module comprises:
    a speech recognition instance invoking module for invoking the speech recognition instance corresponding to the voice data packet;
    a speech recognition module for performing speech recognition processing on the voice data packet with the invoked speech recognition instance, and pushing the processed voice data packet into a storage space for finished voice data packets, where it can be collected by the voice connection to which the voice data packet belongs.
  10. The apparatus according to claim 9, wherein, when the speech recognition instance invoking module invokes the speech recognition instance corresponding to the voice data packet, it:
    determines whether a recognition instance list stores a speech recognition instance corresponding to the voice connection to which the voice data packet belongs;
    if it does, directly invokes the speech recognition instance corresponding to the voice connection;
    if it does not, invokes a speech recognition instance whose current state is idle from the speech recognition instance list, and establishes a correspondence between the invoked speech recognition instance and the voice connection to which the voice data packet belongs, wherein an idle speech recognition instance is a speech recognition instance that has no established correspondence with any voice connection.
  11. The apparatus according to claim 8, wherein voice data packets stored according to the preset rule are stored as follows:
    the voice connection to which a voice data packet belongs determines whether a storage space for pending voice data packets, a storage space for voice data packets currently being processed, or a storage space for finished voice data packets stores a voice data packet corresponding to this voice connection;
    if none of them stores a voice data packet corresponding to this voice connection, the voice data packet is stored into the storage space for pending voice data packets.
  12. The apparatus according to claim 9, further comprising:
    a flag determination module for determining, after the speech recognition module has pushed the processed voice data packet into the queue of finished voice data packets, whether the voice data packet contains an end flag, wherein the end flag indicates that the current voice data packet is the last voice data packet of the voice connection to which it belongs;
    a speech recognition instance release module for releasing, if the determination result of the flag determination module is yes, the speech recognition instance corresponding to the voice data packet, and updating the state of the released speech recognition instance to idle.
  13. The apparatus according to claim 8, further comprising:
    a creation module for a server main thread to create, before the acquisition module lets the current speech recognition thread acquire a voice data packet stored according to the preset rule, one speech recognition thread for each physical core according to the number of physical cores the server contains.
  14. The apparatus according to claim 8, further comprising:
    a generation module for generating an interrupt flag when the server main thread receives an external interrupt request for the server;
    a connection processing module for, according to the interrupt flag, continuing speech recognition processing on the voice data packets corresponding to the voice connections already established with the server, and interrupting each voice connection after all of its voice data packets have undergone speech recognition processing;
    and for, with respect to voice connections not established with the server, directly cancelling the requests to establish a connection with the server, and establishing no voice connection with the server.
  15. A computer program comprising computer-readable code which, when run on a server, causes the server to perform the voice processing method according to any one of claims 1-7.
  16. A computer-readable medium storing the computer program according to claim 15.
PCT/CN2016/088896 WO2017024908A1 (zh) 2015-08-12 2016-07-06 Voice processing method and device

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/243,839 US20170047069A1 (en) 2015-08-12 2016-08-22 Voice processing method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510497684.5A CN105702257A (zh) 2015-08-12 2015-08-12 Voice processing method and device
CN201510497684.5 2015-08-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/243,839 Continuation US20170047069A1 (en) 2015-08-12 2016-08-22 Voice processing method and device

Publications (1)

Publication Number Publication Date
WO2017024908A1 true WO2017024908A1 (zh) 2017-02-16



Also Published As

Publication number Publication date
CN105702257A (zh) 2016-06-22
US20170047069A1 (en) 2017-02-16

