CN113923177B - Voice processing system, method and device for instant messaging - Google Patents

Voice processing system, method and device for instant messaging

Info

Publication number: CN113923177B
Application number: CN202111165023.4A
Authority: CN (China)
Prior art keywords: voice, message, voice data, message body, data
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113923177A (en)
Inventors: 魏恒瑞, 覃建策, 金亮
Assignee: Perfect World Beijing Software Technology Development Co Ltd (original assignee; the listed assignees may be inaccurate)
Application filed by Perfect World Beijing Software Technology Development Co Ltd; CN113923177A published on application, CN113923177B published on grant

Classifications

    • H04L51/04 — Real-time or near real-time messaging, e.g. instant messaging [IM] (under H04L51/00, user-to-user messaging in packet-switching networks)
    • G10L19/167 — Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • H04L51/066 — Format adaptation, e.g. format conversion or compression
    • H04L51/18 — Commands or executable codes
    • H04L67/02 — Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a voice processing system, method, and apparatus for instant messaging. The method comprises the following steps: pulling the corresponding message body from a server according to a message identifier, obtaining the voice data in the message body, and storing it in a local database, where the message body is a predefined data structure for storing voice data, is generated by a first client and uploaded to the server in advance, and the voice data is encoded as a binary array and is the voice data of a first object; generating, on a target session interface, a voice-message operation control associated with the message identifier, where the target session interface is the session interface between the first object and a second object; and, upon receiving a target operation performed by the second object on the voice-message operation control, generating a voice file from the voice data stored in the local database and playing it. The method and apparatus solve the technical problem of low client-side playback efficiency caused by fragmented storage at the server.

Description

Voice processing system, method and device for instant messaging
Technical Field
The present application relates to the field of Internet technologies, and in particular to a voice processing system, method, and apparatus for instant messaging.
Background
Instant Messaging (IM) is a real-time communication service that allows two or more people to exchange text messages, files, voice, and video over a network in real time. Because instant-messaging data messages are bursty, frequent, and small, message storage at the server becomes severely fragmented. To let external callers retrieve the desired data accurately, the server must assign an accessible network address to each message fragment, and different fragments have different addresses. For instant messages, and especially short voice messages, this fragmented storage means that when a client pulls a short voice it must first query the server for the voice's network address and then download the voice file from that address before it can play, so playback efficiency is low.
No effective solution has yet been proposed for the problem of low client-side playback efficiency caused by fragmented server storage.
Disclosure of Invention
The application provides a voice processing system, method, and apparatus for instant messaging, aiming to solve the technical problem of low client-side playback efficiency caused by fragmented server storage.
According to one aspect of the embodiments of the present application, an instant-messaging voice processing system is provided, comprising:
a first client, configured to write voice data of a first object into a message body carrying a message identifier and to send a voice message comprising the message body, wherein the voice data is encoded as a binary array and the message body is a predefined data structure for storing voice data;
a server, configured to receive the voice message and store the message body, wherein the message body is stored as a data segment;
a second client, configured to pull the corresponding message body from the server according to the message identifier, obtain the voice data in the message body, and store it in a local database; to generate, on a target session interface, a voice-message operation control associated with the message identifier, wherein the target session interface is the session interface between the first object and a second object; and, upon receiving a target operation performed by the second object on the voice-message operation control, to generate a voice file from the voice data stored in the local database and play it.
According to another aspect of the embodiments of the present application, an instant-messaging voice processing method applied to a second client is provided, comprising:
pulling the corresponding message body from a server according to a message identifier, obtaining the voice data in the message body, and storing it in a local database, wherein the message body is a predefined data structure for storing voice data, is generated by a first client and uploaded to the server in advance, and the voice data is encoded as a binary array and is the voice data of a first object;
generating, on a target session interface, a voice-message operation control associated with the message identifier, wherein the target session interface is the session interface between the first object and a second object;
and, upon receiving a target operation performed by the second object on the voice-message operation control, generating a voice file from the voice data stored in the local database and playing it.
Optionally, pulling the corresponding message body from the server according to the message identifier, obtaining the voice data in the message body, and storing it in the local database comprises:
sending the server a voice-message acquisition request carrying the message identifier, so that the server looks up the corresponding message body by the message identifier;
upon receiving the message body returned by the server in response to the voice-message acquisition request, parsing the message body to obtain the voice data;
and storing the parsed voice data in the local database.
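As an illustrative sketch of the final step, the parsed voice data can be cached under its message identifier in the client's local database. This sketch assumes SQLite as that database and uses hypothetical function names; the patent specifies neither:

```python
import sqlite3
from typing import Optional

def store_voice_locally(db: sqlite3.Connection, message_id: str, voice: bytes) -> None:
    """Cache the voice data parsed out of a pulled message body, keyed by its
    message identifier, so playback can later run entirely from local storage."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS voice_messages (message_id TEXT PRIMARY KEY, data BLOB)"
    )
    db.execute(
        "INSERT OR REPLACE INTO voice_messages VALUES (?, ?)", (message_id, voice)
    )
    db.commit()

def load_voice_locally(db: sqlite3.Connection, message_id: str) -> Optional[bytes]:
    """Look up cached voice data by message identifier; None if never pulled."""
    row = db.execute(
        "SELECT data FROM voice_messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

Because playback reads only this local table, no network round trip is needed once the message body has been pulled.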
Optionally, generating a voice file from the voice data stored in the local database and playing the voice file comprises:
looking up the corresponding voice data in the local database by the message identifier;
converting the voice data from a binary array into an audio file;
and playing the audio file.
Optionally, parsing the message body to obtain the voice data comprises:
passing the message body as a parameter to a target decoding function, wherein the target decoding function is the parsing function that was created, matched to the data structure, when the data structure of the message body was predefined;
and decoding the message body with the target decoding function to obtain the voice data.
Optionally, converting the voice data from a binary array into an audio file comprises:
passing the binary array of the voice data as a parameter to a file-stream conversion function;
and using the file-stream conversion function to extract the binary data from the binary array and overwrite the original file in a target storage sector, so that an audio file for the current voice message is generated in the target storage sector from the binary data, wherein the target storage sector is reused for the audio files of short voices and the original file is the audio file generated the last time a voice was played.
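The overwrite-and-reuse behavior of the target storage sector can be sketched as follows; the function name and the idea of returning the path for a platform audio player are illustrative assumptions:

```python
def write_voice_to_reused_file(voice: bytes, slot_path: str) -> str:
    """Overwrite the single reusable audio file (the "target storage sector")
    with the binary array of the current voice message. Opening with "wb"
    truncates the file left by the previous playback, so short voices never
    accumulate as separate files on the client."""
    with open(slot_path, "wb") as f:
        f.write(voice)
    return slot_path  # hand this path to the platform's audio player
```

Reusing one file slot is what keeps the client's storage from fragmenting into one file per received voice.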
According to another aspect of the embodiments of the present application, an instant-messaging voice processing method applied to a first client is provided, comprising:
acquiring voice data of a first object, wherein the voice data is encoded as a binary array;
writing the voice data into a message body carrying a message identifier, wherein the message body is a predefined data structure for storing voice data;
and sending a voice message comprising the message body to the server so that the server stores the message body, wherein the message body is stored at the server as a data segment.
Optionally, acquiring the voice data of the first object comprises:
acquiring an audio file corresponding to the first object;
converting the audio file into a binary array;
and encoding the binary array to obtain the voice data.
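A minimal sketch of this acquisition path, reading the recorded audio file into the binary array that becomes the voice data. The names are hypothetical, and the 20 KB cap only echoes the description's later note that the scheme targets short voices with data volume within about 20k:

```python
MAX_SHORT_VOICE_BYTES = 20 * 1024  # short-voice bound suggested by the description

def audio_file_to_voice_data(path: str) -> bytes:
    """Read the recorded audio file and return its raw contents as the binary
    array (byte[]) that will be written into the message body."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) > MAX_SHORT_VOICE_BYTES:
        raise ValueError("clip too large to inline in a message body as a short voice")
    return data
```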
Optionally, writing the voice data into the message body carrying the message identifier comprises:
passing the voice data as a parameter to a target encoding function, wherein the target encoding function is matched to the parsing function that was created when the data structure of the message body was predefined;
and encoding the voice data with the target encoding function to obtain the message body.
According to another aspect of the embodiments of the present application, an instant-messaging voice processing apparatus applied to a second client is provided, comprising:
a first voice data acquisition module, configured to pull the corresponding message body from the server according to the message identifier, obtain the voice data in the message body, and store it in a local database, wherein the message body is a predefined data structure for storing voice data, is generated by the first client and uploaded to the server in advance, and the voice data is encoded as a binary array and is the voice data of the first object;
a control generation module, configured to generate, on a target session interface, a voice-message operation control associated with the message identifier, wherein the target session interface is the session interface between the first object and the second object;
and a voice playing module, configured to, upon receiving a target operation performed by the second object on the voice-message operation control, generate a voice file from the voice data stored in the local database and play it.
According to another aspect of the embodiments of the present application, an instant-messaging voice processing apparatus applied to a first client is provided, comprising:
a second voice data acquisition module, configured to acquire the voice data of the first object, wherein the voice data is encoded as a binary array;
a re-encoding module, configured to write the voice data into a message body carrying a message identifier, wherein the message body is a predefined data structure for storing voice data;
and an upload module, configured to send a voice message comprising the message body to the server so that the server stores the message body, wherein the message body is stored at the server as a data segment.
According to another aspect of the embodiments of the present application, an electronic device is provided, comprising a memory, a processor, a communication interface, and a communication bus, wherein the memory stores a computer program executable on the processor, the memory and the processor communicate with each other through the communication bus and the communication interface, and the processor implements the steps of the above method when executing the computer program.
According to another aspect of embodiments of the present application, there is also provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-mentioned method.
Compared with the related art, the technical solution provided by the embodiments of the present application has the following advantages:
The method comprises pulling the corresponding message body from the server according to a message identifier, obtaining the voice data in the message body, and storing it in a local database, where the message body is a predefined data structure for storing voice data, is generated by a first client and uploaded to the server in advance, and the voice data is encoded as a binary array and is the voice data of a first object; generating, on a target session interface, a voice-message operation control associated with the message identifier, where the target session interface is the session interface between the first object and a second object; and, upon receiving a target operation performed by the second object on the voice-message operation control, generating a voice file from the voice data stored in the local database and playing it. In this application the server stores no voice file; it stores the binary voice data directly, with each voice message delimited by its own message body. When a voice needs to be played, the second client first pulls the voice message's message body from the server, obtains the voice data from it, stores the data in the local database, and generates an operation control associated with the message identifier. When the user triggers the control, the voice can be played from the locally stored voice data, with no need to first query the server for the short voice's network address and then download a voice file from that address. This greatly improves voice read-and-playback efficiency and solves the technical problem of low client-side playback efficiency caused by fragmented server storage.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
To more clearly illustrate the technical solutions in the embodiments of the present application or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an optional voice processing system for instant messaging according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an optional instant-messaging voice processing method applied to a second client according to an embodiment of the present application;
fig. 3 is a flowchart of an optional instant-messaging voice processing method applied to a first client according to an embodiment of the present application;
fig. 4 is a block diagram of an optional instant-messaging voice processing apparatus applied to a second client according to an embodiment of the present application;
fig. 5 is a block diagram of an optional instant-messaging voice processing apparatus applied to a first client according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an optional electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" are used only to facilitate the description of the present application and have no specific meaning in themselves; "module" and "component" may therefore be used interchangeably.
In the related art, because instant-messaging data messages are bursty, frequent, and small, message storage at the server becomes severely fragmented. To let external callers retrieve the desired data accurately, the server must assign each message fragment an accessible network address, and different fragments have different addresses. For instant messages, and especially short voice messages, this fragmented storage means that when a client pulls a short voice it must first query the server for the voice's network address and then download the voice file from that address before playing it, so playback efficiency is low. Moreover, as the number of voices grows, not only does server-side fragmentation worsen, but the client's accumulation of individual voice files also consumes substantial storage, and the client cannot avoid its own file-fragmentation problem.
To solve the problems mentioned in the background, according to one aspect of the embodiments of the present application, an embodiment of an instant-messaging voice processing system is provided, the system comprising:
the first client 101 is configured to write voice data of the first object into a message body carrying a message identification identifier, and send a voice message including the message body, where the voice data is encoded in a format of a binary array, and the message body is a predefined data structure for storing the voice data.
In this embodiment, a data structure of a message body storing binary voice data may be predefined, for example, MXSoundMessage is used to represent the message body, and the MXSoundMessage includes a message identifier messageId, a session identifier sessionId, and a voice data byte [ ] in a binary array format. The message identification is used to identify different messages, including text messages, voice messages, video messages, application messages, and the like. The session identification is used to identify different sessions. Wherein messages in different sessions can also be distinguished based only on the message identification.
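A hedged sketch of such a message body and its matched encode/decode pair follows. The field names (messageId, sessionId, and the byte-array payload) come from the text; the JSON-plus-hex serialization is purely an illustrative assumption, since any framing whose encoder and decoder are created together from the predefined structure would satisfy the description:

```python
import json
from dataclasses import dataclass

@dataclass
class MXSoundMessage:
    """Predefined message body for a short voice message (fields per the text)."""
    messageId: str   # distinguishes individual messages
    sessionId: str   # distinguishes conversations
    data: bytes      # raw voice samples as a binary array (byte[])

    def encode(self) -> bytes:
        """Serialize to one binary segment; JSON + hex framing is illustrative."""
        return json.dumps({
            "messageId": self.messageId,
            "sessionId": self.sessionId,
            "data": self.data.hex(),
        }).encode("utf-8")

    @classmethod
    def decode(cls, raw: bytes) -> "MXSoundMessage":
        """The matched parsing function created alongside the structure."""
        obj = json.loads(raw.decode("utf-8"))
        return cls(obj["messageId"], obj["sessionId"], bytes.fromhex(obj["data"]))
```

Because encoder and decoder are defined together from one structure, a client that receives the segment can always recover the identifiers and the voice bytes exactly.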
In this embodiment, the first object is the user of the first client, and the first client can record the first object's speech directly as binary voice data byte[] through the recording function of its own device. The message identifier messageId, the session identifier sessionId, and the voice data byte[] are then written into an MXSoundMessage message body, forming a data segment for that voice. The first client uploads the voice's message body to the server, and the server stores it.
In this embodiment, because no audio file needs to be generated for each voice message, the server can store voice messages in its storage space as data segments, which avoids file fragmentation entirely; the server no longer needs to assign each file a network address for external access, and querying voice data only requires the server to look up the message identifier. Likewise, the second client does not need to store a large number of voice messages as individual audio files: it records all voice messages as data segments in the same file, which eliminates the file-fragmentation management problem.
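The data-segment storage described here can be sketched as an append-only shared file with an in-memory index keyed by message identifier. The 4-byte length-prefix framing is an assumption, since the text only says message bodies are stored in data-segment form:

```python
import os
import struct

class SegmentStore:
    """Append each voice message body as a length-prefixed data segment in one
    shared file, indexed by message identifier, so no per-message file (and
    hence no file fragmentation) ever exists."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}            # messageId -> (offset, length)
        open(path, "ab").close()   # create the shared file if absent

    def put(self, message_id: str, body: bytes) -> None:
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            # 4-byte big-endian length prefix, then the segment payload
            f.write(struct.pack(">I", len(body)) + body)
        self.index[message_id] = (offset + 4, len(body))

    def get(self, message_id: str) -> bytes:
        offset, length = self.index[message_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

A lookup by messageId replaces the per-file network address the related art had to assign to every fragment.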
The server 103 is configured to receive the voice message and store the message body, where the message body is stored as a data segment.
The second client 105 is configured to pull the corresponding message body from the server according to the message identifier, obtain the voice data in the message body, and store it in a local database; to generate, on a target session interface, a voice-message operation control associated with the message identifier, where the target session interface is the session interface between the first object and the second object; and, upon receiving a target operation performed by the second object on the voice-message operation control, to generate a voice file from the voice data stored in the local database and play it.
In this embodiment, the second object is the user of the second client, and the session interface between the first object and the second object is the target session interface.
In this embodiment, the second client's pull of the message body corresponding to the message identifier may be triggered as follows: after receiving the voice message and storing the message body, the server sends the second client a new-message notification carrying the voice message's message identifier, and on receiving the notification the second client pulls the corresponding message body from the server by that identifier. The second client may also actively poll the server for new messages and, if one exists, pull the corresponding message body by the new message's identifier.
In this embodiment, after the second client pulls the message body from the server, obtains the voice data in it, and stores the data in the local database, it generates a voice-message operation control on the target session interface for the session between the first object and the second object. The control carries the voice message's message identifier messageId and may also carry the session identifier sessionId. When the second object chooses to play the voice message, that is, when the control's play function is triggered, the second client looks up the corresponding voice data in the local database by the message identifier, generates a voice file from it, and plays the file. This differs from the prior art, in which an audio file is downloaded from the server, the audio data is attached to the voice-message control, and the audio plays when the user clicks the control; here the control is associated only with the message identifier, and after the user clicks it, playback proceeds by that identifier from the message body already pulled from the server.
For short voices, particularly those with a data volume within 20k, even though the second client must first pull the voice message's message body from the server, obtain the voice data from it, store the data in the local database, and generate the operation control associated with the message identifier, so that triggering the control generates a voice file from the locally stored data and plays it, the response is still faster than querying a fragmented file store for the corresponding audio file, downloading it, and playing it.
Optionally, a database 107 may also be provided for the server 103, so as to provide data access service for the server 103.
In the method and apparatus of this application, the server stores no voice file; it stores the binary voice data directly, with each voice message delimited by its own message body. When a voice needs to be played, the second client first pulls the voice message's message body from the server, obtains the voice data from it, stores the data in the local database, and generates an operation control associated with the message identifier. When the user triggers the control, the voice can be played from the locally stored voice data, with no need to first query the server for the short voice's network address and then download a voice file from that address before playing.
According to another aspect of the embodiments of the present application, there is provided an instant messaging voice processing method, which may be executed by the second client 105, as shown in fig. 2, the method may include the following steps:
step S202, pulling a corresponding message body from a server according to a message identification mark, acquiring voice data in the message body and storing the voice data in a local database, wherein the message body is a predefined data structure for storing the voice data, the message body is generated by a first client and is uploaded to the server in advance, the voice data is encoded in a binary array format, and the voice data is the voice data of a first object;
step S204, generating a voice message operation control associated with the message identification identifier on a target session interface, wherein the target session interface is a session interface of a first object and a second object;
and step S206, under the condition that the target operation of the second object on the voice message operation control is received, generating a voice file according to the voice data stored in the local database, and carrying out voice playing on the voice file.
In the embodiment of the application, the first object uses the first client, and the first client records the voice of the first object to obtain the voice data byte [ ] in the binary array format. The first client writes the voice message identification messageId, the session identification sessionId and the voice data byte [ ] into a message body MXSoundMessage to form a voice data segment. The first client uploads the message body to the server and simultaneously sends voice message prompt information to the second client so as to inform the second client of the existence of the voice message to be received.
In this embodiment, the second object is a user of the second client, and the session interface between the first object and the second object is the target session interface.
In the embodiment of the application, the triggering condition for the second client to pull the message body corresponding to the message identification identifier from the server may be that the server sends new message prompt information to the second client after receiving the voice message and storing the message body, the new message prompt information carries the message identification identifier of the voice message, and the second client can pull the corresponding message body from the server according to the message identification identifier after receiving the new message prompt information. The second client side can also actively inquire whether the server side has the new message or not, and if the new message exists, the corresponding message body can be pulled from the server side according to the message identification mark of the new message.
In the embodiment of the application, the second client pulls the message body from the server, acquires the voice data in the message body, stores it in the local database, and then generates a voice message operation control on the target session interface for the session between the first object and the second object. The voice message operation control carries the message identifier messageId of the voice message and may also carry the session identifier sessionId. When the second object chooses to play the voice message, that is, when the play function of the voice message operation control is triggered, the second client finds the corresponding voice data in the local database according to the message identification identifier, generates a voice file from that data, and plays it.
Through steps S202 to S206, the server in the present application does not store voice files; it directly stores the binary voice data, with each voice message distinguished by its message body. When a voice needs to be played, the second client first pulls the message body of the voice message from the server, obtains the voice data from the message body, stores it in the local database, and generates an operation control associated with the message identification identifier of the voice message. When the user triggers the operation control, a voice file is generated from the voice data stored in the local database and played. There is no need to first query the server for the network address of the short voice and then download a voice file from that address before playing, which greatly improves the efficiency of reading and playing voice and solves the technical problem of low client efficiency caused by file fragmentation.
Optionally, the step S202 of pulling the corresponding message body from the server according to the message identification identifier, acquiring the voice data in the message body and storing the voice data in the local database includes:
step 1, sending a voice message acquisition request carrying a message identification identifier to a server, so that the server searches a corresponding message body according to the message identification identifier.
And step 2, upon receiving the message body returned by the server in response to the voice message acquisition request, parsing the message body to acquire the voice data.
And 3, storing the voice data obtained by analysis in a local database.
In the embodiment of the present application, a matched pair of codec functions may be created in advance for the self-defined message body data structure: the encoding function setData writes the binary voice data byte[] into a message body according to a preset format, and the decoding function getData decodes the message body to obtain the binary voice data byte[] from it. That is, parsing the message body to obtain the voice data includes: passing the message body as a parameter to a target decoding function, wherein the target decoding function is the parsing function created to match the data structure when the data structure of the message body was predefined; and decoding the message body with the target decoding function to obtain the voice data.
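A minimal sketch of such a matched codec pair follows. The length-prefixed wire format is purely an assumption; the patent says only that setData writes byte[] into the message body "according to a preset format" and getData reverses it:

```python
import struct

def set_data(message_id: str, session_id: str, voice: bytes) -> bytes:
    """setData: write the binary voice data into a message body using a
    (hypothetical) length-prefixed layout."""
    mid = message_id.encode("utf-8")
    sid = session_id.encode("utf-8")
    return (struct.pack(">H", len(mid)) + mid
            + struct.pack(">H", len(sid)) + sid
            + struct.pack(">I", len(voice)) + voice)

def get_data(body: bytes):
    """getData: decode the message body, recovering the byte[] voice data."""
    off = 0
    n = struct.unpack_from(">H", body, off)[0]; off += 2
    mid = body[off:off + n].decode("utf-8"); off += n
    n = struct.unpack_from(">H", body, off)[0]; off += 2
    sid = body[off:off + n].decode("utf-8"); off += n
    n = struct.unpack_from(">I", body, off)[0]; off += 4
    return mid, sid, body[off:off + n]

# Round trip: encode a message body, then decode it again.
mid, sid, voice = get_data(set_data("m1", "s1", b"\x00\x01\x02"))
```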
In the embodiment of the application, to make the data easy for people to read and write, easy for machines to parse and generate, and to improve network transmission efficiency, the first client may convert the message body into json data before uploading it to the server. The second client then pulls the corresponding json data according to the message identification identifier and parses the json data to obtain the message body.
In the embodiment of the application, because the voice data is binary rather than stored in a file format, the application may re-encode the message body or json data using Base64 so that the binary voice data is represented with 64 printable characters. Base64 encoding is a binary-to-character process that can be used to convey longer identification information in an HTTP environment.
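As a rough illustration, the Base64 re-encoding into json might look like the following. The JSON field names are assumptions, not part of the patent:

```python
import base64
import json

def message_body_to_json(message_id: str, session_id: str, voice: bytes) -> str:
    """Re-encode the binary voice data with Base64 so the message body can
    travel as JSON text (64 printable characters, safe in an HTTP body)."""
    return json.dumps({
        "messageId": message_id,
        "sessionId": session_id,
        "data": base64.b64encode(voice).decode("ascii"),
    })

def json_to_voice_data(payload: str) -> bytes:
    """Parse the json pulled by the second client back into byte[]."""
    return base64.b64decode(json.loads(payload)["data"])

payload = message_body_to_json("m1", "s1", b"\xde\xad\xbe\xef")
voice = json_to_voice_data(payload)
```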
Optionally, generating a voice file according to the voice data stored in the local database, and performing voice playing on the voice file includes:
step 1, searching corresponding voice data in a local database according to a message identification mark;
step 2, converting the voice data from the binary array into an audio file;
and step 3, playing the audio file.
In the embodiment of the present application, when the second client reads the voice data for playing, the binary voice data may be stored as a file using the write function of an io stream; that is, the binary voice data is temporarily stored just before playing, and since the short voice is required to be within 20 KB, the write is very fast. Specifically, converting the voice data from a binary array into an audio file comprises:
step 1, passing the binary array of the voice data as a parameter to a file stream conversion function;
and 2, extracting binary data in the binary array by using a file stream conversion function, and covering an original file of a target storage sector so as to generate an audio file of the current voice message in the target storage sector by using the binary data, wherein the target storage sector is used for multiplexing the audio file of the short voice, and the original file is the audio file generated when the voice is played last time.
In the embodiment of the present application, all short voice files need only be stored as a single file, each new one overwriting the last. Only one voice message is stored at any time, which realizes multiplexing of the file: the terminal always holds just one file.
In the application, the short voice message is stored in a message body kept in data segment form; at playing time the binary voice data is taken directly from the message body and stored as a file for playback, and the file is deleted or overwritten after playing finishes. The second client therefore needs only one 20 KB storage space for the multiplexed audio file, which greatly reduces the number of files on the second client and shortens the search for the playback source file.
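The single multiplexed playback file can be sketched as below. The cache path is a hypothetical stand-in for the target storage sector:

```python
import os
import tempfile

# Hypothetical path standing in for the "target storage sector": every
# playback overwrites this same file, so the client never accumulates files.
PLAY_CACHE = os.path.join(tempfile.gettempdir(), "im_voice_play.tmp")

def voice_data_to_audio_file(voice: bytes, path: str = PLAY_CACHE) -> str:
    # An io-stream write in mode "wb" truncates the file, covering the
    # audio file left over from the previous playback.
    with open(path, "wb") as f:
        f.write(voice)
    return path

p1 = voice_data_to_audio_file(b"first voice message")
p2 = voice_data_to_audio_file(b"second voice message")  # overwrites p1
with open(p2, "rb") as f:
    content = f.read()
```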
According to another aspect of the embodiments of the present application, there is provided an instant messaging voice processing method, which may be executed by a first client 101, as shown in fig. 3, and the method may include the following steps:
step S302, acquiring voice data of a first object, wherein the voice data is coded in a binary array format;
step S304, writing the voice data into a message body carrying a message identification mark, wherein the message body is a predefined data structure for storing the voice data;
step S306, sending the voice message including the message body to the server side so that the server side stores the message body, wherein the message body is stored in the server side in data segment form.
In the embodiment of the application, the first client may directly record the voice of the first object as binary voice data byte[] through its own recording function. It then writes the message identifier messageId, the session identifier sessionId and the voice data byte[] of the voice into the message body MXSoundMessage, forming the voice's data segment. The first client uploads the voice's message body to the server, so that the server stores it.
Optionally, the acquiring the voice data of the first object includes:
step 1, acquiring an audio file corresponding to a first object;
step 2, converting the audio file into a binary array;
and 3, coding the binary digit group to obtain voice data.
In the embodiment of the application, an audio file already possessed by the first object can be acquired; after acquisition, the size of the audio file is checked. If the data volume of the audio file is larger than the conversion threshold, converting the audio file into a binary array is abandoned; if the data volume is less than or equal to the conversion threshold, the audio file is converted into a binary array, and the binary array is then encoded to obtain the voice data. The conversion threshold can be set according to actual needs.
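A minimal sketch of the threshold check. The 20 KB value is an assumption borrowed from the playback discussion; the text only says the threshold "can be set according to actual needs":

```python
# Conversion threshold for reusing an existing audio file as voice data.
MAX_VOICE_BYTES = 20 * 1024  # assumed 20 KB; configurable in practice

def audio_file_to_voice_data(audio: bytes):
    """Convert an audio file's contents into a binary array, abandoning
    the conversion when the file exceeds the threshold."""
    if len(audio) > MAX_VOICE_BYTES:
        return None  # too large: give up converting to a binary array
    return bytes(audio)

small = audio_file_to_voice_data(b"\x01" * 1024)        # 1 KB  -> converted
big = audio_file_to_voice_data(b"\x01" * (21 * 1024))   # 21 KB -> rejected
```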
Optionally, writing the voice data into the message body carrying the message identification identifier includes:
passing the voice data as a parameter to a target encoding function, wherein the target encoding function is the counterpart of the parsing function created to match the data structure when the data structure of the message body was predefined;
and coding the voice data by using a target coding function to obtain a message body.
In the embodiment of the present application, a matched pair of codec functions may be created in advance for the self-defined message body data structure: the encoding function setData writes the binary voice data byte[] into a message body according to a preset format, and the decoding function getData decodes the message body to obtain the binary voice data byte[] from it.
In the embodiment of the application, to make the data easy for people to read and write, easy for machines to parse and generate, and to improve network transmission efficiency, the first client can convert the message body into json data before uploading it to the server. The second client then pulls the corresponding json data according to the message identification identifier and parses the json data to obtain the message body.
In the embodiment of the application, because the voice data is binary rather than stored in a file format, the application can re-encode the message body or json data using Base64 so that the binary voice data is represented with 64 printable characters. Base64 encoding is a binary-to-character process that can be used to convey longer identification information in an HTTP environment.
According to another aspect of the embodiments of the present application, there is provided an instant messaging voice processing apparatus applied to a second client, as shown in fig. 4, including:
a first voice data obtaining module 401, configured to pull a corresponding message body from the server according to the message identification identifier, obtain voice data in the message body, and store the voice data in a local database, where the message body is a predefined data structure for storing voice data, the message body is generated by the first client and is uploaded to the server in advance, the voice data is encoded in a format of a binary array, and the voice data is voice data of the first object;
a control generating module 403, configured to generate a voice message operation control associated with the message identification identifier on a target session interface, where the target session interface is a session interface of the first object and the second object;
and the voice playing module 405 is configured to generate a voice file according to the voice data stored in the local database and perform voice playing on the voice file when the target operation of the second object on the voice message operation control is received.
It should be noted that the first voice data obtaining module 401 in this embodiment may be configured to execute step S202 in this embodiment, the control generating module 403 in this embodiment may be configured to execute step S204 in this embodiment, and the voice playing module 405 in this embodiment may be configured to execute step S206 in this embodiment.
It should be noted that the modules described above correspond to the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of the foregoing embodiments. It should also be noted that the modules, as part of the apparatus, may run in a system as shown in fig. 1 and may be implemented in software or hardware.
Optionally, the first voice data obtaining module is specifically configured to: sending a voice message acquisition request carrying a message identification mark to a server so that the server searches a corresponding message body according to the message identification mark; under the condition of receiving a message body returned by a server end responding to a voice message acquisition request, analyzing the message body to acquire voice data; and storing the voice data obtained by analysis in a local database.
Optionally, the voice playing module is specifically configured to: searching corresponding voice data in a local database according to the message identification mark; converting the voice data from the binary array into an audio file; and playing the audio file.
Optionally, the voice playing module further includes a decoding module, configured to: pass the message body as a parameter to a target decoding function, wherein the target decoding function is the parsing function created to match the data structure when the data structure of the message body was predefined; and decode the message body with the target decoding function to obtain the voice data.
Optionally, the voice playing module further includes a file generating module, configured to: pass the binary array of the voice data as a parameter to a file stream conversion function; and extract the binary data in the binary array with the file stream conversion function and overwrite the original file of a target storage sector, so that the audio file of the current voice message is generated in the target storage sector from the binary data, wherein the target storage sector is used for multiplexing short voice audio files, and the original file is the audio file generated the last time a voice was played.
According to another aspect of the embodiments of the present application, there is provided an instant messaging voice processing apparatus applied to a first client, as shown in fig. 5, including:
a second voice data obtaining module 501, configured to obtain voice data of the first object, where the voice data is encoded in a format of binary array;
a recoding module 503, configured to write the voice data into a message body carrying the message identification identifier, where the message body is a predefined data structure for storing the voice data;
and an uploading module 505, configured to send the voice message including the message body to the server, so that the server stores the message body, where the message body is stored in the server in the form of a data segment.
It should be noted that the second voice data obtaining module 501 in this embodiment may be configured to execute step S302 in this embodiment, the re-encoding module 503 in this embodiment may be configured to execute step S304 in this embodiment, and the uploading module 505 in this embodiment may be configured to execute step S306 in this embodiment.
It should be noted that the modules described above correspond to the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of the foregoing embodiments. The modules, as part of the apparatus, may run in a system as shown in fig. 1 and may be implemented in software or hardware.
Optionally, the second voice data obtaining module is specifically configured to: acquire an audio file corresponding to the first object; convert the audio file into a binary array; and encode the binary array to obtain the voice data.
Optionally, the re-encoding module is specifically configured to: pass the voice data as a parameter to a target encoding function, wherein the target encoding function is the counterpart of the parsing function created to match the data structure when the data structure of the message body was predefined; and encode the voice data with the target encoding function to obtain the message body.
According to another aspect of the embodiments of the present application, an electronic device is provided, as shown in fig. 6, and includes a memory 601, a processor 603, a communication interface 605, and a communication bus 607, where a computer program operable on the processor 603 is stored in the memory 601, the memory 601 and the processor 603 communicate with each other through the communication interface 605 and the communication bus 607, and the steps of the method are implemented when the processor 603 executes the computer program.
The memory and the processor in the electronic equipment communicate with each other through the communication interface and the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on.
The Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
There is also provided in accordance with yet another aspect of an embodiment of the present application a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of any of the embodiments described above.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to perform the following steps:
pulling a corresponding message body from a server according to the message identification mark, acquiring voice data in the message body and storing the voice data in a local database, wherein the message body is a predefined data structure for storing the voice data, the message body is generated by a first client and is uploaded to the server in advance, the voice data is encoded in a binary array format, and the voice data is the voice data of a first object;
generating a voice message operation control associated with the message identification identifier on a target session interface, wherein the target session interface is a session interface of a first object and a second object;
and under the condition that the target operation of the second object on the voice message operation control is received, generating a voice file according to the voice data stored in the local database, and performing voice playing on the voice file.
Optionally, for a specific example in this embodiment, reference may be made to the example described in the foregoing embodiment, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
Optionally, in an embodiment of the present application, the computer readable medium may be further configured to store program code for the processor to perform the following steps:
acquiring voice data of a first object, wherein the voice data is coded in a binary array format;
writing voice data into a message body carrying a message identification mark, wherein the message body is a predefined data structure for storing the voice data;
and sending the voice message comprising the message body to the server so that the server stores the message body, wherein the message body is stored in the server in data segment form.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may essentially, or in the part contributing to the prior art, be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, and other media capable of storing program codes. It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. An instant messaging voice processing system, comprising:
the first client is used for writing the voice data of the first object into a message body carrying a message identification mark and sending the voice message comprising the message body, wherein the voice data is coded in a binary array format, and the message body is a predefined data structure for storing the voice data;
the server is used for receiving the voice message and storing the message body, wherein the message body is stored in a data segment form;
the second client is used for pulling the corresponding message body from the server according to the message identification mark, acquiring the voice data in the message body and storing the voice data in a local database; generating a voice message operation control associated with the message identification mark on a target session interface, wherein the target session interface is a session interface of the first object and the second object; and under the condition that the target operation of the second object on the voice message operation control is received, generating a voice file according to the voice data stored in the local database, and performing voice playing on the voice file.
2. A voice processing method of instant messaging is applied to a second client, and is characterized by comprising the following steps:
pulling a corresponding message body from a server according to a message identification mark, acquiring voice data in the message body and storing the voice data in a local database, wherein the message body is a predefined data structure for storing the voice data, the message body is generated by a first client and is uploaded to the server in advance, the voice data is encoded in a binary array format, and the voice data is the voice data of a first object;
generating a voice message operation control associated with the message identification mark on a target session interface, wherein the target session interface is a session interface of the first object and the second object;
and under the condition that the target operation of the second object on the voice message operation control is received, generating a voice file according to the voice data stored in the local database, and performing voice playing on the voice file.
3. The method of claim 2, wherein pulling the corresponding message body from the server according to the message identification tag, acquiring the voice data in the message body and storing the voice data in the local database comprises:
sending a voice message acquisition request carrying the message identification to the server so that the server searches the corresponding message body according to the message identification;
under the condition that the message body returned by the server end responding to the voice message acquisition request is received, the message body is analyzed to acquire the voice data;
and storing the voice data obtained by analysis in a local database.
4. The method of claim 2, wherein generating a voice file according to the voice data stored in the local database, and playing the voice file comprises:
searching the corresponding voice data in a local database according to the message identification mark;
converting the voice data from a binary array into an audio file;
and playing the audio file.
5. The method of claim 3, wherein parsing the message body to obtain the voice data comprises:
passing the message body as a parameter to a target decoding function, wherein the target decoding function is the parsing function created to match the data structure when the data structure of the message body was predefined;
and decoding the message body by using the target decoding function to obtain the voice data.
6. The method of claim 3, wherein converting the voice data from a binary array to an audio file comprises:
transmitting the binary array of the voice data as a parameter to a file stream conversion function;
and extracting binary data in the binary array by using the file stream conversion function, and covering an original file of a target storage sector so as to generate the audio file of the current voice message in the target storage sector by using the binary data, wherein the target storage sector is used for multiplexing the audio file of short voice, and the original file is the audio file generated when voice is played last time.
7. A voice processing method of instant messaging is applied to a first client, and is characterized by comprising the following steps:
acquiring voice data of a first object, wherein the voice data is coded in a binary array format;
writing the voice data into a message body carrying a message identification mark, wherein the message body is a predefined data structure for storing the voice data;
and sending the voice message comprising the message body to a server so that the server stores the message body, wherein the message body is stored in the server in a data segment form.
8. The method of claim 7, wherein obtaining speech data for the first object comprises:
acquiring an audio file corresponding to the first object;
converting the audio file into a binary array;
and encoding the binary array to obtain the voice data.
9. The method of claim 7, wherein writing the voice data into the message body carrying the message identifier comprises:
transferring the voice data as a parameter to a target encoding function, wherein the target encoding function matches a parsing function created from the data structure when the data structure of the message body was defined in advance;
and encoding the voice data by using the target encoding function to obtain the message body.
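A sketch of claim 9's target encoding function, under the same hypothetical length-prefixed layout assumed above for the decoder; encoder and parser are the matched pair created when the structure was defined:

```python
import struct

def encode_message_body(msg_id: int, voice_data: bytes) -> bytes:
    """Hypothetical target encoding function: >QI header (message
    identifier + payload length) followed by the voice bytes. Its
    inverse is the parsing function created from the same definition."""
    return struct.pack(">QI", msg_id, len(voice_data)) + voice_data
```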
10. A voice processing apparatus for instant messaging, applied to a second client, characterized by comprising:
a first voice data acquisition module, configured to pull a corresponding message body from a server according to a message identifier, acquire voice data in the message body, and store the voice data in a local database, wherein the message body is a predefined data structure for storing the voice data, the message body is generated by a first client and uploaded to the server in advance, the voice data is encoded in a binary array format, and the voice data is voice data of a first object;
a control generation module, configured to generate, on a target session interface, a voice message operation control associated with the message identifier, wherein the target session interface is a session interface between the first object and a second object;
and a voice playing module, configured to generate a voice file from the voice data stored in the local database and play the voice file upon receiving a target operation of the second object on the voice message operation control.
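For illustration, the receiver-side flow of claim 10 in miniature. All names are hypothetical: the server and local database are plain dicts, base64 is the assumed encoding, and "playback" is represented by returning the generated file's bytes:

```python
import base64
import tempfile
from pathlib import Path

LOCAL_DB = {}  # stand-in for the second client's local database

def pull_voice_message(server_store: dict, msg_id: str) -> None:
    """Pull the message body by identifier and cache its voice data."""
    body = server_store[msg_id]
    LOCAL_DB[msg_id] = body["voice"]

def play_voice_message(msg_id: str) -> bytes:
    """On the control's target operation: decode the cached voice data,
    generate the voice file, and hand it to playback."""
    raw = base64.b64decode(LOCAL_DB[msg_id])
    target = Path(tempfile.gettempdir()) / "im_voice_current.tmp"
    target.write_bytes(raw)        # generate the voice file
    return target.read_bytes()     # stand-in for actual audio playback
```

Caching in the local database means replaying a message never touches the server again.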
11. A voice processing apparatus for instant messaging, applied to a first client, characterized by comprising:
a second voice data acquisition module, configured to acquire voice data of a first object, wherein the voice data is encoded in a binary array format;
a re-encoding module, configured to write the voice data into a message body carrying a message identifier, wherein the message body is a predefined data structure for storing the voice data;
and an uploading module, configured to send a voice message comprising the message body to a server so that the server stores the message body, wherein the message body is stored in the server in the form of data segments.
12. An electronic device, comprising a memory, a processor, a communication interface and a communication bus, wherein the memory stores a computer program operable on the processor, the memory and the processor communicate via the communication bus and the communication interface, and the processor implements the steps of the method according to any one of claims 2 to 6 or 7 to 9 when executing the computer program.
13. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method according to any one of claims 2 to 6 or 7 to 9.
CN202111165023.4A 2021-09-30 2021-09-30 Voice processing system, method and device for instant messaging Active CN113923177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165023.4A CN113923177B (en) 2021-09-30 2021-09-30 Voice processing system, method and device for instant messaging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111165023.4A CN113923177B (en) 2021-09-30 2021-09-30 Voice processing system, method and device for instant messaging

Publications (2)

Publication Number Publication Date
CN113923177A CN113923177A (en) 2022-01-11
CN113923177B true CN113923177B (en) 2023-01-06

Family

ID=79237883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165023.4A Active CN113923177B (en) 2021-09-30 2021-09-30 Voice processing system, method and device for instant messaging

Country Status (1)

Country Link
CN (1) CN113923177B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014079382A1 (en) * 2012-11-22 2014-05-30 腾讯科技(深圳)有限公司 Voice transmission method, terminal, voice server and voice transmission system
CN104796448A (en) * 2014-01-22 2015-07-22 腾讯科技(深圳)有限公司 Network system data processing method and device
CN107809409A (en) * 2016-09-08 2018-03-16 北京信威通信技术股份有限公司 A kind of method and device of the transmission of speech data, reception and interaction
CN108449262A (en) * 2018-04-08 2018-08-24 成都万维图新信息技术有限公司 A kind of transmission method of instant communication data
CN109672610A (en) * 2018-12-26 2019-04-23 深圳市自然门科技有限公司 A kind of multigroup group speech real time communication method and system
CN109728994A (en) * 2017-10-27 2019-05-07 腾讯科技(深圳)有限公司 Call method, device and computer readable storage medium
CN111399796A (en) * 2020-03-06 2020-07-10 北京达佳互联信息技术有限公司 Voice message aggregation method and device, electronic equipment and storage medium
CN112637139A (en) * 2020-12-09 2021-04-09 颜文健 Voice transmission processing method and device based on Internet of things and computer equipment
CN113393842A (en) * 2020-11-18 2021-09-14 腾讯科技(深圳)有限公司 Voice data processing method, device, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7979624B2 (en) * 2006-03-31 2011-07-12 Intel Corporation Techniques to truncate data files in nonvolatile memory
US20140375746A1 (en) * 2013-06-20 2014-12-25 Wavedeck Media Limited Platform, device and method for enabling micro video communication

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014079382A1 (en) * 2012-11-22 2014-05-30 腾讯科技(深圳)有限公司 Voice transmission method, terminal, voice server and voice transmission system
CN103841002A (en) * 2012-11-22 2014-06-04 腾讯科技(深圳)有限公司 Method and terminal for voice transmission, voice server and voice transmission system
CN104796448A (en) * 2014-01-22 2015-07-22 腾讯科技(深圳)有限公司 Network system data processing method and device
CN107809409A (en) * 2016-09-08 2018-03-16 北京信威通信技术股份有限公司 A kind of method and device of the transmission of speech data, reception and interaction
CN109728994A (en) * 2017-10-27 2019-05-07 腾讯科技(深圳)有限公司 Call method, device and computer readable storage medium
CN108449262A (en) * 2018-04-08 2018-08-24 成都万维图新信息技术有限公司 A kind of transmission method of instant communication data
CN109672610A (en) * 2018-12-26 2019-04-23 深圳市自然门科技有限公司 A kind of multigroup group speech real time communication method and system
CN111399796A (en) * 2020-03-06 2020-07-10 北京达佳互联信息技术有限公司 Voice message aggregation method and device, electronic equipment and storage medium
CN113393842A (en) * 2020-11-18 2021-09-14 腾讯科技(深圳)有限公司 Voice data processing method, device, equipment and medium
CN112637139A (en) * 2020-12-09 2021-04-09 颜文健 Voice transmission processing method and device based on Internet of things and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chia-Sheng Tsai; Bo-Fu Shen. An Intelligent MRAS Platform for IP-based Mobile Networks. Second International Conference on Innovative Computing, Information and Control (ICICIC 2007). 2008. *
Research and Application of SIP Protocol Control of Multimedia Sessions; Xu Peng et al.; Computer Engineering; 2006-07-20 (No. 14); full text *

Also Published As

Publication number Publication date
CN113923177A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
US10097884B2 (en) Media playback method, client and system
CN103166681A (en) Information transmission method and system in short-distance scene
CN112653700B (en) Website video communication method based on WEBRTC
CN112053692B (en) Speech recognition processing method, device and storage medium
CN104902343A (en) Methods for transmitting and playing audio/video and message, server and terminal
WO2013091551A1 (en) Method, device and system for sharing played content of application
CN104837033B (en) A kind of information processing method and server
US11114133B2 (en) Video recording method and device
CN109089174B (en) Multimedia data stream processing method and device and computer storage medium
CN106254907B (en) Live video synthesis method and device
CN110381056B (en) Netty-based private protocol coding and decoding method and apparatus
CN110601962B (en) Message prompting method, device, terminal and storage medium
WO2019062285A1 (en) Incoming call voice calling method and terminal
CN110519656B (en) Self-adaptive streaming media playing method, system and server
CN115065669A (en) Audio transmission method and device, electronic equipment and storage medium
CN113923177B (en) Voice processing system, method and device for instant messaging
CN113571048A (en) Audio data detection method, device, equipment and readable storage medium
CN113873288A (en) Method and device for generating playback in live broadcast process
CN112752134B (en) Video processing method and device, storage medium and electronic device
CN111147869B (en) Video transcoding system and method based on distributed object storage
CN107306199B (en) Network element data playback method and device
CN109587517B (en) Multimedia file playing method and device, server and storage medium
CN114885181B (en) Time delay playing method and device for live broadcast resources
CN111723236A (en) Video index establishing method, device, equipment and computer readable medium
CN102611716A (en) Method and device for transmitting media file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant