WO2020207375A1 - Data processing method, apparatus, device and storage medium based on instant messaging application - Google Patents

Data processing method, apparatus, device and storage medium based on instant messaging application

Info

Publication number
WO2020207375A1
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
audio data
audio
data
progress
Prior art date
Application number
PCT/CN2020/083485
Other languages
English (en)
French (fr)
Inventor
刘立强
沙莎
吴俊
钟庆华
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2020207375A1
Priority to US17/317,389 (US11683278B2)

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/04 - Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L 51/043 - Real-time or near real-time messaging, e.g. instant messaging [IM], using or handling presence information
    • H04L 51/06 - Message adaptation to terminal or network requirements
    • H04L 51/066 - Format adaptation, e.g. format conversion or compression
    • H04L 51/07 - User-to-user messaging characterised by the inclusion of specific contents
    • H04L 51/10 - Multimedia information
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • G06F 3/16 - Sound input; sound output
    • G06F 3/165 - Management of the audio stream, e.g. setting of volume, audio stream path
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • This application relates to the field of Internet technology, and in particular to a data processing method, apparatus, device and storage medium based on an instant messaging application.
  • In the related art, the duration of a voice message can be displayed in the message bar, and the user can click to play the voice message. It can be seen that only the duration of the voice message is displayed in the message bar, so the presentation form of the voice message is overly limited; and for a received voice message, click-to-play is likewise an overly limited mode of operation.
  • To address this, the embodiments of the present application provide a data processing method, apparatus, device, and storage medium based on an instant messaging application, which can increase the diversity of audio data display forms and enrich the audio data operation modes.
  • The embodiments of the present application provide a data processing method based on an instant messaging application, which is executed by a data processing device based on an instant messaging application, and includes: acquiring audio data in an instant messaging application, and acquiring sampling volume data corresponding to the audio data based on a sampling frequency; generating a voiceprint diagram corresponding to the audio data according to the audio data and the sampling volume data, and outputting a message bar containing the voiceprint diagram and the audio data; and, in response to a target trigger operation on the message bar, performing audio progress control on the audio data and controlling display of the voiceprint diagram based on the audio progress.
  • One aspect of the embodiments of the present application provides a data processing device based on an instant messaging application, including:
  • a sampling module, configured to obtain audio data in an instant messaging application, and obtain sampling volume data corresponding to the audio data based on a sampling frequency;
  • a generating module, configured to generate a voiceprint diagram corresponding to the audio data according to the audio data and the sampling volume data, and output a message bar containing the voiceprint diagram and the audio data; and
  • a response module, configured to respond to a target trigger operation on the message bar, perform audio progress control on the audio data, and control display of the voiceprint diagram based on the audio progress.
  • One aspect of the embodiments of the present application provides a data processing device based on an instant messaging application, including: a processor and a memory;
  • The processor is connected to the memory, where the memory is configured to store program code, and the processor is configured to call the program code to execute the above data processing method based on an instant messaging application.
  • One aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, perform the above data processing method based on an instant messaging application.
  • By displaying the voiceprint diagram corresponding to the audio data in the message bar of the instant messaging application, the embodiments can not only increase the diversity of audio data display forms, but also enrich the audio data operation modes by controlling the audio progress of the audio data.
  • FIG. 1 is a schematic diagram of a scenario of a data processing method based on an instant messaging application provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a data processing method based on an instant messaging application provided by an embodiment of the present application;
  • FIGS. 3a to 3c are schematic diagrams of an interface in response to a target trigger operation on a message bar provided by an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of another data processing method based on an instant messaging application provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a voiceprint visualization calculation rule provided by an embodiment of the present application;
  • FIGS. 6a to 6c are schematic diagrams of interfaces for visualizing voiceprint shape types provided by embodiments of the present application;
  • FIG. 7 is an implementation model diagram of a voice message technology based on an instant messaging application provided by an embodiment of the present application;
  • FIG. 8 is a schematic flowchart of another data processing method based on an instant messaging application provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of an interface of a personalized message display type provided by an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of a message bar function model provided by an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of a data processing device based on an instant messaging application provided by an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of another data processing device based on an instant messaging application provided by an embodiment of the present application.
  • Please refer to FIG. 1, which is a schematic diagram of a scenario of a data processing method based on an instant messaging application provided by an embodiment of the present application.
  • The user can open an instant messaging application (such as QQ, WeChat, etc.) on the terminal device 100a, and click the chat window 200a corresponding to any contact in the instant messaging application. In the chat window 200a, when the user clicks the voice icon, a voice operation panel 300a can be displayed at the bottom of the chat window 200a.
  • In the voice operation panel 300a, there are multiple voice modes (such as voice mode, voice changing mode, recording mode) for the user to choose. If the user selects the voice mode, the user can press and hold the voice icon with a finger and speak into the microphone of the terminal device 100a.
  • During this process, the instant messaging application can record the user's voice in real time and display, on the voice operation panel 300a, the duration information of the recorded voice (such as 0:07). When the user releases the finger, the recorded voice can be used as the audio data, a corresponding first voiceprint diagram is generated based on the audio data, and a message bar containing the audio data and the first voiceprint diagram is displayed at the local end. The audio data is also sent to the contact corresponding to the chat window 200a (the user may be called the sender of the audio data, and the contact may be called the receiver of the audio data); the receiving end where the contact is located can likewise generate a corresponding second voiceprint diagram based on the received audio data, and display a message bar containing the audio data and the second voiceprint diagram on the receiving end.
  • The voice information contained in the message bar displayed at the local end is the same as that contained in the message bar displayed at the receiving end, but the voiceprint diagrams displayed in the two message bars, namely the first voiceprint diagram and the second voiceprint diagram, can be the same or different. For example, if the message bar display types used by the local end and the receiving end are different, the display colors of the first voiceprint diagram and the second voiceprint diagram will be different.
  • After the recording is finished, a message bar 400a corresponding to the audio data can be displayed in the chat window 200a, and the message bar 400a can display the voiceprint diagram corresponding to the audio data, that is, a visual representation of the sound of the audio data.
  • The user can click the message bar 400a to play the audio data. During playback, the audio playback progress can be recorded in real time, and a progress indicator cursor 500a is displayed in the message bar 400a. The progress indicator cursor 500a divides the voiceprint diagram corresponding to the audio data into two regions, namely the voiceprint region corresponding to the voice that has been played (voiceprint region 101a) and the voiceprint region corresponding to the unplayed voice (voiceprint region 101b). The two regions have different display colors, so the user can quickly determine the progress of the voice playback from the progress indicator cursor 500a and the color information.
  • During playback, the user can click to pause the audio data currently being played; when the message bar 400a corresponding to the audio data is clicked again, the voice is played back from the pause point of the audio data. For example, if the duration of the audio data is 1:15 and the user clicks to pause when playback reaches 0:20, clicking the message bar 400a again continues playing the audio data from 0:20.
  • The terminal device 100a may include a mobile phone, a tablet computer, a notebook computer, a handheld computer, a mobile internet device (MID), a POS (Point Of Sale) machine, a wearable device (such as a smart watch or a smart bracelet), or other terminal equipment capable of installing an instant messaging application.
  • Please refer to FIG. 2, which is a schematic flowchart of a data processing method based on an instant messaging application according to an embodiment of the present application. As shown in FIG. 2, the method can be executed by the instant messaging application-based data processing device shown in FIG. 12, and includes the following steps:
  • Step S101: Acquire audio data in an instant messaging application, and acquire sampling volume data corresponding to the audio data based on a sampling frequency;
  • Specifically, the terminal device may include a sending end and a receiving end. At the sending end, the user's voice data can be directly recorded in the instant messaging application as the audio data; the receiving end can use the voice data received from the sending end as the audio data.
  • When the user presses and holds the voice icon in the voice operation panel of the instant messaging application, the terminal device can record the user's voice data in real time, and the recorded user voice data is regarded as the audio data. It should be noted that when the user selects the voice changing mode, the audio data is the voice data after the user's voice change is applied. In terminal equipment, different operating systems differ in the digital representation of audio data.
  • Therefore, after the audio data is acquired, the sampling frequency corresponding to the audio data is determined, and the audio data is sampled based on the sampling frequency to obtain the sampling volume data; that is, the audio data can first be subjected to decibel data conversion processing and then sampled. Alternatively, the audio data can first be sampled based on the sampling frequency, and the sampled volume data is then converted into decibel data in the range of 0-255; that is, after the audio data is sampled, decibel data conversion processing is performed on the sampled volume data. For example, a sampling frequency of 100 times/second means that 100 sound data points are sampled per second of the audio data.
  • Step S102: Generate a voiceprint diagram corresponding to the audio data according to the audio data and the sampling volume data, and output a message bar containing the voiceprint diagram and the audio data;
  • Specifically, the terminal device can obtain the audio duration corresponding to the audio data, determine the number of voiceprint points corresponding to the audio data according to the audio duration, then determine the unit audio duration corresponding to each voiceprint point in the audio data, and determine the height corresponding to each voiceprint point according to the sampling volume data within the unit audio duration corresponding to that voiceprint point.
  • The height of a voiceprint point is related to the sound volume in the sampled volume data: within a preset volume range (for example, the user's common speaking volume range), the higher the sound volume, the greater the corresponding height of the voiceprint point.
  • Based on the number of voiceprint points and the height of each voiceprint point, the terminal device can generate the voiceprint diagram corresponding to the audio data, and can output, in the chat window of the instant messaging application, a message bar containing the voiceprint diagram and the audio data.
  • The voiceprint diagram uses graphics to visualize the audio data; that is, the voiceprint diagram expresses the positions of the voiceprint elements in the audio data and the heights of the syllables (the height of a voiceprint point in the voiceprint diagram can express the change trend of the sound volume in the audio data). Therefore, the loudness and the changes of the sound in the audio data can be perceived from the voiceprint diagram, and the user can quickly decide how to operate the message bar containing the audio data (such as earpiece mode, hands-free mode, silent state, etc.). For example, if the voiceprint points in the voiceprint diagram are low, the user may choose to operate the message bar in hands-free mode; if the voiceprint points are high, the user may choose to operate the message bar in the silent state or in earpiece mode.
  • Step S103: In response to a target trigger operation on the message bar, perform audio progress control on the audio data, and perform display control on the voiceprint diagram based on the audio progress.
  • Specifically, the terminal device can respond to the user's target trigger operation on the above message bar, control the audio progress of the audio data, and control display of the voiceprint diagram based on the audio progress; that is, it can record the progress information of the audio data in real time, and display, in the message bar containing the audio data, the played portion and the unplayed portion of the audio data according to the progress information.
  • the target trigger operation may include a play trigger operation, a pause trigger operation, a drag trigger operation, and may also include a voice-to-text trigger operation, a translation trigger operation, and the like.
  • Please refer to FIGS. 3a to 3c, which are schematic diagrams of an interface in response to a target trigger operation on the message bar provided by an embodiment of the present application.
  • Steps S201 to S207 below are a specific description of step S103 in the embodiment corresponding to FIG. 2; that is, steps S201 to S207 describe a specific process, provided by an embodiment of this application, of responding to the target trigger operation on the message bar.
  • When the target trigger operation includes a play trigger operation and a pause trigger operation, responding to the target trigger operation may include the following steps:
  • Step S201: In response to a first play trigger operation on the message bar, play the voice of the audio data, record the audio playback progress of the audio data, and display a progress indicator cursor in the voiceprint diagram according to the audio playback progress.
  • As shown in FIG. 3a, when a user's chat window contains multiple unread voice messages, the message bars corresponding to the voice messages, such as message bar 400b, message bar 400c, and message bar 400d, can be displayed in the chat window.
  • The instant messaging application can mark the message bars corresponding to unread audio data; for example, before the user listens to the audio data contained in the message bar 400b, a small dot is displayed in the identification area 401b to mark the message bar 400b.
  • When the user clicks the message bar 400b, the terminal device can respond to the play trigger operation on the message bar 400b, which may be referred to as the first play trigger operation, and play the voice of the audio data in the message bar 400b; at the same time, the mark corresponding to the message bar 400b can be cleared, that is, the small dot in the identification area 401b is removed.
  • During playback, the audio playback progress of the audio data can be recorded, and the progress indicator cursor 500b is displayed, according to the audio playback progress, in the voiceprint diagram contained in the message bar 400b. The progress indicator cursor 500b can be used to distinguish, in the voiceprint diagram, the played voiceprint area from the unplayed voiceprint area, and the two areas have different display modes (for example, they can be displayed in different colors).
  • Step S202: In response to a pause trigger operation on the message bar, stop the voice playback of the audio data, and record the stop timestamp of the position where the progress indicator cursor is located when playback stops;
  • As shown in FIG. 3a, the terminal device can respond to the pause trigger operation on the message bar 400b, stop the voice playback of the audio data in the message bar 400b, and record the stop timestamp of the position of the progress indicator cursor 500c when the audio data stops playing, that is, the moment at which the audio data stops playing. If the audio duration of the audio data is 2:00 minutes and the audio data has played to 0:30 when the user clicks to stop playback, the stop timestamp of the position of the progress indicator cursor 500c is 0:30 in the audio data.
  • Step S203: In response to a second play trigger operation on the message bar, start playing the voice from the position of the stop timestamp in the audio data.
  • The terminal device can respond to the replay trigger operation on the message bar 400b, which may be referred to as the second play trigger operation (the term "second" here distinguishes it from the first play trigger operation on the message bar 400b in step S201), and play the audio data contained in the message bar 400b from the position of the stop timestamp in that audio data, that is, continue playing from the paused position.
  • After the audio data contained in the message bar 400b finishes playing, the audio data in the next message bar 400c can be played automatically, and while the audio data contained in the message bar 400c is playing, the audio playback progress corresponding to the audio data contained in the message bar 400b is cleared; that is, only the audio playback progress of one piece of audio data is stored on the client.
  • The audio data contained in the message bar 400c and subsequent message bars can be played automatically in this way until all the unread audio data in the user's chat window has been played, or until a pause trigger operation on the message bar corresponding to the audio data being played stops the voice playback.
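  • A minimal sketch of the progress bookkeeping in steps S201 to S203 follows (illustrative only; class and member names are assumptions, and the audio player itself is abstracted away). It reflects the behaviors described above: playing clears the unread mark, pausing records the stop timestamp, replaying resumes from it, and switching to another message discards the previous record, so only one message's progress is stored.

```kotlin
// Hypothetical message model; only the fields needed by the sketch.
class VoiceMessage(val id: String, val durationMs: Long, var unread: Boolean = true)

class ProgressController {
    private var currentId: String? = null
    private var stopTimestampMs: Long = 0L   // cursor position recorded at pause

    // Play trigger: clear the unread dot and return the start position,
    // 0 for a fresh message, or the recorded stop timestamp when resuming.
    fun play(message: VoiceMessage): Long {
        if (message.id != currentId) {
            // Only one piece of audio data keeps its progress on the client,
            // so starting another message discards the previous record.
            currentId = message.id
            stopTimestampMs = 0L
        }
        message.unread = false
        return stopTimestampMs
    }

    // Pause trigger: remember where the progress indicator cursor stopped.
    fun pause(positionMs: Long) {
        stopTimestampMs = positionMs
    }
}
```

  • With this bookkeeping, pausing a 2:00 message at 0:30 and clicking it again resumes at 0:30, matching the example above.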
  • When the target trigger operation includes a drag trigger operation, responding to the target trigger operation may include the following steps:
  • Step S204: In response to a drag trigger operation on the progress indicator cursor in the message bar, obtain a first timestamp of the dragged progress indicator cursor in the audio data, display, in a first text display area corresponding to the voiceprint diagram, the text information of the audio data corresponding to the first timestamp, and update the played voiceprint area and the unplayed voiceprint area according to the dragged progress indicator cursor;
  • The user can also press and hold the progress indicator cursor in the message bar and drag it, so that the audio data contained in the message bar can be played from any position.
  • As shown in FIG. 3b, the terminal device can respond to the user's drag trigger operation on the progress indicator cursor 500e, and obtain the first timestamp of the progress indicator cursor 500e in the audio data contained in the message bar 400e during the dragging process, that is, record the audio progress of the cursor 500e while it is being dragged. The text information of the audio data corresponding to the first timestamp is displayed in the first text display area 600a in the voiceprint diagram; that is, the text field content at the current progress can be displayed in real time while the user drags, so that the user can determine the exact stop position of the progress indicator cursor 500e according to the text field content.
  • For example, suppose the audio duration of the audio data is 2 minutes. If the user wants to replay part of the audio data after it has been played, then to avoid wasting time (replaying the entire audio data would take 2 minutes), the user can press and drag the progress indicator cursor in the message bar corresponding to the audio data, and determine the exact stop position of the progress indicator cursor from the text field content displayed in real time, that is, locate the voice content the user wants to play again.
  • In everyday speech, the volume of the modal particles in a sentence is usually low, so the corresponding voiceprint bars are short. The user can therefore also determine the exact stop position of the progress indicator cursor 500e according to the heights of the voiceprint bars, so that playback starts from the next complete sentence instead of from the middle of a sentence.
  • The played voiceprint area and the unplayed voiceprint area in the voiceprint diagram can be updated in real time according to the dragged progress indicator cursor 500e: when the progress indicator cursor 500e is dragged backward, the voiceprint interval that is dragged over is determined as the unplayed voiceprint area, and when the progress indicator cursor 500e is dragged to the voiceprint area 102b, the voiceprint interval that is dragged over is determined as the played voiceprint area.
  • As shown in FIG. 3b, a scale table can also be displayed in the message bar, such as the scale table 102c in the message bar 400e. The time scale in the scale table 102c can be determined according to the audio duration of the audio data contained in the message bar 400e; if that audio duration is 120 seconds, the corresponding time information can be displayed in the scale table 102c, so that the user can determine the exact stop position of the progress indicator cursor 500e according to the time.
  • Step S205: Obtain a second timestamp of the progress indicator cursor in the audio data when the dragging ends, and start playing the voice from the position of the second timestamp in the audio data.
  • The terminal device can obtain the second timestamp of the progress indicator cursor 500e in the audio data contained in the message bar 400e at the end of the dragging, that is, the timestamp at which the dragging stops, and start playing the voice from the position of the second timestamp in the audio data contained in the message bar 400e. For example, if the user drags the progress indicator cursor 500e from 0:30 to 0:50 in the audio data and stops dragging at 0:50, the voice can be played from 0:50 of the audio data.
  • While the user drags the progress indicator cursor 500e, the voice can continue to play normally at the progress it had before the drag, and when the progress indicator cursor 500e stops being dragged, playback jumps to the moment where the cursor stopped; alternatively, the voice playback can be paused while the progress indicator cursor 500e is being dragged, and when the cursor stops being dragged, playback jumps to the moment where it stopped.
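  • As a sketch of steps S204 and S205 (illustrative only; the pixel-geometry parameters are assumptions), the drag handling reduces to mapping the cursor's horizontal position inside the voiceprint area to a timestamp, and repainting bars on either side of the cursor as played or unplayed:

```kotlin
// Map the cursor's x position inside the voiceprint area to a timestamp in the
// audio data (the "first timestamp" while dragging, the "second" on release).
fun cursorToTimestampMs(
    cursorXPx: Float, areaLeftPx: Float, areaWidthPx: Float, audioDurationMs: Long
): Long {
    val fraction = ((cursorXPx - areaLeftPx) / areaWidthPx).coerceIn(0f, 1f)
    return (fraction * audioDurationMs).toLong()
}

// Repaint rule while dragging: a bar belongs to the played area if the audio
// it covers ends at or before the cursor's timestamp.
fun isBarPlayed(barIndex: Int, barCount: Int, timestampMs: Long, audioDurationMs: Long): Boolean {
    val barEndMs = (barIndex + 1) * audioDurationMs / barCount
    return barEndMs <= timestampMs
}
```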
  • In addition, the target trigger operation may also include a text conversion trigger operation or a translation trigger operation. In this case, responding to the target trigger operation may include the following steps:
  • Step S206: In response to a text conversion trigger operation on the message bar, convert the audio data into first text data, and display the first text data in a second text display area corresponding to the voiceprint diagram;
  • As shown in FIG. 3c, the terminal device can respond to the user's text conversion trigger operation on the message bar 400f and perform speech recognition on the audio data contained in the message bar 400f to obtain the text information corresponding to the audio data, which may also be called the first text data, and display the first text data in the second text display area 600b corresponding to the voiceprint diagram.
  • The instant messaging application provides a text conversion option 701a for the audio data; that is, the instant messaging application has a text conversion function, where the text conversion function converts audio data into the corresponding text information. If the voice information in the audio data is Chinese, the text conversion function can convert the audio data into Chinese character text information; if the voice information in the audio data is English, the text conversion function can convert the audio data into English text information; and if the voice information in the audio data is a dialect (such as Hunan dialect, Chongqing dialect, Cantonese, etc.), the text conversion function can recognize the dialect in the audio data and convert the audio data into Chinese character text information.
  • Step S207: In response to a translation trigger operation on the first text data, perform text type conversion processing on the first text data to obtain second text data, and display the second text data in the second text display area.
  • For example, suppose the target audio information received by the user is voice in a foreign language (such as Russian, German, etc.), and the target audio information has been converted into first text data (foreign-language text information) through the text conversion function. If the user cannot understand the content of the first text data, the user can long-press the first text data in the second text display area 600b; a menu window 700b pops up in the corresponding area of the message bar 400f, and the user can select the translation option 701b in the menu window 700b.
  • The terminal device can respond to the user's translation trigger operation on the first text data, perform text type conversion processing (that is, translation processing) on the first text data to obtain second text data matching the translation language type selected by the user, and replace the first text data with the second text data in the second text display area 600b, that is, display the translated text information in the second text display area 600b. Mutual translation between multiple language types can be realized; for example, Chinese can be translated into English, Japanese, German, etc., English, Japanese, or German can be translated into Chinese, and English can also be translated into German, Italian, etc.
  • The embodiments of the application can obtain audio data in an instant messaging application, obtain sampled volume data by sampling the audio data, determine the number of voiceprint points according to the audio duration of the audio data, and determine the height of each voiceprint point according to the sampled volume data; the voiceprint diagram corresponding to the audio data can then be generated according to the number of voiceprint points and the height of each voiceprint point, a message bar containing the voiceprint diagram and the audio data can be output in the instant messaging application, and, in response to a trigger operation on the message bar, the audio progress of the audio data can be recorded and the display of the voiceprint diagram controlled based on the audio progress.
  • It can be seen that, in the chat scene of an instant messaging application, the voiceprint corresponding to the audio data is displayed in the message bar: the user can click the message bar to play or pause the voice, judge the voice regions through the visualized voiceprint, adjust the voice progress by sliding, and watch the text translation corresponding to the voice in real time while adjusting the progress. This increases the diversity of audio data display forms and enriches the audio data operation modes; it also helps users listen to, view, and operate voice messages efficiently, greatly enhancing the interactivity, readability, and efficiency of voice messages, and better promoting convenient use of voice messages by users of instant messaging applications.
  • Please refer to FIG. 4, which is a schematic flowchart of another data processing method based on an instant messaging application provided by an embodiment of the present application. As shown in FIG. 4, the method can be executed by the data processing device based on an instant messaging application shown in FIG. 12, and includes the following steps:
  • Step S301: Acquire audio data in an instant messaging application, and obtain sampling volume data corresponding to the audio data based on a sampling frequency;
  • For the specific implementation process of step S301, reference may be made to the description of step S101 in the embodiment corresponding to FIG. 2, which will not be repeated here.
  • Step S302: Obtain the audio duration corresponding to the audio data;
  • After the terminal device obtains the audio data in the instant messaging application, it can obtain the audio duration corresponding to the audio data, that is, the duration from the moment the user presses the voice icon in the voice operation panel of the instant messaging application to speak until the moment the user releases it.
  • Step S303: Determine the length of the message bar in the instant messaging application according to the audio duration;
  • In the instant messaging application, the correspondence between the audio duration of audio data and the length of the message bar is preset. Therefore, after the audio duration corresponding to the audio data is obtained, the length of the message bar that matches the audio duration can be looked up in the stored data table of the instant messaging application.
  • Step S304: Determine the length of the voiceprint area corresponding to the audio data according to the reserved margin corresponding to the message bar and the length of the message bar;
  • The terminal device can obtain the reserved margin corresponding to the message bar in the instant messaging application, where the reserved margin includes the left reserved margin and the right reserved margin of the message bar. The reserved margin can be proportional to the length of the message bar (for example, the left and right reserved margins are each 5% of the message bar length), or it can be a preset fixed value (for example, regardless of the message bar length, the left and right reserved margins are both set to 2 mm).
  • According to the length of the message bar and the reserved margin, the length of the voiceprint area corresponding to the audio data can be determined; that is, the reserved margin is subtracted from the length of the message bar to obtain the length of the voiceprint area corresponding to the audio data.
  • Step S305: Determine the number of voiceprint points corresponding to the audio data according to the length of the voiceprint area, the size of the voiceprint point pattern, and the distance between adjacent voiceprint points;
  • After the length of the voiceprint area is determined, the number of voiceprint points corresponding to the audio data can be determined according to the length of the voiceprint area, the size of the voiceprint point pattern, and the distance between adjacent voiceprint points. If the voiceprint point pattern is a voiceprint dot whose size can be ignored, the number of voiceprint points is calculated as: (voiceprint area length + distance between two adjacent voiceprint points) / distance between two adjacent voiceprint points. If the voiceprint point pattern is a voiceprint bar (that is, the voiceprint point is taken as the midpoint of the upper side of the voiceprint bar), the width of the voiceprint bar and the distance between two adjacent voiceprint bars are obtained, and the number of voiceprint bars is calculated as: (voiceprint area length + distance between two adjacent voiceprint bars) / (width of the voiceprint bar + distance between two adjacent voiceprint bars). It should be noted that the width of the voiceprint bar and the distance between voiceprint points are fixed values.
  • If the audio duration corresponding to the audio data exceeds a duration threshold (such as 40 s), the number of voiceprint points can be set directly to a fixed value (such as 25); if the audio duration corresponding to the audio data is less than or equal to the duration threshold (for example, 40 s), the above steps S303 to S305 are executed.
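  • The two counting formulas above, together with the duration threshold, can be summarized in code; the following Kotlin sketch is illustrative only, with all pixel values left to the caller and the 40 s / 25 values taken from the examples:

```kotlin
// Length of the voiceprint area: message bar length minus both reserved margins.
fun voiceprintAreaLength(barLenPx: Float, leftMarginPx: Float, rightMarginPx: Float): Float =
    barLenPx - leftMarginPx - rightMarginPx

// Point-shaped voiceprint (point size negligible):
// count = (area length + gap) / gap
fun pointCount(areaLenPx: Float, gapPx: Float): Int =
    ((areaLenPx + gapPx) / gapPx).toInt()

// Bar-shaped voiceprint:
// count = (area length + gap) / (bar width + gap)
fun barCount(areaLenPx: Float, barWidthPx: Float, gapPx: Float): Int =
    ((areaLenPx + gapPx) / (barWidthPx + gapPx)).toInt()

// Above the duration threshold the count is simply a fixed value.
fun voiceprintCount(audioDurationS: Int, areaLenPx: Float, barWidthPx: Float, gapPx: Float): Int =
    if (audioDurationS > 40) 25 else barCount(areaLenPx, barWidthPx, gapPx)
```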
  • Step S306: Determine the unit audio duration corresponding to each voiceprint point according to the audio duration;
  • After the number of voiceprint points is determined, the unit audio duration corresponding to each voiceprint point can be determined according to the audio duration, where the sum of the unit audio durations corresponding to all voiceprint points in the audio data equals the audio duration of the audio data. For example, if the audio duration is 10 s and the number of voiceprint points is 10, the unit audio duration corresponding to each voiceprint point is 1 s; that is, the unit audio duration interval corresponding to the first voiceprint point is 0 s-1 s in the audio data, the interval corresponding to the second voiceprint point is 1 s-2 s in the audio data, and so on, so that the unit audio duration interval corresponding to each voiceprint point can be determined.
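  • For illustration, the division into unit audio durations can be expressed as follows (names are assumptions):

```kotlin
// Step S306 as a sketch: with N voiceprint points, point k covers the k-th
// equal slice of the audio. With 10 s of audio and 10 points, point 0 covers
// 0 s-1 s, point 1 covers 1 s-2 s, and so on.
fun unitIntervalMs(audioDurationMs: Long, pointCount: Int, pointIndex: Int): LongRange {
    val unit = audioDurationMs / pointCount
    return (pointIndex * unit) until ((pointIndex + 1) * unit)
}
```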
  • Step S307: Obtain a volume average value corresponding to the sampled volume data within the unit audio duration, and determine a to-be-processed height corresponding to each voiceprint point based on the volume average value;
  • After the unit audio duration corresponding to each voiceprint point is determined, the volume average value corresponding to the sampled volume data within that unit audio duration can be obtained. For example, if the sampling frequency is 100 times/second and the unit audio duration interval corresponding to a certain voiceprint point is 1 s-2 s, the average volume of the 100 sound data points sampled in the 1 s-2 s range of the sampled volume data is calculated. The to-be-processed height of each voiceprint point is then determined according to the functional relationship between volume and voiceprint point height. The specific implementation process is: if the volume average value is less than a first volume threshold, a target value (a fixed value) is determined as the to-be-processed height of the voiceprint point corresponding to that volume average value; if the volume average value is greater than or equal to the first volume threshold and less than a second volume threshold, the to-be-processed height of the voiceprint point corresponding to that volume average value is determined according to a linear growth function between volume and height; if the volume average value is greater than or equal to the second volume threshold, the to-be-processed height of the voiceprint point corresponding to that volume average value is determined according to a logarithmic growth function between volume and height.
  • Please also refer to FIG. 5, which is a schematic diagram of a voiceprint visualization calculation rule provided by an embodiment of the present application. As shown in FIG. 5, the to-be-processed height of a voiceprint point is not linear in the volume overall, but can be expressed as a piecewise function. When the volume average value is less than the first volume threshold, the target value (a fixed value) is determined as the to-be-processed height; in other words, for volume averages below the user's normal speaking volume, the to-be-processed height of the voiceprint point stays at the minimum. When the volume average value lies between the first and second volume thresholds, the to-be-processed height is linear in the volume, and is determined by the linear growth function between volume and height. When the volume average value is greater than or equal to the second volume threshold, the to-be-processed height is again non-linear in the volume; it is determined by the logarithmic growth function between volume and height, and approaches the maximum value as the volume increases.
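  • The piecewise rule of FIG. 5 can be sketched as follows; the thresholds, slopes, and height limits are invented for illustration, since the patent fixes only the shape of the curve (constant minimum, then linear growth, then logarithmic growth toward a maximum):

```kotlin
import kotlin.math.ln

const val FIRST_VOLUME_THRESHOLD = 60.0    // assumed, on the 0..255 decibel-data scale
const val SECOND_VOLUME_THRESHOLD = 180.0
const val MIN_HEIGHT = 2.0                 // the fixed "target value", in dp (assumed)
const val MAX_HEIGHT = 24.0

fun toBeProcessedHeight(volumeAverage: Double): Double = when {
    // Below the first threshold: the fixed target value (minimum height).
    volumeAverage < FIRST_VOLUME_THRESHOLD -> MIN_HEIGHT
    // Between the thresholds: linear growth function between volume and height.
    volumeAverage < SECOND_VOLUME_THRESHOLD -> {
        val t = (volumeAverage - FIRST_VOLUME_THRESHOLD) /
                (SECOND_VOLUME_THRESHOLD - FIRST_VOLUME_THRESHOLD)
        MIN_HEIGHT + t * (MAX_HEIGHT * 0.75 - MIN_HEIGHT)
    }
    // At or above the second threshold: logarithmic growth toward the maximum.
    else -> {
        val over = volumeAverage - SECOND_VOLUME_THRESHOLD
        (MAX_HEIGHT * 0.75 + 4.0 * ln(1.0 + over)).coerceAtMost(MAX_HEIGHT)
    }
}

// Average volume over one voiceprint point's unit audio duration.
fun volumeAverage(samples: IntArray, fromIndex: Int, toIndex: Int): Double =
    samples.copyOfRange(fromIndex, toIndex).average()
```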
  • Step S308: Obtain interpolation parameter information corresponding to the to-be-processed height, and determine the height corresponding to each voiceprint point based on the interpolation parameter information and the to-be-processed height;
  • Since the calculated to-be-processed heights are small, a deceleration interpolator can be used to amplify the differences between the to-be-processed heights; that is, for two voiceprint points with different to-be-processed heights, the interpolation parameter information corresponding to the two to-be-processed heights can be obtained through the deceleration interpolator, and the height difference between them is enlarged. For example, before the enlargement processing, the height difference between the two to-be-processed heights is 0.01 cm; after the enlargement processing, the height difference between them can become 0.05 cm.
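  • Although the patent does not name a concrete interpolator, a deceleration curve of the form 1 - (1 - x)^2 (the formula used by Android's DecelerateInterpolator) has exactly the described effect: its slope exceeds 1 for small inputs, so small height differences near the bottom of the range are stretched apart. A runnable sketch, with the normalization assumed:

```kotlin
// Deceleration curve: steep at the start, flat at the end. Re-implemented here
// so the example runs outside Android.
fun decelerate(x: Double): Double = 1.0 - (1.0 - x) * (1.0 - x)

// Normalize each to-be-processed height by the maximum, pass it through the
// curve, and scale back, amplifying differences between small heights.
fun amplifyHeights(heights: DoubleArray, maxHeight: Double): DoubleArray =
    DoubleArray(heights.size) { i -> decelerate(heights[i] / maxHeight) * maxHeight }
```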
  • Step S309: Generate a to-be-processed voiceprint diagram corresponding to the audio data according to the number and the heights;
  • According to the number of voiceprint points and the height of each voiceprint point, the to-be-processed voiceprint diagram corresponding to the audio data can be drawn. The to-be-processed voiceprint diagram can reflect information such as the loudness of the sound in the audio data and its high and low pitches.
  • Step S310: Acquire sound parameters corresponding to the audio data, and select a voiceprint shape type matching the sound parameters from a voiceprint library;
  • The sound parameters corresponding to the audio data can be obtained, the sound type corresponding to the audio data can be determined according to the sound parameter information, and the voiceprint shape type matching that sound type can be selected from the voiceprint library. Each voice type can correspond to its own voiceprint shape type; for example, the voiceprint shape type corresponding to the "normal" voice type is the bar-shaped voiceprint type, and the voiceprint shape type corresponding to the "Loli" voice type is the curved voiceprint type. Multiple voice types can also correspond to one voiceprint shape type; for example, the voiceprint shape types corresponding to the "normal" voice type and the "uncle" voice type are both the bar-shaped voiceprint type, and the voiceprint shape types corresponding to the "thriller" voice type and the "funny" voice type are both the curved voiceprint type, which is not limited here. The voiceprint library stores the correspondence between voice types and voiceprint shape types, and the voiceprint shape type can be looked up directly in the voiceprint library according to the voice type.
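  • A voiceprint library of this kind reduces to a many-to-one lookup table; the following sketch mirrors the examples above (the enum values and the default are assumptions):

```kotlin
enum class VoiceprintShape { BAR, CURVE }

// Many-to-one mapping from voice (sound) type to voiceprint shape type,
// following the examples in the text.
val voiceprintLibrary = mapOf(
    "normal" to VoiceprintShape.BAR,
    "uncle" to VoiceprintShape.BAR,
    "loli" to VoiceprintShape.CURVE,
    "thriller" to VoiceprintShape.CURVE,
    "funny" to VoiceprintShape.CURVE
)

fun shapeFor(voiceType: String): VoiceprintShape =
    voiceprintLibrary[voiceType] ?: VoiceprintShape.BAR   // assumed default
```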
  • Step S311: Determine a voiceprint diagram corresponding to the audio data according to the voiceprint shape type and the to-be-processed voiceprint diagram, and output a message bar containing the voiceprint diagram and the audio data;
  • Please refer to FIGS. 6a to 6c, which are schematic diagrams of interfaces for visualizing voiceprint shape types provided by embodiments of the present application.
  • By combining the voiceprint shape type and the to-be-processed voiceprint diagram, the final voiceprint diagram can be generated, such as the visualized voiceprint diagram 800a in FIG. 6a, the visualized voiceprint diagram 800b in FIG. 6b, and the visualized voiceprint diagram 800c in FIG. 6c, and a message bar containing the voiceprint diagram and the audio data is output in the chat window of the instant messaging application.
  • For the visualized voiceprint diagram 800a in FIG. 6a, which may also be called a bar-shaped voiceprint diagram: since the width of each voiceprint bar and the spacing between voiceprint bars are preset fixed values, the height of each voiceprint bar can be determined from the height of the corresponding voiceprint point, and the number of voiceprint points is the number of voiceprint bars; therefore, the visualized voiceprint diagram 800a can be generated from the height and number of the voiceprint bars.
  • For the visualized voiceprint diagram 800b in FIG. 6b, which may also be called a curved voiceprint diagram: the voiceprint points can be connected by a curve to form a smooth voiceprint curve.
  • For the visualized voiceprint diagram 800c in FIG. 6c: the minimum value is selected from the heights of all voiceprint points, and the initial rectangle of each voiceprint bar is determined according to that minimum value; each voiceprint bar can then be composed of an initial rectangular frame and an excess rectangular frame (the combined height of the initial rectangular frame and the excess rectangular frame equals the height of the voiceprint point), so that the visualized voiceprint diagram 800c can be determined.
  • Step S312: In response to the target trigger operation on the message bar, perform audio progress control on the audio data, and perform display control on the voiceprint diagram based on the audio progress.
  • For the specific implementation of step S312, reference may be made to the description of steps S201 to S207 in the embodiments corresponding to FIGS. 3a to 3c, which will not be repeated here.
  • Please refer to FIG. 7, which is an implementation model diagram of a voice message technology based on an instant messaging application provided by an embodiment of the present application.
  • Suppose the user is in a chat scene of the instant messaging application. As the sender, the user can click voice, select the voice type, and then press and hold the voice icon in the voice operation panel to speak; the process of the sender speaking is the process of transmitting data to the instant messaging application client corresponding to the sender (hereinafter referred to as the sender client).
  • The sender client can start recording and capture the sender's voice in real time, convert the recorded real-time sound data into decibel data in the interval [0, 255], and sample the recorded real-time sound data. The sender client thus completes the real-time recording and sampling of the sound data, and the voice audio data (including the recorded real-time sound data and the sampled data) can then be sent to the instant messaging application client corresponding to the receiver (hereinafter referred to as the receiver client).
  • After the receiver client receives the voice audio data sent by the sender client, it can determine the number of voiceprint bars for the chat interface according to the audio duration (the default voiceprint display type of the voiceprint diagram is the bar display type), calculate the average volume of each voiceprint bar according to the sampled data, and determine the height of each voiceprint bar according to the volume-height curve. Since the calculated heights are small, the deceleration interpolation method can be used to amplify the voiceprint heights, and the voiceprint diagram corresponding to the voice audio data is generated.
  • After the average volume of each voiceprint bar has been calculated from the sampled data, the voice message bubble (that is, the voice message bar) can be displayed on the receiver's chat interface, and the voiceprint diagram corresponding to the voice audio data can also be displayed in the voice message bubble.
  • When the receiver clicks the voice message bubble to play the voice audio data, a progress indicator cursor can be displayed in the voice message bubble, and the receiver client can record the audio playback progress. If the receiver presses and drags the progress indicator cursor, the voice audio data stops playing and the receiver client records the progress of the voice audio data; after the receiver's finger is released, playback jumps to the progress at which the drag stopped and continues from there. If the receiver clicks pause, the receiver client can stop playing the voice audio data and record the current audio progress.
  • If the receiver clicks to play the next voice message, the receiver client can clear the audio progress record of the previous voice message and start recording the audio progress of the new voice message. At this point, the entire voice message technology implementation process is complete.
  • In the embodiments of the application, the voiceprint corresponding to the audio data is displayed in the message bar: the user can click the message bar to play or pause the voice, judge the voice regions through the visualized voiceprint, adjust the voice progress by sliding, and watch the text translation corresponding to the voice in real time while adjusting the progress. This increases the diversity of audio data display forms and enriches the audio data operation modes; it also helps users listen to, view, and operate voice messages efficiently, greatly enhancing the interactivity, readability, and efficiency of voice messages, and better promoting convenient use of voice messages by users of instant messaging applications.
  • Please refer to FIG. 8, which is a schematic flowchart of another data processing method based on an instant messaging application provided by an embodiment of the present application. As shown in FIG. 8, the method can be executed by the data processing device based on an instant messaging application shown in FIG. 12, and includes the following steps:
  • Step S401: Acquire audio data in an instant messaging application, and obtain sampling volume data corresponding to the audio data based on a sampling frequency;
  • Step S402: Determine the number of voiceprint points corresponding to the audio data according to the audio duration corresponding to the audio data, and determine the height corresponding to each voiceprint point based on the sampling volume data;
  • For the specific implementation of steps S401 and S402, please refer to the description of steps S101 and S102 in the embodiment corresponding to FIG. 2, or to the description of steps S301 to S308 in the embodiment corresponding to FIG. 4, which will not be repeated here.
  • Step S403: Obtain a message bar display type corresponding to the audio data, and extract a voiceprint display parameter matching the message bar display type;
  • The terminal device can obtain the message bar display type corresponding to the audio data and extract, from local storage, the voiceprint display parameter matching that message bar display type. In the instant messaging application, multiple message bar display types are provided, and the user can choose any of them; after the user selects a message bar display type, the voiceprint display parameter matching it can be extracted from local storage, that is, a color with greater contrast against the background color of the message bar display type is used as the voiceprint display color of the voiceprint diagram, which may also be called the voiceprint display parameter.
  • The instant messaging application can provide users with multiple message bar display types. Please refer to FIG. 9, which is a schematic diagram of an interface of a personalized message display type provided by an embodiment of the present application.
  • As shown in FIG. 9, the message display types can include a message display type 900a, a message display type 900b, and a message display type 900c. The message display type corresponding to the audio data can be matched adaptively according to the sound type of the audio data (such as the voice changing type), or the user can select a satisfactory message display type in the client according to his or her own needs; the client can then obtain the selected message display type and extract the voiceprint display parameter matching it. For example, when the user selects the message display type 900a for the audio data, the voiceprint display parameter matching the message display type 900a is determined: if the background color of the message display type 900a is black, the voiceprint display parameter (the voiceprint display color) can be determined as white.
  • The correspondence between message bar display types and voiceprint display parameters can be stored in a local file of the instant messaging application; when the terminal device needs the voiceprint display parameter corresponding to the voiceprint diagram, the parameter matching the message bar display type can be looked up in the local file.
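  • The locally stored correspondence can likewise be sketched as a lookup table keyed by the message bar display type; the type names and colors below are assumptions chosen to match the black-background/white-voiceprint example:

```kotlin
// Voiceprint display parameter: here just a display color that contrasts with
// the bubble background of the chosen message bar display type.
data class VoiceprintDisplayParams(val color: String)

val displayParamTable = mapOf(
    "type_900a" to VoiceprintDisplayParams(color = "#FFFFFF"),  // black bubble -> white voiceprint
    "type_900b" to VoiceprintDisplayParams(color = "#222222")   // light bubble -> dark voiceprint
)

fun paramsFor(messageBarDisplayType: String): VoiceprintDisplayParams =
    displayParamTable[messageBarDisplayType] ?: VoiceprintDisplayParams("#000000")
```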
  • Step S404: Generate a voiceprint diagram corresponding to the audio data according to the voiceprint display parameter, the number, and the heights, and output a message bar containing the voiceprint diagram and the audio data;
  • According to the number of voiceprint points and the height of each voiceprint point, the terminal device can draw a to-be-processed voiceprint diagram; combining the voiceprint display parameter, the final voiceprint diagram corresponding to the audio data can be determined, and a message bar containing the voiceprint diagram and the audio data is output on the chat interface of the instant messaging application. At this time, the voiceprint diagram can be clearly distinguished from the background color of the message bar.
  • Step S405: In response to the target trigger operation on the message bar, perform audio progress control on the audio data, and perform display control on the voiceprint diagram based on the audio progress.
  • For the specific implementation of step S405, please refer to the description of steps S201 to S207 in the embodiments corresponding to FIGS. 3a to 3c, which will not be repeated here.
  • Please refer to FIG. 10, which is a schematic structural diagram of a message bar function model provided by an embodiment of the present application.
  • As shown in FIG. 10, the message bar function model 100 can recognize the voice information in the current voice audio data according to the user's environmental factors and each user's speaking habits, analyze the data changes of the voice information, and visualize the voice information as a graphical message display. This is mainly achieved through the visual information rendering layer 110, the functional operation display 120, and the bubble message display 130.
  • The visual information rendering layer 110 can draw the visualized voiceprint information and, while drawing, determine the positions of the voiceprint elements, the voiceprint syllables, and the voiceprint color changes; the functional operation display 120 can provide users with operable functions such as click to pause/play, press and hold to adjust the voice progress, a real-time text field, and long press to convert voice to text; the bubble message display 130 can provide a personalized message bar display type for the voice audio data, that is, provide a changeable message bar visual background for the voice audio data.
  • In the embodiments of the application, the voiceprint corresponding to the audio data is displayed in the message bar: the user can click the message bar to play or pause the voice, judge the voice regions through the visualized voiceprint, adjust the voice progress by sliding, and watch the text translation corresponding to the voice in real time while adjusting the progress. This increases the diversity of audio data display forms and enriches the audio data operation modes; it also helps users listen to, view, and operate voice messages efficiently, greatly enhancing the interactivity, readability, and efficiency of voice messages, and better promoting convenient use of voice messages by users of instant messaging applications.
  • FIG. 11 is a schematic structural diagram of a data processing device based on an instant messaging application provided by an embodiment of the present application.
  • the data processing device 1 based on an instant messaging application may include: a sampling module 10, a generating module 20, and a response module 30;
  • the sampling module 10 is configured to obtain audio data in an instant messaging application, and obtain sampling volume data corresponding to the audio data based on the sampling frequency;
  • the sampling module 10 can record the user’s voice data in real time, and The recorded user voice data serves as audio data. It should be noted that when the user selects the voice change mode, the audio data is the voice data after the user performs voice change. Since different operating systems have differences in the digital representation of audio data, the sampling module 10 needs to perform unified conversion processing on the collected audio data, and convert it into decibel data in the range of 0-255.
  • the interval corresponding to the sound data collected by the sampling module 10 is [0,1], so the sampling module 10 needs to convert the collected sound data into decibel data with an interval of [0,255].
  • the sampling frequency corresponding to the audio data is determined, and the audio data is sampled based on the sampling frequency to obtain the sampling volume data, that is, the audio data is subjected to decibel data conversion processing, and then the sound sampling is performed .
  • the audio data can be sampled based on the sampling frequency first.
  • the sampled volume data is converted into decibel data in the range of 0-255, that is, after the audio data is sound sampled, Perform decibel data conversion processing on the sampled volume data. For example, when the sampling frequency is 100 times/second, it means that 100 sound data points can be sampled per second in the audio data.
  • the generating module 20 is configured to generate a voiceprint diagram corresponding to the audio data according to the audio data and the sampling volume data, and output a message bar containing the voiceprint diagram and the audio data;
  • the generation module 20 can obtain the audio duration corresponding to the audio data, and can determine the number of voiceprint points corresponding to the audio data according to the audio duration, and then can determine the audio duration corresponding to each voiceprint point in the audio data, and according to The sampling volume data within the audio duration corresponding to each voiceprint point determines the height corresponding to each voiceprint point.
  • the height of the voiceprint point is related to the sound volume in the sampled volume data. Within the preset volume range (for example, the user's common voice volume range), the higher the sound volume, the higher the corresponding height of the voiceprint point.
  • the terminal device can generate a voiceprint map corresponding to the audio data, and can output the voiceprint map and audio in the chat window of the instant messaging application The message bar of the data.
  • the voiceprint diagram uses graphics to visualize the audio data, that is, the voiceprint diagram is used to express the position of the voiceprint element in the audio data, and the height of the syllable (the height of the voiceprint point in the voiceprint diagram can be expressed as audio).
  • the change trend of the sound volume in the data and the height of the voiceprint point can be expressed as the change trend of the sound volume in the audio data). Therefore, the sound size and sound change of the audio data can be sensed according to the voiceprint image, and the user can quickly determine the operation mode of the message bar containing the audio data (such as earpiece mode, hands-free mode, silent state, etc.).
  • the voiceprint point corresponds to a low height in the voiceprint diagram, you can choose to operate the message bar in hands-free mode; if the voiceprint point corresponds to a higher height in the voiceprint diagram, you can choose to Operate the message bar in the state or in the handset mode.
  • the response module 30 is configured to perform audio progress control on the audio data in response to a target trigger operation on the message bar, and perform display control on the voiceprint diagram based on the audio progress.
  • the response module 30 can respond to the user's target triggering operation on the above message bar to perform audio progress control on the audio data, and display and control the voiceprint diagram based on the audio progress, that is, it can record the progress information of the audio data in real time. , And display the progress of the read audio data and unread audio data in the audio data in the message bar containing the audio data according to the progress information.
  • the target trigger operation may include a play trigger operation, a pause trigger operation, a drag trigger operation, and may also include a voice-to-text trigger operation, a translation trigger operation, and the like.
  • the data processing device 1 based on the instant messaging application may further include: a conversion module 40 and a translation module 50;
  • the conversion module 40 is configured to respond to the text conversion trigger operation for the message bar, convert the audio data into first text data, and display the first text in the second text display area corresponding to the voiceprint image A text data;
  • the translation module 50 is configured to perform text type conversion processing on the first text data in response to a translation trigger operation for the first text data to obtain the second text data, and display all text data in the second text display area Describe the second text data.
  • step S206 the specific functional implementation of the conversion module 40 and the translation module 50 can be referred to step S206 to step S207 in the embodiment corresponding to FIG. 3c, which will not be repeated here.
  • the generating module 20 may include: a quantity determining unit 201, a height determining unit 202, and a voiceprint image generating unit 203;
  • the quantity determining unit 201 is configured to determine the quantity of voiceprint points corresponding to the audio data according to the audio duration corresponding to the audio data;
  • the height determining unit 202 is configured to determine the height corresponding to each voiceprint point based on the sampled volume data
  • the voiceprint map generating unit 203 is configured to generate a voiceprint map corresponding to the audio data according to the number and the height.
  • step S302-step S308 in the embodiment corresponding to FIG. 4
  • step S309 to S311 in the corresponding embodiment and steps S403 to S404 in the embodiment corresponding to FIG. 8 are not repeated here.
  • the response module 30 may include: a first play operation response unit 301, a pause operation response unit 302, a second play operation response unit 303, a drag operation response unit 304, and a voice play unit 305;
  • the first play operation response unit 301 is configured to respond to the first play trigger operation for the message bar, perform voice play on the audio data, and record the audio play progress of the audio data, and the audio play progress is displayed according to the audio play progress.
  • the progress indicator cursor is displayed in the voiceprint diagram;
  • the pause operation response unit 302 is configured to respond to the pause trigger operation for the message bar, stop the audio playback of the audio data, and record the stop timestamp of the position where the progress indication cursor is located when the stop is stopped;
  • the second play operation response unit 303 is configured to respond to the second play trigger operation for the message bar, and start playing voice from the position of the stop timestamp in the audio data;
  • the drag operation response unit 304 is configured to respond to the drag trigger operation for the progress indication cursor in the message bar, and obtain the first time stamp of the dragged progress indication cursor in the audio data, and In the first text display area corresponding to the voiceprint image, the text information of the audio data corresponding to the first time stamp is displayed, and according to the dragged progress indicator cursor, the played voiceprint area and the Play the voiceprint area to update the area;
  • the voice playing unit 305 is configured to obtain the second time stamp of the progress indication cursor in the audio data when the dragging ends, and start playing the voice from the position of the second time stamp in the audio data.
  • the first play operation response unit 301, the pause operation response unit 302, the second play operation response unit 303, the drag operation response unit 304, and the specific function implementation of the voice play unit 305 can be seen in the corresponding figure 3a-3c above Steps S201 to S205 in the embodiment will not be repeated here.
  • the quantity determining unit 201 may include: a duration obtaining subunit 2011, a length obtaining subunit 2012, and a quantity determining subunit 2013;
  • the duration obtaining subunit 2011 is configured to obtain the audio duration corresponding to the audio data
  • the length obtaining subunit 2012 is configured to determine the length of the message bar in the instant messaging application according to the audio duration
  • the quantity determining subunit 2013 is configured to determine the number of voiceprint points corresponding to the audio data according to the distance between the length of the message bar and the adjacent voiceprint points.
  • the specific functional implementation of the duration obtaining subunit 2011, the length obtaining subunit 2012, and the quantity determining subunit 2013 may refer to step S302 to step S305 in the embodiment corresponding to FIG. 4, which will not be repeated here.
  • the height determining unit 202 may include: a unit duration determining subunit 2021, a to-be-processed height determining subunit 2022, and a voiceprint height determining subunit 2023;
  • the unit duration determining subunit 2021 is configured to determine the unit audio duration corresponding to each voiceprint point according to the audio duration
  • the to-be-processed height determining sub-unit 2022 is configured to obtain the volume average value corresponding to the sampled volume data within the unit audio duration, and determine the to-be-processed height corresponding to each voiceprint point based on the volume average value;
  • the voiceprint height determining subunit 2023 is configured to obtain interpolation parameter information corresponding to the height to be processed, and determine the height corresponding to each voiceprint point based on the interpolation parameter information and the height to be processed.
  • the unit duration determining sub-unit 2021, the to-be-processed height determining sub-unit 2022, and the voiceprint height determining sub-unit 2023 can refer to step S306-step S308 in the embodiment corresponding to FIG. Repeat.
  • the voiceprint image generating unit 203 may include: a voiceprint image generating subunit to be processed 2031, a voiceprint shape selecting subunit 2032, a first voiceprint image determining subunit 2033, and a display parameter extracting subunit 2034 , The second voiceprint image determination subunit 2035;
  • a voiceprint image to be processed generating subunit 2031 configured to generate a voiceprint image to be processed corresponding to the audio data according to the number and the height;
  • the voiceprint shape selection subunit 2032 is used to obtain the voice parameter corresponding to the audio data, and select the voiceprint shape type matching the voice parameter from the voiceprint library;
  • the first voiceprint image determining subunit 2033 is configured to determine the voiceprint image corresponding to the audio data according to the voiceprint shape type and the voiceprint image to be processed;
  • the display parameter extraction subunit 2034 is configured to obtain the message bar display type corresponding to the audio data, and extract the voiceprint display parameters that match the message bar display type;
  • the second voiceprint map determining subunit 2035 is configured to generate a voiceprint map corresponding to the audio data according to the voiceprint display parameter, the number, and the height.
  • the generating module 30 may include: a voiceprint image generating subunit 2031 to be processed, a voiceprint shape selecting subunit 2032, and a first voiceprint image determining subunit 2033.
  • the generation module 30 may also include: a display parameter extraction sub-unit 2034, a second voiceprint image determination sub-unit 2035, the specific function implementation manner can refer to the step S403 in the embodiment corresponding to FIG. 8 -Step S404, which will not be repeated here.
  • the quantity determining subunit 2013 may include: a voiceprint length determining subunit 20131, and a voiceprint point quantity determining subunit 20132;
  • the voiceprint length determining subunit 20131 is configured to determine the length of the voiceprint area corresponding to the audio data according to the reserved margin corresponding to the message column and the length of the message column;
  • the voiceprint point quantity determination subunit 20132 is configured to determine the number of voiceprint points corresponding to the audio data according to the length of the voiceprint area, the size of the voiceprint point pattern, and the distance between adjacent voiceprint points.
  • the specific function implementation of the voiceprint length determining subunit 20131 and the voiceprint point number determining subunit 20132 may refer to step S304 to step S305 in the embodiment corresponding to FIG. 4, which will not be repeated here.
  • the height determination subunit 2022 to be processed may include: an average value determination subunit 20221, a first height determination subunit 20222, a second height determination subunit 20223, and a third height determination subunit 20224;
  • the average value determining subunit 20221 is configured to obtain the average volume value corresponding to the target sampling data within the unit audio duration;
  • the first height determining subunit 20222 is configured to determine the target value as the to-be-processed height of the voiceprint point corresponding to the volume average if the volume average value is less than the first volume threshold;
  • the second height determining subunit 20223 is configured to determine the corresponding volume average value according to the linear increase function between the volume and the height if the volume average value is greater than or equal to the first volume threshold value and less than the second volume threshold value The height of the voiceprint point to be processed;
  • the third height determining sub-unit 20224 is configured to determine the pending voiceprint points corresponding to the volume average value according to the logarithmic growth function between the volume and the height if the volume average value is greater than or equal to the second volume threshold value. Processing height.
  • the specific function implementation of the mean value determining subunit 20221, the first height determining subunit 20222, the second height determining subunit 20223, and the third height determining subunit 20224 can be referred to step 307 in the embodiment corresponding to FIG. 4. Do not repeat them here.
  • the voice print corresponding to the audio data is displayed in the message bar.
  • the user can click on the message bar to play/pause the voice, and the voice area can be judged through the visual voice print, and the voice can be adjusted by sliding
  • you can watch the text translation corresponding to the voice in real time when you adjust the progress which can increase the diversity of audio data display forms and enrich the audio data operation methods; and can efficiently help users listen, view, and operate voice messages, which is greatly enhanced
  • the interactivity, readability, and efficiency of voice messages better promote the convenient use of voice messages by users of instant messaging applications.
  • FIG. 12 is a schematic structural diagram of a data processing device based on an instant messaging application according to an embodiment of the present application.
  • the instant messaging application-based data processing device 1000 may include a processor 1001, a network interface 1004, and a memory 1005.
  • the foregoing instant messaging application-based data processing device 1000 may also include: a user interface 1003, And at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1004 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located far away from the aforementioned processor 1001. As shown in FIG. 12, the memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide network communication functions; the user interface 1003 is mainly used to provide input for the user; and the processor 1001 can be used to call the memory 1005
  • the stored device control application program is used to implement the description of the data processing method based on the instant messaging application in any one of the embodiments corresponding to FIG. 2, FIG. 4, and FIG. 8, which will not be repeated here.
  • the instant messaging application-based data processing device 1000 described in the embodiments of the present application can perform the processing of the instant messaging application-based data processing method in any one of the foregoing embodiments corresponding to FIG. 2, FIG. 4, and FIG.
  • the description of the data processing device 1 based on the instant messaging application in the embodiment corresponding to FIG. 11 can also be performed, which is not repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the embodiment of the present application also provides a computer-readable storage medium, and the computer-readable storage medium stores the aforementioned data processing device 1 based on the instant messaging application.
  • a computer program, and the computer program includes program instructions.
  • the processor executes the program instructions, it can execute the instant messaging-based application in any of the foregoing embodiments corresponding to Figure 2, Figure 4, and Figure 8.
  • the description of the data processing method therefore, will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • For technical details not disclosed in the embodiment of the computer-readable storage medium involved in this application please refer to the description of the method embodiment of this application.
  • the program can be stored in a computer readable storage medium. During execution, it may include the procedures of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

本申请实施例公开了一种基于即时通讯应用的数据处理方法、装置、设备和存储介质,该方法由基于即时通讯应用的数据处理设备执行,所述方法包括:在即时通讯应用中获取音频数据,并基于采样频率获取音频数据对应的采样音量数据;根据音频数据与采样音量数据,生成音频数据对应的声纹图,输出包含声纹图和音频数据的消息栏;响应针对消息栏的目标触发操作,对音频数据进行音频进度控制,并基于音频进度对声纹图进行显示控制。

Description

一种基于即时通讯应用的数据处理方法、装置、设备和存储介质
本申请要求于2019年4月12日提交中国专利局、申请号为201910295763.6、发明名称为“一种基于即时通讯应用的数据处理方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及互联网技术领域,尤其涉及一种基于即时通讯应用的数据处理方法、装置、设备和存储介质。
发明背景
随着互联网的发展,越来越多用户会使用即时通讯应用进行聊天,即时通讯应用中的语音消息功能由于操作方式简单、交流自然,已经成为各个年龄段用户的一项日常需求。
目前的即时通讯应用聊天场景中,当用户接收到语音消息时,可以在消息栏中显示该语音消息的时长,并为该用户提供点击播放语音消息的功能。可见,消息栏中仅显示语音消息的时长,对于语音消息的展示形式过于单一;而且对于接收到的语音消息,采用点击播放收听语音消息的操作方式也过于单一。
发明内容
本申请实施例提供一种基于即时通讯应用的数据处理方法、装置、设备和存储介质,可以提高音频数据展示形式的多样性,并丰富音频数据操作方式。
本申请实施例一方面提供了一种基于即时通讯应用的数据处理方法,由基于即时通讯应用的数据处理设备执行,包括:
在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
本申请实施例一方面提供了一种基于即时通讯应用的数据处理装置,包括:
采样模块,用于在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
生成模块,用于根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
响应模块,用于响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
本申请实施例一方面提供了一种基于即时通讯应用的数据处理设备,包括:处理器和存储器;
所述处理器和存储器相连,其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行上述基于即时通讯应用的数据处理方法。
本申请实施例一方面提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行上述基于即时通讯应用的数据处理方法。
可见,在即时通讯应用的消息栏中展示音频数据对应的声纹图,不仅可以提高音频数据展示形式的多样性,而且通过对音频数据进行音频进度控制,可以丰富音频数据操作方式。
附图简要说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种基于即时通讯应用的数据处理方法的场景示意图;
图2是本申请实施例提供的一种基于即时通讯应用的数据处理方法的流程示意图;
图3a-图3c是本申请实施例提供的一种响应针对消息栏的目标触发操作的界面示意图;
图4是本申请实施例提供的另一种基于即时通讯应用的数据处理方法的流程示意图;
图5是本申请实施例提供的一种声纹可视化计算规则的示意图;
图6a-图6c是本申请实施例提供的可视化声纹形状类型的界面示意图;
图7是本申请实施例提供的一种基于即时通讯应用的语音消息技术实现模型图;
图8是本申请实施例提供的另一种基于即时通讯应用的数据处理方法的流程示意图;
图9是本申请实施例提供的一种个性化消息显示类型的界面示意图;
图10是本申请实施例提供的一种消息栏功能模型的结构示意图;
图11是本申请实施例提供的一种基于即时通讯应用的数据处理装置的结构示意图;
图12是本申请实施例提供的另一种基于即时通讯应用的数据处理设备的结构示意图。
实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
请参见图1,是本申请实施例提供的一种基于即时通讯应用的数据处理方法的场景示意图。如图1所示,用户可以从终端设备100a中打开即时通讯应用(如QQ、微信等),点击即时通讯应用中的任意联系人对应的聊天窗口200a,在该聊天窗口200a中,在用户点击语音图标200b后,可以在聊天窗口200a中的底端显示语音操作面板300a,在该语音操作面板300a中,有多种语音模式(如语音模式、变声模式、录音模式)可供用户选择。若用户选择语音模式,则该用户可以用手指按住语音图标,并对准终端设备100a的话筒说话,与此同时,即时通讯应用可以实时录制该用户的语音,并在语音操作面板300a中显示录制语音的时长信息(如0:07);当用户松开手指时,可以将录制的语音作为音频数据,并根据音频数据生成相应的第一声纹图,在本端显示包含该音频数据与第一声纹图的消息栏,同时也会将所述音频数据发送给该聊天窗口200a对应的联系人(该用户可以称为音频数据的发送者,该联系人可以称为该音频数据的接收者),该联系人所在的接收端也可以根据接收到的音频数据生成相应的第二声纹图,并在接收端显示包含该音频数据与第二 声纹图的消息栏。其中,本端显示的消息栏中包含的语音信息与接收端显示的消息栏中包含的语音信息是一样的,但在消息栏中显示的声纹图,即第一声纹图与第二声纹图可以相同,也可以不同。当本端与发送端使用的消息栏显示类型不同时,第一声纹图与第二声纹图在显示颜色上会有所差异。以该用户对应的客户端聊天窗口200a为例,聊天窗口200a中可以显示该音频数据对应的消息栏400a,该消息栏400a可以显示该音频数据对应的声纹图,即可以显示音频数据的声音大小以及音频数据的高低音节。用户可以通过点击该消息栏400a对音频数据进行播放,在播放过程中可以实时记录音频数据的播放进度,并根据语音播放进度在消息栏400a中显示进度指示游标500a,该进度指示游标500a可以将音频数据对应的声纹图划分为两个区域,分别为已经播放的语音对应的声纹区域(即声纹图区域101a)和未播放的语音对应的声纹区域(即声纹图区域101b),已经播放的语音对应的声纹区域和未播放的语音对应的声纹区域具有不同的显示颜色,用户可以根据进度指示游标500a和颜色信息快速确定语音播放的进度信息。用户可以将当前播放的音频数据点击暂停,并再次点击该音频数据对应的消息栏400a时,可以从该音频数据对应的暂停节点继续向后播放该语音。例如,音频数据对应的时长为1:15,若用户在该音频数据播放到0:20时刻时点击暂停播放,并再次点击该音频数据对应的消息栏400a时,可以从0:20时刻继续往后播放该音频数据。
其中,终端设备100a可以包括手机、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)、POS(Point Of Sales,销售点)机、可穿戴设备(例如智能手表、智能手环等)或其他具有安装即时通讯应用功能的终端设备。
请参见图2,是本发明实施例提供的一种基于即时通讯应用的数据处理方法的流程示意图。如图2所示,该方法可以由如图12所示的基于即时通讯应用的数据处理设备执行,包括以下步骤:
步骤S101,在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
具体的,终端设备可以包括发送端和接收端,在发送端可以在即时通讯应用中直接记录用户的声音数据作为音频数据,而接收端可以将从发送端接收到的语音数据作为音频数据。当用户在即时通讯应用的聊天窗口中,点击语音图标进入语音操作面板,选择需要的语音模式,并按住语音图标说话时,终端设备可以实时记录该用户的声音数据,并将记录的用户声音数据作为音频数据。需要说明的是,当用户选择变声模式时,音频数据为该用户进行变声之后的声音数据。在终端设备中,不同的操作系统对于音频数据的数字化表现方式存在差异,因此需要对采集 到的音频数据进行统一转换处理,转换成0-255范围的分贝数据。例如在某终端操作系统中,采集到的声音数据对应的区间为[0,1],因此需要将该终端采集到的声音数据换算成区间为[0,255]的分贝数据。在将音频数据进行转换处理后,确定该音频数据对应的采样频率,并基于该采样频率对音频数据进行声音采样,得到采样音量数据,即对音频数据进行分贝数据转换处理后,再进行声音采样。对于获取到的音频数据,可以先基于采样频率对音频数据进行声音采样,在得到采样音量数据后,将采样音量数据转换成0-255范围的分贝数据,即对音频数据进行声音采样后,再对采样音量数据进行分贝数据转换处理。例如,采样频率为100次/秒时,表示在音频数据中每秒钟可以采样100个声音数据点。
步骤S102,根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
具体的,终端设备可以获取音频数据对应的音频时长,并根据音频时长可以确定音频数据对应的声纹点的数量,进而可以确定每个声纹点在音频数据中对应的音频时长,并根据每个声纹点对应的音频时长内的采样音量数据,确定每个声纹点分别对应的高度。换言之,声纹点的高度与采样音量数据中声音音量有关,在预设音量范围内(如用户语音常用音量范围),声音音量越大,声纹点对应的高度就越高。为了保证消息栏的可读性与视觉上的美观,当采样音量数据中的声音音量低于预设音量范围(如60-150分贝)时,声纹点对应的高度取最小值;当采样音量数据中的声音音量高于预设音量范围时,声纹点对应的高度取最大值。根据上述确定的声纹点的数量和每个声纹点分别对应的高度,终端设备可以生成音频数据对应的声纹图,并可以在即时通讯应用的聊天窗口中,输出包含声纹图和音频数据的消息栏。其中,该声纹图是利用图形对音频数据进行可视化信息展示,即利用声纹图表达音频数据中声纹元素的位置,音节的高低(声纹图中,声纹点的高度可以表示为音频数据中声音音量的大小,声纹点高度的变化趋势,可以表示为音频数据中声音音量的变化趋势)。因此可以根据声纹图感知音频数据的声音大小、声音变化,进而可以使用户能够快速判断包含该音频数据的消息栏的操作方式(如听筒模式、免提模式、无声音状态等)。若声纹图中,声纹点对应的高度较低,则可以选择在免提模式对该消息栏进行操作;若声纹图中,声纹点对应的高度较高,则可以选择在无声音状态下或者听筒模式下对该消息栏进行操作。
步骤S103,响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
具体的,终端设备可以响应用户针对上述消息栏的目标触发操作,以对音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显 示控制,即可以实时记录音频数据的进度信息,并根据进度信息在包含音频数据的消息栏中,显示音频数据中已读音频数据与未读音频数据进度。其中,目标触发操作可以包括播放触发操作、暂停触发操作、拖动触发操作,还可以包括语音转文字触发操作、翻译触发操作等。
进一步的,请参见图3a-图3c,是本申请实施例提供的一种响应针对消息栏的目标触发操作的界面示意图。如图3a-图3c所示,步骤S201-步骤S207是对上述图2所对应实施例中步骤S103的具体描述,即步骤S201-步骤S207是本申请实施例提供的一种响应针对消息栏的目标触发操作的具体流程。
当目标触发操作包括第一播放触发操作或暂停触发操作或第二播放触发操作时,如图3a所示,响应目标触发操作可以包括以下步骤:
步骤S201,响应针对所述消息栏的第一播放触发操作,对所述音频数据进行语音播放,并记录所述音频数据的音频播放进度,根据所述音频播放进度在声纹图中显示所述进度指示游标;
具体的,在即时通讯应用的聊天窗口中,若用户接收到了多条语音信息,即多个音频数据,则可以在聊天窗口中显示上述多条语音消息对应的消息栏,如消息栏400b,消息栏400c,消息栏400d等,对于用户未读取的音频数据,即时通讯应用可以对未读取音频数据对应的消息栏进行标识,如用户在读取消息栏400b中包含的音频数据之前,可以在标识区域401b中显示一个小圆点对该消息栏400b进行标识。当用户点击播放消息栏400b所包含的音频数据时,终端设备可以响应针对该消息栏400b的播放触发操作,也可以称为第一播放触发操作,进而对该消息栏400b中的音频数据进行语音播放,同时可以清除消息栏400b对应的标识,即清除标识区域401b中的小圆点。在语音播放过程中,可以记录音频数据的音频播放进度,并根据音频播放进度在消息栏400b所包含的声纹图中显示进度指示游标500b,进度指示游标500b可以用于区分声纹图中的已播放声纹区域和未播放声纹区域,上述已播放声纹区域和未播放声纹区域具有不同的显示方式(如可以显示不同的颜色)。
步骤S202,响应针对所述消息栏的暂停触发操作,停止对所述音频数据进行语音播放,并记录停止时所述进度指示游标所处位置的停止时间戳;
具体的,在消息栏400b所包含的音频数据正在播放的过程中,用户可以点击暂停播放该音频数据。当用户点击暂停时,终端设备可以响应针对该消息栏400b的暂停触发操作,停止对该消息栏400b中的音频数据进行语音播放,并可以记录音频数据停止播放时进度指示游标500c对应的停止时间戳,即记录音频数据停止播放时的时刻。若音频数据对应的音频时长为2:00分钟,用户点击停止播放时,音频数据正播放到 0:30时刻,则进度指示游标500c所处位置的停止时间戳为音频数据中的0:30时刻。
步骤S203,响应针对所述消息栏的第二播放触发操作,从所述音频数据中的所述停止时间戳所在位置开始播放语音。
具体的,对于暂停播放的消息栏400b,用户可以点击再次播放。当用户对该消息栏400b点击再次播放时,终端设备可以响应针对该消息栏400b的再次播放触发操作,也可以称为第二播放触发操作(这里的第二播放触发操作是为了区分步骤S201中的第一播放触发操作),从该消息栏400b所包含的音频数据中的停止时间戳所在位置开始播放语音,即从暂停处开始播放该消息栏400b所包含的音频数据。当该消息栏400b所包含的音频数据播放完成时,可以自动播放下一个消息栏400c中的音频数据,并在播放消息栏400c所包含的音频数据时,可以对消息栏400b所包含的音频数据对应的音频播放进度进行清除,即在客户端仅存储一条音频数据对应的音频播放进度。同理,当播放完消息栏400c所包含的音频数据时,可以自动播放消息栏400d所包含的音频数据,直至播放完该用户聊天窗口中的所有未读取的音频数据,或者响应针对该音频数据对应的消息栏的暂停触发操作,停止播放语音。
当目标触发操作包括拖动触发操作时,如图3b所示,响应目标触发操作可以包括以下步骤:
步骤S204,响应针对所述消息栏中的所述进度指示游标的拖动触发操作,获取所拖动的所述进度指示游标在音频数据中的第一时间戳,在所述声纹图对应的第一文字显示区域中,显示所述第一时间戳对应的音频数据的文字信息,并根据所拖动的所述进度指示游标对所述已播放声纹区域和所述未播放声纹区域进行区域更新;
具体的,用户还可以按住消息栏中的进度指示游标,并拖动进度指示游标,以使消息栏所包含的音频数据可以在任意时刻进行播放。当用户按住消息栏400e中的进度指示游标500e并进行拖动时,终端设备可以响应该用户针对进度指示游标500e的拖动触发操作,获取在拖动过程中进度指示游标500e在消息栏400e所包含的音频数据中的第一时间戳,即记录进度指示游标500e在拖动过程中的音频进度,并在声纹图中的第一文字区域600a中显示第一时间戳对应的音频数据的文字信息,即在用户拖动过程中可以实时展示当前进度的文字字段内容,以使用户可以根据文字字段内容确定进度指示游标500e的准确停止位置。例如,音频数据对应的音频时长为2分钟,若用户在播放了该音频数据后,对于该音频数据中的部分语音内容想进行再次播放,为了避免浪费时间(重新播放整条音频数据需要花费2分钟),该用户可以按住该音频数据对应的消息栏中的进度指示游标并拖动,根据拖动过程中的文字字段内容,确 定进度指示游标的具体位置,即用户想要进行再次播放的语音内容所在位置。另外,在人的说话习惯中,通常在一句话中的语气助词部分(即一句话的最后)音量较小,由于声纹图中声纹条对应的高度可以表示音量的大小,因此用户还可以根据声纹条的高度确定进度指示游标500e的准确停止位置,以便用户可以从下一句完整的句子处开始播放,而不是在一句话的中间开始收听。在拖动过程中,可以根据拖动的进度指示游标500e对声纹图中的已播放声纹区域和未播放声纹区域进行实时更新。换言之,将进度指示游标500e向声纹图区域102a拖动时,可以将所拖动的声纹区间确定为未播放声纹区域,将进度指示游标500e向声纹图区域102b拖动时,可以将所拖动的声纹区间确定为已播放声纹区域。
其中,在消息栏中还可以显示刻度表,如消息栏400e中刻度表102c,在刻度表102c中,可以根据消息栏400e所包含的音频数据对应的的音频时长确定刻度表102c中的时间刻度,如消息栏400e所包含的音频数据对应的音频时长为120秒,则在刻度表102c中可以显示对应的时间信息,以使用户可以根据时间确定进度指示游标500e的准确停止位置。
步骤S205,获取拖动结束时所述进度指示游标在音频数据中的第二时间戳,从所述音频数据中的所述第二时间戳所在位置开始播放语音。
具体的,当用户停止拖动时,终端设备可以获取拖动结束时进度指示游标500e在消息栏400e所包含的音频数据中的第二时间戳,即拖动停止时的时间戳,并从消息栏400e所包含的音频数据中的第二时间戳所在位置开始播放语音。例如,用户将进度指示游标500e从音频数据中的0:30时刻拖动到0:50时刻,并在0:50时刻停止拖动时,可以从音频数据的0:50时刻开始播放语音。
需要说明的是,若用户在拖动进度指示游标500e之前,正在播放消息栏400e中的目标音频数据,则在用户拖动进度指示游标500e的过程中,可以按照拖动之前的语音播放进度进行正常播放,直到进度指示游标500e停止拖动时,才跳转到进度指示游标500e停止时所在时刻进行语音播放。在播放消息栏400e中的音频数据的情形下,用户拖动进度指示游标500e的过程中可以暂停语音播放,直到进度指示游标500e停止拖动时,才跳转到进度指示游标500e停止时所在时刻进行语音播放。
针对包含音频数据和声纹图的消息栏,目标触发操作还可以包括文本转换触发操作或翻译触发操作,如图3c所示,响应目标触发操作可以包括以下步骤:
步骤S206,响应针对所述消息栏的文本转换触发操作,将所述音频数据转换成第一文本数据,并在所述声纹图对应的第二文字显示区域中,显示所述第一文本数据;
具体的,当用户不方便收听语音消息时,可以长按消息栏选择转文 字功能,将音频数据转换为文字信息并进行显示,以使用户通过查看文字信息来读取语音消息。如图3c所示,当用户长按消息栏400f时,在该消息栏400f的相应区域可以弹出一个菜单窗口700a,用户可以选择菜单窗口700a中的转文字选项701a,在用户点击转文字选项701a后,终端设备可以响应该用户针对消息栏400f的文本转换触发操作,对消息栏400f所包含的音频数据进行语音识别,得到音频数据对应的文字信息,也可以称为第一文本数据,并在声纹图对应的第二文字显示区域600b中显示上述第一文本数据。其中,即时通讯应用为音频数据提供转文字选项701a,表示即时通讯应用具备文字转换功能,文字转换功能是指将音频数据转换成相应的文字信息。例如,若音频数据中的语音信息为汉语,文字转换功能可以将该音频数据转换成汉字文字信息;若音频数据中的语音信息为英语,文字转换功能可以将该音频数据转换成英文文字信息;若音频数据中的语音信息为方言(如湖南话、重庆话、粤语等),文字转换功能可以识别该音频数据中的方言,并将该音频数据转换成汉字文字信息。
步骤S207,响应针对所述第一文本数据的翻译触发操作,对所述第一文本数据进行文本类型转换处理,得到第二文本数据,并在所述第二文字显示区域中显示所述第二文本数据。
具体的,当用户接收到的目标音频信息为外语(如:俄语、德语等)语音时,通过文字转换功能将目标音频信息转换成第一文本数据(该第一文本数据为外语文字信息)后,该用户无法理解第一文本数据中的内容,用户可以长按第二文字显示区域600b中的第一文本数据,在该消息栏400f的相应区域可以弹出一个菜单窗口700b,用户可以选择菜单窗口700b中的翻译选项701b,在用户点击翻译选项701b,并选择翻译语言类型后,终端设备可以响应该用户针对第一文本数据的翻译触发操作,对第一文本数据进行文本类型转换处理(即翻译处理),得到与用户选择的翻译语言类型相匹配的第二文本数据,并在第二文字显示区域700b中将第一文本数据替换成第二文本数据,即在第二文字显示区域700b中显示翻译后的文字信息。其中,翻译功能中可以实现多种语言类型之间的相互翻译,例如可以将汉语翻译成英语、日语、德语等,也可以将英语、日语、德语翻译成汉语等,还可以将英语翻译成德语、意大利语等。
本申请实施例可以在即时通讯应用中获取音频数据,通过对音频数据进行采样,得到采样音量数据,进而可以根据音频数据的音频时长确定声纹点的数量,根据采样音量数据确定每个声纹点对应的高度,进而可以根据声纹点的数量与每个声纹点对应的高度,生成音频数据对应的声纹图,并在即时通讯应用中输出包含声纹图与音频数据的消息栏,并 可以响应针对该消息栏的触发操作,记录音频数据的音频进度,基于音频进度对声纹图进行显示控制。可见,在即时通讯应用的聊天场景中,消息栏中展示音频数据对应的声纹图,用户可以点击消息栏播放/暂停语音,还可以通过可视化声纹图判断声音的区域,并可以滑动调节语音进度,同时调节进度时可实时观看语音对应的文字翻译,进而可以提高音频数据展示形式的多样性,丰富音频数据操作方式;并且能够高效地帮助用户收听、查看、操作语音消息,大大的增强了语音消息的互动性、阅读性、高效性,更好的促进即时通讯应用用户对语音消息的便捷使用。
请参见图4,是本申请实施例提供的另一种基于即时通讯应用的数据处理方法的流程示意图。如图4所示,该方法可以由如图12所示的基于即时通讯应用的数据处理设备执行,包括以下步骤:
步骤S301,在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
其中,步骤S301的具体实现过程可以参见上述图2所对应实施例中对步骤S101的描述,这里不再进行赘述。
步骤S302,获取所述音频数据对应的音频时长;
具体的,终端设备在即时通讯应用中获取到音频数据后,可以获取音频数据对应的音频时长,即从用户在即时通讯应用的语音操作面板中按住语音图标说话到松开的时长信息。
步骤S303,根据所述音频时长确定所述即时通讯应用中的消息栏的长度;
具体的,在即时通讯应用中,预先设置有音频数据对应的音频时长与消息栏长度的对应关系,因此在获取到音频数据对应的音频时长后,可以从即时通讯应用的存储数据表中,查找与该音频时长相匹配的消息栏的长度。
步骤S304,根据所述消息栏对应的预留边距与所述消息栏的长度,确定音频数据对应的声纹区域长度;
具体的,终端设备可以获取即时通讯应用中消息栏对应的预留边距,该预留边距包括消息栏的左边预留边距和右边预留边距,该预留边距可以根据消息栏的长度来确定,如左边预留边距和右边预留边距分别为消息栏的长度的5%,或者也可以进行预先设定,如不论消息栏的长度为多少,左边预留边距和右边预留边距均设置为2mm。根据消息栏的长度与上述预留边距,可以确定音频数据对应的声纹区域长度,即将消息栏的长度减去预留边距,可以得到音频数据对应的声纹区域长度。
步骤S305,根据所述声纹区域长度、声纹点图形尺寸以及相邻声纹点之间的距离,确定所述音频数据对应的声纹点的数量;
具体的,获取声纹点图形尺寸以及相邻声纹点之间的距离,可以根 据声纹区域长度、声纹点图形尺寸以及相邻声纹点之间的距离,确定音频数据对应的声纹点的数量。若声纹点图形为声纹点,则若声纹点图形尺寸可以忽略不记,采用表达式:[声纹区域长度+相邻两个声纹点之间的距离]/相邻两个声纹条之间的距离,可以计算得到声纹点的数量;若声纹点图形为声纹条,即以声纹点作为声纹条上边长的中点,则获取声纹条的宽度,相邻两个声纹条之间的距离,采用表达式:[声纹区域长度+相邻两个声纹条之间的距离]/(声纹条的宽度+相邻两个声纹条之间的距离),可以计算得到声纹条的数量。需要说明的是,声纹条的宽度与声纹点之间的距离都是固定的。
需要说明的是,当音频数据对应的音频时长超过时长阈值(如40s)时,可以将固定值(如25)确定为音频数据对应的声纹点的数量。当音频数据对应的音频时长小于或等于时长阈值(如40s)时,才会执行上述步骤S303-步骤S305。
步骤S306,根据所述音频时长,确定每个声纹点分别对应的单位音频时长;
具体的,在确定了声纹点的数量后,可以根据音频时长确定每个声纹点对应的单位音频时长,每个声纹点在音频数据中对应的单位音频时长之和等于音频数据的音频时长。例如,音频数据对应的时长为10s,声纹点的数量为10个,可以确定每个声纹点对应的单位音频时长为1s,即第一个声纹点对应的单位音频时长区间为音频数据中的0-1s,第二个声纹点对应的单位音频时长区间为音频数据中的1s-2s,以此类推,可以确定每个声纹点在音频数据中对应的单位音频时长区间。
步骤S307,获取所述单位音频时长内的采样音量数据对应的音量均值,基于所述音量均值确定每个声纹点分别对应的待处理高度;
具体的,可以获取上述单位音频时长内的采样音量数据对应的音量均值。例如,采样频率为100次/秒,某声纹点对应的单位音频时长区间为1s-2s,则计算采样音量数据中1s-2s范围内采样的100个声音数据的平均音量。根据音量与声纹点高度之间的函数关系,确定每个声纹点分别对应的待处理高度,具体的实施过程为:若所述音量均值小于第一音量阈值,则将目标数值确定为所述音量均值对应的声纹点的待处理高度;若所述音量均值大于或等于所述第一音量阈值且小于第二音量阈值,则根据音量与高度之间的线性增长函数,确定所述音量均值对应的声纹点的待处理高度;若所述音量均值大于或等于所述第二音量阈值,则根据音量与高度之间的对数增长函数,确定所述音量均值对应的声纹点的待处理高度。请一并参见图5,是本申请实施例提供的一种声纹可视化计算规则的示意图。如图5所示,声纹点的待处理高度对于音量的变化并非线性,而是可以表示为一个分段函数,当音量均值小于第一音 量阈值,即小于用户语音正常音量时,可以将目标数值(一个固定值)确定为该音量均值对应的声纹点的待处理高度。换言之,小于用户语音正常音量的音量均值对应的声纹点的待处理高度始终处于最小值。当音量均值大于或等于第一音量阈值且小于第二音量阈值,即处于用户语音正常音量范围时,声纹点的待处理高度对于音量的变化是线性的,可以根据音量与高度之间的线性增长函数,确定音量均值对应的声纹点的待处理高度。当音量均值大于或等于第二音量阈值,即大于用户语音常用音量时,声纹点的待处理高度对于音量的变化是非线性的,且随着音量的增大,待处理高度达到最大值,可以根据音量与高度之间的对数增长函数,确定所述音量均值对应的声纹点的待处理高度。
步骤S308,获取所述待处理高度对应的插值参数信息,基于所述插值参数信息与所述待处理高度,确定每个声纹点分别对应的高度;
具体的,在实际应用中,通过上述声纹可视化规则计算得到的声纹点对应的待处理高度之间的高度差异偏小,因此可以使用减速插值器放大待处理高度之间的差异,即对于两个具有不同待处理高度的声纹点,可以通过减速差值器获得两个待处理高度分别对应的插值参数信息,通过将待处理的高度与各自对应的插值参数信息相乘,可以增大两个待处理高度之间的高度差。例如,在进行放大处理之前,两个待处理高度之间的高度差为0.01厘米,经过放大处理后,两个待处理高度之间的高度差可以变为0.05厘米。
步骤S309,根据所述数量与所述高度,生成所述音频数据对应的待处理声纹图;
具体的,可以根据声纹点的数量和每个声纹点对应的高度,可以绘制出音频数据对应的待处理声纹图。该待处理声纹图可以包括音频数据中声音的大小和高低音节等信息。
步骤S310,获取所述音频数据对应的声音参数,从声纹图库中选择与所述声音参数相匹配的声纹形状类型;
具体的,可以获取音频数据对应的声音参数,可以根据声音参数信息确定音频数据对应的声音类型,并根据声音类型从声纹图库中选择与该声音类型相匹配的声纹形状类型。在即时通讯应用中,可以包括多种声音类型,如“正常”声音类型,“萝莉”声音类型,“大叔”声音类型,“惊悚”声音类型,“搞怪”声音类型等,每种声音类型具有不同的声音参数,也可以对应不同的声纹形状类型。当然,可以是一种声音类型对应一种声纹形状类型,如“正常”声音类型对应的声纹形状类型为条形声纹类型,“萝莉”声音类型对应的声纹形状类型为曲线声纹类型等;也可以是多种声音类型对应一种声纹形状类型,如“正常”声音类型和“大叔”声音类型对应的声纹形状类型都为条形声纹类型,“萝莉”声 音类型、“惊悚”声音类型以及“搞怪”声音类型对应的声纹形状类型为曲线声纹类型等,这里不做限定。需要说明的是,声纹图库存储有声音类型与声纹形状类型的对应关系,可以根据声音类型在声纹图库中直接查找声纹形状类型。
步骤S311,根据所述声纹形状类型与所述待处理声纹图,确定所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
具体的,请一并参见图6a-图6c,是本申请实施例提供的可视化声纹形状类型的界面示意图。根据声纹形状类型与待处理声纹图,可以生成最终的声纹图,如图6a中的可视化声纹图800a、图6b中的可视化声纹图800b、图6c中的可视化声纹图800c,并在即时通讯应用的聊天窗口中,输出包含所述声纹图和所述音频数据的消息栏。其中,对于图6a中的可视化声纹图800a,也可以称为条形声纹图,由于每个声纹条的宽度和声纹条之间的间距是预先设置好的固定值,可以根据每个声纹点的高度确定每个声纹条的高度,声纹点的数量即为声纹条的数量,因此可以根据每个声纹条的高度和声纹条的数量,生成可视化声纹图800a;对于图6b中的可视化声纹图800b,也可以称为曲线声纹图,可以根据声纹点的数量与每个声纹点的高度,将每个声纹点进行曲线连接,形成一条圆滑的声纹图曲线,即可视化声纹图800b;对于图6c中的可视化声纹图800c,从所有声纹点分别对应的高度中选择最小值,根据最小值确定每个声纹条的初始矩形框,根据声纹点的高度超出最小值的部分,确定每个声纹条的超出矩形框的个数(超出矩形框的宽度和高度是预先设置好的固定值),每个声纹点对应的声纹条可以由初始矩形框和超出矩形框构成(初始矩形框和超出矩形框的高度与声纹点高度相同),进而可以确定可视化声纹图800c。
步骤S312,响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
其中,步骤S312的具体实现方式可以参见上述图3a-图3c所对应实施例中对步骤S201-步骤S207的描述,这里不再进行赘述。
请一并参见图7,是本申请实施例提供的一种基于即时通讯应用的语音消息技术实现模型图。如图7所示,用户在即时通讯应用的聊天场景下,发送者在打开即时通讯应用的聊天界面后,可以点击语音,并选择语音类型,然后按住语音操作面板中的语音图标说话,发送者说话的过程即为向发送者对应的即时通讯应用客户端(下面简称发送者客户端)传送数据的过程,因此在发送者说话时,发送者客户端可以开始收音,记录发送者的实时声音数据,并根据即时通讯应用中的规则,即将记录的实时声音数据转换成区间为[0,255]的分贝数据,并对记录到的实 时声音数据进行声音采样,当用户松开手指时,发送者客户端完成了声音数据的实时记录过程和采样过程,因此可以将语音音频数据(包括记录的实时声音数据和采样数据)发送至接收者对应的即时通讯应用客户端(下面简称接收者客户端)。接收者客户端在接收到发送者客户端发送的语音音频数据后,可以根据音频时长确定聊天界面声纹条个数(默认声纹图的声纹显示类型为条形显示类型),并根据采样数据,计算每个声纹条处的音量平均值,可以根据音量-高度曲线,确定各个声纹条高度,由于这样计算出来的高度较小,可以利用减速插值法,放大声纹高度,生成语音音频数据对应的声纹图。在发送者客户端根据采样数据,计算出每个声纹条处的音量平均值后,可以将语音音频数据传输给接收者,并在接收者的聊天界面上显示语音消息气泡(即语音消息栏),在语音消息气泡中还可以显示语音音频数据对应的声纹图。当接收者点击播放语音消息气泡包含的语音音频数据时,在语音消息气泡中可以显示进度指示游标,此时接收者客户端可以记录音频播放进度,若接收者按住进度指示游标并进行拖动,则停止播放该语音音频数据,且接收者客户端可以记录语音音频数据的进度,在接收者手指松开后,可以跳转到语音音频数据中停止拖动时的进度继续播放;若接收者点击暂停,则接收者客户端可以停止播放该语音音频数据,并记录当前音频进度。当接收者点击播放下一条语音消息时,接收者客户端可以清空上一条语音消息的音频记录,开始记录新的语音消息的音频进度,至此,完成了整个语音消息技术的实现过程。
可见,在即时通讯应用的聊天场景中,消息栏中展示音频数据对应的声纹图,用户可以点击消息栏播放/暂停语音,还可以通过可视化声纹图判断声音的区域,并可以滑动调节语音进度,同时调节进度时可实时观看语音对应的文字翻译,进而可以提高音频数据展示形式的多样性,丰富音频数据操作方式;并且能够高效地帮助用户收听、查看、操作语音消息,大大的增强了语音消息的互动性、阅读性、高效性,更好的促进即时通讯应用用户对语音消息的便捷使用。
请参见图8,是本申请实施例提供的另一种基于即时通讯应用的数据处理方法的流程示意图。如图8所示,该方法可以由如图12所示的基于即时通讯应用的数据处理设备执行,包括以下步骤:
步骤S401,在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
步骤S402,根据所述音频数据对应的音频时长,确定所述音频数据对应的声纹点的数量,并基于所述采样音量数据,确定每个声纹点分别对应的高度;
其中,步骤S401-步骤S402的具体实现过程可以参见上述图2所对 应实施例中对步骤S101-步骤S102的描述,或者可以参见上述图4所对应实施例中对步骤S301-步骤S308的描述,这里不再进行赘述。
步骤S403,获取所述音频数据对应的消息栏显示类型,并提取与所述消息栏显示类型相匹配的声纹显示参数;
具体的,在确定了声纹点的数量与每个声纹点的高度后,终端设备可以获取该音频数据对应的消息栏显示类型,从本地存储中提取与该消息栏显示类型相匹配的声纹显示参数。换言之,用户可以从即时通讯应用中所提供的多种消息栏显示类型中选择任意一种消息栏显示类型,当消息栏显示类型中的消息栏背景颜色与声纹图的显示颜色相冲突(即颜色相同或相近,无法区分消息栏中的声纹图)时,可以从本地存储中提取与该消息栏显示类型相匹配的声纹显示参数,即提取与该消息栏显示类型中的背景颜色差异性较大的颜色作为声纹图的声纹显示颜色,也可以称为声纹显示参数。
其中,即时通讯应用中可以为用户提供多种消息栏显示类型,请一并参见图9,是本申请实施例提供的一种个性化消息显示类型的界面示意图。如图9所示,消息显示类型可以包括消息显示类型900a、消息显示类型900b、消息显示类型900c,在即时通讯应用中,可以根据音频数据的声音类型(如变声类型)自适应匹配该音频数据对应的消息显示类型,也可以是用户根据自身需求从客户端中选择满意的消息显示类型,进而客户端可以获取该消息显示类型,并提取与该消息显示类型相匹配的声纹显示参数。例如,用户为音频数据选择的消息显示类型为消息显示类型900a,则根据该消息显示类型900a的背景颜色信息,确定与该消息显示类型900a相匹配的声纹显示参数,若消息显示类型900a的背景颜色信息为黑色,则可以将声纹显示参数确定为白色等。
即时通讯应用的本地文件中可以存储有消息栏显示类型与声纹显示参数的对应关系,当终端设备获取到声纹图对应的声纹显示参数时,可以从本地文件中根据声纹显示参数查找消息栏对应的消息栏显示类型。
步骤S404,根据所述声纹显示参数、所述数量以及所述高度,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
具体的,根据声纹点的数量以及每个声纹点对应的高度,终端设备可以绘制出待处理声纹图,根据声纹显示参数,可以确定音频数据最终对应的声纹图,并在即时通讯应用的聊天界面上输出包含所述声纹图和所述音频数据的消息栏,此时的声纹图可以很明显地与消息栏中的背景颜色区分开来。
步骤S405,响应针对所述消息栏的目标触发操作,对所述音频数据 进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
其中,步骤S405的具体实现方式可以参见上述图3a-图3c所对应实施例中对步骤S201-步骤S207的描述,这里不再进行赘述。
请一并参见图10,是本申请实施例提供的一种消息栏功能模型的结构示意图。如图10所示,该消息栏功能模型100可以根据用户的环境因素,以及每个用户语音说话的习惯,识别当前语音音频数据中的语音信息,并分析语音信息的数据变化,将语音信息转换成图形进行可视化的消息展示。主要可以通过视觉信息渲染层110,功能操作展示120,气泡消息表现130来实现。其中,视觉信息渲染层110可以绘制可视化的声纹信息,并在绘制时判断声纹元素位置、声纹音节和声纹颜色变化;功能操作展示120可以为用户提供可操作的点击暂停/播放、按住调节语音、实时文字字段、长按转语音转文字等功能;气泡消息表现130可以为语音音频数据提供个性化消息栏显示类型,即为语音音频数据提供可变化的消息栏视觉背景。
可见,在即时通讯应用的聊天场景中,消息栏中展示音频数据对应的声纹图,用户可以点击消息栏播放/暂停语音,还可以通过可视化声纹图判断声音的区域,并可以滑动调节语音进度,同时调节进度时可实时观看语音对应的文字翻译,进而可以提高音频数据展示形式的多样性,丰富音频数据操作方式;并且能够高效地帮助用户收听、查看、操作语音消息,大大的增强了语音消息的互动性、阅读性、高效性,更好的促进即时通讯应用用户对语音消息的便捷使用。
请参见图11,是本申请实施例提供的一种基于即时通讯应用的数据处理装置的结构示意图。如图11所示,该基于即时通讯应用的数据处理装置1可以包括:采样模块10,生成模块20,响应模块30;
采样模块10,用于在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
具体的,当用户在即时通讯应用的聊天窗口中,点击语音图标进入语音操作面板,选择需要的语音模式,并按住语音图标说话时,采样模块10可以实时记录该用户的声音数据,并将记录的用户声音数据作为音频数据。需要说明的是,当用户选择变声模式时,音频数据为该用户进行变声之后的声音数据。由于不同的操作系统对于音频数据的数字化表现方式存在差异,因此采样模块10需要对采集到的音频数据进行统一转换处理,转换成0-255范围的分贝数据。例如在某终端操作系统中,采样模块10采集到的声音数据对应的区间为[0,1],因此采样模块10需要将采集到的声音数据换算成区间为[0,255]的分贝数据。在将音频数据进行转换处理后,确定该音频数据对应的采样频率,并基于该采样频率对音频数据进行声音采样,得到采样音量数据,即对音频数据进行分贝 数据转换处理后,再进行声音采样。对于获取到的音频数据,可以先基于采样频率对音频数据进行声音采样,在得到采样音量数据后,将采样音量数据转换成0-255范围的分贝数据,即对音频数据进行声音采样后,再对采样音量数据进行分贝数据转换处理。例如,采样频率为100次/秒时,表示在音频数据中每秒钟可以采样100个声音数据点。
生成模块20,用于根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
具体的,生成模块20可以获取音频数据对应的音频时长,并根据音频时长可以确定音频数据对应的声纹点的数量,进而可以确定每个声纹点在音频数据中对应的音频时长,并根据每个声纹点对应的音频时长内的采样音量数据,确定每个声纹点分别对应的高度。换言之,声纹点的高度与采样音量数据中声音音量有关,在预设音量范围内(如用户语音常用音量范围),声音音量越大,声纹点对应的高度就越高。为了保证消息栏的可读性与视觉上的美观,当采样音量数据中的声音音量低于预设音量范围(如60-150分贝)时,声纹点对应的高度取最小值;当采样音量数据中的声音音量高于预设音量范围时,声纹点对应的高度取最大值。根据上述确定的声纹点的数量和每个声纹点分别对应的高度,终端设备可以生成音频数据对应的声纹图,并可以在即时通讯应用的聊天窗口中,输出包含声纹图和音频数据的消息栏。其中,该声纹图是利用图形对音频数据进行可视化信息展示,即利用声纹图表达音频数据中声纹元素的位置,音节的高低(声纹图中,声纹点的高度可以表示为音频数据中声音音量的大小,声纹点高度的变化趋势,可以表示为音频数据中声音音量的变化趋势)。因此可以根据声纹图感知音频数据的声音大小、声音变化,进而可以使用户能够快速判断包含该音频数据的消息栏的操作方式(如听筒模式、免提模式、无声音状态等)。若声纹图中,声纹点对应的高度较低,则可以选择在免提模式对该消息栏进行操作;若声纹图中,声纹点对应的高度较高,则可以选择在无声音状态下或者听筒模式下对该消息栏进行操作。
响应模块30,用于响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
具体的,响应模块30可以响应用户针对上述消息栏的目标触发操作,以对音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制,即可以实时记录音频数据的进度信息,并根据进度信息在包含音频数据的消息栏中,显示音频数据中已读音频数据与未读音频数据进度。其中,目标触发操作可以包括播放触发操作、暂停触发操作、 拖动触发操作,还可以包括语音转文字触发操作、翻译触发操作等。
请一并参见图11,该基于即时通讯应用的数据处理装置1还可以包括:转换模块40,翻译模块50;
转换模块40,用于响应针对所述消息栏的文本转换触发操作,将所述音频数据转换成第一文本数据,并在所述声纹图对应的第二文字显示区域中,显示所述第一文本数据;
翻译模块50,用于响应针对所述第一文本数据的翻译触发操作,对所述第一文本数据进行文本类型转换处理,得到第二文本数据,并在所述第二文字显示区域中显示所述第二文本数据。
其中,转换模块40,翻译模块50的具体功能实现方式可以参见上述图3c所对应实施例中的步骤S206-步骤S207,这里不再进行赘述。
请一并参见图11,生成模块20可以包括:数量确定单元201,高度确定单元202,声纹图生成单元203;
数量确定单元201,用于根据所述音频数据对应的音频时长,确定所述音频数据对应的声纹点的数量;
高度确定单元202,用于基于所述采样音量数据,确定每个声纹点分别对应的高度;
声纹图生成单元203,用于根据所述数量与所述高度,生成所述音频数据对应的声纹图。
其中,数量确定单元201,高度确定单元202的具体功能实现方式可以参见上述图4所对应实施例中的步骤S302-步骤S308,声纹图生成单元203的具体功能实现方式可以参见上述图4所对应实施例中的步骤S309-步骤S311和上述图8所对应实施例中的步骤S403-步骤S404,这里不再进行赘述。
请一并参见图11,响应模块30可以包括:第一播放操作响应单元301,暂停操作响应单元302,第二播放操作响应单元303,拖动操作响应单元304,语音播放单元305;
第一播放操作响应单元301,用于响应针对所述消息栏的第一播放触发操作,对所述音频数据进行语音播放,并记录所述音频数据的音频播放进度,根据所述音频播放进度在声纹图中显示所述进度指示游标;
暂停操作响应单元302,用于响应针对所述消息栏的暂停触发操作,停止对所述音频数据进行语音播放,并记录停止时所述进度指示游标所处位置的停止时间戳;
第二播放操作响应单元303,用于响应针对所述消息栏的第二播放触发操作,从所述音频数据中的所述停止时间戳所在位置开始播放语音;
拖动操作响应单元304,用于响应针对所述消息栏中的所述进度指 示游标的拖动触发操作,获取所拖动的所述进度指示游标在音频数据中的第一时间戳,在所述声纹图对应的第一文字显示区域中,显示所述第一时间戳对应的音频数据的文字信息,并根据所拖动的所述进度指示游标对所述已播放声纹区域和所述未播放声纹区域进行区域更新;
语音播放单元305,用于获取拖动结束时所述进度指示游标在音频数据中的第二时间戳,从所述音频数据中的所述第二时间戳所在位置开始播放语音。
其中,第一播放操作响应单元301,暂停操作响应单元302,第二播放操作响应单元303,拖动操作响应单元304,语音播放单元305的具体功能实现方式可以参见上述图3a-图3c所对应实施例中的步骤S201-步骤S205,这里不再进行赘述。
请一并参见图11,数量确定单元201可以包括:时长获取子单元2011,长度获取子单元2012,数量确定子单元2013;
时长获取子单元2011,用于获取所述音频数据对应的音频时长;
长度获取子单元2012,用于根据所述音频时长确定所述即时通讯应用中的消息栏的长度;
数量确定子单元2013,用于根据所述消息栏的长度与相邻声纹点之间的距离,确定所述音频数据对应的声纹点的数量。
其中,时长获取子单元2011,长度获取子单元2012,数量确定子单元2013的具体功能实现方式可以参见上述图4所对应实施例中的步骤S302-步骤S305,这里不再进行赘述。
请一并参见图11,高度确定单元202可以包括:单位时长确定子单元2021,待处理高度确定子单元2022,声纹高度确定子单元2023;
单位时长确定子单元2021,用于根据所述音频时长,确定每个声纹点分别对应的单位音频时长;
待处理高度确定子单元2022,用于获取所述单位音频时长内的采样音量数据对应的音量均值,基于所述音量均值确定每个声纹点分别对应的待处理高度;
声纹高度确定子单元2023,用于获取所述待处理高度对应的插值参数信息,基于所述插值参数信息与所述待处理高度,确定每个声纹点分别对应的高度。
其中,单位时长确定子单元2021,待处理高度确定子单元2022,声纹高度确定子单元2023的具体功能实现方式可以参见上述图4所对应实施例中的步骤S306-步骤S308,这里不再进行赘述。
请一并参见图11,声纹图生成单元203可以包括:待处理声纹图生成子单元2031,声纹形状选择子单元2032,第一声纹图确定子单元2033,显示参数提取子单元2034,第二声纹图确定子单元2035;
待处理声纹图生成子单元2031,用于根据所述数量与所述高度,生成所述音频数据对应的待处理声纹图;
声纹形状选择子单元2032,用于获取所述音频数据对应的声音参数,从声纹图库中选择与所述声音参数相匹配的声纹形状类型;
第一声纹图确定子单元2033,用于根据所述声纹形状类型与所述待处理声纹图,确定所述音频数据对应的声纹图;
显示参数提取子单元2034,用于获取所述音频数据对应的消息栏显示类型,并提取与所述消息栏显示类型相匹配的声纹显示参数;
第二声纹图确定子单元2035,用于根据所述声纹显示参数、所述数量以及所述高度,生成所述音频数据对应的声纹图。
其中,生成模块30可以包括:待处理声纹图生成子单元2031,声纹形状选择子单元2032,第一声纹图确定子单元2033,其具体功能实现方式可以参见上述图4所对应实施例中的步骤S309-步骤S311;生成模块30还可以包括:显示参数提取子单元2034,第二声纹图确定子单元2035,其具体功能实现方式可以参见上述图8所对应实施例中的步骤S403-步骤S404,这里不再进行赘述。
请一并参见图11,数量确定子单元2013可以包括:声纹长度确定子单元20131,声纹点数量确定子单元20132;
声纹长度确定子单元20131,用于根据所述消息栏对应的预留边距与所述消息栏的长度,确定音频数据对应的声纹区域长度;
声纹点数量确定子单元20132,用于根据所述声纹区域长度、声纹点图形尺寸以及相邻声纹点之间的距离,确定所述音频数据对应的声纹点的数量。
其中,声纹长度确定子单元20131,声纹点数量确定子单元20132的具体功能实现方式可以参见上述图4所对应实施例中的步骤S304-步骤S305,这里不再进行赘述。
请一并参见图11,待处理高度确定子单元2022可以包括:均值确定子单元20221,第一高度确定子单元20222,第二高度确定子单元20223,第三高度确定子单元20224;
均值确定子单元20221,用于获取所述单位音频时长内的目标采样数据对应的音量均值;
第一高度确定子单元20222,用于若所述音量均值小于第一音量阈值,则将目标数值确定为所述音量均值对应的声纹点的待处理高度;
第二高度确定子单元20223,用于若所述音量均值大于或等于所述第一音量阈值且小于第二音量阈值,则根据音量与高度之间的线性增长函数,确定所述音量均值对应的声纹点的待处理高度;
第三高度确定子单元20224,用于若所述音量均值大于或等于所述 第二音量阈值,则根据音量与高度之间的对数增长函数,确定所述音量均值对应的声纹点的待处理高度。
其中,均值确定子单元20221,第一高度确定子单元20222,第二高度确定子单元20223,第三高度确定子单元20224的具体功能实现方式可以参见上述图4所对应实施例中的步骤307,这里不再进行赘述。
可见,在即时通讯应用的聊天场景中,消息栏中展示音频数据对应的声纹图,用户可以点击消息栏播放/暂停语音,还可以通过可视化声纹图判断声音的区域,并可以滑动调节语音进度,同时调节进度时可实时观看语音对应的文字翻译,进而可以提高音频数据展示形式的多样性,丰富音频数据操作方式;并且能够高效地帮助用户收听、查看、操作语音消息,大大的增强了语音消息的互动性、阅读性、高效性,更好的促进即时通讯应用用户对语音消息的便捷使用。
请参见图12,图12是本申请实施例提供的一种基于即时通讯应用的数据处理设备的结构示意图。如图12所示,该基于即时通讯应用的数据处理设备1000可以包括:处理器1001,网络接口1004和存储器1005,此外,上述基于即时通讯应用的数据处理设备1000还可以包括:用户接口1003,和至少一个通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。其中,用户接口1003可以包括显示屏(Display)、键盘(Keyboard),用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1004可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器1005还可以是至少一个位于远离前述处理器1001的存储装置。如图12所示,作为一种计算机可读存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在如图12所示的基于即时通讯应用的数据处理设备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以实现上述图2、图4、图8任一个所对应实施例中对所述基于即时通讯应用的数据处理方法的描述,在此不再赘述。
应当理解,本申请实施例中所描述的基于即时通讯应用的数据处理设备1000可执行前文图2、图4、图8任一个所对应实施例中对所述基于即时通讯应用的数据处理方法的描述,也可执行前文图11所对应实施例中对所述基于即时通讯应用的数据处理装置1的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机可读存储介质,且所述计算机可读存储介质中存储有前文提及的基于即时通 讯应用的数据处理装置1所执行的计算机程序,且所述计算机程序包括程序指令,当所述处理器执行所述程序指令时,能够执行前文图2、图4、图8任一个所对应实施例中对所述基于即时通讯应用的数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储器(Read-Only Memory,ROM)或随机存储器(Random Access Memory,RAM)等。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (20)

  1. 一种基于即时通讯应用的数据处理方法,由基于即时通讯应用的数据处理设备执行,包括:
    在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
    根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
    响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
  2. 根据权利要求1所述的方法,其中,所述根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,包括:
    根据所述音频数据对应的音频时长,确定所述音频数据对应的声纹点的数量;
    基于所述采样音量数据,确定每个声纹点分别对应的高度;
    根据所述数量与所述高度,生成所述音频数据对应的声纹图。
  3. 根据权利要求2所述的方法,其中,所述根据所述音频数据对应的音频时长,确定所述音频数据对应的声纹点的数量,包括:
    根据所述音频时长确定所述即时通讯应用中的消息栏的长度;
    根据所述消息栏的长度与相邻声纹点之间的距离,确定所述音频数据对应的声纹点的数量。
  4. 根据权利要求3所述的方法,其中,所述根据所述消息栏的长度与相邻声纹点之间的距离,确定所述音频数据对应的声纹点的数量,包括:
    根据所述消息栏对应的预留边距与所述消息栏的长度,确定音频数据对应的声纹区域长度;
    根据所述声纹区域长度、声纹点图形尺寸以及相邻声纹点之间的距离,确定所述音频数据对应的声纹点的数量。
  5. 根据权利要求2所述的方法,其中,所述基于所述采样音量数据,确定每个声纹点分别对应的高度,包括:
    根据所述音频时长,确定每个声纹点分别对应的单位音频时长;
    获取所述单位音频时长内的采样音量数据对应的音量均值,基于所述音量均值确定每个声纹点分别对应的待处理高度;
    获取所述待处理高度对应的插值参数信息,基于所述插值参数信息与所述待处理高度,确定每个声纹点分别对应的高度。
  6. 根据权利要求5所述的方法,其中,所述获取所述单位音频时 长内的采样音量数据对应的音量均值,基于所述音量均值确定每个声纹点分别对应的待处理高度,包括:
    获取所述单位音频时长内的目标采样数据对应的音量均值;
    若所述音量均值小于第一音量阈值,则将目标数值确定为所述音量均值对应的声纹点的待处理高度;
    若所述音量均值大于或等于所述第一音量阈值且小于第二音量阈值,则根据音量与高度之间的线性增长函数,确定所述音量均值对应的声纹点的待处理高度;
    若所述音量均值大于或等于所述第二音量阈值,则根据音量与高度之间的对数增长函数,确定所述音量均值对应的声纹点的待处理高度。
  7. 根据权利要求1所述的方法,其中,响应所述目标触发操作后的消息栏包括进度指示游标;所述进度指示游标用于区分所述声纹图中的已播放声纹区域和未播放声纹区域,所述已播放声纹区域和所述未播放声纹区域具有不同的显示方式。
  8. 根据权利要求7所述的方法,其中,所述目标触发操作包括第一播放触发操作;
    所述响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制,包括:
    响应针对所述消息栏的第一播放触发操作,对所述音频数据进行语音播放,并记录所述音频数据的音频播放进度,根据所述音频播放进度在声纹图中显示所述进度指示游标。
  9. 根据权利要求7所述的方法,其中,所述目标触发操作包括暂停触发操作;
    所述响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制,包括:
    响应针对所述消息栏的暂停触发操作,停止对所述音频数据进行语音播放,并记录停止时所述进度指示游标所处位置的停止时间戳。
  10. 根据权利要求7所述的方法,其中,所述目标触发操作包括第二播放触发操作;
    所述响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制,包括:
    响应针对所述消息栏的第二播放触发操作,从所述音频数据中的所述停止时间戳所在位置开始播放语音。
  11. 根据权利要求7所述的方法,其中,所述目标触发操作包括拖动触发操作;
    所述响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制,包括:
    响应针对所述消息栏中的所述进度指示游标的拖动触发操作,获取所拖动的所述进度指示游标在音频数据中的第一时间戳,在所述声纹图对应的第一文字显示区域中,显示所述第一时间戳对应的音频数据的文字信息,并根据所拖动的所述进度指示游标对所述已播放声纹区域和所述未播放声纹区域进行区域更新;
    获取拖动结束时所述进度指示游标在音频数据中的第二时间戳,从所述音频数据中的所述第二时间戳所在位置开始播放语音。
  12. 根据权利要求2所述方法,其中,所述根据所述数量与所述高度,生成所述音频数据对应的声纹图,包括:
    根据所述数量与所述高度,生成所述音频数据对应的待处理声纹图;
    获取所述音频数据对应的声音参数,从声纹图库中选择与所述声音参数相匹配的声纹形状类型;
    根据所述声纹形状类型与所述待处理声纹图,确定所述音频数据对应的声纹图。
  13. 根据权利要求2所述方法,其中,所述根据所述数量与所述高度,生成所述音频数据对应的声纹图,包括:
    获取所述音频数据对应的消息栏显示类型,并提取与所述消息栏显示类型相匹配的声纹显示参数;
    根据所述声纹显示参数、所述数量以及所述高度,生成所述音频数据对应的声纹图。
  14. 根据权利要求1所述的方法,还包括:
    响应针对所述消息栏的文本转换触发操作,将所述音频数据转换成第一文本数据,并在所述声纹图对应的第二文字显示区域中,显示所述第一文本数据;
    响应针对所述第一文本数据的翻译触发操作,对所述第一文本数据进行文本类型转换处理,得到第二文本数据,并在所述第二文字显示区域中显示所述第二文本数据。
  15. 一种基于即时通讯应用的数据处理装置,包括:
    采样模块,用于在即时通讯应用中获取音频数据,并基于采样频率获取所述音频数据对应的采样音量数据;
    生成模块,用于根据所述音频数据与所述采样音量数据,生成所述音频数据对应的声纹图,输出包含所述声纹图和所述音频数据的消息栏;
    响应模块,用于响应针对所述消息栏的目标触发操作,对所述音频数据进行音频进度控制,并基于音频进度对所述声纹图进行显示控制。
  16. 根据权利要求15所述的装置,其中,所述生成模块包括:
    数量确定单元,用于根据所述音频数据对应的音频时长,确定所述音频数据对应的声纹点的数量;
    高度确定单元,用于基于所述采样音量数据,确定每个声纹点分别对应的高度;
    声纹图生成单元,用于根据所述数量与所述高度,生成所述音频数据对应的声纹图。
  17. 根据权利要求15所述的装置,其中,所述装置进一步包括:
    转换模块,用于响应针对所述消息栏的文本转换触发操作,将所述音频数据转换成第一文本数据,并在所述声纹图对应的第二文字显示区域中,显示所述第一文本数据;
    翻译模块,用于响应针对所述第一文本数据的翻译触发操作,对所述第一文本数据进行文本类型转换处理,得到第二文本数据,并在所述第二文字显示区域中显示所述第二文本数据。
  18. 根据权利要求15所述的装置,其中,响应所述目标触发操作后的消息栏包括进度指示游标;所述进度指示游标用于区分所述声纹图中的已播放声纹区域和未播放声纹区域,所述已播放声纹区域和所述未播放声纹区域具有不同的显示方式。
  19. 一种基于即时通讯应用的数据处理设备,包括:处理器和存储器;
    所述处理器和存储器相连,其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行如权利要求1-14任一项所述的方法。
  20. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行如权利要求1-14任一项所述的方法。
PCT/CN2020/083485 2019-04-12 2020-04-07 一种基于即时通讯应用的数据处理方法、装置、设备和存储介质 WO2020207375A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/317,389 US11683278B2 (en) 2019-04-12 2021-05-11 Spectrogram and message bar generation based on audio data in an instant messaging application

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910295763.6 2019-04-12
CN201910295763.6A CN111817943B (zh) 2019-04-12 2019-04-12 一种基于即时通讯应用的数据处理方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/317,389 Continuation US11683278B2 (en) 2019-04-12 2021-05-11 Spectrogram and message bar generation based on audio data in an instant messaging application

Publications (1)

Publication Number Publication Date
WO2020207375A1 true WO2020207375A1 (zh) 2020-10-15

Family

ID=72751926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083485 WO2020207375A1 (zh) 2019-04-12 2020-04-07 一种基于即时通讯应用的数据处理方法、装置、设备和存储介质

Country Status (3)

Country Link
US (1) US11683278B2 (zh)
CN (2) CN111817943B (zh)
WO (1) WO2020207375A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112530399A (zh) * 2020-11-30 2021-03-19 上海明略人工智能(集团)有限公司 一种语音数据的扩充方法、系统、电子设备及存储介质
CN113141298A (zh) * 2021-05-14 2021-07-20 网易(杭州)网络有限公司 消息处理方法、消息处理装置、存储介质及电子设备
CN113364915A (zh) * 2021-06-02 2021-09-07 维沃移动通信(杭州)有限公司 信息显示方法、装置和电子设备

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9954996B2 (en) 2007-06-28 2018-04-24 Apple Inc. Portable electronic device with conversation management for incoming instant messages
US9185062B1 (en) 2014-05-31 2015-11-10 Apple Inc. Message user interfaces for capture and transmittal of media and location content
US11431836B2 (en) 2017-05-02 2022-08-30 Apple Inc. Methods and interfaces for initiating media playback
US20220279063A1 (en) 2017-05-16 2022-09-01 Apple Inc. Methods and interfaces for home media control
CN111343060B (zh) 2017-05-16 2022-02-11 苹果公司 用于家庭媒体控制的方法和界面
US10372298B2 (en) 2017-09-29 2019-08-06 Apple Inc. User interface for multi-user communication session
EP3769510A1 (en) 2018-05-07 2021-01-27 Apple Inc. User interfaces for viewing live video feeds and recorded video
DK201870364A1 (en) 2018-05-07 2019-12-03 Apple Inc. MULTI-PARTICIPANT LIVE COMMUNICATION USER INTERFACE
US11128792B2 (en) 2018-09-28 2021-09-21 Apple Inc. Capturing and displaying images with multiple focal planes
DK201970533A1 (en) 2019-05-31 2021-02-15 Apple Inc Methods and user interfaces for sharing audio
US11620103B2 (en) 2019-05-31 2023-04-04 Apple Inc. User interfaces for audio media control
US11363071B2 (en) 2019-05-31 2022-06-14 Apple Inc. User interfaces for managing a local network
US10996917B2 (en) 2019-05-31 2021-05-04 Apple Inc. User interfaces for audio media control
US10904029B2 (en) 2019-05-31 2021-01-26 Apple Inc. User interfaces for managing controllable external devices
US11079913B1 (en) 2020-05-11 2021-08-03 Apple Inc. User interface for status indicators
USD989113S1 (en) * 2020-09-21 2023-06-13 Meta Platforms, Inc. Display screen or portion thereof with a graphical user interface
US11868676B2 (en) * 2020-09-23 2024-01-09 Snap Inc. Augmenting image content with sound
US11392291B2 (en) 2020-09-25 2022-07-19 Apple Inc. Methods and interfaces for media control with dynamic feedback
US11431891B2 (en) 2021-01-31 2022-08-30 Apple Inc. User interfaces for wide angle video conference
US20220368548A1 (en) 2021-05-15 2022-11-17 Apple Inc. Shared-content session user interfaces
US11907605B2 (en) 2021-05-15 2024-02-20 Apple Inc. Shared-content session user interfaces
US11893214B2 (en) 2021-05-15 2024-02-06 Apple Inc. Real-time communication user interface
US11770600B2 (en) 2021-09-24 2023-09-26 Apple Inc. Wide angle video conference
US11570396B1 (en) * 2021-11-24 2023-01-31 Dish Network L.L.C. Audio trick mode
CN114780180B (zh) * 2021-12-21 2024-08-16 北京达佳互联信息技术有限公司 一种对象数据显示方法、装置、电子设备及存储介质
CN116708641A (zh) * 2023-07-10 2023-09-05 广东九安智能科技股份有限公司 一种通话音量的显示方法、装置、设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780653A (zh) * 2012-08-09 2012-11-14 上海量明科技发展有限公司 即时通信中快捷通信的方法、客户端及系统
US20160253434A1 (en) * 2013-10-28 2016-09-01 Zili Yu Natural Expression Processing Method, Processing and Response Method, Device, and System
CN105931657A (zh) * 2016-04-19 2016-09-07 乐视控股(北京)有限公司 音频文件的播放方法、装置及移动终端
CN107395352A (zh) * 2016-05-16 2017-11-24 腾讯科技(深圳)有限公司 基于声纹的身份识别方法及装置
CN107592415A (zh) * 2017-08-31 2018-01-16 努比亚技术有限公司 语音发送方法、终端和计算机可读存储介质

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8595007B2 (en) * 2006-06-15 2013-11-26 NITV Federal Services, LLC Voice print recognition software system for voice identification and matching
US8099134B2 (en) * 2008-12-19 2012-01-17 Verizon Patent And Licensing Inc. Visual manipulation of audio
US8660545B1 (en) * 2010-01-06 2014-02-25 ILook Corporation Responding to a video request by displaying information on a TV remote and video on the TV
US8977962B2 (en) * 2011-09-06 2015-03-10 Apple Inc. Reference waveforms
US9093056B2 (en) * 2011-09-13 2015-07-28 Northwestern University Audio separation system and method
KR102108893B1 (ko) * 2013-07-11 2020-05-11 엘지전자 주식회사 이동 단말기
US9419935B2 (en) * 2013-08-02 2016-08-16 Whatsapp Inc. Voice communications with real-time status notifications
US10845982B2 (en) * 2014-04-28 2020-11-24 Facebook, Inc. Providing intelligent transcriptions of sound messages in a messaging application
CN112152906B (zh) * 2015-02-16 2023-04-07 钉钉控股(开曼)有限公司 通讯方法及服务器
US20160328105A1 (en) * 2015-05-06 2016-11-10 Microsoft Technology Licensing, Llc Techniques to manage bookmarks for media files
CN107305459A (zh) * 2016-04-25 2017-10-31 阿里巴巴集团控股有限公司 语音和多媒体消息的发送方法及装置
CN107844470B (zh) * 2016-09-18 2021-04-30 腾讯科技(深圳)有限公司 一种语音数据处理方法及其设备
CN107274906A (zh) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 语音信息处理方法、装置、终端及存储介质
CN107481723A (zh) * 2017-08-28 2017-12-15 清华大学 一种用于声纹识别的信道匹配方法及其装置
CN107623614B (zh) * 2017-09-19 2020-12-08 百度在线网络技术(北京)有限公司 用于推送信息的方法和装置
US20190139280A1 (en) * 2017-11-06 2019-05-09 Microsoft Technology Licensing, Llc Augmented reality environment for tabular data in an image feed
CN108347512B (zh) * 2018-01-22 2020-08-28 维沃移动通信有限公司 一种身份识别方法及移动终端
US20190362022A1 (en) * 2018-05-25 2019-11-28 Risto Haukioja Audio file labeling process for building datasets at scale

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780653A (zh) * 2012-08-09 2012-11-14 上海量明科技发展有限公司 即时通信中快捷通信的方法、客户端及系统
US20160253434A1 (en) * 2013-10-28 2016-09-01 Zili Yu Natural Expression Processing Method, Processing and Response Method, Device, and System
CN105931657A (zh) * 2016-04-19 2016-09-07 乐视控股(北京)有限公司 音频文件的播放方法、装置及移动终端
CN107395352A (zh) * 2016-05-16 2017-11-24 腾讯科技(深圳)有限公司 基于声纹的身份识别方法及装置
CN107592415A (zh) * 2017-08-31 2018-01-16 努比亚技术有限公司 语音发送方法、终端和计算机可读存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112530399A (zh) * 2020-11-30 2021-03-19 上海明略人工智能(集团)有限公司 一种语音数据的扩充方法、系统、电子设备及存储介质
CN113141298A (zh) * 2021-05-14 2021-07-20 网易(杭州)网络有限公司 消息处理方法、消息处理装置、存储介质及电子设备
CN113141298B (zh) * 2021-05-14 2023-05-12 网易(杭州)网络有限公司 消息处理方法、消息处理装置、存储介质及电子设备
CN113364915A (zh) * 2021-06-02 2021-09-07 维沃移动通信(杭州)有限公司 信息显示方法、装置和电子设备

Also Published As

Publication number Publication date
CN114938360B (zh) 2023-04-18
US20210266274A1 (en) 2021-08-26
US11683278B2 (en) 2023-06-20
CN114938360A (zh) 2022-08-23
CN111817943B (zh) 2022-06-14
CN111817943A (zh) 2020-10-23

Similar Documents

Publication Publication Date Title
WO2020207375A1 (zh) 一种基于即时通讯应用的数据处理方法、装置、设备和存储介质
US9239949B2 (en) Method for user function operation based on face recognition and mobile terminal supporting the same
JP6060989B2 (ja) 音声録音装置、音声録音方法、及びプログラム
US20170085696A1 (en) Transcription of Spoken Communications
CN110164437B (zh) 一种即时通信的语音识别方法和终端
WO2017168936A1 (ja) 情報処理装置、情報処理方法、及びプログラム
EP3182260A1 (en) Character editing method and device for screen display device
JP2005512231A (ja) テキストメッセージにおける感情表現方法
JP2008500573A (ja) メッセージを変更するための方法及びシステム
WO2020249038A1 (zh) 音频流的处理方法、装置、移动终端及存储介质
WO2021082637A1 (zh) 音频信息处理方法、装置、电子设备及存储介质
WO2019071808A1 (zh) 视频画面显示的方法、装置、系统、终端设备及存储介质
CN109782997B (zh) 一种数据处理方法、装置及存储介质
CN110943908A (zh) 语音消息发送方法、电子设备及介质
KR101756836B1 (ko) 음성데이터를 이용한 문서생성 방법 및 시스템과, 이를 구비한 화상형성장치
JP2010276733A (ja) 情報表示装置、情報表示方法および情報表示プログラム
JP2013196661A (ja) 入力制御プログラム、入力制御装置、入力制御システム、および入力制御方法
JP6570893B2 (ja) 翻訳支援システムおよび情報処理装置
JP2016091057A (ja) 電子機器
KR20120126649A (ko) 통화 내용 제공 방법, 그 제공 시스템 및 그 제공 방법을 기록한 기록매체
WO2020070959A1 (ja) 通訳システム、サーバ装置、配信方法、および記録媒体
CN113079086B (zh) 消息发送方法、消息发送装置、电子设备和存储介质
WO2019017033A1 (ja) 情報処理装置、情報処理方法、およびプログラム
CN114491087A (zh) 文本处理方法、装置、电子设备以及存储介质
CN114373464A (zh) 文本展示方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20787026

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20787026

Country of ref document: EP

Kind code of ref document: A1