CN115527541A - Information processing method and terminal equipment - Google Patents

Information processing method and terminal equipment

Info

Publication number
CN115527541A
CN115527541A (application CN202110708975.XA)
Authority
CN
China
Prior art keywords
recording
interface
voiceprint
segment
current recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110708975.XA
Other languages
Chinese (zh)
Inventor
赵子龙
王倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Mobile Communications Technology Co Ltd
Original Assignee
Hisense Mobile Communications Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Mobile Communications Technology Co Ltd filed Critical Hisense Mobile Communications Technology Co Ltd
Priority to CN202110708975.XA
Publication of CN115527541A
Legal status: Pending

Classifications

    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00: Speech recognition
            • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L 15/26: Speech to text systems
          • G10L 17/00: Speaker identification or verification techniques
            • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 40/00: Handling natural language data
            • G06F 40/10: Text processing
              • G06F 40/103: Formatting, i.e. changing of presentation of documents
                • G06F 40/106: Display of layout of documents; Previewing
                • G06F 40/117: Tagging; Marking up; Designating a block; Setting of attributes
              • G06F 40/166: Editing, e.g. inserting or deleting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an information processing method and a terminal device, used to distinguish, through voiceprint recognition, the speaker roles appearing in a recording while the user is recording it, and to convert the speech to text and display it as a text conversation. The information processing method provided by the application comprises the following steps: determining a current recording segment based on the recording recorded in real time, and performing voiceprint feature collection and speech-to-text recognition on the current recording segment; and comparing the voiceprint features corresponding to the current recording segment with the voiceprint features already collected, and outputting and displaying the text corresponding to the current recording segment according to the comparison result.

Description

Information processing method and terminal equipment
Technical Field
The present application relates to the field of information technologies, and in particular, to an information processing method and a terminal device.
Background
In scenarios where a user uses a recording function, there is often more than one speaker. Meetings, interviews, and the like are the main usage scenarios of the recording function, and multiple speaker roles typically appear in them.
A common recording apparatus can only record the received sound into a single recording file. If the user needs to organize the recording afterwards, it is difficult to distinguish the roles and the points in time at which each role speaks, and the user can only listen to the recording file repeatedly from beginning to end.
Disclosure of Invention
The embodiments of the application provide an information processing method and a terminal device, used to distinguish, through voiceprint recognition, the speaker roles appearing in a recording while the user is recording it, and to convert the speech to text and display it as a text conversation.
An information processing method provided by an embodiment of the present application includes:
determining a current recording segment based on the recording recorded in real time, and performing voiceprint feature collection and speech-to-text recognition on the current recording segment;
and comparing the voiceprint features corresponding to the current recording segment with the voiceprint features already collected, and outputting and displaying the text corresponding to the current recording segment according to the comparison result.
A current recording segment is determined based on the recording recorded in real time, and voiceprint feature collection and speech-to-text recognition are performed on it; the voiceprint features corresponding to the current recording segment are compared with the voiceprint features already collected, and the text corresponding to the current recording segment is output and displayed according to the comparison result. In this way, the roles appearing in the recording are distinguished through voiceprint recognition while the user is recording, and the speech is converted by recognition and displayed as a text conversation.
Optionally, outputting and displaying the text corresponding to the current recording segment according to the comparison result specifically includes:
if the voiceprint features corresponding to the current recording segment are the same as voiceprint features already collected, marking them with the mark of those same voiceprint features; if the voiceprint features corresponding to the current recording segment differ from all voiceprint features already collected, setting a new mark for them;
if the voiceprint features corresponding to the current recording segment are the same as those of the previous recording segment, merging the text corresponding to the current recording segment into the dialog box in which the text of the previous recording segment is displayed; if the voiceprint features corresponding to the current recording segment differ from those of the previous recording segment, displaying the text corresponding to the current recording segment separately in a new dialog box, together with the mark of the voiceprint features corresponding to the current recording segment.
Optionally, determining the current recording segment based on the recording recorded in real time specifically includes:
punctuating the recording recorded in real time, with each natural sentence taken as one recording segment.
Optionally, the method further comprises:
and receiving a recording list key instruction through an application main interface provided for a user, and outputting a recording list interface to the user, wherein the recording list interface displays preset symbol marks for a recording file containing a role and character recognition.
Optionally, the method further comprises:
receiving, through the recording list interface, the user's selection of a recording file containing role and text recognition, and outputting a recording playback interface, wherein the recording playback interface includes at least one, or a combination, of the following information:
the marks of the voiceprint features, the list of dialog boxes, and the time point and duration of the recording corresponding to each dialog box.
Optionally, the method further comprises:
and receiving a selection instruction of a user for any conversation frame through the recording playing interface, and displaying a role setting and editing interface which comprises the text content in the conversation frame for the user to edit.
Optionally, the role setting and editing interface further includes: the function menu at least comprises one or a combination of the following function keys:
modifying roles, naming roles, setting head portraits and playing.
Optionally, the method further comprises:
the recording playback interface further includes an export-text key; when the user clicks the export-text key, a file interface showing a preset format is output, and the file interface includes at least one, or a combination, of the following contents of the recording file:
the recording file name, the recording time, and the conversation text.
Optionally, the method further comprises:
the file interface further includes a confirm-save key; when the user clicks the confirm-save key, a file in the preset format is generated.
Another embodiment of the present application provides a terminal device, which includes a memory and a processor, where the memory is used to store program instructions, and the processor is used to call the program instructions stored in the memory, and execute any one of the above methods according to the obtained program.
Another embodiment of the present application provides a computer storage medium having stored thereon computer-executable instructions for causing a computer to perform any one of the methods described above.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a mobile phone interface for distinguishing and displaying roles according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the text recognized for voiceprint feature "A" being displayed in a first dialog box according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the text recognized for voiceprint feature "B" being displayed in a second dialog box according to an embodiment of the present application;
Fig. 4 is a schematic view of an operation interface for playing a recording according to an embodiment of the present application;
Fig. 5 is a schematic view of an operation interface for role setting and editing according to an embodiment of the present application;
Fig. 6 is a schematic view of an operation interface for editing text according to an embodiment of the present application;
Fig. 7 is a schematic view of an operation interface for modifying a role according to an embodiment of the present application;
Fig. 8 is a schematic view of an operation interface for naming a role according to an embodiment of the present application;
Fig. 9 is a schematic view of an operation interface for setting an avatar according to an embodiment of the present application;
Fig. 10 is a schematic view of an operation interface for playing a single dialog box according to an embodiment of the present application;
Fig. 11 is a schematic view of an operation interface for exporting text according to an embodiment of the present application;
Fig. 12 is a schematic flowchart of an information processing method according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of another terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the application provide an information processing method and a terminal device, used to distinguish, through voiceprint recognition, the speaker roles appearing in a recording while the user is recording it, and to convert the speech to text and display it as a conversation.
The method and the terminal device are based on the same application concept. Because the principles by which the method and the terminal device solve the problem are similar, the implementations of the device and of the method can refer to each other, and repeated description is omitted.
The terminal device referred to in the embodiments of the present application may be a device providing voice and/or data connectivity to a user, a handheld device having a wireless connection function, or another processing device connected to a wireless modem. The name of the terminal device may differ between systems; for example, in a 5G system the terminal device may be referred to as User Equipment (UE). A wireless terminal device may be a mobile terminal device such as a mobile telephone (or "cellular" telephone) or a computer equipped with a mobile terminal device, for example a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device, which communicates with one or more core networks via the RAN and exchanges voice and/or data with the radio access network. Examples include Personal Communication Service (PCS) phones, cordless phones, Session Initiation Protocol (SIP) phones, Wireless Local Loop (WLL) stations, and Personal Digital Assistants (PDAs). The wireless terminal device may also be referred to as a system, subscriber unit, subscriber station, mobile station, remote station, access point, remote terminal, access terminal, user terminal, user agent, or user device, which is not limited in this embodiment.
Various embodiments of the present application will be described in detail below with reference to the drawings. It should be noted that the presentation order of the embodiments only represents their sequence and does not indicate the relative merits of the technical solutions they provide.
In the present application, the roles in a recording file are distinguished through voiceprint recognition, and the role-distinguished recording is converted into text through speech recognition and displayed as a conversation. The speakers appearing in the recording can thus be distinguished through voiceprint recognition while the user is recording, with the speech converted by recognition and displayed as a dialogue, so that the user can review the recording more efficiently and intuitively. When playing back the recording, the user can also correct the recording file, for example by fixing wrongly transcribed text or reassigning wrongly recognized roles. An avatar and a name may be set for each conversation role, and the recording segment corresponding to a single dialog box may be played separately. In addition, the user can directly export the conversation as a text file without having to organize it manually.
The technical solution provided by the embodiments of the application specifically includes the following parts:
1. Role distinguishing and display.
Referring to fig. 1, the user turns on the "text" function switch in a recorder application (any application with a recording function is applicable); this is the "text" button displayed on the mobile phone screen in fig. 1. The user then clicks "start" (the gray button beside the "text" button; the actual display may be red or another color, and the specific style is not limited in the embodiments of the present application), and the mobile phone starts to distinguish roles, as in the example of the rightmost mobile phone screen in fig. 1, where "A", "B", and "C" represent different roles, that is, different speakers. When the recording is finished, the user can click the button in the lower right corner of the rightmost mobile phone screen in fig. 1. In addition, the key with the small red flag in the lower left corner of the rightmost interface in fig. 1 lets the user mark key time points during playback and/or real-time recording; the small red flag is then displayed on the recording playback interface (the rightmost interface in fig. 4) and on the playback time axis, helping the user quickly locate the important parts of the recording.
The specific steps of role distinguishing are as follows:
1. Punctuate the recording recorded in real time, taking each natural sentence as one recording segment, denoted segment 1, segment 2, segment 3, segment 4, and so on.
2. After segment 1 has been collected, perform voiceprint feature collection and speech-to-text recognition on segment 1. The collected voiceprint feature is marked "A"; "A" then represents one role as described in the embodiments of the present application. The recognized text is displayed in the first dialog box together with the role mark. As shown in fig. 2, the text "hello" converted from the speech of role "A" is displayed in the first dialog box.
3. After segment 2 has been collected, perform voiceprint feature collection and speech-to-text recognition on segment 2. The collected voiceprint features are compared with the already collected features of "A".
If the features are the same, the collected voiceprint features are marked "A"; if they differ, the collected voiceprint features are marked "B", so that "B" also represents a role in the embodiments of the application, distinct from the role represented by "A", indicating that the speech does not come from the same person.
If the voiceprint features of segment 2 are the same as those of segment 1, the recognized text is merged into the first dialog box for display; if the voiceprint features of segment 2 differ from those of segment 1, the recognized text is displayed separately in a second dialog box. As shown in fig. 3, the text "hello" converted from the speech of role "B" is displayed in the second dialog box.
The process continues by analogy…
4. After segment n has been collected, perform voiceprint feature collection and speech-to-text recognition on segment n. The collected features are compared in turn with all features collected so far.
If identical features exist, the collected voiceprint features are marked with that same mark; if the features differ from all of them, the collected voiceprint features are marked as a new feature.
If the voiceprint features of segment n are the same as those of segment n-1, the recognized text is merged into the previous dialog box for display; if the voiceprint features of segment n differ from those of segment n-1, the recognized text is displayed separately in a new dialog box. A code sketch of this flow is given below.
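Purely as an illustration of steps 1 to 4, the following Kotlin sketch labels a list of already-punctuated segments. The speaker-embedding extractor and the speech recognizer are passed in as functions because the embodiments do not fix any particular model; the cosine-similarity comparison, the 0.75 threshold, and the "A", "B", "C" labeling scheme are likewise assumptions of this sketch rather than details of the claimed method.

```kotlin
import kotlin.math.sqrt

// One dialog box: a voiceprint mark ("A", "B", ...) and the merged text.
data class DialogEntry(val mark: String, val text: StringBuilder)

// Cosine similarity between two voiceprint embeddings (illustrative metric).
fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

fun labelSegments(
    segments: List<ShortArray>,                    // segment 1, segment 2, ... from step 1
    extractVoiceprint: (ShortArray) -> FloatArray, // hypothetical speaker-embedding model
    speechToText: (ShortArray) -> String,          // hypothetical speech recognizer
    threshold: Double = 0.75                       // illustrative same-speaker threshold
): List<DialogEntry> {
    // Marks kept in first-appearance order, so the first distinct speaker is "A".
    val voiceprints = linkedMapOf<String, FloatArray>()
    val dialog = mutableListOf<DialogEntry>()
    for (segment in segments) {
        val feature = extractVoiceprint(segment)   // collect the voiceprint feature
        val text = speechToText(segment)           // speech-to-text recognition
        // Compare with every voiceprint collected so far: reuse the mark of the
        // best match above the threshold, otherwise register a new mark.
        val best = voiceprints.maxByOrNull { cosine(feature, it.value) }
        val mark = if (best != null && cosine(feature, best.value) >= threshold) {
            best.key
        } else {
            ('A' + voiceprints.size).toString().also { voiceprints[it] = feature }
        }
        // Same speaker as the previous segment: merge into the last dialog box;
        // otherwise open a new dialog box carrying the mark.
        val last = dialog.lastOrNull()
        if (last != null && last.mark == mark) last.text.append(' ').append(text)
        else dialog += DialogEntry(mark, StringBuilder(text))
    }
    return dialog
}
```

Keeping the marks in a LinkedHashMap preserves first-appearance order, so the speaker heard first is always "A", matching the example of fig. 2 and fig. 3.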
2. Recording playback:
Referring to fig. 4, in the application main interface (the leftmost mobile phone interface in fig. 4), clicking the "recording list" button displayed at the lower right of the screen (the button at the lower right of the leftmost mobile phone interface in fig. 4) enters the recording list, in which the finished recordings are displayed. If a recording file contains role and text recognition, a "text" symbol is displayed in the list; for example, the last file shown on the middle mobile phone interface in fig. 4 carries a "text" symbol. The embodiments of the present application are not limited to this, and a symbol of another style may also be used.
Clicking a single recording file enters the recording playback interface and plays the recording, as shown on the rightmost mobile phone interface in fig. 4. The recognized roles, the list of dialog boxes, and the time point and duration of the recording corresponding to each dialog box are displayed on the recording playback interface, and a "now playing" mark is shown on the conversation currently being played; for example, the third voice entry displayed on the rightmost mobile phone interface in fig. 4 is the conversation currently being played. The record behind each dialog box can be modeled as sketched below.
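As an illustration, a record such as the following could back each entry of that list; the field names, and the flag for the red-flag time marks, are assumptions of this sketch rather than a structure defined by the embodiments.

```kotlin
// Per-dialog-box record that the playback interface could render (illustrative).
data class DialogBox(
    val mark: String,            // voiceprint mark / role, e.g. "A", "B", "C"
    val text: String,            // recognized text of the merged segment(s)
    val startMs: Long,           // time point of the segment within the recording
    val durationMs: Long,        // duration of the corresponding recording
    val flagged: Boolean = false // whether the user set a small red flag here
)
```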
3. Role setting and editing:
referring to fig. 5, in the recording and playing interface, after the user clicks (clicks, double clicks, or long presses) a single dialog box, a role setting and editing interface, that is, the rightmost mobile phone interface in fig. 5, is displayed. On the interface, the dialog box is in a selected state, the displayed text content can be edited by a user, and a function menu is expanded below the dialog box. The function menu includes, for example, modifying a role, naming the role, setting a head portrait, playing four keys, and clicking each key to execute a corresponding function. For example, in the middle interface of the mobile phone in fig. 5, that is, the recording and playing interface, when the user clicks the third session box, the rightmost role setting and editing interface is displayed, the four function buttons are displayed below the third session box on the role setting and editing interface, and other contents are darkened.
On the mobile phone interface shown on the far right of fig. 5, the user can perform the following operations:
1. Editing the text:
Referring to fig. 6, while the dialog box is in the selected state, the text inside it can be edited; the user can correct wrongly recognized text, add or delete text, and so on.
2. Modifying the role:
Referring to fig. 7, clicking "modify role" pops up a role selection box, as shown in the middle of fig. 7, which contains buttons for the currently distinguished roles (e.g., A, B, C) and a new-role button (e.g., the user can add a new role D). Clicking a role button assigns the conversation to that role; for example, as shown on the rightmost mobile phone interface in fig. 7, if the user selects role B, the content of the selected dialog box is changed from the originally assigned role A to role B. If the user selects a new role, the dialog box is set as a dialog box of the new role.
That is, through this "modify role" function key the user can correct a wrongly recognized role.
3. Naming the role:
Referring to fig. 8, clicking "name role" pops up an input box, as shown in the middle of fig. 8, where the user can enter a new name. After the user clicks save, the role is named with the newly entered name instead of the original "A", and the new name is displayed on all dialog boxes of the original role "A" in the dialog box list. Alternatively, after saving, the newly entered name is displayed on the dialog boxes of that role while the original "A" is kept at the same time; all dialog boxes of the original role "A" in the list then display the newly entered name, as shown on the rightmost mobile phone interface in fig. 8.
That is, through the role naming function the user can give each role a name rather than merely a distinguishing mark, which makes later review and organization easier. A sketch of how one rename propagates to all dialog boxes is given below.
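A minimal Kotlin sketch of that propagation, assuming a simple in-memory model in which each dialog box stores only its voiceprint mark and resolves the display name through a shared Role record (the Role type and its fields are assumptions of this sketch):

```kotlin
// Shared per-role record; every dialog box of a role points at the same mark.
data class Role(
    val mark: String,              // original voiceprint mark, e.g. "A"
    var displayName: String,       // user-assigned name; defaults to the mark
    var avatarUri: String? = null  // set by the "set avatar" function, if used
)

fun renameRole(roles: MutableMap<String, Role>, mark: String, newName: String) {
    // Dialog boxes look the name up through the shared record, so one update
    // changes the name shown on every dialog box of the original mark.
    roles.getValue(mark).displayName = newName
}
```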
4. Setting an avatar:
Referring to fig. 9, clicking "set avatar" offers a choice between taking a photo and selecting a local picture. If taking a photo is selected, the local camera is started, and the photo taken can be set as the avatar; if a local photo is selected, the local photo selection interface is opened, and the chosen photo is set as the avatar. After the avatar is set, an avatar mark may be added to all dialog boxes of that role in the dialog box list, for example before or above each dialog box; or the avatars of all dialog boxes of that role in the list are replaced with the newly set avatar.
That is, through the avatar setting function the user can set avatars for familiar roles, which makes later review and organization easier.
5. Playing:
Referring to fig. 10, clicking "play" plays the recording segment corresponding to that dialog box alone, which makes repeated listening convenient.
4. Text export:
Referring to fig. 11, by clicking the "export text" button on the recording playback interface (e.g., the leftmost mobile phone interface in fig. 11), the recording name, the recording time, the conversation text (role names and speech content), and other contents can be extracted and automatically typeset into a document. When the user clicks "confirm save" (e.g., the rightmost mobile phone interface in fig. 11), the text is exported to a Word document or a file in another format, which can be edited afterwards. The assembly of the exported text is sketched below.
To sum up, referring to fig. 12, on a terminal (e.g., a mobile phone) side, an information processing method provided in an embodiment of the present application includes:
s101, determining a current recording fragment based on the recording recorded in real time, and carrying out voiceprint feature acquisition and speech-to-text recognition on the current recording fragment;
optionally, determining the current recording segment based on the real-time recorded recording specifically includes:
and (4) punctuating the recorded sound recorded in real time, wherein each natural sentence is used as a sound recording fragment.
S102, comparing the voiceprint features corresponding to the current recording segment with the voiceprint features already collected, and outputting and displaying the text corresponding to the current recording segment according to the comparison result.
Optionally, outputting and displaying the text corresponding to the current recording segment according to the comparison result specifically includes:
if the voiceprint features corresponding to the current recording segment are the same as voiceprint features already collected, marking them with the mark of those same voiceprint features; if the voiceprint features corresponding to the current recording segment differ from all voiceprint features already collected, setting a new mark for them;
if the voiceprint features corresponding to the current recording segment are the same as those of the previous recording segment, merging the text corresponding to the current recording segment into the dialog box in which the text of the previous recording segment is displayed; if the voiceprint features corresponding to the current recording segment differ from those of the previous recording segment, displaying the text corresponding to the current recording segment separately in a new dialog box, together with the mark of the voiceprint features corresponding to the current recording segment.
In the embodiments of the present application, the voiceprint features corresponding to the current recording segment are optionally compared with those of the previous recording segment, although the embodiments are not limited to this: the voiceprint features corresponding to the current recording segment may also be compared with the existing voiceprint features of any one or more recording segments. The specific implementation is not limited in the present application, as long as it can be identified which recording segment's voiceprint features match those of the current segment, or that the current segment carries an entirely new voiceprint feature.
Optionally, the method further comprises:
receiving a recording-list key instruction through an application main interface provided to the user (such as the leftmost interfaces of fig. 1 and fig. 4), and outputting a recording list interface to the user, wherein the recording list interface displays a preset symbol mark for each recording file that contains role and text recognition.
The preset symbol mark is, for example, a "text" symbol.
Optionally, the method further comprises:
receiving, through the recording list interface, the user's selection of a recording file containing role and text recognition, and outputting a recording playback interface, wherein the recording playback interface includes at least one, or a combination, of the following information:
the marks of the voiceprint features, the list of dialog boxes, and the time point and duration of the recording corresponding to each dialog box.
The mark of a voiceprint feature can also be understood as the role identified above, for example "A", "B", or "C".
Optionally, the method further comprises:
and receiving a selection instruction of a user for any conversation box through the recording playing interface, and displaying a role setting and editing interface which comprises the text content in the conversation box for the user to edit.
Optionally, the role setting and editing interface further includes: the function menu at least comprises one or a combination of the following function keys:
modifying roles, naming roles, setting head portrait and playing.
Optionally, the method further comprises:
the recording playback interface further includes an export-text key; when the user clicks the export-text key, a file interface showing a preset format is output, and the file interface includes at least one, or a combination, of the following contents of the recording file:
the recording file name, the recording time, and the conversation text.
Optionally, the method further comprises:
the file interface further includes a confirm-save key; when the user clicks the confirm-save key, a file in the preset format is generated.
Referring to fig. 13, a terminal device provided in an embodiment of the present application includes:
the processor 600, configured to read the program in the memory 620 and execute the following process:
determining a current recording segment based on the recording recorded in real time, and performing voiceprint feature collection and speech-to-text recognition on the current recording segment;
and comparing the voiceprint features corresponding to the current recording segment with the voiceprint features already collected, and outputting and displaying the text corresponding to the current recording segment according to the comparison result.
Optionally, outputting and displaying the text corresponding to the current recording segment according to the comparison result specifically includes:
if the voiceprint features corresponding to the current recording segment are the same as voiceprint features already collected, marking them with the mark of those same voiceprint features; if the voiceprint features corresponding to the current recording segment differ from all voiceprint features already collected, setting a new mark for them;
if the voiceprint features corresponding to the current recording segment are the same as those of the previous recording segment, merging the text corresponding to the current recording segment into the dialog box in which the text of the previous recording segment is displayed; if the voiceprint features corresponding to the current recording segment differ from those of the previous recording segment, displaying the text corresponding to the current recording segment separately in a new dialog box, together with the mark of the voiceprint features corresponding to the current recording segment.
Optionally, determining a current recording segment based on the recording recorded in real time includes:
and (4) punctuating the recorded sound recorded in real time, wherein each natural sentence is used as a sound recording fragment.
Optionally, the processor 600 is further configured to:
and receiving a recording list key instruction through an application main interface provided for a user, and outputting a recording list interface to the user, wherein the recording list interface displays preset symbol marks for a recording file containing a role and character recognition.
Optionally, the processor 600 is further configured to:
receive, through the recording list interface, the user's selection of a recording file containing role and text recognition, and output a recording playback interface, wherein the recording playback interface includes at least one, or a combination, of the following information:
the marks of the voiceprint features, the list of dialog boxes, and the time point and duration of the recording corresponding to each dialog box.
Optionally, the processor 600 is further configured to:
and receiving a selection instruction of a user for any conversation frame through the recording playing interface, and displaying a role setting and editing interface which comprises the text content in the conversation frame for the user to edit.
Optionally, the role setting and editing interface further includes: the function menu at least comprises one or a combination of the following function keys:
modifying roles, naming roles, setting head portrait and playing.
Optionally, the recording playback interface further includes an export-text key, and the processor 600 is further configured to:
output, when the user clicks the export-text key, a file interface showing a preset format, the file interface including at least one, or a combination, of the following contents of the recording file:
the recording file name, the recording time, and the conversation text.
Optionally, the file interface further includes a confirm-save key, and the processor 600 is further configured to:
generate a file in the preset format when the user clicks the confirm-save key.
A transceiver 610 for receiving and transmitting data under the control of the processor 600.
In fig. 13, the bus architecture may include any number of interconnected buses and bridges linking together various circuits, in particular one or more processors represented by the processor 600 and the memory represented by the memory 620. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. The bus interface provides an interface. The transceiver 610 may be a plurality of elements, that is, include a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium. Depending on the user equipment, the user interface 630 may also be an interface for externally connecting required devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like.
The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 may store data used by the processor 600 in performing operations.
Alternatively, the processor 600 may be a CPU (Central Processing Unit), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).
Referring to fig. 14, another terminal device provided in the embodiment of the present application includes:
a first unit 11, configured to determine a current recording segment based on the recording recorded in real time, and to perform voiceprint feature collection and speech-to-text recognition on the current recording segment;
and a second unit 12, configured to compare the voiceprint features corresponding to the current recording segment with the voiceprint features already collected, and to output and display the text corresponding to the current recording segment according to the comparison result.
Optionally, outputting and displaying the text corresponding to the current recording segment according to the comparison result specifically includes:
if the voiceprint features corresponding to the current recording segment are the same as voiceprint features already collected, marking them with the mark of those same voiceprint features; if the voiceprint features corresponding to the current recording segment differ from all voiceprint features already collected, setting a new mark for them;
if the voiceprint features corresponding to the current recording segment are the same as those of the previous recording segment, merging the text corresponding to the current recording segment into the dialog box in which the text of the previous recording segment is displayed; if the voiceprint features corresponding to the current recording segment differ from those of the previous recording segment, displaying the text corresponding to the current recording segment separately in a new dialog box, together with the mark of the voiceprint features corresponding to the current recording segment.
Optionally, determining the current recording segment based on the recording recorded in real time includes:
punctuating the recording recorded in real time, with each natural sentence taken as one recording segment.
Optionally, the second unit 12 is further configured to:
and receiving a recording list key instruction through an application main interface provided for a user, and outputting a recording list interface to the user, wherein the recording list interface displays preset symbol marks for a recording file containing a role and character recognition.
Optionally, the second unit 12 is further configured to:
receive, through the recording list interface, the user's selection of a recording file containing role and text recognition, and output a recording playback interface, wherein the recording playback interface includes at least one, or a combination, of the following information:
the marks of the voiceprint features, the list of dialog boxes, and the time point and duration of the recording corresponding to each dialog box.
Optionally, the second unit 12 is further configured to:
and receiving a selection instruction of a user for any conversation box through the recording playing interface, and displaying a role setting and editing interface which comprises the text content in the conversation box for the user to edit.
Optionally, the role setting and editing interface further includes a function menu, which comprises at least one, or a combination, of the following function keys:
modify role, name role, set avatar, and play.
Optionally, the recording playback interface further includes an export-text key, and the second unit 12 is further configured to: output, when the user clicks the export-text key, a file interface showing a preset format, the file interface including at least one, or a combination, of the following contents of the recording file:
the recording file name, the recording time, and the conversation text.
Optionally, the file interface further includes a confirm-save key, and the second unit 12 is further configured to: generate a file in the preset format when the user clicks the confirm-save key.
It should be noted that the division into units in the embodiments of the present application is schematic and is only a division by logical function; other division manners are possible in actual implementation. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or the whole or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Any one of the above apparatuses provided in the embodiments of the present application may be a terminal device, and may also be referred to as a computing device. The device may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a smart television, a Personal Digital Assistant (PDA), or the like. The apparatus may include a central processing unit (CPU), a memory, input/output devices, and so on; the input devices may include a keyboard, a mouse, a touch screen, etc., and the output devices may include a display device such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT).
The memory may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides the processor with program instructions and data stored in the memory. In the embodiments of the present application, the memory may be used for storing a program of any one of the methods provided by the embodiments of the present application.
The processor is used for executing any one of the methods provided by the embodiment of the application according to the obtained program instructions by calling the program instructions stored in the memory.
Embodiments of the present application provide a computer storage medium for storing computer program instructions for an apparatus provided in the embodiments of the present application, which includes a program for executing any one of the methods provided in the embodiments of the present application.
The computer storage medium may be any available medium or data storage device accessible to a computer, including but not limited to magnetic memories (e.g., floppy disks, hard disks, magnetic tapes, magneto-optical disks (MOs), etc.), optical memories (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memories (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), Solid-State Disks (SSDs)).
The above method process flow may be implemented by a software program, which may be stored in a storage medium, and when the stored software program is called, the above method steps are performed.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An information processing method, characterized in that the method comprises:
determining a current recording segment based on the recording recorded in real time, and performing voiceprint feature collection and speech-to-text recognition on the current recording segment;
and comparing the voiceprint features corresponding to the current recording segment with the voiceprint features already collected, and outputting and displaying the text corresponding to the current recording segment according to the comparison result.
2. The method of claim 1, wherein outputting and displaying the text corresponding to the current recording segment according to the comparison result specifically comprises:
if the voiceprint features corresponding to the current recording segment are the same as voiceprint features already collected, marking them with the mark of those same voiceprint features; if the voiceprint features corresponding to the current recording segment differ from all voiceprint features already collected, setting a new mark for them;
if the voiceprint features corresponding to the current recording segment are the same as those of the previous recording segment, merging the text corresponding to the current recording segment into the dialog box in which the text of the previous recording segment is displayed; if the voiceprint features corresponding to the current recording segment differ from those of the previous recording segment, displaying the text corresponding to the current recording segment separately in a new dialog box, together with the mark of the voiceprint features corresponding to the current recording segment.
3. The method of claim 1, further comprising:
and receiving a recording list key instruction through an application main interface provided for a user, and outputting a recording list interface to the user, wherein the recording list interface displays preset symbol marks for a recording file containing a role and character recognition.
4. The method of claim 3, further comprising:
receiving, through the recording list interface, the user's selection of a recording file containing role and text recognition, and outputting a recording playback interface, wherein the recording playback interface comprises at least one, or a combination, of the following information:
the marks of the voiceprint features, the list of dialog boxes, and the time point and duration of the recording corresponding to each dialog box.
5. The method of claim 4, further comprising:
and receiving a selection instruction of a user for any conversation frame through the recording playing interface, and displaying a role setting and editing interface which comprises the text content in the conversation frame for the user to edit.
6. The method of claim 5, wherein the role setting and editing interface further comprises: the function menu at least comprises one or a combination of the following function keys:
modifying roles, naming roles, setting head portrait and playing.
7. The method of claim 4, further comprising:
the recording playback interface further comprises an export-text key; when the user clicks the export-text key, a file interface showing a preset format is output, and the file interface comprises at least one, or a combination, of the following contents of the recording file:
the recording file name, the recording time, and the conversation text.
8. The method of claim 7, further comprising:
the file interface further comprises a confirm-save key; when the user clicks the confirm-save key, a file in the preset format is generated.
9. A terminal device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 8 in accordance with the obtained program.
10. A computer storage medium having computer-executable instructions stored thereon for causing a computer to perform the method of any one of claims 1 to 8.
CN202110708975.XA 2021-06-25 2021-06-25 Information processing method and terminal equipment Pending CN115527541A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110708975.XA CN115527541A (en) 2021-06-25 2021-06-25 Information processing method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110708975.XA CN115527541A (en) 2021-06-25 2021-06-25 Information processing method and terminal equipment

Publications (1)

Publication Number Publication Date
CN115527541A 2022-12-27

Family

ID=84694277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110708975.XA Pending CN115527541A (en) 2021-06-25 2021-06-25 Information processing method and terminal equipment

Country Status (1)

Country Link
CN (1) CN115527541A

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116072141A * 2023-04-06 2023-05-05 深圳市阿尔泰车载娱乐系统有限公司 Vehicle-mounted communication system and method with voice recognition function


Similar Documents

Publication Publication Date Title
CN102687485B (en) For sharing the apparatus and method of content on the mobile apparatus
US20180020090A1 (en) Keyword based message handling
JP2006190296A (en) Method and apparatus for providing information by using context extracted from multimedia communication system
CN105389296A (en) Information partitioning method and apparatus
EP3542522A1 (en) Incoming call management method and apparatus
CN106484134A (en) The method and device of the phonetic entry punctuation mark based on Android system
US7668829B2 (en) Method and apparatus for storing music file in mobile communication terminal
US20120014295A1 (en) Electronic device, storage medium storing information processing program and information processing method
CN115527541A (en) Information processing method and terminal equipment
US7904058B2 (en) Recording data at a mobile telephone during a telephone call
CN112866469A (en) Method and device for recording call content
JP2008113331A (en) Telephone system, telephone set, server device, and program
CN110445934A (en) Call-information processing method, system, terminal and readable storage medium storing program for executing
CN104702758A (en) Terminal and method thereof for managing multimedia notepad
CN105072243A (en) Incoming call prompting method and apparatus
CN109285545A (en) Information processing method and device
CN112905464B (en) Application running environment data processing method and device
CN110708418B (en) Method and device for identifying attributes of calling party
CN104182406A (en) Electronic business card creating method, electronic business card retrieval method and relevant system
CN113055529A (en) Recording control method and recording control device
JP2010219969A (en) Call recording device with retrieving function, and telephone set
KR100724848B1 (en) Method for voice announcing input character in portable terminal
CN110931014A (en) Speech recognition method and device based on regular matching rule
KR100873126B1 (en) Mobile terminal and Method for editing a text in thereof
CN107148062A (en) A kind of double card switching method and device

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination