CN113901186A - Telephone recording marking method, device, equipment and storage medium - Google Patents

Telephone recording marking method, device, equipment and storage medium

Info

Publication number
CN113901186A
CN113901186A (application CN202111168567.6A)
Authority
CN
China
Prior art keywords
page
target
client
child node
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111168567.6A
Other languages
Chinese (zh)
Inventor
杨声钟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202111168567.6A
Publication of CN113901186A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The application relates to the technical field of voice labeling, and discloses a telephone recording labeling method, device, equipment and storage medium. The method includes: acquiring an audio file of a telephone recording, where the telephone recording is a recording of a conversation between a robot role and a client role; performing voice recognition on the audio file to obtain a reference labeling text; segmenting the reference labeling text according to the interaction sequence of the conversation to obtain at least one robot dialogue segment and at least one client dialogue segment; generating page nodes corresponding one to one to the robot dialogue segments and the client dialogue segments; arranging the page nodes vertically in a target page according to the interaction sequence of the conversation, displaying the target page in a target visible area, and acquiring an annotator's labels on the robot dialogue segments and the client dialogue segments in the target page. By combining a voice recognition model with manual labeling, the method improves both the efficiency and the accuracy of labeling telephone recordings.

Description

Telephone recording marking method, device, equipment and storage medium
Technical Field
The present application relates to the field of voice tagging technologies, and in particular, to a method, an apparatus, a device, and a storage medium for tagging a telephone recording.
Background
In the process of improving an intelligent customer service system, conversations between the robot and customers are recorded and the telephone recordings are labeled. The labeled recordings are then used for subsequent training of the intelligent customer service system, improving its speech recognition and natural language processing capabilities and making it more intelligent.
Some telephone recording labeling methods rely only on a neural network model, but the labeling accuracy leaves room for improvement. Others rely only on manual labeling; although this improves accuracy, purely manual listening and labeling is time-consuming and cumbersome.
Therefore, a labeling method for telephone recordings that is both accurate and efficient is needed.
Disclosure of Invention
The application aims to provide a telephone recording labeling method, device, equipment and storage medium that improve the efficiency of manually labeling telephone recordings.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method for labeling a telephone recording, the method including:
acquiring an audio file of a telephone recording, wherein the telephone recording is a recording of a conversation between a robot role and a client role;
performing voice recognition on the audio file to obtain a reference labeling text;
segmenting the reference annotation text according to the interactive sequence of the conversation to obtain at least one robot dialogue segment and at least one client dialogue segment;
generating page nodes corresponding to the robot dialogue segments and the client dialogue segments one by one;
and longitudinally arranging the page nodes in a target page according to the interaction sequence of the conversation, and displaying the target page in a target visual area so that a labeling person labels the robot dialog segment and the client dialog segment in the target page.
In some embodiments of the present application, based on the foregoing solution, the method further comprises:
in response to a preset event of a key in the target page, acquiring a scrolling distance of the target page according to the heights of labeled page nodes in the current page corresponding to the target visible area, where the preset event includes a press event and/or a release event;
and controlling the target page to scroll by the scrolling distance, and displaying the target page in the target visible area.
In some embodiments of the present application, based on the foregoing solution, the obtaining a scrolling distance of the target page according to a height of a page node labeled in a current page corresponding to the target visible area includes:
determining, from top to bottom in the current page corresponding to the target visible area, the first run of consecutively labeled page nodes;
and taking the sum of the heights of this first run of consecutively labeled page nodes as the scrolling distance.
In some embodiments of the present application, based on the foregoing solution, the method further comprises:
associating a first preset area of the page node corresponding to the client dialogue segment with an audio playing component;
and if a first target event is detected in the first preset area, playing the audio clip corresponding to the client dialogue segment.
In some embodiments of the present application, based on the foregoing solution, the page nodes corresponding to the client dialogue segments include a first child node, a second child node, a third child node and a fourth child node arranged horizontally in the display interface;
the first child node is used for displaying the client dialogue segment;
the second child node is used for displaying selectable voice recognition attributes corresponding to the client dialogue segment, so that an annotator can determine the voice recognition attributes of the client dialogue segment;
the third child node is used for displaying selectable natural language processing attributes corresponding to the client dialogue segment, so that an annotator can determine the natural language processing attributes of the client dialogue segment;
and the fourth child node is used for displaying other attributes corresponding to the client dialogue segment, so that an annotator can determine those attributes of the client dialogue segment.
In some embodiments of the present application, based on the foregoing solution, the method further comprises:
monitoring a second target event in a second preset area of the second child node;
if the second target event is monitored, folding the second child node to hide part of contents in the second child node;
monitoring a third target event in a third preset area of the third child node;
if the third target event is monitored, the third child node is folded, so that part of the content in the third child node is hidden.
According to an aspect of the embodiments of the present application, there is provided a device for annotating a telephone recording, the device comprising:
an audio file acquisition unit, configured to acquire an audio file of a telephone recording, where the telephone recording is a recording of a conversation between a robot role and a client role;
the voice recognition unit is used for carrying out voice recognition on the audio file to obtain a reference marking text;
the segmentation unit is used for segmenting the reference annotation text according to the interactive sequence of the conversation to obtain at least one robot dialogue segment and at least one client dialogue segment;
the page node generating unit is used for generating page nodes which correspond to the robot dialogue segments and the client dialogue segments one by one;
and the display unit is used for longitudinally arranging the page nodes in a target page according to the interaction sequence of the conversation and displaying the target page in a target visual area so as to enable a labeling person to label the robot dialogue segment and the client dialogue segment in the target page.
In some embodiments of the present application, based on the foregoing solution, the apparatus further includes:
a scrolling distance acquisition unit, configured to respond to a preset event of a key in the target page and acquire a scrolling distance of the target page according to the heights of labeled page nodes in the current page corresponding to the target visible area, where the preset event includes a press event and/or a release event;
and a target page control unit, configured to control the target page to scroll by the scrolling distance and display the target page in the target visible area.
According to an aspect of the embodiments of the present application, there is provided a computer-readable program medium storing computer program instructions, which, when executed by a computer, cause the computer to execute the above-mentioned method for tagging a telephone recording.
According to an aspect of an embodiment of the present application, there is provided a computer device including: a processor; and a memory storing computer readable instructions which, when executed by the processor, implement the above telephone recording labeling method.
In the technical solutions of some embodiments of the application, the telephone recording is segmented according to the interaction sequence of the conversation and displayed in the target visible area, so that an annotator can distinguish the robot dialogue segments from the client dialogue segments in the target visible area and manually label them against the reference labeling text, thereby improving the efficiency of manual labeling.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
Fig. 2 shows a flow chart of a method of call record annotation according to an embodiment of the present application.
FIG. 3 illustrates an effect diagram of a target page according to one embodiment of the present application.
Fig. 4 is a flow chart illustrating a method for tagging telephone recordings according to an embodiment of the present application.
FIG. 5 illustrates a flow chart of a method of obtaining a scroll distance of a target page according to one embodiment of the present application.
Fig. 6 shows a flow chart of another method for labeling telephone recordings according to one embodiment of the present application.
FIG. 7 illustrates an effect diagram of yet another target page according to one embodiment of the present application.
Fig. 8 is a flow chart illustrating a method for labeling a telephone recording according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a recording annotation device according to an embodiment of the present application.
Fig. 10 shows a schematic diagram of a program product for implementing the above method according to an embodiment of the present application.
FIG. 11 shows a schematic diagram of an electronic device according to one embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is also noted that the terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than those illustrated or described herein.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (such as one or more of the smartphone 101, tablet 102, and portable computer 103 shown in fig. 1), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. The terminal devices and the server 105 are connected via a network 104, which may include various connection types, such as wired communication links, wireless communication links, and so forth.
The telephone recording labeling method provided by the embodiments of the application can be executed by the server 105. A terminal device can send a telephone recording to the server through the network, and the server processes the recording so that an annotator can label it.
It should also be noted that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. According to implementation needs, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (content distribution network), a big data and artificial intelligence platform, and the like. The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart television, and the like, but is not limited thereto, and the application is not limited thereto.
It should be explained that cloud computing, as mentioned above, is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's perspective, resources in the cloud can be expanded without limit, acquired at any time, used on demand and expanded at any time. The cloud computing resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices and network devices.
Implementation details of the technical solutions of the embodiments of the present application are described below:
fig. 2 shows a flow chart of a method of call record annotation according to an embodiment of the present application. As shown in fig. 2, the method includes at least the following steps.
Step 210: an audio file of a telephone recording is obtained, the telephone recording being a recording of a conversation between a robot character and a client character.
Step 220: and carrying out voice recognition on the audio file to obtain a reference marking text.
The reference labeling text is a reference text obtained by passing the audio file through a voice recognition model. Annotators need to evaluate the effect of the voice recognition model in multiple dimensions, i.e., label the reference labeling text, which facilitates the subsequent use of the labeled text in iterative training of the voice recognition model and in training the intelligent customer service system.
In a specific implementation, the audio file may first undergo endpoint detection, noise reduction and acoustic feature extraction. A speech signal contains a great deal of low-level information, such as the speaker, pronunciation content, channel characteristics and accent or dialect; combined, this low-level information expresses rich high-level information such as emotional changes, grammatical semantics and implied connotations. Speech feature extraction extracts the information most relevant to speech recognition from the original speech signal and filters out irrelevant information. The acoustic features are then converted into phonemes by an acoustic model in the voice recognition model based on LSTM + CTC (Long Short-Term Memory + Connectionist Temporal Classification), and the phonemes undergo statistical pattern recognition through an N-gram language model in the voice recognition model to obtain the reference labeling text.
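The statistical pattern recognition step of the N-gram language model can be illustrated with a toy bigram model. This is a simplified sketch for illustration only; the application's actual acoustic and language models are not specified and would be far larger.

```python
import math
from collections import defaultdict

class BigramLM:
    """Toy bigram (N=2) language model with add-one smoothing."""

    def __init__(self):
        self.bigram = defaultdict(int)   # counts of (prev, cur) pairs
        self.context = defaultdict(int)  # counts of prev tokens
        self.vocab = set()

    def train(self, sentences):
        for tokens in sentences:
            padded = ["<s>"] + list(tokens)  # sentence-start marker
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigram[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, tokens):
        """Laplace-smoothed log-probability of a token sequence."""
        padded = ["<s>"] + list(tokens)
        v = len(self.vocab)
        lp = 0.0
        for prev, cur in zip(padded, padded[1:]):
            lp += math.log((self.bigram[(prev, cur)] + 1) / (self.context[prev] + v))
        return lp
```

In use, a recognizer would score competing phoneme-to-word hypotheses with such a model and keep the most probable one; word sequences resembling the training conversations score higher than scrambled ones.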
Step 230: and segmenting the reference annotation text according to the interactive sequence of the conversation to obtain at least one robot dialogue segment and at least one client dialogue segment.
Segmenting the reference labeling text into multiple segments allows the annotator to clearly distinguish the robot's and the client's dialogue segments and to compare the dialogue in context.
For example, a certain reference labeling text is divided into four segments: (1) Robot: Hello, may I ask if you are XX? (2) Client: Yes. (3) Robot: Do you need vehicle insurance? (4) Client: Yes, I do.
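The segmentation step can be sketched as follows, assuming the recognizer output already carries a role tag per utterance (an assumed input format; the application does not specify one):

```python
def segment_dialogue(utterances):
    """Split a recognized transcript into per-turn dialogue segments.

    `utterances` is a list of (role, text) pairs in interaction order,
    e.g. from channel-separated speech recognition. Consecutive
    utterances by the same role are merged into one segment, so the
    result alternates between robot and client segments.
    """
    segments = []
    for role, text in utterances:
        if segments and segments[-1]["role"] == role:
            segments[-1]["text"] += " " + text  # same speaker continues
        else:
            segments.append({"role": role, "text": text})
    return segments
```

Each returned segment then maps one to one onto a page node in the target page.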
Step 240: and generating page nodes which correspond to the robot dialogue segments and the client dialogue segments one by one.
To visualize the dialogue segments, page nodes corresponding one to one to the dialogue segments may be generated.
Step 250: and page nodes are longitudinally arranged in the target page according to the interactive sequence of the conversation, the target page is displayed in the target visible area, and the marks of the robot dialog segment and the client dialog segment in the target page by the marking personnel are obtained.
The target visible area is the area of the display interface used to display the target page; it can occupy the whole display interface or only part of it. FIG. 3 illustrates an effect diagram of a target page according to one embodiment of the present application. As shown in fig. 3, the dialogue segments are arranged from top to bottom according to the interaction sequence, the page nodes corresponding to the robot dialogue segments are aligned on the left side of the target visible area, those corresponding to the client dialogue segments are aligned on the right side, and the annotator can view the conversation between the robot and the client at a glance.
According to the embodiments of the application, the reference labeling text corresponding to the audio file of a robot-client conversation recording is split segment by segment according to the interaction sequence of the conversation and displayed in order on the display interface, making it convenient for annotators to distinguish the dialogue roles and compare context. Labeling the recording by combining a voice recognition model with manual labeling improves both the efficiency and the accuracy of labeling.
Fig. 4 is a flow chart of a method for tagging telephone recordings according to an embodiment of the present application, where the method includes at least the following steps, as shown in fig. 4.
The implementation process of step 410-450 is similar to that of step 210-250, and will not be described herein.
Step 460: and responding to the preset event of the key in the target page, and acquiring the preset event of the rolling distance of the target page according to the height of the marked page node in the current page corresponding to the target visual area, wherein the preset event of the rolling distance of the target page comprises a pressing event and/or a bounce event.
In a specific implementation, the key may be a preset shortcut key; when the annotator presses or releases the shortcut key, the target page responds by acquiring the scrolling distance of the page.
Step 470: and controlling the target page to sequentially scroll according to the scroll distance, and displaying the target page in the target visual area.
When the audio file contains many robot and client dialogue segments, and thus many corresponding page nodes, the height of the target visible area in the display interface is smaller than that of the target page, and the current page of the target visible area cannot display all the page nodes. The annotator therefore needs to scroll the target page to view page nodes not yet displayed. When scrolling manually with a mouse or keyboard, scrolling too slowly wastes time, while scrolling too quickly can cause some page nodes to be missed and left unlabeled.
In the application, when it is detected that the annotator has pressed the shortcut key, the page scrolls automatically and subsequent page nodes not yet displayed are shown in the target visible area. For example, suppose the target page contains page nodes numbered 1 to 5, the current page displays nodes 1 and 2, and both are already labeled. When the annotator presses the shortcut key, the scrolling distance is the sum of the heights of nodes 1 and 2; nodes 1 and 2 move out of the target visible area and subsequent page nodes are displayed starting from node 3. Alternatively, if node 1 is labeled but node 2 is not, pressing the shortcut key scrolls the page so that node 1 moves out of the visible area and node 2 occupies the optimal position of the target visible area.
The application thus avoids the annotator scrolling the target visible area too quickly or too slowly: the page scrolls automatically with a single key press, the next page node to be labeled is displayed in the target visible area, and labeling efficiency is improved.
Fig. 5 is a flowchart illustrating a method for obtaining a scroll distance of a target page according to an embodiment of the present application, and the method at least includes the following steps, as shown in fig. 5.
Step 510: and determining continuous marked page nodes of the first section from the top to the bottom in the current page corresponding to the target visual area.
Step 520: and taking the sum of the heights of the nodes of the first section of the continuous marked pages as the rolling distance.
For example, the page nodes in the current page are numbered 1, 2, and 3, where the page nodes numbered 1 and 3 are labeled completely, but the page node numbered 2 is not labeled completely, then after the annotating staff presses the shortcut key, the first segment of continuously labeled page nodes from top to bottom in the current page corresponding to the target visible area only includes the page node numbered 1, so that only the page node numbered 1 is moved out of the visible area, and the page node numbered 2 is set in the visible area, which is equivalent to moving the unmarked node to the optimal visible area in the target visible area, so that the annotating staff marks the page node numbered 2.
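The scroll-distance rule of steps 510 and 520 can be sketched as follows (the field names `height` and `labeled` are illustrative assumptions, not names from the application):

```python
def scroll_distance(nodes):
    """Scrolling distance = total height of the first run of
    consecutively labeled page nodes from the top of the current page.

    `nodes` is a top-to-bottom list of dicts, each with a pixel
    `height` and a boolean `labeled` flag.
    """
    total = 0
    for node in nodes:
        if not node["labeled"]:
            break  # stop at the first unlabeled node
        total += node["height"]
    return total
```

With nodes 1 and 3 labeled but node 2 unlabeled, only node 1's height is counted, matching the example above; if the topmost node is unlabeled, the page does not scroll at all.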
Fig. 6 is a flow chart of another method for labeling telephone recordings according to an embodiment of the present application, where the method includes at least the following steps, as shown in fig. 6.
The implementation process of step 610-650 is similar to that of step 210-250, and will not be described herein.
Step 660: and associating the first preset area of the page node corresponding to the customer dialogue section with the audio playing component.
The audio assembly is used for playing the audio file corresponding to the telephone recording, and the playing assembly is provided with the playing start time and the playing end time of the client dialog segment in the whole audio file.
Step 670: and if the first target event is detected in the preset area, playing an audio clip corresponding to the dialogue clip of the client.
In a specific implementation, a play button can be set in the first preset area, and the first target event can be a click event. After the annotator clicks the play button, the playing component plays the audio file from the play start time to the play end time of the client dialogue segment. The annotator hears only the recording corresponding to that client dialogue segment and does not need to search for it within the whole recording; listening to the recording while reading the text allows the segment to be labeled quickly, improving labeling efficiency.
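Playing one client dialogue segment amounts to extracting the samples between the stored start and end times. A minimal sketch, assuming mono PCM samples already decoded (e.g. with the stdlib `wave` module; the application does not specify an audio format):

```python
def clip_samples(samples, sample_rate, start_s, end_s):
    """Return the PCM samples for one client dialogue segment.

    `samples` is the decoded mono sample sequence for the whole
    recording; `start_s`/`end_s` are the segment's play start and end
    times in seconds, as stored on the audio playing component.
    """
    start = int(start_s * sample_rate)
    end = int(end_s * sample_rate)
    return samples[start:end]
```

The playing component would then feed only this slice to the audio output, so the annotator hears exactly the segment being labeled.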
Furthermore, a play button for the whole audio file can be set in the target page; the annotator can click it to play the audio file continuously.
FIG. 7 illustrates an effect diagram of yet another target page according to one embodiment of the present application. As shown in fig. 7, the page nodes corresponding to the client dialogue segments include a first child node, a second child node, a third child node and a fourth child node arranged horizontally in the display interface.
And the first child node is used for displaying the client dialogue segment.
And the second child node is used for displaying the selectable speech recognition attributes corresponding to the customer dialogue segment so that the annotating personnel can determine the speech recognition attributes of the customer dialogue segment.
The speech recognition attribute (ASR attribute) is a multi-dimensional attribute used to characterize an audio segment and the speech recognition effect of the audio segment, including, but not limited to, attributes such as "text-on-word", "text-off-word", "default text-on-word", "accurate", "truncation", "substantially accurate", "noise", and the like.
Combining the audio segment heard via the first preset area with the text of the client dialogue segment shown in the first child node, the annotator selects the attribute of the client dialogue segment in the second child node, completing the speech-recognition labeling of the client dialogue segment.
In a specific implementation, the first preset region may be a local region of the first child node.
The third child node is used to display the selectable natural language processing attributes corresponding to the client dialog segment, so that the annotator can determine the natural language processing attributes of that segment.
A natural language processing (NLP) attribute is a multi-dimensional attribute used to characterize the natural language processing result for the audio segment, including but not limited to attributes such as "correct intention", "correct intention (multi-intention)", "misjudged intention", "correct intention (large intention)", "insufficient corpus", and "insufficient intention".
Combining the recorded audio segment heard via the first preset area with the text of the client dialog segment shown in the first child node, the annotator selects the segment's attribute in the third child node, thereby completing the natural-language-processing annotation of the client dialog segment.
The fourth child node is used to display other attributes corresponding to the client dialog segment, so that the annotator can determine those attributes.
To improve annotation efficiency and reduce the need for the annotator to check ASR and NLP attributes one by one, annotation templates can be provided in the fourth child node, for example "default single selection 1" through "default single selection 3" and "default multiple selection 1" through "default multiple selection 5"; by checking a template, the annotator completes the annotation of both the ASR attribute and the NLP attribute with a single click.
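A sketch of how such one-click templates might work; the template names and attribute values below are illustrative assumptions, not definitions from the patent:

```python
# Hypothetical templates: each bundles a full ASR + NLP attribute selection.
TEMPLATES = {
    "default single selection 1": {"asr": ["accurate"], "nlp": ["correct intention"]},
    "default multiple selection 1": {
        "asr": ["substantially accurate", "noise"],
        "nlp": ["correct intention (multi-intention)"],
    },
}

def apply_template(annotation, template_name):
    """Fill a segment's ASR and NLP attributes from one template in one click."""
    template = TEMPLATES[template_name]
    annotation["asr"] = list(template["asr"])
    annotation["nlp"] = list(template["nlp"])
    return annotation

ann = apply_template({"segment_id": 7}, "default single selection 1")
print(ann["asr"], ann["nlp"])  # -> ['accurate'] ['correct intention']
```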
Furthermore, the annotator can annotate the audio segment corresponding to the client dialog segment through the fourth child node; that is, the annotator can label the client dialog segment beyond the existing ASR and NLP attributes.
It should be noted that the heights of the first, second, third, and fourth child nodes may differ. The four child nodes are top-aligned, so the height of the page node corresponding to the customer dialog segment is the height of the tallest of the four child nodes. Accordingly, when a preset event is detected in the target page and the target page is scrolled, the scroll distance should be calculated from the height of the tallest child node.
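The height bookkeeping described above can be sketched as follows (a simplified model, not the patent's implementation):

```python
def page_node_height(child_heights):
    """Top-aligned children: a page node is as tall as its tallest child."""
    return max(child_heights)

def scroll_by_tallest_children(labeled_rows):
    """Scroll past labeled page nodes by the height of each node's tallest child."""
    return sum(page_node_height(children) for children in labeled_rows)

# Two labeled page nodes, each with four horizontally arranged children.
rows = [[40, 120, 90, 60], [30, 80, 50, 50]]
print(scroll_by_tallest_children(rows))  # -> 200
```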
Fig. 8 is a flowchart illustrating a method for labeling a telephone recording according to an embodiment of the present application. After step 670, the method further includes the following steps.
Step 810: and monitoring a second target event in a second preset area of the second child node.
Step 820: and if the second target event is monitored, folding the second child node so as to hide partial contents in the second child node.
Step 830: and monitoring a third target event in a third preset area of a third child node.
Step 840: and if the third target event is monitored, folding the third child node so as to hide partial contents in the third child node.
Because the second and third child nodes contain more content, are correspondingly taller, and occupy more of the visible display area, fold/unfold controls are provided on these two child nodes. When the second and third child nodes are folded, more page nodes fit on one screen of the display interface, improving page utilization.
It should be noted that when the second or third child node is folded, its height changes and, correspondingly, so does the height of the page node; after a preset event is detected in the target page, the scroll distance therefore needs to be calculated from the actual heights of the page nodes.
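Folding can be modeled as replacing a collapsed child's height with a small fixed header height before taking the maximum; the 24-pixel value below is an assumption for illustration:

```python
COLLAPSED_HEIGHT = 24  # assumed height of a folded child's header, in px

def effective_height(child_heights, folded):
    """Actual page-node height after folding: collapsed children shrink to a
    fixed header height, and the node is as tall as its tallest visible child."""
    heights = [COLLAPSED_HEIGHT if f else h for h, f in zip(child_heights, folded)]
    return max(heights)

# Folding the tall second and third children lets the node shrink to its
# remaining tallest child, so more page nodes fit on one screen.
print(effective_height([40, 200, 180, 60], [False, True, True, False]))  # -> 60
```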
The following describes embodiments of the telephone recording annotation apparatus of the present application, which can be used to execute the telephone recording annotation method in the above embodiments. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the telephone recording annotation method described above.
Fig. 9 is a schematic structural diagram of a telephone recording annotation apparatus according to an embodiment of the present application. As shown in Fig. 9, the apparatus includes an audio file obtaining unit 910, a speech recognition unit 920, a segmentation unit 930, a page node generating unit 940, and a display unit 950.
An audio file obtaining unit 910 is configured to obtain an audio file of a telephone recording, the telephone recording being a recording of a conversation between a robot character and a client character.
The speech recognition unit 920 is configured to perform speech recognition on the audio file to obtain a reference annotation text.
The segmentation unit 930 is configured to segment the reference annotation text according to the interaction sequence of the dialog, so as to obtain at least one robot dialog segment and at least one client dialog segment.
The page node generating unit 940 is configured to generate page nodes corresponding one to one to the robot dialog segments and the client dialog segments.
The display unit 950 is configured to arrange the page nodes longitudinally in the target page according to the interaction sequence of the conversation, display the target page in the target visible area, and obtain the annotator's labels on the robot dialog segments and the client dialog segments in the target page.
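The segmentation step performed by unit 930 might look like the following sketch (function and field names are assumptions for illustration):

```python
def split_turns(turns):
    """Split recognized dialog turns into robot and client segments while
    preserving the interaction order via the turn index."""
    robot, client = [], []
    for index, (role, text) in enumerate(turns):
        (robot if role == "robot" else client).append((index, text))
    return robot, client

turns = [
    ("robot", "Hello, am I speaking with Mr. Li?"),
    ("client", "Yes, speaking."),
    ("robot", "This call is a reminder about your appointment."),
]
robot_segments, client_segments = split_turns(turns)
print(client_segments)  # -> [(1, 'Yes, speaking.')]
```

The retained indices let the display unit interleave both kinds of page nodes back into the original conversation order.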
In some embodiments of the present application, the telephone recording annotation apparatus further comprises:
a scrolling distance acquiring unit, configured to respond to a preset event in the target page by acquiring the scrolling distance of the target page according to the height of the labeled page nodes in the current page, the preset event including a pressing event and/or a bounce event;
and a target page control unit, configured to control the target page to scroll by the scroll distance and to display the target page in the target visible area.
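Per claim 3, the scroll distance is the summed height of the first contiguous run of labeled page nodes from the top of the current page; a minimal sketch:

```python
def scroll_distance(rows):
    """rows: (labeled, height) pairs ordered from the top of the current page.
    Sum the heights of the leading contiguous run of labeled page nodes."""
    total = 0
    for labeled, height in rows:
        if not labeled:
            break  # stop at the first unlabeled node
        total += height
    return total

# The third node is unlabeled, so the page scrolls past only the first two.
print(scroll_distance([(True, 100), (True, 80), (False, 90), (True, 70)]))  # -> 180
```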
It should be noted that although the above detailed description refers to a telephone recording annotation method and to several units of a telephone recording annotation apparatus, such a division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more of the units described above may be embodied in one unit, and conversely, the features and functions of one unit may be further divided among a plurality of units. The components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement this without inventive effort.
As another aspect, the present application also provides a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present application described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 10, a program product 1000 for implementing the above method according to an embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
As another aspect, the present application further provides an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method, or program product. Accordingly, various aspects of the present application may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1100 according to this embodiment of the present application is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 11, electronic device 1100 is embodied in the form of a general purpose computing device. The components of the electronic device 1100 may include, but are not limited to: the at least one processing unit 1110, the at least one memory unit 1120, and a bus 1130 that couples various system components including the memory unit 1120 and the processing unit 1110.
Wherein the storage unit stores program code, which can be executed by the processing unit 1110, so that the processing unit 1110 performs the steps according to various exemplary embodiments of the present application described in the section "example methods" above in this specification.
The storage unit 1120 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 1121 and/or a cache memory unit 1122, and may further include a read-only memory unit (ROM) 1123.
The storage unit 1120 may also include a program/utility 1124 having a set (at least one) of program modules 1125, such program modules 1125 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1130 may be representative of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1100 may also communicate with one or more external devices 1200 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1100, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1150. Also, the electronic device 1100 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1160. As shown, the network adapter 1160 communicates with the other modules of the electronic device 1100 over the bus 1130. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for labeling a recording on a telephone, the method comprising:
acquiring an audio file of a telephone recording, wherein the telephone recording is a recording of a conversation between a robot role and a client role;
performing voice recognition on the audio file to obtain a reference labeling text;
segmenting the reference annotation text according to the interactive sequence of the conversation to obtain at least one robot dialogue segment and at least one client dialogue segment;
generating page nodes corresponding to the robot dialogue segments and the client dialogue segments one by one;
and longitudinally arranging the page nodes in a target page according to the interaction sequence of the conversation, displaying the target page in a target visual area, and acquiring the marks of a marking person on the robot dialog segment and the client dialog segment in the target page.
2. The method of claim 1, wherein the method further comprises:
responding to a preset event of a key in the target page, and acquiring the rolling distance of the target page according to the height of a marked page node in the current page corresponding to the target visual area, wherein the preset event comprises a pressing event and/or a bounce event;
and controlling the target page to sequentially scroll according to the scroll distance, and displaying the target page in the target visual area.
3. The method of claim 2, wherein the obtaining the scrolling distance of the target page according to the height of the labeled page node in the current page corresponding to the target visual area comprises:
determining a first section of continuously marked page nodes from top to bottom in the current page corresponding to the target visual area;
and taking the sum of the heights of the first section of continuous labeled page nodes as the rolling distance.
4. The method of claim 1, wherein the method further comprises:
associating the first preset area of the page node corresponding to the customer dialogue segment with an audio playing component;
and if the first target event is detected in the preset area, playing an audio clip corresponding to the client dialog clip.
5. The method of claim 4, wherein the page nodes corresponding to the customer dialog segment include a first child node, a second child node, a third child node, and a fourth child node arranged horizontally in the display interface;
the first child node is used for displaying the client dialogue segment;
the second child node is used for displaying selectable voice recognition attributes corresponding to the customer dialogue segment so that a annotating person can determine the voice recognition attributes of the customer dialogue segment;
the third child node is used for displaying selectable natural language processing attributes corresponding to the client dialog piece so that a annotator can determine the natural language processing attributes of the client dialog piece;
and the fourth child node is used for displaying other attributes corresponding to the client dialogue segment so that a annotating person can determine the attributes of the client dialogue segment.
6. The method of claim 5, wherein the method further comprises:
monitoring a second target event in a second preset area of the second child node;
if the second target event is monitored, folding the second child node to hide part of contents in the second child node;
monitoring a third target event in a third preset area of the third child node;
if the third target event is monitored, the third child node is folded, so that part of the content in the third child node is hidden.
7. A telephone recording annotation device, the device comprising:
the system comprises an audio file acquisition unit, a client role acquisition unit and a voice recognition unit, wherein the audio file acquisition unit is used for acquiring an audio file of a telephone recording, and the telephone recording is a recording of a conversation between a robot role and a client role;
the voice recognition unit is used for carrying out voice recognition on the audio file to obtain a reference marking text;
the segmentation unit is used for segmenting the reference annotation text according to the interactive sequence of the conversation to obtain at least one robot dialogue segment and at least one client dialogue segment;
the page node generating unit is used for generating page nodes which correspond to the robot dialogue segments and the client dialogue segments one by one;
and the display unit is used for longitudinally arranging the page nodes in a target page according to the interaction sequence of the conversation and displaying the target page in a target visual area so as to enable a labeling person to label the robot dialogue segment and the client dialogue segment in the target page.
8. The telephone recording annotation device of claim 7, wherein the device further comprises:
a scrolling distance acquiring unit, configured to respond to a preset event by acquiring the scrolling distance of the target page according to the height of the labeled page nodes in the current page, the preset event including a pressing event and/or a bounce event;
and a target page control unit, configured to control the target page to scroll by the scroll distance and to display the target page in the target visible area.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the method according to any of claims 1-7.
CN202111168567.6A 2021-09-29 2021-09-29 Telephone recording marking method, device, equipment and storage medium Pending CN113901186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111168567.6A CN113901186A (en) 2021-09-29 2021-09-29 Telephone recording marking method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111168567.6A CN113901186A (en) 2021-09-29 2021-09-29 Telephone recording marking method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113901186A true CN113901186A (en) 2022-01-07

Family

ID=79190134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111168567.6A Pending CN113901186A (en) 2021-09-29 2021-09-29 Telephone recording marking method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113901186A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114441029A (en) * 2022-01-20 2022-05-06 深圳壹账通科技服务有限公司 Recording noise detection method, device, equipment and medium of voice labeling system
CN115426434A (en) * 2022-08-15 2022-12-02 北京达佳互联信息技术有限公司 Data processing method, device and storage medium
CN115426434B (en) * 2022-08-15 2023-10-31 北京达佳互联信息技术有限公司 Data processing method, device and storage medium

Similar Documents

Publication Publication Date Title
US10635392B2 (en) Method and system for providing interface controls based on voice commands
RU2683174C2 (en) Ink to text representation conversion
CN106971009B (en) Voice database generation method and device, storage medium and electronic equipment
JP5685702B2 (en) Speech recognition result management apparatus and speech recognition result display method
US20190362022A1 (en) Audio file labeling process for building datasets at scale
CN104380375A (en) Device for extracting information from a dialog
CN110602516A (en) Information interaction method and device based on live video and electronic equipment
US20080077869A1 (en) Conference supporting apparatus, method, and computer program product
CN113901186A (en) Telephone recording marking method, device, equipment and storage medium
TW201510774A (en) Apparatus and method for selecting a control object by voice recognition
US20220147835A1 (en) Knowledge graph construction system and knowledge graph construction method
US10331304B2 (en) Techniques to automatically generate bookmarks for media files
WO2019045816A1 (en) Graphical data selection and presentation of digital content
CN114023301A (en) Audio editing method, electronic device and storage medium
US20220269724A1 (en) Audio playing method, electronic device, and storage medium
US10366149B2 (en) Multimedia presentation authoring tools
Knight et al. HeadTalk, HandTalk and the corpus: Towards a framework for multi-modal, multi-media corpus development
CN106873798B (en) Method and apparatus for outputting information
CN110992958B (en) Content recording method, content recording apparatus, electronic device, and storage medium
CN112230811A (en) Input method, device, equipment and storage medium
CN114047900A (en) Service processing method and device, electronic equipment and computer readable storage medium
CN111914115A (en) Sound information processing method and device and electronic equipment
EP4303716A1 (en) Method for generating data input, data input system and computer program
CN114462364B (en) Method and device for inputting information
JP7166370B2 (en) Methods, systems, and computer readable recording media for improving speech recognition rates for audio recordings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination