CN111859006A

CN111859006A - Method, system, electronic device and storage medium for establishing voice entry tree

Info

Publication number: CN111859006A
Application number: CN201910307534.1A
Authority: CN
Inventors: 马颐中; 姜旭平
Original assignee: Shanghai Powerplus 1+1 Network & Technology Co ltd
Current assignee: Shanghai Powerplus 1+1 Network & Technology Co ltd
Priority date: 2019-04-17
Filing date: 2019-04-17
Publication date: 2020-10-30

Abstract

The invention provides a method, a device, an electronic apparatus and a storage medium for establishing a voice entry tree. The method for establishing the voice entry tree comprises the following steps: establishing a conference group in response to a conference establishing request; triggering establishment of a root entry of a voice entry tree in response to a first voice input, wherein the root entry at least comprises voice data of the first voice input; triggering establishment of a descendant entry of the root entry in response to a second voice input, wherein the descendant entry at least comprises voice data of the second voice input; and generating one or more voice entry trees for the conference group based on the root entry and the descendant entries. The method and the device provided by the invention enable voice entry trees to be established during a conference.

Description

Method, system, electronic device and storage medium for establishing voice entry tree

Technical Field

The present invention relates to the field of computer applications, and in particular, to a method, a system, an electronic device, and a storage medium for building a voice entry tree.

Background

Currently, in conferences of all kinds, note-takers are required to take notes or make audio recordings in order to produce a conference report. However, whether the note-taker writes notes manually or records audio, the conference content is not organized, so the resulting report is not only poorly readable but also makes it difficult to see the topic structure and key points of the conference at a glance. Even if the note-taker collates the notes and recordings afterwards, considerable time and labor are required.

Therefore, the prior art offers no scheme for efficiently, automatically and intuitively generating a conference report during the conference, whether the conference is held on site, by telephone or by video.

Disclosure of Invention

The present invention aims to provide a method, an apparatus, an electronic device and a storage medium for building a voice entry tree, so as to overcome, at least to some extent, one or more problems caused by the limitations and disadvantages of the related art.

According to an aspect of the present invention, there is provided a method of building a voice entry tree, comprising:

establishing a conference group in response to a conference establishing request;

triggering establishment of a root entry of a voice entry tree in response to a first voice input, wherein the root entry at least comprises voice data of the first voice input;

triggering establishment of a descendant entry of the root entry in response to a second voice input, wherein the descendant entry at least comprises voice data of the second voice input;

and generating one or more voice entry trees for the conference group based on the root entry and the descendant entries.

Optionally, triggering establishment of the root entry of the voice entry tree in response to the first voice input comprises:

triggering collection of the first voice in response to an operation selecting establishment of a root entry;

and triggering establishment of the root entry of the voice entry tree in response to completion of the collection of the first voice.

Optionally, triggering establishment of the descendant entry of the root entry in response to the second voice input comprises:

triggering collection of the second voice in response to an operation selecting establishment of a descendant entry, wherein the operation indicates that the descendant entry being established is a child of an already established root entry or an already established descendant entry;

and triggering establishment of the descendant entry of the root entry in response to completion of the collection of the second voice.

Optionally, triggering establishment of the root entry of the voice entry tree in response to the first voice input comprises:

collecting the first voice;

identifying, according to voiceprint features of the first voice, whether the speaking object of the first voice belongs to a first object set, wherein the first object set comprises one or more first objects;

and if so, triggering establishment of the root entry of the voice entry tree.

Optionally, the first object set is preset; or

the first object set is updated according to the contribution degree of each object to one or more voice entry trees of the conference group, the contribution degree being calculated according to the positions, in the voice entry trees, of the entries established by the object.

Optionally, triggering establishment of the descendant entry of the root entry in response to the second voice input comprises:

collecting the second voice;

recognizing second text data of the second voice;

judging whether the second text data hits a keyword set, wherein the keyword set comprises at least one keyword;

if so, determining the entry associated with the hit keyword as the parent entry of the collected second voice;

establishing a descendant entry for the collected second voice under the determined parent entry;

and extracting a keyword from the second text data of the second voice, associating the keyword with the descendant entry, and adding the keyword to the keyword set.

Optionally, the keywords associated with the entries of the same voice entry tree are different from one another.

Optionally, determining the entry associated with the hit keyword as the parent entry of the collected second voice comprises:

if the hit keyword is associated with a plurality of entries, determining, among the descendant entries of the plurality of entries, the entry whose establishment time is closest to the time at which the second voice is collected;

and determining, among the plurality of entries, the ancestor of the determined entry as the parent entry of the collected second voice.
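The tie-break rule in the two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `Entry`, `created_at` and `choose_parent` are hypothetical names, creation times are assumed to be seconds since the conference began, and each candidate is counted as part of its own subtree so that a candidate without descendants can still be chosen (the text itself mentions only descendant entries).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(eq=False)
class Entry:
    """Hypothetical entry node; created_at is seconds since the conference began."""
    name: str
    created_at: float
    children: list[Entry] = field(default_factory=list)

    def subtree(self) -> Iterator[Entry]:
        """Yield this entry and then all of its descendants (depth-first)."""
        yield self
        for child in self.children:
            yield from child.subtree()


def choose_parent(candidates: list[Entry], voice_time: float) -> Entry:
    """Choose a parent among entries associated with the same hit keyword.

    For each candidate, find the entry in its subtree (assumption: the
    candidate itself included) whose creation time is closest to the time
    the new voice was collected; the candidate owning the overall closest
    entry is returned as the parent.
    """
    def closest_gap(candidate: Entry) -> float:
        return min(abs(voice_time - e.created_at) for e in candidate.subtree())

    return min(candidates, key=closest_gap)


# Two entries in different trees share the keyword "budget".
p1 = Entry("budget (tree 1)", 10.0, [Entry("reply", 20.0)])
p2 = Entry("budget (tree 2)", 100.0, [Entry("reply", 300.0)])
print(choose_parent([p1, p2], 310.0).name)  # budget (tree 2)
```

The effect is that a new voice attaches to whichever discussion thread was most recently active, which is one plausible reading of "closest establishment time".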

Optionally, triggering establishment of the root entry of the voice entry tree in response to the first voice input comprises:

collecting the first voice;

recognizing first text data of the first voice;

judging whether the first text data hits the keyword set;

if not, triggering establishment of a root entry of the voice entry tree based on the voice data of the first voice;

and extracting a keyword from the first text data of the first voice, associating the keyword with the root entry, and adding the keyword to the keyword set.

Optionally, at least some of the entries are displayed as their associated keyword together with an icon that, when operated, plays the voice data corresponding to the entry.

Optionally, entries whose voice data have the same voiceprint features are displayed in the same form;

and entries whose voice data have different voiceprint features are displayed in different forms.
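One way to satisfy the two display rules above is to give every distinct voiceprint its own display-form index and reuse it for all entries carrying that voiceprint. The sketch assumes voiceprint recognition has already produced a speaker label per entry; `assign_display_forms` and the integer form indices (which an interface might map to colors or avatars) are hypothetical.

```python
from __future__ import annotations


def assign_display_forms(entry_speakers: list[tuple[int, str]]) -> dict[int, int]:
    """Map entry id -> display-form index.

    Entries whose voice data carry the same voiceprint (speaker label) share
    one index; each new voiceprint gets the next unused index, so different
    voiceprints never share a form.
    """
    form_of_speaker: dict[str, int] = {}
    forms: dict[int, int] = {}
    for entry_id, speaker in entry_speakers:
        if speaker not in form_of_speaker:
            form_of_speaker[speaker] = len(form_of_speaker)
        forms[entry_id] = form_of_speaker[speaker]
    return forms


print(assign_display_forms([(212, "voiceprint-1"), (222, "voiceprint-2"), (232, "voiceprint-1")]))
# {212: 0, 222: 1, 232: 0}
```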

Optionally, after generating the one or more voice entry trees for the conference group based on the root entry and the descendant entries, the method further comprises:

pushing a plurality of voice entry trees to each object of the conference group, wherein the pushed voice entry trees are displayed in descending order of the contribution degree of the receiving object to each of the voice entry trees.
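The per-recipient ordering can be sketched as a stable descending sort. The sketch assumes contribution degrees have already been computed per (object, tree) pair; the names `push_order` and `scores` and the tie-handling are illustrative assumptions.

```python
from __future__ import annotations


def push_order(tree_ids: list[str], contribution: dict[tuple[str, str], int], obj: str) -> list[str]:
    """Return tree_ids sorted for `obj`, largest contribution of `obj` first.

    Trees the object did not contribute to count as 0; Python's sort is
    stable even with reverse=True, so ties keep their original order.
    """
    return sorted(tree_ids, key=lambda t: contribution.get((obj, t), 0), reverse=True)


scores = {("alice", "T2"): 18, ("alice", "T1"): 5, ("bob", "T3"): 8}
for member in ("alice", "bob", "carol"):
    print(member, push_order(["T1", "T2", "T3"], scores, member))
# alice ['T2', 'T1', 'T3']
# bob ['T3', 'T1', 'T2']
# carol ['T1', 'T2', 'T3']
```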

According to another aspect of the present invention, there is also provided an apparatus for building a voice entry tree, including:

a group establishing module, configured to establish a conference group in response to a conference establishing request;

a root entry establishing module, configured to trigger establishment of a root entry of a voice entry tree in response to a first voice input, where the root entry at least includes voice data of the first voice input;

a descendant entry establishing module, configured to trigger establishment of a descendant entry of the root entry in response to a second voice input, where the descendant entry at least includes voice data of the second voice input;

and a generating module, configured to generate one or more voice entry trees of the conference group based on the root entry and the descendant entries.

According to still another aspect of the present invention, there is also provided an electronic device, including: a processor; and a storage medium storing a computer program which, when executed by the processor, performs the steps described above.

According to yet another aspect of the present invention, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.

Compared with the prior art, the invention has the following advantages:

First, the parent-child relationship of each entry is determined as the entry is established, forming the voice entry trees of the conference, each of which can correspond to one discussion topic. The voice entry trees can therefore be provided to the participants as the conference record; this record requires no manual note-taking or collation and is generated automatically during the conference, which reduces labor cost. Second, the voice entry tree form presents the structure and content of the conference to participants intuitively and is more readable. Third, because each entry of the generated conference record contains voice data that can be played back, the specific content of the conference can be retrieved clearly, traced back, and corrected.

Drawings

The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.

Fig. 1 shows a flowchart of a method of building a voice entry tree according to an embodiment of the present invention.

Figs. 2 to 10 are schematic diagrams illustrating the process of building a voice entry tree according to an embodiment of the present invention.

Fig. 11 is a block diagram illustrating an apparatus for building a voice entry tree according to an embodiment of the present invention.

Fig. 12 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the invention.

Fig. 13 schematically illustrates an electronic device in an exemplary embodiment of the invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

Fig. 1 shows a flowchart of a method of building a voice entry tree according to an embodiment of the present invention. The method comprises the following steps:

step S110: establishing a conference group in response to a conference establishing request;

step S120: triggering establishment of a root entry of a voice entry tree in response to a first voice input, wherein the root entry at least comprises voice data of the first voice input;

step S130: triggering establishment of a descendant entry of the root entry in response to a second voice input, wherein the descendant entry at least comprises voice data of the second voice input;

step S140: generating one or more voice entry trees for the conference group based on the root entry and the descendant entries.

In the method for building a voice entry tree provided by the invention, the parent-child relationship of each entry is determined as the entry is established, so the voice entry trees of the conference are formed, each corresponding to one discussion topic. The trees serve as a conference record that is generated automatically during the conference rather than recorded and collated manually, which reduces labor cost; they present the structure and content of the conference intuitively and readably; and because each entry contains voice data that can be played back, the specific content of the conference can be retrieved, traced back, and corrected.

In various embodiments of the present invention, the voice entry tree is a tree data structure in which each entry is a node, so the voice entry tree has the same properties as a tree data structure.
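As a minimal sketch of such a node, assuming Python dataclasses: the field names `label`, `voice_data`, `speaker`, `parent` and `children` are illustrative assumptions rather than a schema given in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: entries compare by identity, avoiding recursive comparisons
class Entry:
    """One node of a voice entry tree (field names are illustrative)."""
    label: str                                   # text shown in the interface
    voice_data: bytes = b""                      # audio captured for this entry
    speaker: Optional[str] = None                # speaking object, if identified
    parent: Optional[Entry] = field(default=None, repr=False)
    children: list[Entry] = field(default_factory=list, repr=False)

    def add_child(self, child: Entry) -> Entry:
        """Attach `child` below this entry and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def is_root(self) -> bool:
        return self.parent is None

    def depth(self) -> int:
        """0 for a root entry, 1 for its children, 2 for grandchildren, ..."""
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d

    def descendants(self) -> list[Entry]:
        """All entries below this one, in depth-first order."""
        out: list[Entry] = []
        for child in self.children:
            out.append(child)
            out.extend(child.descendants())
        return out


root = Entry("First issue")
a = root.add_child(Entry("Utterance A"))
b = a.add_child(Entry("Utterance B"))
print(b.depth(), [e.label for e in root.descendants()])  # 2 ['Utterance A', 'Utterance B']
```

Keeping a `parent` back-reference makes ancestor lookups (used for depth-based contribution degrees, for example) cheap, at the cost of having to disable the generated `__eq__` so that comparisons do not recurse through the tree.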

It should be noted that steps S110 to S140 only illustrate the order of the steps schematically, and the present invention is not limited thereto. For example, steps S120 and S130 may be performed multiple times before step S140 to generate multiple voice entry trees.

Specifically, the conference may be an on-site conference (in which participants communicate face to face), a teleconference, a video conference, or the like; the present invention is not limited in this respect.

In an embodiment of the present invention, step S120 of triggering establishment of the root entry of the voice entry tree in response to the first voice input may include: triggering collection of the first voice in response to an operation selecting establishment of a root entry; and triggering establishment of the root entry of the voice entry tree in response to completion of the collection of the first voice. Step S130 of triggering establishment of a descendant entry of the root entry in response to the second voice input may include: triggering collection of the second voice in response to an operation selecting establishment of a descendant entry, wherein the operation indicates that the descendant entry being established is a child of an already established root entry or descendant entry; and triggering establishment of the descendant entry of the root entry in response to completion of the collection of the second voice.

The above embodiments are described with reference to Figs. 2 to 10. After the conference group is established, the participants in the conference group are provided with the interface shown in Fig. 2, which provides a prompt 210 for establishing a root entry and an icon 211 for establishing the root entry. When any participant in the conference group operates the icon 211 by clicking or another operation, collection of the first voice is triggered. When collection of the first voice is complete, establishment of the root entry 212 of the voice entry tree is triggered. In some embodiments, the start and end of collection of the first voice are determined by the user pressing and holding icon 211 (pressing icon 211 starts collection of the first voice, and releasing icon 211 completes it). In other embodiments, the system analyzes the pauses between sentences of the first voice and judges that collection of the first voice is complete when a pause exceeds a predetermined threshold. In still other embodiments, a first voice ending instruction is preset; after finishing speaking, the participant speaks the first voice ending instruction, and the system ends collection of the first voice when it recognizes that instruction. In still other embodiments, based on voiceprint recognition, collection of the first voice is terminated when a change in the voiceprint features of the voice data indicates a change of speaker (in some embodiments, a change of speaker may trigger collection of another second or first voice). The present invention may also be implemented in many other ways, which are not described here. In the present embodiment, the created root entry 212 is displayed as the first issue, but the invention is not limited thereto.

The display form of the first issue may be preset; for example, root entries may be preset to be displayed as the first issue, the second issue, the third issue, and so on, according to the order in which they are established. In some variations, the conference agenda is predetermined, and the display of the root entries may be determined from it. For example, the conference agenda includes a leader speech, a guest speech, an interactive Q&A session, and the like. The root entries may then be matched to the agenda items in the order in which they are established, so that they are displayed as leader speech, guest speech, or interactive Q&A. Alternatively, the text data obtained by recognizing the first voice is matched against the agenda items, and when an item is matched, the root entry is displayed as that item. In still other variations, the root entry may be displayed as all or part of the text data obtained by recognizing the first voice. When part of the text data is displayed, only the first N words of the text data (N being an integer greater than or equal to 2), only the first sentence of the text data, or only the entity word that occurs most frequently in the first voice may be displayed (words without entity meaning, such as conjunctions, are excluded; this step may be implemented by various word segmentation and part-of-speech analysis algorithms, which do not limit the present invention).
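Of the capture-end options described above, the pause-threshold rule is the easiest to illustrate. The sketch below is a simplification under stated assumptions: the audio is treated as a sequence of per-frame energies with a fixed frame length, a hypothetical `energy_threshold` stands in for real voice activity detection, and collection ends once a pause of at least `max_pause_ms` follows speech. The press-and-hold, spoken ending instruction and voiceprint-change variants are not shown.

```python
from __future__ import annotations

from typing import Optional, Sequence


def find_capture_end(
    frame_energies: Sequence[float],
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
    max_pause_ms: int = 800,
) -> Optional[int]:
    """Return the index of the first frame of the ending pause, or None.

    A frame counts as speech when its energy reaches `energy_threshold`
    (a stand-in for real voice activity detection). Collection is judged
    complete once a run of silent frames lasting at least `max_pause_ms`
    follows some speech; leading silence before any speech is ignored.
    """
    frames_needed = max(1, max_pause_ms // frame_ms)
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i - silent_run + 1
    return None  # pause not yet long enough: keep collecting


# 20 ms frames and a 60 ms pause threshold: three silent frames end the capture.
energies = [0.0, 0.0, 0.5, 0.4, 0.0, 0.0, 0.0, 0.6]
print(find_capture_end(energies, max_pause_ms=60))  # 4
```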

In the above embodiment, the root entry 212 includes the voice data of the first voice, and an icon 213 for that voice data is displayed at the root entry 212; when the user operates the icon 213, the voice data of the first voice of the root entry 212 is played.

After the root entry 212 is created, the interface shown in Fig. 3 may be displayed, providing an option 221 for creating sub-entries of the root entry 212. When any participant in the conference group operates the icon 221 by clicking or another operation, collection of the second voice is triggered. When collection of the second voice is complete, establishment of a sub-entry 222 in a parent-child relationship with the root entry 212 is triggered. Every entry created by operating icon 221 is a child of root entry 212.

In some embodiments, the start and end of collection of the second voice are determined by the user pressing and holding icon 221 (pressing icon 221 starts collection of the second voice, and releasing icon 221 completes it). In other embodiments, the system analyzes the pauses between sentences of the second voice and judges that collection of the second voice is complete when a pause exceeds a predetermined threshold. In still other embodiments, a second voice ending instruction is preset; after finishing speaking, the participant speaks the second voice ending instruction, and the system ends collection of the second voice when it recognizes that instruction. In still other embodiments, based on voiceprint recognition, collection of the second voice is terminated when a change in the voiceprint features of the voice data indicates a change of speaker (in some embodiments, a change of speaker may trigger collection of another second or first voice). The present invention may also be implemented in many other ways, which are not described here. In the present embodiment, the sub-entry 222 of the root entry 212 is displayed as utterance A, but the present invention is not limited thereto.

The display form of utterance A may be preset; for example, sub-entries may be preset to be displayed as utterance A, utterance B, utterance C, and so on, according to the order in which they are established. In some variations, the conference agenda is predetermined, and the display of the sub-entries may be determined from it. For example, the conference agenda includes a leader speech, a guest speech, an interactive Q&A session, and the like. Under the leader speech item, given a predetermined speaking order of the leaders, the sub-entries may be matched to the leaders in the order in which they are established (so that the sub-entries are displayed as leader A speech, leader B speech, and so on). Alternatively, the text data obtained by recognizing the second voice is matched against the agenda items under the corresponding root entry, and when an item is matched, the sub-entry is displayed as that item. In still other variations, the sub-entry may be displayed as all or part of the text data obtained by recognizing the second voice. When part of the text data is displayed, only the first N words of the text data (N being an integer greater than or equal to 2), only the first sentence of the text data, or only the entity word that occurs most frequently in the second voice may be displayed (words without entity meaning, such as conjunctions, are excluded; this step may be implemented by various word segmentation and part-of-speech analysis algorithms, which do not limit the present invention).
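The partial-text display options above (first N words, first sentence, most frequent content word) and agenda matching can be sketched as follows. These are simplifications under stated assumptions: English regex tokenization and a tiny hypothetical `FUNCTION_WORDS` list stand in for the word segmentation and part-of-speech analysis the text refers to (Chinese speech would need a proper segmenter), and agenda matching is reduced to a case-insensitive substring test.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Optional

# Hypothetical stop-word list standing in for part-of-speech filtering.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in", "on",
                  "is", "are", "we", "i", "it", "that", "this", "for", "with"}


def first_n_words(text: str, n: int = 2) -> str:
    """First N words of the recognized text (N >= 2, as in the description)."""
    if n < 2:
        raise ValueError("N must be an integer >= 2")
    return " ".join(text.split()[:n])


def first_sentence(text: str) -> str:
    """Text up to and including the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def most_frequent_content_word(text: str) -> str:
    """Most frequent word that is not a function word; ties go to the earliest."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in FUNCTION_WORDS]
    return Counter(words).most_common(1)[0][0] if words else ""


def agenda_label(text: str, agenda_items: list[str]) -> Optional[str]:
    """First agenda item mentioned in the text, or None if nothing matches."""
    lowered = text.lower()
    return next((item for item in agenda_items if item.lower() in lowered), None)


text = "The budget is tight. We need budget cuts and new hires."
print(first_n_words(text, 3))            # The budget is
print(first_sentence(text))              # The budget is tight.
print(most_frequent_content_word(text))  # budget
print(agenda_label("Now the guest speech begins", ["Leader speech", "Guest speech"]))  # Guest speech
```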

In the above embodiment, the sub-entry 222 includes the voice data of the second voice, and an icon 223 for that voice data is displayed at the sub-entry 222; when the user operates the icon 223, the voice data of the second voice of the sub-entry 222 is played.

The interface shown in Fig. 3 also provides a prompt 310 for creating another entry and an icon 311 for creating a root entry. When any participant in the conference group operates the icon 311 by clicking or another operation at any time during the conference, collection of the first voice is triggered. When collection of the first voice is complete, establishment of the root entry 312 of another voice entry tree is triggered.

After the child entry 222 of the root entry 212 is created, the interface shown in Fig. 4 may be displayed, which provides an option 221 for creating further child entries of the root entry 212 (i.e., siblings of the child entry 222). When any participant in the conference group operates the icon 221 by clicking or another operation, collection of the second voice is triggered. When collection of the second voice is complete, establishment of another sub-entry 222 in a parent-child relationship with the root entry 212 is triggered; every entry created by operating icon 221 is a child of root entry 212. The interface shown in Fig. 4 also provides an option 231 for creating a child entry of the child entry 222 (i.e., a grandchild entry of the root entry 212). When any participant in the conference group operates the icon 231 by clicking or another operation, collection of the second voice is triggered. When collection of the second voice is complete, establishment of a sub-entry in a parent-child relationship with the sub-entry 222 is triggered; every entry created by operating icon 231 is a sub-entry of sub-entry 222.

The interface shown in Fig. 4 also provides a prompt 310 for creating another entry and an icon 311 for creating a root entry. When any participant in the conference group operates the icon 311 by clicking or another operation at any time during the conference, collection of the first voice is triggered. When collection of the first voice is complete, establishment of the root entry 312 of another voice entry tree is triggered.

When a participant operates the icon 231 shown in Fig. 4 to establish a sub-entry 232 of the sub-entry 222 in a manner similar to the sub-entry 222 (the entry 232 has a corresponding icon 233, which is operated to play the voice data of the second voice of the sub-entry 232), the interface shown in Fig. 5 is provided. On the basis of Fig. 4, this interface additionally displays the sub-entry 232 and an icon 241 for establishing sub-entries of the sub-entry 232.

When a participant operates the icon 231 in the interface shown in Fig. 5 and, in a manner similar to the sub-entry 222, establishes another sub-entry 232 of the sub-entry 222, corresponding to utterance C (this entry 232 has a corresponding icon 233, which is operated to play the voice data of the second voice of the sub-entry 232 corresponding to utterance C), the interface shown in Fig. 6 is provided. On the basis of Fig. 5, this interface additionally provides the sub-entry 232 corresponding to utterance C and an icon 241 for creating sub-entries of that sub-entry 232.

When a participant operates the icon 221 in the interface shown in Fig. 6 to establish, in a manner similar to the sub-entry 222, another sub-entry 222 of the root entry 212 corresponding to utterance D (this entry 222 has a corresponding icon 223, which is operated to play the voice data of the second voice of the sub-entry 222 corresponding to utterance D), and then establishes an entry 232 for utterance E (a sub-entry of utterance D) in a manner similar to the entry 232 of utterance B, the interface shown in Fig. 7 is provided. On the basis of Fig. 6, this interface additionally provides the sub-entry 222 corresponding to utterance D, the sub-entry 232 corresponding to utterance E, an icon 231 for establishing sub-entries of the sub-entry 222 of utterance D, and an icon 241 for establishing sub-entries of the sub-entry 232 of utterance E.

In a similar manner, a sub-entry 322 of the root entry 312 and a sub-entry 332 of the sub-entry 322 may be established.

Icons 313, 323 and 333 for playing the voice data of each entry, and icons 321, 331 and 341 for creating sub-entries of each entry, are provided in the interface, as shown in Figs. 8 to 10.
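To make the walkthrough of Figs. 2 to 7 concrete, the sketch below rebuilds the resulting tree and prints it as an indented outline. It is illustrative only: the `Entry` class and `outline` function are hypothetical, the bracketed numbers are the reference numerals from the figures (reused for siblings at the same level), and the label of the first entry 232 (utterance B) is inferred from the phrase "the entry 232 of utterance B".

```python
from __future__ import annotations


class Entry:
    """Minimal tree node; `ref` is the reference numeral used in the figures."""

    def __init__(self, label: str, ref: int):
        self.label, self.ref = label, ref
        self.children: list[Entry] = []

    def add(self, label: str, ref: int) -> Entry:
        """Create a child entry, attach it, and return it."""
        child = Entry(label, ref)
        self.children.append(child)
        return child


def outline(entry: Entry, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, two spaces per level."""
    lines = ["  " * depth + f"{entry.label} [{entry.ref}]"]
    for child in entry.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Figs. 2 to 7: one root entry with two child entries, each with its own children.
root = Entry("First issue", 212)
a = root.add("Utterance A", 222)
a.add("Utterance B", 232)
a.add("Utterance C", 232)
d = root.add("Utterance D", 222)
d.add("Utterance E", 232)

print("\n".join(outline(root)))
# First issue [212]
#   Utterance A [222]
#     Utterance B [232]
#     Utterance C [232]
#   Utterance D [222]
#     Utterance E [232]
```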

The above is merely an exemplary description of a specific implementation of the present invention, and the interfaces shown in Figs. 2 to 10 are likewise merely exemplary. The interfaces provided by the present invention are not limited thereto; additions, omissions, and changes in the shape, size, or layout of interface elements fall within the scope of protection of the present invention as long as they do not depart from its concept.

In a specific implementation of the foregoing embodiment, step S120 of triggering establishment of the root entry of the voice entry tree in response to the first voice input further includes: collecting the first voice; identifying, according to voiceprint features of the first voice, whether the speaking object of the first voice belongs to a first object set, wherein the first object set comprises one or more first objects; and if so, triggering establishment of the root entry of the voice entry tree. In this implementation, voiceprint recognition is used to assign permissions for establishing entries. To avoid confusion when several people in the conference speak, a root entry can be established only when the speaker of the first voice belongs to the first object set; controlling the permission to establish root entries thus supports management of the conference and facilitates building the voice entry trees. The first object set may be preset. For example, the moderator of the conference may be a first object in the first object set and be identified by voiceprint recognition, so that root entries are established by the moderator. In some variations, the first object set is updated according to the contribution degree of each object to one or more voice entry trees of the conference group. In this variation, the one or more objects with the highest contribution degree during the conference may be added to the first object set, so that the objects contributing most to the conference manage it. Specifically, the contribution degree is calculated according to the positions, in the voice entry trees, of the entries established by the object. For example, the contribution degree of an established root entry is preset to 10 and decreases by one per level of the voice entry tree (a child entry of the root entry has a contribution degree of 9, a grandchild entry 8, and so on), and the sum of the contribution degrees of all the entries established by an object is taken as that object's contribution degree. As another example, the number of descendant entries of an entry is used as the entry's contribution degree, and the sum of the contribution degrees of all the entries established by an object is taken as that object's contribution degree. The present invention may also use other ways of calculating the contribution degree, which are not described here.
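The two example contribution-degree schemes, and the selection of top contributors for the first object set, can be sketched as follows. The names `Entry`, `contribution_by_level`, `contribution_by_descendants` and `top_contributors` are hypothetical, and clamping the level-based score at 0 for trees deeper than ten levels is an added assumption not stated in the text.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterator, Optional


class Entry:
    """Hypothetical entry node that records which object created it."""

    def __init__(self, creator: str, parent: Optional[Entry] = None):
        self.creator = creator
        self.children: list[Entry] = []
        self.level = 0 if parent is None else parent.level + 1  # 0 = root entry
        if parent is not None:
            parent.children.append(self)

    def walk(self) -> Iterator[Entry]:
        """This entry followed by all its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


def contribution_by_level(roots: list[Entry], root_score: int = 10) -> dict[str, int]:
    """Example 1: root entry 10, child 9, grandchild 8, ...; summed per creator.

    Clamping at 0 for very deep levels is an assumption, not from the text.
    """
    totals: dict[str, int] = defaultdict(int)
    for root in roots:
        for e in root.walk():
            totals[e.creator] += max(root_score - e.level, 0)
    return dict(totals)


def contribution_by_descendants(roots: list[Entry]) -> dict[str, int]:
    """Example 2: each entry scores its number of descendant entries; summed per creator."""
    totals: dict[str, int] = defaultdict(int)
    for root in roots:
        for e in root.walk():
            totals[e.creator] += sum(1 for _ in e.walk()) - 1
    return dict(totals)


def top_contributors(scores: dict[str, int], k: int = 1) -> list[str]:
    """The k objects with the highest contribution degree (candidates for the first object set)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


root = Entry("host")
reply = Entry("alice", root)
Entry("bob", reply)
Entry("alice", root)
by_level = contribution_by_level([root])
print(by_level)                             # {'host': 10, 'alice': 18, 'bob': 8}
print(contribution_by_descendants([root]))  # {'host': 3, 'alice': 1, 'bob': 0}
print(top_contributors(by_level))           # ['alice']
```

The two schemes reward different behavior: the level-based one favors prolific speakers, while the descendant-count one favors whoever opened the threads that drew the most discussion.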

In some specific implementations of the foregoing embodiments, a permission is required only for establishing root entries, and none is required for establishing descendant entries, which reduces the computational load on the system. In other specific implementations, different permissions may be defined for different levels of the voice entry tree and associated with different voiceprints, enabling more effective conference management and construction of the voice entry tree; the present invention is not limited in this respect.
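A voiceprint-based permission check covering both variants above might look like the sketch below. It rests on assumptions not in the text: voiceprints are fixed-length embedding vectors produced by some speaker-recognition model, they are compared against enrolled vectors by cosine similarity with a hypothetical threshold of 0.8, and `may_create_entry` with its `allowed_by_level` mapping is an illustrative name.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two voiceprint vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def identify_speaker(
    embedding: Sequence[float], enrolled: dict[str, Sequence[float]], threshold: float = 0.8
) -> Optional[str]:
    """Enrolled object whose voiceprint best matches, or None below the threshold."""
    best, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score >= best_score:
            best, best_score = name, score
    return best


def may_create_entry(
    embedding: Sequence[float],
    level: int,
    enrolled: dict[str, Sequence[float]],
    allowed_by_level: dict[int, set[str]],
) -> bool:
    """Whether the current speaker may create an entry at `level` (0 = root entry).

    Levels absent from `allowed_by_level` are unrestricted, which matches the
    variant where only root-entry creation requires permission.
    """
    if level not in allowed_by_level:
        return True
    speaker = identify_speaker(embedding, enrolled)
    return speaker is not None and speaker in allowed_by_level[level]


enrolled = {"moderator": [1.0, 0.0, 0.0], "guest": [0.0, 1.0, 0.0]}
allowed = {0: {"moderator"}}  # only the first object set may create root entries
print(may_create_entry([0.9, 0.1, 0.0], 0, enrolled, allowed))  # True
print(may_create_entry([0.1, 0.9, 0.0], 0, enrolled, allowed))  # False
print(may_create_entry([0.1, 0.9, 0.0], 1, enrolled, allowed))  # True
```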

In a specific implementation of the foregoing embodiment, step S130 (triggering establishment of a descendant entry of the root entry in response to the second voice input) may include the following steps: collecting the second voice; recognizing second text data of the second voice; judging whether the second text data hits a keyword set, where the keyword set includes at least one keyword; if so, determining the entry associated with the hit keyword as the parent entry of the collected second voice; establishing a descendant entry for the collected second voice under the determined parent entry; and extracting a keyword from the second text data, associating it with the descendant entry, and adding it to the keyword set. Correspondingly, step S120 (triggering establishment of the root entry of the speech entry tree in response to the first voice input) may include the following steps: collecting the first voice; recognizing first text data of the first voice; judging whether the first text data hits the keyword set; if not, triggering establishment of a root entry of the speech entry tree based on the voice data of the first voice; and extracting a keyword from the first text data, associating it with the root entry, and adding it to the keyword set.
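The keyword-driven choice between creating a root entry and creating a descendant entry can be sketched as follows. Several details are assumptions not stated in the patent: a keyword "hits" when it appears as a substring of the recognized text; if several keywords hit, the one occurring earliest in the text is used; and if a keyword is associated with several entries, the most recently created one is taken as the parent (the patent's own time-based rule for that case appears later in this section). Keyword extraction is left to the caller through the `new_keyword` argument. All class and method names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Entry:
    voice: bytes   # voice data of the utterance
    keyword: str   # keyword associated with this entry
    parent: Optional[Entry] = field(default=None, repr=False)
    children: list[Entry] = field(default_factory=list)


class EntryForest:
    """Speech entry trees of one conference group plus its keyword set."""

    def __init__(self) -> None:
        self.roots: list[Entry] = []
        self.keyword_set: dict[str, list[Entry]] = {}  # keyword -> associated entries

    def _hit(self, text: str) -> Optional[str]:
        hits = [kw for kw in self.keyword_set if kw in text]
        return min(hits, key=text.find) if hits else None

    def add_utterance(self, text: str, voice: bytes, new_keyword: str) -> Entry:
        hit = self._hit(text)
        if hit is None:
            # No keyword hit: start a new speech entry tree.
            entry = Entry(voice, new_keyword)
            self.roots.append(entry)
        else:
            # Keyword hit: attach under the most recent entry with that keyword.
            parent = self.keyword_set[hit][-1]
            entry = Entry(voice, new_keyword, parent)
            parent.children.append(entry)
        self.keyword_set.setdefault(new_keyword, []).append(entry)
        return entry
```

Because the keyword set starts empty, the first utterance of a conference always becomes a root entry, matching the walkthrough of figs. 2 to 10.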

Specifically, in the above implementation, entries can be created without providing an icon for creating an entry and without manually choosing where the entry is created. With continued reference to figs. 2 to 10, when the conference begins, a first voice is collected; since the keyword set is still empty, the root entry 212 is created from the first voice. First text data of the first voice is recognized, and a keyword is extracted from it. The keyword may be, for example, a preset display title (such as "First Topic" in fig. 3); the first N words of the first text data; the entity word that occurs most frequently in the first text data (obtained through word segmentation and part-of-speech analysis); or a keyword computed by a semantic analysis algorithm. The present invention is not limited in this respect. The keyword is preferably displayed together with the root entry 212 so that every participant can see it at a glance. The extracted keyword (e.g., "First Topic") is associated with the root entry 212 and added to the keyword set. Next, a second voice is collected and its second text data is recognized; when the second text data contains "First Topic" from the keyword set, a child entry 222 of the root entry corresponding to "First Topic" is established. A keyword (for example, "Utterance A") is extracted from the second text data in the same way as for the root entry 212, associated with the entry 222, and added to the keyword set. Then another second voice is collected and its second text data is recognized; when that text data contains "Utterance A" from the keyword set, a child entry 232 of the entry 222 corresponding to "Utterance A" is established.
A keyword (for example, "Utterance B") is extracted from that text data in the same way as for the child entry 222, associated with the entry 232, and added to the keyword set. By analogy, whenever the recognized text data contains a keyword in the keyword set, a child entry of the entry corresponding to that keyword is established; when the recognized text data contains no keyword in the keyword set, the root entry of another speech entry tree is established. Entries are thus established automatically, without manual operation.
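Two of the simpler keyword-extraction options listed above (the first N words, and the most frequent content word) can be sketched as follows, with the preset display title as an optional override. This is only an approximation: the patent refers to word segmentation and part-of-speech analysis, whereas this sketch assumes whitespace-delimited text and uses a small hypothetical stop-word list in place of part-of-speech filtering. Chinese text would first need a proper word segmenter.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stop-word list standing in for part-of-speech filtering.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "we", "i", "on"}


def first_n_words(text: str, n: int = 3) -> str:
    return " ".join(re.findall(r"\w+", text)[:n])


def most_frequent_word(text: str) -> Optional[str]:
    words = [w.lower() for w in re.findall(r"\w+", text) if w.lower() not in STOP_WORDS]
    if not words:
        return None
    # Among equally frequent words, most_common returns the one seen first.
    return Counter(words).most_common(1)[0][0]


def extract_keyword(text: str, preset_title: Optional[str] = None,
                    strategy: str = "frequent") -> Optional[str]:
    if preset_title:
        return preset_title
    if strategy == "first_n":
        return first_n_words(text)
    return most_frequent_word(text)
```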

In a specific implementation of the foregoing embodiment, the keywords associated with the entries of the same speech entry tree are all different, which prevents confusion when entries are created. In a specific implementation of the foregoing embodiment, determining the entry associated with the hit keyword as the parent entry of the collected second voice may further include the following steps: if the hit keyword is associated with a plurality of entries, determining, among the descendant entries of those entries, the entry whose establishment time is closest to the time at which the second voice was collected; and determining, among the plurality of entries, the ancestor of the determined entry as the parent entry of the collected second voice. This embodiment addresses the fact that sub-entries of different speech entry trees may share a keyword (for example, the root entry of each speech entry tree may correspond to a different project, while the sub-entries of every root entry may all use keywords such as "project background" and "project progress"), so building entries by keyword alone could cause confusion. Because conference discussions seldom jump between topics, the parent of the entry currently being built can be determined from the establishment times of the candidates' descendant entries. Referring to fig. 10, the keyword "project background" is associated with both the entry of utterance A and the entry of utterance F. Among the descendant entries of utterance A and of utterance F, the entry whose establishment time is closest to the collection time of the current second voice is found, for example the child entry of utterance F (utterance H), and the entry of utterance F is then determined as the parent of the entry to be established for the current second voice.
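The time-based rule for choosing among several candidate parents can be sketched as follows, assuming each entry records its creation time in seconds. The patent does not say what happens when a candidate has no descendants yet; this sketch falls back to the candidate's own creation time, which is an assumption. Names and timestamps are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(eq=False)
class Entry:
    name: str
    created_at: float  # seconds since the conference started
    children: list[Entry] = field(default_factory=list)


def descendants(entry: Entry) -> Iterator[Entry]:
    for child in entry.children:
        yield child
        yield from descendants(child)


def choose_parent(candidates: list[Entry], capture_time: float) -> Entry:
    """Pick the candidate owning the descendant created closest to capture_time."""
    def distance(candidate: Entry) -> float:
        # Assumption: a candidate without descendants is judged by its own time.
        times = [d.created_at for d in descendants(candidate)] or [candidate.created_at]
        return min(abs(capture_time - t) for t in times)

    return min(candidates, key=distance)
```

Taking the minimum per candidate and then across candidates is equivalent to finding the single closest descendant and returning its ancestor among the candidates, as in the fig. 10 example where utterance H leads to utterance F.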

In a specific implementation of the foregoing embodiment, at least some of the entries are displayed as their associated keywords together with an icon that, when operated, plays the voice data corresponding to the entry. For example, all entries other than leaf entries are displayed with their associated keywords, so that participants can refer to those keywords when creating sub-entries by voice. In some variations, all entries are displayed with their associated keywords and with an icon that can be operated to play the corresponding voice data, making the tree easy to read.
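One reading of the two display modes above is sketched below as a plain-text outline: by default, non-leaf entries show their keyword and every entry shows a play icon; with `label_leaves=True`, leaf entries show their keyword as well. The patent does not state exactly what a leaf shows in the first mode, so showing only the icon is an assumption, as are the `[play]` marker and the function names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PLAY_ICON = "[play]"  # stands in for a clickable icon that plays the voice data


@dataclass
class Entry:
    keyword: str
    children: list[Entry] = field(default_factory=list)


def render(entry: Entry, label_leaves: bool = False, depth: int = 0) -> list[str]:
    """Return one indented outline line per entry, depth-first."""
    is_leaf = not entry.children
    label = entry.keyword if (label_leaves or not is_leaf) else ""
    line = "  " * depth + " ".join(part for part in (label, PLAY_ICON) if part)
    lines = [line]
    for child in entry.children:
        lines.extend(render(child, label_leaves, depth + 1))
    return lines
```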

In a specific implementation of the foregoing embodiment, entries whose voice data have the same voiceprint features are displayed in the same form, and entries whose voice data have different voiceprint features are displayed in different forms. In this embodiment, voiceprint recognition identifies the speaker of each entry, so that entries from different speakers are displayed in different forms; for example, entries created by different objects are displayed in different colors and shapes. In other embodiments, when the speech entry tree is displayed to a particular object, only the entries created by that object are distinguished, by color alone. Other display modes may also be used and are not described here.
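Assigning display forms per voiceprint can be as simple as the sketch below, which assumes voiceprint recognition has already mapped each entry to a speaker identifier. The palette, the wrap-around when there are more speakers than styles, and the viewer-specific highlight colors are illustrative assumptions.

```python
PALETTE = [("blue", "circle"), ("green", "square"), ("orange", "diamond")]


class StyleAssigner:
    """Give each speaker a fixed (color, shape) in order of first appearance."""

    def __init__(self, palette=PALETTE):
        self.palette = palette
        self.assigned = {}

    def style_for(self, speaker_id: str) -> tuple:
        if speaker_id not in self.assigned:
            # Wraps around, so styles repeat once the palette is exhausted.
            self.assigned[speaker_id] = self.palette[len(self.assigned) % len(self.palette)]
        return self.assigned[speaker_id]


def viewer_color(creator_id: str, viewer_id: str, own: str = "red", other: str = "gray") -> str:
    """Variant: distinguish only the viewer's own entries, by color."""
    return own if creator_id == viewer_id else other
```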

In a specific implementation of the foregoing embodiment, after step S140 generates a plurality of speech entry trees of the conference group based on the root entries and the descendant entries, the method may further include: pushing the plurality of speech entry trees to each object of the conference group, where the pushed trees are displayed in descending order of the receiving object's contribution degree to each tree. The contribution degree is calculated from the positions, in the speech entry tree, of the entries established by the object. For example, the contribution degree of a root entry is preset to 10 and decreases by one at each level of the speech entry tree (a child entry of the root entry scores 9, a grandchild entry scores 8, and so on), and the sum of the contribution degrees of all entries established by the object is taken as the object's contribution degree. As another example, the number of descendant entries of an entry is taken as that entry's contribution degree, summed in the same way over all entries established by the object. Other ways of calculating the contribution degree may also be used and are not described here. In this way, the speech entry trees (serving as the conference report) are pushed to each participant in a personalized manner and sorted from the largest to the smallest contribution of that participant, so that each participant sees first the speech entry tree to which he or she contributed most (that is, participated in most).
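Ordering the pushed trees for one participant can be sketched as below, using the first (level-based) scoring scheme described above. The class and function names and the zero floor for very deep entries are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    creator: str
    children: list[Entry] = field(default_factory=list)


def contribution(root: Entry, obj: str, root_score: int = 10) -> int:
    """Sum of level-based scores of the entries `obj` established in this tree."""
    total, stack = 0, [(root, 0)]
    while stack:
        entry, depth = stack.pop()
        if entry.creator == obj:
            total += max(root_score - depth, 0)
        stack.extend((child, depth + 1) for child in entry.children)
    return total


def trees_for(obj: str, trees: list[Entry]) -> list[Entry]:
    """Trees in descending order of `obj`'s contribution (stable for ties)."""
    return sorted(trees, key=lambda tree: contribution(tree, obj), reverse=True)
```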

In each embodiment of the present invention, the speech entry tree may be presented to users in real time or after the conference ends; the present invention is not limited in this respect.

In various embodiments of the present invention, functions such as modifying an entry's keyword or its position in the tree may also be provided. Preferably, only the object that created an entry may modify it. In embodiments where permissions are set, an object whose permission is higher than that of the entry's creator may also modify the entry. The present invention is not limited in this respect.
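The modification rule above reduces to a small check. In this sketch, which is an assumption about how permissions might be represented, `levels` maps each object to a hypothetical numeric permission level; when no permission scheme is configured, only the creator may edit.

```python
from __future__ import annotations

from typing import Optional


def may_modify(editor: str, creator: str, levels: Optional[dict[str, int]] = None) -> bool:
    """Return True if `editor` may change an entry established by `creator`."""
    if editor == creator:
        return True
    if levels is None:
        # No permission scheme configured: only the creator may modify.
        return False
    # With permissions: a strictly higher level than the creator's is required.
    return levels.get(editor, 0) > levels.get(creator, 0)
```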

The foregoing schematically describes several implementations of the present invention, which may be implemented alone or in combination, and the present invention is not limited thereto.

Referring now to fig. 11, fig. 11 is a block diagram of an apparatus for building a speech entry tree according to an embodiment of the present invention. The apparatus 400 for building a speech entry tree includes a group establishment module 410, a root entry establishment module 420, a descendant entry establishment module 430, and a generation module 440.

The group establishment module 410 is configured to establish a conference group in response to a conference establishment request.

The root entry establishment module 420 is configured to trigger establishment of a root entry of a speech entry tree in response to a first voice input, where the root entry includes at least the voice data of the first voice input.

The descendant entry establishment module 430 is configured to trigger, in response to a second voice input, establishment of a descendant entry of the root entry, where the descendant entry includes at least the voice data of the second voice input.

The generation module 440 is configured to generate one or more speech entry trees of the conference group based on the root entry and the descendant entries.
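The division of the apparatus 400 into four modules can be mirrored in code as below. This is a structural sketch only: the class and method names are hypothetical, each module is reduced to one method, and the voice capture, voiceprint and keyword logic described earlier are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Entry:
    voice: bytes
    children: list[Entry] = field(default_factory=list)


@dataclass
class ConferenceGroup:
    conference_id: str
    roots: list[Entry] = field(default_factory=list)


def to_nested(entry: Entry) -> dict:
    return {"voice": entry.voice, "children": [to_nested(c) for c in entry.children]}


class SpeechEntryTreeApparatus:
    def __init__(self) -> None:
        self.groups: dict[str, ConferenceGroup] = {}

    def establish_group(self, conference_id: str) -> ConferenceGroup:  # module 410
        group = ConferenceGroup(conference_id)
        self.groups[conference_id] = group
        return group

    def establish_root(self, conference_id: str, voice: bytes) -> Entry:  # module 420
        root = Entry(voice)
        self.groups[conference_id].roots.append(root)
        return root

    def establish_descendant(self, parent: Entry, voice: bytes) -> Entry:  # module 430
        child = Entry(voice)
        parent.children.append(child)
        return child

    def generate_trees(self, conference_id: str) -> list[dict]:  # module 440
        return [to_nested(root) for root in self.groups[conference_id].roots]
```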

In the apparatus for building a speech entry tree according to the exemplary embodiment of the present invention, first, the parent-child relationship of each entry is determined while the entries are being established, forming the speech entry trees of the conference, each of which may correspond to one discussion topic. The speech entry trees can therefore be provided to participants as the conference record, which is generated automatically during the conference instead of being recorded and organized manually, reducing labor cost. Second, the speech entry tree presents the structure and content of the conference to participants intuitively and is more readable. Third, because the entries include voice data, the corresponding voice can be played back from each entry of the generated record, so the specific content of the conference can be retrieved accurately, reviewed, and corrected.

Fig. 11 is only a schematic diagram of the apparatus 400 for building a speech entry tree provided by the present invention; splitting, combining, or adding modules without departing from the concept of the present invention falls within its scope. The apparatus 400 for building a speech entry tree may be implemented by software, hardware, firmware, plug-ins, or any combination thereof; the present invention is not limited in this respect.

In some embodiments of the present invention, each participant may capture voice through his or her own terminal device, and the interface for building the speech entry tree may be pushed to the participants' terminal devices. The apparatus 400 for building a speech entry tree may be a cloud server that pushes the interface and builds the speech entry tree. In this embodiment, these terminal devices may be the devices used to take part in an on-site conference (in which case a voice collection device such as a microphone may be separate from the terminal devices and controlled by them; participants may share one voice collection device or each control their own; and the apparatus 400 may be connected to each terminal device through a local area network or other means), a teleconference, or a video conference. This embodiment is therefore also applicable to teleconferences and video conferences.

In other embodiments of the present invention, when applied to an on-site conference, the apparatus 400 for building a speech entry tree may be a local device at the conference venue; the speech entry tree may be displayed on a display device such as a projection screen, and entries are created through speech recognition or controlled through communication between the participants' terminal devices and the apparatus 400.

The above description is only illustrative of different application scenarios of the apparatus 400 for building a speech entry tree according to the present invention, and the present invention is not limited thereto.

In an exemplary embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by, for example, a processor, the program implements the steps of the method for building a speech entry tree described in any of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code which, when the program product runs on a terminal device, causes the terminal device to perform the steps according to the various exemplary embodiments of the present invention described in the method section of this specification on building a speech entry tree.

Referring to fig. 12, a program product 700 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

In an exemplary embodiment of the invention, there is also provided an electronic device that may include a processor and a memory for storing instructions executable by the processor, where the processor is configured to perform, by executing the executable instructions, the steps of the method for building a speech entry tree according to any one of the above embodiments.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit", "module", or "system".

An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 13. The electronic device 500 shown in fig. 13 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 13, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 that couples various system components including the memory unit 520 and the processing unit 510, a display unit 540, and the like.

The storage unit stores program code executable by the processing unit 510, causing the processing unit 510 to perform the steps according to the various exemplary embodiments of the present invention described in the method section of this specification on building a speech entry tree. For example, the processing unit 510 may perform the steps shown in fig. 1.

The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read-only memory unit (ROM) 5203.

The memory unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 500 may also communicate with one or more external devices 600 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (e.g., a router or modem) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 550. The electronic device 500 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 560, which communicates with the other modules of the electronic device 500 via the bus 530. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, or a network device) to execute the above method for building a speech entry tree according to the embodiments of the present invention.

Compared with the prior art, the invention has the advantages that:

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

1. A method for building a tree of speech entries, comprising:

2. The method for building a speech entry tree according to claim 1, wherein triggering establishment of a root entry of the speech entry tree in response to a first voice input comprises:

3. The method for building a speech entry tree according to claim 1, wherein triggering establishment of a descendant entry of the root entry in response to a second voice input comprises:

4. The method for building a speech entry tree according to claim 1, wherein triggering establishment of a root entry of the speech entry tree in response to a first voice input comprises:

collecting the first voice;

and if so, triggering establishment of a root entry of the speech entry tree.

5. The method for building a speech entry tree according to claim 4, wherein

the first object set is preset; or

6. The method for building a speech entry tree according to claim 1, wherein triggering establishment of a descendant entry of the root entry in response to a second voice input comprises:

collecting the second voice;

recognizing second text data of the second voice;

7. The method for building a speech entry tree according to claim 6, wherein the keywords associated with the entries of the same speech entry tree are different from one another.

8. The method for building a speech entry tree according to claim 7, wherein determining the entry associated with the hit keyword as the parent entry of the collected second voice comprises:

9. The method for building a speech entry tree according to claim 6, wherein triggering establishment of a root entry of the speech entry tree in response to a first voice input comprises:

collecting the first voice;

recognizing first text data of the first voice;

judging whether the first text data hits the keyword set or not;

10. The method according to any one of claims 6 to 9, wherein at least some of the entries are displayed as associated keywords and icons for playing the speech data corresponding to the entries upon operation.

11. The method for building a speech entry tree according to any one of claims 1 to 9, wherein

displaying the entry in the same form when the voice data of the entry has the same voiceprint characteristics;

12. The method for building a speech entry tree according to any one of claims 1 to 9, wherein after generating a plurality of speech entry trees of the conference group based on the root entry and the descendant entries, the method further comprises:

13. An apparatus for building a speech entry tree, comprising:

14. An electronic device, characterized in that the electronic device comprises:

a processor; and

a memory having stored thereon a computer program which, when executed by the processor, performs the steps of any of claims 1 to 12.

15. A storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of any of claims 1 to 12.