WO2022161122A1 - Method, apparatus, device and medium for processing meeting minutes - Google Patents


Info

Publication number
WO2022161122A1
WO2022161122A1 · PCT/CN2022/070282 · CN2022070282W
Authority
WO
WIPO (PCT)
Prior art keywords
meeting
statement
text
sentence
minutes
Prior art date
Application number
PCT/CN2022/070282
Other languages
English (en)
French (fr)
Inventor
杜春赛
杨晶生
陈可蓉
郑翔
徐文铭
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Priority to JP2023544227A (published as JP2024506495A)
Priority to US18/262,400 (published as US20240079002A1)
Publication of WO2022161122A1

Classifications

    • G10L 15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G06F 40/279 Recognition of textual entities
    • G06F 16/355 Class or cluster creation or modification
    • G06F 3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06Q 10/10 Office automation; Time management
    • G10L 15/04 Segmentation; Word boundary detection
    • G10L 15/063 Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 Speech to text systems

Definitions

  • The present disclosure relates to the technical field of meeting recognition, and in particular to a method, apparatus, device and medium for processing meeting minutes.
  • Meeting audio and video can be converted into text through recognition processing, and the to-do sentences containing task intent can be determined from the text.
  • However, determining the to-do sentences in this way suffers from low efficiency and low accuracy.
  • The present disclosure therefore provides a method, apparatus, device and medium for processing meeting minutes.
  • An embodiment of the present disclosure provides a method for processing meeting minutes, the method comprising: acquiring the conference text of conference audio and video; inputting the conference text into a to-do recognition model to determine an initial to-do sentence; inputting the initial to-do sentence into a temporal judgment model to determine a temporal result; and determining a meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • Embodiments of the present disclosure also provide a method for processing meeting minutes, the method comprising: receiving a user's display trigger operation on a target minutes sentence in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes sentence; and displaying the target minutes sentence and the associated sentences of the target minutes sentence.
  • An embodiment of the present disclosure also provides an apparatus for processing meeting minutes, the apparatus comprising:
  • a text acquisition module, configured to acquire the conference text of conference audio and video;
  • an initial to-do module, configured to input the meeting text into a to-do recognition model and determine an initial to-do sentence;
  • a temporal judgment module, configured to input the initial to-do sentence into a temporal judgment model and determine the temporal result of the initial to-do sentence;
  • a meeting to-do module, configured to determine a meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • An embodiment of the present disclosure also provides an apparatus for processing meeting minutes, the apparatus comprising:
  • a display triggering module, configured to receive a user's display trigger operation on a target minutes sentence in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes sentence;
  • a display module, configured to display the target minutes sentence and the associated sentences of the target minutes sentence.
  • An embodiment of the present disclosure further provides an electronic device, the electronic device comprising: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute them to implement the method for processing meeting minutes provided by the embodiments of the present disclosure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the method for processing meeting minutes provided by the embodiment of the present disclosure.
  • The technical solutions provided by the embodiments of the present disclosure have the following advantages: the solution for processing meeting minutes obtains the conference text of the conference audio and video; inputs the conference text into a to-do recognition model to determine an initial to-do sentence; inputs the initial to-do sentence into a temporal judgment model to determine the temporal result of the initial to-do sentence; and determines a meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • Adding the tense judgment prevents sentences about already-completed actions from being recognized as meeting to-do sentences, greatly improving the accuracy of determining meeting to-do sentences.
  • This improves the user's work efficiency when acting on the meeting to-do sentences and improves the user experience.
  • FIG. 1 is a schematic flowchart of a method for processing meeting minutes according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of another method for processing meeting minutes provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a meeting minutes display interface provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a device for processing meeting minutes according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of a device for processing meeting minutes according to an embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • The term “including” and variations thereof denote open-ended inclusion, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • the audio and video of the meeting can be converted into text through recognition processing.
  • Conference texts are usually long, so quickly and correctly filtering out the sentences that contain task intent is particularly important.
  • The content of a meeting is often a record of discussions on one or more topics, eventually reaching some conclusions or spawning further topics.
  • Many tasks to be completed are often assigned during a meeting, and the meeting text contains a large number of words; if the tasks to be completed can be picked out, the user saves considerable time in organizing the meeting minutes.
  • A to-do sentence is a sentence that carries a to-do type of intent.
  • However, determining to-do sentences suffers from low efficiency and low accuracy.
  • To this end, an embodiment of the present disclosure provides a method for processing meeting minutes, which is described below with reference to specific embodiments.
  • FIG. 1 is a schematic flowchart of a method for processing meeting minutes according to an embodiment of the present disclosure.
  • The method can be executed by an apparatus for processing meeting minutes, where the apparatus can be implemented by software and/or hardware and can generally be integrated in an electronic device.
  • the method includes:
  • Step 101 The processing device acquires the conference text of the conference audio and video.
  • the conference audio and video refers to audio and/or video used to record a conference process.
  • the conference text refers to the text content obtained after the audio and video of the conference are processed by speech recognition.
  • The processing device can acquire conference text that has already been produced from the conference audio and video, or can acquire the conference audio and video and obtain the conference text by processing them.
  • Step 102 The processing device inputs the conference text into the to-do recognition model, and determines the initial to-do statement.
  • the to-do recognition model may be a pre-trained deep learning model for recognizing to-do intent sentences for conference texts, and the specific deep learning model used is not limited.
  • The processing device may also generate the to-do recognition model, which is generated by the following method: training an initial single-classification model on positive samples of to-do sentences to obtain the to-do recognition model.
  • the to-do recognition model is a single-classification model as an example for description.
  • A single-classification (one-class) model is a special kind of classification model: its training samples carry only positive-class labels, and all other samples are assigned to another class. It can be understood as determining the boundary of the positive samples; data outside the boundary is divided into the other class.
  • A positive sample of a to-do sentence is a sample that has been marked with a positive label, that is, a sample that has been confirmed to be a meeting to-do sentence.
  • The number of positive samples of to-do sentences is not limited and can be set according to the actual situation.
  • the processing device may input the positive sample of the to-do sentence into the initial single-classification model for model training, and obtain a trained single-classification model, which is the to-do recognition model.
  • Inputting the meeting text into the to-do recognition model and determining the initial to-do sentence may include: the processing device converts the text sentences in the conference text into sentence vectors, inputs the sentence vectors into the to-do recognition model, and determines the initial to-do sentences.
  • the text sentence is obtained by sentence cutting or division of the conference text, and the number of the text sentence may be multiple.
  • the processing device can convert each text sentence included in the conference text into a sentence vector through an Embedding layer, and input each sentence vector into the pre-trained to-do recognition model to predict the classification result of the to-do sentence.
  • A sentence for which the model returns a positive result is determined to be an initial to-do sentence. Since the to-do recognition model is a single-classification model, the classification can be understood as computing the radius and center of a sphere that forms the boundary of the positive samples; the space inside the sphere represents the distribution space of the positive to-do samples.
  • the processing device uses a single classification model to identify to-do sentences in the conference text, which reduces the amount of data for deep learning model training, improves model training efficiency, and improves recognition accuracy.
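The sphere-style single classification described above can be illustrated with a minimal sketch. This is not the patent's implementation: the bag-of-words `embed` helper, the toy vocabulary, and the training sentences are all invented for illustration, and a real system would use a trained embedding layer and a properly regularized one-class model.

```python
import math

def embed(sentence, vocab):
    # Toy bag-of-words sentence vector; a real system would use a trained embedding layer.
    words = sentence.lower().split()
    return [words.count(w) for w in vocab]

def fit_sphere(positive_vectors):
    # Center = mean of the positive samples; radius = distance to the farthest positive sample.
    dim = len(positive_vectors[0])
    n = len(positive_vectors)
    center = [sum(v[i] for v in positive_vectors) / n for i in range(dim)]
    radius = max(math.dist(v, center) for v in positive_vectors)
    return center, radius

def is_todo(vector, center, radius):
    # Inside the sphere = inside the positive-sample distribution = initial to-do sentence.
    return math.dist(vector, center) <= radius

vocab = ["please", "finish", "tomorrow", "report", "discussed", "yesterday"]
positives = [embed(s, vocab) for s in [
    "please finish the report tomorrow",
    "please finish tomorrow",
]]
center, radius = fit_sphere(positives)
print(is_todo(embed("please finish the report tomorrow", vocab), center, radius))  # True
print(is_todo(embed("we discussed this yesterday", vocab), center, radius))        # False
```

Because only positive samples are used in fitting, anything far from their cluster falls outside the sphere, which mirrors the "no boundary on negatives" property the disclosure relies on.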
  • Step 103 The processing device inputs the initial to-do statement into the temporal judgment model to determine the temporal result.
  • The temporal judgment model, similar to the to-do recognition model above, is a pre-trained model for further tense judgment on the initial to-do sentences identified in the previous step; the specific deep learning model used is not limited.
  • Tense is the form that characterizes behaviors, actions, and states under different time conditions.
  • The tense results can include the past tense, the present tense, and the future tense.
  • The past tense represents past time, the present tense represents present time, and the future tense represents future time.
  • the initial to-do sentence can be input into the pre-trained temporal judgment model, and further temporal judgment is performed to determine the temporal result.
  • the temporal judgment model can be a three-category model.
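As a hedged illustration of the three-way tense decision (the patent trains a deep three-class model; the cue-word lists and rule order below are invented stand-ins, not the disclosed method), a minimal sketch might look like:

```python
# Hypothetical cue-word lists standing in for a trained three-class tense model.
FUTURE_CUES = {"will", "tomorrow", "next", "shall", "plan"}
PAST_CUES = {"yesterday", "finished", "completed", "did", "was"}

def judge_tense(sentence):
    # Return one of the three tense results: "past", "present", or "future".
    words = set(sentence.lower().split())
    if words & FUTURE_CUES:
        return "future"
    if words & PAST_CUES:
        return "past"
    return "present"

print(judge_tense("Xiao Wang will finish the homework tomorrow"))  # future
print(judge_tense("we finished the review yesterday"))             # past
```

In the pipeline, only sentences whose result is "future" survive as meeting to-do sentences; "past" and "present" sentences are dropped.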
  • Step 104 The processing device determines a meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • the meeting to-do statement is different from the initial to-do statement, and refers to a finalized statement with to-do intention.
  • determining the meeting to-do statement in the initial to-do statement based on the temporal result may include: determining the initial to-do statement whose temporal result is the future tense as the meeting to-do statement.
  • The processing device may take initial to-do sentences whose tense result is the future tense as meeting to-do sentences, and delete initial to-do sentences whose tense result is the past tense or the present tense, finally obtaining the meeting to-do sentences.
  • The processing device realizes to-do intent recognition on the meeting text through deep learning models, helps the user organize the meeting to-do sentences in the meeting minutes, and improves the user's work efficiency. Compared with traditional machine learning methods, the to-do recognition model adopts a single-classification model, which can greatly improve the judgment accuracy on negative samples.
  • Because negative samples of to-do intent sentences have no natural boundary, a model that judges them accurately can greatly improve the user experience.
  • the processing device obtains the meeting text of the meeting audio and video; inputs the meeting text into the to-do recognition model to determine the initial to-do sentence; inputs the initial to-do sentence into the tense judgment model, Determines the temporal result of the initial to-do statement; determines the meeting to-do statement in the initial to-do statement based on the temporal result.
  • preprocessing text sentences based on set rules includes: deleting text sentences lacking intent words; and/or deleting text sentences whose text length is less than a length threshold; and/or deleting text sentences lacking nouns.
  • the text sentence is obtained by sentence cutting or division of the conference text.
  • the conference text can be cut according to punctuation, and the conference text can be converted into a plurality of text sentences.
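A minimal sketch of cutting the conference text into text sentences at punctuation; the exact punctuation set and the whitespace handling here are assumptions rather than the patent's rules:

```python
import re

def split_sentences(conference_text):
    # Cut the transcript into text sentences at Chinese/Western end punctuation.
    parts = re.split(r"[。！？.!?]+", conference_text)
    return [p.strip() for p in parts if p.strip()]

text = "We reviewed the sprint. Xiao Wang will finish the homework tomorrow! Any questions?"
print(split_sentences(text))
# ['We reviewed the sprint', 'Xiao Wang will finish the homework tomorrow', 'Any questions']
```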
  • the setting rule may be a rule for processing multiple text sentences, which may not be specifically limited.
  • the setting rule may be deleting stop words and/or deleting repeated words.
  • The conference text is divided into sentences to obtain multiple text sentences; word segmentation can then be performed on each text sentence, and the text sentences are preprocessed based on the set rules and the word segmentation results.
  • The preprocessing filters the text sentences, so that the text sentences remaining after preprocessing are more likely to be to-do sentences.
  • Preprocessing the text sentences may include: examining the word segmentation result of each text sentence, judging whether intent words and/or nouns are included, and deleting text sentences lacking intent words and/or nouns.
  • Intent words refer to pre-arranged words that may carry a to-do intent.
  • For example, if a text sentence includes the phrase “need to be completed”, it may carry a to-do intent, and “need to be completed” is an intent word.
  • a thesaurus may be set to store multiple intended words and/or nouns for preprocessing.
  • preprocessing the text sentences may include: determining the text length of each text sentence, comparing with the length threshold respectively, and deleting the text sentences whose text length is less than the length threshold.
  • the length threshold refers to a preset sentence length value. When the text sentence is too short, it may not be a sentence. Therefore, the too short text sentence is deleted by setting the length threshold.
  • Preprocessing the text sentences based on the set rules may also include: performing sentence-pattern matching on the text sentences based on set sentence patterns, and deleting text sentences that do not match any set sentence pattern.
  • A set sentence pattern can be understood as a sentence pattern that is relatively likely to express a to-do intent, and multiple set sentence patterns may be used.
  • For example, a set sentence pattern can be subject + preposition + time word + verb + object; a sentence matching it, such as "Xiao Wang, you will finish your homework tomorrow", is a to-do sentence. Each text sentence is matched against the set sentence patterns, and text sentences that satisfy none of them are deleted.
  • The text sentences included in the meeting text can be preprocessed based on a variety of set rules. Since the set rules are related to the to-do intent, the preprocessed text sentences are more likely to be to-do sentences, which improves the efficiency and accuracy of subsequently determining to-do sentences.
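The length-threshold and intent-word rules above can be sketched as follows; the `INTENT_WORDS` lexicon, the whitespace word segmentation, and the threshold value are hypothetical placeholders for the pre-arranged thesaurus and segmenter the disclosure mentions:

```python
# Hypothetical intent-word lexicon; the real thesaurus is configured in advance.
INTENT_WORDS = {"finish", "complete", "need", "prepare", "submit"}

def preprocess(sentences, length_threshold=4):
    kept = []
    for s in sentences:
        tokens = s.lower().split()            # stand-in for word segmentation
        if len(tokens) < length_threshold:    # rule: delete too-short sentences
            continue
        if not (set(tokens) & INTENT_WORDS):  # rule: delete sentences lacking intent words
            continue
        kept.append(s)
    return kept

sents = ["OK.", "Thanks everyone", "Xiao Wang you need to finish your homework tomorrow"]
print(preprocess(sents))
# ['Xiao Wang you need to finish your homework tomorrow']
```

Each rule only removes candidates, so the filters can be chained in any order before the to-do recognition model runs.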
  • FIG. 2 is a schematic flowchart of another method for processing meeting minutes provided by an embodiment of the present disclosure.
  • The method may be executed by a meeting minutes processing apparatus, where the apparatus may be implemented by software and/or hardware and may generally be integrated in an electronic device.
  • the method includes:
  • Step 201 The processing device receives a user's display triggering operation for the target minutes statement in the meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement.
  • the meeting minutes display interface refers to the interface used to display the pre-generated meeting minutes.
  • the meeting audio and video and meeting text are displayed in different areas of the meeting minutes display interface.
  • For example, an audio/video area, a subtitle area, and a meeting minutes display area can be set, used respectively to display the meeting audio and video, the meeting text of the meeting audio and video, and the meeting minutes and other meeting-related content.
  • the display trigger operation refers to the operation used to trigger the display of the meeting to-do statement in the meeting minutes.
  • the specific method is not limited.
  • the display trigger operation may be a click operation and/or a hover operation on the meeting to-do statement.
  • Minutes sentences refer to the sentences in the meeting minutes, which are displayed in the above-mentioned meeting minutes display area.
  • The minutes sentences include meeting to-do sentences; a meeting to-do sentence is a minutes sentence under a particular minutes type, namely the to-do sentence determined in the above embodiment.
  • The meeting minutes are the main content of the meeting generated by processing the meeting audio and video, and can be of various types. In this embodiment of the present disclosure, the meeting minutes may include at least one of the types topic, agenda, discussion, conclusion, and to-do; the meeting to-do sentences are the sentences under the to-do type.
  • the client terminal may receive the user's display triggering operation on one of the target minutes sentences in the meeting minutes.
  • FIG. 3 is a schematic diagram of a meeting minutes display interface provided by an embodiment of the present disclosure.
  • A first area 11 in the meeting minutes display interface 10 displays the meeting minutes, and the conference video is displayed at the top of the first area 11.
  • The conference text is displayed in a second area 12, and the conference audio can be displayed at the bottom of the meeting minutes display interface 10, which may specifically include the time axis of the conference audio.
  • Figure 3 shows 5 types of meeting minutes, which are topic, agenda, discussion, conclusion, and to-do, of which three to-do statements are included under to-do.
  • the arrows in FIG. 3 may represent a presentation triggering operation for the first meeting to-do statement.
  • the conference text in FIG. 3 can be divided into subtitle segments based on different users participating in the conference.
  • the figure shows the subtitle segments of three users, namely User 1, User 2 and User 3.
  • the meeting title "Team Review Meeting” and related content of the meeting are also displayed at the top of the meeting minutes display interface 10.
  • "2019.12.20 10:00 am” indicates the meeting start time
  • "1h30m30s” indicates that the meeting duration is 1 Hours 30 minutes 20 seconds
  • "16" indicates the number of participants.
  • the meeting minutes display interface 10 in FIG. 3 is only an example, and the location of each content included therein is also an example, and the specific location and display manner can be set according to actual conditions.
  • Step 202 The processing device displays the target summary sentence and the related sentences of the target summary sentence.
  • the associated sentence is included in the conference text, and is a subtitle sentence that has a positional association with the target minutes sentence.
  • The number of associated sentences can be set according to actual conditions; for example, the associated sentences can be the subtitle sentences immediately before and after the position of the target minutes sentence in the conference text, in which case the number is 2.
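A sketch of collecting the positionally associated subtitle sentences around a target minutes sentence; the function and variable names are invented, and the window size of 2 simply follows the example above:

```python
def associated_sentences(subtitles, target_index, window=2):
    # Subtitle sentences positionally adjacent to the target minutes sentence,
    # clamped at the start and end of the conference text.
    start = max(0, target_index - window)
    end = min(len(subtitles), target_index + window + 1)
    return subtitles[start:end]

subs = ["s0", "s1", "s2", "s3", "s4", "s5"]
print(associated_sentences(subs, 3))  # ['s1', 's2', 's3', 's4', 's5']
```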
  • the subtitle sentence may be a constituent unit of the conference text, which is obtained by dividing the conference text into sentences.
  • the conference text may include multiple subtitle sentences, and the specific number is not limited.
  • displaying the target summary statement and the related statement of the target summary statement may include: displaying the target summary statement and the related statement of the target summary statement in a floating window in the meeting minutes display interface.
  • the floating window can be displayed in the area of the meeting minutes display interface, and the specific position of the floating window can be set according to the actual situation.
  • the position of the floating window can be any position that does not block the current target minutes statement.
  • the processing device can display a floating window to the user, and present the target summary sentence and the related sentences of the target summary sentence in the floating window.
  • Presented alone, the target minutes sentence may be difficult for the user to understand; displaying its associated sentences alongside it helps the user understand the content and improves the display effect of the minutes sentence.
  • the first underlined meeting to-do statement under the to-do type in the meeting minutes in the first area 11 is the target meeting to-do statement.
  • The floating window 13 displays the target meeting to-do sentence and the associated sentences of that target to-do sentence.
  • The associated sentences displayed in the floating window 13 in the figure are the sentences immediately preceding and following the target meeting to-do sentence.
  • the method for processing meeting minutes may further include: playing the audio and video of the meeting based on the associated time period of the target minutes sentence, and highlighting the associated subtitles of the target minutes sentence in the meeting text.
  • The associated subtitle of the target minutes sentence refers to the subtitle corresponding to the target minutes sentence in the subtitle text, and the associated time period of the target minutes sentence refers to the time period in the conference audio and video of the original speech corresponding to the associated subtitle.
  • The associated time period can include a start time and an end time.
  • The processing device may also start playing the conference audio and video at the start time of the associated time period of the target minutes sentence and stop playing at the end time, jump the conference text to the position of the associated subtitle of the target minutes sentence, and display the associated subtitle in a preset manner.
  • the setting manner may be any feasible presentation manner that can be distinguished from other parts of the conference text, for example, may include but not limited to at least one of highlighting, bolding, and adding underline.
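A hedged sketch of this trigger handling: the `MinutesSentence` fields and the stub `Player` are invented data structures standing in for the real player and subtitle view, showing only the seek-to-start/stop-at-end behavior and the subtitle index to scroll to and highlight:

```python
from dataclasses import dataclass

@dataclass
class MinutesSentence:
    text: str
    subtitle_index: int  # position of the associated subtitle in the conference text
    start_s: float       # associated time period: start time in the recording
    end_s: float         # associated time period: end time

class Player:
    # Minimal stand-in for the audio/video player; records the calls it receives.
    def __init__(self):
        self.actions = []
    def seek(self, t):
        self.actions.append(("seek", t))
    def stop_at(self, t):
        self.actions.append(("stop_at", t))

def on_display_trigger(sentence, player):
    # Play only the associated time period of the target minutes sentence,
    # and report which subtitle to scroll to and highlight.
    player.seek(sentence.start_s)
    player.stop_at(sentence.end_s)
    return sentence.subtitle_index

s = MinutesSentence("finish homework tomorrow", subtitle_index=7, start_s=125.0, end_s=131.5)
p = Player()
idx = on_display_trigger(s, p)
print(idx, p.actions)
```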
  • the user can trigger the interaction of the minutes in the meeting minutes display interface, so as to realize the related interaction between the conference audio and video and the related content in the conference text, which improves the user's interactive experience effect.
  • the user has an intuitive understanding of the relationship between the three, which is more helpful for the user to accurately understand the conference content.
  • The processing device receives a user's display trigger operation on a target minutes sentence in the meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes sentence; the processing device then displays the target minutes sentence and the associated sentences of the target minutes sentence.
  • FIG. 4 is a schematic structural diagram of an apparatus for processing meeting minutes according to an embodiment of the present disclosure.
  • the apparatus may be implemented by software and/or hardware, and may generally be integrated into an electronic device.
  • the device includes:
  • a text acquisition module 401 configured to acquire conference text of conference audio and video
  • a temporal judgment module 403 configured to input the initial to-do statement into a temporal judgment model, and determine the temporal result of the initial to-do statement;
  • a meeting to-do module 404 configured to determine a meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • the initial to-do module 402 is specifically used for:
  • the device further includes a model training module, which is specifically used for:
  • the initial single-classification model is trained based on the positive samples of to-do sentences, and the to-do recognition model is obtained.
  • the meeting to-do module 404 is specifically used for:
  • the initial to-do sentence whose tense result is the future tense is determined as a meeting to-do sentence.
  • The device further includes a preprocessing module configured to: after the conference text of the conference audio and video is obtained, preprocess the text sentences based on set rules to filter the text sentences.
  • the preprocessing module is specifically used for:
  • the preprocessing module is specifically used for:
  • Sentence matching is performed on the text sentence based on the set sentence form, and text sentences that do not satisfy the set sentence form are deleted.
  • Through the cooperation of its modules, the apparatus for processing meeting minutes obtains the conference text of the conference audio and video, inputs the conference text into the to-do recognition model to determine the initial to-do sentence, inputs the initial to-do sentence into the temporal judgment model to determine the temporal result of the initial to-do sentence, and determines the meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • FIG. 5 is a schematic structural diagram of an apparatus for processing meeting minutes according to an embodiment of the present disclosure.
  • the apparatus may be implemented by software and/or hardware, and may generally be integrated into an electronic device.
  • the device includes:
  • a presentation triggering module 501 is configured to receive a presentation triggering operation by a user on a target summary statement in a meeting minutes presentation interface, wherein the meeting minutes presentation interface displays conference audio and video, the conference text of the conference audio and video, and the target minutes statement;
  • the display module 502 is configured to display the target summary statement and the associated statement of the target summary statement.
  • the associated sentence includes a subtitle sentence in the conference text that has a positional association with the target summary sentence; the conference text includes a plurality of the subtitle sentences; and the target summary sentence includes a target meeting to-do sentence.
  • the display module 502 is specifically used for:
  • the target minutes statement and the associated statement of the target minutes statement are displayed in the floating window in the meeting minutes display interface.
  • the device further includes an associated interaction module for:
  • the audio and video of the conference are played based on the associated time period of the target minutes sentence, and the associated subtitles of the target minutes sentence in the conference text are highlighted.
  • the apparatus for processing meeting minutes, through the cooperation of its modules, receives a user's display triggering operation on a target summary sentence in a meeting minutes display interface, wherein the meeting minutes display interface displays the conference audio and video, the conference text of the conference audio and video, and the target summary sentence; and displays the target summary sentence and the associated sentence of the target summary sentence.
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring specifically to FIG. 6 below, it shows a schematic structural diagram of an electronic device 600 suitable for implementing an embodiment of the present disclosure.
  • the electronic device 600 in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (for example, a car navigation terminal), as well as stationary terminals such as a digital TV and a desktop computer.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • an electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604 .
  • The following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609.
  • The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 shows the electronic device 600 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 609, or from the storage device 608, or from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the method for processing meeting minutes according to the embodiment of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the client and the server can communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: obtains the meeting text of the meeting audio and video; inputs the meeting text into the to-do recognition model to determine the initial to-do statement; inputs the initial to-do statement into the temporal judgment model to determine the temporal result; and determines the meeting to-do statement in the initial to-do statement based on the temporal result.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: receives a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement; and displays the target minutes statement and related sentences of the target minutes statement.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., connected through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • the present disclosure provides a method for processing meeting minutes, including:
  • acquiring the meeting text of the meeting audio and video; inputting the meeting text into a to-do recognition model to determine an initial to-do statement; inputting the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement; and determining a meeting to-do statement in the initial to-do statement based on the temporal result.
  • the present disclosure provides a method for processing meeting minutes, wherein inputting the meeting text into a to-do recognition model and determining an initial to-do statement includes: converting a text sentence in the meeting text into a sentence vector, and inputting the sentence vector into the to-do recognition model to determine the initial to-do statement, wherein the to-do recognition model is a single-classification model.
  • the to-do recognition model is generated in the following manner:
  • the initial single-classification model is trained based on the positive samples of to-do sentences, and the to-do recognition model is obtained.
  • determining a meeting to-do sentence in the initial to-do sentence based on the temporal result includes:
  • the initial to-do sentence whose tense result is the future tense is determined as a meeting to-do sentence.
  • in the method for processing meeting minutes, after acquiring the meeting text of the meeting audio and video, the method further includes:
  • the text sentences are preprocessed based on set rules to filter the text sentences.
  • the preprocessing of the text sentence based on a set rule includes: deleting text sentences lacking intent words; and/or deleting text sentences whose text length is less than a length threshold; and/or deleting text sentences lacking nouns.
  • the preprocessing of the text sentence based on a set rule includes:
  • Sentence matching is performed on the text sentence based on the set sentence form, and text sentences that do not satisfy the set sentence form are deleted.
  • the present disclosure provides a method for processing meeting minutes, including:
  • receiving a user's display trigger operation on a target summary statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target summary statement;
  • the target minutes statement and associated sentences of the target minutes statement are displayed.
  • the present disclosure provides a method for processing meeting minutes, wherein the associated sentence includes a subtitle sentence in the meeting text that has a positional association with the target minutes sentence, the meeting text includes a plurality of the subtitle sentences, and the target minutes sentence includes a target meeting to-do sentence.
  • the displaying the target minutes statement and the associated statement of the target minutes statement includes:
  • the target minutes statement and the associated statement of the target minutes statement are displayed in the floating window in the meeting minutes display interface.
  • the present disclosure provides a method for processing meeting minutes, further comprising:
  • the audio and video of the conference are played based on the associated time period of the target minutes sentence, and the associated subtitles of the target minutes sentence in the conference text are highlighted.
  • the present disclosure provides an apparatus for processing meeting minutes, including:
  • a text acquisition module, used to acquire the meeting text of the meeting audio and video;
  • an initial to-do module, used to input the meeting text into the to-do recognition model to determine the initial to-do statement;
  • a temporal judgment module, used to input the initial to-do statement into a temporal judgment model and determine the temporal result of the initial to-do statement;
  • a meeting to-do module, used to determine a meeting to-do sentence in the initial to-do sentence based on the temporal result.
  • the initial to-do module is specifically used for: converting a text sentence in the meeting text into a sentence vector, and inputting the sentence vector into the to-do recognition model to determine the initial to-do statement, wherein the to-do recognition model is a single-classification model.
  • the apparatus further includes a model training module, which is specifically used for:
  • the initial single-classification model is trained based on the positive samples of to-do sentences, and the to-do recognition model is obtained.
  • the meeting to-do module is specifically configured to:
  • the initial to-do sentence whose tense result is the future tense is determined as a meeting to-do sentence.
  • the apparatus further includes a preprocessing module, configured to: after acquiring the conference text of the conference audio and video,
  • the text sentences are preprocessed based on set rules to filter the text sentences.
  • the preprocessing module is specifically configured to: delete text sentences lacking intent words; and/or delete text sentences whose text length is less than a length threshold; and/or delete text sentences lacking nouns.
  • the preprocessing module is specifically configured to:
  • Sentence matching is performed on the text sentence based on the set sentence form, and text sentences that do not satisfy the set sentence form are deleted.
  • the present disclosure provides an apparatus for processing meeting minutes, including:
  • a display triggering module configured to receive a user's display trigger operation on the target summary statement in the meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement;
  • the display module is used to display the target summary statement and the associated statement of the target summary statement.
  • the associated sentence includes a subtitle sentence in the meeting text that is associated with the target minutes sentence; the meeting text includes a plurality of the subtitle sentences, and the target minutes sentence includes a target meeting to-do sentence.
  • the presentation module is specifically used for:
  • the target minutes statement and the associated statement of the target minutes statement are displayed in the floating window in the meeting minutes display interface.
  • the apparatus further includes an associated interaction module for:
  • the audio and video of the conference are played based on the associated time period of the target minutes sentence, and the associated subtitles of the target minutes sentence in the conference text are highlighted.
  • the present disclosure provides an electronic device, comprising:
  • a processor; and a memory for storing instructions executable by the processor;
  • the processor is configured to read the executable instructions from the memory, and execute the instructions to implement any one of the methods for processing meeting minutes provided in the present disclosure.
  • the present disclosure provides a computer-readable storage medium storing a computer program, the computer program being used to execute the method for processing meeting minutes provided in any one of the embodiments of the present disclosure.


Abstract

A method, apparatus, device, and medium for processing meeting minutes, wherein the method includes: acquiring the meeting text of the meeting audio and video (101); inputting the meeting text into a to-do recognition model to determine an initial to-do statement (102); inputting the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement (103); and determining a meeting to-do statement in the initial to-do statement based on the temporal result (104). With this method, temporal judgment is added on top of the recognition of the meeting text of the meeting audio and video, which can improve the accuracy of determining meeting to-do statements, thereby improving the efficiency with which users work from meeting to-do statements and enhancing the user experience.

Description

A method, apparatus, device, and medium for processing meeting minutes
This application claims priority to the Chinese patent application with application number 202110113700.1, entitled "A method, apparatus, device, and medium for processing meeting minutes", filed with the China National Intellectual Property Administration on January 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of meeting recognition, and in particular to a method, apparatus, device, and medium for processing meeting minutes.
Background
With the continuous development of smart devices and multimedia technology, online meetings held through smart devices are increasingly used in daily life and office work because of their outstanding performance in communication efficiency and information retention.
After a meeting, the audio and video of the meeting can be converted into text through recognition processing, and to-do statements containing task intents can be determined from the text. However, the determination of to-do statements suffers from low efficiency and low accuracy.
Summary
In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a method, apparatus, device, and medium for processing meeting minutes.
An embodiment of the present disclosure provides a method for processing meeting minutes, the method comprising:
acquiring the meeting text of the meeting audio and video;
inputting the meeting text into a to-do recognition model to determine an initial to-do statement;
inputting the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement;
determining a meeting to-do statement in the initial to-do statement based on the temporal result.
An embodiment of the present disclosure further provides a method for processing meeting minutes, the method comprising:
receiving a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement;
displaying the target minutes statement and the associated statements of the target minutes statement.
An embodiment of the present disclosure further provides an apparatus for processing meeting minutes, the apparatus comprising:
a text acquisition module, used to acquire the meeting text of the meeting audio and video;
an initial to-do module, used to input the meeting text into a to-do recognition model to determine an initial to-do statement;
a temporal judgment module, used to input the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement;
a meeting to-do module, used to determine a meeting to-do statement in the initial to-do statement based on the temporal result.
An embodiment of the present disclosure further provides an apparatus for processing meeting minutes, the apparatus comprising:
a display trigger module, used to receive a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement;
a display module, used to display the target minutes statement and the associated statements of the target minutes statement.
An embodiment of the present disclosure further provides an electronic device, comprising: a processor; a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method for processing meeting minutes provided by the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, the computer program being used to execute the method for processing meeting minutes provided by the embodiments of the present disclosure.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages. The meeting minutes processing solution provided by the embodiments of the present disclosure acquires the meeting text of the meeting audio and video; inputs the meeting text into a to-do recognition model to determine an initial to-do statement; inputs the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement; and determines a meeting to-do statement in the initial to-do statement based on the temporal result. With this technical solution, temporal judgment is added on top of the recognition of the meeting text of the meeting audio and video, which prevents statements that have already been completed from being recognized as meeting to-do statements, greatly improves the accuracy of determining meeting to-do statements, and thus improves the efficiency with which users work from meeting to-do statements and enhances the user experience.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a method for processing meeting minutes according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of another method for processing meeting minutes according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a meeting minutes display interface according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an apparatus for processing meeting minutes according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an apparatus for processing meeting minutes according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "including" and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not intended to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
After a meeting ends, the audio and video of the meeting can be converted into text through recognition processing. However, meeting texts are usually long, so quickly and correctly filtering out the statements that contain task intents is particularly important. Meeting content is often a record of discussions on one or more topics, eventually reaching conclusions to some degree or spawning many other topics. Moreover, many tasks to be completed are often assigned during a meeting, and the meeting text contains a large number of words; if the statements with a task-to-be-completed intent (todo) can be picked out, users can save a lot of time when organizing meeting minutes. A to-do statement can be one type of intent. However, the current determination of to-do statements suffers from low efficiency and low accuracy. To solve the above problems, an embodiment of the present disclosure provides a method for processing meeting minutes, which is introduced below with reference to specific embodiments.
FIG. 1 is a schematic flowchart of a method for processing meeting minutes according to an embodiment of the present disclosure. The method may be executed by an apparatus for processing meeting minutes, where the apparatus may be implemented by software and/or hardware and may generally be integrated into an electronic device. As shown in FIG. 1, the method includes:
Step 101: the processing apparatus acquires the meeting text of the meeting audio and video.
The meeting audio and video refer to the audio and/or video used to record a meeting. The meeting text refers to the text content obtained after speech recognition processing is performed on the meeting audio and video.
In the embodiment of the present disclosure, the processing apparatus may acquire meeting text that has already been obtained through audio and video processing, or it may acquire the meeting audio and video and obtain the meeting text by processing the meeting audio and video.
Step 102: the processing apparatus inputs the meeting text into a to-do recognition model to determine the initial to-do statement.
The to-do recognition model may be a pre-trained deep learning model used to recognize to-do intent statements in meeting text; the specific deep learning model used is not limited.
In the embodiment of the present disclosure, before performing step 102, the processing apparatus may also generate the to-do recognition model, which is generated as follows: training an initial single-classification model based on positive samples of to-do statements to obtain the to-do recognition model. Considering that negative samples are unbounded, the embodiment of the present disclosure takes a single-classification model as the to-do recognition model as an example. A single-classification model is a special kind of classification task model: its training samples carry only the positive-class label, while all other samples are assigned to another class. It can be understood as determining the boundary of the positive samples, with data outside the boundary assigned to the other class.
A positive sample of a to-do statement may be a sample that has been given a positive label, that is, a sample that has been determined to be a meeting to-do statement. The number of positive to-do statement samples is not limited and can be set according to the actual situation. Specifically, the processing apparatus may input the positive to-do statement samples into an initial single-classification model for model training, and the trained single-classification model is the to-do recognition model.
In the embodiment of the present disclosure, the processing apparatus inputting the meeting text into the to-do recognition model and determining the initial to-do statement may include: the processing apparatus converting the text sentences in the meeting text into sentence vectors, inputting the sentence vectors into the to-do recognition model, and determining the initial to-do statement. The text sentences are obtained by sentence segmentation or division of the meeting text, and there may be multiple text sentences.
The processing apparatus may convert each text sentence included in the meeting text into a sentence vector through an embedding layer, input each sentence vector into the pre-trained to-do recognition model to predict the classification result of to-do statements, and determine the statements with a return value as initial to-do statements. Since the to-do recognition model is a single-classification model, it can be understood as performing classification by computing the radius and center of a sphere; the sphere is the boundary of the positive samples, and the space inside the sphere can represent the distribution space of positive to-do statement samples.
In the above solution, by using a single-classification model to recognize to-do statements in the meeting text, the processing apparatus reduces the amount of data required for deep learning model training, improves model training efficiency, and improves recognition accuracy.
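The single-classification idea described above — a sphere whose center and radius are fit to the positive to-do samples, with everything outside the boundary treated as non-to-do — can be sketched in a few lines of Python. The `embed` function below is a toy stand-in for the embedding layer, and the centroid-plus-maximum-distance fit is an illustrative assumption rather than the disclosure's actual trained model:

```python
import math

def embed(sentence, dim=16):
    """Toy stand-in for a real embedding layer: hash each word into a
    fixed-size vector. A production system would use a learned encoder."""
    vec = [0.0] * dim
    for word in sentence.split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def train_one_class(positive_vectors):
    """Fit the 'sphere' the text describes: a center (centroid of the
    positive to-do samples) and a radius that encloses all of them."""
    dim = len(positive_vectors[0])
    center = [sum(v[i] for v in positive_vectors) / len(positive_vectors)
              for i in range(dim)]
    radius = max(math.dist(v, center) for v in positive_vectors)
    return center, radius

def is_initial_todo(vector, center, radius):
    """A sentence vector inside the sphere is classified as a to-do."""
    return math.dist(vector, center) <= radius
```

Only positive samples are needed to fit the boundary, which is the practical appeal of the single-classification setup when negative to-do statements are unbounded.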
Step 103: the processing apparatus inputs the initial to-do statement into a temporal judgment model to determine the temporal result.
The temporal judgment model, similar in kind to the to-do recognition model above, refers to a pre-trained model used to perform further temporal judgment on the initial to-do statements recognized in the previous step; the specific deep learning model used is not limited. Tense is the form that characterizes behaviors, actions, and states under various temporal conditions. The temporal result may include past tense, present tense, and future tense; the past tense characterizes past time, the present tense characterizes the present time, and the future tense characterizes future time.
Specifically, after the processing apparatus determines the initial to-do statements by recognizing the meeting text with the to-do recognition model, it may input the initial to-do statements into the pre-trained temporal judgment model for further temporal judgment to determine the temporal result. The temporal judgment model may be a three-class classification model.
Step 104: the processing apparatus determines the meeting to-do statement in the initial to-do statements based on the temporal result.
A meeting to-do statement, as distinct from an initial to-do statement, refers to the finally determined statement with a to-do intent.
Specifically, determining the meeting to-do statement in the initial to-do statements based on the temporal result may include: determining the initial to-do statements whose temporal result is the future tense as meeting to-do statements. After the temporal result of each initial to-do statement is determined as described above, the processing apparatus may take the initial to-do statements whose temporal result is the future tense as meeting to-do statements, delete the initial to-do statements whose temporal results are past tense and present tense, and finally obtain the meeting to-do statements.
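A minimal sketch of the tense-filtering step: a stand-in tense classifier (crude keyword rules here, purely an assumption in place of the trained three-class model) followed by the future-tense filter that keeps only the meeting to-do statements:

```python
# label set the three-class temporal judgment model would emit
PAST, PRESENT, FUTURE = "past", "present", "future"

def classify_tense(sentence):
    """Placeholder for the trained three-class tense model: crude keyword
    rules stand in for the real classifier (an assumption for illustration)."""
    s = sentence.lower()
    if any(w in s for w in ("will", "going to", "tomorrow", "next week")):
        return FUTURE
    if any(w in s for w in ("did", "finished", "yesterday")):
        return PAST
    return PRESENT

def select_meeting_todos(initial_todos):
    """Keep only the initial to-do statements whose tense is future;
    past- and present-tense candidates are dropped, as in step 104."""
    return [s for s in initial_todos if classify_tense(s) == FUTURE]
```

This is the step that prevents a statement about already-completed work from surviving into the final meeting to-do list.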
In the embodiment of the present disclosure, the processing apparatus uses a deep learning model to recognize to-do intents in the meeting text, helping users organize the meeting to-do statements in the meeting minutes and improving users' work efficiency. Compared with traditional machine learning methods, the to-do recognition model adopts a single-classification model, which can greatly improve the judgment accuracy for negative samples; since negative samples of to-do intent statements are unbounded, the model's high judgment accuracy can greatly improve the user experience.
In the meeting minutes processing solution provided by the embodiment of the present disclosure, the processing apparatus acquires the meeting text of the meeting audio and video; inputs the meeting text into a to-do recognition model to determine the initial to-do statements; inputs the initial to-do statements into a temporal judgment model to determine the temporal results of the initial to-do statements; and determines the meeting to-do statements in the initial to-do statements based on the temporal results. With this technical solution, adding temporal judgment on top of the recognition of the meeting text of the meeting audio and video can prevent statements that have already been completed from being recognized as meeting to-do statements, greatly improving the accuracy of determining meeting to-do statements, which in turn improves the efficiency with which users work from meeting to-do statements and enhances the user experience.
In some embodiments, after acquiring the meeting text of the meeting audio and video, the method may further include: dividing the meeting text into sentences to obtain multiple text sentences; and preprocessing the text sentences based on set rules to filter the text sentences. Optionally, preprocessing the text sentences based on set rules includes: deleting text sentences lacking intent words; and/or deleting text sentences whose text length is less than a length threshold; and/or deleting text sentences lacking nouns.
The text sentences are obtained by sentence segmentation or division of the meeting text; specifically, the meeting text may be split according to punctuation, converting the meeting text into multiple text sentences. The set rules may be rules used to process the multiple text sentences and are not specifically limited; for example, a set rule may be deleting stop words and/or deleting repeated words.
In the embodiment of the present disclosure, dividing the meeting text into sentences can yield multiple text sentences; then each text sentence can be segmented into words to obtain a word segmentation result, and the text sentences can be preprocessed based on the set rules and the word segmentation results so as to filter them; the text sentences remaining after preprocessing are more likely to be to-do statements. Preprocessing the text sentences may include: examining the word segmentation result of each text sentence, judging whether it includes intent words and/or nouns, and deleting text sentences lacking intent words and/or nouns. An intent word is a pre-collected word that may carry a to-do intent; for example, if a text sentence includes the word "need to complete", it may have a to-do intent, and "need to complete" is an intent word. In the embodiment of the present disclosure, a lexicon may be set up to store multiple intent words and/or nouns for preprocessing.
And/or, preprocessing the text sentences may include: determining the text length of each text sentence, comparing each with a length threshold, and deleting text sentences whose text length is less than the length threshold. The length threshold is a preset sentence length value; a text sentence that is too short may not form a complete sentence, so overly short text sentences are deleted by setting the length threshold.
Optionally, preprocessing the text sentences based on set rules may include: performing sentence pattern matching on the text sentences based on set sentence patterns, and deleting text sentences that do not satisfy the set sentence patterns. A set sentence pattern can be understood as a pattern that is more likely to express a to-do intent; there may be multiple set sentence patterns. For example, a set pattern may be subject + preposition + time word + verb + object, corresponding to a sentence such as "Xiao Wang, finish the homework tomorrow", which is a to-do statement. Each text sentence is matched against the set sentence patterns, and text sentences that do not satisfy them are deleted.
In the embodiment of the present disclosure, after the meeting text is acquired, the text sentences included in the meeting text can be preprocessed based on multiple set rules. Since the set rules are related to to-do intents, the text sentences remaining after preprocessing are more likely to be to-do statements, which improves the efficiency and accuracy of the subsequent determination of to-do statements.
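The preprocessing rules just described can be sketched as a filter over segmented sentences. The lexicons and the length threshold below are illustrative assumptions (the disclosure leaves the actual word lists and threshold value unspecified), and `split()` stands in for real word segmentation:

```python
INTENT_WORDS = {"need", "complete", "finish", "todo", "prepare"}  # assumed lexicon
NOUN_VOCAB = {"report", "homework", "slides", "plan", "demo"}     # assumed lexicon
MIN_LEN = 4  # length threshold in tokens (assumed value)

def preprocess(text_sentences):
    """Apply the set rules from the text: drop sentences that are shorter
    than the length threshold, lack an intent word, or lack a noun."""
    kept = []
    for sentence in text_sentences:
        tokens = sentence.lower().split()          # stand-in for word segmentation
        if len(tokens) < MIN_LEN:
            continue                               # too short to form a sentence
        if not INTENT_WORDS.intersection(tokens):
            continue                               # no intent word
        if not NOUN_VOCAB.intersection(tokens):
            continue                               # no noun
        kept.append(sentence)
    return kept
```

Because these rules run before the to-do recognition model, they cheaply shrink the candidate set that the model must score.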
FIG. 2 is a schematic flowchart of another method for processing meeting minutes according to an embodiment of the present disclosure. The method may be executed by an apparatus for processing meeting minutes, where the apparatus may be implemented by software and/or hardware and may generally be integrated into an electronic device. As shown in FIG. 2, the method includes:
Step 201: the processing apparatus receives a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement.
The meeting minutes display interface is an interface for displaying pre-generated meeting minutes. The meeting audio and video and the meeting text are displayed in different regions of the meeting minutes display interface; regions such as an audio/video region, a subtitle region, and a meeting minutes display region may be set in the meeting minutes display interface to respectively display meeting-related content such as the meeting audio and video, the meeting text of the meeting audio and video, and the meeting minutes. The display trigger operation is an operation for triggering the display of a meeting to-do statement in the meeting minutes; the specific manner is not limited. For example, the display trigger operation may be a click operation and/or a hover operation on a meeting to-do statement.
A minutes statement is a statement in the meeting minutes, displayed in the above meeting minutes display region. Minutes statements include meeting to-do statements; a meeting to-do statement is a minutes statement corresponding to one minutes type, namely the to-do statement determined in the above embodiments. The meeting minutes refer to the main content of the meeting generated by processing the meeting audio and video, and may be of multiple types. In the embodiment of the present disclosure, the meeting minutes may include at least one of types such as topics, agenda, discussion, conclusions, and to-dos, and meeting to-do statements are the statements under the to-do type.
In the embodiment of the present disclosure, when the user browses the content in the meeting minutes display interface, the client may receive the user's display trigger operation on one target minutes statement in the meeting minutes.
Exemplarily, FIG. 3 is a schematic diagram of a meeting minutes display interface according to an embodiment of the present disclosure. As shown in FIG. 3, a first region 11 of the meeting minutes display interface 10 displays the meeting minutes, the top of the first region 11 displays the meeting video, a second region 12 displays the meeting text, and the bottom of the meeting minutes display interface 10 may display the meeting audio, which may specifically include the timeline of the meeting audio. FIG. 3 shows five types of meeting minutes: topics, agenda, discussion, conclusions, and to-dos, where the to-do type includes three meeting to-do statements. The arrow in FIG. 3 may represent the display trigger operation on the first meeting to-do statement.
The meeting text in FIG. 3 may be divided into subtitle segments based on the different users participating in the meeting; the figure shows subtitle segments for three users, namely User 1, User 2, and User 3. In FIG. 3, the top of the meeting minutes display interface 10 also shows the meeting title "Team Retrospective Meeting" and meeting-related content: "2019.12.20 10:00 AM" indicates the meeting start time, "1h30m30s" indicates that the meeting duration is 1 hour, 30 minutes, and 30 seconds, and "16" indicates the number of participants. It should be understood that the meeting minutes display interface 10 in FIG. 3 is only an example, the positions of the contents included therein are also an example, and the specific positions and display manners can be set according to the actual situation.
Step 202: the processing apparatus displays the target minutes statement and the associated statements of the target minutes statement.
The associated statements are included in the meeting text and are subtitle sentences that have a positional association with the target minutes statement. The number of associated statements can be set according to the actual situation; for example, the associated statements may be the two subtitle sentences before and after the position of the target minutes statement in the meeting text, i.e., the number may be 2. A subtitle sentence may be a constituent unit of the meeting text, obtained by dividing the meeting text into sentences; the meeting text may include multiple subtitle sentences, and the specific number is not limited.
In the embodiment of the present disclosure, displaying the target minutes statement and the associated statements of the target minutes statement may include: displaying the target minutes statement and the associated statements of the target minutes statement in a floating window in the meeting minutes display interface. The floating window may be presented within a region of the meeting minutes display interface, and its specific position can be set according to the actual situation; for example, the position of the floating window may be any position that does not block the current target minutes statement.
After receiving the display trigger operation on the target minutes statement, the processing apparatus may present a floating window to the user and present the target minutes statement and its associated statements in the floating window. In the embodiment of the present disclosure, by presenting the target minutes statement together with several sentences before and after it, the difficulty users may have in understanding a target minutes statement presented alone is avoided, making it easier for users to understand the content and improving the display effect of minutes statements.
Exemplarily, referring to FIG. 3, the first underlined meeting to-do statement under the to-do type in the meeting minutes of the first region 11 is the target meeting to-do statement. After the display of the target to-do statement is triggered, the target meeting to-do statement and its associated statements are displayed in the floating window 13; the associated statements displayed in the floating window 13 in the figure are one sentence above and one sentence below the target meeting to-do statement.
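The "positional association" described above — the subtitle sentences immediately before and after the target's position in the meeting text — can be sketched as a simple slice. The parameter `k` (how many neighbors on each side) is an assumed, configurable count; the text's example uses one sentence on each side:

```python
def associated_sentences(subtitles, target_index, k=1):
    """Return the k subtitle sentences before and the k after the target's
    position in the meeting text (the positional association above)."""
    lo = max(0, target_index - k)  # clamp at the start of the meeting text
    before = subtitles[lo:target_index]
    after = subtitles[target_index + 1:target_index + 1 + k]
    return before + after
```

The floating window would then render the target statement together with this list.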
In some embodiments, the method for processing meeting minutes may further include: playing the meeting audio and video based on the associated time period of the target minutes statement, and highlighting the associated subtitles of the target minutes statement in the meeting text. The associated subtitles of the target minutes statement refer to the subtitles corresponding to the target minutes statement in the subtitle text, and the associated time period of the target minutes statement refers to the time period in the meeting audio and video of the original meeting speech corresponding to the associated subtitles; the associated time period may include a start time and an end time.
After receiving the user's display trigger operation on the target minutes statement, the processing apparatus may also start playing the meeting audio and video at the start time of the associated time period of the target minutes statement and stop playing at the end time; jump the meeting text to the position of the associated subtitles of the target minutes statement, and highlight the associated subtitles of the target minutes statement in a set manner. Optionally, the set manner may be any feasible display manner that can be distinguished from the rest of the meeting text, for example, including but not limited to at least one of highlighting, bolding, and underlining.
In the above solution, by triggering interaction with a minutes statement in the meeting minutes display interface, the user can achieve associated interaction with the relevant content in the meeting audio and video and the meeting text, which improves the user's interactive experience. Through the associated interaction among the minutes statement, the meeting audio and video, and the meeting text, the user gains an intuitive understanding of the relationship among the three, which further helps the user accurately understand the meeting content.
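One way to sketch the association data behind this interaction: each minutes statement carries its associated time period and the positions of its associated subtitles, so a single display trigger can derive both the playback span and the subtitle lines to highlight. The record shape below is an assumption for illustration, not the disclosure's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class MinutesSentence:
    """Illustrative record (assumed shape) linking a minutes statement to
    its associated subtitles and time span in the meeting audio/video."""
    text: str
    subtitle_indices: list = field(default_factory=list)  # positions in meeting text
    start_ms: int = 0                                     # associated period start
    end_ms: int = 0                                       # associated period end

def associated_playback(sentence, subtitles):
    """For a triggered minutes statement, return the span to play in the
    meeting audio/video and the subtitle lines to highlight."""
    span = (sentence.start_ms, sentence.end_ms)
    highlighted = [subtitles[i] for i in sentence.subtitle_indices]
    return span, highlighted
```

A player component would seek to `span[0]`, stop at `span[1]`, and scroll the subtitle region to the highlighted lines.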
It should be understood that, provided there is no contradiction, the steps and features in the embodiments of the present disclosure can be superimposed and combined with other embodiments of the present disclosure (including but not limited to the embodiment shown in FIG. 1 and the specific implementation means of the embodiments).
In the meeting minutes processing solution provided by the embodiment of the present disclosure, the processing apparatus receives a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement; and displays the target minutes statement and the associated statements of the target minutes statement. With this technical solution, after more accurate minutes statements are determined, when the processing apparatus receives the user's trigger on one of the minutes statements, it can present that minutes statement together with several sentences before and after it, avoiding the problem that a minutes statement presented alone is hard for users to understand, making it more convenient for users to understand the content, improving the display effect of minutes statements, and thus enhancing the user experience.
FIG. 4 is a schematic structural diagram of an apparatus for processing meeting minutes according to an embodiment of the present disclosure. The apparatus may be implemented by software and/or hardware and may generally be integrated into an electronic device. As shown in FIG. 4, the apparatus includes:
a text acquisition module 401, used to acquire the meeting text of the meeting audio and video;
an initial to-do module 402, used to input the meeting text into a to-do recognition model to determine an initial to-do statement;
a temporal judgment module 403, used to input the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement;
a meeting to-do module 404, used to determine a meeting to-do statement in the initial to-do statement based on the temporal result.
Optionally, the initial to-do module 402 is specifically used for:
converting the text sentences in the meeting text into sentence vectors, and inputting the sentence vectors into the to-do recognition model to determine the initial to-do statement, wherein the to-do recognition model is a single-classification model.
Optionally, the apparatus further includes a model training module, specifically used for:
training an initial single-classification model based on positive samples of to-do statements to obtain the to-do recognition model.
Optionally, the meeting to-do module 404 is specifically used for:
determining the initial to-do statements whose temporal result is the future tense as meeting to-do statements.
Optionally, the apparatus further includes a preprocessing module, used for: after the meeting text of the meeting audio and video is acquired,
dividing the meeting text into sentences to obtain multiple text sentences;
preprocessing the text sentences based on set rules to filter the text sentences.
Optionally, the preprocessing module is specifically used for:
deleting text sentences lacking intent words; and/or,
deleting text sentences whose text length is less than a length threshold; and/or,
deleting text sentences lacking nouns.
Optionally, the preprocessing module is specifically used for:
performing sentence pattern matching on the text sentences based on set sentence patterns, and deleting text sentences that do not satisfy the set sentence patterns.
The apparatus for processing meeting minutes provided by the embodiment of the present disclosure, through the cooperation of its modules, acquires the meeting text of the meeting audio and video; inputs the meeting text into a to-do recognition model to determine the initial to-do statement; inputs the initial to-do statement into a temporal judgment model to determine the temporal result of the initial to-do statement; and determines the meeting to-do statement in the initial to-do statement based on the temporal result. With this technical solution, adding temporal judgment on top of the recognition of the meeting text of the meeting audio and video can prevent statements that have already been completed from being recognized as meeting to-do statements, greatly improving the accuracy of determining meeting to-do statements, which in turn improves the efficiency with which users work from meeting to-do statements and enhances the user experience.
FIG. 5 is a schematic structural diagram of an apparatus for processing meeting minutes according to an embodiment of the present disclosure. The apparatus may be implemented by software and/or hardware and may generally be integrated into an electronic device. As shown in FIG. 5, the apparatus includes:
a display trigger module 501, used to receive a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement;
a display module 502, used to display the target minutes statement and the associated statements of the target minutes statement.
Optionally, the associated statements include subtitle sentences in the meeting text that have a positional association with the target minutes statement; the meeting text includes multiple subtitle sentences, and the target minutes statement includes a target meeting to-do statement.
Optionally, the display module 502 is specifically used for:
displaying the target minutes statement and the associated statements of the target minutes statement in a floating window in the meeting minutes display interface.
Optionally, the apparatus further includes an associated interaction module, used for:
playing the meeting audio and video based on the associated time period of the target minutes statement, and highlighting the associated subtitles of the target minutes statement in the meeting text.
The apparatus for processing meeting minutes provided by the embodiment of the present disclosure, through the cooperation of its modules, receives a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement, and displays the target minutes statement and the associated statements of the target minutes statement. With this technical solution, after more accurate minutes statements are determined, when the user's trigger on one of the minutes statements is received, that minutes statement and several sentences before and after it can be presented, which avoids the difficulty users may have in understanding a minutes statement presented alone, makes it more convenient for users to understand the content, improves the display effect of minutes statements, and thus enhances the user experience.
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring specifically to FIG. 6 below, it shows a schematic structural diagram of an electronic device 600 suitable for implementing an embodiment of the present disclosure. The electronic device 600 in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (for example, a car navigation terminal), as well as stationary terminals such as a digital TV and a desktop computer. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows the electronic device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-mentioned functions defined in the method for processing meeting minutes of the embodiment of the present disclosure are executed.
It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the client and the server may communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the above one or more programs are executed by the electronic device, the electronic device: acquires the meeting text of the meeting audio and video; inputs the meeting text into a to-do recognition model to determine the initial to-do statement; inputs the initial to-do statement into a temporal judgment model to determine the temporal result; and determines the meeting to-do statement in the initial to-do statement based on the temporal result.
Alternatively, the above computer-readable medium carries one or more programs, and when the above one or more programs are executed by the electronic device, the electronic device: receives a user's display trigger operation on a target minutes statement in a meeting minutes display interface, wherein the meeting minutes display interface displays the meeting audio and video, the meeting text of the meeting audio and video, and the target minutes statement; and displays the target minutes statement and the associated statements of the target minutes statement.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., connected through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself.
The functions described herein above may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a method for processing minutes of a meeting, comprising:
obtaining a meeting text of a meeting audio/video;
inputting the meeting text into a to-do recognition model to determine initial to-do sentences;
inputting the initial to-do sentences into a tense judgment model to determine tense results of the initial to-do sentences;
determining, based on the tense results, meeting to-do sentences among the initial to-do sentences.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, inputting the meeting text into the to-do recognition model to determine the initial to-do sentences comprises:
converting text sentences in the meeting text into sentence vectors, and inputting the sentence vectors into the to-do recognition model to determine the initial to-do sentences, wherein the to-do recognition model is a one-class classification model.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, the to-do recognition model is generated as follows:
training an initial one-class classification model on positive samples of to-do sentences to obtain the to-do recognition model.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, determining, based on the tense results, the meeting to-do sentences among the initial to-do sentences comprises:
determining initial to-do sentences whose tense result is the future tense as meeting to-do sentences.
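The steps above (sentence vectorization, a one-class to-do recognizer trained only on positive samples, then a future-tense filter) can be sketched as follows. Every concrete choice here is a hypothetical stand-in: the disclosure names no specific embedding, classifier, or tense model, so a toy token-set similarity plays the role of the sentence vector and one-class score, and a keyword check plays the role of the tense judgment model.

```python
# Minimal sketch of the claimed pipeline, under the stated assumptions.

TODO_POSITIVES = [          # positive samples of to-do sentences
    "i will send the report",
    "we will schedule a review",
]
FUTURE_MARKERS = ("will", "going to", "tomorrow", "next week")
THRESHOLD = 0.2             # assumed one-class acceptance threshold


def tokens(sentence: str) -> set[str]:
    """Toy stand-in for a sentence vector: the set of lowercased tokens."""
    return set(sentence.lower().split())


def todo_score(sentence: str) -> float:
    """One-class score: best Jaccard similarity to any positive sample."""
    t = tokens(sentence)
    return max(len(t & tokens(p)) / len(t | tokens(p)) for p in TODO_POSITIVES)


def is_future_tense(sentence: str) -> bool:
    """Toy tense judgment; a real system would use a trained model."""
    s = sentence.lower()
    return any(marker in s for marker in FUTURE_MARKERS)


def extract_meeting_todos(meeting_sentences: list[str]) -> list[str]:
    # Step 1: one-class recognition -> initial to-do sentences.
    initial = [s for s in meeting_sentences if todo_score(s) >= THRESHOLD]
    # Step 2: keep only sentences whose tense result is the future tense.
    return [s for s in initial if is_future_tense(s)]
```

With these assumptions, `extract_meeting_todos(["alice will send the report tomorrow", "we discussed the budget yesterday"])` keeps only the first sentence; the second falls below the one-class threshold and is not future tense.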
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, after obtaining the meeting text of the meeting audio/video, the method further comprises:
performing sentence segmentation on the meeting text to obtain a plurality of text sentences;
preprocessing the text sentences based on set rules so as to screen the text sentences.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, preprocessing the text sentences based on the set rules comprises:
deleting text sentences lacking an intent word; and/or
deleting text sentences whose text length is less than a length threshold; and/or
deleting text sentences lacking a noun.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, preprocessing the text sentences based on the set rules comprises:
performing sentence-pattern matching on the text sentences based on set sentence patterns, and deleting text sentences that do not satisfy the set sentence patterns.
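The screening rules above might be sketched as below. The word lists, length threshold, and sentence pattern are illustrative assumptions only; the disclosure leaves the concrete intent lexicon, the means of noun detection (for example, a POS tagger), and the set sentence patterns unspecified.

```python
import re

# Assumed stand-ins for the "set rules"; a production system would use
# a POS tagger for nouns and a learned intent lexicon.
INTENT_WORDS = {"will", "need", "plan", "should"}
KNOWN_NOUNS = {"report", "meeting", "review", "slides", "budget"}
MIN_TOKENS = 4  # assumed length threshold, measured in tokens
SET_PATTERN = re.compile(r"\b(will|need to|plan to|should)\b", re.I)


def preprocess(sentences: list[str]) -> list[str]:
    """Screen text sentences: each rule deletes non-conforming ones."""
    kept = []
    for s in sentences:
        words = set(s.lower().split())
        if not words & INTENT_WORDS:        # delete: lacks an intent word
            continue
        if len(s.split()) < MIN_TOKENS:     # delete: below length threshold
            continue
        if not words & KNOWN_NOUNS:         # delete: lacks a noun
            continue
        if not SET_PATTERN.search(s):       # delete: fails pattern matching
            continue
        kept.append(s)
    return kept
```

For example, `preprocess(["We will finish the report today", "Yes", "That will do", "The budget looks fine"])` retains only the first sentence: "Yes" and "The budget looks fine" lack an intent word, and "That will do" is below the length threshold.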
According to one or more embodiments of the present disclosure, there is provided a method for processing minutes of a meeting, comprising:
receiving a user's display-triggering operation on a target minute sentence in a meeting-minutes display interface, wherein the meeting-minutes display interface displays a meeting audio/video, a meeting text of the meeting audio/video, and the target minute sentence;
displaying the target minute sentence and associated sentences of the target minute sentence.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, the associated sentences comprise subtitle sentences in the meeting text that are positionally associated with the target minute sentence, the meeting text comprises a plurality of the subtitle sentences, and the target minute sentence comprises a target meeting to-do sentence.
According to one or more embodiments of the present disclosure, in the method for processing minutes of a meeting provided by the present disclosure, displaying the target minute sentence and the associated sentences of the target minute sentence comprises:
displaying the target minute sentence and the associated sentences of the target minute sentence in a floating window in the meeting-minutes display interface.
According to one or more embodiments of the present disclosure, the method for processing minutes of a meeting provided by the present disclosure further comprises:
playing the meeting audio/video based on an associated time period of the target minute sentence, and highlighting associated subtitles of the target minute sentence in the meeting text.
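One plausible way to realize the association between a minute sentence's time period and the subtitles to highlight is an interval-overlap query, sketched here with an assumed subtitle representation of (start, end, text) tuples; the disclosure does not fix how time periods or subtitles are stored.

```python
def associated_subtitles(subtitles, period):
    """Return subtitle lines whose time span overlaps the target minute
    sentence's associated time period, e.g. to highlight them while the
    meeting audio/video plays from that period."""
    start, end = period
    return [text for s, e, text in subtitles if s < end and e > start]


# Assumed subtitle data: (start_sec, end_sec, text), sorted by start time.
subtitles = [
    (0.0, 4.0, "welcome everyone"),
    (4.0, 9.0, "we will finish the report this week"),
    (9.0, 14.0, "any other business"),
]
```

For instance, `associated_subtitles(subtitles, (4.5, 8.0))` returns only the middle subtitle line, which would then be highlighted while playback jumps to that period.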
According to one or more embodiments of the present disclosure, there is provided an apparatus for processing minutes of a meeting, comprising:
a text obtaining module configured to obtain a meeting text of a meeting audio/video;
an initial to-do module configured to input the meeting text into a to-do recognition model to determine initial to-do sentences;
a tense judgment module configured to input the initial to-do sentences into a tense judgment model to determine tense results of the initial to-do sentences;
a meeting to-do module configured to determine, based on the tense results, meeting to-do sentences among the initial to-do sentences.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the initial to-do module is specifically configured to:
convert text sentences in the meeting text into sentence vectors, and input the sentence vectors into the to-do recognition model to determine the initial to-do sentences, wherein the to-do recognition model is a one-class classification model.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the apparatus further comprises a model training module specifically configured to:
train an initial one-class classification model on positive samples of to-do sentences to obtain the to-do recognition model.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the meeting to-do module is specifically configured to:
determine initial to-do sentences whose tense result is the future tense as meeting to-do sentences.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the apparatus further comprises a preprocessing module configured to, after the meeting text of the meeting audio/video is obtained:
perform sentence segmentation on the meeting text to obtain a plurality of text sentences;
preprocess the text sentences based on set rules so as to screen the text sentences.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the preprocessing module is specifically configured to:
delete text sentences lacking an intent word; and/or
delete text sentences whose text length is less than a length threshold; and/or
delete text sentences lacking a noun.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the preprocessing module is specifically configured to:
perform sentence-pattern matching on the text sentences based on set sentence patterns, and delete text sentences that do not satisfy the set sentence patterns.
According to one or more embodiments of the present disclosure, there is provided an apparatus for processing minutes of a meeting, comprising:
a display triggering module configured to receive a user's display-triggering operation on a target minute sentence in a meeting-minutes display interface, wherein the meeting-minutes display interface displays a meeting audio/video, a meeting text of the meeting audio/video, and the target minute sentence;
a display module configured to display the target minute sentence and associated sentences of the target minute sentence.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the associated sentences comprise subtitle sentences in the meeting text that are positionally associated with the target minute sentence, the meeting text comprises a plurality of the subtitle sentences, and the target minute sentence comprises a target meeting to-do sentence.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the display module is specifically configured to:
display the target minute sentence and the associated sentences of the target minute sentence in a floating window in the meeting-minutes display interface.
According to one or more embodiments of the present disclosure, in the apparatus for processing minutes of a meeting provided by the present disclosure, the apparatus further comprises an association interaction module configured to:
play the meeting audio/video based on an associated time period of the target minute sentence, and highlight associated subtitles of the target minute sentence in the meeting text.
According to one or more embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
the processor being configured to read the executable instructions from the memory and execute the instructions to implement any of the methods for processing minutes of a meeting provided by the present disclosure.
According to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium, the storage medium storing a computer program for performing any of the methods for processing minutes of a meeting provided by the present disclosure.
The above description is merely an illustration of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (15)

  1. A method for processing minutes of a meeting, characterized by comprising:
    obtaining a meeting text of a meeting audio/video;
    inputting the meeting text into a to-do recognition model to determine initial to-do sentences;
    inputting the initial to-do sentences into a tense judgment model to determine tense results of the initial to-do sentences;
    determining, based on the tense results, meeting to-do sentences among the initial to-do sentences.
  2. The method according to claim 1, characterized in that inputting the meeting text into the to-do recognition model to determine the initial to-do sentences comprises:
    converting text sentences in the meeting text into sentence vectors, and inputting the sentence vectors into the to-do recognition model to determine the initial to-do sentences, wherein the to-do recognition model is a one-class classification model.
  3. The method according to claim 1, characterized in that the to-do recognition model is generated as follows:
    training an initial one-class classification model on positive samples of to-do sentences to obtain the to-do recognition model.
  4. The method according to claim 1, characterized in that determining, based on the tense results, the meeting to-do sentences among the initial to-do sentences comprises:
    determining initial to-do sentences whose tense result is the future tense as the meeting to-do sentences.
  5. The method according to claim 1, characterized in that after obtaining the meeting text of the meeting audio/video, the method further comprises:
    performing sentence segmentation on the meeting text to obtain a plurality of text sentences;
    preprocessing the text sentences based on set rules so as to screen the text sentences.
  6. The method according to claim 5, characterized in that preprocessing the text sentences based on the set rules comprises:
    deleting text sentences lacking an intent word; and/or
    deleting text sentences whose text length is less than a length threshold; and/or
    deleting text sentences lacking a noun.
  7. The method according to claim 5, characterized in that preprocessing the text sentences based on the set rules comprises:
    performing sentence-pattern matching on the text sentences based on set sentence patterns, and deleting text sentences that do not satisfy the set sentence patterns.
  8. A method for processing minutes of a meeting, characterized by comprising:
    receiving a user's display-triggering operation on a target minute sentence in a meeting-minutes display interface, wherein the meeting-minutes display interface displays a meeting audio/video, a meeting text of the meeting audio/video, and the target minute sentence;
    displaying the target minute sentence and associated sentences of the target minute sentence.
  9. The method according to claim 8, characterized in that the associated sentences comprise subtitle sentences in the meeting text that are positionally associated with the target minute sentence, the meeting text comprises a plurality of the subtitle sentences, and the target minute sentence comprises a target meeting to-do sentence.
  10. The method according to claim 8, characterized in that displaying the target minute sentence and the associated sentences of the target minute sentence comprises:
    displaying the target minute sentence and the associated sentences of the target minute sentence in a floating window in the meeting-minutes display interface.
  11. The method according to claim 8, characterized by further comprising:
    playing the meeting audio/video based on an associated time period of the target minute sentence, and highlighting associated subtitles of the target minute sentence in the meeting text.
  12. An apparatus for processing minutes of a meeting, characterized by comprising:
    a text obtaining module configured to obtain a meeting text of a meeting audio/video;
    an initial to-do module configured to input the meeting text into a to-do recognition model to determine initial to-do sentences;
    a tense judgment module configured to input the initial to-do sentences into a tense judgment model to determine tense results of the initial to-do sentences;
    a meeting to-do module configured to determine, based on the tense results, meeting to-do sentences among the initial to-do sentences.
  13. An apparatus for processing minutes of a meeting, characterized by comprising:
    a display triggering module configured to receive a user's display-triggering operation on a target minute sentence in a meeting-minutes display interface, wherein the meeting-minutes display interface displays a meeting audio/video, a meeting text of the meeting audio/video, and the target minute sentence;
    a display module configured to display the target minute sentence and associated sentences of the target minute sentence.
  14. An electronic device, characterized by comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method for processing minutes of a meeting according to any one of claims 1-11.
  15. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the method for processing minutes of a meeting according to any one of claims 1-11.
PCT/CN2022/070282 2021-01-27 2022-01-05 Method, apparatus, device, and medium for processing minutes of a meeting WO2022161122A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023544227A JP2024506495A (ja) 2021-01-27 2022-01-05 Method, apparatus, device, and medium for processing meeting minutes
US18/262,400 US20240079002A1 (en) 2021-01-27 2022-01-05 Minutes of meeting processing method and apparatus, device, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110113700.1 2021-01-27
CN202110113700.1A CN113011169B (zh) 2021-01-27 2021-01-27 Method, apparatus, device, and medium for processing minutes of a meeting

Publications (1)

Publication Number Publication Date
WO2022161122A1 true WO2022161122A1 (zh) 2022-08-04

Family

ID=76384614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070282 WO2022161122A1 (zh) 2021-01-27 2022-01-05 Method, apparatus, device, and medium for processing minutes of a meeting

Country Status (4)

Country Link
US (1) US20240079002A1 (zh)
JP (1) JP2024506495A (zh)
CN (1) CN113011169B (zh)
WO (1) WO2022161122A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011169B (zh) * 2021-01-27 2022-11-11 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device, and medium for processing minutes of a meeting
CN114936001A (zh) * 2022-04-14 2022-08-23 Alibaba (China) Co., Ltd. Interaction method and apparatus, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080022209A1 (en) * 2006-07-19 2008-01-24 Lyle Ruthie D Dynamically controlling content and flow of an electronic meeting
CN102572372A (zh) * 2011-12-28 2012-07-11 ZTE Corporation Method and apparatus for extracting meeting minutes
CN110533382A (zh) * 2019-07-24 2019-12-03 Alibaba Group Holding Limited Meeting minutes processing method and apparatus, server, and readable storage medium
CN111739541A (zh) * 2019-03-19 2020-10-02 Shanghai Yunsi Smart Information Technology Co., Ltd. Voice-based meeting assistance method and system, storage medium, and terminal
CN112069800A (zh) * 2020-09-14 2020-12-11 Shenzhen Qianhai WeBank Co., Ltd. Sentence tense recognition method based on dependency syntax, device, and readable storage medium
CN113011169A (zh) * 2021-01-27 2021-06-22 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device, and medium for processing minutes of a meeting

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040064322A1 (en) * 2002-09-30 2004-04-01 Intel Corporation Automatic consolidation of voice enabled multi-user meeting minutes
US7298930B1 (en) * 2002-11-29 2007-11-20 Ricoh Company, Ltd. Multimodal access of meeting recordings
JP2006091938A (ja) * 2004-09-16 2006-04-06 Ricoh Co Ltd Electronic conference system
EP2566144B1 (en) * 2011-09-01 2017-05-03 BlackBerry Limited Conferenced voice to text transcription
TWI590240B (zh) * 2014-12-30 2017-07-01 Hon Hai Precision Industry Co., Ltd. Meeting minutes device and method thereof for automatically generating meeting minutes
TWI619115B (zh) * 2014-12-30 2018-03-21 Hon Hai Precision Industry Co., Ltd. Meeting minutes device and method thereof for automatically generating meeting minutes
CN104954151A (zh) * 2015-04-24 2015-09-30 Chengdu Tengyue Technology Co., Ltd. Meeting minutes extraction and pushing method based on network conference
CN107562723A (zh) * 2017-08-24 2018-01-09 NetEase Lede Technology Co., Ltd. Meeting processing method, medium, apparatus, and computing device
CN107733666A (zh) * 2017-10-31 2018-02-23 Gree Electric Appliances, Inc. of Zhuhai Meeting implementation method and apparatus, and electronic device
CN108366216A (zh) * 2018-02-28 2018-08-03 Shenzhen Aiying Hulian Culture Communication Co., Ltd. Meeting video recording, note-taking, and distribution method, apparatus, and server
JP6601545B2 (ja) * 2018-09-13 2019-11-06 Ricoh Co., Ltd. Support apparatus, support method, and program
CN110717031B (zh) * 2019-10-15 2021-05-18 Nanjing Shexing Intelligent Technology Co., Ltd. Intelligent meeting minutes generation method and system
CN111832308B (zh) * 2020-07-17 2023-09-08 AISpeech Co., Ltd. Speech recognition text coherence processing method and apparatus


Also Published As

Publication number Publication date
JP2024506495A (ja) 2024-02-14
US20240079002A1 (en) 2024-03-07
CN113011169B (zh) 2022-11-11
CN113011169A (zh) 2021-06-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22745000; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2023544227; Country of ref document: JP) (Ref document number: 18262400; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 22745000; Country of ref document: EP; Kind code of ref document: A1)