CN113326387B - Intelligent conference information retrieval method - Google Patents

Info

Publication number
CN113326387B
CN113326387B CN202110603641.6A CN202110603641A CN113326387B CN 113326387 B CN113326387 B CN 113326387B CN 202110603641 A CN202110603641 A CN 202110603641A CN 113326387 B CN113326387 B CN 113326387B
Authority
CN
China
Prior art keywords
information
conference
text
video
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110603641.6A
Other languages
Chinese (zh)
Other versions
CN113326387A (en
Inventor
孟强祥
田俊麟
宋昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Introduction Of Chinese Technology Shenzhen Co ltd
Original Assignee
Introduction Of Chinese Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Introduction Of Chinese Technology Shenzhen Co ltd
Priority to CN202110603641.6A
Publication of CN113326387A
Application granted
Publication of CN113326387B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results

Abstract

The invention discloses an intelligent conference information retrieval method, which relates to the technical field of conference recording and comprises the following steps: recording conference information in a multimedia mode in real time throughout the conference; extracting the audio stream of the conference video content and sending it to a voice recognition module, which converts the speech into text information that is then stored; marking the records according to the conference time; accepting a query entered as text information or voice information; matching the query against the previously stored conference information; and returning the corresponding audio or video information. Because the conference information is recorded in a multimedia mode and the query is matched against it through multi-level processing, when a conference record is matched, the time-axis information of the matching record is displayed and the audio of what was said at that point in the conference is played, so that later analysis and understanding of the conference become more convenient and the experience of retrieving conference records is greatly improved.

Description

Intelligent conference information retrieval method
Technical Field
The invention relates to the technical field of conference recording, in particular to an intelligent conference information retrieval method.
Background
As technology advances, many products that automatically record conference content have been launched, from the earliest voice recorders to automated speech-to-text equipment. These recording methods capture a large amount of content, often lasting several hours, so reviewing or searching a conference record takes considerable time and effort. Some advanced products label conference participants according to biometric characteristics such as voiceprints and fingerprints, or even by geographic information and administrative level, and then use these labels to locate content in the recording quickly. Even so, they remain user-unfriendly: conference records cannot be queried and retrieved by content, there is only a single way to consult the records, namely reviewing and listening manually, and the desired position cannot be located quickly.
Disclosure of Invention
The invention aims to provide an intelligent conference information retrieval method to overcome the defects in the prior art.
In order to achieve the above purpose, the invention provides the following technical scheme: an intelligent conference information retrieval method comprising the following steps:
Step one, recording conference information in a multimedia mode in real time throughout the conference, including archiving the entire conference in video, audio, text and other forms;
Step two, extracting the audio stream of the conference video content: demultiplexing (demux) is used to copy the audio stream out of the container of the media file or stream file, thereby extracting it from the video stream while the original video file remains unchanged, and the audio stream is sent to a voice recognition module, which converts the speech into text information and stores it;
Step three, marking the video, audio and text of the conference record according to the conference time, using speech detection or silence detection to determine where each segment starts and ends, and further applying NLP (natural language processing) context judgment, according to SBD (sentence boundary detection) and the finer-grained WS (Word Segmentation), to process the spoken content in units of sentences or words, adding marks at the sentence level and the word level respectively and storing the processed conference record content;
Step four, the user searches the conference record by entering text information or voice information as a query; if voice information is received, it is converted into text by the voice-to-text module; the text is matched against the previously stored conference information, and the corresponding audio or video information is returned together with the text converted from the voice;
Step five, when the user views the returned result, the surrounding recorded content can be retrieved quickly, that is, the user can also view the conference information before and after the retrieved time period; the recorded content is presented to the user as highlighted text, audio or video information, and the user can intuitively locate, select and modify the corresponding content.
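Step two above can be illustrated with a minimal sketch. The method does not name a demultiplexing tool; the use of the ffmpeg command-line program here is an assumption, and the function name `build_demux_command` is hypothetical. The options `-vn` (omit video from the output) and `-acodec copy` (copy the audio stream without re-encoding) are standard ffmpeg options.

```python
from typing import List


def build_demux_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg argument list that copies the audio stream out of a container.

    Assumption: ffmpeg is the demux tool; the source only says "demux".
    """
    return [
        "ffmpeg",
        "-i", video_path,   # input container; it is only read, never modified
        "-vn",              # leave the video stream out of the output
        "-acodec", "copy",  # copy the audio stream as-is, without re-encoding
        audio_path,
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. The output file extension must name a container that can hold the copied audio codec.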
Preferably, in the first step, if the conference is a network video conference, the conference information is directly obtained through the network, and if the conference is a non-network conference, the conference is recorded through multimedia devices such as audio recording and video recording, and extraction and conversion are performed.
Preferably, the text information converted by the voice in the second step can be used for displaying and recording the real-time conference subtitles while being stored.
Preferably, the time intervals marked in step three are delimited by sentences containing spoken content or by pauses in the audio.
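As an illustration of pause-based marking, the following sketch splits a recording into speech segments wherever a long enough pause occurs. It assumes the audio has already been reduced to one energy value per fixed-length frame; the function name `split_on_silence`, the threshold and the minimum pause length are hypothetical choices, not details given by the method.

```python
from typing import List, Tuple


def split_on_silence(energies: List[float], frame_ms: int,
                     threshold: float, min_silence_frames: int) -> List[Tuple[int, int]]:
    """Return (offset_ms, duration_ms) for each run of speech frames.

    A segment is closed once `min_silence_frames` consecutive frames
    fall below `threshold`; shorter pauses stay inside the segment.
    """
    segments = []
    start = None     # index of the first frame of the open segment
    silent_run = 0   # consecutive quiet frames seen inside the open segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # first quiet frame closes the segment
                segments.append((start * frame_ms, (end - start) * frame_ms))
                start, silent_run = None, 0
    if start is not None:  # recording ended while a segment was still open
        end = len(energies) - silent_run
        segments.append((start * frame_ms, (end - start) * frame_ms))
    return segments
```

Each returned pair corresponds to the offset and duration of one marked segment, both in milliseconds.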
Preferably, the marked video segments, audio segments and text segments in step three are stored in one-to-one correspondence by means of time-ordered lists: the video segments are recorded in time order in the list VSRL (Video Segments Recording List), the audio segments in the list SSRL (Speech Segments Recording List), and the text segment information in the list TSRL (Text Segments Recording List).
Preferably, the matching process in step four includes the following steps:
Step a, first-level text matching: the text information generated from the user's search is matched against the text information stored in the TSRL; if a match is found, the audio information of the corresponding time period is returned, and if corresponding video information exists, the video information of the corresponding time period is returned directly;
Step b, second-level text matching: if the first level finds no match, the text information is reduced to a finer granularity through SBD and matched again; if a match is found, the corresponding audio or video information is returned;
Step c, third-level processing: if the second level finds no match, the information is decomposed into a still finer granularity through WS and matched again; if a match is found, the corresponding audio or video information is returned; otherwise, the query information is determined to be unmatchable.
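The three matching levels above can be sketched as a fallback cascade. This is only an illustration under stated assumptions: the TSRL is modelled as a dictionary from sequence number to stored text, matching is a case-insensitive substring test, and `split_sentences` and `split_words` are crude regular-expression stand-ins for real SBD and WS components (proper Chinese word segmentation in particular needs a dedicated segmenter). All function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Stand-in for SBD: split on common sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def split_words(text: str) -> List[str]:
    """Stand-in for WS: runs of word characters (no real segmentation)."""
    return re.findall(r"\w+", text)


def find(units: List[str], tsrl: Dict[int, str]) -> List[int]:
    """Sequence numbers of TSRL entries that contain any of the units."""
    return [seq for seq, text in sorted(tsrl.items())
            if any(u.lower() in text.lower() for u in units)]


def match_query(query: str, tsrl: Dict[int, str]) -> Tuple[int, List[int]]:
    """Try the whole text, then sentences, then words.

    Returns (level, sequence numbers); level 0 means nothing matched.
    """
    levels = [[query.strip()], split_sentences(query), split_words(query)]
    for level, units in enumerate(levels, start=1):
        hits = find([u for u in units if u], tsrl)
        if hits:
            return level, hits
    return 0, []
```

The returned sequence numbers are the keys that would then be used to look up the audio or video segments of the same time period.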
In the technical scheme, the invention provides the following technical effects and advantages:
The invention records the conference information in a multimedia mode and marks and stores the video, audio and text of the conference record according to the conference time. The user searches with text information, and the search information is matched against the conference information through multi-level processing. When a conference record is matched, the time-axis information of the matching record is displayed. The user can select text information through an interactive device; the corresponding audio is highlighted, and the audio of what was said at that point in the conference is played so that the user can hear it directly. The user can select any paragraph in the text module, and the corresponding audio or video is synchronously located and played; conversely, when the user seeks through the audio or video content, the corresponding text information is displayed immediately. Later analysis and understanding of the conference thus become more convenient, and the experience of retrieving conference records is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of a query matching process of the present invention.
FIG. 3 is an exemplary diagram of an interaction interface when the present invention returns a result.
FIG. 4 is another exemplary diagram of the interaction interface, for the case where only audio and text information are returned.
FIG. 5 is an exemplary diagram of the interface when the user selects query information in the state of FIG. 4.
Description of reference numerals:
A. video information display module; B. time-axis video clip display module; C. text information display module; D. audio information display module; E. time position display module.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the present invention will be further described in detail with reference to the accompanying drawings.
The invention provides an intelligent conference information retrieval method, which comprises the following steps:
Step one, recording conference information in a multimedia mode in real time throughout the conference, including archiving the entire conference in video, audio, text and other forms; if the conference is a network video conference, the conference information is acquired directly through the network, and if the conference is a non-network conference, it is recorded with multimedia equipment such as audio and video recorders and then extracted and converted;
Step two, extracting the audio stream of the conference video content: demultiplexing (demux) is used to copy the audio stream out of the container of the media file or stream file, thereby extracting it from the video stream while the original video file remains unchanged; the audio stream is sent to a voice recognition module, which converts the speech into text information and stores it, and the text information can also be used to display and record real-time conference subtitles;
Step three, marking the video, audio and text of the conference record according to the conference time: speech detection or silence detection is used to determine where each segment starts and ends, and the time intervals are delimited by sentences containing spoken content or by pauses in the audio; NLP (natural language processing) context judgment techniques, including but not limited to SBD (sentence boundary detection) and the finer-grained WS (Word Segmentation), are further applied to process the spoken content in units of sentences or words, and marks are added at the sentence level and the word level respectively before the processed conference record content is stored;
The marked video segments, audio segments and text segments are stored in one-to-one correspondence by means of time-ordered lists: the video segments are recorded in time order in the list VSRL (Video Segments Recording List), the audio segments in the list SSRL (Speech Segments Recording List), and the text segment information in the list TSRL (Text Segments Recording List); the structures of the VSRL, the SSRL and the TSRL are shown in Table 1, Table 2 and Table 3 respectively:
Table 1. VSRL example
Sequence No.   Time Offset    Duration   SegmentsURL
0              00:00:00.000   1000       VS001.mp4
1              00:00:01.000   1000       VS002.mp4
2              00:00:02.000   1500       VS003.mp4
Wherein:
Sequence No. is the mark serial number; it is unique, similar to a key value in a relational table, and corresponds to the entries with the same number in the SSRL and the TSRL;
Time Offset represents the offset within the entire video, measured from the beginning of the recording to the start of the current segment;
Duration represents the length of the current segment in milliseconds (ms);
SegmentsURL represents the URL of the video file in which the current segment is stored; a streaming media player can play the corresponding video directly from this URL; in actual use, the address should additionally be encrypted to improve data security.
Table 2. SSRL example
Sequence No.   Time Offset    Duration   SegmentsURL
0              00:00:00.000   1000       SS001.wav
1              00:00:01.000   1000       SS002.wav
2              00:00:02.000   1500       SS003.wav
Wherein:
Sequence No. is the mark serial number, the same as in the VSRL;
Time Offset represents the offset within the entire video, measured from the beginning of the recording to the start of the current segment;
Duration represents the length of the current segment in milliseconds (ms);
SegmentsURL represents the URL of the audio file in which the current segment is stored; a streaming media player can play the corresponding audio directly from this URL; in actual use, the address should additionally be encrypted to improve data security.
Wherein
$\text{Sequence No.}_{\mathrm{VSRL}} = \text{Sequence No.}_{\mathrm{SSRL}} = \text{Sequence No.}_{\mathrm{TSRL}}$
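Table 3. TSRL example
[Table provided as an image in the original publication (Figure BDA0003093494680000061); its columns are Sequence No., Original Language Code, Code Page and Characters.]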
Wherein:
Sequence No. is the mark serial number, the same as in the VSRL;
Original Language Code represents the language of the original text, expressed according to the ISO 639-1 standard, e.g. en for English and zh for Chinese;
Code Page represents the character set used to encode the text, e.g. 1209 for UTF-8 Unicode;
Characters represents the URL of the file in which the text is stored.
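The three lists and their shared key can be modelled with a few small data structures. The following is a sketch under assumptions rather than the storage format of the method: the class and function names (`MediaSegment`, `TextSegment`, `offset_to_ms`, `media_for`) are hypothetical, each list is held as a dictionary keyed by Sequence No., and the lookup prefers video and falls back to audio, in line with the matching description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MediaSegment:
    """One row of the VSRL or the SSRL."""
    sequence_no: int
    time_offset: str    # "HH:MM:SS.mmm" from the start of the recording
    duration_ms: int
    segments_url: str


@dataclass
class TextSegment:
    """One row of the TSRL."""
    sequence_no: int
    language_code: str  # ISO 639-1 code such as "en" or "zh"
    code_page: int
    characters_url: str


def offset_to_ms(time_offset: str) -> int:
    """Convert an "HH:MM:SS.mmm" offset to milliseconds."""
    hours, minutes, seconds = time_offset.split(":")
    whole, millis = seconds.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + int(millis)


def media_for(sequence_no: int,
              vsrl: Dict[int, MediaSegment],
              ssrl: Dict[int, MediaSegment]) -> Optional[MediaSegment]:
    """Return the video segment for a sequence number, else the audio segment."""
    return vsrl.get(sequence_no) or ssrl.get(sequence_no)
```

Because the same Sequence No. identifies the same time period in all three lists, a hit in the TSRL can be resolved to its media segment with a single dictionary lookup.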
Step four, the user searches the conference record by entering text information or voice information as a query; if voice information is received, it is converted into text by the voice-to-text module; the text is matched against the previously stored conference information, and the corresponding audio or video information is returned together with the text converted from the voice;
The text matching process comprises the following steps:
Step a, first-level text matching: the text information generated from the user's search is matched against the text information stored in the TSRL; if a match is found, the audio information of the corresponding time period is returned, and if corresponding video information exists, the video information of the corresponding time period is returned directly;
Step b, second-level text matching: if the first level finds no match, the text information is reduced to a finer granularity through SBD and matched again; if a match is found, the corresponding audio or video information is returned;
Step c, third-level processing: if the second level finds no match, the information is decomposed into a still finer granularity through WS and matched again; if a match is found, the corresponding audio or video information is returned; otherwise, the query information is determined to be unmatchable;
Step five, when the user views the returned result, the surrounding recorded content can be retrieved quickly, that is, the user can also view the conference information before and after the retrieved time period; the recorded content is presented to the user as highlighted text, audio or video information, and the user can intuitively locate, select and modify the corresponding content.
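The context display of step five amounts to selecting the sequence numbers around a matched segment. A minimal sketch follows; the function name `context_window` and the default of one segment on each side are hypothetical, since the method does not state how much context is shown.

```python
from typing import List


def context_window(hit: int, total: int, before: int = 1, after: int = 1) -> List[int]:
    """Sequence numbers to display around a hit, clipped to the recording bounds."""
    first = max(0, hit - before)
    last = min(total - 1, hit + after)
    return list(range(first, last + 1))
```

The interface would highlight the entry whose sequence number equals `hit` and show the remaining entries as surrounding context.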
In summary, the invention records the conference in a multimedia mode, archiving the entire conference in video, audio, text and other forms; sends the audio stream to a voice recognition module to convert the speech into text information; marks the video, audio and text of the conference record according to the conference time; and stores them in one-to-one correspondence on the basis of the time marks. A user queries by entering text information or voice information; if voice information is received, it is converted into text by the voice-to-text module, and the query is matched against the conference information through multi-level processing. When a conference record is matched, the time-axis information of the matching record is displayed, including the preceding and following paragraphs. The user can select text information through an interactive device such as a mouse or a touch screen; the selected text is highlighted, the corresponding audio is displayed as well, and the audio of what was said at that point in the conference is played so that the user can hear it directly; if corresponding video information was recorded, it is displayed too. The user can select any paragraph in the text module, and the corresponding audio or video is synchronously located and played; conversely, when the user seeks through the audio or video content, the corresponding text information is displayed immediately. Later analysis and understanding of the conference thus become more convenient, and the experience of retrieving conference records is greatly improved.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that the described embodiments may be modified in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and should not be construed as limiting the scope of the invention.

Claims (5)

1. An intelligent conference information retrieval method is characterized by comprising the following steps:
Step one, recording conference information in a multimedia mode in real time throughout the conference, including archiving the entire conference in video, audio, text and other forms;
Step two, extracting the audio stream of the conference video content: demultiplexing (demux) is used to copy the audio stream out of the container of the media file or stream file, thereby extracting it from the video stream while the original video file remains unchanged, and the audio stream is sent to a voice recognition module, which converts the speech into text information and stores it;
Step three, marking the video, audio and text of the conference record according to the conference time, using speech detection or silence detection to determine where each segment starts and ends, and further applying NLP (natural language processing) context judgment techniques, including but not limited to SBD (sentence boundary detection) and the finer-grained WS (Word Segmentation), to process the spoken content in units of sentences or words, adding marks to the processed conference record content at the sentence level and the word level and storing it;
Step four, the user searches the conference record by entering text information or voice information as a query; if voice information is received, it is converted into text by the voice-to-text module; the text is matched against the previously stored conference information, and the corresponding audio or video information is returned together with the text converted from the voice, wherein the matching process comprises the following steps:
Step a, first-level text matching: the text information generated from the user's search is matched against the text information stored in the TSRL; if a match is found, the audio information of the corresponding time period is returned, and if corresponding video information exists, the video information of the corresponding time period is returned directly;
Step b, second-level text matching: if the first level finds no match, the text information is reduced to a finer granularity through SBD and matched again; if a match is found, the corresponding audio or video information is returned;
Step c, third-level processing: if the second level finds no match, the information is decomposed into a still finer granularity through WS and matched again; if a match is found, the corresponding audio or video information is returned; otherwise, the query information is determined to be unmatchable;
Step five, when the user views the returned result, the surrounding recorded content can be retrieved quickly, that is, the user can also view the conference information before and after the retrieved time period; the recorded content is presented to the user as highlighted text, audio or video information, and the user can intuitively locate, select and modify the corresponding content.
2. The intelligent conference information retrieval method according to claim 1, wherein: in the first step, if the conference is a network video conference, the conference information is directly acquired through the network, and if the conference is a non-network conference, the conference is recorded through multimedia equipment such as audio recording and video recording, and extraction and conversion are performed.
3. The intelligent conference information retrieval method according to claim 1, wherein: and the text information converted by the voice in the second step can be used for displaying and recording the real-time conference subtitles while being stored.
4. The intelligent conference information retrieval method according to claim 1, wherein: the time intervals marked in step three are delimited by sentences containing spoken content or by pauses in the audio.
5. The intelligent conference information retrieval method according to claim 1, wherein: the marked Video Segments, audio Segments and Text Segments in the third step are respectively stored in a one-to-one correspondence way by setting a time sequence List, wherein the Video Segments are recorded in a List VSRL (Video Segments Recording List) according to the time sequence, the audio Segments are recorded in a List SSRL (Speech Segments Recording List) according to the time sequence, and the Text segment information is recorded in a List TSRL (Text Segments Recording List) according to the time sequence.
CN202110603641.6A 2021-05-31 2021-05-31 Intelligent conference information retrieval method Active CN113326387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110603641.6A CN113326387B (en) 2021-05-31 2021-05-31 Intelligent conference information retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110603641.6A CN113326387B (en) 2021-05-31 2021-05-31 Intelligent conference information retrieval method

Publications (2)

Publication Number Publication Date
CN113326387A 2021-08-31
CN113326387B 2022-12-13

Family

ID=77422786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110603641.6A Active CN113326387B (en) 2021-05-31 2021-05-31 Intelligent conference information retrieval method

Country Status (1)

Country Link
CN (1) CN113326387B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116193179A (en) * 2021-11-26 2023-05-30 华为技术有限公司 Conference recording method, terminal equipment and conference recording system
CN114661943A (en) * 2022-05-21 2022-06-24 中科云策(深圳)科技成果转化信息技术有限公司 Conference information storage management system
CN115828907B (en) * 2023-02-16 2023-04-25 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer device
CN116708055B (en) * 2023-06-06 2024-02-20 深圳市艾姆诗电商股份有限公司 Intelligent multimedia audiovisual image processing method, system and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045828A (en) * 2015-06-26 2015-11-11 徐信 Retrieval system and method for accurate positioning of audio/video speech information
CN108345679A (en) * 2018-02-26 2018-07-31 科大讯飞股份有限公司 A kind of audio and video search method, device, equipment and readable storage medium storing program for executing
CN111814028A (en) * 2020-09-14 2020-10-23 腾讯科技(深圳)有限公司 Information searching method and device
CN112765460A (en) * 2021-01-08 2021-05-07 北京字跳网络技术有限公司 Conference information query method, device, storage medium, terminal device and server
CN112839195A (en) * 2020-12-30 2021-05-25 深圳市皓丽智能科技有限公司 Method and device for consulting meeting record, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113326387A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN113326387B (en) Intelligent conference information retrieval method
CN101382937B (en) Multimedia resource processing method based on speech recognition and on-line teaching system thereof
US5649060A (en) Automatic indexing and aligning of audio and text using speech recognition
US6816858B1 (en) System, method and apparatus providing collateral information for a video/audio stream
US10225625B2 (en) Caption extraction and analysis
US9576581B2 (en) Metatagging of captions
US20090043581A1 (en) Methods and apparatus relating to searching of spoken audio data
US8965916B2 (en) Method and apparatus for providing media content
US20100299131A1 (en) Transcript alignment
JP2007519987A (en) Integrated analysis system and method for internal and external audiovisual data
NO325191B1 (en) Sociable multimedia stream
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
Bougrine et al. Toward a Web-based speech corpus for Algerian dialectal Arabic varieties
Kamabathula et al. Automated tagging to enable fine-grained browsing of lecture videos
Roy et al. Speaker identification based text to audio alignment for an audio retrieval system
CN102136001B (en) Multi-media information fuzzy search method
US6813624B1 (en) Method and apparatus for archival and retrieval of multiple data streams
CN106550268B (en) Video processing method and video processing device
Nouza et al. Making czech historical radio archive accessible and searchable for wide public
JP4140745B2 (en) How to add timing information to subtitles
CN109376145B (en) Method and device for establishing movie and television dialogue database and storage medium
GB2451938A (en) Methods and apparatus for searching of spoken audio data
KR20010037652A (en) Audio indexing system and method, and audio retrieval system and method
Clements et al. Phonetic searching of digital audio
Chaudhary et al. Keyword based indexing of a multimedia file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant