CN113326387B - Intelligent conference information retrieval method - Google Patents
- Publication number
- CN113326387B (application CN202110603641.6A)
- Authority
- CN
- China
- Prior art keywords
- information
- conference
- text
- video
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/483—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/438—Presentation of query results
Abstract
The invention discloses an intelligent conference information retrieval method, which relates to the technical field of conference recording and comprises the following steps: recording the conference information in a multimedia mode in real time throughout the conference, extracting the audio stream of the conference video content, sending the audio stream to a voice recognition module that converts the speech into text information, storing the text information, and marking it according to the conference progress time; the user then inputs text or voice information as a query, which is matched against the previously stored conference information, and the corresponding audio or video information is returned. Because the conference information is recorded in a multimedia mode and the retrieval query is matched against the conference information through multi-level processing, when a conference record is matched, the information on the timeline where the corresponding record is located is displayed and the audio of the matched conference speech is played at the same time, so that later analysis and understanding of the conference are more convenient and the conference record retrieval experience is greatly improved.
Description
Technical Field
The invention relates to the technical field of conference recording, in particular to an intelligent conference information retrieval method.
Background
As technology advances, many products that automatically record conference content have been launched, from the earliest voice recorders to automated speech-to-text equipment. These methods capture a great deal of content, often spanning several hours, which makes reviewing or retrieving a meeting record time-consuming and laborious. Some advanced products label conference participants by biometric features such as voiceprints and fingerprints and then quickly locate conference recording content through these labels, or even label recordings with geographic information and administrative levels. However, these products remain inconvenient: conference records cannot be queried and retrieved by content, the query modes are limited, and recordings can only be reviewed and listened to manually, without quick positioning.
Disclosure of Invention
The invention aims to provide an intelligent conference information retrieval method to overcome the defects in the prior art.
In order to achieve the above purpose, the invention provides the following technical scheme. An intelligent conference information retrieval method comprises the following steps:
step one, recording conference information in a multimedia mode in real time throughout the conference, including archiving the whole conference in the forms of video, audio, text and the like;
step two, extracting the audio stream of the conference video content: the audio stream is copied out of the media file or streaming-file container by demultiplexing (demux), so that the audio is separated from the video stream while the original video file remains unchanged, and the audio stream is sent to a voice recognition module, which converts the speech into text information and stores it;
step three, marking the video, audio and text of the conference record according to the conference progress time, using speech detection or silence detection as the basis for judging the start and end of each segment, and further combining NLP (natural language processing) context-judgment techniques, including but not limited to finer-grained SBD (Sentence Boundary Detection) and WS (Word Segmentation), to process the speech content by sentence or by word; marks are added by sentence and by word respectively, and the processed conference record content is stored;
step four, the user retrieves the conference record by inputting text information or voice information as a query; if voice information is received, it is converted into text through the voice-to-text module; the text is then matched against the previously stored conference information, and the corresponding audio or video information is returned, together with the text information converted from the speech;
step five, when the user views the returned result, the recorded content of the surrounding context can be retrieved quickly, i.e., the user can simultaneously view the conference information before and after the retrieved time period; the matched content is highlighted and presented to the user as text, audio or video information, and the user can intuitively locate, select and modify the corresponding content.
Preferably, in step one, if the conference is a network video conference, the conference information is obtained directly through the network; if it is a non-network conference, the conference is recorded through multimedia devices such as audio and video recorders, and then extracted and converted.
Preferably, the text information converted from speech in step two can also be used to display and record real-time conference subtitles while it is being stored.
Preferably, the time intervals marked in step three are delimited by sentences of speech content or by pauses in the audio.
Preferably, the marked Video Segments, Audio Segments and Text Segments in step three are stored in one-to-one correspondence through time-sequence lists: the video segments are recorded in time order in a list VSRL (Video Segments Recording List), the audio segments are recorded in time order in a list SSRL (Speech Segments Recording List), and the text segment information is recorded in time order in a list TSRL (Text Segments Recording List).
Preferably, the matching process in the fourth step includes the following steps:
step a, first-level text matching: the text information generated from the user's search is matched against the text information stored in the TSRL; if a match is found, the audio information of the corresponding time period is returned, and if corresponding video information exists, the video information of that time period is returned directly;
step b, second-level text matching: if the first level fails to match, the text information is reduced to a smaller granularity through SBD and matched again; if a match is found, the corresponding audio or video information is returned;
step c, third-level processing: if the second level fails to match, the information is decomposed into an even smaller granularity through WS and matched again; if a match is found, the corresponding audio or video information is returned; otherwise the query information cannot be matched.
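The three-level matching in steps a to c can be sketched as follows. This is an illustrative Python sketch, not part of the patent: the TSRL is assumed to be a list of (sequence number, text) pairs, and simple punctuation and whitespace splitting stand in for real SBD and WS (Chinese text would need a proper word segmenter).

```python
# Illustrative sketch of the three-level query matching (steps a-c).
# Entry layout (seq_no, text) and helper names are assumptions, not
# part of the patent; real systems would use NLP-based SBD and WS.
import re

def split_sentences(query):
    # Stand-in for SBD: split on common sentence-ending punctuation.
    return [s for s in re.split(r"[.!?;\u3002\uff1b]+", query) if s.strip()]

def split_words(query):
    # Stand-in for WS: whitespace tokenization.
    return [w for w in query.split() if w]

def match_query(query, tsrl):
    """Return sequence numbers of TSRL entries matching the query text."""
    # Level 1: match the whole query text against stored text segments.
    hits = [seq for seq, text in tsrl if query in text]
    if hits:
        return hits
    # Level 2: fall back to sentence-granularity fragments (SBD).
    for sentence in split_sentences(query):
        hits = [seq for seq, text in tsrl if sentence.strip() in text]
        if hits:
            return hits
    # Level 3: fall back to word-granularity fragments (WS).
    for word in split_words(query):
        hits = [seq for seq, text in tsrl if word in text]
        if hits:
            return hits
    return []  # the query cannot be matched
```

The returned sequence numbers can then be used to fetch the corresponding SSRL and VSRL entries for the matched time periods.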
In the technical scheme, the invention provides the following technical effects and advantages:
the invention records the conference information in a multimedia mode, marks and stores the video, audio and text recorded by the conference according to the conference proceeding time, the user searches and matches the text information, matches and inquires the searched information and the conference information through multi-stage processing, when the conference record is matched, the information of the time axis where the corresponding record is located is displayed, the user can select the text information through the interactive equipment, the corresponding audio is highlighted, and simultaneously the audio information which enables the user to intuitively hear the speaking of the current conference is played, the user can randomly select any paragraph in the text module, the corresponding audio or video can be synchronously positioned and played, otherwise, the user can quickly search the audio or video content, the corresponding text information can be immediately displayed, thereby the later analysis and understanding of the conference are more convenient, and the conference record searching experience is greatly improved.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of a query matching process of the present invention.
FIG. 3 is an exemplary diagram of an interaction interface when the present invention returns a result.
FIG. 4 is a diagram of another example of the interaction interface for the case where only audio and text information are returned according to the present invention.
FIG. 5 is an exemplary diagram of an interface for a user to select a query message status in the state of FIG. 4 according to the present invention.
Description of reference numerals:
A. a video information display module; B. a video information clip display module of a time axis; C. a text information display module; D. an audio information display module; E. and a time position display module.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the present invention will be further described in detail with reference to the accompanying drawings.
The invention provides an intelligent conference information retrieval method, which comprises the following steps:
step one, recording conference information in a multimedia mode in real time throughout the conference, archiving the whole conference in the forms of video, audio, text and the like; if the conference is a network video conference, the conference information is obtained directly through the network, and if it is a non-network conference, the conference is recorded through multimedia devices such as audio and video recorders and then extracted and converted;
step two, extracting the audio stream of the conference video content: the audio stream is copied out of the media file or streaming-file container by demultiplexing (demux), so that the audio is separated from the video stream while the original video file remains unchanged; the audio stream is sent to a voice recognition module, which converts the speech into text information and stores it, and the text can also be used to display and record real-time conference subtitles;
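Assuming ffmpeg as the demultiplexing tool (the patent does not name one), step two's audio extraction can be sketched as follows; `-vn -acodec copy` copies the audio stream out of the container without re-encoding, so the original video file remains unchanged. File and function names are placeholders.

```python
# Illustrative demux sketch for step two, assuming ffmpeg is available.
import subprocess

def build_demux_command(video_path, audio_path):
    # -vn drops the video stream; -acodec copy extracts the audio
    # stream as-is (demultiplexing, no re-encoding).
    return ["ffmpeg", "-i", video_path, "-vn", "-acodec", "copy", audio_path]

def extract_audio(video_path, audio_path):
    # The extracted audio file can then be fed to the voice recognition module.
    subprocess.run(build_demux_command(video_path, audio_path), check=True)
```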
step three, marking the video, audio and text of the conference record according to the conference progress time, using speech detection or silence detection as the basis for judging the start and end of each segment, with the marked time intervals delimited by sentences of speech content or by pauses in the audio; NLP (natural language processing) context-judgment techniques, including but not limited to finer-grained SBD (Sentence Boundary Detection) and WS (Word Segmentation), are further combined to process the speech content by sentence or by word, marks are added by sentence and by word respectively, and the processed conference record content is stored;
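The silence-detection basis for step three can be illustrated with a simple frame-energy threshold; the frame size, threshold and function name below are our own assumptions, and a production system would use a proper voice-activity-detection model.

```python
# Illustrative energy-based silence detection for segment start/end marks.
def segment_by_silence(samples, frame_size=4, threshold=0.1):
    """Split a mono sample list into (start, end) index ranges of speech."""
    segments, start = [], None
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:           # speech frame: open a segment
            if start is None:
                start = i
        elif start is not None:          # silence frame closes a segment
            segments.append((start, i))
            start = None
    if start is not None:                # speech ran to the end of the audio
        segments.append((start, len(samples)))
    return segments
```

Each returned range would then receive a mark and become one entry in the time-sequence lists described below.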
the marked Video Segments, audio Segments and Text fields are respectively stored in a one-to-one correspondence manner by setting a time sequence table, wherein the Video Segments are recorded in a List VSRL (Video Segments Recording List) according to the time sequence, the audio Segments are recorded in a List SSRL (speed Segments Recording List) according to the time sequence, the Text segment information is recorded in a List TSRL (Text Segments Recording List) according to the time sequence, and the structures of the VSRL, the SSRL and the TSRL are respectively shown in a table 1, a table 2 and a table 3:
Table 1. VSRL example
Sequence No. | Time Offset | Duration | SegmentsURL |
0 | 00:00:00.000 | 1000 | VS001.mp4 |
1 | 00:00:01.000 | 1000 | VS002.mp4 |
2 | 00:00:02.000 | 1500 | VS003.mp4 |
… | … | … | … |
Wherein:
Sequence No. represents the mark serial number; it is the unique key of the mark relation table and corresponds to the entries in the SSRL and TSRL;
Time Offset represents the offset of the current segment from the beginning of the entire video;
Duration represents the time length of the current segment in milliseconds (ms);
SegmentsURL indicates the URL of the video file storing the current segment; a streaming media player can play the corresponding video directly from this URL. In actual use, the address should be further encrypted to improve data security.
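As a small illustration of the Time Offset field, a helper (our own, not from the patent) can convert the "HH:MM:SS.mmm" strings in Table 1 into milliseconds for locating segments on the timeline:

```python
# Convert a Table 1 "HH:MM:SS.mmm" Time Offset into milliseconds.
def offset_to_ms(time_offset):
    hours, minutes, seconds = time_offset.split(":")
    return int((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)
```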
Table 2. SSRL example
Sequence No. | Time Offset | Duration | SegmentsURL |
0 | 00:00:00.000 | 1000 | SS001.wav |
1 | 00:00:01.000 | 1000 | SS002.wav |
2 | 00:00:02.000 | 1500 | SS003.wav |
… | … | … | … |
Wherein:
Sequence No. indicates the mark serial number, the same as in the VSRL;
Time Offset represents the offset of the current segment from the beginning of the entire recording;
Duration represents the time length of the current segment in milliseconds (ms);
SegmentsURL indicates the URL of the audio file storing the current segment; a streaming media player can play the corresponding audio directly from this URL. In actual use, the address should be further encrypted to improve data security.
Wherein Sequence No.(VSRL) = Sequence No.(SSRL) = Sequence No.(TSRL).
Table 3. TSRL example
Wherein:
Sequence No. indicates the mark serial number, the same as in the VSRL;
Original Language Code represents the language of the original text, expressed according to the ISO 639-1 standard, e.g., en for English, zh for Chinese;
Code Page represents the character set of the text encoding, e.g., 1209 for UTF-8 Unicode;
Characters represents the URL of the file where the text is stored.
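The one-to-one correspondence of the three lists through a shared Sequence No. can be sketched as follows; the class and function names are illustrative, and the fields mirror Tables 1 to 3 (for the TSRL, the URL field plays the role of Characters):

```python
# Illustrative sketch of the VSRL/SSRL/TSRL one-to-one correspondence,
# i.e. Sequence No.(VSRL) = Sequence No.(SSRL) = Sequence No.(TSRL).
from dataclasses import dataclass

@dataclass
class Segment:
    seq_no: int        # Sequence No., shared across the three lists
    time_offset: str   # "HH:MM:SS.mmm" offset from the start
    duration_ms: int   # segment length in milliseconds
    url: str           # SegmentsURL (or Characters file URL for TSRL)

def lookup(seq_no, vsrl, ssrl, tsrl):
    """Return the video, audio and text segments sharing one sequence number."""
    by_seq = lambda lst: {s.seq_no: s for s in lst}
    return (by_seq(vsrl).get(seq_no),
            by_seq(ssrl).get(seq_no),
            by_seq(tsrl).get(seq_no))
```

With this layout, a match found in the TSRL immediately yields the audio and video segments of the same time period via the shared sequence number.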
step four, the user retrieves the conference record by inputting text information or voice information as a query; if voice information is received, it is converted into text through the voice-to-text module; the text is then matched against the previously stored conference information, and the corresponding audio or video information is returned, together with the text information converted from the speech;
the text matching process comprises the following steps:
step a, first-level text matching: the text information generated from the user's search is matched against the text information stored in the TSRL; if a match is found, the audio information of the corresponding time period is returned, and if corresponding video information exists, the video information of that time period is returned directly;
step b, second-level text matching: if the first level fails to match, the text information is reduced to a smaller granularity through SBD and matched again; if a match is found, the corresponding audio or video information is returned;
step c, third-level processing: if the second level fails to match, the information is decomposed into an even smaller granularity through WS and matched again; if a match is found, the corresponding audio or video information is returned; otherwise the query information cannot be matched.
step five, when the user views the returned result, the recorded content of the surrounding context can be retrieved quickly, i.e., the user can simultaneously view the conference information before and after the retrieved time period; the matched content is highlighted and presented to the user as text, audio or video information, and the user can intuitively locate, select and modify the corresponding content.
In summary, the invention records the conference in a multimedia mode, archiving the whole conference as video, audio, text and the like; the audio stream is sent to a voice recognition module that converts the speech into text information; the video, audio and text of the conference record are marked according to the conference progress time and stored in one-to-one correspondence on the basis of these time marks. The user queries by inputting text or voice information; if voice information is received, it is converted into text through the voice-to-text module and matched against the conference information through multi-level processing. When a conference record is matched, the information on the corresponding timeline is displayed, including the surrounding paragraphs. The user can select text information through an interactive device such as a mouse or a touch screen; the selected text is highlighted, the corresponding audio is indicated as well, and the audio of the matched conference speech is played at the same time; if corresponding video information exists, it is also shown. The user can select any paragraph in the text module at will, and the corresponding audio or video is located and played synchronously; conversely, when the user browses the audio or video content, the corresponding text information is displayed immediately. Later analysis and understanding of the conference are therefore more convenient, and the conference record retrieval experience is greatly improved.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that the described embodiments may be modified in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and should not be construed as limiting the scope of the invention.
Claims (5)
1. An intelligent conference information retrieval method is characterized by comprising the following steps:
step one, recording conference information in a multimedia mode in real time throughout the conference, including archiving the whole conference in the forms of video, audio, text and the like;
step two, extracting the audio stream of the conference video content: the audio stream is copied out of the media file or streaming-file container by demultiplexing (demux), so that the audio is separated from the video stream while the original video file remains unchanged, and the audio stream is sent to a voice recognition module, which converts the speech into text information and stores it;
step three, marking the video, audio and text of the conference record according to the conference progress time, using speech detection or silence detection as the basis for judging the start and end of each segment, further combining NLP (natural language processing) context-judgment techniques, including but not limited to finer-grained SBD (Sentence Boundary Detection) and WS (Word Segmentation), to process the speech content by sentence or by word, and adding marks to the processed conference record content by sentence and by word and storing it;
step four, the user retrieves the conference record by inputting text information or voice information as a query; if voice information is received, it is converted into text through the voice-to-text module; the text is then matched against the previously stored conference information, and the corresponding audio or video information is returned, together with the text information converted from the speech, wherein the matching process comprises the following steps:
step a, first-level text matching: the text information generated from the user's search is matched against the text information stored in the TSRL; if a match is found, the audio information of the corresponding time period is returned, and if corresponding video information exists, the video information of that time period is returned directly;
step b, second-level text matching: if the first level fails to match, the text information is reduced to a smaller granularity through SBD and matched again; if a match is found, the corresponding audio or video information is returned;
step c, third-level processing: if the second level fails to match, the information is decomposed into an even smaller granularity through WS and matched again; if a match is found, the corresponding audio or video information is returned; otherwise the query information cannot be matched;
step five, when the user views the returned result, the recorded content of the surrounding context can be retrieved quickly, i.e., the user can simultaneously view the conference information before and after the retrieved time period; the matched content is highlighted and presented to the user as text, audio or video information, and the user can intuitively locate, select and modify the corresponding content.
2. The intelligent conference information retrieval method according to claim 1, wherein: in step one, if the conference is a network video conference, the conference information is obtained directly through the network, and if it is a non-network conference, the conference is recorded through multimedia devices such as audio and video recorders, and then extracted and converted.
3. The intelligent conference information retrieval method according to claim 1, wherein: the text information converted from speech in step two can also be used to display and record real-time conference subtitles while it is being stored.
4. The intelligent conference information retrieval method according to claim 1, wherein: the time intervals marked in step three are delimited by sentences of speech content or by pauses in the audio.
5. The intelligent conference information retrieval method according to claim 1, wherein: the marked Video Segments, Audio Segments and Text Segments in step three are stored in one-to-one correspondence through time-sequence lists, wherein the video segments are recorded in time order in a list VSRL (Video Segments Recording List), the audio segments are recorded in time order in a list SSRL (Speech Segments Recording List), and the text segment information is recorded in time order in a list TSRL (Text Segments Recording List).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110603641.6A CN113326387B (en) | 2021-05-31 | 2021-05-31 | Intelligent conference information retrieval method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113326387A CN113326387A (en) | 2021-08-31 |
CN113326387B (en) | 2022-12-13
Family
ID=77422786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110603641.6A Active CN113326387B (en) | 2021-05-31 | 2021-05-31 | Intelligent conference information retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113326387B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116193179A (en) * | 2021-11-26 | 2023-05-30 | 华为技术有限公司 | Conference recording method, terminal equipment and conference recording system |
CN114661943A (en) * | 2022-05-21 | 2022-06-24 | 中科云策(深圳)科技成果转化信息技术有限公司 | Conference information storage management system |
CN115828907B (en) * | 2023-02-16 | 2023-04-25 | 南昌航天广信科技有限责任公司 | Intelligent conference management method, system, readable storage medium and computer device |
CN116708055B (en) * | 2023-06-06 | 2024-02-20 | 深圳市艾姆诗电商股份有限公司 | Intelligent multimedia audiovisual image processing method, system and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105045828A (en) * | 2015-06-26 | 2015-11-11 | 徐信 | Retrieval system and method for accurate positioning of audio/video speech information |
CN108345679A (en) * | 2018-02-26 | 2018-07-31 | 科大讯飞股份有限公司 | A kind of audio and video search method, device, equipment and readable storage medium storing program for executing |
CN111814028A (en) * | 2020-09-14 | 2020-10-23 | 腾讯科技(深圳)有限公司 | Information searching method and device |
CN112765460A (en) * | 2021-01-08 | 2021-05-07 | 北京字跳网络技术有限公司 | Conference information query method, device, storage medium, terminal device and server |
CN112839195A (en) * | 2020-12-30 | 2021-05-25 | 深圳市皓丽智能科技有限公司 | Method and device for consulting meeting record, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||