WO2020103447A1 - Link-type storage method and apparatus for video information, computer device and storage medium

Link-type storage method and apparatus for video information, computer device and storage medium

Info

Publication number
WO2020103447A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
video
speaker
stored
processed
Application number
PCT/CN2019/092636
Other languages
French (fr)
Chinese (zh)
Inventor
吴壮伟
Original Assignee
平安科技(深圳)有限公司
Priority date
2018-11-21
Application filed by 平安科技(深圳)有限公司
Publication of WO2020103447A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Definitions

  • This application relates to the field of computer technology, and in particular to a video information chain storage method, device, computer equipment, and storage medium.
  • The traditional processing method converts the format of the video file or reduces its resolution, compressing the file to reduce its size in bytes.
  • The existing video processing methods therefore cannot store a video file in a lightweight form.
  • Embodiments of the present application provide a video information chain storage method, device, computer equipment, and storage medium, which are intended to solve the prior-art problem that a video file cannot be stored in a lightweight form.
  • In a first aspect, an embodiment of the present application provides a video information chain storage method, which includes: acquiring a video file to be processed, and cutting it through a video cutting model to obtain multiple video segments to be stored; recognizing, according to a preset speech recognition model, the voice information in the obtained video segments to be stored to obtain text information corresponding to the speaker; intercepting view information corresponding to the text information from the video segment to be stored; and storing, according to the speaker corresponding to the text information, the obtained text information and view information in the linked list corresponding to that speaker in a preset database.
  • In a second aspect, an embodiment of the present application provides a video information chain storage device, which includes: a video file cutting unit, used to acquire a video file to be processed and cut it through a video cutting model to obtain multiple video segments to be stored; a voice information recognition unit, used to recognize, according to a preset speech recognition model, the voice information in the obtained video segments to be stored to obtain text information corresponding to the speaker; a view information acquisition unit, used to intercept view information corresponding to the text information from the video segment to be stored; and an information storage unit, used to store, according to the speaker corresponding to the text information, the obtained text information and view information in the linked list corresponding to that speaker in a preset database.
  • In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the video information chain storage method described in the first aspect above.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the video information chain storage method described in the first aspect above.
  • FIG. 1 is a schematic flowchart of a video information chain storage method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a sub-process of a video information chain storage method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another sub-process of a video information chain storage method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another sub-process of a video information chain storage method provided by an embodiment of this application.
  • FIG. 5 is another schematic flowchart of a video information chain storage method provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a video information chain storage device provided by an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a subunit of a video information chain storage device provided by an embodiment of this application.
  • FIG. 8 is a schematic block diagram of another subunit of a video information chain storage device provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of another subunit of a video information chain storage device provided by an embodiment of this application.
  • FIG. 10 is another schematic block diagram of a video information chain storage device provided by an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a video information chain storage method provided by an embodiment of the present application.
  • the video information chain storage method is applied to terminal devices with information storage functions, such as desktop computers, notebook computers, tablet computers, or mobile phones.
  • the method includes steps S110-S140.
  • S110 Obtain a video file to be processed, and cut the video file to be processed through a video cutting model to obtain multiple video segments to be stored.
  • The user inputs a to-be-processed video file into the user terminal; the video cutting model identifies and cuts the file to obtain multiple video segments to be stored.
  • The to-be-processed video file is a video file that the user inputs and that needs to be stored in a lightweight form.
  • The to-be-processed video file includes number information, a video timestamp, and speaker information.
  • The number information is the number that identifies the to-be-processed video file, that is, its ID; each to-be-processed video file has its own number information, and number information is never repeated across files.
  • The video timestamp marks the time of the to-be-processed video file and can be used to determine when the file was created.
  • The speaker information describes the speakers contained in the to-be-processed video file; one to-be-processed video file can contain one or more speakers.
  • The speaker information may contain only one speaker; if the to-be-processed video file is a face-to-face interview recording, the corresponding speaker information contains multiple speakers.
  • step S110 includes sub-steps S111 and S112.
  • The speaker switching time point is obtained through the video cutting model and the speaker information in the to-be-processed video file. Specifically, if the speaker information contains only one speaker, the file has no speaker switching time point and is not cut; if the speaker information contains multiple speakers, the file contains one or more speaker switching time points, which must be obtained using the video cutting model. The model recognizes the speakers in the to-be-processed video file to find each time point at which the video switches from one speaker to another.
  • The video cutting model contains the facial recognition results of all speakers, so performing face recognition on the to-be-processed video file through the model matches the speaker of the current picture.
  • When a speaker switch occurs between two pictures of the to-be-processed video file, the time at which the switch occurs is taken as the speaker switching time point.
  • S112 Cut the video file to be processed according to the time point when the speaker switches in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
  • According to the obtained speaker switching time points, the to-be-processed video file is cut into multiple video segments to be stored. Each video segment to be stored corresponds to one speaker and includes its corresponding time information in the to-be-processed video file.
  • the voice recognition model is a specific model for recognizing the voice information in the video file. Speech recognition models include acoustic models, speech feature dictionaries, and semantic analysis models.
  • step S120 includes sub-steps S121, S122, and S123.
  • the voice information is segmented according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information.
  • The voice information is a sentence spoken by the user.
  • The voice information received by the user terminal is composed of the phonemes of multiple character pronunciations; the phoneme of a character includes the frequency and timbre of that character's pronunciation.
  • The acoustic model contains the phonemes of all character pronunciations. By matching the voice information against all phonemes in the acoustic model, the phonemes of individual characters can be segmented, finally yielding the multiple phonemes contained in the voice information.
  • All phonemes can then be converted into pinyin information.
  • The phonetic feature dictionary contains the phoneme information corresponding to the pinyin of all characters.
  • By matching the phoneme of a single character against the phoneme information in the phonetic feature dictionary, that phoneme can be converted into the pinyin of the matching character, so that all phonemes contained in the voice information are converted into pinyin information.
  • The obtained pinyin information is semantically analyzed to convert it into text information.
  • The semantic analysis model contains the mapping relationship between pinyin information and text information. Through this mapping relationship, the obtained pinyin information can be semantically analyzed and converted into text information.
  • For example, the text information corresponding to the pinyin "hé, píng" in the semantic analysis model is "peace".
  • View information is captured at random from the video segment to be stored and serves as the view information corresponding to that segment. Since each video segment to be stored corresponds to one speaker, the captured view information corresponds to that speaker.
  • The view information may be a video or a picture. By using a video or a picture as the view information corresponding to the video segment to be stored, the view information of the speaker in that segment can be captured and saved.
  • the view information is a video
  • If the view information is a picture, a picture of the speaker is captured at random from the video segment to be stored as the view information corresponding to that segment.
  • S140 The obtained text information and view information are stored in the linked list corresponding to the speaker in the preset database according to the speaker corresponding to the text information.
  • the preset database is a database used to store data information.
  • the database contains multiple linked lists.
  • the linked list is a data storage unit that stores text information and view information contained in the video file to be processed according to the time axis.
  • the speaker corresponds to a linked list in the database.
  • the logical order of the data information stored in the linked list is implemented by the pointer linking order in the linked list.
  • The time information corresponding to the text information in the to-be-processed video file is used as the logical order of the linked list; that is, the text information and view information in the to-be-processed video file are stored in the linked list with the time information as the pointer linking order.
  • step S140 includes sub-steps S141, S142, and S143.
  • Each video segment to be stored contains its corresponding time information in the to-be-processed video file, and the text information corresponds one-to-one to the video segments to be stored; therefore, the time information of a piece of text information in the to-be-processed video file is obtained from the time information of its corresponding video segment to be stored.
  • For example, if the time information of a video segment to be stored in the to-be-processed video file is "1 minute 20 seconds to 3 minutes 10 seconds", then "1 minute 20 seconds to 3 minutes 10 seconds" is used as the time information of the corresponding text information in the to-be-processed video file.
  • The text information is stored in the linked list according to the time information of the text information and the corresponding speaker.
  • Each piece of text information corresponds to a speaker.
  • According to the speaker, the linked list corresponding to that speaker is obtained from the preset database; the text information is stored in that linked list with its time information as the logical order of the linked list, and the speaker corresponding to the text information is added to the stored text information.
  • step S150 is further included after step S140.
  • the index information corresponding to the text information is generated according to the number information of the video file to be processed and the video time stamp and stored in the database.
  • The index information corresponding to the text information can be generated according to the number information and video timestamp of the to-be-processed video file. One to-be-processed video file can correspond to one or more pieces of text information, so one or more pieces of index information are generated accordingly.
  • For example, if a piece of text information is the third text of the to-be-processed video file in Table 1, and its corresponding time information in the to-be-processed video file is "1 minute 20 seconds to 3 minutes 10 seconds", then the generated index information is "S10021-3, 2018.04.11-1 minutes 20 seconds to 3 minutes 10 seconds".
  • the text information and the view information are saved in the linked list to realize the lightweight storage of the video file to be processed without losing the video
  • the storage space required for the video file is greatly reduced, and very good results have been achieved in the actual application process.
  • An embodiment of the present application further provides a video information chain storage device, which is used to execute any embodiment of the foregoing video information chain storage method.
  • FIG. 6 is a schematic block diagram of a video information chain storage device provided by an embodiment of the present application.
  • the video information chain storage device can be configured in terminal devices such as desktop computers, notebook computers, tablet computers or mobile phones.
  • the video information chain storage device 100 includes a video file cutting unit 110, a voice information recognition unit 120, a view information acquisition unit 130, and an information storage unit 140.
  • the video file cutting unit 110 is used for obtaining a video file to be processed, and cutting the video file to be processed through a video cutting model to obtain multiple video segments to be stored.
  • the video file cutting unit 110 includes subunits: a switching time point acquiring unit 111 and a cutting processing unit 112.
  • the switching time point acquiring unit 111 is configured to obtain the speaker switching time point through the video cutting model and the speaker information in the to-be-processed video file.
  • the cutting processing unit 112 is configured to cut the video file to be processed according to the speaker switching time point in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
  • the voice information recognition unit 120 is configured to recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker.
  • the voice information recognition unit 120 includes subunits: a phoneme segmentation unit 121, a pinyin information acquisition unit 122, and a text information acquisition unit 123.
  • the phoneme segmentation unit 121 is configured to segment the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information.
  • the pinyin information acquisition unit 122 is configured to match the obtained phonemes according to the phonetic feature dictionary in the voice recognition model to convert all phonemes into pinyin information.
  • the text information acquiring unit 123 is configured to perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the speech recognition model to convert the pinyin information into text information.
  • the view information obtaining unit 130 is used to intercept the view information corresponding to the text information from the video segment to be stored.
  • the information storage unit 140 is configured to store the obtained text information and view information in a preset database in a linked list corresponding to the speaker according to the speaker corresponding to the text information.
  • the information storage unit 140 includes subunits: a time information acquisition unit 141, a text information storage unit 142, and a view information storage unit 143.
  • the time information obtaining unit 141 is used to obtain the time information corresponding to the text information in the to-be-processed video file.
  • the text information storage unit 142 is configured to store the text information in a linked list corresponding to the speaker based on the time information of the text information and the corresponding speaker.
  • the view information storage unit 143 is configured to insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.
  • the video information chain storage device 100 further includes a subunit: an index information storage unit 150.
  • the index information storage unit 150 is configured to generate index information corresponding to the text information according to the number information of the video file to be processed and the video time stamp and store it in the database.
  • the above-mentioned video information chain storage device may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 11.
  • FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the processor 502 can execute the video information chain storage method.
  • the processor 502 is used to provide computing and control capabilities and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503.
  • the network interface 505 is used for network communication, such as the transmission of data information.
  • FIG. 11 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • the processor 502 is used to run the computer program 5032 stored in the memory to implement the video information chain storage method of the present application.
  • the embodiment of the computer device shown in FIG. 11 does not constitute a limitation on the specific configuration of the computer device.
  • the computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 11, and details are not described herein again.
  • the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by the processor to implement the video information chain storage method of the embodiments of the present application.
  • the storage medium may be an internal storage unit of the foregoing device, such as a hard disk or a memory of the device.
  • the storage medium may also be an external storage device of the device, such as a plug-in hard disk equipped on the device, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card), etc.
  • the storage medium may also include both an internal storage unit of the device and an external storage device.
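
The per-speaker linked list of step S140 and the index information of step S150 can be illustrated with a minimal Python sketch. Everything named here (`Node`, `SpeakerList`, `store`, `make_index`, the in-memory `db` dictionary, and times expressed as seconds) is an assumption made for illustration and does not come from the application; the index format is inferred only from the single example string quoted in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    """One stored entry: text plus optional view data, keyed by time."""
    start: float                   # seconds from the start of the source video
    end: float
    text: str
    view: Optional[bytes] = None   # picture or clip bytes (assumed encoding)
    next: Optional["Node"] = None  # pointer that defines the logical order


class SpeakerList:
    """Singly linked list for one speaker, kept in ascending start time."""

    def __init__(self, speaker: str) -> None:
        self.speaker = speaker
        self.head: Optional[Node] = None

    def insert(self, node: Node) -> None:
        # New head when the list is empty or the node is the earliest entry.
        if self.head is None or node.start < self.head.start:
            node.next, self.head = self.head, node
            return
        # Otherwise walk to the last node that starts no later than the new one.
        cur = self.head
        while cur.next is not None and cur.next.start <= node.start:
            cur = cur.next
        node.next, cur.next = cur.next, node

    def __iter__(self) -> Iterator[Node]:
        cur = self.head
        while cur is not None:
            yield cur
            cur = cur.next


def store(db: Dict[str, SpeakerList], speaker: str, node: Node) -> None:
    """Look up (or create) the speaker's list and link the node into it."""
    db.setdefault(speaker, SpeakerList(speaker)).insert(node)


def make_index(file_id: str, ordinal: int, timestamp: str, time_range: str) -> str:
    """Build an index string shaped like the example in the description."""
    return f"{file_id}-{ordinal}, {timestamp}-{time_range}"


# Hypothetical usage: entries arrive out of order and are linked by time.
db: Dict[str, SpeakerList] = {}
store(db, "speaker_a", Node(190.0, 240.0, "closing remarks"))
store(db, "speaker_a", Node(0.0, 80.0, "opening remarks", view=b"<jpeg bytes>"))
store(db, "speaker_b", Node(80.0, 190.0, "reply"))
print([n.text for n in db["speaker_a"]])  # ['opening remarks', 'closing remarks']
```

Keeping the view data on the same node as the text is one possible reading of "inserting the view information into the stored text information"; a real implementation might instead store a reference to a separately saved picture or clip.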

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a link-type storage method and apparatus for video information, a computer device and a storage medium. The method comprises: acquiring a video file to be processed, and segmenting said video file by means of a video segmenting model to obtain a plurality of video segments to be stored; recognizing, according to a preset voice recognition model, voice information in the obtained plurality of video segments, to obtain text information corresponding to the speaker; clipping, from said video segments, view information corresponding to the text information; and storing, according to the speaker corresponding to the text information, the obtained text information and view information into a linked list corresponding to the speaker in a preset database.
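
The segmenting step can be sketched as follows, assuming the video segmenting model has already labelled sampled frames with a speaker identifier (the face-recognition part is not shown). The names `Segment` and `cut_by_speaker`, the `(timestamp, speaker)` input format, and the use of seconds are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the video
    end: float


def cut_by_speaker(frames: List[Tuple[float, str]], duration: float) -> List[Segment]:
    """Split a video into per-speaker segments.

    frames: (timestamp, speaker_id) pairs for sampled frames, in time order.
    A change of speaker between two consecutive frames is treated as a
    speaker switching time point.
    """
    segments: List[Segment] = []
    if not frames:
        return segments
    seg_start, current = 0.0, frames[0][1]
    for ts, speaker in frames[1:]:
        if speaker != current:  # speaker switch detected at this frame
            segments.append(Segment(current, seg_start, ts))
            seg_start, current = ts, speaker
    segments.append(Segment(current, seg_start, duration))
    return segments


# Hypothetical interview: A speaks, then B, then A again.
frames = [(0.0, "A"), (40.0, "A"), (80.0, "B"), (120.0, "B"), (190.0, "A")]
segments = cut_by_speaker(frames, duration=240.0)
for seg in segments:
    print(seg)
```

With a single speaker the function returns one segment covering the whole file, which corresponds to the case in which the file is not cut.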

Description

Video information chain storage method, device, computer equipment and storage medium
This application claims priority to Chinese patent application No. 201811389154.9, filed with the China Patent Office on November 21, 2018 and entitled "Video information chain storage method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a video information chain storage method, device, computer equipment, and storage medium.
Background
When saving a video file, it is necessary to ensure that important information in the video file is not omitted, so a large amount of memory space is required. The traditional processing methods convert the format of the video file or reduce its resolution, compressing the file to reduce its size in bytes. However, because the processed video file still needs a large amount of memory space to be stored, the problem that video files occupy a large amount of memory space is not fully solved. The existing video processing methods therefore cannot store a video file in a lightweight form.
Summary of the invention
Embodiments of the present application provide a video information chain storage method, device, computer equipment, and storage medium, which are intended to solve the prior-art problem that a video file cannot be stored in a lightweight form.
In a first aspect, an embodiment of the present application provides a video information chain storage method, which includes: acquiring a video file to be processed, and cutting it through a video cutting model to obtain multiple video segments to be stored; recognizing, according to a preset speech recognition model, the voice information in the obtained video segments to be stored to obtain text information corresponding to the speaker; intercepting view information corresponding to the text information from the video segment to be stored; and storing, according to the speaker corresponding to the text information, the obtained text information and view information in the linked list corresponding to that speaker in a preset database.
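
The recognition step chains three lookups: phonemes are matched to pinyin through a phonetic feature dictionary, and the pinyin sequence is mapped to text through a semantic analysis model. The toy sketch below illustrates only that chaining. The phoneme labels, both tables, and the function names are hypothetical, the acoustic segmentation of raw audio into phonemes is assumed to have already happened, and a real system would use trained models rather than exact-match dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical phonetic feature dictionary: phoneme label -> pinyin syllable.
PHONEME_TO_PINYIN: Dict[str, str] = {
    "ph_he2": "hé",
    "ph_ping2": "píng",
}

# Hypothetical semantic mapping: pinyin sequence -> text.
PINYIN_TO_TEXT: Dict[Tuple[str, ...], str] = {
    ("hé", "píng"): "和平",  # "peace"
}


def phonemes_to_pinyin(phonemes: List[str]) -> List[str]:
    """Match each phoneme against the dictionary to get its pinyin."""
    return [PHONEME_TO_PINYIN[p] for p in phonemes]


def pinyin_to_text(pinyin: List[str]) -> str:
    """Map the whole pinyin sequence to text information."""
    return PINYIN_TO_TEXT[tuple(pinyin)]


def recognize(phonemes: List[str]) -> str:
    """Phonemes -> pinyin -> text, in the order the method describes."""
    return pinyin_to_text(phonemes_to_pinyin(phonemes))


print(recognize(["ph_he2", "ph_ping2"]))  # 和平
```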
In a second aspect, an embodiment of the present application provides a video information chain storage device, which includes: a video file cutting unit, used to acquire a video file to be processed and cut it through a video cutting model to obtain multiple video segments to be stored; a voice information recognition unit, used to recognize, according to a preset speech recognition model, the voice information in the obtained video segments to be stored to obtain text information corresponding to the speaker; a view information acquisition unit, used to intercept view information corresponding to the text information from the video segment to be stored; and an information storage unit, used to store, according to the speaker corresponding to the text information, the obtained text information and view information in the linked list corresponding to that speaker in a preset database.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the video information chain storage method described in the first aspect above.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the video information chain storage method described in the first aspect above.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below illustrate some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a video information chain storage method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a sub-process of the video information chain storage method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another sub-process of the video information chain storage method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another sub-process of the video information chain storage method provided by an embodiment of the present application;
FIG. 5 is another schematic flowchart of the video information chain storage method provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a video information chain storage device provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of subunits of the video information chain storage device provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of other subunits of the video information chain storage device provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of other subunits of the video information chain storage device provided by an embodiment of the present application;
FIG. 10 is another schematic block diagram of the video information chain storage device provided by an embodiment of the present application;
FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the video information chain storage method provided by an embodiment of the present application. The method is applied in a terminal device with an information storage function, such as a desktop computer, a notebook computer, a tablet computer, or a mobile phone.
As shown in FIG. 1, the method includes steps S110 to S140.
S110. Acquire a to-be-processed video file, and cut it by means of a video cutting model to obtain a plurality of to-be-stored video segments.
The user inputs a to-be-processed video file into the user terminal, and the video cutting model identifies and cuts the file to obtain a plurality of to-be-stored video segments. The to-be-processed video file is the video file, input by the user, that is to be stored in lightweight form; it includes number information, a video timestamp, and speaker information. Specifically, the number information is the number used to identify the to-be-processed video file, that is, its ID; each to-be-processed video file has one number, and no two files share the same number. The video timestamp marks the time of the to-be-processed video file and determines its specific creation time. The speaker information describes the speakers contained in the to-be-processed video file; one file may contain one or more speakers.
For example, if the to-be-processed video file is a recording of a lesson, its speaker information contains only one speaker; if it is a recording of a face-to-face interview program, its speaker information contains multiple speakers.
For example, the information contained in one to-be-processed video file is shown in Table 1.
Number information | Video timestamp | Speaker information
S10021 | 2018.04.11 | AA, BB, CCC
Table 1
In an embodiment, as shown in FIG. 2, step S110 includes sub-steps S111 and S112.
S111. Obtain the speaker switching time points by means of the video cutting model and the speaker information in the to-be-processed video file.
Specifically, if the speaker information contains only one speaker, the to-be-processed video file has no speaker switching time point and is not cut. If the speaker information contains multiple speakers, the file contains one or more speaker switching time points, which are obtained with the video cutting model. The video cutting model contains the facial recognition results of all speakers; by performing facial recognition on the speakers in the to-be-processed video file, the model matches the speaker of the current picture and thereby finds the time points at which the file switches from one speaker to another. When the speaker in one picture differs from the speaker in the next picture, a speaker switch has occurred, and the time of the switch between the two pictures is taken as a speaker switching time point.
S112. Cut the to-be-processed video file at the speaker switching time points to obtain the to-be-stored video segment corresponding to each speaker.
The obtained speaker switching time points divide the to-be-processed video file into a plurality of to-be-stored video segments. Specifically, each to-be-stored video segment corresponds to one speaker, and each carries its corresponding time information within the to-be-processed video file.
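Sub-steps S111 and S112 can be illustrated with a minimal Python sketch. It assumes that facial recognition has already produced a list of (time in seconds, speaker label) pairs, one per sampled picture; that input format, and the names `Segment`, `find_switch_points`, and `cut_by_speaker`, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One to-be-stored video segment (hypothetical structure)."""
    speaker: str
    start: float  # seconds from the start of the to-be-processed file
    end: float


def find_switch_points(frames: List[Tuple[float, str]]) -> List[float]:
    """S111: times at which the recognized speaker differs from the previous picture."""
    return [t for (_, prev), (t, cur) in zip(frames, frames[1:]) if cur != prev]


def cut_by_speaker(frames: List[Tuple[float, str]], duration: float) -> List[Segment]:
    """S112: cut the file at every switch point; each segment has one speaker."""
    segments: List[Segment] = []
    if not frames:
        return segments
    start, speaker = frames[0]
    for t, cur in frames[1:]:
        if cur != speaker:
            # Close the running segment at the switch time and open a new one.
            segments.append(Segment(speaker, start, t))
            start, speaker = t, cur
    segments.append(Segment(speaker, start, duration))
    return segments


if __name__ == "__main__":
    labelled = [(0.0, "AA"), (1.0, "AA"), (2.0, "BB"), (3.0, "BB"), (4.0, "AA")]
    for seg in cut_by_speaker(labelled, duration=5.0):
        print(seg)
```

With a single speaker the list of switch points is empty and the whole file is returned as one segment, which matches the case in which the file is not cut.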
S120. Recognize, according to a preset speech recognition model, the voice information in the obtained to-be-stored video segments to obtain text information corresponding to each speaker.
The speech recognition model recognizes the voice information in each to-be-stored video segment and produces the corresponding text information; each piece of text information corresponds to one speaker. The speech recognition model is the model used to recognize the voice information in a video file, and it includes an acoustic model, a speech feature dictionary, and a semantic analysis model.
In an embodiment, as shown in FIG. 3, step S120 includes sub-steps S121, S122, and S123.
S121. Segment the voice information according to the acoustic model in the speech recognition model to obtain the phonemes contained in the voice information.
The voice information is a sentence uttered by the user. Specifically, the voice information received by the user terminal is composed of the phonemes of the pronunciations of multiple characters, and the phoneme of a character includes the frequency and timbre of that character's pronunciation. The acoustic model contains the phonemes of the pronunciations of all characters; by matching the voice information against all phonemes in the acoustic model, the phoneme of each individual character can be segmented out, finally yielding the phonemes contained in the voice information.
S122. Match the obtained phonemes according to the speech feature dictionary in the speech recognition model to convert all phonemes into pinyin information.
The speech feature dictionary contains the phoneme information corresponding to the pinyin of all characters. By matching each obtained phoneme against this phoneme information, the phoneme of a single character is converted into the character pinyin in the dictionary that matches it, so that all phonemes contained in the voice information are converted into pinyin information.
S123. Perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the speech recognition model to convert the pinyin information into text information.
The semantic analysis model contains the mapping between pinyin information and text information; this mapping is used to semantically analyze the obtained pinyin information and convert it into text information.
For example, the text information corresponding to the pinyin "hé, píng" in the semantic analysis model is "和平" ("peace").
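The two lookup stages, S122 and S123, can be sketched as plain dictionary matching. This is a toy illustration only: the acoustic segmentation of S121 is assumed to have already produced one key per character, and the keys `ph_017` and `ph_042`, the tables `FEATURE_DICT` and `SEMANTIC_MAP`, and the greedy longest-match strategy are all assumptions made for this example rather than details given in the application.

```python
from typing import Dict, List, Tuple

# Hypothetical speech feature dictionary (S122): phoneme key -> pinyin syllable.
FEATURE_DICT: Dict[str, str] = {"ph_017": "hé", "ph_042": "píng"}

# Hypothetical semantic mapping (S123): pinyin sequence -> text.
SEMANTIC_MAP: Dict[Tuple[str, ...], str] = {("hé", "píng"): "和平"}


def phonemes_to_pinyin(phonemes: List[str]) -> List[str]:
    """S122: convert each per-character phoneme key into its pinyin syllable."""
    return [FEATURE_DICT[p] for p in phonemes]


def pinyin_to_text(pinyin: List[str]) -> str:
    """S123: map pinyin to text, trying the longest pinyin sequence first."""
    out: List[str] = []
    i = 0
    while i < len(pinyin):
        for j in range(len(pinyin), i, -1):
            key = tuple(pinyin[i:j])
            if key in SEMANTIC_MAP:
                out.append(SEMANTIC_MAP[key])
                i = j
                break
        else:
            # No mapping covers the syllable at position i.
            raise KeyError(f"no text mapping for pinyin starting at {pinyin[i]!r}")
    return "".join(out)


if __name__ == "__main__":
    print(pinyin_to_text(phonemes_to_pinyin(["ph_017", "ph_042"])))
```

A real semantic analysis model would need to resolve homophones from context; a fixed table such as this cannot do so.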
S130. Capture, from the to-be-stored video segment, view information corresponding to the text information.
The view information is captured at random from the to-be-stored video segment and serves as the view information corresponding to that segment. Because each to-be-stored video segment corresponds to one speaker, the captured view information also corresponds to that speaker. Specifically, the view information may be a video clip or a picture; capturing a clip or a picture as the view information of the segment preserves a view of the speaker in that segment. For example, if the view information is a video clip, a clip of 5 or 10 seconds may be captured from the to-be-stored video segment; if the view information is a picture, a picture of the speaker may be captured at random from the segment.
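The random selection in S130 can be sketched as choosing a time, or a clip window, inside the segment. The sketch assumes segment bounds in seconds and leaves the actual frame or clip extraction to whatever video tool is in use; the function names and the `rng` parameter are hypothetical.

```python
import random
from typing import Tuple


def pick_frame_time(start: float, end: float, rng=random) -> float:
    """Choose a random time in [start, end] at which to grab a picture."""
    return rng.uniform(start, end)


def pick_clip_window(start: float, end: float, clip_len: float = 5.0,
                     rng=random) -> Tuple[float, float]:
    """Choose a random clip of clip_len seconds that lies inside the segment.

    If the segment is shorter than clip_len, the whole segment is returned.
    """
    clip_len = min(clip_len, end - start)
    clip_start = rng.uniform(start, end - clip_len)
    return clip_start, clip_start + clip_len


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demonstration repeatable
    # Segment from 1 min 20 s to 3 min 10 s of the to-be-processed file.
    print(pick_frame_time(80.0, 190.0, rng))
    print(pick_clip_window(80.0, 190.0, clip_len=10.0, rng=rng))
```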
S140. Store the obtained text information and view information, according to the speaker corresponding to the text information, in the linked list of the preset database that corresponds to that speaker.
The text information and view information contained in the to-be-processed video file are stored in linked lists, so that the information in the file is stored in lightweight form. Because the voice information in the to-be-processed video file has been converted into text information and view information of each speaker has been captured from the file, the information contained in the file is preserved. The preset database is the database used to store this data; it contains multiple linked lists. A linked list is a data storage unit that stores, along the time axis, the text information and view information contained in the to-be-processed video file, and each speaker corresponds to one linked list in the database. The logical order of the data stored in a linked list is determined by the order in which its pointers link the nodes. In this embodiment, the time information of the text information within the to-be-processed video file serves as that logical order; that is, the text information and view information are stored in the linked list with the pointers linked in time order. Because the text information and view information are stored in chronological order, the user can obtain from the linked list the speaker's information in time order, and the information stored in the linked list cannot be deleted.
In an embodiment, as shown in FIG. 4, step S140 includes sub-steps S141, S142, and S143.
S141. Obtain the time information corresponding to the text information in the to-be-processed video file.
Each to-be-stored video segment carries its corresponding time information within the to-be-processed video file, and each piece of text information corresponds one-to-one with a to-be-stored video segment. Therefore, the time information of the text information is obtained from the time information of its to-be-stored video segment.
For example, if the time information of a to-be-stored video segment in the to-be-processed video file is "1 minute 20 seconds to 3 minutes 10 seconds", then "1 minute 20 seconds to 3 minutes 10 seconds" is taken as the time information of the text information corresponding to that segment.
S142. Store the text information in the linked list corresponding to the speaker, according to the time information of the text information and the corresponding speaker.
Each piece of text information corresponds to one speaker. The linked list corresponding to the speaker is retrieved from the preset database, the text information is stored in that linked list with its time information as the logical order, and the speaker corresponding to the text information is added to the stored text information.
S143. Insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.
Because the view information corresponding to each piece of text information has been captured from the corresponding to-be-stored video segment, the correspondence between text information and view information is used to insert the view information into the text information already stored in the linked list.
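The per-speaker linked list of S141 to S143 can be sketched as a singly linked list whose nodes are kept in order of start time. This is a minimal in-memory illustration, not the application's database: the names `Node`, `SpeakerList`, and `store`, the use of a file path for the view information, and the dictionary standing in for the preset database are all assumptions. No delete operation is provided, in line with the statement that stored information cannot be deleted.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    start: float                   # S141: time information, in seconds
    end: float
    speaker: str
    text: str
    view: Optional[str] = None     # e.g. path of a captured picture or clip
    next: Optional["Node"] = None  # pointer that defines the logical order


class SpeakerList:
    """Linked list for one speaker, linked in order of start time."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def insert(self, node: Node) -> None:
        # Insert at the head if the list is empty or the node is earliest.
        if self.head is None or node.start < self.head.start:
            node.next = self.head
            self.head = node
            return
        # Otherwise walk to the last node that starts no later than the new one.
        cur = self.head
        while cur.next is not None and cur.next.start <= node.start:
            cur = cur.next
        node.next = cur.next
        cur.next = node

    def __iter__(self) -> Iterator[Node]:
        cur = self.head
        while cur is not None:
            yield cur
            cur = cur.next


def store(database: Dict[str, SpeakerList], speaker: str, start: float,
          end: float, text: str, view: str) -> Node:
    """S142 then S143: store the text in the speaker's list, then attach the view."""
    node = Node(start, end, speaker, text)
    database.setdefault(speaker, SpeakerList()).insert(node)
    node.view = view
    return node


if __name__ == "__main__":
    db: Dict[str, SpeakerList] = {}
    store(db, "AA", 80.0, 190.0, "second remark", "aa_2.jpg")
    store(db, "AA", 0.0, 80.0, "first remark", "aa_1.jpg")
    for n in db["AA"]:
        print(n.start, n.text, n.view)
```

Segments may arrive in any order; the insertion keeps each list chronological, so traversing from the head yields the speaker's remarks in time order.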
In an embodiment, as shown in FIG. 5, step S150 follows step S140.
S150. Generate index information corresponding to the text information according to the number information and video timestamp of the to-be-processed video file, and store it in the database.
To make the text information and view information stored in the linked lists easier to find, index information corresponding to the text information is generated from the number information and video timestamp of the to-be-processed video file. One to-be-processed video file may correspond to one or more pieces of text information, so one or more pieces of index information are generated accordingly. Storing the index information in the database lets the user quickly look up the data stored in the linked lists and improves search efficiency.
For example, if a piece of text information is the third piece of text of the to-be-processed video file in Table 1, and its time information in that file is "1 minute 20 seconds to 3 minutes 10 seconds", the generated index information is "S10021-3, 2018.04.11-1 min 20 s to 3 min 10 s".
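The index format in this example can be reproduced with a small formatting function. The sketch assumes the index is a plain string made of the file number, the ordinal of the text within the file, the video timestamp, and the time range rendered in English as "min" and "s"; the function names and the choice of seconds as the input unit are hypothetical.

```python
def format_time(seconds: float) -> str:
    """Render a time offset as 'M min S s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs} s"


def make_index(number: str, ordinal: int, timestamp: str,
               start: float, end: float) -> str:
    """S150: build index information for one piece of text information."""
    return (f"{number}-{ordinal}, {timestamp}-"
            f"{format_time(start)} to {format_time(end)}")


if __name__ == "__main__":
    # Third piece of text of file S10021, spanning 1 min 20 s to 3 min 10 s.
    print(make_index("S10021", 3, "2018.04.11", 80, 190))
    # prints "S10021-3, 2018.04.11-1 min 20 s to 3 min 10 s"
```

Storing these strings alongside a reference to the corresponding node would allow a lookup by file number and time range without traversing a linked list.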
By cutting the to-be-processed video file, obtaining the text information and corresponding view information of the to-be-stored video segments, and saving the text information and view information in linked lists, the to-be-processed video file is stored in lightweight form. The storage space required for the video file is greatly reduced without losing the important information in the file, and very good results have been achieved in practical application.
An embodiment of the present application further provides a video information chain storage device, which is used to execute any embodiment of the foregoing video information chain storage method. Specifically, referring to FIG. 6, FIG. 6 is a schematic block diagram of the video information chain storage device provided by an embodiment of the present application. The device may be configured in a terminal device such as a desktop computer, a notebook computer, a tablet computer, or a mobile phone.
As shown in FIG. 6, the video information chain storage device 100 includes a video file cutting unit 110, a voice information recognition unit 120, a view information acquisition unit 130, and an information storage unit 140.
The video file cutting unit 110 is configured to acquire a to-be-processed video file and cut it by means of a video cutting model to obtain a plurality of to-be-stored video segments.
In other embodiments of the application, as shown in FIG. 7, the video file cutting unit 110 includes the following subunits: a switching time point acquisition unit 111 and a cutting processing unit 112.
The switching time point acquisition unit 111 is configured to obtain the speaker switching time points by means of the video cutting model and the speaker information in the to-be-processed video file.
The cutting processing unit 112 is configured to cut the to-be-processed video file at the speaker switching time points to obtain the to-be-stored video segment corresponding to each speaker.
The voice information recognition unit 120 is configured to recognize, according to a preset speech recognition model, the voice information in the obtained to-be-stored video segments to obtain text information corresponding to each speaker.
In other embodiments of the application, as shown in FIG. 8, the voice information recognition unit 120 includes the following subunits: a phoneme segmentation unit 121, a pinyin information acquisition unit 122, and a text information acquisition unit 123.
The phoneme segmentation unit 121 is configured to segment the voice information according to the acoustic model in the speech recognition model to obtain the phonemes contained in the voice information.
The pinyin information acquisition unit 122 is configured to match the obtained phonemes according to the speech feature dictionary in the speech recognition model to convert all phonemes into pinyin information.
The text information acquisition unit 123 is configured to perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the speech recognition model to convert the pinyin information into text information.
The view information acquisition unit 130 is configured to capture, from the to-be-stored video segment, view information corresponding to the text information.
The information storage unit 140 is configured to store the obtained text information and view information, according to the speaker corresponding to the text information, in the linked list of the preset database that corresponds to that speaker.
In other embodiments of the application, as shown in FIG. 9, the information storage unit 140 includes the following subunits: a time information acquisition unit 141, a text information storage unit 142, and a view information storage unit 143.
The time information acquisition unit 141 is configured to obtain the time information corresponding to the text information in the to-be-processed video file.
The text information storage unit 142 is configured to store the text information in the linked list corresponding to the speaker, according to the time information of the text information and the corresponding speaker.
The view information storage unit 143 is configured to insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.
In other embodiments of the application, as shown in FIG. 10, the video information chain storage device 100 further includes an index information storage unit 150.
The index information storage unit 150 is configured to generate index information corresponding to the text information according to the number information and video timestamp of the to-be-processed video file, and to store it in the database.
The above video information chain storage device may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 11.
Referring to FIG. 11, FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.
As shown in FIG. 11, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504. The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The processor 502 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it causes the processor 502 to perform the video information chain storage method. The network interface 505 is used for network communication, such as the transmission of data. Those skilled in the art will understand that the structure shown in FIG. 11 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the video information chain storage method of the present application.
本领域技术人员可以理解,图11中示出的计算机设备的实施例并不构成对 计算机设备具体构成的限定,在其他实施例中,计算机设备可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。例如,在一些实施例中,计算机设备可以仅包括存储器及处理器,在这样的实施例中,存储器及处理器的结构及功能与图11所示实施例一致,在此不再赘述。Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 11 does not constitute a limitation on the specific configuration of the computer device. In other embodiments, the computer device may include more or fewer components than shown in the figure. Or combine certain components, or arrange different components. For example, in some embodiments, the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 11, and details are not described herein again.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU). The processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the video information chain storage method of the embodiments of the present application.
The storage medium may be an internal storage unit of the foregoing device, such as a hard disk or memory of the device. The storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device. Further, the storage medium may include both an internal storage unit of the device and an external storage device.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The above are only specific implementations of the present application, and the scope of protection of the present application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and such modifications or replacements shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. A video information chain storage method, comprising:
    obtaining a video file to be processed, and cutting the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored;
    recognizing voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker;
    capturing, from the video segments to be stored, view information corresponding to the text information; and
    storing, according to the speaker, the obtained text information and view information in a linked list corresponding to the speaker in a preset database.
  2. The video information chain storage method according to claim 1, wherein the cutting the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored comprises:
    obtaining speaker switching time points by means of the video cutting model and speaker information in the video file to be processed; and
    cutting the video file to be processed according to the speaker switching time points in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
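The cutting step of claim 2 can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: it assumes the speaker switching time points have already been produced by some video cutting model, and the names `Segment` and `cut_by_speaker_switch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the video
    end: float


def cut_by_speaker_switch(
    duration: float,
    first_speaker: str,
    switches: List[Tuple[float, str]],
) -> List[Segment]:
    """Split [0, duration] at each (time, new_speaker) switch point.

    `switches` is assumed to be sorted by time; detecting the switch
    points themselves is outside the scope of this sketch.
    """
    segments: List[Segment] = []
    start, speaker = 0.0, first_speaker
    for time, new_speaker in switches:
        if time > start:  # skip empty segments
            segments.append(Segment(speaker, start, time))
        start, speaker = time, new_speaker
    if duration > start:
        segments.append(Segment(speaker, start, duration))
    return segments


# Hypothetical 60-second video: A speaks, B takes over at 12.5 s, A returns at 40 s.
segments = cut_by_speaker_switch(60.0, "A", [(12.5, "B"), (40.0, "A")])
```

Each resulting segment carries its speaker, so later steps can route the recognized text to that speaker's list.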
  3. The video information chain storage method according to claim 1, wherein the recognizing voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker comprises:
    segmenting the voice information according to an acoustic model in the speech recognition model to obtain a plurality of phonemes contained in the voice information;
    matching the obtained phonemes according to a speech feature dictionary in the speech recognition model to convert all the phonemes into pinyin information; and
    performing semantic parsing on the obtained pinyin information according to a semantic parsing model in the speech recognition model to convert the pinyin information into text information.
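The two conversion stages of claim 3 (phonemes to pinyin, then pinyin to text) can be illustrated with a toy example. This sketch makes strong simplifying assumptions: the acoustic model is not implemented and the phoneme sequence is given as input, and both the speech feature dictionary and the semantic parsing model are replaced by small hand-written lookup tables with greedy longest-match. All table contents and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Toy stand-in for the speech feature dictionary (phoneme runs -> pinyin).
PHONEME_DICT: Dict[Tuple[str, ...], str] = {
    ("n", "i"): "ni",
    ("h", "ao"): "hao",
}

# Toy stand-in for the semantic parsing model (pinyin runs -> text).
PINYIN_DICT: Dict[Tuple[str, ...], str] = {
    ("ni", "hao"): "你好",
}


def match_longest(tokens: Sequence[str], table: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedily replace the longest run of tokens found in `table`."""
    out: List[str] = []
    max_len = max(len(key) for key in table)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            raise KeyError(f"no dictionary entry starting at {tokens[i]!r}")
    return out


phonemes = ["n", "i", "h", "ao"]                   # assumed output of the acoustic model
pinyin = match_longest(phonemes, PHONEME_DICT)     # phonemes -> pinyin
text = "".join(match_longest(pinyin, PINYIN_DICT)) # pinyin -> text
```

A real semantic parsing model would need context to choose among homophones; the dictionary lookup here only shows how the three stages chain together.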
  4. The video information chain storage method according to claim 1, wherein the storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database comprises:
    obtaining time information corresponding to the text information in the video file to be processed;
    storing the text information in the linked list corresponding to the speaker according to the time information of the text information and the corresponding speaker; and
    inserting the view information corresponding to the text information into the text information already stored in the linked list, so as to save the view information.
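The storage steps of claim 4 can be sketched with one singly linked list per speaker, kept in time order. This is an in-memory illustration under stated assumptions: the claim refers to a preset database, which is not modeled here, and the class names `Node` and `SpeakerLists`, the exact-time lookup in `attach_view`, and the file name used for the view are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    time: float                   # position of the text in the source video, in seconds
    text: str
    view: Optional[str] = None    # e.g. a path to a picture or clip
    next: Optional["Node"] = None


class SpeakerLists:
    """One time-ordered singly linked list per speaker."""

    def __init__(self) -> None:
        self.heads: Dict[str, Node] = {}

    def add_text(self, speaker: str, time: float, text: str) -> None:
        node = Node(time, text)
        head = self.heads.get(speaker)
        if head is None or time < head.time:
            node.next = head
            self.heads[speaker] = node
            return
        cur = head
        while cur.next is not None and cur.next.time <= time:
            cur = cur.next
        node.next = cur.next
        cur.next = node

    def attach_view(self, speaker: str, time: float, view: str) -> bool:
        """Attach view information to the stored text with the given time."""
        cur = self.heads.get(speaker)
        while cur is not None:
            if cur.time == time:
                cur.view = view
                return True
            cur = cur.next
        return False

    def entries(self, speaker: str) -> List[Tuple[float, str, Optional[str]]]:
        out = []
        cur = self.heads.get(speaker)
        while cur is not None:
            out.append((cur.time, cur.text, cur.view))
            cur = cur.next
        return out


store = SpeakerLists()
store.add_text("A", 40.0, "second remark")
store.add_text("A", 0.0, "opening remark")
store.add_text("B", 12.5, "reply")
attached = store.attach_view("A", 0.0, "frame_0000.png")
entries_a = store.entries("A")
```

Keeping each list sorted by time means a speaker's remarks can be read back in the order they were made, regardless of the order in which segments were processed.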
  5. The video information chain storage method according to claim 1, wherein, after the storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database, the method further comprises:
    generating index information corresponding to the text information according to number information of the video file to be processed and a video timestamp, and storing the index information in the database.
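One way to picture the index information of claim 5 is a key built from the video number and the timestamp. The claim does not specify a key format, so the layout below (video number, a colon, and a zero-padded millisecond offset) and the function name `make_index_key` are assumptions made purely for illustration.

```python
from typing import Dict, Tuple


def make_index_key(video_id: str, timestamp_s: float) -> str:
    """Build a sortable key from a video number and a timestamp in seconds.

    The format is a hypothetical choice; zero padding keeps keys for the
    same video in chronological order when sorted as strings.
    """
    millis = round(timestamp_s * 1000)
    return f"{video_id}:{millis:010d}"


# Hypothetical index: key -> (speaker, text) stored elsewhere in the linked lists.
index: Dict[str, Tuple[str, str]] = {}
key = make_index_key("V0001", 12.5)
index[key] = ("B", "reply")
```

With such a key, a stored remark can be traced back to its source video and position without walking a speaker's list.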
  6. The video information chain storage method according to claim 2, wherein the obtaining speaker switching time points by means of the video cutting model and speaker information in the video file to be processed comprises:
    determining whether the speaker information contains only one speaker; and
    if the speaker information does not contain only one speaker, obtaining the speaker switching time points in the video file to be processed according to the video cutting model.
  7. The video information chain storage method according to claim 1, wherein the view information is a video clip or a picture.
  8. A video information chain storage apparatus, comprising:
    a video file cutting unit, configured to obtain a video file to be processed and cut the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored;
    a voice information recognition unit, configured to recognize voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker;
    a view information acquisition unit, configured to capture, from the video segments to be stored, view information corresponding to the text information; and
    an information storage unit, configured to store, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database.
  9. The video information chain storage apparatus according to claim 8, wherein the video file cutting unit comprises:
    a switching time point acquisition unit, configured to obtain speaker switching time points by means of the video cutting model and speaker information in the video file to be processed; and
    a cutting processing unit, configured to cut the video file to be processed according to the speaker switching time points in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
  10. The video information chain storage apparatus according to claim 8, wherein the voice information recognition unit comprises:
    a phoneme segmentation unit, configured to segment the voice information according to an acoustic model in the speech recognition model to obtain a plurality of phonemes contained in the voice information;
    a pinyin information acquisition unit, configured to match the obtained phonemes according to a speech feature dictionary in the speech recognition model to convert all the phonemes into pinyin information; and
    a text information acquisition unit, configured to perform semantic parsing on the obtained pinyin information according to a semantic parsing model in the speech recognition model to convert the pinyin information into text information.
  11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    obtaining a video file to be processed, and cutting the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored;
    recognizing voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker;
    capturing, from the video segments to be stored, view information corresponding to the text information; and
    storing, according to the speaker, the obtained text information and view information in a linked list corresponding to the speaker in a preset database.
  12. The computer device according to claim 11, wherein the cutting the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored comprises:
    obtaining speaker switching time points by means of the video cutting model and speaker information in the video file to be processed; and
    cutting the video file to be processed according to the speaker switching time points in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
  13. The computer device according to claim 11, wherein the recognizing voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker comprises:
    segmenting the voice information according to an acoustic model in the speech recognition model to obtain a plurality of phonemes contained in the voice information;
    matching the obtained phonemes according to a speech feature dictionary in the speech recognition model to convert all the phonemes into pinyin information; and
    performing semantic parsing on the obtained pinyin information according to a semantic parsing model in the speech recognition model to convert the pinyin information into text information.
  14. The computer device according to claim 11, wherein the storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database comprises:
    obtaining time information corresponding to the text information in the video file to be processed;
    storing the text information in the linked list corresponding to the speaker according to the time information of the text information and the corresponding speaker; and
    inserting the view information corresponding to the text information into the text information already stored in the linked list, so as to save the view information.
  15. The computer device according to claim 11, wherein, after the storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database, the steps further comprise:
    generating index information corresponding to the text information according to number information of the video file to be processed and a video timestamp, and storing the index information in the database.
  16. The computer device according to claim 12, wherein the obtaining speaker switching time points by means of the video cutting model and speaker information in the video file to be processed comprises:
    determining whether the speaker information contains only one speaker; and
    if the speaker information does not contain only one speaker, obtaining the speaker switching time points in the video file to be processed according to the video cutting model.
  17. The computer device according to claim 11, wherein the view information is a video clip or a picture.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following operations:
    obtaining a video file to be processed, and cutting the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored;
    recognizing voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker;
    capturing, from the video segments to be stored, view information corresponding to the text information; and
    storing, according to the speaker, the obtained text information and view information in a linked list corresponding to the speaker in a preset database.
  19. The computer-readable storage medium according to claim 18, wherein the cutting the video file to be processed by means of a video cutting model to obtain a plurality of video segments to be stored comprises:
    obtaining speaker switching time points by means of the video cutting model and speaker information in the video file to be processed; and
    cutting the video file to be processed according to the speaker switching time points in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
  20. The computer-readable storage medium according to claim 18, wherein the recognizing voice information in the obtained plurality of video segments to be stored according to a preset speech recognition model to obtain text information corresponding to a speaker comprises:
    segmenting the voice information according to an acoustic model in the speech recognition model to obtain a plurality of phonemes contained in the voice information;
    matching the obtained phonemes according to a speech feature dictionary in the speech recognition model to convert all the phonemes into pinyin information; and
    performing semantic parsing on the obtained pinyin information according to a semantic parsing model in the speech recognition model to convert the pinyin information into text information.
PCT/CN2019/092636 2018-11-21 2019-06-25 Link-type storage method and apparatus for video information, computer device and storage medium WO2020103447A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811389154.9A CN109582823A (en) 2018-11-21 2018-11-21 Video information chain type storage method, device, computer equipment and storage medium
CN201811389154.9 2018-11-21

Publications (1)

Publication Number Publication Date
WO2020103447A1 true WO2020103447A1 (en) 2020-05-28

Family

ID=65923631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092636 WO2020103447A1 (en) 2018-11-21 2019-06-25 Link-type storage method and apparatus for video information, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109582823A (en)
WO (1) WO2020103447A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582823A (en) * 2018-11-21 2019-04-05 平安科技(深圳)有限公司 Video information chain type storage method, device, computer equipment and storage medium
CN110414352B (en) * 2019-06-26 2022-11-29 深圳职业技术学院 Method for extracting PPT (Power Point) file information from video file and related equipment
CN114697706A (en) * 2022-03-29 2022-07-01 深圳市恒扬数据股份有限公司 Video content processing method, device, terminal and storage medium
CN115129198B (en) * 2022-06-13 2023-10-27 中移互联网有限公司 Data acquisition method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110205330A1 (en) * 2010-02-25 2011-08-25 Ricoh Company, Ltd. Video conference system, processing method used in the same, and machine-readable medium
CN105512348A (en) * 2016-01-28 2016-04-20 北京旷视科技有限公司 Method and device for processing videos and related audios and retrieving method and device
CN105957531A (en) * 2016-04-25 2016-09-21 上海交通大学 Speech content extracting method and speech content extracting device based on cloud platform
CN106162222A (en) * 2015-04-22 2016-11-23 无锡天脉聚源传媒科技有限公司 A kind of method and device of video lens cutting
CN107241616A (en) * 2017-06-09 2017-10-10 腾讯科技(深圳)有限公司 video lines extracting method, device and storage medium
CN109582823A (en) * 2018-11-21 2019-04-05 平安科技(深圳)有限公司 Video information chain type storage method, device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301649B (en) * 2014-10-29 2019-08-16 上海斐讯数据通信技术有限公司 A kind of method and system that video capture is stored, played
CN105912615A (en) * 2016-04-05 2016-08-31 重庆大学 Human voice content index based audio and video file management method
CN107273423B (en) * 2017-05-15 2019-04-12 中国移动通信集团湖北有限公司 Multimedia message data processing method, device and system
CN108829765A (en) * 2018-05-29 2018-11-16 平安科技(深圳)有限公司 A kind of information query method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109582823A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
WO2020103447A1 (en) Link-type storage method and apparatus for video information, computer device and storage medium
US10079014B2 (en) Name recognition system
WO2019095586A1 (en) Meeting minutes generation method, application server, and computer readable storage medium
WO2020087655A1 (en) Translation method, apparatus and device, and readable storage medium
WO2019232991A1 (en) Method for recognizing conference voice as text, electronic device and storage medium
US11138971B2 (en) Using context to interpret natural language speech recognition commands
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
US10270736B2 (en) Account adding method, terminal, server, and computer storage medium
US11011170B2 (en) Speech processing method and device
JP5860171B2 (en) Input processing method and apparatus
CN110164435A (en) Audio recognition method, device, equipment and computer readable storage medium
US10108698B2 (en) Common data repository for improving transactional efficiencies of user interactions with a computing device
WO2019223134A1 (en) Voice message searching method and apparatus, computer device, and storage medium
WO2016023317A1 (en) Voice information processing method and terminal
WO2020119064A1 (en) Method and device for storing internet information in linked manner, computer apparatus and storage medium
WO2021135603A1 (en) Intention recognition method, server and storage medium
WO2013163804A1 (en) Method and device for adjusting word bank
TW201606750A (en) Speech recognition using a foreign word grammar
WO2021218069A1 (en) Dynamic scenario configuration-based interactive processing method and apparatus, and computer device
CN108682421B (en) Voice recognition method, terminal equipment and computer readable storage medium
CN106713111B (en) Processing method for adding friends, terminal and server
WO2020037921A1 (en) Expression picture prompting method and apparatus, computer device, and storage medium
TW201339862A (en) System and method for eliminating language ambiguity
US8868419B2 (en) Generalizing text content summary from speech content
KR20190074508A (en) Method for crowdsourcing data of chat model for chatbot

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19887972

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19887972

Country of ref document: EP

Kind code of ref document: A1