WO2020103447A1 - Link-type storage method and apparatus for video information, computer device and storage medium

Info

Publication number: WO2020103447A1
Application number: PCT/CN2019/092636
Authority: WO
Inventors: 吴壮伟
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-11-21
Filing date: 2019-06-25
Publication date: 2020-05-28
Also published as: CN109582823A

Abstract

The present application discloses a link-type storage method and apparatus for video information, a computer device and a storage medium. The method comprises: acquiring a video file to be processed, and segmenting said video file by means of a video segmenting model to obtain a plurality of video segments to be stored; recognizing, according to a preset voice recognition model, voice information in the obtained plurality of video segments, to obtain text information corresponding to the speaker; clipping, from said video segments, view information corresponding to the text information; and storing, according to the speaker corresponding to the text information, the obtained text information and view information into a linked list corresponding to the speaker in a preset database.

Description

Video information chain storage method, device, computer equipment and storage medium

This application claims priority to the Chinese patent application filed with the China Patent Office on November 21, 2018, with application number 201811389154.9 and titled "Video information chain storage method, device, computer equipment and storage medium", the entire content of which is incorporated by reference in this application.

Technical field

This application relates to the field of computer technology, and in particular to a video information chain storage method, device, computer equipment, and storage medium.

Background technique

When a video file is saved, the important information in the file must not be omitted, so a large amount of storage space is required. The traditional approach is to compress the video file by converting its format or reducing its resolution, thereby reducing the number of bytes of the video file. However, the video file obtained after such processing still needs a large amount of storage space, so the problem that video files occupy a lot of storage space is not fully solved. The existing video processing methods therefore cannot store video files in a lightweight manner.

Summary of the invention

Embodiments of the present application provide a video information chain storage method, device, computer equipment, and storage medium, which are intended to solve the problem in the prior art that video files cannot be stored in a lightweight manner.

In a first aspect, an embodiment of the present application provides a video information chain storage method, which includes: acquiring a video file to be processed, and cutting the video file to be processed through a video cutting model to obtain multiple video segments to be stored; recognizing, according to a preset voice recognition model, the voice information in the obtained multiple video segments to be stored to obtain text information corresponding to a speaker; intercepting view information corresponding to the text information from the video segments to be stored; and storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database.

In a second aspect, an embodiment of the present application provides a video information chain storage device, which includes: a video file cutting unit, used to acquire a video file to be processed and cut the video file to be processed through a video cutting model to obtain multiple video segments to be stored; a voice information recognition unit, used to recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to a speaker; a view information acquisition unit, used to intercept view information corresponding to the text information from the video segments to be stored; and an information storage unit, used to store, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database.

In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the video information chain storage method described in the first aspect above when executing the computer program.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it causes the processor to execute the video information chain storage method described in the first aspect above.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic flowchart of a video information chain storage method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a sub-process of a video information chain storage method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of another sub-process of a video information chain storage method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of another sub-process of a video information chain storage method provided by an embodiment of the present application;

FIG. 5 is another schematic flowchart of a video information chain storage method provided by an embodiment of the present application;

FIG. 6 is a schematic block diagram of a video information chain storage device provided by an embodiment of the present application;

FIG. 7 is a schematic block diagram of a subunit of a video information chain storage device provided by an embodiment of the present application;

FIG. 8 is a schematic block diagram of another subunit of a video information chain storage device provided by an embodiment of the present application;

FIG. 9 is a schematic block diagram of another subunit of a video information chain storage device provided by an embodiment of the present application;

FIG. 10 is another schematic block diagram of a video information chain storage device provided by an embodiment of the present application;

FIG. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present application.

It should be understood that when used in this specification and the appended claims, the terms "including" and "comprising" indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terminology used in the description of this application is for the purpose of describing particular embodiments only and is not intended to limit this application. As used in the specification of the present application and the appended claims, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms.

It should be further understood that the term "and/or" used in the specification of the present application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of a video information chain storage method provided by an embodiment of the present application. The video information chain storage method is applied to terminal devices with information storage functions, such as desktop computers, notebook computers, tablet computers, or mobile phones.

As shown in FIG. 1, the method includes steps S110-S140.

S110: Obtain a video file to be processed, and cut the video file to be processed through a video cutting model to obtain multiple video segments to be stored.

A video file to be processed is obtained and cut through the video cutting model to obtain multiple video segments to be stored. The user inputs a to-be-processed video file into the user terminal, and the video cutting model identifies and cuts the file to obtain multiple video segments to be stored. The to-be-processed video file is a video file input by the user that needs lightweight storage. It includes number information, a video timestamp, and speaker information. Specifically, the number information is the number used to identify the to-be-processed video file, that is, its ID; each to-be-processed video file has its own number information, and number information is never repeated across files. The video timestamp is the information that records the time of the to-be-processed video file and can be used to determine the specific time at which the file was created. The speaker information describes the speakers contained in the to-be-processed video file; one file may contain one or more speakers.

For example, if the to-be-processed video file is a course video file, the corresponding speaker information contains only one speaker; if the to-be-processed video file is a recording of a face-to-face interview, the corresponding speaker information contains multiple speakers.

For example, the information contained in a certain to-be-processed video file is shown in Table 1.

Number information	Video timestamp	Speaker information
S10021	2018.04.11	AA, BB, CCC

Table 1
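The three fields of Table 1 can be pictured as a small record. The following is only a minimal sketch: the class name `PendingVideoFile`, its field names, and the helper `needs_cutting` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PendingVideoFile:
    """Metadata carried by a to-be-processed video file (hypothetical names)."""
    number: str                # unique ID of the file, e.g. "S10021"
    timestamp: str             # time the file was created, e.g. "2018.04.11"
    speakers: Tuple[str, ...]  # one or more speakers contained in the file

    def needs_cutting(self) -> bool:
        # A file with a single speaker has no speaker switching time point.
        return len(self.speakers) > 1


# The row shown in Table 1.
table_1 = PendingVideoFile("S10021", "2018.04.11", ("AA", "BB", "CCC"))
```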

In an embodiment, as shown in FIG. 2, step S110 includes sub-steps S111 and S112.

S111. Obtain the speaker switching time point through the video cutting model and the speaker information in the to-be-processed video file.

The speaker switching time point is obtained through the video cutting model and the speaker information in the to-be-processed video file. Specifically, if the speaker information contains only one speaker, there is no speaker switching time point in the to-be-processed video file, and the file is not cut. If the speaker information contains multiple speakers, the to-be-processed video file contains one or more speaker switching time points, which need to be obtained through the video cutting model. The video cutting model recognizes the speakers in the to-be-processed video file to obtain each time point at which the file switches from one speaker to another. The video cutting model contains the facial recognition results of all speakers, so performing face recognition on the to-be-processed video file through the model matches the current picture to its speaker. When the speaker of one picture differs from the speaker of the next picture, a speaker switch has occurred, and the time between these two pictures at which the switch occurs is taken as a speaker switching time point.
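The comparison of consecutive pictures can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: face recognition is assumed to have already produced a time-ordered list of `(time_in_seconds, speaker)` pairs, and the function name `speaker_switch_points` is hypothetical.

```python
from typing import List, Tuple


def speaker_switch_points(frames: List[Tuple[float, str]]) -> List[float]:
    """Return the times at which the recognized speaker changes.

    `frames` is a time-ordered list of (time_in_seconds, speaker) pairs,
    assumed to be the output of a face recognition step.
    """
    switches = []
    # Compare each picture with the one that follows it.
    for (_, prev_speaker), (time, speaker) in zip(frames, frames[1:]):
        if speaker != prev_speaker:
            switches.append(time)
    return switches


frames = [(0.0, "AA"), (40.0, "AA"), (80.0, "BB"), (120.0, "BB"), (190.0, "CCC")]
print(speaker_switch_points(frames))  # [80.0, 190.0]
```

A file with a single speaker yields an empty list, which matches the case where the file is not cut.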

S112: Cut the video file to be processed according to the time point when the speaker switches in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.

The to-be-processed video file is cut at its speaker switching time points to obtain the video segment to be stored corresponding to each speaker. According to the obtained speaker switching time points, the file can be cut into multiple video segments to be stored. Specifically, each video segment to be stored corresponds to one speaker and includes its corresponding time information in the to-be-processed video file.
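As a rough sketch of this cutting step, the code below computes only the time range and speaker of each segment; trimming the actual video data is left to whatever video tool is used. The names `Segment` and `cut_by_speaker`, and the per-picture speaker labels used as input, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the to-be-processed video file
    end: float


def cut_by_speaker(frames: List[Tuple[float, str]], duration: float) -> List[Segment]:
    """Split a file of length `duration` into one segment per speaker turn."""
    segments: List[Segment] = []
    for time, speaker in frames:
        if segments and segments[-1].speaker == speaker:
            continue  # same speaker as the previous picture, no switch
        if segments:
            segments[-1].end = time  # close the previous segment at the switch
        segments.append(Segment(speaker, time, duration))
    return segments


frames = [(0.0, "AA"), (40.0, "AA"), (80.0, "BB"), (120.0, "BB"), (190.0, "CCC")]
for segment in cut_by_speaker(frames, duration=240.0):
    print(segment)
```

Each resulting segment carries one speaker and its own time information, as the step requires.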

S120. Identify the obtained voice information in the plurality of video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker.

The voice information in the obtained multiple video segments to be stored is recognized according to a preset voice recognition model to obtain text information corresponding to the speaker. The voice information in each video segment to be stored can be recognized through the voice recognition model to obtain the corresponding text information, and each piece of text information corresponds to one speaker. The voice recognition model is a model dedicated to recognizing the voice information in a video file. It includes an acoustic model, a speech feature dictionary, and a semantic analysis model.

In one embodiment, as shown in FIG. 3, step S120 includes sub-steps S121, S122, and S123.

S121. Segment the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information.

The voice information is segmented according to the acoustic model in the voice recognition model to obtain the multiple phonemes it contains. The voice information is a sentence spoken aloud by the speaker. Specifically, the voice information received by the user terminal is composed of the phonemes of the pronunciations of multiple characters, and the phoneme of a character includes the frequency and timbre of that character's pronunciation. The acoustic model contains the phonemes of the pronunciations of all characters. By matching the voice information against all phonemes in the acoustic model, the phoneme of each individual character in the voice information can be segmented, finally yielding the multiple phonemes contained in the voice information.

S122. Match the obtained phonemes according to the speech feature dictionary in the speech recognition model to convert all phonemes into pinyin information.

The obtained phonemes are matched according to the speech feature dictionary in the speech recognition model so that all phonemes are converted into pinyin information. The speech feature dictionary contains the phoneme information corresponding to the pinyin of all characters. By matching an obtained phoneme against the phoneme information corresponding to each character's pinyin, the phoneme of a single character can be converted into the pinyin of the matching character in the speech feature dictionary, so that all phonemes contained in the voice information are converted into pinyin information.

S123. Perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the speech recognition model to convert the pinyin information into text information.

According to the semantic analysis model in the speech recognition model, the obtained pinyin information is semantically analyzed to convert the pinyin information into text information. The semantic analysis model contains the mapping relationship between the pinyin information and the text information. Through the mapping relationship contained in the semantic analysis model, the obtained pinyin information can be semantically analyzed to convert the pinyin information into text information.

For example, the text information corresponding to the pinyin "hé, píng" in the semantic analysis model is "peace".
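The two lookups in S122 and S123 can be sketched with toy dictionaries. This is a simplified illustration only: the phoneme labels `ph_017` and `ph_203`, the dictionary names, and the function names are hypothetical, and the acoustic segmentation of S121 is assumed to have already produced the phoneme labels.

```python
from typing import Dict, List, Tuple

# Toy speech feature dictionary: phoneme label -> pinyin (hypothetical entries).
SPEECH_FEATURE_DICT: Dict[str, str] = {"ph_017": "hé", "ph_203": "píng"}

# Toy semantic mapping: pinyin sequence -> text, using the example above.
SEMANTIC_MAP: Dict[Tuple[str, ...], str] = {("hé", "píng"): "peace"}


def phonemes_to_pinyin(phonemes: List[str]) -> List[str]:
    """S122: convert each phoneme into the pinyin it matches."""
    return [SPEECH_FEATURE_DICT[phoneme] for phoneme in phonemes]


def pinyin_to_text(pinyin: List[str]) -> str:
    """S123: map the pinyin information to text information."""
    return SEMANTIC_MAP[tuple(pinyin)]


def recognize(phonemes: List[str]) -> str:
    return pinyin_to_text(phonemes_to_pinyin(phonemes))


print(recognize(["ph_017", "ph_203"]))  # peace
```

A real semantic analysis model would have to choose among several texts that share the same pinyin; the exact-match dictionary here deliberately ignores that ambiguity.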

S130. Extract view information corresponding to the text information from the video segment to be stored.

View information corresponding to the text information is randomly intercepted from the video segment to be stored. The view information randomly intercepted from a video segment to be stored is the view information corresponding to that segment. Since each video segment to be stored corresponds to one speaker, the intercepted view information also corresponds to that speaker. Specifically, the view information may be a video or a picture. Using a video or a picture as the view information corresponding to the video segment to be stored allows the view of the speaker in that segment to be intercepted and saved. For example, if the view information is a video, a 5-second or 10-second clip can be intercepted from the video segment to be stored as its view information; if the view information is a picture, a picture of the speaker is randomly intercepted from the video segment to be stored as its view information.
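One way to picture the random interception is to choose a random time window inside the segment. The sketch below only selects the window; extracting the frames is not shown. The function name `pick_view_window` and its parameters are hypothetical, and treating a picture as a window of length zero is an assumption of this sketch.

```python
import random
from typing import Optional, Tuple


def pick_view_window(start: float, end: float, clip_length: float = 5.0,
                     rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Pick a random (clip_start, clip_end) window inside [start, end].

    Use clip_length=0 to pick a single instant, i.e. a picture.
    """
    rng = rng or random.Random()
    # A segment shorter than the requested clip is used in full.
    length = min(clip_length, end - start)
    clip_start = rng.uniform(start, end - length)
    return clip_start, clip_start + length


# A 5-second clip from the segment running from 1 min 20 s to 3 min 10 s.
print(pick_view_window(80.0, 190.0, clip_length=5.0))
```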

S140: The obtained text information and view information are stored in the linked list corresponding to the speaker in the preset database according to the speaker corresponding to the text information.

The text information and view information contained in the to-be-processed video file are stored in a linked list, so that the information contained in the file is stored in a lightweight manner. Because the voice information in the to-be-processed video file has been converted into text information and the view information of each speaker has been intercepted from the file, the information contained in the file is preserved. The preset database is a database used to store data information and contains multiple linked lists. A linked list is a data storage unit that stores, along the time axis, the text information and view information contained in the to-be-processed video file, and each speaker corresponds to one linked list in the database. The logical order of the data stored in a linked list is realized by the order in which its pointers are linked. In this embodiment, the time information of the text information in the to-be-processed video file serves as the logical order of the linked list; that is, the text information and view information are stored in the linked list with the time information as the pointer linking order. Because the text information and view information are stored in chronological order, the user can obtain a speaker's information in chronological order through the linked list. The information stored in the linked list cannot be deleted.

In one embodiment, as shown in FIG. 4, step S140 includes sub-steps S141, S142, and S143.

S141. Acquire time information corresponding to the text information in the to-be-processed video file.

The time information corresponding to the text information in the to-be-processed video file is acquired. Each video segment to be stored contains its corresponding time information in the to-be-processed video file, and the pieces of text information correspond one-to-one to the video segments to be stored. Therefore, the time information of a piece of text information in the to-be-processed video file is obtained by acquiring the time information of the video segment to be stored that corresponds to it.

For example, if the time information of a video segment to be stored in the to-be-processed video file is "1 minute 20 seconds to 3 minutes 10 seconds", then "1 minute 20 seconds to 3 minutes 10 seconds" is used as the time information, in the to-be-processed video file, of the text information corresponding to that segment.

S142: According to the time information of the text information and the corresponding speaker, the text information is stored in the linked list corresponding to the speaker.

The text information is stored in the linked list according to its time information and its corresponding speaker. Each piece of text information corresponds to one speaker. The linked list corresponding to that speaker is obtained from the preset database, the text information is stored in it with the time information of the text information as the logical order of the linked list, and the speaker corresponding to the text information is added to the stored text information.

S143. Insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.

The view information corresponding to the text information is inserted into the stored text information in the linked list to save the view information. Since the view information corresponding to a piece of text information is intercepted from the video segment to be stored that corresponds to that text information, the view information can be inserted into the stored text information in the linked list through the correspondence between the text information and the view information.
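Sub-steps S141 to S143 can be sketched with one singly linked list per speaker, kept in ascending time order. This is a minimal in-memory illustration, assuming a dictionary stands in for the preset database; the names `Node`, `SpeakerChain`, `database`, and `store` are hypothetical, and the view information is represented by a file name.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    start: float                  # time information, in seconds (S141)
    end: float
    speaker: str                  # speaker added to the stored text
    text: str
    view: Optional[str] = None    # e.g. name of the intercepted clip or picture
    next: Optional["Node"] = None


class SpeakerChain:
    """Linked list ordered by start time; it deliberately has no delete method."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def store_text(self, start: float, end: float, speaker: str, text: str) -> Node:
        node = Node(start, end, speaker, text)
        if self.head is None or start < self.head.start:
            node.next, self.head = self.head, node
            return node
        current = self.head
        # Walk to the last node that does not start after the new one.
        while current.next is not None and current.next.start <= start:
            current = current.next
        node.next, current.next = current.next, node
        return node

    def __iter__(self) -> Iterator[Node]:
        current = self.head
        while current is not None:
            yield current
            current = current.next


database: Dict[str, SpeakerChain] = {}  # stand-in for the preset database


def store(speaker: str, start: float, end: float, text: str, view: str) -> None:
    chain = database.setdefault(speaker, SpeakerChain())  # one list per speaker
    node = chain.store_text(start, end, speaker, text)    # S142
    node.view = view                                      # S143


store("BB", 80.0, 190.0, "text spoken by BB", "bb_clip.mp4")
store("BB", 10.0, 20.0, "earlier text spoken by BB", "bb_picture.jpg")
print([node.text for node in database["BB"]])
```

Iterating over a speaker's list returns the stored text in chronological order regardless of the order in which it was stored.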

In an embodiment, as shown in FIG. 5, step S150 is further included after step S140.

S150. Generate index information corresponding to the text information according to the number information of the video file to be processed and the video time stamp and store it in the database.

Index information corresponding to the text information is generated according to the number information and video timestamp of the to-be-processed video file and stored in the database. To make the text information and view information stored in the linked list easier to find, index information corresponding to each piece of text information can be generated from the number information and video timestamp of the to-be-processed video file. One to-be-processed video file may correspond to one or more pieces of text information, so one or more pieces of index information need to be generated accordingly. Storing the index information in the database lets the user quickly search the data stored in the linked list and improves the efficiency of such searches.

For example, if a piece of text information is the third text of the to-be-processed video file in Table 1, and its corresponding time information in the to-be-processed video file is "1 minute 20 seconds to 3 minutes 10 seconds", then the generated index information is "S10021-3, 2018.04.11-1 minute 20 seconds to 3 minutes 10 seconds".
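The index string in this example can be assembled from its four parts as sketched below. The function name `build_index`, the `index_table` dictionary standing in for the database, and the speaker "BB" attached to the entry are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def build_index(number: str, ordinal: int, timestamp: str, time_info: str) -> str:
    """Combine number information, text ordinal, video timestamp and time information."""
    return f"{number}-{ordinal}, {timestamp}-{time_info}"


# Stand-in for the database: index -> (speaker whose list holds the text, ordinal).
index_table: Dict[str, Tuple[str, int]] = {}

index = build_index("S10021", 3, "2018.04.11",
                    "1 minute 20 seconds to 3 minutes 10 seconds")
index_table[index] = ("BB", 3)  # hypothetical speaker for the third text
print(index)
```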

By cutting the to-be-processed video file, obtaining the text information and corresponding view information of each video segment to be stored, and saving the text information and view information in a linked list, the to-be-processed video file is stored in a lightweight manner. The storage space required for the video file is greatly reduced without losing the important information in the file, and good results have been achieved in practical application.

An embodiment of the present application further provides a video information chain storage device, which is used to execute any embodiment of the foregoing video information chain storage method. Specifically, please refer to FIG. 6, which is a schematic block diagram of a video information chain storage device provided by an embodiment of the present application. The video information chain storage device can be configured in terminal devices such as desktop computers, notebook computers, tablet computers or mobile phones.

As shown in FIG. 6, the video information chain storage device 100 includes a video file cutting unit 110, a voice information recognition unit 120, a view information acquisition unit 130, and an information storage unit 140.

The video file cutting unit 110 is used for obtaining a video file to be processed, and cutting the video file to be processed through a video cutting model to obtain multiple video segments to be stored.

In other application embodiments, as shown in FIG. 7, the video file cutting unit 110 includes subunits: a switching time point acquiring unit 111 and a cutting processing unit 112.

The switching time point acquiring unit 111 is configured to obtain the speaker switching time point through the video cutting model and the speaker information in the to-be-processed video file.

The cutting processing unit 112 is configured to cut the video file to be processed according to the speaker switching time point in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.

The voice information recognition unit 120 is configured to recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker.

In other application embodiments, as shown in FIG. 8, the voice information recognition unit 120 includes subunits: a phoneme segmentation unit 121, a pinyin information acquisition unit 122 and a text information acquisition unit 123.

The phoneme segmentation unit 121 is configured to segment the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information.

The pinyin information acquisition unit 122 is configured to match the obtained phonemes according to the phonetic feature dictionary in the voice recognition model to convert all phonemes into pinyin information.

The text information acquiring unit 123 is configured to perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the speech recognition model to convert the pinyin information into text information.

The view information obtaining unit 130 is used to intercept the view information corresponding to the text information from the video segment to be stored.

The information storage unit 140 is configured to store the obtained text information and view information in a preset database in a linked list corresponding to the speaker according to the speaker corresponding to the text information.

In other application embodiments, as shown in FIG. 9, the information storage unit 140 includes subunits: a time information acquisition unit 141, a text information storage unit 142, and a view information storage unit 143.

The time information obtaining unit 141 is used to obtain the time information corresponding to the text information in the to-be-processed video file.

The text information storage unit 142 is configured to store the text information in a linked list corresponding to the speaker based on the time information of the text information and the corresponding speaker.

The view information storage unit 143 is configured to insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.

In other application embodiments, as shown in FIG. 10, the video information chain storage device 100 further includes a subunit: an index information storage unit 150.

The index information storage unit 150 is configured to generate index information corresponding to the text information according to the number information of the video file to be processed and the video time stamp and store it in the database.
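The way units 110 to 140 cooperate can be pictured as a small pipeline object. This is a structural sketch only: the class name, the attribute names, and the choice to model each unit as a plain callable are assumptions, and the optional index information storage unit 150 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VideoInfoChainStorageDevice:
    cut: Callable[[str], List[str]]               # unit 110: file -> segments
    recognize: Callable[[str], Tuple[str, str]]   # unit 120: segment -> (speaker, text)
    capture_view: Callable[[str], str]            # unit 130: segment -> view information
    store: Callable[[str, str, str], None]        # unit 140: (speaker, text, view)

    def process(self, video_file: str) -> int:
        """Run the four units in order and return the number of segments stored."""
        segments = self.cut(video_file)
        for segment in segments:
            speaker, text = self.recognize(segment)
            view = self.capture_view(segment)
            self.store(speaker, text, view)
        return len(segments)
```

Keeping each unit behind a callable makes it possible to replace, for example, the recognition step without touching the storage step.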

The above-mentioned video information chain storage device may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 11.

Please refer to FIG. 11, which is a schematic block diagram of a computer device provided by an embodiment of the present application.

Referring to FIG. 11, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504. The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, the processor 502 can execute the video information chain storage method. The processor 502 is used to provide computing and control capabilities and support the operation of the entire computer device 500. The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute the video information chain storage method. The network interface 505 is used for network communication, such as the transmission of data information. Those skilled in the art can understand that the structure shown in FIG. 11 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied. A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

Wherein, the processor 502 is used to run the computer program 5032 stored in the memory to implement the video information chain storage method of the present application.

Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 11 does not constitute a limitation on the specific configuration of the computer device. In other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 11, and details are not described herein again.

It should be understood that in the embodiment of the present application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

In another embodiment of the present application, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, where the computer program is executed by the processor to implement the video information chain storage method of the embodiments of the present application.

The storage medium may be an internal storage unit of the foregoing device, such as a hard disk or a memory of the device. The storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the device. Further, the storage medium may include both an internal storage unit of the device and an external storage device.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

The above is only a specific implementation of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims

A video information chain storage method, including:

Obtain the video file to be processed, and cut the video file to be processed through the video cutting model to obtain multiple video segments to be stored;

Recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker;

Intercept view information corresponding to text information from the video segment to be stored;

According to the speaker, the obtained text information and view information are stored in a preset database in a linked list corresponding to the speaker.
The video information chain storage method according to claim 1, wherein cutting the video file to be processed through the video cutting model to obtain multiple video segments to be stored includes:

Obtain the speaker switching time point through the video cutting model and the speaker information in the video file to be processed;

The video file to be processed is cut according to the time point when the speaker switches in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
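By way of non-limiting illustration only, the cutting step recited above might be sketched as follows. The sketch assumes, hypothetically, that the video cutting model has already produced a time-ordered list of (time, speaker) labels; the names `Segment`, `cut_by_speaker`, `labels`, and `duration` are illustrative and do not appear in this application. A single-speaker input is returned as one segment without searching for switching points.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the video file
    end: float    # seconds from the start of the video file


def cut_by_speaker(labels: List[Tuple[float, str]], duration: float) -> List[Segment]:
    """Split a video timeline at the points where the speaker label changes.

    `labels` is an assumed, time-ordered output of the cutting model:
    one (time, speaker) pair per analysis window.
    """
    if not labels:
        return []
    # Only one speaker: there is no switching time point to look for.
    if len({speaker for _, speaker in labels}) == 1:
        return [Segment(labels[0][1], labels[0][0], duration)]
    segments: List[Segment] = []
    start, current = labels[0]
    for time, speaker in labels[1:]:
        if speaker != current:
            # A speaker switch closes the running segment at this time point.
            segments.append(Segment(current, start, time))
            start, current = time, speaker
    segments.append(Segment(current, start, duration))
    return segments
```

In this sketch each returned segment carries its speaker, so the later storage step can select the linked list for that speaker directly.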
The video information chain storage method according to claim 1, wherein recognizing the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker includes:

Divide the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information;

Match the obtained phonemes according to the speech feature dictionary in the voice recognition model to convert all phonemes into pinyin information;

Perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the voice recognition model to convert the pinyin information into text information.
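By way of non-limiting illustration only, the phoneme-to-pinyin-to-text conversion recited above might be sketched with two lookup tables. The tables `PHONEME_TO_PINYIN` and `PINYIN_TO_TEXT` are hypothetical toy data, and greedy longest-match lookup is a simplification substituted here for the speech feature dictionary and the semantic analysis model, whose internals are not reproduced. The acoustic model is assumed to have already produced the phoneme sequence.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy "speech feature dictionary": phoneme sequence -> pinyin syllable.
PHONEME_TO_PINYIN: Dict[Tuple[str, ...], str] = {
    ("n", "i"): "ni",
    ("h", "ao"): "hao",
}

# Hypothetical toy lexicon standing in for the semantic analysis model.
PINYIN_TO_TEXT: Dict[Tuple[str, ...], str] = {
    ("ni", "hao"): "你好",
    ("ni",): "你",
    ("hao",): "好",
}


def longest_match(tokens: Sequence[str], table: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedily map the longest known run of tokens to its table entry."""
    out: List[str] = []
    max_len = max(len(key) for key in table)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No table entry starts at this token.
            raise KeyError(f"no entry for {tokens[i]!r}")
    return out


def phonemes_to_text(phonemes: Sequence[str]) -> str:
    """Convert phonemes to pinyin, then pinyin to text."""
    pinyin = longest_match(phonemes, PHONEME_TO_PINYIN)
    return "".join(longest_match(pinyin, PINYIN_TO_TEXT))
```

A practical semantic analysis model would resolve homophones from context; the lexicon lookup here only shows where that step sits in the chain.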
The video information chain storage method according to claim 1, wherein storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database includes:

Obtain the time information corresponding to the text information in the video file to be processed;

According to the time information of the text information and the corresponding speaker, the text information is stored in the linked list corresponding to the speaker;

Insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.
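By way of non-limiting illustration only, the storage step recited above might be sketched with one singly linked list per speaker, ordered by time. The names `Node`, `SpeakerList`, and `store` are hypothetical, a plain dictionary keyed by speaker stands in for the preset database, and view information is represented by a file name string as an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    time: float                                      # time of the text in the video file
    text: str                                        # recognized text information
    views: List[str] = field(default_factory=list)   # view information attached to the text
    next: Optional["Node"] = None


class SpeakerList:
    """Singly linked list of one speaker's text nodes, kept in time order."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def insert_text(self, time: float, text: str) -> Node:
        node = Node(time, text)
        if self.head is None or time < self.head.time:
            node.next = self.head
            self.head = node
            return node
        cur = self.head
        # Advance to the last node whose time does not exceed the new time.
        while cur.next is not None and cur.next.time <= time:
            cur = cur.next
        node.next = cur.next
        cur.next = node
        return node

    def __iter__(self) -> Iterator[Node]:
        cur = self.head
        while cur is not None:
            yield cur
            cur = cur.next


def store(database: Dict[str, SpeakerList], speaker: str,
          time: float, text: str, view: str) -> Node:
    """Store the text in the speaker's list, then attach the view to that text node."""
    node = database.setdefault(speaker, SpeakerList()).insert_text(time, text)
    node.views.append(view)
    return node
```

In this sketch the view information is kept on the same node as the text it belongs to, so traversing one speaker's list yields text and views together in time order.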
The video information chain storage method according to claim 1, wherein after storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database, the method further includes:

The index information corresponding to the text information is generated according to the number information of the video file to be processed and the video time stamp and stored in the database.
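By way of non-limiting illustration only, index information of this kind might be formed by joining the video number and the time stamp into one key. The key layout, the function name `make_index_key`, and the use of a dictionary as the index are assumptions of this sketch and are not specified in this application.

```python
from typing import Dict, Tuple


def make_index_key(video_number: str, timestamp_s: float) -> str:
    """Build an index key from the video number and the video time stamp.

    The time stamp is stored as zero-padded milliseconds so that keys
    belonging to one video sort in time order as plain strings.
    """
    return f"{video_number}:{round(timestamp_s * 1000):010d}"


def add_index(index: Dict[str, Tuple[str, str]], video_number: str,
              timestamp_s: float, speaker: str, text: str) -> str:
    """Record which speaker and text a (video number, time stamp) pair refers to."""
    key = make_index_key(video_number, timestamp_s)
    index[key] = (speaker, text)
    return key
```

A lookup by video number and time stamp can then locate the speaker whose linked list holds the corresponding text.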
The video information chain storage method according to claim 2, wherein obtaining the speaker switching time point through the video cutting model and the speaker information in the video file to be processed includes:

Determine whether the speaker information contains only one speaker;

If the speaker information does not include only one speaker, obtain the speaker switching time point in the to-be-processed video file according to the video cutting model.
The video information chain storage method according to claim 1, wherein the view information is a piece of video or a picture.
A video information chain storage device, including:

The video file cutting unit is used to obtain the video file to be processed, and the video file to be processed is cut through the video cutting model to obtain multiple video segments to be stored;

The voice information recognition unit is used to recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker;

The view information acquisition unit is used to intercept the view information corresponding to the text information from the video segment to be stored;

The information storage unit is configured to store the obtained text information and view information in a linked list corresponding to the speaker in the preset database according to the speaker corresponding to the text information.
The video information chain storage device according to claim 8, wherein the video file cutting unit includes:

A switching time point acquisition unit, which is used to obtain the speaker switching time point through the video cutting model and the speaker information in the video file to be processed;

The cutting processing unit is configured to cut the video file to be processed according to the time point when the speaker switches in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
The video information chain storage device according to claim 8, wherein the voice information recognition unit includes:

The phoneme segmentation unit is used to segment the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information;

The pinyin information acquisition unit is used to match the obtained phonemes according to the speech feature dictionary in the voice recognition model to convert all phonemes into pinyin information;

The text information acquisition unit is used to perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the voice recognition model to convert the pinyin information into text information.
A computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the following steps when executing the computer program:

Obtain the video file to be processed, and cut the video file to be processed through the video cutting model to obtain multiple video segments to be stored;

Recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker;

Intercept view information corresponding to text information from the video segment to be stored;

According to the speaker, the obtained text information and view information are stored in a preset database in a linked list corresponding to the speaker.
The computer device according to claim 11, wherein cutting the video file to be processed through the video cutting model to obtain multiple video segments to be stored includes:

Obtain the speaker switching time point through the video cutting model and the speaker information in the video file to be processed;

The video file to be processed is cut according to the time point when the speaker switches in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
The computer device according to claim 11, wherein recognizing the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker includes:

Divide the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information;

Match the obtained phonemes according to the speech feature dictionary in the voice recognition model to convert all phonemes into pinyin information;

Perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the voice recognition model to convert the pinyin information into text information.
The computer device according to claim 11, wherein storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database includes:

Obtain the time information corresponding to the text information in the video file to be processed;

According to the time information of the text information and the corresponding speaker, the text information is stored in the linked list corresponding to the speaker;

Insert the view information corresponding to the text information into the stored text information in the linked list to save the view information.
The computer device according to claim 11, wherein after storing, according to the speaker corresponding to the text information, the obtained text information and view information in a linked list corresponding to the speaker in a preset database, the steps further include:

The index information corresponding to the text information is generated according to the number information of the video file to be processed and the video time stamp and stored in the database.
The computer device according to claim 12, wherein obtaining the speaker switching time point through the video cutting model and the speaker information in the video file to be processed includes:

Determine whether the speaker information contains only one speaker;

If the speaker information does not include only one speaker, obtain the speaker switching time point in the to-be-processed video file according to the video cutting model.
The computer device according to claim 11, wherein the view information is a piece of video or a picture.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which when executed by a processor causes the processor to perform the following operations:

Obtain the video file to be processed, and cut the video file to be processed through the video cutting model to obtain multiple video segments to be stored;

Recognize the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker;

Intercept view information corresponding to text information from the video segment to be stored;

According to the speaker, the obtained text information and view information are stored in a preset database in a linked list corresponding to the speaker.
The computer-readable storage medium according to claim 18, wherein cutting the video file to be processed through the video cutting model to obtain multiple video segments to be stored includes:

Obtain the speaker switching time point through the video cutting model and the speaker information in the video file to be processed;

The video file to be processed is cut according to the time point when the speaker switches in the video file to be processed to obtain the video segment to be stored corresponding to each speaker.
The computer-readable storage medium according to claim 18, wherein recognizing the voice information in the obtained multiple video segments to be stored according to a preset voice recognition model to obtain text information corresponding to the speaker includes:

Divide the voice information according to the acoustic model in the voice recognition model to obtain multiple phonemes contained in the voice information;

Match the obtained phonemes according to the speech feature dictionary in the voice recognition model to convert all phonemes into pinyin information;

Perform semantic analysis on the obtained pinyin information according to the semantic analysis model in the voice recognition model to convert the pinyin information into text information.