CN113204669B

CN113204669B - Short video search recommendation method, system and storage medium based on voice recognition

Info

Publication number: CN113204669B
Application number: CN202110635741.7A
Authority: CN
Inventors: 孔祥兰
Original assignee: Yi Texinfang Shenzhen Technology Co ltd
Current assignee: Yi Texinfang Shenzhen Technology Co ltd
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2022-12-06
Anticipated expiration: 2041-06-08
Also published as: CN113204669A

Abstract

The invention discloses a short video search recommendation method, a system and a computer storage medium based on voice recognition. The short video search recommendation method based on the voice recognition comprises the following steps: carrying out platform import on the voice information corresponding to the short video to be searched by the user; converting the voice information input by the user into a text format which can be recognized by a platform; sending a search mode selection instruction to a search interface corresponding to the user to acquire a search mode selected by the user; processing the voice text information corresponding to the user; acquiring information corresponding to each short video of the platform; matching and screening the processed voice text information and the information corresponding to the short video; and then, by carrying out mode-based search processing and analysis on the voice input by the user, the problem that the search mode of the conventional short video search recommendation method is limited is effectively solved, and the search recommendation efficiency corresponding to the short video to be searched by the user is greatly improved.

Description

Short video search recommendation method, system and storage medium based on voice recognition

Technical Field

The invention belongs to the technical field of video search recommendation, and relates to a short video search recommendation method and system based on voice recognition and a computer storage medium.

Background

With the rapid development of the internet of things industry, the ways in which its content is propagated have gradually diversified. Short videos, which pack rich content into a short playing time, have become one of the popular modes of content propagation. To improve the viewing experience of a user, the short videos the user searches for need to be recommended accurately.

Existing short video search recommendation methods mainly analyze the text typed in by the user and recommend videos accordingly. However, because the information contained in short videos is diverse, text alone cannot achieve accurate search and recommendation of short videos, so the existing methods still have certain disadvantages.

Disclosure of Invention

In view of this, in order to solve the problems described in the background art, a short video search recommendation method, a system and a computer storage medium based on voice recognition are provided, so as to implement intelligent search and recommendation of short videos.

The purpose of the invention can be realized by the following technical scheme:

A first aspect of the invention provides a short video search recommendation method based on voice recognition, which comprises the following steps:

S1, voice information input: the voice information corresponding to the short video to be searched by the user is imported into the platform through a voice import module, thereby obtaining the voice information input by the user;

S2, voice information conversion: the voice information input by the user is converted by a voice information conversion module into a text format that the platform can recognize, thereby obtaining the text information corresponding to the voice input by the user, which is recorded as voice text information;

S3, search mode selection: a search mode selection instruction is sent through a search mode selection module to the search interface corresponding to the user, so as to obtain the search mode selected by the user, wherein the search mode comprises fuzzy search and precise search;

S4, text information processing: the voice text information corresponding to the user is processed through a text information processing module, wherein the text information processing comprises fuzzy search mode voice text information processing and precise search mode voice text information processing;

S5, short video information acquisition: the information corresponding to each short video of the platform is acquired through a short video information acquisition module, the number of short videos on the platform is obtained, and the short videos are numbered according to a preset sequence and marked in turn as 1, 2, ..., i, ..., n;

S6, video matching analysis: the processed voice text information is matched and screened against the information corresponding to the short videos through a data processing and analyzing module, thereby obtaining the matching degree between each short video of the platform and the voice text information corresponding to the user;

S7, video sending: the video matching analysis result corresponding to the user is sent through an information sending module to the video search interface corresponding to the user.

Preferably, the specific process of the voice information conversion is as follows: according to the voice information input by the user, the voice information input by the user is subjected to filtering and enhancing processing, the processed voice format input by the user is converted into a text format which can be recognized by a platform through a voice recognition technology, and the voice text information corresponding to the user is obtained.

Preferably, the process of fuzzy search mode speech-text information processing comprises the steps of:

A1, according to the voice text information corresponding to the user, the stop words in the voice text information are filtered out to obtain the processed voice text information of the user; the processed voice text information is divided into words, the number of segmented words is counted, and the segmented words are numbered according to a preset sequence and marked in turn as 1, 2, ..., j, ..., m, thereby constructing a segmented word set F {F1, F2, ..., Fj, ..., Fm}, wherein Fj represents the j-th segmented word of the text information corresponding to the user;

A2, the segmented words in the voice text information are compared and screened to obtain the number of occurrences of each segmented word in the user's text information, which is recorded as the word frequency;

A3, according to the word frequency corresponding to each segmented word in the voice text information, the weight corresponding to each segmented word in the voice text information is calculated, wherein the calculation formula is

G_r represents the weight corresponding to the r-th segmented word in the voice text information, f_r represents the word frequency corresponding to the r-th segmented word of the voice text information, F represents the number of documents stored in the database, and k_r represents the number of documents in the database that contain the r-th segmented word;

A4, the weight corresponding to each segmented word is compared with the standard weight corresponding to a keyword; if the weight corresponding to a segmented word is greater than the standard weight, the segmented word is marked as a target keyword, the number of target keywords is counted, and the number corresponding to each target keyword is acquired.
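Steps A1 to A4 can be sketched as follows. The weight formula itself is not reproduced in this text; because the weight is defined from the word frequency, the number of stored documents and the number of documents containing the word, the sketch assumes a TF-IDF style weight, G_r = f_r * ln(F / k_r). That formula, the whitespace tokenizer standing in for word segmentation, the stop-word list and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Hypothetical stop-word list; a real platform would load its own.
STOP_WORDS: Set[str] = {"the", "a", "of", "to", "me", "show"}


def segment(text: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """A1: split into words and drop stop words (whitespace split is a stand-in)."""
    return [w for w in text.lower().split() if w not in stop_words]


def word_weights(words: List[str], corpus: List[Set[str]]) -> Dict[str, float]:
    """A2 and A3: word frequency f_r, then the ASSUMED weight f_r * ln(F / k_r)."""
    total_docs = len(corpus)  # F: number of documents stored in the database
    if total_docs == 0:
        raise ValueError("corpus must not be empty")
    weights: Dict[str, float] = {}
    for word, f_r in Counter(words).items():
        k_r = sum(1 for doc in corpus if word in doc)  # documents containing the word
        # Assumption: a word found in no document is treated as k_r = 1
        # so that the division is defined.
        weights[word] = f_r * math.log(total_docs / max(k_r, 1))
    return weights


def target_keywords(weights: Dict[str, float], standard_weight: float) -> List[str]:
    """A4: keep the words whose weight exceeds the standard keyword weight."""
    return [word for word, g_r in weights.items() if g_r > standard_weight]
```

With a four-document corpus, a query such as "show me the funny cat cat video" keeps "funny" and "cat" as target keywords under a standard weight of 1.0, because "video" appears in most documents and so receives a low weight.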

Preferably, the process of processing the accurate search mode speech text information comprises the following steps:

B1, the acquired voice text information corresponding to the user is sent to the mobile terminal corresponding to the user for review and confirmation;

B2, the user receives the text information sent by the platform and checks it; if the voice text information contains errors, the user modifies and marks the erroneous areas and sends the modified and marked text information to the platform;

B3, the platform obtains the modified and marked voice text information fed back by the user and records it as confirmed text information; the confirmed text information is segmented into words, and the segmented words are numbered according to a preset sequence and marked in turn as 1, 2, ..., x, ..., y, thereby constructing a confirmed text information segmented word set H (H1, H2, ..., Hx, ..., Hy), wherein Hx represents the x-th segmented word in the confirmed text information;

B4, the weight corresponding to each segmented word of the confirmed text information is obtained using the same weight calculation method that is applied to the segmented words of the user's voice text information in the fuzzy search mode; the weights are sorted in descending order, the five segmented words with the highest weights are extracted and marked as candidate keywords, and the extracted candidate keywords are sent to the mobile terminal corresponding to the user;

B5, the user receives the candidate keywords sent by the platform and confirms them, and the confirmed candidate keywords are marked as confirmation keywords;

B6, the platform receives the confirmation keywords fed back by the user, counts them, numbers them according to a preset sequence and marks them in turn as 1, 2, ..., u, ..., v; the weight corresponding to each confirmation keyword is calculated according to the weight calculation method for segmented words, and a confirmation keyword weight set Q (Q1, Q2, ..., Qu, ..., Qv) is constructed, wherein Qu represents the weight corresponding to the u-th confirmation keyword.
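The keyword selection in steps B4 to B6 can be sketched as below. The sketch takes the word weights as already computed and models the user's confirmation as a plain list of words; the round trip to the mobile terminal is not shown. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def candidate_keywords(weights: Dict[str, float], top_n: int = 5) -> List[str]:
    """B4: sort the segmented words by weight, descending, and keep the top five."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


def confirmed_weight_set(
    weights: Dict[str, float],
    candidates: List[str],
    user_confirmed: Iterable[str],
) -> Dict[str, float]:
    """B5 and B6: keep the candidates the user confirmed and build the weight set Q."""
    confirmed = set(user_confirmed)
    return {word: weights[word] for word in candidates if word in confirmed}
```

Words the user confirms that were never offered as candidates are ignored in this sketch; whether the platform should accept them is not stated in the text.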

Preferably, the short video information includes the number of keywords corresponding to each short video and the frequency corresponding to each keyword of each short video. The words associated with each short video are recorded as its keywords, each keyword of each short video is matched and screened against the documents stored in the database to obtain the number of documents in the database that contain that keyword, and the weight corresponding to each keyword of each short video is then obtained. The keywords corresponding to each short video are numbered according to a preset sequence and marked in turn as 1, 2, ..., k, ..., h.
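One possible in-memory shape for this per-video information is sketched below. The class and field names are hypothetical, and the keyword weight reuses an assumed TF-IDF style formula, frequency * ln(F / k), because the weight formula is not reproduced in this text.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ShortVideoInfo:
    number: int                    # position i in the platform numbering 1..n
    keyword_freq: Dict[str, int]   # frequency of each keyword of this video
    keyword_weight: Dict[str, float] = field(default_factory=dict)

    @property
    def keyword_count(self) -> int:
        """Number of keywords corresponding to this short video."""
        return len(self.keyword_freq)


def fill_weights(
    video: ShortVideoInfo, doc_counts: Dict[str, int], total_docs: int
) -> ShortVideoInfo:
    """Compute each keyword weight from its frequency and document count.

    doc_counts maps a keyword to the number of database documents containing it;
    total_docs must be positive. The formula is an assumption, not the patent's.
    """
    for keyword, freq in video.keyword_freq.items():
        k = max(doc_counts.get(keyword, 0), 1)  # assumption: unseen keyword counts as 1
        video.keyword_weight[keyword] = freq * math.log(total_docs / k)
    return video
```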

Preferably, the video matching analysis is used for performing video matching analysis on the speech text information processed by the fuzzy search mode, and the specific analysis process is as follows: processing the number of the obtained target keywords corresponding to the voice text information according to a fuzzy search mode, then matching and screening each target keyword corresponding to the voice text information and each keyword corresponding to each short video, further counting the number of the keywords corresponding to each short video and the target keywords corresponding to the voice text information, and further counting the word matching degree corresponding to each short video, wherein the calculation formula is that

λ_d represents the matching degree between the keywords of the d-th short video and the target keywords of the voice text information, e_d represents the number of keywords of the d-th short video that match the target keywords of the voice text information, G_d represents the number of keywords corresponding to the d-th short video, and d represents the short video number, d = 1, 2, ..., n.
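The word matching degree formula is not reproduced in this text. Given the two quantities defined above, one natural reading is the ratio of matched keywords to total keywords, λ_d = e_d / G_d; the sketch below uses that ratio as an assumption and then orders the videos by descending score to form a recommendation sequence. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def word_matching_degree(video_keywords: Set[str], targets: Set[str]) -> float:
    """ASSUMED formula: matched keyword count e_d over keyword count G_d."""
    if not video_keywords:
        return 0.0
    e_d = len(video_keywords & targets)
    return e_d / len(video_keywords)


def rank_videos(
    videos: Dict[int, Set[str]], targets: Set[str]
) -> List[Tuple[int, float]]:
    """Score every video d and sort by matching degree, highest first."""
    scored = [(d, word_matching_degree(kws, targets)) for d, kws in videos.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because Python's sort is stable, videos with equal scores keep their platform numbering order in this sketch.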

Preferably, the video matching analysis is used for performing video matching analysis on the speech text information processed by the accurate search mode, and the specific analysis process is as follows:

C1, according to the number of confirmation keywords of the confirmed text information and the weight corresponding to each confirmation keyword, both obtained by the accurate search mode processing, each confirmation keyword of the confirmed text information is converted into vector form according to its weight, and a confirmed text information vector set L (L1, L2, ..., Lu, ..., Lv) is constructed, wherein Lu represents the vector corresponding to the u-th confirmation keyword of the confirmed text information;

C2, at the same time, the number of keywords corresponding to each short video and the weight corresponding to each keyword of each short video are obtained, each keyword of each short video is converted into vector form, and a vector set L'_d (L'_d1, L'_d2, ..., L'_dk, ..., L'_dh) is constructed for each short video, wherein L'_dk represents the vector corresponding to the k-th keyword of the d-th short video;

C3, according to the confirmed text information vector set and each short video vector set, the comprehensive matching degree corresponding to each short video is calculated, wherein the calculation formula is

γ_d represents the comprehensive matching degree corresponding to the d-th short video, Lt represents the vector corresponding to the t-th keyword of the confirmed text information, L'_dt' represents the vector corresponding to the t'-th keyword of the d-th short video, t represents the confirmation keyword number, t = 1, 2, ..., u, ..., v, and t' represents the short video keyword number, t' = 1, 2, ..., k, ..., h;

C4, the comprehensive matching degrees corresponding to the short videos are sorted in descending order, the short video ranked first is extracted as the preferred recommended short video, and the number corresponding to the preferred recommended short video is extracted.
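Steps C1 to C4 can be sketched as follows. The comprehensive matching degree formula is not reproduced in this text, and the vector form of a keyword is not defined, so the sketch makes two loudly stated assumptions: each side is represented as a sparse vector of keyword weights, and the matching degree is the cosine similarity between the two vectors. All names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """ASSUMED matching degree: cosine of two sparse keyword-weight vectors."""
    dot = sum(weight * b.get(keyword, 0.0) for keyword, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def preferred_video(
    query: Dict[str, float], videos: Dict[int, Dict[str, float]]
) -> Optional[Tuple[int, float]]:
    """C3 and C4: score every video, then return the number and score of the best."""
    if not videos:
        return None
    scores = {d: cosine_similarity(query, vec) for d, vec in videos.items()}
    best = max(scores, key=lambda d: scores[d])
    return best, scores[best]
```

Here `query` plays the role of the confirmation keyword weight set and each entry of `videos` the keyword weights of one short video; only the top-ranked video number is returned, matching the single preferred recommendation of step C4.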

Preferably, the video sending includes fuzzy search mode video sending and precise search mode video sending. When the search mode selected by the user is fuzzy search, the video links corresponding to the short videos are generated according to the recommendation sequence of the short videos obtained through fuzzy search, and the links are sent in that sequence to the search interface corresponding to the user. When the search mode selected by the user is precise search, the video link corresponding to the preferred recommended short video is generated according to its number and sent to the search interface corresponding to the user, which completes the video sending.
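The difference between the two sending modes can be sketched as below: fuzzy search returns one link per video in recommendation order, while precise search returns only the link of the preferred recommended video. The base URL, the mode strings and the function names are hypothetical, and actual delivery to the search interface is not shown.

```python
from typing import List

# Hypothetical link prefix; the real link format is not given in the text.
BASE_URL = "https://example.invalid/video/"


def video_link(number: int) -> str:
    """Build a link from a short video number."""
    return f"{BASE_URL}{number}"


def links_to_send(mode: str, ranked_numbers: List[int]) -> List[str]:
    """Return the links to send, given video numbers in recommendation order."""
    if mode == "fuzzy":
        return [video_link(n) for n in ranked_numbers]
    if mode == "precise":
        return [video_link(ranked_numbers[0])] if ranked_numbers else []
    raise ValueError(f"unknown search mode: {mode!r}")
```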

A second aspect of the invention provides a short video search recommendation system based on voice recognition, wherein the data processing and analyzing module is respectively connected with the text information processing module, the short video information acquisition module, the database and the information sending module, the voice information conversion module is respectively connected with the voice import module and the search mode selection module, and the text information processing module is connected with the search mode selection module.
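The module connections listed above can be written down as an undirected graph, which makes it easy to check that every module is reachable from the voice import module. The edge list follows the connections stated in the text; the shortened module names and the helper functions are illustrative only.

```python
from typing import Dict, List, Set, Tuple

CONNECTIONS: List[Tuple[str, str]] = [
    ("data processing and analysis", "text information processing"),
    ("data processing and analysis", "short video information acquisition"),
    ("data processing and analysis", "database"),
    ("data processing and analysis", "information sending"),
    ("voice information conversion", "voice import"),
    ("voice information conversion", "search mode selection"),
    ("text information processing", "search mode selection"),
]


def adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from the connection list."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Collect every module reachable from start by following connections."""
    seen = {start}
    stack = [start]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```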

A third aspect of the present invention provides a computer storage medium on which a computer program is recorded; when the computer program runs in the memory of a server, it implements any one of the methods described above.

The invention has the beneficial effects that:

(1) According to the short video search recommendation method based on voice recognition, the voice input by the user is subjected to mode-division search processing and analysis, the problem that the search mode of the existing short video search recommendation method is limited and the accuracy of short video search cannot be improved is effectively solved, the search recommendation efficiency corresponding to the short video to be searched by the user is effectively improved, and meanwhile, the search experience of the user is greatly improved.

(2) According to the method, the fuzzy search mode and the accurate search mode are selected in the search mode, so that different search requirements of a user are greatly met, and meanwhile, the relevance between the voice input by the user and the video to be searched is greatly improved.

(3) In the embodiment of the invention, the voice information conversion converts the voice input by the user into a text format that the platform can recognize, which provides a solid information basis for the subsequent matching and searching of the short video to be searched by the user, and thereby improves both the efficiency of short video searching and the reference value of the search results.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram of the steps of the method of the present invention;

FIG. 2 is a schematic diagram of the system module connection according to the present invention.

Detailed Description

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Referring to fig. 1, a first aspect of the present invention provides a short video search recommendation method based on speech recognition, including the following steps:

the specific process of the voice information conversion is as follows: according to the voice information input by the user, the voice information input by the user is subjected to filtering and enhancing processing, the processed voice format input by the user is converted into a text format which can be recognized by a platform through a voice recognition technology, and the voice text information corresponding to the user is obtained.

The embodiment of the invention converts the voice input by the user into a text format that the platform can recognize, which provides a solid information basis for the subsequent matching and searching of the short video to be searched by the user, and thereby improves both the efficiency of short video searching and the reference value of the search results.

According to the embodiment of the invention, setting a fuzzy search mode and an accurate search mode meets the different search requirements of users and improves the relevance between the voice input by the user and the videos to be searched. The fuzzy search mode expands the search range and can meet the user's need to find similar, related short videos, while the accurate search mode improves the accuracy of the recommendation for the short video to be searched by having the user confirm the processed voice text information a second time.

specifically, the process of processing the fuzzy search mode voice text information comprises the following steps:

A1, according to the voice text information corresponding to the user, the stop words in the voice text information are filtered out to obtain the processed voice text information of the user; the processed voice text information is divided into words, the number of segmented words is counted, and the segmented words are numbered according to a preset sequence and marked in turn as 1, 2, ..., j, ..., m, thereby constructing a segmented word set F {F1, F2, ..., Fj, ..., Fm}, wherein Fj represents the j-th segmented word of the text information corresponding to the user;

G_r represents the weight corresponding to the r-th segmented word in the voice text information, f_r represents the word frequency corresponding to the r-th segmented word of the voice text information, F represents the number of documents stored in the database, and k_r represents the number of documents in the database that contain the r-th segmented word;

Specifically, the process of processing the accurate search mode speech text information includes the following steps:

the modification mode of the user for modifying the voice text information comprises voice modification, replacement modification and manual modification.

In the specific embodiment, the voice modification mode carries out re-input of voice by a user in a sentence of a text area with an error, the replacement modification is to carry out text retrieval and replacement on a word with an error, and the manual modification is to delete and modify the error area.

B3, the platform obtains the modified and marked voice text information fed back by the user and records it as confirmed text information; the confirmed text information is segmented into words, and the segmented words are numbered according to a preset sequence and marked in turn as 1, 2, ..., x, ..., y, thereby constructing a confirmed text information segmented word set H (H1, H2, ..., Hx, ..., Hy), wherein Hx represents the x-th segmented word in the confirmed text information;

When the voice text information corresponding to the user is processed, the voice text information is subjected to word segmentation and keyword extraction, so that an important information basis is provided for subsequent matching search of the short videos, and the search efficiency and the search accuracy of the short videos to be searched of the user are greatly improved.

S5, short video information acquisition: the information corresponding to each short video of the platform is acquired through a short video information acquisition module, the number of short videos on the platform is obtained, and the short videos are numbered according to a preset sequence and marked in turn as 1, 2, ..., i, ..., n;

Specifically, the short video information includes the number of keywords corresponding to each short video and the frequency corresponding to each keyword of each short video. The words associated with each short video are recorded as its keywords, each keyword of each short video is matched and screened against the documents stored in the database to obtain the number of documents in the database that contain that keyword, and the weight corresponding to each keyword of each short video is then obtained. The keywords corresponding to each short video are numbered according to a preset sequence and marked in turn as 1, 2, ..., k, ..., h.

specifically, the video matching analysis is used for performing video matching analysis on the speech text information processed by the fuzzy search mode, and the specific analysis process is as follows: processing the number of the obtained target keywords corresponding to the voice text information according to a fuzzy search mode, then matching and screening each target keyword corresponding to the voice text information and each keyword corresponding to each short video, further counting the number of the keywords corresponding to each short video and the target keywords corresponding to the voice text information, and further counting the word matching degree corresponding to each short video, wherein the calculation formula is that

λ_d represents the matching degree between the keywords of the d-th short video and the target keywords of the voice text information, e_d represents the number of keywords of the d-th short video that match the target keywords of the voice text information, G_d represents the number of keywords corresponding to the d-th short video, and d represents the short video number, d = 1, 2, ..., n.

Specifically, the video matching analysis is used for performing video matching analysis on the speech text information processed in the accurate search mode, and the specific analysis process is as follows:

C2, at the same time, the number of keywords corresponding to each short video and the weight corresponding to each keyword of each short video are obtained, each keyword of each short video is converted into vector form, and a vector set L'_d (L'_d1, L'_d2, ..., L'_dk, ..., L'_dh) is constructed for each short video, wherein L'_dk represents the vector corresponding to the k-th keyword of the d-th short video;

γ_d represents the comprehensive matching degree corresponding to the d-th short video, Lt represents the vector corresponding to the t-th keyword of the confirmed text information, L'_dt' represents the vector corresponding to the t'-th keyword of the d-th short video, t represents the confirmation keyword number, t = 1, 2, ..., u, ..., v, and t' represents the short video keyword number, t' = 1, 2, ..., k, ..., h;

According to the embodiment of the invention, through carrying out video matching analysis on the fuzzy search mode and the accurate search mode, the problem that the search mode of the conventional short video search recommendation method has limitation and the accuracy of short video search cannot be improved is effectively solved, so that the search recommendation efficiency corresponding to the short video to be searched by the user is effectively improved, and meanwhile, the search experience of the user is greatly improved.

Specifically, the video sending includes fuzzy search mode video sending and precise search mode video sending. When the search mode selected by the user is fuzzy search, the video links corresponding to the short videos are generated according to the recommendation sequence of the short videos obtained through fuzzy search, and the links are sent in that sequence to the search interface corresponding to the user. When the search mode selected by the user is precise search, the video link corresponding to the preferred recommended short video is generated according to its number and sent to the search interface corresponding to the user, which completes the video sending.

Referring to fig. 2, in a second aspect, the present invention provides a short video search recommendation system based on speech recognition, in which the data processing and analyzing module is respectively connected to the text information processing module, the short video information obtaining module, the database and the information sending module, the speech information converting module is respectively connected to the speech importing module and the search mode selecting module, and the text information processing module is connected to the search mode selecting module.

The database is used for storing the text information of each type and the standard weight corresponding to the key words.

In the embodiment of the invention, the types of text information include scientific text information, geographic text information and the like. The text types stored in the database are numbered according to a preset sequence and marked in turn as 1, 2, ..., p, ..., q, the quantity of text information of each type is obtained, and the text information of each type is stored, which facilitates keyword extraction from the voice text information corresponding to the user; the more text types there are and the more texts of each type, the higher the keyword extraction accuracy.

A third aspect of the present invention provides a computer storage medium, where a computer program is recorded on the computer storage medium, and when the computer program runs in a memory of a server, the computer program implements any one of the methods described above.

The foregoing is illustrative and explanatory only of the present invention, and it is intended that the present invention cover modifications, additions, or substitutions by those skilled in the art, without departing from the spirit of the invention or exceeding the scope of the claims.

Claims

1. A short video search recommendation method based on voice recognition is characterized in that: the method comprises the following steps:

s1, voice information input: performing platform import on voice information corresponding to short videos to be searched by a user through a voice import module so as to acquire the voice information input by the user;

s7, video sending: sending the video matching analysis result corresponding to the user to a video searching interface corresponding to the user through an information sending module;

the video matching analysis is used for carrying out video matching analysis on the voice text information processed by the accurate search mode, and the specific analysis process is as follows:

c1, according to the number of confirmation keywords of the confirmed text information and the weight corresponding to each confirmation keyword, both obtained by the accurate search mode processing, each confirmation keyword of the confirmed text information is converted into vector form according to its weight, and a confirmed text information vector set L (L1, L2, ..., Lu, ..., Lv) is constructed, wherein Lu represents the vector corresponding to the u-th confirmation keyword of the confirmed text information;

c4, according to the counted comprehensive matching degrees corresponding to the short videos, sequencing the comprehensive matching degrees corresponding to the short videos in a descending order, further extracting the short video with the comprehensive matching degree ranked first, taking the short video as the preferred recommended short video, and further extracting the number corresponding to the preferred recommended short video;

the process of processing the fuzzy search mode speech text information comprises the following steps:

a3, according to the frequency corresponding to each segmentation word in the voice text information, further counting the weight corresponding to each segmentation word in the voice text information, wherein the calculation formula is

a4, comparing the weight corresponding to each segmented word with the standard weight corresponding to a keyword, marking a segmented word as a target keyword if its weight is greater than the standard weight corresponding to a keyword, counting the number of target keywords, and acquiring the number corresponding to each target keyword;
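Steps a3 and a4 can be sketched in Python as follows. The weight formula is not reproduced in this text, so the sketch assumes the weight of a segmented word is its relative frequency (its count divided by the total number of segmented words). That choice, the function names, and the example threshold are assumptions for illustration, not the claimed formula.

```python
from collections import Counter
from typing import Dict, List


def segment_weights(words: List[str]) -> Dict[str, float]:
    """Assumed weight: relative frequency of each segmented word."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def select_target_keywords(words: List[str], standard_weight: float) -> Dict[int, str]:
    """Keep words whose weight exceeds the standard weight and number them 1, 2, ..."""
    weights = segment_weights(words)
    # dict.fromkeys preserves first-occurrence order while removing duplicates.
    selected = [w for w in dict.fromkeys(words) if weights[w] > standard_weight]
    return {number: word for number, word in enumerate(selected, start=1)}


# Hypothetical segmented words from a recognised voice query.
segmented = ["cat", "video", "cat", "funny", "cat", "video"]
targets = select_target_keywords(segmented, standard_weight=0.2)
```

With this input, "cat" and "video" exceed the assumed standard weight of 0.2 and become target keywords 1 and 2, while "funny" is dropped.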

processing the voice text information in the accurate search mode comprises the following steps:

b6, the platform receives the confirmation keywords fed back by the user, counts the number of confirmation keywords fed back by the user, numbers them according to a preset sequence, sequentially marks them as 1, 2, ..., u, ..., v, counts the weight corresponding to each confirmation keyword according to the calculation method for the weight of each segmented word, and constructs a weight set Q (Q1, Q2, ..., Qu, ..., Qv) of the confirmation keywords, wherein Qu represents the weight corresponding to the u-th confirmation keyword.
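A minimal Python sketch of step b6 is given below. It numbers the confirmation keywords in the order received and builds the weight set Q. Because the segmented-word weight formula is not reproduced in this text, the sketch assumes relative frequency in the voice text as the weight; the function name and data layout are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confirmation_weight_set(
    confirmed: List[str], segmented_words: List[str]
) -> Tuple[Dict[int, str], List[float]]:
    """Number the confirmation keywords 1..v and return (numbering, Q)."""
    counts = Counter(segmented_words)
    total = len(segmented_words)
    # Assumption: the "preset sequence" is the order in which the user confirmed them.
    numbering = {u: keyword for u, keyword in enumerate(confirmed, start=1)}
    # Assumed weight: relative frequency of the keyword in the voice text.
    q = [counts[keyword] / total if total else 0.0 for keyword in confirmed]
    return numbering, q


# Hypothetical voice text and the keywords the user confirmed.
numbering, Q = confirmation_weight_set(
    ["cat", "funny"], ["cat", "video", "cat", "funny"]
)
```

In this example Q[u - 1] plays the role of Qu, the weight of the u-th confirmation keyword.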

2. The short video search recommendation method based on speech recognition according to claim 1, characterized in that: the specific process of the voice information conversion is as follows: the voice information input by the user is filtered and enhanced, the processed voice input is converted by a voice recognition technology into a text format that the platform can recognize, and the voice text information corresponding to the user is obtained.

3. The short video search recommendation method based on speech recognition according to claim 1, characterized in that: the short video information comprises the number of keywords corresponding to each short video and the frequency corresponding to each keyword of each short video; each keyword of each short video is matched and screened against the documents stored in a database to obtain the number of documents in the database that contain that keyword, from which the weight corresponding to each keyword of each short video is obtained; the keywords corresponding to each short video are numbered according to a preset sequence and sequentially marked as 1, 2, ...
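The claim derives a keyword weight from the keyword frequency and the number of database documents containing the keyword, but the formula itself is not stated here. As one possible reading, the Python sketch below uses a smoothed TF-IDF weighting, which combines exactly those two quantities. The formula, the function name, and the substring-based document matching are assumptions, not the claimed calculation.

```python
import math
from typing import Dict, List


def keyword_weights(keyword_freq: Dict[str, float], documents: List[str]) -> Dict[str, float]:
    """Assumed weight: frequency * smoothed inverse document frequency."""
    n_docs = len(documents)
    weights = {}
    for keyword, freq in keyword_freq.items():
        # Number of database documents containing the keyword (simple substring match).
        doc_count = sum(1 for doc in documents if keyword in doc)
        # Smoothed IDF keeps the weight positive even when every document matches.
        idf = math.log((1 + n_docs) / (1 + doc_count)) + 1.0
        weights[keyword] = freq * idf
    return weights


# Hypothetical database documents and keyword frequencies for one short video.
docs = ["cat video", "dog video", "cat and dog"]
weights = keyword_weights({"cat": 0.5, "video": 0.25}, docs)
```

Any other monotone combination of frequency and document count would fit the claim wording equally well; the smoothing here only avoids division by zero and negative weights.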

4. The short video search recommendation method based on speech recognition according to claim 1, characterized in that: the video matching analysis is used for carrying out video matching analysis on the voice text information processed by the fuzzy search mode, and the specific analysis process comprises the following steps: processing the number of the obtained target keywords corresponding to the voice text information according to a fuzzy search mode, then matching and screening each target keyword corresponding to the voice text information and each keyword corresponding to each short video, further counting the number of the keywords corresponding to each short video and the target keywords corresponding to the voice text information, and further counting the word matching degree corresponding to each short video, wherein the calculation formula is that

λ_d denotes the degree of matching between the keywords corresponding to the d-th short video and the target keywords corresponding to the voice text information, e_d denotes the number of keywords corresponding to the d-th short video that are the same as the target keywords corresponding to the voice text information, g_d denotes the number of keywords corresponding to the d-th short video, and d denotes the short video number, d = 1, 2, ...
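The word matching degree can be illustrated with the short Python sketch below. The calculation formula is not reproduced in this text; the sketch assumes the matching degree is the ratio of shared keywords to the video's keyword count, which is the simplest reading of the symbol definitions. The function name and the example keywords are hypothetical.

```python
from typing import Iterable


def word_matching_degree(video_keywords: Iterable[str], target_keywords: Iterable[str]) -> float:
    """Assumed form: (keywords shared with the targets) / (keywords of the video)."""
    video_set = set(video_keywords)
    if not video_set:
        return 0.0  # a video without keywords cannot match anything
    shared = len(video_set & set(target_keywords))  # count of identical keywords
    total = len(video_set)                          # keyword count of the video
    return shared / total


# Hypothetical keywords of one short video and target keywords of the voice text.
degree = word_matching_degree(["cat", "funny", "pet", "home"], ["cat", "pet", "dog"])
```

Two of the four video keywords also appear among the target keywords, so the assumed matching degree is 0.5.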

5. The short video search recommendation method based on speech recognition according to claim 1, characterized in that: the video sending comprises fuzzy search mode video sending and accurate search mode video sending; when the search mode selected by the user is fuzzy search, a video link is generated for each short video according to the recommendation sequence of the short videos obtained by the fuzzy search, and the video links are sent to the search interface corresponding to the user in that recommendation sequence; when the search mode selected by the user is accurate search, a video link for the preferred recommended short video is generated according to the number corresponding to the preferred recommended short video and sent to the search interface corresponding to the user, thereby completing the video sending.
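The mode-dependent sending logic of this claim can be sketched as follows. The link prefix `BASE_URL`, the mode strings "fuzzy" and "accurate", and the function names are hypothetical; the claim only states that links are generated from video numbers and sent in recommendation order or for the single preferred video.

```python
from typing import List, Optional

BASE_URL = "https://example.com/video/"  # hypothetical link prefix


def make_link(video_number: int) -> str:
    """Generate a video link from a short video number."""
    return f"{BASE_URL}{video_number}"


def links_to_send(
    mode: str, recommended_order: List[int], preferred_number: Optional[int] = None
) -> List[str]:
    """Return the links to send to the user's search interface, in sending order."""
    if mode == "fuzzy":
        # Fuzzy search: one link per short video, in recommendation order.
        return [make_link(number) for number in recommended_order]
    if mode == "accurate":
        # Accurate search: only the preferred recommended short video is sent.
        if preferred_number is None:
            raise ValueError("accurate search needs the preferred video number")
        return [make_link(preferred_number)]
    raise ValueError(f"unknown search mode: {mode}")


fuzzy_links = links_to_send("fuzzy", [7, 2, 5])
accurate_links = links_to_send("accurate", [], preferred_number=2)
```

Keeping both modes behind one function mirrors the single sending step of the method, with the search mode deciding how many links leave the platform.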

6. A short video search recommendation system based on speech recognition, for performing a short video search recommendation method based on speech recognition according to any one of claims 1-5, wherein: the data processing and analyzing module is respectively connected with the text information processing module, the short video information acquisition module, the database and the information sending module, the voice information conversion module is respectively connected with the voice importing module and the search mode selecting module, and the text information processing module is connected with the search mode selecting module.

7. A computer storage medium, characterized in that a computer program is burned onto the computer storage medium, and when the computer program runs in the memory of a server, it implements the method according to any one of claims 1 to 5.