CN116383416A - Method and device for requesting multimedia resources

Info

Publication number: CN116383416A
Application number: CN202310288305.6A
Authority: CN
Inventors: 罗嗣梧
Original assignee: Beijing Eswin Computing Technology Co Ltd
Current assignee: Beijing Eswin Computing Technology Co Ltd
Priority date: 2023-03-22
Filing date: 2023-03-22
Publication date: 2023-07-04

Abstract

The application provides a method and a device for requesting multimedia resources, and relates to the technical field of intelligent search. The method comprises: performing voice recognition on a multimedia on-demand voice to obtain pinyin information of the voice, and acquiring key pinyin for searching from the pinyin information; determining a search object type corresponding to the key pinyin, and acquiring a target mapping dictionary between object names matching the search object type and the pinyin of those object names; and searching in the target mapping dictionary based on the key pinyin to obtain, for playing, a target multimedia resource matching the multimedia on-demand voice. In the embodiments of the application, different target mapping dictionaries are acquired according to the search object type of the key pinyin, and the target multimedia resource is searched for in the corresponding dictionary, which gives high search efficiency and accuracy.

Description

Method and device for requesting multimedia resources

Technical Field

The application relates to the technical field of intelligent search, in particular to a multimedia resource on-demand method and a device thereof.

Background

Automatic speech recognition (ASR) is a technology in which a machine converts a speech signal into the corresponding text. ASR technology is divided into traditional ASR and deep learning-based ASR, of which the latter is currently more commonly used: an ASR model is trained with deep learning methods and then used for recognition. Because the recognition quality of such a model depends heavily on its training data, one of the difficulties of deep learning-based ASR is the recognition of unregistered words, that is, words not encountered during training of the ASR model.

In a video-on-demand scenario, the ASR used in voice search generally recognizes speech as Chinese characters. For voice searches involving unusual film and television titles or person names, ASR cannot accurately convert the words into the corresponding text, so accurate results cannot be matched during the search. Moreover, when new video resources are added, maintaining high accuracy requires continuously collecting new voice data and retraining, which consumes time and increases cost.

Disclosure of Invention

The embodiment of the application provides a method and a device for requesting multimedia resources.

An embodiment of a first aspect of the present application provides a method for on-demand multimedia resources, including:

performing voice recognition on the multimedia on-demand voice to obtain pinyin information of the multimedia on-demand voice, and acquiring key pinyin for searching from the pinyin information;

determining a search object type corresponding to the key pinyin, and acquiring a target mapping dictionary between object names matching the search object type and the pinyin of those object names;

searching in the target mapping dictionary based on the key pinyin to obtain a target multimedia resource matched with the multimedia on-demand voice for playing.

In one embodiment of the present application, the obtaining a target mapping dictionary between an object name matching the search object type and pinyin of the object name includes:

when the search object type indicates that the search object is a multimedia resource, a first mapping dictionary between the names of the candidate multimedia resources and the name pinyin of the candidate multimedia resources is obtained and used as the target mapping dictionary; or,

when the search object type indicates that the search object is a person name, a second mapping dictionary between the candidate person names and the pinyin of the candidate person names is obtained and used as the target mapping dictionary.

In one embodiment of the present application, the searching in the target mapping dictionary based on the key pinyin to obtain a target multimedia resource matched with the multimedia on-demand voice for playing includes:

when the target mapping dictionary is the first mapping dictionary, acquiring word frequency inverse document frequency TF-IDF of name pinyin of candidate multimedia resources in the first mapping dictionary;

determining the name pinyin of the first candidate multimedia resource from the name pinyins of the candidate multimedia resources included in the first mapping dictionary based on the TF-IDF of the name pinyin of the candidate multimedia resources;

Obtaining a screening score of the first candidate multimedia resource according to the name pinyin and the key pinyin of the first candidate multimedia resource;

and selecting the multimedia resource with the highest screening score from the first candidate multimedia resources as the target multimedia resource.

In one embodiment of the present application, the obtaining the screening score of the first candidate multimedia resource according to the name pinyin and the key pinyin of the first candidate multimedia resource includes:

acquiring the similarity distance between the name pinyin of the first candidate multimedia resource and the key pinyin;

determining the weight of the first candidate multimedia resource according to the similarity distance of the first candidate multimedia resource and the length of the name pinyin;

and weighting the TF-IDF of the first candidate multimedia resource based on the weight of the first candidate multimedia resource to obtain the screening score of the first candidate multimedia resource.

When the target mapping dictionary is the second mapping dictionary, carrying out reverse maximum matching on the key pinyin in a pinyin word stock to obtain a first person name pinyin, wherein the pinyin word stock comprises candidate person name pinyin and single word pinyin;

determining candidate names with mapping relation with the first name pinyin from the second mapping dictionary as target names;

and determining the target multimedia resource based on the target person name from the candidate multimedia resources.

In one embodiment of the present application, determining the target multimedia asset based on the target person name includes:

acquiring a second candidate multimedia resource associated with the target name from the candidate multimedia resources;

the target multimedia asset is determined based on the second candidate multimedia asset.

In one embodiment of the present application, the obtaining, from the candidate multimedia resources, a second candidate multimedia resource associated with the target name includes:

acquiring a third mapping dictionary between the candidate name and the candidate multimedia resource;

and querying, based on the third mapping dictionary, one or more of the candidate multimedia resources that have a mapping relation with the target name, and taking them as the second candidate multimedia resources.

An embodiment of a second aspect of the present application provides a device for requesting multimedia resources, including:

the first acquisition module is used for carrying out voice recognition on the multimedia on-demand voice to obtain pinyin information of the multimedia on-demand voice, and acquiring key pinyin for searching from the pinyin information;

the second acquisition module is used for determining a search object type corresponding to the key pinyin and acquiring a target mapping dictionary between object names matching the search object type and the pinyin of those object names;

and the playing module is used for searching in the target mapping dictionary based on the key pinyin so as to obtain a target multimedia resource matched with the multimedia on-demand voice for playing.

An embodiment of a third aspect of the present application provides an electronic device, including: the device for requesting multimedia resources provided by the embodiment of the second aspect of the present application.

An embodiment of a fourth aspect of the present application proposes an electronic device, including: a processor; a memory for storing the processor-executable instructions; the processor is configured to execute the instructions to implement the method for on-demand multimedia resources according to the embodiment of the first aspect of the present application.

Embodiments of a fifth aspect of the present application provide a non-transitory computer readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method provided by the embodiments of the first aspect of the present application.

Embodiments of a sixth aspect of the present application propose a computer program product comprising a computer program which, when executed by a processor in a communication device, implements the method proposed by the embodiments of the first aspect of the present application.

The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:

in the embodiments of the application, matching and recognition use the pinyin information of the multimedia on-demand voice, which avoids the errors introduced by recognizing Chinese characters and gives higher accuracy; converting the voice directly into pinyin for recognition is also efficient. When the target multimedia resource is matched based on the pinyin information, key pinyin is extracted, a different target mapping dictionary is acquired according to the search object type of the key pinyin, and the search is performed in the corresponding dictionary. The search is therefore more targeted and its range is narrower, which improves both search efficiency and accuracy.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

fig. 1 is a flow chart of a method for requesting multimedia resources according to an embodiment of the present application;

fig. 2 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application;

fig. 3 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application;

fig. 4 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application;

fig. 5 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of a multimedia resource on-demand device according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of another electronic device according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of another electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present application. Rather, they are merely examples of apparatus and methods consistent with aspects of embodiments of the present application as detailed in the accompanying claims.

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the application. As used in this application in the examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the like or similar elements throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.

It should be noted that the method for requesting multimedia resources provided in any embodiment of the present application may be executed alone, in combination with possible implementations in other embodiments, or in combination with any technical solution in the related art. Because automatic speech recognition (ASR) technology is trained based on deep learning, words not encountered during training are treated as unregistered words. When unusual film and television titles or person names are searched for in a video-on-demand scenario, ASR cannot accurately convert them into the corresponding text, so accurate results cannot be obtained during voice search; the recognition of unregistered words is therefore one of the difficulties of ASR. In addition, whenever new film and television multimedia resources are added, new voice data must be continuously collected for training to keep ASR accuracy high, which increases time and cost.

In essence, voice search converts voice into text through ASR and then searches for the result the user wants based on the converted text. Because converting voice directly into text makes recognition inaccurate, this embodiment instead searches by converting voice into pinyin.

The method and device for requesting multimedia resources according to the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flow chart of a method for requesting multimedia resources according to an embodiment of the present application. As shown in fig. 1, the method includes, but is not limited to, the steps of:

S101, performing voice recognition on the multimedia on-demand voice to obtain pinyin information of the multimedia on-demand voice, and acquiring key pinyin for searching from the pinyin information.

The embodiment of the application can be applied to scenarios such as multimedia playback apps on mobile devices, televisions, and audio playback devices.

The multimedia on-demand voice can be acquired through a microphone of the device, or the multimedia on-demand voice can be acquired through an intelligent assistant and the like. Further, extracting voice characteristic signals from the multimedia on-demand voice, and converting the voice characteristic signals into pinyin information through an acoustic model.

Optionally, in order to ensure the accuracy of the subsequent processing of the multimedia on-demand voice, the collected multimedia on-demand voice is subjected to preprocessing such as noise reduction, echo cancellation and the like.

In some implementations, the noise reduction method for the multimedia on-demand voice may be wavelet noise reduction or EMD noise reduction, and the implementation manner of the noise reduction method is not limited.

In some implementations, the echo cancellation method for the multimedia on-demand voice may be an echo cancellation technology based on a real-time platform such as DSP or an echo cancellation technology based on a non-real-time platform such as Windows, and the implementation manner of the echo cancellation method is not limited.

Further, a voice activity detection (VAD) model is used to judge whether the collected voice signal is a complete sentence. When it is, the voice signal is taken as the final multimedia on-demand voice; voice feature signals are then extracted from the multimedia on-demand voice and converted into pinyin information through an acoustic model.

After the pinyin information of the multimedia on-demand voice is obtained, key pinyin for searching in the video resource on-demand scene is obtained according to the result of the pinyin information.

For example, in a video resource on-demand scenario, the key pinyin may be "bang1 wo3 bo1 fang4" (help me play), "bang1 wo3 dian3 bo1" (help me order), "wo3 xiang3 kan4" (I want to watch), "de5 dian4 ying3" (movie), or "de5 dian4 shi4 ju4" (television drama); when the recognized pinyin information contains such key pinyin, subsequent search matching is performed.

If key pinyin such as "de5 dian4 ying3" (movie) or "de5 dian4 shi4 ju4" (television drama) is found, this indicates that the multimedia resources of a relevant actor or director need to be searched for; otherwise, for key pinyin such as "bang1 wo3 bo1 fang4" (help me play), "bang1 wo3 dian3 bo1" (help me order), or "wo3 xiang3 kan4" (I want to watch), multimedia resources are matched directly according to the text.
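By way of illustration only, the trigger-phrase check described above might be sketched as follows. The function name `classify_request`, the type labels, and the idea of returning the remaining pinyin as the query are assumptions made for this sketch; the application itself does not fix how the remainder is delimited.

```python
from typing import Tuple

# Trigger phrases taken from the examples in the text; a real system would
# presumably hold a longer, configurable list (assumption).
COMMAND_PREFIXES = ("bang1 wo3 bo1 fang4", "bang1 wo3 dian3 bo1", "wo3 xiang3 kan4")
PERSON_SUFFIXES = ("de5 dian4 ying3", "de5 dian4 shi4 ju4")


def classify_request(pinyin: str) -> Tuple[str, str]:
    """Return (search_object_type, remaining_pinyin) for a recognized utterance."""
    text = " ".join(pinyin.split())  # normalize whitespace between syllables

    # Drop a leading command phrase such as "help me play", if present.
    for prefix in COMMAND_PREFIXES:
        if text.startswith(prefix + " "):
            text = text[len(prefix):].strip()
            break

    # A trailing "'s movie" / "'s television drama" phrase signals a search
    # by person name (actor or director).
    for suffix in PERSON_SUFFIXES:
        if text.endswith(" " + suffix):
            return "person_name", text[: -len(suffix)].strip()

    # Otherwise the remainder is treated as the name of a multimedia resource.
    return "multimedia_resource", text
```

For instance, `classify_request("wo3 xiang3 kan4 hu2 ge1 de5 dian4 shi4 ju4")` yields the person-name type together with the remainder `"hu2 ge1"`.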

S102, determining a search object type corresponding to the key pinyin, and acquiring a target mapping dictionary between object names matching the search object type and the pinyin of those object names.

In some implementations, the search object may be a multimedia asset, e.g., the multimedia asset may include video assets such as a television show, a movie, a short video, etc., and may also include audio assets such as music, a voice book, etc.

In other implementations, the search object may be a person name, such as actor name, director name, singer name, and the like.

In the embodiment of the application, the search object type may indicate whether to search for the multimedia resource directly or indirectly through a name. In the embodiment of the application, the semantic information of the multimedia on-demand voice can be identified, and the type of the search object is determined based on the semantic identification result.

It will be appreciated that different target mapping dictionaries are previously established for different search object types. The method specifically comprises the following steps: constructing a first mapping dictionary of candidate multimedia resources and corresponding name pinyin according to a film and television database, constructing a second mapping dictionary of candidate names and corresponding pinyin according to a person name database, constructing a third mapping dictionary of candidate names and candidate multimedia resources, and constructing a pinyin word stock of candidate name pinyin and single word pinyin.

When the type of the search object indicates that the search object is a multimedia resource, a first mapping dictionary between the names of the candidate multimedia resources and the name pinyin of the candidate multimedia resources is obtained and is used as a target mapping dictionary; or when the search object type indicates that the search object is a name, acquiring a second mapping dictionary between the candidate name and pinyin of the candidate name as a target mapping dictionary.
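The following is a minimal sketch of how the three mapping dictionaries and the pinyin word stock might be pre-built and how the target mapping dictionary might then be selected. The toy `CATALOG`, the function names, and the type labels are hypothetical; in particular, adding the syllables of resource names to the word stock as single word pinyin is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical toy catalog: (resource name, name pinyin, [(person name, person pinyin)]).
CATALOG = [
    ("琅琊榜", "lang2 ya2 bang3", [("胡歌", "hu2 ge1")]),
    ("仙剑奇侠传", "xian1 jian4 qi2 xia2 zhuan4", [("胡歌", "hu2 ge1")]),
]


def build_dictionaries(catalog):
    """Build the first, second and third mapping dictionaries and the pinyin word stock."""
    first = {}                 # resource name -> name pinyin
    second = {}                # person name -> person name pinyin
    third = defaultdict(list)  # person name -> associated resource names
    lexicon = set()            # person name pinyin plus single word pinyin
    for title, title_pinyin, people in catalog:
        first[title] = title_pinyin
        lexicon.update(title_pinyin.split())
        for person, person_pinyin in people:
            second[person] = person_pinyin
            third[person].append(title)
            lexicon.add(person_pinyin)
            lexicon.update(person_pinyin.split())
    return first, second, dict(third), lexicon


def select_target_dictionary(search_type, first, second):
    """Pick the target mapping dictionary according to the search object type."""
    if search_type == "multimedia_resource":
        return first
    if search_type == "person_name":
        return second
    raise ValueError(f"unknown search object type: {search_type!r}")
```

Building the dictionaries once, ahead of any request, keeps the per-request work down to a dictionary selection and a lookup.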

And S103, searching in the target mapping dictionary based on the key pinyin so as to obtain a target multimedia resource matched with the multimedia on-demand voice for playing.

In the embodiment of the application, searching is performed in a corresponding target mapping dictionary based on the key pinyin, and the multimedia resource matched with the key pinyin is obtained from the target mapping dictionary. It is understood that a multimedia asset that matches a key pinyin may be understood as a target multimedia asset that the user is attempting to order.

In some implementations, the target multimedia asset can be played directly.

In other implementations, the target multimedia asset needs to wait for further feedback from the user to play.

If multiple target multimedia resources exist, they may be displayed, and the resource to be played is determined from among them based on a further selection instruction from the user.

In the embodiments of the application, matching and recognition use the pinyin information of the multimedia on-demand voice, which avoids the errors introduced by recognizing Chinese characters and gives higher accuracy; converting the voice directly into pinyin for recognition is also efficient. When the target multimedia resource is matched based on the pinyin information, key pinyin is first extracted, a different target mapping dictionary is acquired according to the search object type of the key pinyin, and the search is performed in the corresponding dictionary. The search is therefore more targeted and its range is narrower, which improves both search efficiency and accuracy.

Fig. 2 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application. As shown in fig. 2, the method includes, but is not limited to, the steps of:

S201, performing voice recognition on the multimedia on-demand voice to obtain pinyin information of the multimedia on-demand voice, and acquiring key pinyin for searching from the pinyin information.

In the embodiment of the present application, the implementation manner of step S201 may be implemented by any one of the embodiments of the present disclosure, which is not limited herein, and is not described herein again.

S202, determining the type of the search object corresponding to the key pinyin.

In this embodiment of the present application, the implementation manner of step S202 may be implemented by any one of the embodiments of the present disclosure, which is not limited herein, and is not described herein again.

S203, when the type of the search object indicates that the search object is the multimedia resource, a first mapping dictionary between the names of the candidate multimedia resources and the name pinyin of the candidate multimedia resources is obtained as a target mapping dictionary.

In the embodiment of the application, the candidate multimedia resources and the first mapping dictionary of the corresponding pinyin are constructed through the database, so that when the search object type corresponding to the key pinyin indicates that the search object is the multimedia resources, the first mapping dictionary between the candidate multimedia resources and the corresponding pinyin is used as the target mapping dictionary.

S204, acquiring word frequency inverse document frequency TF-IDF of the name pinyin of the candidate multimedia resource in the first mapping dictionary.

Optionally, in this embodiment, a TF-IDF algorithm is used to obtain a word frequency inverse document frequency TF-IDF corresponding to each candidate multimedia resource in the first mapping dictionary, that is, a word frequency inverse document frequency TF-IDF of the name pinyin of each candidate multimedia resource is obtained.

In the embodiment of the application, when the word frequency inverse document frequency is calculated, TF is the frequency with which the extracted key pinyin occurs in the name pinyin of each multimedia resource, and IDF is obtained from the ratio of the total number of name pinyins of all multimedia resources to the number of name pinyins that contain the key pinyin.
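As a sketch under stated assumptions, the TF and IDF described above could be computed per syllable and summed over the syllables of the key pinyin. Treating each syllable as a term, normalizing TF by the name length, and using a logarithm with a `+ 1.0` offset for IDF are choices made for this illustration; the application only states that IDF is based on the ratio of the two counts.

```python
import math
from collections import Counter


def tf_idf_scores(key_pinyin, name_pinyins):
    """Score each resource name against the key pinyin.

    name_pinyins maps a resource name to its name pinyin (the first mapping
    dictionary). Returns a dict mapping each resource name to its TF-IDF score.
    """
    key_syllables = key_pinyin.split()
    docs = {name: pinyin.split() for name, pinyin in name_pinyins.items()}
    n_docs = len(docs)

    # Number of name pinyins that contain each syllable at least once.
    doc_freq = Counter()
    for syllables in docs.values():
        doc_freq.update(set(syllables))

    scores = {}
    for name, syllables in docs.items():
        counts = Counter(syllables)
        score = 0.0
        for syllable in key_syllables:
            if not syllables or doc_freq[syllable] == 0:
                continue
            tf = counts[syllable] / len(syllables)
            # Assumed form: log of the ratio, offset so that a syllable
            # present in every name still contributes.
            idf = math.log(n_docs / doc_freq[syllable]) + 1.0
            score += tf * idf
        scores[name] = score
    return scores
```

A name that shares no syllable with the key pinyin receives a score of zero and therefore falls to the bottom of any ranking built on these scores.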

S205, determining the name pinyin of the first candidate multimedia resource from the name pinyins of the candidate multimedia resources included in the first mapping dictionary based on the TF-IDF of the name pinyins of the candidate multimedia resources.

After the TF-IDF of the name pinyin of each candidate multimedia resource in the first mapping dictionary is obtained, initially screening all candidate multimedia resources in the first mapping dictionary to obtain the first candidate multimedia resources in the first mapping dictionary.

Optionally, the TF-IDFs of the name pinyin of all candidate multimedia resources in the first mapping dictionary are sorted in descending order, and the top N candidate multimedia resources after sorting are selected as the first candidate multimedia resources. For example, N may be 5, in which case the 5 candidate multimedia resources with the largest TF-IDF are taken as the first candidate multimedia resources.

Alternatively, the initial score of each candidate multimedia resource may be obtained based on the TF-IDF of the name pinyin of each candidate multimedia resource, further, the initial scores are arranged in a descending order, and topN candidate multimedia resources after the arrangement are selected as the first candidate multimedia resource.
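The top-N preliminary screening can be sketched in a few lines. The function name `top_n_candidates` is hypothetical, and the input is assumed to be a dictionary from resource name to TF-IDF (or initial score).

```python
import heapq


def top_n_candidates(score_by_name, n=5):
    """Return the n resource names with the largest scores, highest first."""
    return heapq.nlargest(n, score_by_name, key=score_by_name.get)
```

Using `heapq.nlargest` avoids sorting the whole dictionary when N is small relative to the number of candidate multimedia resources.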

S206, obtaining the screening score of the first candidate multimedia resources according to the name pinyin and the key pinyin of the first candidate multimedia resources.

In some implementations, the similarity distance between the name pinyin of the first candidate multimedia resource and the key pinyin is obtained. In this embodiment, the similarity distance may be a Euclidean distance or a cosine distance; it is used to characterize the similarity between the name pinyin of the first candidate multimedia resource and the key pinyin.

Further, according to the similarity distance of the first candidate multimedia resources and the length of the name pinyin, the weight of the first candidate multimedia resources is determined, and based on the weight of the first candidate multimedia resources, weighting operation is carried out on TF-IDF of the first candidate multimedia resources, so that the screening score of the first candidate multimedia resources is obtained.

Optionally, for any one of the first candidate multimedia resources, calculating a ratio of the similarity degree corresponding to the first candidate multimedia resource to the length of the name pinyin of the first candidate multimedia resource, and taking the ratio as the weight of the first candidate multimedia resource.

S207, selecting the multimedia resource with highest screening score from the first candidate multimedia resources as a target multimedia resource.

The higher the screening score is, the more the first candidate multimedia resource is matched with the key pinyin, so that the first candidate multimedia resource with the highest screening score is selected, and the first candidate multimedia resource with the highest screening score is taken as the target multimedia resource.
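A minimal sketch of steps S206 and S207 follows. The application does not say how pinyin is turned into a vector, so the syllable-plus-adjacent-pair features used for the cosine similarity here are an assumption, as are the function names and the choice to measure name length in syllables.

```python
import math
from collections import Counter


def order_aware_features(syllables):
    """Count single syllables plus adjacent pairs, so local order affects similarity."""
    return Counter(list(syllables) + list(zip(syllables, syllables[1:])))


def cosine_similarity(a, b):
    """Cosine similarity between two syllable sequences under the features above."""
    fa, fb = order_aware_features(a), order_aware_features(b)
    dot = sum(count * fb[feature] for feature, count in fa.items())
    norm = (math.sqrt(sum(v * v for v in fa.values()))
            * math.sqrt(sum(v * v for v in fb.values())))
    return dot / norm if norm else 0.0


def screening_score(name_pinyin, key_pinyin, tfidf):
    """Weight the TF-IDF by similarity divided by the name pinyin length."""
    name_syllables = name_pinyin.split()
    if not name_syllables:
        return 0.0
    similarity = cosine_similarity(name_syllables, key_pinyin.split())
    weight = similarity / len(name_syllables)
    return weight * tfidf


def pick_target(first_candidates, key_pinyin):
    """first_candidates maps resource name -> (name pinyin, TF-IDF); return the best name."""
    return max(
        first_candidates,
        key=lambda name: screening_score(
            first_candidates[name][0], key_pinyin, first_candidates[name][1]
        ),
    )
```

Dividing by the name length follows the optional weighting described in the text; it favors names that match the key pinyin closely without being much longer than it.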

S208, playing the target multimedia resource.

In the video-on-demand scene, after a target multimedia resource is obtained based on key pinyin of the multimedia on-demand voice, the target multimedia resource is played.

In some implementations, the target multimedia asset can be played directly.

In the embodiment of the application, when the search object type corresponding to the key pinyin is a multimedia resource, the candidate multimedia resources in the first mapping dictionary are analyzed to obtain the target multimedia resource. First, the word frequency inverse document frequency TF-IDF of each candidate multimedia resource is obtained with the conventional TF-IDF algorithm, and the first candidate multimedia resources are selected based on these values, which reduces the amount of calculation needed to obtain the target multimedia resource. Further, because the conventional TF-IDF algorithm ignores text order and can therefore give inaccurate results, this embodiment improves on it when computing the screening score of each first candidate multimedia resource: a weight, derived from the similarity distance between the name pinyin of the first candidate multimedia resource and the key pinyin, is applied to the TF-IDF obtained by the conventional algorithm. This makes the screening score more accurate, and so does the target multimedia resource selected on the basis of it.

Fig. 3 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application. As shown in fig. 3, the method includes, but is not limited to, the steps of:

S301, performing voice recognition on the multimedia on-demand voice to obtain pinyin information of the multimedia on-demand voice, and acquiring key pinyin for searching from the pinyin information.

In this embodiment of the present application, the implementation manner of step S301 may be implemented by any one of the embodiments of the present disclosure, which is not limited herein, and is not described herein again.

S302, determining the type of the search object corresponding to the key pinyin.

In this embodiment of the present application, step S302 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

S303, when the search object type indicates that the search object is a person name, a second mapping dictionary between the candidate person name and pinyin of the candidate person name is acquired as a target mapping dictionary.

In the embodiment of the application, the second mapping dictionary of the candidate name and the corresponding pinyin is constructed through the database, so that when the search object type corresponding to the key pinyin indicates that the search object is the name, the second mapping dictionary between the candidate name and the corresponding pinyin is used as the target mapping dictionary.

S304, performing reverse maximum matching on the key pinyin in a pinyin word stock to obtain a first person name pinyin, wherein the pinyin word stock comprises candidate person name pinyin and single word pinyin.

Reverse maximum matching is performed on the key pinyin in the pinyin word stock to obtain the first person name pinyin corresponding to the key pinyin. The pinyin word stock may be constructed in advance and comprises candidate person name pinyin and single word pinyin.

Illustratively, the candidate name pinyin may be director name pinyin, actor name pinyin, or singer name pinyin, where single word pinyin refers to pinyin of a single word, with the purpose of providing a basis for subsequent reverse maximum matching.

The reverse maximum matching algorithm starts the matching scan from the tail end of the text being processed. Suppose that the key pinyin of the multimedia on-demand voice is "hu2 ge1 de5 dian4 shi4 ju4" (Hu Ge's television dramas), and that the person name pinyin and single word pinyin in the pinyin word stock include {"hu2", "ge1", "de5", "dian4", "shi4", "ju4", "hu2 ge1", <other single word pinyin>, <other person name pinyin>}. Assume that the longest entry in the pinyin word stock has a length of 5.

In the first round, the first scan takes "ge1 de5 dian4 shi4 ju4", the second scan takes "de5 dian4 shi4 ju4", the third scan takes "dian4 shi4 ju4", and the fourth scan takes "shi4 ju4", each of which fails to match in the pinyin word stock; the fifth scan takes "ju4", which matches successfully, so the scanning stops and "ju4" is output.

In the second round, "ju4" is removed. The first scan takes "hu2 ge1 de5 dian4 shi4", the second scan takes "ge1 de5 dian4 shi4", the third scan takes "de5 dian4 shi4", and the fourth scan takes "dian4 shi4", each of which fails to match in the pinyin word stock; the fifth scan takes "shi4", which matches successfully, so the scanning stops and "shi4" is output.

In the third round, "ju4" and "shi4" are removed. The first scan takes "hu2 ge1 de5 dian4", and so on, and the third round outputs "dian4". In the fourth round, "ju4", "shi4" and "dian4" are removed, the first scan takes "hu2 ge1 de5", and the fourth round outputs "de5". In the fifth round, "ju4", "shi4", "dian4" and "de5" are removed, and the first scan takes "hu2 ge1", which matches successfully in the pinyin word stock, so "hu2 ge1" is output and the whole scan ends.

The final segmentation result output by the reverse maximum matching algorithm is "hu2 ge1/de5/dian4/shi4/ju4", in which "hu2 ge1" is the person name pinyin.
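The scanning procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the application: the function name `reverse_maximum_match`, the representation of the pinyin word stock as a set of space-joined strings, and the fallback that accepts an unknown single syllable are all assumptions made for the example.

```python
def reverse_maximum_match(syllables, lexicon, max_len):
    """Segment a list of pinyin syllables, scanning from the tail end.

    `lexicon` is a set of space-joined pinyin entries (hypothetical data).
    `max_len` is the length, in syllables, of the longest lexicon entry.
    """
    segments = []
    end = len(syllables)
    while end > 0:
        # Each round tries the longest window first and shrinks it from the left.
        for size in range(min(max_len, end), 0, -1):
            candidate = " ".join(syllables[end - size:end])
            # Assumption: a lone syllable is accepted even if it is not in the
            # lexicon, so that every round is guaranteed to make progress.
            if candidate in lexicon or size == 1:
                segments.append(candidate)
                end -= size
                break
    segments.reverse()
    return segments


lexicon = {"hu2", "ge1", "de5", "dian4", "shi4", "ju4", "hu2 ge1"}
key_pinyin = "hu2 ge1 de5 dian4 shi4 ju4".split()
result = reverse_maximum_match(key_pinyin, lexicon, max_len=5)
print("/".join(result))  # hu2 ge1/de5/dian4/shi4/ju4
```

Because the longest window is tried first in each round, "hu2 ge1" is output as one segment rather than as the two single syllables "hu2" and "ge1".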

S305, determining candidate names with mapping relation with the pinyin of the first person name from the second mapping dictionary as target person names.

Selecting a candidate name with a mapping relation with the pinyin of the first person from the second mapping dictionary, and marking the candidate name with the mapping relation as a target name; and in the subsequent analysis, only the target name is analyzed, so that the cost in the name analysis process is reduced, the analysis range is shortened, and the efficiency is higher.

S306, determining the target multimedia resources based on the target person names from the candidate multimedia resources.

After the target person name is obtained, the corresponding target multimedia resource can be obtained according to the target person name. In some implementations, a second candidate multimedia resource associated with the target person name is obtained from the candidate multimedia resources. Optionally, a third mapping dictionary between the candidate person names and the candidate multimedia resources is obtained, and, based on the third mapping dictionary, one or more candidate multimedia resources having a mapping relation with the target person name are queried from the candidate multimedia resources and taken as the second candidate multimedia resources.

Illustratively, the candidate person name may be a director name, the candidate multimedia resource may be a movie or a television show, and the third mapping dictionary may include each director name and a movie name photographed by the director.

It will be appreciated that different types of names, such as actors, directors, singers, etc., and names of video or audio works corresponding to the respective names may be included in the third mapping dictionary. The first mapping dictionary may include different multimedia assets, such as video or audio works, and name pinyin corresponding to each video or audio. Different types of names, such as singer, actor, and director, and the name pinyin corresponding to each name may be included in the second mapping dictionary.
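One possible in-memory layout of the three mapping dictionaries, together with the lookup from a first person name pinyin to the second candidate multimedia resources, is sketched below. The dictionary contents (including the placeholder titles), the variable names and the helper functions are hypothetical and only serve to illustrate the relations between the dictionaries.

```python
# All entries below are illustrative placeholders, not data from the application.

# First mapping dictionary: multimedia resource name -> name pinyin.
first_mapping = {
    "示例剧一": "shi4 li4 ju4 yi1",
    "示例剧二": "shi4 li4 ju4 er4",
}

# Second mapping dictionary: candidate person name -> person name pinyin.
second_mapping = {
    "胡歌": "hu2 ge1",
}

# Third mapping dictionary: candidate person name -> associated resource names.
third_mapping = {
    "胡歌": ["示例剧一", "示例剧二"],
}


def target_person_names(person_pinyin, second_mapping):
    """Return every candidate name whose pinyin equals the given pinyin.

    Several names may share one pinyin, so a list is returned.
    """
    return [name for name, pinyin in second_mapping.items() if pinyin == person_pinyin]


def second_candidate_resources(person_pinyin, second_mapping, third_mapping):
    """Collect the resources mapped to each target person name."""
    resources = []
    for name in target_person_names(person_pinyin, second_mapping):
        resources.extend(third_mapping.get(name, []))
    return resources


print(second_candidate_resources("hu2 ge1", second_mapping, third_mapping))
```

Restricting the later analysis to the names returned by `target_person_names` corresponds to analyzing only the target person name, as described for step S305.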

Further, after the second candidate multimedia resource is obtained, the target multimedia resource may be determined from the second candidate multimedia resource.

Optionally, if there are multiple second candidate multimedia resources, the user may take part in a further interaction and select the target multimedia resource from among them. For example, the user may select the target multimedia resource manually with a remote controller or by other means, or may continue to select it by voice, for example by saying which numbered item to play or by speaking the name of the target multimedia resource directly. Alternatively, without user interaction, the second candidate multimedia resource with the largest play count, or the one that most recently went online, may be selected directly from the multiple second candidate multimedia resources as the target multimedia resource.
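The two selection rules that need no user interaction can be sketched as follows. The `Resource` record, its field names and the sample values are hypothetical; the application does not specify how play counts or online times are stored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Resource:
    """Hypothetical record for a second candidate multimedia resource."""
    name: str
    play_count: int
    online_date: date


def pick_most_played(resources):
    """Select the candidate with the largest play count."""
    return max(resources, key=lambda r: r.play_count)


def pick_most_recent(resources):
    """Select the candidate that went online most recently."""
    return max(resources, key=lambda r: r.online_date)


# Placeholder data for illustration only.
candidates = [
    Resource("示例剧一", play_count=120000, online_date=date(2021, 5, 1)),
    Resource("示例剧二", play_count=80000, online_date=date(2023, 1, 15)),
]

print(pick_most_played(candidates).name)  # 示例剧一
print(pick_most_recent(candidates).name)  # 示例剧二
```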

S307, playing the target multimedia resource.

In some implementations, the target multimedia asset can be played directly.

In this embodiment of the application, when the search object type corresponding to the key pinyin is a person name, the first person name pinyin corresponding to the key pinyin is obtained from the pinyin word stock by reverse maximum matching, and the candidate person name having a mapping relation with the first person name pinyin is selected from the second mapping dictionary as the target person name. Only the target person name is analyzed afterwards, which reduces the computational cost. Further, the second candidate multimedia resources associated with the target person name are obtained based on the pre-constructed third mapping dictionary, which offers the user more options, improves the user experience, and allows a more accurate target multimedia resource to be obtained through user interaction.

Fig. 4 is a flow chart of another method for requesting multimedia resources according to an embodiment of the present application. As shown in fig. 4, the method includes, but is not limited to, the steps of:

S401, performing voice recognition on the multimedia on-demand voice to obtain pinyin information of the multimedia on-demand voice, and acquiring key pinyin for searching from the pinyin information.

In this embodiment of the present application, step S401 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

S402, determining the type of the search object corresponding to the key pinyin.

In this embodiment of the present application, step S402 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

S403, when the type of the search object indicates that the search object is the multimedia resource, a first mapping dictionary between the names of the candidate multimedia resources and the name pinyin of the candidate multimedia resources is obtained as a target mapping dictionary.

In this embodiment of the present application, step S403 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

S404, the TF-IDF of the name pinyin of the candidate multimedia resource in the first mapping dictionary is obtained.

In this embodiment of the present application, step S404 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

S405, determining a target multimedia resource based on the TF-IDF of the name pinyin of the candidate multimedia resource and the first mapping dictionary.

Optionally, the name pinyin of the first candidate multimedia resource is determined from the name pinyins of the candidate multimedia resources included in the first mapping dictionary based on TF-IDFs of the name pinyins of the candidate multimedia resources. Further, a screening score of the first candidate multimedia resource is obtained according to the name pinyin and the key pinyin of the first candidate multimedia resource. Optionally, selecting the multimedia resource with the highest screening score from the first candidate multimedia resources as the target multimedia resource.

S406, when the search object type indicates that the search object is a name, a second mapping dictionary between the candidate name and the pinyin of the candidate name is acquired as a target mapping dictionary.

In this embodiment of the present application, step S406 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

S407, performing reverse maximum matching on the key pinyin in the pinyin word stock to obtain a first person name pinyin, and determining a target multimedia resource based on the second mapping dictionary and the first person name pinyin.

In some implementations, the pinyin word stock includes candidate person name pinyin and single word pinyin.

In some implementations, a candidate person name having a mapping relation with the first person name pinyin is determined from the second mapping dictionary as the target person name, and the target multimedia resource is determined from the candidate multimedia resources based on the target person name.

S408, playing the target multimedia resource.

In this embodiment of the present application, step S408 may be implemented in any of the ways described in the embodiments of the present application, which is not limited here and is not described again.

In this embodiment of the application, the key pinyin of the multimedia on-demand voice is extracted and used to determine the search object type of the multimedia on-demand voice, and different search object types correspond to different target mapping dictionaries. When the search object type indicates that the search object is a multimedia resource, the first mapping dictionary is used as the target mapping dictionary for searching, and the target multimedia resource is determined from the screening score of each candidate multimedia resource; when the search object type indicates that the search object is a person name, the second mapping dictionary is used as the target mapping dictionary for searching. Because the analysis differs according to the situation, the method is more adaptable, and searching along different dimensions, such as the resource dimension and the person name dimension, increases the diversity and flexibility of the search. The accuracy of the obtained target multimedia resource is therefore higher.

Taking a television on-demand scene as an example, the on-demand method of the multimedia resource provided in the application is explained as follows:

Fig. 5 is a flowchart of another method for requesting multimedia resources according to an embodiment of the present application. As shown in fig. 5, the method includes, but is not limited to, the steps of:

S501, constructing a film name pinyin database and a name pinyin database.

In this embodiment of the application, mapping dictionaries are constructed from databases to support film and television searches by multimedia resource name and by person name. Specifically, a first mapping dictionary between the candidate multimedia resources and their pinyin is constructed from a film and television database, a second mapping dictionary between the candidate person names and their pinyin is constructed from a person name database, a third mapping dictionary between the candidate person names and the candidate multimedia resources is constructed, and a pinyin word stock comprising candidate person name pinyin and single word pinyin is constructed.

S502, preloading a database.

The database is preloaded, so that waiting time can be reduced, the database analysis is more convenient and rapid, and user experience is improved.

Alternatively, database preloading may utilize the Include method.

S503, collecting the film and television on-demand voice, and converting the voice to pinyin.

S504, acquiring the searched key pinyin from the pinyin information.

After the pinyin information of the multimedia on-demand voice is obtained, key pinyin for searching in a television on-demand scene is obtained as a key word according to the result of the pinyin information.

Optionally, the key pinyin is: "bang1 wo3 bo1 fang4" (help me play), "bang1 wo3 dian3 bo1" (help me play on demand), "wo3 xiang3 kan4" (I want to watch), "de5 dian4 ying3" (movie) and "de5 dian4 shi4 ju4" (television drama).

S505, judging whether to search the movie names or the person names according to all the key pinyin.

If key pinyin such as "de5 dian4 ying3" (movie) or "de5 dian4 shi4 ju4" (television drama) is found, this indicates that the multimedia resources of a related actor or director need to be searched, that is, a person name is searched; otherwise, for key pinyin such as "bang1 wo3 bo1 fang4" (help me play), "bang1 wo3 dian3 bo1" (help me play on demand) and "wo3 xiang3 kan4" (I want to watch), the multimedia resources are matched directly according to the text, that is, a film name is searched.
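The decision in step S505 can be sketched as a check for the marker key pinyin at syllable boundaries. The function names, the returned labels "person_name" and "film_name", and the sample inputs are assumptions made for this illustration.

```python
# Key pinyin taken from the examples above; the labels returned below are hypothetical.
PERSON_NAME_MARKERS = ("de5 dian4 ying3", "de5 dian4 shi4 ju4")


def contains_sequence(tokens, pattern):
    """Return True if `pattern` occurs in `tokens` as a contiguous run of syllables."""
    width = len(pattern)
    return any(tokens[i:i + width] == pattern for i in range(len(tokens) - width + 1))


def search_object_type(pinyin_info):
    """Classify a pinyin string as a person name search or a film name search."""
    tokens = pinyin_info.split()
    for marker in PERSON_NAME_MARKERS:
        if contains_sequence(tokens, marker.split()):
            return "person_name"
    # Otherwise the resource is matched directly by its name.
    return "film_name"


print(search_object_type("hu2 ge1 de5 dian4 shi4 ju4"))           # person_name
print(search_object_type("bang1 wo3 bo1 fang4 shi4 li4 ju4 yi1"))  # film_name
```

Comparing lists of syllables rather than raw substrings avoids accidental matches that start in the middle of a syllable.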

S506, when the movie names are searched, the final movie is matched by utilizing the improved TF-IDF algorithm.

When a film name is searched, the TF-IDF of the name pinyin of each candidate multimedia resource in the first mapping dictionary is acquired; the name pinyin of the first candidate multimedia resources is determined, based on these TF-IDF values, from the name pinyin of the candidate multimedia resources included in the first mapping dictionary; the similarity distance between the name pinyin of each first candidate multimedia resource and the key pinyin is acquired; the weight of each first candidate multimedia resource is determined according to its similarity distance and the length of its name pinyin; the TF-IDF of each first candidate multimedia resource is weighted by this weight to obtain the screening score of the first candidate multimedia resource; and the multimedia resource with the highest screening score is selected from the first candidate multimedia resources as the target multimedia resource.
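The weighting step can be illustrated with the sketch below. It makes several assumptions that are not stated in this passage: the similarity distance is taken to be the Levenshtein edit distance over syllables, the weight is taken to be 1 minus the distance divided by the length of the name pinyin (clamped at 0), and the TF-IDF values are supplied as precomputed placeholder numbers. The actual distance measure and weight formula of the application may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance between two syllable lists (assumed similarity distance)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def screening_score(tfidf, name_pinyin, key_pinyin):
    """Weight a precomputed TF-IDF value by an order-sensitive similarity."""
    name = name_pinyin.split()
    key = key_pinyin.split()
    distance = edit_distance(name, key)
    # Assumed weight: 1.0 for identical sequences, decreasing with the distance
    # relative to the length of the name pinyin, and never below 0.
    weight = max(0.0, 1.0 - distance / max(len(name), 1))
    return weight * tfidf


def pick_target(candidates, key_pinyin):
    """candidates maps a resource name to (name pinyin, TF-IDF)."""
    return max(
        candidates,
        key=lambda n: screening_score(candidates[n][1], candidates[n][0], key_pinyin),
    )


# Placeholder first candidates: same syllables, different order, equal TF-IDF.
first_candidates = {
    "示例剧一": ("shi4 li4 ju4 yi1", 0.5),
    "一剧示例": ("yi1 ju4 shi4 li4", 0.5),
}
print(pick_target(first_candidates, "shi4 li4 ju4 yi1"))  # 示例剧一
```

The two placeholder candidates contain the same syllables and have equal TF-IDF values, so only the order-sensitive weight separates them, which is the effect the improved algorithm is meant to achieve.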

S507, when searching the name, obtaining a name matching result by using a reverse maximum matching algorithm, and matching the final movie and television play based on the name matching result.

When a person name is searched, reverse maximum matching is performed on the key pinyin in the pinyin word stock to obtain the first person name pinyin, wherein the pinyin word stock comprises candidate person name pinyin and single word pinyin; a candidate person name having a mapping relation with the first person name pinyin is determined from the second mapping dictionary as the target person name; a third mapping dictionary between the candidate person names and the candidate multimedia resources is acquired; one or more candidate multimedia resources having a mapping relation with the target person name are queried from the candidate multimedia resources based on the third mapping dictionary and taken as the second candidate multimedia resources; and the target multimedia resource is determined based on the second candidate multimedia resources.

S508, playing the final film or television drama.

In some implementations, the target multimedia asset can be played directly.

In this embodiment of the application, databases of film name pinyin and person name pinyin are constructed as the basis for subsequent processing, and matching is performed on the pinyin information of the multimedia on-demand voice, which avoids the errors that arise when recognition relies on Chinese characters and makes the recognition more accurate. In addition, different matching methods are used depending on whether the key pinyin in the multimedia on-demand voice indicates a film name or a person name, so the method is more adaptable and the finally matched target multimedia resource is more accurate.

Fig. 6 is a schematic structural diagram of a multimedia resource on-demand device according to an embodiment of the present application. As shown in fig. 6, the on-demand device 600 for multimedia resources includes:

The first obtaining module 601 is configured to perform voice recognition on a multimedia on-demand voice, obtain pinyin information of the multimedia on-demand voice, and obtain key pinyin for searching from the pinyin information;

a second obtaining module 602, configured to determine a search object type corresponding to the key pinyin, and obtain a target mapping dictionary between an object name matched with the search object type and the pinyin of the object name;

and the playing module 603 is configured to search the target mapping dictionary based on the key pinyin to obtain a target multimedia resource matched with the multimedia on-demand voice for playing.

In some implementations, the second acquisition module 602 is further configured to:

Fig. 7 is a block diagram of an electronic device, according to an example embodiment. As shown in fig. 7, an electronic device 700 includes an on-demand apparatus 600 of a multimedia asset. The electronic device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a cell phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook or personal digital assistant (PDA), etc., and the non-mobile electronic device may be a network attached storage (NAS), personal computer (PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not limited in particular.

There is also provided, in accordance with an embodiment of the present application, an electronic device including: a processor; a memory for storing the processor-executable instructions, wherein the processor is configured to execute the instructions to implement the on-demand method for multimedia resources as described above.

In order to implement the above embodiment, the present application also proposes a storage medium.

Wherein the instructions in the storage medium, when executed by the processor of the electronic device, enable the electronic device to perform the method of on-demand of multimedia resources as described above.

To achieve the above embodiments, the present application also provides a computer program product.

Wherein the computer program product, when executed by a processor of the electronic device, enables the electronic device to perform the method of on-demand of multimedia resources as described above.

Fig. 8 is a block diagram of an electronic device, according to an example embodiment. The electronic device shown in fig. 8 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.

As shown in fig. 8, the electronic device 800 includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a memory 806 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: a memory 806 including a hard disk or the like; and a communication section 807 including a network interface card such as a local area network (LAN) card, a modem, or the like, the communication section 807 performing communication processing via a network such as the internet. A drive 808 is also connected to the I/O interface 805 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program embodied on a computer readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network through the communication section 807. The above-described functions defined in the methods of the present application are performed when the computer program is executed by the processor 801.

In an exemplary embodiment, a storage medium is also provided, such as a memory, comprising instructions executable by the processor 801 of the electronic device 800 to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Fig. 9 is a block diagram of an electronic device, according to an example embodiment. The electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application. As shown in fig. 9, the electronic device 900 includes a processor 901 and a memory 902. The memory 902 is used for storing program codes, and the processor 901 is connected to the memory 902 and is used for reading the program codes from the memory 902, so as to implement the on-demand method of the multimedia resource in the above embodiment.

Alternatively, the number of processors 901 may be one or more.

Optionally, the electronic device may further include an interface 903, and the number of the interfaces 903 may be plural. The interface 903 may be connected to an application program, and may receive data of an external device such as a sensor, or the like.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method for on-demand delivery of a multimedia asset, comprising:

2. The method of claim 1, wherein the obtaining a target mapping dictionary between the object name matching the search object type and the pinyin of the object name comprises:

3. The method of claim 2, wherein searching the target mapping dictionary based on the key pinyin to obtain a target multimedia resource matching the multimedia on-demand voice for playback comprises:

4. The method of claim 3, wherein the obtaining the screening score of the first candidate multimedia resource according to the name pinyin and the key pinyin of the first candidate multimedia resource comprises:

5. The method of claim 2, wherein searching the target mapping dictionary based on the key pinyin to obtain a target multimedia resource matching the multimedia on-demand voice for playback comprises:

6. The method of claim 5, wherein said determining said target multimedia asset from among candidate multimedia assets based on said target person name comprises:

7. The method of claim 5, wherein the obtaining a second candidate multimedia asset associated with the target person name from the candidate multimedia assets comprises:

8. An on-demand device for multimedia resources, comprising:

9. An electronic device, comprising: the apparatus of claim 8.

10. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 7.

11. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 7.

12. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-7.