WO2021042234A1 - Application introduction method, mobile terminal, and server - Google Patents

Application introduction method, mobile terminal, and server

Info

Publication number
WO2021042234A1
Authority
WO
WIPO (PCT)
Prior art keywords
application
information
introduction
mobile terminal
Prior art date
Application number
PCT/CN2019/104000
Other languages
French (fr)
Chinese (zh)
Inventor
艾静雅
柳彤
朱大卫
汤慧秀
Original Assignee
深圳海付移通科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳海付移通科技有限公司
Priority to CN201980010315.5A (published as CN111801673A)
Priority to PCT/CN2019/104000
Published as WO2021042234A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 Transforming into visible information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/44 Receiver circuitry for the reception of television signals according to analogue transmission standards
    • H04N 5/60 Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals

Definitions

  • This application relates to the technical field of application programs, and in particular to an application program introduction method, a mobile terminal, and a server.
  • This application provides an application introduction method, a mobile terminal, and a server. On the one hand, the method can adapt to different user groups so that the application meets the needs of more users; on the other hand, it introduces the application in the form of animation.
  • Introducing the application in this way makes the introduction more personalized and engaging, and improves the user experience.
  • The first technical solution adopted by this application is an application introduction method, including: obtaining introduction requirement information about the application, where the introduction requirement information indicates the requirement for introducing the application; extracting keywords from the introduction requirement information; obtaining associated images and voice based on the keywords; and processing the images and voice to form a video that introduces the application.
  • In one case, the introduction requirement information is audio information.
  • Extracting keywords from the introduction requirement information then includes: performing voice recognition on the audio information to obtain text information, and performing keyword extraction on the text information to obtain the keywords.
  • Performing keyword extraction on the text information includes: performing semantic segmentation on the text information and obtaining the keywords from the result of the semantic segmentation.
  • Performing semantic segmentation on the text information includes: inputting the text information into a convolutional neural network trained by deep learning, which segments the text semantically to obtain the keywords.
  • In another case, the introduction requirement information is text information.
  • Extracting keywords from the introduction requirement information then includes: performing semantic segmentation on the text information and obtaining the keywords from the result of the semantic segmentation.
  • Obtaining the associated images and voice based on the keywords includes: sending the keywords to the server so that the server generates associated images and voice based on the keywords, and obtaining the images and voice sent by the server.
  • Processing the images and voice to form a video that introduces the application includes: performing image segmentation on the multiple corresponding images and extracting feature information from them; combining the feature information to generate multiple image frames; forming the multiple image frames into an animation; and merging the animation and voice into the video that introduces the application.
  • The method may further include: obtaining background music sent by the server, where the background music is generated by the server based on the keywords, and adding the background music to the video.
  • The second technical solution adopted by this application is an application introduction method, including: acquiring keywords sent by the mobile terminal, where the keywords are extracted by the mobile terminal from the acquired introduction requirement information about the application.
  • The introduction requirement information expresses the requirement for introducing the application. The method further includes: generating associated images and voice based on the keywords, and sending the images and voice to the mobile terminal so that the mobile terminal processes them into the application's introductory video.
  • Generating associated images based on the keywords includes: applying deep learning to the keywords to obtain associated images from a preset image library.
  • Generating associated voice based on the keywords includes: applying deep learning to the keywords to generate text information that fits the keyword scene, and converting the text information into voice.
  • Another technical solution is a mobile terminal, which includes a processor and a memory connected to the processor; the memory stores program data, and the processor executes the program data to implement the first solution described above.
  • Another technical solution is a server, which includes a processor and a memory connected to the processor; the memory stores program data, and the processor executes the program data to implement the second solution described above.
  • Another technical solution adopted by the present application is a computer storage medium for storing program data which, when executed by a processor, implements any of the methods provided in the above solutions.
  • A further solution is a mobile terminal including: an acquisition module for acquiring introduction requirement information about an application, where the introduction requirement information indicates the requirement for introducing the application; an extraction module for extracting keywords from the introduction requirement information; the acquisition module further obtains associated images and voice based on the keywords; and a processing module for processing the images and voice to form a video that introduces the application.
  • A further solution is a server including: an acquisition module for acquiring keywords sent by the mobile terminal, where the keywords are extracted by the mobile terminal from the acquired introduction requirement information about the application, and the introduction requirement information expresses the requirement for introducing the application; a processing module for generating associated images and voice based on the keywords; and a sending module for sending the images and voice to the mobile terminal so that the mobile terminal processes them into a video that introduces the application.
  • In summary, an application introduction method of this application includes: obtaining introduction requirement information about the application, where the introduction requirement information indicates the requirement for introducing the application; extracting the keywords from the introduction requirement information; obtaining associated images and voice based on the keywords; and processing the images and voice to form a video that introduces the application.
  • FIG. 1 is a schematic flowchart of a first embodiment of an application program introduction method provided by the present application.
  • FIG. 2 is a schematic flowchart of a second embodiment of an application program introduction method provided by the present application.
  • FIG. 3 is a schematic flowchart of a third embodiment of an application program introduction method provided by the present application.
  • FIG. 4 is a schematic flowchart of a fourth embodiment of an application program introduction method provided by the present application.
  • FIG. 5 is a schematic flowchart of a fifth embodiment of an application program introduction method provided by the present application.
  • FIG. 6 is a schematic flowchart of a sixth embodiment of an application program introduction method provided by the present application.
  • FIG. 7 is a schematic flowchart of a seventh embodiment of an application program introduction method provided by the present application.
  • FIG. 8 is a schematic structural diagram of a first embodiment of a mobile terminal provided by the present application.
  • FIG. 9 is a schematic structural diagram of a first embodiment of a server provided by the present application.
  • FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • FIG. 11 is a schematic structural diagram of a second embodiment of a mobile terminal provided by the present application.
  • FIG. 12 is a schematic structural diagram of a second embodiment of a server provided by the present application.
  • FIG. 1 is a schematic flowchart of the first embodiment of the application introduction method provided by the present application. The method is implemented on a mobile terminal and includes:
  • Step 11 Obtain the introduction requirement information about the application, where the introduction requirement information indicates the requirement for introducing the application.
  • After the mobile terminal has responded to the user's request to download the application and installation is complete, it obtains the introduction requirement information about the application.
  • The introduction requirement information may be audio information or text information.
  • Audio information is collected by the microphone of the mobile terminal; text information can be entered manually, or keywords prompted by the application can be selected as the text information.
  • The introduction requirement information indicates the requirement for introducing the application. For example, when a user wants to know about the income of a financial management application, "income" can serve as the introduction requirement information.
  • Step 12 Extract the keywords from the introduction requirement information.
  • The mobile terminal performs keyword extraction on the content of the introduction requirement information.
  • Suppose the obtained introduction requirement information is audio information.
  • If the text information parsed from the audio information is "Is payment in this application secure?", the extracted keyword is "secure payment".
  • The keyword extraction method can be a keyword extraction algorithm based on statistical features.
  • Such an algorithm uses the statistical information of the words in a document to extract the document's keywords.
  • The text is first preprocessed to obtain a set of candidate words, and keywords are then selected from the candidate set by quantifying feature values.
  • The feature-value quantification methods include quantification based on word weight, quantification based on a word's position in the document, and quantification based on word-relatedness information.
  • Quantification based on word weight mainly covers part of speech, word frequency, inverse document frequency, relative word frequency, word length, and so on. Quantification based on document position rests on the assumption that sentences at different positions in an article have different importance to the document.
  • Words in the first N words, the last N words, the beginning or end of a paragraph, the title, or the introduction of an article are representative.
  • Word-relatedness information refers to the degree of relevance between words and between words and documents, including mutual information, HITS value, contribution degree, dependency degree, TF-IDF value, and so on.
  • Keyword extraction methods can also be based on deep learning methods.
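  • The statistical approach above can be sketched in a few lines. The following example is illustrative only: the toy corpus and the whitespace tokenizer are stand-ins, not part of this application. It scores each word of a query by TF-IDF and keeps the highest-scoring words as keywords.

```python
import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_k=2):
    """Score each word of `doc` by TF-IDF against `corpus` and return the top_k."""
    words = doc.lower().split()
    tf = Counter(words)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # document frequency: in how many corpus documents the word appears
        df = sum(1 for d in corpus if word in d.lower().split())
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / len(words)) * idf
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

# Toy corpus standing in for previously collected requirement texts.
corpus = [
    "is this how i open an account",
    "how do i check if my balance is correct",
    "how is support contacted",
]
print(tfidf_keywords("how secure is payment", corpus))  # → ['secure', 'payment']
```

Common words such as "how" and "is" appear throughout the corpus and receive low IDF, so the rarer content words surface as keywords.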
  • Step 13 Obtain associated images and voice based on the keywords.
  • To acquire the associated images, the mobile terminal may send the keywords to the server, and the server performs image retrieval in a preset image library to obtain multiple images.
  • To acquire the associated voice, the mobile terminal may send the keywords to the server; the server generates multiple paragraphs of text that fit the application scenario from the keywords and sends them to the mobile terminal, which converts the text information into voice information.
  • Alternatively, the mobile terminal may perform image retrieval in a local preset image library to obtain multiple images.
  • Likewise, the mobile terminal may itself generate multiple paragraphs of text that fit the application scenario from the keywords, and then convert the text information into voice information.
  • Step 14 Process the images and voice to form a video that introduces the application.
  • For example, suppose the keywords are "bird" and "tree": one image carries the feature information of a tree and another the feature information of a bird. These two pieces of information can be extracted and combined, according to the scene, into an image of a bird perched on the tree. After a complete series of images is composed, the images are smoothed and otherwise enhanced so that the content looks more natural.
  • The voice information and the image information are then merged to form the video that introduces the application.
  • For example, the application prompts the user to say what he wants to know.
  • The audio information collected by the mobile terminal might be: "I am opening an account for the first time; how should I invest to achieve high returns with low risk, and how do I pay?"
  • The keywords extracted by the mobile terminal are "open account for the first time", "investment", "high return", "low risk", and "how to pay". Corresponding images are then retrieved from the preset image library: "open account for the first time" retrieves the account-opening screen and images of animated characters, while "high return" and "low risk" retrieve warnings and product recommendations.
  • The retrieved images are assembled into a segment in which an animated character explains how to invest so as to keep risk low, followed by another segment noting that payment security is equally important.
  • Text information that fits the scene is generated and converted into voice, and the voice and image information are merged to form a video that addresses the user's needs.
  • background music can also be added to the video.
  • The mobile terminal can thus generate an application introduction with matching sound and images from a piece of voice; the same mechanism could, for instance, tell a story to a child.
  • A child can describe the type of story he likes, and machine learning generates a short story with pictures and text, which makes it more engaging and lets children who cannot yet read acquire the corresponding knowledge through animation.
  • Depending on the application and the user's requirements, the mobile terminal generates a video with pictures and text tailored to the application for the user to watch.
  • In the above manner, the application introduction method of this application obtains introduction requirement information that indicates the requirement for introducing the application, extracts keywords from it, obtains associated images and voice based on the keywords, and processes the images and voice to form a video that introduces the application.
  • FIG. 2 is a schematic flowchart of the second embodiment of the application introduction method provided by the present application. The method is implemented on a mobile terminal and includes:
  • Step 21 Acquire audio information about the application, where the audio information indicates the requirement for introducing the application.
  • The user's audio information is collected to indicate the requirement for introducing the application.
  • The audio information may concern whatever aspect of the application the user wants to learn about.
  • Alternatively, after the application starts it may display text prompting the user on what can be learned about the application, so that the user can quickly speak the corresponding keywords.
  • Step 22 Perform voice recognition on the audio information to obtain text information.
  • Voice recognition converts a piece of audio information into the corresponding text information.
  • A recognition system mainly comprises four parts: feature extraction, the acoustic model, the language model, and the dictionary with decoding.
  • The collected audio is first preprocessed, for example by filtering and framing, so that the audio to be analyzed is properly extracted from the original signal.
  • Feature extraction then converts the audio from the time domain to the frequency domain, providing suitable feature vectors for the acoustic model.
  • The acoustic model scores each feature vector against the acoustic characteristics; the language model computes, according to linguistic theory, the probability of each candidate phrase sequence for the sound signal; finally, the phrase sequence is decoded against the existing dictionary to obtain the most likely text representation.
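  • The framing step of the preprocessing stage can be illustrated as follows. This is a minimal sketch: the 400-sample frame length and 160-sample hop are assumed values typical for 16 kHz audio, not taken from this application.

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D sample sequence into overlapping frames (zero-padding the
    tail), the usual first step before computing per-frame acoustic features."""
    frames = []
    for start in range(0, max(len(samples) - frame_len, 0) + hop, hop):
        frame = samples[start:start + frame_len]
        frame = frame + [0] * (frame_len - len(frame))  # pad the last frame
        frames.append(frame)
    return frames

signal = list(range(1000))        # stand-in for raw PCM samples
frames = frame_signal(signal)
print(len(frames), len(frames[0]))  # → 5 400
```

Each 400-sample frame would then be windowed and transformed to the frequency domain to produce one feature vector for the acoustic model.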
  • Step 23 Input the text information into a convolutional neural network for deep learning, so as to perform semantic segmentation on the text information and obtain the keywords.
  • A large amount of information is trained in advance through the convolutional neural network to generate a corresponding semantic segmentation model.
  • Once the semantic segmentation model receives the text information, it outputs the keywords.
  • Step 24 Obtain related images and voices based on the keywords.
  • Step 25 Process the image and voice to form a video for introducing the application.
  • Steps 24-25 have the same or similar technical solutions as the foregoing embodiment, and will not be repeated here.
  • FIG. 3 is a schematic flowchart of a third embodiment of the application introduction method provided by the present application.
  • the method is implemented based on a mobile terminal, and the method includes:
  • Step 31 Acquire text information about the application program; where the text information is used to indicate a requirement for introducing the application program.
  • The text information may be entered manually by the user, or the application may display several passages of text from which the user selects.
  • Step 32 Perform semantic segmentation on the text information.
  • Step 33 Obtain keywords based on the result of semantic segmentation.
  • Steps 32-33 can specifically adopt one of the following techniques:
  • TF-IDF (term frequency-inverse document frequency, computed as TF x IDF), a common weighting technique in information retrieval and data mining;
  • TextRank, a general graph-based ranking algorithm for natural language processing;
  • RAKE (Rapid Automatic Keyword Extraction);
  • topic models.
  • a semantic segmentation model may be established in advance through deep learning of a neural network to achieve rapid extraction of keywords.
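  • The TextRank option above can be illustrated with a simplified sketch; the window size, damping factor, and iteration count are conventional defaults, not values specified by this application. Words are ranked by propagating scores over a co-occurrence graph.

```python
from collections import defaultdict
from itertools import combinations

def textrank_keywords(words, window=2, damping=0.85, iters=50, top_k=2):
    """Rank words by a simplified TextRank: build a co-occurrence graph over a
    sliding window, then run PageRank-style score propagation."""
    neighbors = defaultdict(set)
    for i in range(len(words) - window + 1):
        for a, b in combinations(words[i:i + window], 2):
            if a != b:
                neighbors[a].add(b)
                neighbors[b].add(a)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

words = "open account first time open account invest return".split()
print(textrank_keywords(words))
```

Words that co-occur with many well-connected words, such as "account" here, accumulate the highest scores.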
  • Step 34 Obtain related images and voices based on the keywords.
  • Step 35 Process the image and voice to form a video for introducing the application program.
  • Steps 34-35 have the same or similar technical solutions as the foregoing embodiment, and will not be repeated here.
  • FIG. 4 is a schematic flowchart of a fourth embodiment of the application introduction method provided by the present application.
  • the method is implemented based on a mobile terminal, and the method includes:
  • Step 41 Obtain the introduction requirement information about the application, where the introduction requirement information indicates the requirement for introducing the application.
  • Step 42 Extract keywords in the introduction requirement information.
  • Steps 41-42 have the same or similar technical solutions as the foregoing embodiment, and will not be repeated here.
  • Step 43 Send the keywords to the server, so that the server generates associated images and voice based on the keywords.
  • Deep learning based on a convolutional neural network can obtain the images and voice associated with the keywords.
  • Alternatively, only the images may be obtained by the server, while the mobile terminal itself processes the keywords to generate multiple paragraphs of text that match the scene and converts them into voice.
  • Step 44 Obtain the image and voice sent by the server.
  • Step 45 Process the image and voice to form a video for introducing the application program.
  • Steps 44-45 have the same or similar technical solutions as the foregoing embodiment, and will not be repeated here.
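  • The keyword hand-off in steps 43-44 can be sketched as follows. The JSON field names are illustrative assumptions: the application does not specify a wire format between terminal and server.

```python
import json

def build_keyword_request(app_id, keywords):
    """Serialize extracted keywords into the JSON body the terminal would send
    to the server (field names are hypothetical, not from this application)."""
    return json.dumps({"app_id": app_id, "keywords": keywords}, ensure_ascii=False)

def parse_keyword_request(body):
    """Server-side counterpart: recover the application id and keyword list."""
    payload = json.loads(body)
    return payload["app_id"], payload["keywords"]

body = build_keyword_request("finance-demo", ["secure payment", "high return"])
print(parse_keyword_request(body))
```

The server would respond with the generated images and voice, which the terminal then assembles into the video.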
  • FIG. 5 is a schematic flowchart of a fifth embodiment of the application introduction method provided by the present application, and the method includes:
  • Step 51 Obtain introduction requirement information about the application; where the introduction requirement information is used to indicate the requirement for the introduction of the application.
  • Step 52 Extract keywords in the introduction requirement information.
  • Step 53 Send the keywords to the server, so that the server generates associated images and voices based on the keywords.
  • Step 54 Obtain the image and voice sent by the server.
  • Steps 51-54 have the same or similar technical solutions as the above-mentioned embodiment, and will not be repeated here.
  • Step 55 Perform image segmentation on the multiple corresponding images, and extract feature information from the images.
  • Image segmentation is the technique and process of dividing an image into a number of specific regions with unique properties and extracting the objects of interest. It is a key step from image processing to image analysis.
  • Existing image segmentation methods mainly fall into the following categories: threshold-based methods, region-based methods, edge-based methods, and methods based on specific theories.
  • Image segmentation is the process of dividing a digital image into disjoint regions.
  • It is also a labeling process: pixels belonging to the same region are assigned the same number.
  • The threshold-based segmentation method is a region-based image segmentation technique whose principle is to divide the image pixels into several classes.
  • Image thresholding is one of the most commonly used traditional segmentation methods. Because it is simple to implement, computationally cheap, and stable, it has become the most basic and most widely applied segmentation technique. It is especially suitable for images in which the target and the background occupy different gray-scale ranges. It not only greatly compresses the amount of data but also greatly simplifies analysis and processing, so in many cases it is a necessary preprocessing step before image analysis, feature extraction, and pattern recognition.
  • The purpose of thresholding is to partition the pixel set by gray level so that each resulting subset forms a region corresponding to the real scene: pixels within a region share the same attributes, while adjacent regions do not. Such a partition is achieved by selecting one or more thresholds over the gray levels.
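  • A minimal sketch of the thresholding idea described above; the threshold value and the toy image are illustrative.

```python
def threshold_segment(gray, threshold=128):
    """Binarize a 2-D grayscale image: label a pixel 1 (foreground) if its
    value exceeds the threshold, else 0 (background)."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]

image = [
    [12,  40, 200, 210],
    [30,  35, 220, 215],
    [25, 180, 190,  20],
]
print(threshold_segment(image))
```

With a single threshold, the image is partitioned into exactly two label classes; multiple thresholds would partition it into more.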
  • The region-based segmentation method is a technique that finds the regions directly.
  • Specific algorithms include region growing and region splitting-and-merging.
  • Region growing starts from a single pixel and gradually merges neighbors to form the required region; splitting-and-merging starts from the whole image and gradually cuts it down to the required regions.
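  • Region growing can be sketched with a breadth-first search; the tolerance value and 4-connectivity are illustrative choices, not specified by this application.

```python
from collections import deque

def region_grow(gray, seed, tol=10):
    """Grow a region from `seed`: breadth-first add 4-connected pixels whose
    gray value is within `tol` of the seed pixel's value."""
    h, w = len(gray), len(gray[0])
    sy, sx = seed
    base = gray[sy][sx]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(gray[ny][nx] - base) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

image = [
    [100, 102,  30],
    [101,  99,  31],
    [ 20,  98,  29],
]
print(sorted(region_grow(image, (0, 0))))  # → [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)]
```

The isolated low-valued pixels are excluded even when adjacent, because their values fall outside the tolerance around the seed.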
  • Edge-based segmentation mainly includes point detection, line detection, and edge detection.
  • Segmentation methods based on specific theories include cluster analysis, fuzzy set theory, genetic coding, wavelet transform, and other approaches.
  • Feature extraction is performed based on the keywords and the scene, after which step 56 is carried out.
  • Step 56 Combine the feature information to generate multiple image frames.
  • Step 57 Form multiple image frames into animation.
  • Steps 55-57 can specifically be:
  • A convolutional neural network is trained in advance to establish an image model, so that the corresponding feature information generates multiple image frames, which are then formed into an animation.
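  • The frame-generation step can be illustrated with a deliberately simple stand-in for the learned image model: each extracted feature is treated as a sprite whose position is interpolated across frames. The sprite representation is purely illustrative.

```python
def make_frames(sprites, n_frames=5):
    """Generate n_frames animation frames by linearly interpolating each
    sprite's position from its start to its end coordinates."""
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        frame = {
            name: (round(x0 + (x1 - x0) * t, 2), round(y0 + (y1 - y0) * t, 2))
            for name, ((x0, y0), (x1, y1)) in sprites.items()
        }
        frames.append(frame)
    return frames

# e.g. the "bird" feature flies toward the stationary "tree"
sprites = {"bird": ((0.0, 8.0), (4.0, 4.0)), "tree": ((4.0, 0.0), (4.0, 0.0))}
frames = make_frames(sprites)
print(frames[0]["bird"], frames[-1]["bird"])  # → (0.0, 8.0) (4.0, 4.0)
```

Playing the resulting frame list in sequence, and then merging in the voice track, corresponds to steps 57-58.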
  • Step 58 The animation and voice are merged to form a video for introducing the application.
  • FIG. 6 is a schematic flowchart of a sixth embodiment of the application introduction method provided by the present application.
  • the method is implemented based on a server, and the method includes:
  • Step 61 Acquire keywords sent by the mobile terminal; wherein the keywords are extracted by the mobile terminal based on the acquired introduction requirement information about the application, and the introduction requirement information is used to indicate the requirement for introducing the application.
  • After the mobile terminal obtains the introduction requirement information about the application, it extracts the keywords and sends them to the server.
  • Step 62 Generate associated images and voices based on the keywords.
  • The server trains a model on the application's relevant content in advance, so that once a keyword arrives from the mobile terminal it can respond quickly with the associated images and voice.
  • Step 63 Send the image and voice to the mobile terminal, so that the mobile terminal processes the image and voice to form a video for introducing the application program.
  • The generated images and voice are sent to the mobile terminal, and the mobile terminal extracts feature information from the images and combines the feature information to generate multiple image frames.
  • The mobile terminal forms the multiple image frames into an animation and merges it with the voice to form a video that introduces the application.
  • In summary, this application introduction method includes: acquiring keywords sent by a mobile terminal, where the keywords are extracted by the mobile terminal from the acquired introduction requirement information about the application.
  • The introduction requirement information expresses the requirement for introducing the application. The server generates associated images and voice based on the keywords and sends them to the mobile terminal, so that the mobile terminal can process the images and voice into the application's introductory video.
  • FIG. 7 is a schematic flowchart of a seventh embodiment of the application introduction method provided by the present application, and the method includes:
  • Step 71 Acquire keywords sent by the mobile terminal; wherein the keywords are extracted by the mobile terminal based on the acquired introduction requirement information about the application, and the introduction requirement information is used to indicate the requirement for introducing the application.
  • Step 72 Pass the keywords through deep learning to obtain associated images from the preset image library.
  • Deep learning models include the convolutional neural network (CNN), the DBN (Deep Belief Network), and stacked auto-encoder network models.
  • Convolutional neural networks are inspired by the structure of the visual system.
  • The first convolutional computational model was proposed in the neocognitron. Based on local connections between neurons and hierarchically organized image transformations, neurons sharing the same parameters are applied at different positions of the previous layer, yielding a translation-invariant network structure. Later, building on this idea, convolutional neural networks were designed and trained with error gradients, achieving superior performance on some pattern recognition tasks.
  • DBN can be interpreted as a Bayesian probability generation model, which is composed of multiple layers of random latent variables.
  • the upper two layers have undirected symmetric connections, each lower layer receives top-down directed connections from the layer above, and the state of the lowest-layer units is the visible input data vector.
  • the DBN is composed of a stack of several structural units, and the structural unit is usually an RBM (Restricted Boltzmann Machine).
  • the input samples are used to train the first-layer RBM, whose output is then used to train the second-layer RBM; stacking RBMs in this way improves model performance as layers are added.
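The greedy layer-wise stacking just described can be sketched structurally as follows. This is only a data-flow illustration, not a working DBN: the `ToyRBM` below does no real learning (its weights are merely random, whereas a real RBM would be fit with contrastive divergence), and all names are invented for this sketch.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ToyRBM:
    """Stand-in for an RBM: random weights, mean-field hidden units.
    A real RBM would be trained with contrastive divergence."""
    def __init__(self, n_visible, n_hidden):
        self.w = [[random.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]

    def hidden(self, v):
        # Mean-field hidden activation for a visible vector v.
        return [sigmoid(sum(wij * vj for wij, vj in zip(row, v)))
                for row in self.w]

def train_dbn(data, layer_sizes):
    """Greedy layer-wise stacking: each unit is 'trained' on the output
    of the previous layer, mirroring DBN pre-training."""
    layers, inputs = [], data
    for n_vis, n_hid in zip(layer_sizes, layer_sizes[1:]):
        rbm = ToyRBM(n_vis, n_hid)   # a real RBM would be fit to `inputs` here
        layers.append(rbm)
        inputs = [rbm.hidden(v) for v in inputs]   # feed the next layer
    return layers

data = [[random.random() for _ in range(6)] for _ in range(4)]
dbn = train_dbn(data, [6, 5, 3])   # two stacked units: 6->5 and 5->3
print(len(dbn))                    # 2
```

The point of the sketch is only the stacking loop: each new layer consumes the hidden representation produced by the layer beneath it.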
  • in the unsupervised pre-training process, after the DBN encoding is input to the top-level RBM, the state of the top layer is decoded back to the units of the bottom layer to reconstruct the input.
  • each RBM shares parameters with the corresponding layer of the DBN.
  • the structure of the stacked auto-encoder network is similar to that of the DBN, consisting of a stack of several structural units. The difference is that the structural unit is an auto-encoder instead of an RBM.
  • the auto-encoder model is a two-layer neural network: the first layer is called the encoding layer, and the second layer is called the decoding layer.
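The two-layer encode/decode structure can be sketched with a linear auto-encoder. The weights below are hand-picked purely for illustration (no training is shown; a real model would learn them by minimizing reconstruction error):

```python
# A minimal two-layer auto-encoder: an encoding layer maps the input
# to a lower-dimensional code, and a decoding layer maps the code back
# to a reconstruction. Weights are illustrative only.
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Encode a 4-dim input to a 2-dim code and back. These weights average
# pairs of inputs, then duplicate them on the way back.
W_enc = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]
W_dec = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]

x = [1.0, 1.0, 3.0, 3.0]
code = matvec(W_enc, x)        # encoding layer: [1.0, 3.0]
x_hat = matvec(W_dec, code)    # decoding layer: [1.0, 1.0, 3.0, 3.0]
print(code, x_hat)
```

For this particular input the reconstruction is exact because the input is constant within each averaged pair; in general an auto-encoder only approximates its input.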
  • the server needs to generate a corresponding scene prediction according to the characteristics of the application and of the keywords, and search for the corresponding images according to the scene.
  • the server will search for images on the Internet.
  • Step 73: Pass the keywords through deep learning to generate text information that fits the keyword scene.
  • Step 74: Convert the text information into voice.
  • Step 75: Send the images and voice to the mobile terminal, so that the mobile terminal processes the images and voice to form a video for introducing the application.
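Steps 71 through 75 can be sketched as a server-side pipeline. Everything below is a stand-in: the lookup table replaces the deep-learning-based image retrieval and text generation discussed above, the text-to-speech stage returns a placeholder record instead of real audio, and all names (`IMAGE_LIBRARY`, `handle_request`, the file names) are invented for this sketch.

```python
# Hypothetical server-side pipeline for steps 71-75.
IMAGE_LIBRARY = {
    "secure payment": ["lock_icon.png", "payment_screen.png"],
    "returns": ["chart_up.png"],
}

def retrieve_images(keywords):
    # Step 72: fetch associated images from the preset image library.
    return [img for kw in keywords for img in IMAGE_LIBRARY.get(kw, [])]

def generate_text(keywords):
    # Step 73: generate text that fits the keyword scene (placeholder).
    return "This video explains: " + ", ".join(keywords) + "."

def text_to_speech(text):
    # Step 74: stand-in for TTS; a real server would synthesize audio.
    return {"format": "wav", "transcript": text}

def handle_request(keywords):
    # Step 75: bundle images and voice to send back to the terminal.
    return {"images": retrieve_images(keywords),
            "voice": text_to_speech(generate_text(keywords))}

resp = handle_request(["secure payment", "returns"])
print(resp["images"])
print(resp["voice"]["transcript"])
```

The sketch shows only the ordering of the stages and the shape of the response the terminal would receive.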
  • the server first retrieves a large number of images and sends them to the mobile terminal; the mobile terminal segments the images according to the keywords, combines them according to the scene to form an animation, and then fuses the animation with the voice to form a video for introducing the application.
  • FIG. 8 is a schematic structural diagram of a first embodiment of a mobile terminal provided by the present application.
  • the mobile terminal 80 includes a processor 81 and a memory 82 connected to the processor 81; the memory 82 is used to store program data, and the processor 81 is used to execute the program data to implement the following method:
  • acquiring introduction requirement information about the application, wherein the introduction requirement information is used to indicate the requirement for introducing the application; extracting the keywords in the introduction requirement information; obtaining associated images and voice based on the keywords; and processing the images and voice to form a video for introducing the application.
  • the processor 81, when executing the program data, is also used to implement the following method: performing voice recognition on the audio information to obtain text information; and performing keyword extraction on the text information to obtain the keywords.
  • the processor 81 is used to execute the program data to implement the following method: performing semantic segmentation on the text information; and obtaining the keywords based on the result of the semantic segmentation.
  • the processor 81, when executing the program data, is also used to implement the following method: inputting the text information into a convolutional neural network for deep learning, so as to semantically segment the text information and obtain the keywords.
  • wherein the introduction requirement information is text information, the processor 81 is used to execute the program data to implement the following method: performing semantic segmentation on the text information; and obtaining the keywords based on the result of the semantic segmentation.
  • the processor 81 is used to execute the program data to implement the following method: sending the keywords to the server, so that the server generates associated images and voice based on the keywords; and acquiring the images and voice sent by the server.
  • the processor 81 is configured to execute the program data to implement the following method: performing image segmentation on multiple corresponding images and extracting feature information from the images; combining the feature information to generate multiple image frames; forming the multiple image frames into an animation; and fusing the animation with the voice to form a video for introducing the application.
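The terminal-side processing chain (segment images, combine features into frames, form an animation, fuse with voice) can be sketched with placeholder data structures. All of the names and the dict-based representations below are invented for this illustration; real implementations would operate on pixel data and audio buffers.

```python
# Hypothetical terminal-side assembly of the introduction video.
def extract_features(image):
    # Stand-in for image segmentation: an image is a dict of regions.
    return image["regions"]

def combine_features(feature_sets):
    # Combine features from several images into composite frames.
    return [{"features": feats} for feats in feature_sets]

def to_animation(frames, fps=24):
    # Sequence the frames into an animation.
    return {"frames": frames, "fps": fps,
            "duration": len(frames) / fps}

def fuse(animation, voice):
    # Fuse the animation with the voice track to form the video.
    return {"animation": animation, "voice": voice}

images = [{"regions": ["bird"]}, {"regions": ["tree"]}]
frames = combine_features([extract_features(i) for i in images])
video = fuse(to_animation(frames), voice={"transcript": "intro"})
print(len(video["animation"]["frames"]), video["animation"]["fps"])
```

Only the order of the four stages reflects the method above; every concrete value is illustrative.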
  • the processor 81 is used to execute the program data to implement the following method: acquiring background music sent by the server, wherein the background music is music generated by the server based on the keywords; and adding the background music to the video.
  • FIG. 9 is a schematic structural diagram of a first embodiment of a server provided by the present application.
  • the server 90 includes a processor 91 and a memory 92 connected to the processor 91; the memory 92 is used to store program data, and the processor 91 is used to execute the program data to implement the following method:
  • acquiring keywords sent by the mobile terminal, wherein the keywords are extracted by the mobile terminal based on the acquired introduction requirement information about the application, and the introduction requirement information is used to indicate the requirement for introducing the application; generating associated images and voice based on the keywords; and sending the images and voice to the mobile terminal so that the mobile terminal processes the images and voice to form a video for introducing the application.
  • the processor 91, when executing the program data, is also used to implement the following method: passing the keywords through deep learning to obtain associated images from a preset image library.
  • the processor 91, when executing the program data, is also used to implement the following method: passing the keywords through deep learning to generate text information that fits the keyword scene; and converting the text information into voice.
  • FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • the computer storage medium 100 is used to store program data 101.
  • when the program data 101 is executed by a processor, it is used to implement the following methods:
  • acquiring introduction requirement information about the application, wherein the introduction requirement information is used to indicate the requirement for introducing the application; extracting the keywords in the introduction requirement information; obtaining associated images and voice based on the keywords; and processing the images and voice to form a video for introducing the application;
  • or acquiring keywords sent by the mobile terminal, wherein the keywords are extracted by the mobile terminal based on the acquired introduction requirement information about the application, and the introduction requirement information is used to indicate the requirement for introducing the application; generating associated images and voice based on the keywords; and sending the images and voice to the mobile terminal so that the mobile terminal processes the images and voice to form a video for introducing the application.
  • the computer storage medium can be applied to the above-mentioned mobile terminal or the above-mentioned server to implement the method of any one of the above-mentioned embodiments.
  • FIG. 11 is a schematic structural diagram of a second embodiment of a mobile terminal provided by the present application.
  • the mobile terminal 110 includes: an acquisition module 111, an extraction module 112, and a processing module 113.
  • the obtaining module 111 is used for obtaining introduction requirement information about the application program; wherein, the introduction requirement information is used to indicate the requirement for introducing the application program;
  • the extraction module 112 is used to extract the keywords in the introduction requirement information;
  • the obtaining module 111 is also used to obtain related images and voices based on keywords;
  • the processing module 113 is used to process images and voices to form a video for introducing the application program.
  • FIG. 12 is a schematic structural diagram of a second embodiment of a server provided by the present application.
  • the server 120 includes: an obtaining module 121, a processing module 122, and a sending module 123.
  • the obtaining module 121 is used to obtain keywords sent by the mobile terminal; wherein, the keywords are extracted by the mobile terminal based on the obtained introduction requirement information about the application, and the introduction requirement information is used to indicate the requirement for introducing the application;
  • the processing module 122 is configured to generate associated images and voices based on the keywords;
  • the sending module 123 is configured to send images and voices to the mobile terminal, so that the mobile terminal processes the images and voices to form a video for introducing the application program.
  • the disclosed method and device may be implemented in other ways.
  • the device implementation described above is only illustrative.
  • the division of modules or units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

An application introduction method, a mobile terminal, and a server. The method comprises: obtaining introduction requirement information about an application, wherein the introduction requirement information is used for indicating a requirement for introducing the application (11); extracting a keyword in the introduction requirement information (12); obtaining associated image and voice on the basis of the keyword (13); and processing the image and the voice to form a video for introducing the application (14). On the one hand, the method can adapt to different user groups, so that the application satisfies the requirements of more user groups. On the other hand, the application is introduced in the form of animation, making the introduction of the application more personalized and more interesting, thereby improving user experience.

Description

Application introduction method, mobile terminal, and server

[Technical Field]

This application relates to the technical field of application programs, and in particular to an application introduction method, a mobile terminal, and a server.

[Background Art]

With the popularity of mobile terminals, more and more applications are used on them. After downloading an application on a mobile terminal, a user usually hopes to understand, within a short time, how to use the application, its usage scenarios, the parts that require attention, and the parts most relevant to the user. For applications such as payment and financial management, the main application introductions consist of conventional pictures and text; these are relatively fixed and unattractive, and come across as rigid, impersonal, and boring.
[Summary of the Invention]

In order to solve the above problems, this application provides an application introduction method, a mobile terminal, and a server. On the one hand, the method can adapt to different user groups, so that the application meets the needs of more user groups; on the other hand, introducing the application in the form of animation makes the introduction more personalized and more interesting, thereby improving the user experience.
The first technical solution adopted by this application is to provide an application introduction method, including: acquiring introduction requirement information about an application, wherein the introduction requirement information is used to indicate the requirement for introducing the application; extracting keywords from the introduction requirement information; obtaining associated images and voice based on the keywords; and processing the images and voice to form a video for introducing the application.
Wherein the introduction requirement information is audio information, extracting keywords from the introduction requirement information includes: performing voice recognition on the audio information to obtain text information; and performing keyword extraction on the text information to obtain the keywords.
Wherein performing keyword extraction on the text information to obtain the keywords includes: performing semantic segmentation on the text information; and obtaining the keywords based on the result of the semantic segmentation.
Wherein performing semantic segmentation on the text information includes: inputting the text information into a convolutional neural network for deep learning, so as to semantically segment the text information and obtain the keywords.
Wherein the introduction requirement information is text information, extracting keywords from the introduction requirement information includes: performing semantic segmentation on the text information; and obtaining the keywords based on the result of the semantic segmentation.
Wherein obtaining associated images and voice based on the keywords includes: sending the keywords to a server, so that the server generates associated images and voice based on the keywords; and acquiring the images and voice sent by the server.
Wherein processing the images and voice to form a video for introducing the application includes: performing image segmentation on multiple corresponding images and extracting feature information from the images; combining the feature information to generate multiple image frames; forming the multiple image frames into an animation; and fusing the animation with the voice to form a video for introducing the application.
Wherein the method further includes: acquiring background music sent by the server, wherein the background music is music generated by the server based on the keywords; and adding the background music to the video.
The second technical solution adopted by this application is to provide an application introduction method, including: acquiring keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal based on acquired introduction requirement information about an application, and the introduction requirement information is used to indicate the requirement for introducing the application; generating associated images and voice based on the keywords; and sending the images and voice to the mobile terminal, so that the mobile terminal processes the images and voice to form a video for introducing the application.
Wherein generating associated images and voice based on the keywords includes: passing the keywords through deep learning to obtain associated images from a preset image library.
Wherein generating associated images and voice based on the keywords includes: passing the keywords through deep learning to generate text information that fits the keyword scene; and converting the text information into voice.
Another technical solution adopted by this application is to provide a mobile terminal. The mobile terminal includes a processor and a memory connected to the processor; the memory is used to store program data, and the processor is used to execute the program data to implement the method provided in the first solution above.
Another technical solution adopted by this application is to provide a server. The server includes a processor and a memory connected to the processor; the memory is used to store program data, and the processor is used to execute the program data to implement the method provided in the second solution above.
Another technical solution adopted by this application is to provide a computer storage medium. The computer storage medium is used to store program data, and when the program data is executed by a processor, it is used to implement any of the methods provided in the above solutions.
Another technical solution adopted by this application is to provide a mobile terminal. The mobile terminal includes: an acquisition module, used to acquire introduction requirement information about an application, wherein the introduction requirement information is used to indicate the requirement for introducing the application; an extraction module, used to extract keywords from the introduction requirement information; the acquisition module is also used to obtain associated images and voice based on the keywords; and a processing module, used to process the images and voice to form a video for introducing the application.
Another technical solution adopted by this application is to provide a server. The server includes: an acquisition module, used to acquire keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal based on acquired introduction requirement information about an application, and the introduction requirement information is used to indicate the requirement for introducing the application; a processing module, used to generate associated images and voice based on the keywords; and a sending module, used to send the images and voice to the mobile terminal, so that the mobile terminal processes the images and voice to form a video for introducing the application.
The beneficial effects of this application are as follows. Different from the prior art, an application introduction method of this application includes: acquiring introduction requirement information about an application, wherein the introduction requirement information is used to indicate the requirement for introducing the application; extracting keywords from the introduction requirement information; obtaining associated images and voice based on the keywords; and processing the images and voice to form a video for introducing the application. In this way, the user's needs can be conveniently obtained, and different application introductions can be produced for different user needs. On the one hand, the method can adapt to different user groups, so that the application meets the needs of more user groups; on the other hand, introducing the application in the form of animation makes the introduction more personalized and more interesting, thereby improving the user experience.
[Description of the Drawings]

In order to describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work. In the drawings:

FIG. 1 is a schematic flowchart of a first embodiment of the application introduction method provided by this application;

FIG. 2 is a schematic flowchart of a second embodiment of the application introduction method provided by this application;

FIG. 3 is a schematic flowchart of a third embodiment of the application introduction method provided by this application;

FIG. 4 is a schematic flowchart of a fourth embodiment of the application introduction method provided by this application;

FIG. 5 is a schematic flowchart of a fifth embodiment of the application introduction method provided by this application;

FIG. 6 is a schematic flowchart of a sixth embodiment of the application introduction method provided by this application;

FIG. 7 is a schematic flowchart of a seventh embodiment of the application introduction method provided by this application;

FIG. 8 is a schematic structural diagram of a first embodiment of the mobile terminal provided by this application;

FIG. 9 is a schematic structural diagram of a first embodiment of the server provided by this application;

FIG. 10 is a schematic structural diagram of an embodiment of the computer storage medium provided by this application;

FIG. 11 is a schematic structural diagram of a second embodiment of the mobile terminal provided by this application;

FIG. 12 is a schematic structural diagram of a second embodiment of the server provided by this application.
[Detailed Description]

The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the drawings in the embodiments. It should be understood that the specific embodiments described here are only used to explain this application, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to this application rather than the entire structure. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The terms "first", "second", and the like in this application are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.

Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the application introduction method provided by this application. The method is implemented on a mobile terminal and includes:
Step 11: Acquire introduction requirement information about the application; the introduction requirement information is used to indicate the requirement for introducing the application.
Optionally, after the mobile terminal responds to the user downloading the application and the installation is complete, it acquires the introduction requirement information about the application.
Optionally, the introduction requirement information may be audio information or text information. Audio information is collected by the microphone of the mobile terminal; text information can be entered manually, or keywords suggested by the application can be selected as the text information.
Optionally, the introduction requirement information is used to indicate the requirement for introducing the application. For example, when a user wants to learn about the returns of a financial management application, "returns" can be used as the introduction requirement information.
Step 12: Extract keywords from the introduction requirement information.
Optionally, after the introduction requirement information is acquired, the mobile terminal performs keyword extraction on its content. For example, if the acquired introduction requirement information is audio information and the text parsed from the audio is "how does this application pay securely", the extracted keyword is "secure payment".
The keyword extraction method may be a keyword extraction algorithm based on statistical features.
A keyword extraction algorithm based on statistical features uses the statistical information of the words in a document to extract the document's keywords. Usually, the text is preprocessed to obtain a set of candidate words, and keywords are then obtained from the candidate set by quantifying feature values.
Feature values can be quantified based on word weight, based on the document position of a word, or based on word association information. Quantification based on word weight mainly involves part of speech, term frequency, inverse document frequency, relative term frequency, word length, and so on. Quantification based on document position rests on the assumption that sentences at different positions in an article differ in importance to the document; typically, the first N words, the last N words, the beginnings and ends of paragraphs, titles, and introductions are representative, and such words can express the topic of the whole text as keywords. Quantification based on word association information uses the degree of association between words, and between words and documents, including mutual information, HITS values, contribution, dependency, TF-IDF values, and so on.
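As one concrete instance of the feature quantification above, TF-IDF scores a candidate word highly when it is frequent in the current text but rare across a document set. A minimal sketch (the corpus and query are invented examples, not from the disclosure):

```python
import math

def tf_idf_keywords(doc, corpus, top_k=2):
    """Score words in `doc` by term frequency times inverse document
    frequency over `corpus`; return the top-scoring candidate words."""
    words = doc.split()
    tf = {w: words.count(w) / len(words) for w in set(words)}
    n_docs = len(corpus)
    scores = {}
    for w, f in tf.items():
        df = sum(1 for d in corpus if w in d.split())
        scores[w] = f * math.log((1 + n_docs) / (1 + df))
    # Sort by score (descending), breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

corpus = [
    "how does this application work",
    "how does account opening work",
    "secure payment tips",
    "how to invest",
]
print(tf_idf_keywords("how does secure payment work", corpus))
# → ['payment', 'secure']
```

Common words like "how" and "does" appear in many corpus documents and are penalized by the inverse-document-frequency factor, so the distinctive words "secure" and "payment" surface as keywords.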
关键词的提取方法还可以基于深度学习的方法进行提取。Keyword extraction methods can also be based on deep learning methods.
可以理解,提取关键词的方法有多种,这里不一一列举。It is understandable that there are many ways to extract keywords, which are not listed here.
Step 13: Obtain associated images and speech based on the keywords.
Optionally, the associated images may be obtained by the mobile terminal sending the keywords to a server, which retrieves images from a preset image library to obtain multiple images.
Optionally, the associated speech may be obtained by the mobile terminal sending the keywords to the server; the server generates, from the keywords and the application scenario, several passages of text fitting the scenario and sends them to the mobile terminal, which then converts the text information into speech information.
Optionally, the associated images may be obtained by the mobile terminal retrieving images from a local preset image library to obtain multiple images.
Optionally, the associated speech may be obtained by the mobile terminal generating, from the keywords and the application scenario, several passages of text fitting the scenario and then converting the text information into speech information.
Step 14: Process the images and the speech to form a video that introduces the application.
Optionally, the images are segmented, feature information matching the keywords is extracted, and the feature information is composed into new images. For example, suppose the keywords are "bird" and "tree"; one image contains the feature information of a tree and another contains the feature information of a bird. These two pieces of information can be extracted and, depending on the scene, composed into an image of a bird perched on a tree. After a series of complete images has been composed, enhancement such as smoothing is applied so that the image content looks more natural.
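The smoothing mentioned above can be sketched, under the assumption of a plain grayscale raster stored as nested lists, as a simple 3×3 box filter. This is an illustrative stand-in, not the specific enhancement the application prescribes.

```python
def box_blur(img):
    """3x3 box filter: each interior pixel becomes the mean of its
    neighbourhood; border pixels are kept as-is for simplicity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s / 9.0
    return out

# A sharp bright spot on a dark background is spread out (smoothed).
img = [[0, 0, 0],
       [0, 90, 0],
       [0, 0, 0]]
smoothed = box_blur(img)
```

The isolated value 90 is averaged with its eight dark neighbours, so the composed image transitions more gently.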
Optionally, the speech information and the image information are merged to form a video that introduces the application.
For example:
A user downloads a financial-management application. When the application is launched, it prompts the user to say what they would like to know. The audio information collected by the mobile terminal is, for instance, "This is my first time opening an account; how should I invest for high returns and low risk, and how do I pay?" The keywords extracted by the mobile terminal are "first account opening", "investment", "high return", "low risk", and "how to pay". Based on these keywords, corresponding images are retrieved from the preset image library: "first account opening" retrieves the account-opening screen and images of an animated character, while "high return" and "low risk" retrieve warnings and images of recommended products. These are composed into a segment in which an animated character explains how best to invest while keeping risk low, followed by a frame emphasizing that payment security is also important. Meanwhile, text information fitting the scenario is generated from the keywords, converted into speech, and the speech and image information are merged to form a video addressing the user's needs.
In some embodiments, background music may also be added to the video.
In other embodiments, the mobile terminal may generate, from a piece of speech, an application introduction with corresponding sound and images, or tell a story to a child by voice: the child describes the kind of story they like, and machine learning generates an illustrated short story. This makes the story more engaging for children and lets children who cannot yet read well acquire the corresponding knowledge through animation.
In other embodiments, for different applications and different user needs, the mobile terminal generates an illustrated video of the application matching those needs for the user to watch.
In contrast to the prior art, the application introduction method of the present application includes: obtaining introduction requirement information about an application, the introduction requirement information indicating a requirement for introducing the application; extracting keywords from the introduction requirement information; obtaining associated images and speech based on the keywords; and processing the images and the speech to form a video that introduces the application. In this way, the user's needs can be obtained conveniently, and different introductions can be produced for different needs. On the one hand, this adapts to different user groups, so the application satisfies the needs of more of them; on the other hand, presenting the introduction as an animation makes it more personalized and entertaining and improves the user experience.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a second embodiment of the application introduction method provided by the present application. The method is implemented on a mobile terminal and includes:
Step 21: Obtain audio information about the application, the audio information indicating a requirement for introducing the application.
In this embodiment, the user's audio information is collected to express the requirement for introducing the application.
Optionally, the audio information may be application-related audio spoken by the user when the user wants to learn about the application.
Optionally, the audio information may be prompted by text information displayed to the user after the application is launched, indicating which aspects of the application can be explained, so that the user can quickly speak the corresponding keyword information.
Step 22: Perform speech recognition on the audio information to obtain text information.
Optionally, speech recognition converts a piece of audio information into the corresponding text information. Such a system mainly comprises four parts: feature extraction, an acoustic model, a language model, and a dictionary with decoding. To extract features more effectively, the collected audio information first undergoes preprocessing such as filtering and framing, so that the audio to be analyzed is properly extracted from the raw signal. Feature extraction then converts the audio information from the time domain to the frequency domain, providing suitable feature vectors for the acoustic model. The acoustic model computes an acoustic score for each feature vector; the language model computes, according to linguistic theory, the probability of the candidate word sequences corresponding to the sound signal; finally, the word sequence is decoded against the existing dictionary to obtain the most likely text representation.
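The framing step of the preprocessing described above can be sketched as follows. This is a minimal illustration; the frame length and hop size are hypothetical parameters, and real systems typically also apply a window function to each frame.

```python
def frame_signal(samples, frame_len, hop):
    """Split a 1-D sample sequence into overlapping frames, the
    'framing' preprocessing step before feature extraction. Trailing
    samples that do not fill a whole frame are dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

# 16 samples, frames of 8 with 50% overlap (hop of 4) -> 3 frames
sig = list(range(16))
frames = frame_signal(sig, frame_len=8, hop=4)
```

Each frame would then be transformed to the frequency domain to produce a feature vector for the acoustic model.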
Step 23: Input the text information into a convolutional neural network for deep learning, so as to semantically segment the text information and obtain keywords.
Optionally, according to the characteristics of the application, a large amount of information is trained in advance by deep learning with a convolutional neural network to generate a corresponding semantic segmentation model. When the semantic segmentation model receives text information, it yields the keywords.
Step 24: Obtain associated images and speech based on the keywords.
Step 25: Process the images and the speech to form a video that introduces the application.
Steps 24-25 use the same or similar technical solutions as the foregoing embodiment and are not repeated here.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a third embodiment of the application introduction method provided by the present application. The method is implemented on a mobile terminal and includes:
Step 31: Obtain text information about the application, the text information indicating a requirement for introducing the application.
Optionally, the text information may be entered manually by the user, or generated by the application presenting several passages of text from which the user selects.
Step 32: Perform semantic segmentation on the text information.
Step 33: Obtain keywords based on the result of the semantic segmentation.
Steps 32-33 may specifically be:
Keywords can be obtained with methods such as TF-IDF (term frequency-inverse document frequency, a common weighting technique in information retrieval and data mining), TextRank (a general graph-based ranking algorithm for natural language processing), RAKE (Rapid Automatic Keyword Extraction), and topic models.
TF-IDF: TF measures how frequently a word occurs in a text. A word that occurs many times in a text usually carries some special meaning, but not every frequently occurring word is meaningful: if a word occurs many times in all documents, it has little value. TF-IDF weighs both factors: TF = (number of occurrences of the word in the text) / (total number of words in the text); IDF = log((total number of texts in the corpus) / (number of texts containing the word + 1));
TF-IDF = TF * IDF;
The larger the TF-IDF value, the more likely the word is to be a keyword.
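The formulas above translate directly into code. The sketch below implements TF, IDF, and their product exactly as defined; the sample corpus and words are made up for illustration.

```python
import math

def tf_idf(word, doc, corpus):
    """TF-IDF as defined above:
    TF  = occurrences of the word in the text / total words in the text
    IDF = log(total texts in corpus / (texts containing the word + 1))
    """
    tf = doc.count(word) / len(doc)
    containing = sum(1 for d in corpus if word in d)
    idf = math.log(len(corpus) / (containing + 1))
    return tf * idf

corpus = [
    ["open", "account", "invest", "risk"],
    ["pay", "invest", "return"],
    ["open", "pay", "account", "pay"],
]
# "invest" appears in 2 of 3 texts; "risk" in only 1, so it scores higher.
score_risk = tf_idf("risk", corpus[0], corpus)
score_invest = tf_idf("invest", corpus[0], corpus)
```

Note the "+1" in the IDF denominator drives the score of widely distributed words toward zero, matching the observation that such words make poor keywords.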
The RAKE algorithm proceeds as follows: first, segmentation, using punctuation marks and stop words as delimiters; then construction of a co-occurrence matrix; then feature extraction, which yields three features per word: its frequency freq, its degree deg, and the degree-to-frequency ratio deg/freq; the score is then defined as score = deg/freq; finally, keywords amounting to one third of the document's vocabulary are output in descending order of score.
After feature extraction there is a special step: adjacent candidate keywords that appear adjacent, in the same order, at least twice in the same document are merged into a new candidate keyword, whose score is defined as the sum of the scores of the candidates before merging. The reason for this is that such adjacent candidates are relatively rare, and simply summing their scores increases their importance.
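A minimal sketch of the RAKE flow just described (candidate phrases split at stop words, per-word freq and deg, score = deg/freq, phrases ranked by the sum of their word scores) might look like the following. The stop-word list and sample sentence are hypothetical, and the adjacent-keyword merging step is omitted for brevity.

```python
import re

STOPWORDS = {"how", "to", "a", "an", "the", "and", "i", "for", "with", "is"}

def rake(text):
    """Minimal RAKE sketch: candidate phrases are maximal runs of
    non-stopwords; for each word, freq = occurrences, deg = sum of
    lengths of phrases containing it (co-occurrence incl. itself);
    word score = deg / freq; a phrase scores the sum of its words."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases, cur = [], []
    for w in words:                      # split at stop words
        if w in STOPWORDS:
            if cur:
                phrases.append(cur)
            cur = []
        else:
            cur.append(w)
    if cur:
        phrases.append(cur)
    freq, deg = {}, {}
    for ph in phrases:
        for w in ph:
            freq[w] = freq.get(w, 0) + 1
            deg[w] = deg.get(w, 0) + len(ph)
    score = {w: deg[w] / freq[w] for w in freq}
    return sorted(((" ".join(ph), sum(score[w] for w in ph))
                   for ph in phrases), key=lambda t: -t[1])

ranked = rake("how to open an account and invest with low risk")
```

Multi-word candidates such as "low risk" are favoured because each member word inherits the phrase length as degree.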
In other embodiments, a semantic segmentation model may also be built in advance by deep learning with a neural network, so that keywords can be extracted quickly.
Step 34: Obtain associated images and speech based on the keywords.
Step 35: Process the images and the speech to form a video that introduces the application.
Steps 34-35 use the same or similar technical solutions as the foregoing embodiment and are not repeated here.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a fourth embodiment of the application introduction method provided by the present application. The method is implemented on a mobile terminal and includes:
Step 41: Obtain introduction requirement information about the application, the introduction requirement information indicating a requirement for introducing the application.
Step 42: Extract keywords from the introduction requirement information.
Steps 41-42 use the same or similar technical solutions as the foregoing embodiments and are not repeated here.
Step 43: Send the keywords to a server, so that the server generates associated images and speech based on the keywords.
Optionally, after receiving the keywords, the server obtains the images and speech associated with the keywords through deep learning based on a convolutional neural network.
In other embodiments, the images may be obtained by the server, while the speech may be produced by the mobile terminal itself, which recognizes the keywords, generates several passages of text fitting the scenario, and converts them into speech.
Step 44: Obtain the images and speech sent by the server.
Step 45: Process the images and the speech to form a video that introduces the application.
Steps 44-45 use the same or similar technical solutions as the foregoing embodiments and are not repeated here.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a fifth embodiment of the application introduction method provided by the present application. The method includes:
Step 51: Obtain introduction requirement information about the application, the introduction requirement information indicating a requirement for introducing the application.
Step 52: Extract keywords from the introduction requirement information.
Step 53: Send the keywords to a server, so that the server generates associated images and speech based on the keywords.
Step 54: Obtain the images and speech sent by the server.
Steps 51-54 use the same or similar technical solutions as the foregoing embodiments and are not repeated here.
Step 55: Perform image segmentation on the multiple corresponding images and extract feature information from the images.
Image segmentation is the technique and process of dividing an image into a number of specific regions with distinctive properties and extracting the objects of interest. It is the key step from image processing to image analysis. Existing image segmentation methods fall mainly into the following categories: threshold-based methods, region-based methods, edge-based methods, and methods based on specific theories. From a mathematical point of view, image segmentation is the process of partitioning a digital image into mutually disjoint regions. It is also a labeling process: pixels belonging to the same region are assigned the same label.
Among these, the threshold-based method is a region-oriented image segmentation technique whose principle is to divide the image's pixels into several classes. Threshold segmentation is the most common traditional image segmentation method; because it is simple to implement, computationally cheap, and stable, it has become the most basic and most widely used segmentation technique. It is especially suitable for images in which the object and the background occupy different gray-level ranges. It not only compresses the data volume greatly but also greatly simplifies the subsequent analysis, so in many cases it is a necessary preprocessing step before image analysis, feature extraction, and pattern recognition. The purpose of thresholding is to partition the set of pixels by gray level; each resulting subset forms a region corresponding to a real scene element, with consistent properties within each region and differing properties between adjacent regions. Such a partition can be achieved by selecting one or more gray-level thresholds.
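A minimal sketch of global thresholding follows, assuming a grayscale image stored as nested lists and a single hypothetical threshold t; selecting t automatically (e.g. from the histogram) is a separate problem.

```python
def threshold_segment(img, t):
    """Global thresholding: label each pixel 1 (object) if its gray
    level exceeds t, else 0 (background)."""
    return [[1 if p > t else 0 for p in row] for row in img]

# Object (bright) and background (dark) occupy different gray ranges,
# the case where thresholding works best.
img = [[ 12,  15, 200],
       [ 10, 220, 210],
       [  8,  11,  14]]
mask = threshold_segment(img, t=128)
```

Pixels assigned the same label form one region, illustrating the labeling view of segmentation described above.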
Among these, the region-based method is a segmentation technique based on finding regions directly; specific algorithms include region growing and region splitting-and-merging. Region-based extraction has two basic forms: one is region growing, which starts from individual pixels and gradually merges them to form the required regions; the other starts from the whole image and gradually splits it down to the required regions.
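Region growing, the first of the two forms just described, can be sketched as follows, assuming a 4-connected neighbourhood and a hypothetical gray-level tolerance relative to the seed pixel.

```python
def region_grow(img, seed, tol):
    """Region growing from a single seed pixel: repeatedly absorb
    4-connected neighbours whose gray level is within tol of the
    seed's gray level."""
    h, w = len(img), len(img[0])
    base = img[seed[0]][seed[1]]
    region, stack = set(), [seed]
    while stack:
        y, x = stack.pop()
        if (y, x) in region or not (0 <= y < h and 0 <= x < w):
            continue
        if abs(img[y][x] - base) > tol:
            continue
        region.add((y, x))
        stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
    return region

# A bright top-left patch surrounded by dark pixels.
img = [[100, 101,  10],
       [102,  99,  12],
       [ 10,  11,  13]]
region = region_grow(img, seed=(0, 0), tol=5)
```

Starting from pixel (0, 0), the growth absorbs the four similar bright pixels and stops at the dark ones.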
Among these, edge-based segmentation mainly includes point-based detection, line-based detection, and edge detection.
Among these, segmentation methods based on specific theories include cluster analysis, fuzzy set theory, genetic coding, wavelet transforms, and other methods.
Optionally, after the images are segmented, feature extraction is performed based on the keywords and the scene, and step 56 is then executed.
Step 56: Combine the feature information to generate multiple image frames.
Step 57: Form the multiple image frames into an animation.
Optionally, steps 55-57 are specifically:
Deep learning is performed in advance with a convolutional neural network to build an image model, so that the corresponding feature information generates multiple image frames, which are then formed into an animation.
Step 58: Merge the animation with the speech to form a video that introduces the application.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of a sixth embodiment of the application introduction method provided by the present application. The method is implemented on a server and includes:
Step 61: Obtain keywords sent by a mobile terminal, the keywords being extracted by the mobile terminal from obtained introduction requirement information about an application, the introduction requirement information indicating a requirement for introducing the application.
Optionally, after the mobile terminal obtains the introduction requirement information about the application, it extracts the keywords and sends them to the server.
Step 62: Generate associated images and speech based on the keywords.
Optionally, the server trains a model in advance on content related to the application, so that when the keywords arrive from the mobile terminal it can respond quickly and obtain the images and speech associated with them.
Step 63: Send the images and speech to the mobile terminal, so that the mobile terminal processes the images and speech to form a video that introduces the application.
Optionally, the generated images and speech are sent to the mobile terminal, which extracts feature information from the images and combines the feature information to generate multiple image frames; the mobile terminal then forms the frames into an animation and merges the animation with the speech to form a video that introduces the application.
In contrast to the prior art, the application introduction method of the present application includes: obtaining keywords sent by a mobile terminal, the keywords being extracted by the mobile terminal from obtained introduction requirement information about an application, the introduction requirement information indicating a requirement for introducing the application; generating associated images and speech based on the keywords; and sending the images and speech to the mobile terminal, so that the mobile terminal processes them to form a video that introduces the application. In this way, the user's needs can be obtained conveniently and different introductions can be produced for different needs: on the one hand, the method adapts to different user groups, so the application satisfies the needs of more of them; on the other hand, presenting the introduction as an animation makes it more personalized and entertaining and improves the user experience.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of a seventh embodiment of the application introduction method provided by the present application. The method includes:
Step 71: Obtain keywords sent by a mobile terminal, the keywords being extracted by the mobile terminal from obtained introduction requirement information about an application, the introduction requirement information indicating a requirement for introducing the application.
Step 72: Apply deep learning to the keywords to obtain associated images from a preset image library.
Deep learning models include the convolutional neural network, the DBN (Deep Belief Network), and the stacked auto-encoder network.
Convolutional neural networks are inspired by the structure of the visual system. The first convolutional network computational model was proposed in the neocognitron: based on local connections between neurons and hierarchically organized image transformations, neurons sharing the same parameters are applied at different positions of the previous layer, yielding a translation-invariant network structure. Later, building on this idea, convolutional networks were designed and trained with error gradients, achieving superior performance on a number of pattern recognition tasks.
A DBN can be interpreted as a Bayesian probabilistic generative model composed of multiple layers of stochastic latent variables. The top two layers have undirected symmetric connections; the lower layers receive top-down directed connections from the layer above, and the state of the bottom-layer units is the visible input data vector. A DBN is composed of a stack of several structural units, typically RBMs (Restricted Boltzmann Machines). The number of visible-layer neurons of each RBM in the stack equals the number of hidden-layer neurons of the previous RBM. Following the deep learning scheme, input samples are used to train the first-layer RBM, whose output is used to train the second-layer RBM, and the RBMs are stacked, improving model performance by adding layers. During unsupervised pre-training, after the DBN encodes the input up to the top RBM, it decodes the top-layer state back down to the bottom-layer units, reconstructing the input. As the structural unit of the DBN, each RBM shares its parameters with the corresponding DBN layer.
The structure of a stacked auto-encoder network is similar to that of a DBN, consisting of a stack of several structural units; the difference is that its structural unit is an auto-encoder rather than an RBM. The auto-encoder is a two-layer neural network: the first layer is the encoding layer and the second is the decoding layer.
Optionally, the server needs to predict the corresponding scenario from the characteristics of the application and of the keywords, and to search for corresponding images according to that scenario.
Optionally, when the preset image library cannot satisfy the search, the server searches for images on the Internet.
Step 73: Apply deep learning to the keywords to generate text information fitting the keyword scenario.
Step 74: Convert the text information into speech.
Step 75: Send the images and speech to the mobile terminal, so that the mobile terminal processes the images and speech to form a video that introduces the application.
Optionally, the server may instead first retrieve a large number of images and send them to the mobile terminal; the mobile terminal then segments the images according to the keywords, combines the parts according to the scene to form an animation, and merges the animation with the speech to form a video that introduces the application.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a first embodiment of a mobile terminal provided by the present application. The mobile terminal 80 includes a processor 81 and a memory 82 connected to the processor 81; the memory 82 is used to store program data, and the processor 81 is used to execute the program data to implement the following method:
obtaining introduction requirement information about an application, the introduction requirement information indicating a requirement for introducing the application; extracting keywords from the introduction requirement information; obtaining associated images and speech based on the keywords; and processing the images and the speech to form a video that introduces the application.
Optionally, the processor 81 executes the program data to further implement the following method: performing speech recognition on audio information to obtain text information; and performing keyword extraction on the text information to obtain the keywords.
Optionally, the processor 81 executes the program data to further implement the following method: performing semantic segmentation on the text information; and obtaining the keywords based on the result of the semantic segmentation.
Optionally, the processor 81 executes the program data to further implement the following method: inputting the text information into a convolutional neural network for deep learning, so as to semantically segment the text information and obtain the keywords.
Optionally, the processor 81 executes the program data to further implement the following method: sending the keywords to a server, so that the server generates associated images and speech based on the keywords; and obtaining the images and speech sent by the server.
Optionally, the processor 81 executes the program data to further implement the following method: performing image segmentation on multiple corresponding images and extracting feature information from the images; combining the feature information to generate multiple image frames; forming the multiple image frames into an animation; and merging the animation with the speech to form a video that introduces the application.
Optionally, the processor 81 executes the program data to further implement the following method: obtaining background music sent by the server, the background music being music generated by the server based on the keywords; and adding the background music to the video.
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a first embodiment of a server provided by the present application. The server 90 includes a processor 91 and a memory 92 connected to the processor 91. The memory 92 is configured to store program data, and the processor 91 is configured to execute the program data to implement the following method:
acquiring keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal from acquired introduction requirement information about an application, and the introduction requirement information indicates a requirement for introducing the application; generating associated images and speech based on the keywords; and sending the images and the speech to the mobile terminal, so that the mobile terminal processes the images and the speech to form a video for introducing the application.
Optionally, when executing the program data, the processor 91 further implements the following method: applying deep learning to the keywords to obtain the associated images from a preset image library.
Optionally, when executing the program data, the processor 91 further implements the following method: applying deep learning to the keywords to generate text information matching the keyword scene; and converting the text information into the speech.
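The two server-side steps just listed, generating scene text from the keywords and converting that text to speech, might be outlined as follows; the sentence template and the pseudo audio payload stand in for the deep-learning model and a real text-to-speech engine, and are assumptions.

```python
def keywords_to_scene_text(keywords):
    # Stand-in for the deep-learning step: template the keywords into
    # a short scene description.
    return "This application lets you " + " and ".join(keywords) + "."

def text_to_speech(text):
    # Placeholder for a real text-to-speech engine: wrap the text in a
    # pseudo audio payload instead of synthesizing samples.
    return {"format": "pcm", "transcript": text}

speech = text_to_speech(keywords_to_scene_text(["pay", "transfer"]))
```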
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application. The computer storage medium 100 is configured to store program data 101, and the program data 101, when executed by a processor, implements the following method:
acquiring introduction requirement information about an application, wherein the introduction requirement information indicates a requirement for introducing the application; extracting keywords from the introduction requirement information; acquiring associated images and speech based on the keywords; and processing the images and the speech to form a video for introducing the application;
or, acquiring keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal from acquired introduction requirement information about an application, and the introduction requirement information indicates a requirement for introducing the application; generating associated images and speech based on the keywords; and sending the images and the speech to the mobile terminal, so that the mobile terminal processes the images and the speech to form a video for introducing the application.
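Taken together, the two alternatives describe one split of work between terminal and server. The exchange could be sketched end to end like this, with both sides reduced to plain functions; the message shapes are assumptions made for illustration.

```python
def server_generate(keywords):
    # Server side: derive one keyword-linked image per keyword plus speech.
    images = [{"regions": [kw]} for kw in keywords]
    speech = "An introduction covering " + ", ".join(keywords) + "."
    return images, speech

def terminal_compose(images, speech):
    # Terminal side: turn the received images into frames and fuse the speech.
    frames = [{"frame": i, "content": img["regions"]}
              for i, img in enumerate(images)]
    return {"animation": frames, "speech": speech}

imgs, voice = server_generate(["pay", "scan"])
intro_video = terminal_compose(imgs, voice)
```

In the disclosed system the two functions run on different machines, so a transport layer would sit between them; the sketch keeps only the division of responsibility.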
It can be understood that the computer storage medium may be applied either to the above-described mobile terminal or to the above-described server, to implement the method of any of the above embodiments.
Referring to FIG. 11, FIG. 11 is a schematic structural diagram of a second embodiment of a mobile terminal provided by the present application. The mobile terminal 110 includes an acquisition module 111, an extraction module 112, and a processing module 113.
The acquisition module 111 is configured to acquire introduction requirement information about an application, wherein the introduction requirement information indicates a requirement for introducing the application.
The extraction module 112 is configured to extract keywords from the introduction requirement information.
The acquisition module 111 is further configured to acquire associated images and speech based on the keywords.
The processing module 113 is configured to process the images and the speech to form a video for introducing the application.
Referring to FIG. 12, FIG. 12 is a schematic structural diagram of a second embodiment of a server provided by the present application. The server 120 includes an acquisition module 121, a processing module 122, and a sending module 123.
The acquisition module 121 is configured to acquire keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal from acquired introduction requirement information about an application, and the introduction requirement information indicates a requirement for introducing the application.
The processing module 122 is configured to generate associated images and speech based on the keywords.
The sending module 123 is configured to send the images and the speech to the mobile terminal, so that the mobile terminal processes the images and the speech to form a video for introducing the application.
In the several implementations provided in this application, it should be understood that the disclosed methods and devices may be implemented in other ways. For example, the device implementations described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit of the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely embodiments of this application and do not limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (16)

  1. A method for introducing an application, comprising:
    acquiring introduction requirement information about an application, wherein the introduction requirement information indicates a requirement for introducing the application;
    extracting keywords from the introduction requirement information;
    acquiring associated images and speech based on the keywords; and
    processing the images and the speech to form a video for introducing the application.
  2. The method according to claim 1, wherein:
    the introduction requirement information is audio information; and
    the extracting keywords from the introduction requirement information comprises:
    performing speech recognition on the audio information to obtain text information; and
    performing keyword extraction on the text information to obtain the keywords.
  3. The method according to claim 2, wherein:
    the performing keyword extraction on the text information to obtain the keywords comprises:
    performing semantic segmentation on the text information; and
    obtaining the keywords based on a result of the semantic segmentation.
  4. The method according to claim 3, wherein:
    the performing semantic segmentation on the text information comprises:
    inputting the text information into a convolutional neural network for deep learning, so as to perform semantic segmentation on the text information and obtain the keywords.
  5. The method according to claim 1, wherein:
    the introduction requirement information is text information; and
    the extracting keywords from the introduction requirement information comprises:
    performing semantic segmentation on the text information; and
    obtaining the keywords based on a result of the semantic segmentation.
  6. The method according to claim 1, wherein:
    the acquiring associated images and speech based on the keywords comprises:
    sending the keywords to a server, so that the server generates the associated images and speech based on the keywords; and
    acquiring the images and the speech sent by the server.
  7. The method according to claim 1, wherein:
    the processing the images and the speech to form a video for introducing the application comprises:
    performing image segmentation on a plurality of the corresponding images and extracting feature information from the images;
    combining the feature information to generate a plurality of image frames;
    forming the plurality of image frames into an animation; and
    merging the animation with the speech to form the video for introducing the application.
  8. The method according to claim 7, wherein the method further comprises:
    acquiring background music sent by the server, wherein the background music is music generated by the server based on the keywords; and
    adding the background music to the video.
  9. A method for introducing an application, comprising:
    acquiring keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal from acquired introduction requirement information about an application, and the introduction requirement information indicates a requirement for introducing the application;
    generating associated images and speech based on the keywords; and
    sending the images and the speech to the mobile terminal, so that the mobile terminal processes the images and the speech to form a video for introducing the application.
  10. The method according to claim 9, wherein:
    the generating associated images and speech based on the keywords comprises:
    applying deep learning to the keywords to obtain the associated images from a preset image library.
  11. The method according to claim 10, wherein:
    the generating associated images and speech based on the keywords comprises:
    applying deep learning to the keywords to generate text information matching the keyword scene; and
    converting the text information into the speech.
  12. A mobile terminal, wherein the mobile terminal comprises a processor and a memory connected to the processor;
    the memory is configured to store program data, and the processor is configured to execute the program data to implement the method according to any one of claims 1-8.
  13. A server, wherein the server comprises a processor and a memory connected to the processor;
    the memory is configured to store program data, and the processor is configured to execute the program data to implement the method according to any one of claims 9-11.
  14. A computer storage medium, wherein the computer storage medium is configured to store program data, and the program data, when executed by a processor, implements the method according to any one of claims 1-11.
  15. A mobile terminal, wherein the mobile terminal comprises:
    an acquisition module, configured to acquire introduction requirement information about an application, wherein the introduction requirement information indicates a requirement for introducing the application;
    an extraction module, configured to extract keywords from the introduction requirement information;
    the acquisition module being further configured to acquire associated images and speech based on the keywords; and
    a processing module, configured to process the images and the speech to form a video for introducing the application.
  16. A server, wherein the server comprises:
    an acquisition module, configured to acquire keywords sent by a mobile terminal, wherein the keywords are extracted by the mobile terminal from acquired introduction requirement information about an application, and the introduction requirement information indicates a requirement for introducing the application;
    a processing module, configured to generate associated images and speech based on the keywords; and
    a sending module, configured to send the images and the speech to the mobile terminal, so that the mobile terminal processes the images and the speech to form a video for introducing the application.
PCT/CN2019/104000 2019-09-02 2019-09-02 Application introduction method, mobile terminal, and server WO2021042234A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980010315.5A CN111801673A (en) 2019-09-02 2019-09-02 Application program introduction method, mobile terminal and server
PCT/CN2019/104000 WO2021042234A1 (en) 2019-09-02 2019-09-02 Application introduction method, mobile terminal, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/104000 WO2021042234A1 (en) 2019-09-02 2019-09-02 Application introduction method, mobile terminal, and server

Publications (1)

Publication Number Publication Date
WO2021042234A1 true WO2021042234A1 (en) 2021-03-11

Family

ID=72805590

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/104000 WO2021042234A1 (en) 2019-09-02 2019-09-02 Application introduction method, mobile terminal, and server

Country Status (2)

Country Link
CN (1) CN111801673A (en)
WO (1) WO2021042234A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117041627B (en) * 2023-09-25 2024-03-19 宁波均联智行科技股份有限公司 Vlog video generation method and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN106547748A (en) * 2015-09-16 2017-03-29 中国移动通信集团公司 The creation method and device of a kind of APP index databases, the method and device of search APP
CN108965737A (en) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 media data processing method, device and storage medium
CN109145152A (en) * 2018-06-28 2019-01-04 中山大学 A kind of self-adapting intelligent generation image-text video breviary drawing method based on query word

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820546B (en) * 2014-12-30 2018-02-13 广州酷狗计算机科技有限公司 Function information methods of exhibiting and device
CN106648675A (en) * 2016-12-28 2017-05-10 乐蜜科技有限公司 Method and device for displaying application program using information and electronic equipment
CN106919317A (en) * 2017-02-27 2017-07-04 珠海市魅族科技有限公司 A kind of information displaying method and system

Also Published As

Publication number Publication date
CN111801673A (en) 2020-10-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19943978

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19943978

Country of ref document: EP

Kind code of ref document: A1