WO2013189156A1 - Video search system, method and video search server based on natural interaction input - Google Patents

Video search system, method and video search server based on natural interaction input

Info

Publication number
WO2013189156A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
semantic
user
search
data
Prior art date
Application number
PCT/CN2012/086283
Other languages
English (en)
Chinese (zh)
Inventor
王勇进
张瑞
张钰林
Original Assignee
海信集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 海信集团有限公司
Publication of WO2013189156A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • the present invention relates to the field of video search technology, and more particularly to a video search system and method based on natural interactive input (e.g., voice input), and a video search server.
  • SIRI (personalized intelligent assistant)
  • Another object of the present invention is to provide a video search method based on natural interactive input, which can realize intelligent perception of a user's video target task and provide a more natural and friendly user experience.
  • Still another object of the present invention is to provide a video search server having natural language semantic analysis capabilities and intelligent video search capabilities.
  • the video search system based on natural interaction input provided by the embodiment of the present invention includes a user end and a video search server.
  • the user end includes a voice collection module and a human machine interface, and the voice collection module collects voice input of the user to generate user voice data and provide the same to the human machine interface.
  • the video search server comprises a control module, a speech recognition module, a natural language processing module, a video relational database and a video search module; the video relational database stores a video semantic space and the semantic descriptor set of the video text data in the video semantic space.
  • the control module receives the user voice data provided by the human-machine interface of the client and provides it to the speech recognition module to obtain user text data, provides the user text data to the natural language processing module to obtain user text semantic analysis result data, and uses the user text semantic analysis result data to pre-search the video relational database to obtain a video pre-search result.
  • the video pre-search result includes the semantic descriptor set, in the video semantic space, of the related video text data that matches the user text semantic analysis result data.
  • the video search module receives the user text semantic analysis result data and the video pre-search result from the control module, performs a similarity comparison between the semantic descriptor of the user text semantic analysis result data in the video semantic space and the semantic descriptor set included in the video pre-search result, and outputs the video final search result to the control module, which then provides it to the human machine interface for presentation to the user.
  • a video search method based on natural interactive input includes the following steps: (a) collecting the natural interaction input of the user to obtain user text data; (b) performing natural language semantic analysis on the user text data to obtain user text semantic analysis result data; (c) performing a pre-search with the user text semantic analysis result data to obtain a video pre-search result, which includes the semantic descriptor set, in the video semantic space, of the related video text data matching the user text semantic analysis result data; (d) projecting the user text semantic analysis result data into the video semantic space and performing similarity comparison against the semantic descriptor set included in the video pre-search result to output the video final search result; and (e) presenting the video final search result to the user. A minimal sketch of this flow is given below.
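The sketch assumes hypothetical helper objects (asr, nlp, database, semantic_space) that stand in for the modules described below; nothing in it is prescribed by the patent itself:

```python
# Hypothetical orchestration of steps (a)-(e); all helpers are stand-ins.
def video_search(natural_input, asr, nlp, database, semantic_space):
    user_text = asr.transcribe(natural_input)    # (a) natural input -> user text data
    analysis = nlp.analyze(user_text)            # (b) semantic analysis of the user text
    candidates = database.pre_search(analysis)   # (c) pre-search -> video pre-search result
    query = semantic_space.project(analysis)     # (d) project into the video semantic space
    ranked = sorted(candidates,                  #     then compare similarity per candidate
                    key=lambda v: semantic_space.distance(query, v.descriptor))
    return ranked                                # (e) video final search result, closest first
```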
  • a video search method based on voice input includes the following steps: (1) quantifying the video text semantic analysis result data obtained by performing natural language semantic analysis on the collected video text data and, based on latent semantic indexing, training to obtain the video semantic space together with the semantic descriptor set of the collected video text data in that space; (2) collecting the natural interaction input of the user to obtain user text data; (3) performing natural language semantic analysis on the user text data to obtain user text semantic analysis result data; (4) using the semantic descriptor of the user text semantic analysis result data in the video semantic space to perform similarity comparison against at least part of the semantic descriptor set of the collected video text data in the video semantic space, and outputting the video final search result; and (5) presenting the video final search result to the user.
  • a video search server includes: a video relational database, a natural language processing module, a control module, and a video search module.
  • the video relational database stores the video semantic space and the semantic descriptor set of the video text data in the video semantic space.
  • the control module provides the user text data representing the user's video requirement to the natural language processing module to obtain the user text semantic analysis result data; the video search module obtains the semantic descriptor of the user text semantic analysis result data in the video semantic space and uses it to perform similarity comparison against at least part of the semantic descriptor set of the video text data, outputting the video final search result to the control module.
  • the natural interactive input based video search system and method and the video search server in the above embodiments of the present invention have at least one or more of the following advantages: they are guided by the user's video target task and allow the user to interact in natural language; through natural language processing technology and reasoning over a video-related knowledge base, the user can quickly obtain related videos from the database by providing a simple description of the video content, thereby realizing intelligent perception of the user's video target task; they provide a natural and friendly human-computer interaction mode and interface, with the ability to continuously learn and upgrade; therefore, they can effectively enhance the user experience.
  • FIG. 1 is a schematic diagram of a video search system architecture based on natural interactive input (e.g., voice input) according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a module of the user end shown in FIG. 1.
  • FIG. 3 is a schematic diagram of a module of the video search server shown in FIG. 1.
  • FIG. 4 is a flowchart of a video search method based on voice input according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of another video search method based on voice input according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a video search system based on natural interactive input (for example, voice input) according to an embodiment of the present invention.
  • the voice input based video search system 100 of the present embodiment includes a client 10 and a video search server 30; the client 10 receives user voice input and generates user voice data, and the video search server 30 performs a video search according to the user voice data and returns the video final search result to the client 10 for presentation to the user.
  • one video search server 30 may correspond to multiple clients 10, so that the user voice data of each client 10 can be responded to separately and the corresponding video final search result returned.
  • FIG. 2 is a schematic diagram of a module of the client 10 according to an embodiment of the present invention.
  • the client 10 includes, for example, a voice collection module 11 and a human machine interface 13.
  • the voice collection module 11 collects user voice input and generates user voice data, which is transmitted to the video search server 30 through the human machine interface 13.
  • the tasks of the human machine interface 13 include, for example, human-computer interaction, user information recording, and user authentication.
  • two usage modes may be specifically provided for the user, such as a public mode and a privacy mode.
  • the video search server 30 may perform the video search with user authentication either enabled or skipped, so that the user's personal information is protected and suitable video search results can be provided to users of different ages.
  • the client 10 is, for example, a smart TV with an Internet-capable TV remote control, a desktop computer, a notebook computer, a smart phone, or the like. When the client 10 is a smart TV with a TV remote controller, the voice collection module 11 may be a microphone built into the TV remote controller, and the human machine interface 13 may be a Hypertext Transfer Protocol (HTTP) website service running on the smart TV (for example, on port 80), which transmits the user voice data output by the microphone to the video search server 30 for video search and subsequently displays the video final search result to the user; further, it can be understood that the user voice data may first be compressed before being transmitted to the video search server 30, as in the sketch below.
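A client-side sketch of that transmission, assuming a hypothetical /search endpoint and a JSON response; the disclosure specifies only that an HTTP website service (for example, on port 80) forwards the optionally compressed voice data:

```python
# Hypothetical upload of compressed user voice data to the video search server.
import gzip
import requests

def send_voice(audio_bytes, server="http://videosearch.example:80"):
    payload = gzip.compress(audio_bytes)              # optional compression step
    resp = requests.post(
        server + "/search",
        data=payload,
        headers={"Content-Type": "application/octet-stream",
                 "Content-Encoding": "gzip"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()                                # final search results for display
```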
  • FIG. 3 is a block diagram of a video search server 30 according to an embodiment of the present invention.
  • the video search server 30 includes a control module 31, a voice recognition module 33, a natural language processing module 35, a video data collection module 36, a video relational database 37, a semantic space learning module 38, a video search module 39, and a server management module 32. Each module in the video search server 30 can be implemented in hardware and/or software according to the needs of the actual design; further, the video search server 30 can be a single server or a cluster of multiple servers, plus the necessary peripheral components.
  • the video search server 30 includes an online part and an offline part. The online part mainly consists of the control module 31, the speech recognition module 33, the natural language processing module 35, and the video search module 39; the offline part mainly consists of the video data collection module 36, the video relational database 37, and the semantic space learning module 38, and shares the natural language processing module 35 with the online part.
  • the control module 31 serves as the scheduling center of the entire video search server 30; it receives the user voice data transmitted by the client 10 (for example, over a wired or wireless network connection) and finally returns the video final search result as output to the client 10.
  • the control module 31 first verifies the identity of the user and, according to the authentication result, determines whether a video search may be performed and/or whether the video final search result must be filtered before being returned.
  • the speech recognition module 33 performs speech recognition on voice data to convert it into corresponding text data; it is typically coupled to a speech library (not shown in FIG. 3) for speech instruction matching operations.
  • the voice recognition module 33 can convert the user voice data provided by the control module 31 into user text data representing the user's video requirements and return it to the control module 31.
  • the natural language processing module 35 is adapted to perform semantic analysis on text data (such as user text data and video text data), for example Chinese semantic analysis, including word segmentation, part-of-speech tagging, named entity analysis, and the like.
  • the natural language processing module 35 can also perform semantic analysis on texts in other languages, not only Chinese but also English and so on; it merely needs to be provided with semantic libraries for the corresponding languages.
  • the natural language processing module 35 can perform semantic analysis on the user text data provided by the control module 31 to return the user text semantic analysis result data to the control module 31.
  • the user text semantic analysis result data can be understood as user text data after the operation of word segmentation, part-of-speech tagging, and the like.
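As a purely illustrative sketch of this step, Chinese word segmentation and part-of-speech tagging could be done with the open-source jieba library; the disclosure does not name any particular toolkit:

```python
# Illustrative segmentation + POS tagging; jieba is an assumption, not
# the module the patent describes.
import jieba.posseg as pseg

def analyze(user_text):
    # Returns (word, POS-tag) pairs; person names tag as 'nr', nouns as 'n'.
    return [(token.word, token.flag) for token in pseg.cut(user_text)]

# e.g. analyze("我想看张艺谋导演的战争电影") separates the director name and
# the theme type, which the later pre-search can match against.
```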
  • the video data collection module 36 is configured to collect video data and provide video text data. The video text data, searched from the network (including film and television program providing partners), may include text such as the video name, aliases, director names, actor names, the video production date, the video theme type (such as war film, comedy, etc.), the video region (such as China, the United States, etc.) or language (such as Chinese, English, etc.), the video category (such as movie, TV series, etc.), and a data validity tag.
  • the video data collection module 36 can operate in a periodic automatic collection mode or a manually triggered collection mode.
  • the video text data provided by the video data collection module 36 is first transmitted to the natural language processing module 35 for natural language semantic analysis, forming the video text semantic analysis result data, which is stored in the video relational database 37. It can be understood that the video text data provided by the video data collection module 36 may instead be stored in the video relational database 37 first, after which the natural language processing module 35 performs the word segmentation, part-of-speech tagging, and other semantic analysis operations on the stored video text data. The video text semantic analysis result data can thus be understood as the result of the word segmentation and part-of-speech tagging operations on the video text data.
  • the video relational database 37 serves as the data source of the video search server 30 for video search, and includes data tables such as a video data table, a backup data table, a user table, and a query record table (a hypothetical layout is sketched below).
  • the video data table stores, for example, semantically analyzed video text data
  • the backup data table stores, for example, data culled during deduplication
  • the user table stores, for example, user data
  • the query record table, for example, saves the user's video search records.
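A minimal sketch of these four tables, using SQLite for brevity; the column layout is an assumption, since the disclosure names the tables but not their fields:

```python
# Hypothetical schema for the video relational database 37.
import sqlite3

conn = sqlite3.connect("video_relational.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS video_data (     -- semantically analyzed video text data
    id         INTEGER PRIMARY KEY,
    name       TEXT, alias TEXT, director TEXT, actors TEXT,
    prod_date  TEXT, theme TEXT, region TEXT, language TEXT, category TEXT,
    valid      INTEGER,                     -- data validity tag
    descriptor BLOB                         -- projection in the video semantic space
);
CREATE TABLE IF NOT EXISTS backup_data (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE IF NOT EXISTS users       (id INTEGER PRIMARY KEY, name TEXT, mode TEXT);
CREATE TABLE IF NOT EXISTS query_log   (id INTEGER PRIMARY KEY, user_id INTEGER,
                                        query TEXT, created_at TEXT);
""")
conn.commit()
```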
  • the semantic space learning module 38 is the machine learning core of the voice input based video search system 100. It is primarily responsible for quantifying the video text data in the video relational database 37, analyzing and learning the main semantics of the database based on latent semantic indexing (LSI) to obtain the video semantic space, and finding the semantic descriptor set of the collected video text data in the video semantic space (i.e., the set of its projections in that space), which is stored in the video relational database 37.
  • the process of establishing the video semantic space may be as follows: the semantic space learning module 38 uses the video text semantic analysis result data stored in the video relational database 37 as a training sample set to build a vocabulary containing a large number of useful words; using this vocabulary, each piece of video text data (i.e., each video description) can be quantified and ultimately represented by a vector, in which each element records the number of occurrences of one vocabulary word in that video text data; this vector is the word-frequency vector of the video text data.
  • next, certain special directions can be calculated in the linear space to which the word-frequency vectors belong; the vectors representing these special directions form an orthonormal set spanning a new linear space.
  • the special physical meaning of this vector set is that each vector represents words that often appear together in a specific context; each such context corresponds to a semantic topic, i.e., the simultaneous appearance of certain words expresses a semantic. However, only some of the special vectors spanning the new linear space have a high degree of semantic discrimination, and only these are preserved; the preserved vectors ultimately constitute the video semantic space.
  • each piece of video text data in the video relational database 37 will then find a projection in the video semantic space, i.e., its semantic descriptor in the video semantic space. The sketch below illustrates this construction.
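The construction described in the last three bullets is the classic latent semantic indexing pipeline; it can be sketched as follows, where the library choice and the number of preserved directions are assumptions:

```python
# LSI sketch: vocabulary -> word-frequency vectors -> truncated SVD, keeping
# only the highly discriminative directions that form the video semantic space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

video_texts = [                                # segmented video descriptions (toy data)
    "war film directed by a famous director",
    "romantic comedy film set in a big city",
    "documentary about the war in Europe",
]

vectorizer = CountVectorizer()                 # vocabulary of useful words
tf = vectorizer.fit_transform(video_texts)     # word-frequency vector per video

lsi = TruncatedSVD(n_components=2)             # preserved semantic directions
descriptors = lsi.fit_transform(tf)            # one semantic descriptor per video
# These rows are the projections stored in the video relational database 37.
```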
  • the video search module 39 is connected to the control module 31 and the video relational database 37. It receives the user text semantic analysis result data provided by the control module 31, acquires the video semantic space (for example, the coordinate axes of the semantic space and related information) from the video relational database 37, and projects the user text semantic analysis result data into the video semantic space to obtain the projection (i.e., the semantic descriptor) of the user text data in the video semantic space. Subsequently, the video search module 39 can use this semantic descriptor to perform the video search operation.
  • the video search operation of the video search module 39 in the embodiment of the present invention may proceed as follows: first, the control module 31 performs a video pre-search in the video relational database 37 using the user text semantic analysis result data (that is, the semantically analyzed user text data), for example a classified search: a search by video director name, video actor name, video production date, video theme type, video region or language type, video category, and so on; this reduces the workload of the video search module 39 and improves search efficiency.
  • the video pre-search result includes, for example, the semantic descriptor set, in the video semantic space, of the related video text data matching the user text data; this semantic descriptor set is provided to the video search module 39 along with the user text semantic analysis result data.
  • the video search module 39 compares the semantic descriptor of the user text data in the video semantic space with the semantic descriptor set, in the video semantic space, of the related video text data included in the video pre-search result, obtains the video final search result, and transmits it to the control module 31, which then provides it to the human machine interface 13 of the client 10 for presentation to the user.
  • the similarity comparison can be realized by calculating the Euclidean distance, but the present invention is not limited thereto; other methods of calculating the similarity between projections in the semantic space can also be employed.
  • the video final search result here may be a list of videos sorted by degree of similarity, as in the sketch below.
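Continuing the sketch above, the online comparison might look like this; Euclidean distance is the measure the text names, while the candidate format is an assumption:

```python
# Rank pre-search candidates by Euclidean distance in the video semantic
# space; the smallest distance means the most similar video.
import numpy as np

def rank_candidates(user_text, vectorizer, lsi, candidates):
    # candidates: iterable of (video_id, descriptor) from the video pre-search
    query = lsi.transform(vectorizer.transform([user_text]))[0]
    return sorted(candidates,
                  key=lambda c: np.linalg.norm(query - np.asarray(c[1])))
```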
  • with the pre-search, the semantic space search is thus performed over only part of the video text data's semantic descriptor set in the video semantic space.
  • the video pre-search may also be omitted, in which case the semantic descriptor of the user text data in the video semantic space is used directly to perform the semantic space search over the full semantic descriptor set to output the video final search result.
  • the server management module 32 is disposed in the video search server 30 as a module that is not user-facing.
  • the voice recognition module 33 of the above embodiment of the present invention can also be integrated into the client 10 instead of the video search server 30, so that the client 10 converts the user voice data into user text data before transmitting it to the video search server 30.
  • a video search method based on voice input mainly includes steps S400-S410:
  • S400 Collect the voice input of the user to generate user voice data;
  • S402 Perform speech recognition on the user voice data to obtain user text data;
  • S404 Perform natural language semantic analysis on the user text data to obtain user text semantic analysis result data;
  • S406 Perform a pre-search (for example, the aforementioned classified search) using the user text semantic analysis result data to obtain a video pre-search result, which includes the semantic descriptor set, in the video semantic space, of the related video text data matching the user text semantic analysis result data;
  • S408 Project the user text semantic analysis result data into the video semantic space and perform similarity comparison with the semantic descriptor set included in the video pre-search result to output the video final search result (for example, a video list sorted by similarity score); and
  • S410 Present the video final search result to the user.
  • another video search method based on voice input mainly includes, for example, steps S500-S508:
  • S500 Quantify the video text semantic analysis result data obtained by performing natural language semantic analysis on the collected video text data, and train based on latent semantic indexing to obtain the video semantic space and the semantic descriptor set of the collected video text data in the video semantic space;
  • S502 Collect the voice input of the user and convert it into user text data;
  • S504 Perform natural language semantic analysis on the user text data to obtain user text semantic analysis result data;
  • S506 Use the semantic descriptor of the user text semantic analysis result data in the video semantic space to perform similarity comparison against the semantic descriptor set of at least part of the collected video text data in the video semantic space, and output the video final search result; this step covers both performing a video pre-search (for example, the aforementioned classified search) before the semantic space search and performing the semantic space search directly without a video pre-search; and
  • S508 Present the video final search result to the user.
  • the natural interactive input mode is not limited to voice input; it can also be direct natural language text input, or even gesture input. Accordingly, in the video search methods of the above embodiments, the text conversion step for user voice data may be unnecessary, and the module design in the video search system can be increased, decreased, and/or changed appropriately according to the actual situation.
  • in summary, the video search system and method based on natural interaction input (such as voice input) and the video search server provided by the embodiments of the present invention have at least one or more of the following advantages: they are guided by the user's video target task and allow the user to interact in natural language; through natural language processing technology and reasoning over a video-related knowledge base, the user can quickly obtain related videos from the database by providing a simple description of the video content, thereby realizing intelligent perception of the user's video target task; they provide a natural and friendly human-computer interaction mode and interface, with the ability to continuously learn and upgrade; therefore, they can effectively enhance the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of video search technology, and provides a video search system, and a video search method and server based on natural interaction input. A user end of the video search system receives a user's natural interaction input and provides it to the video search server for a video search, where the video search server may comprise an online part and an offline part. The offline part performs semantic analysis on collected video information and establishes a video semantic space and a video relational database. The online part obtains user text data according to the user's natural interaction input and performs semantic analysis, uses the semantic analysis result to perform a video pre-search in the relational database and, according to the semantic descriptor of the semantic analysis result in the video semantic space, performs comparison and search within the semantic descriptor set contained in the video pre-search result in order to output a final video search result to the user. The user need only provide a simple description of the video content to quickly obtain the appropriate video from the database, thereby implementing intelligent recognition of the user's video target task.
PCT/CN2012/086283 2012-06-18 2012-12-10 Video search system, method and video search server based on natural interaction input WO2013189156A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210199239.7 2012-06-18
CN201210199239.7A CN102750366B (zh) 2012-06-18 2012-06-18 Video search system and method based on natural interaction input

Publications (1)

Publication Number Publication Date
WO2013189156A1 true WO2013189156A1 (fr) 2013-12-27

Family

ID=47030551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/086283 WO2013189156A1 (fr) 2012-06-18 2012-12-10 Video search system, method and video search server based on natural interaction input

Country Status (2)

Country Link
CN (1) CN102750366B (fr)
WO (1) WO2013189156A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750366B (zh) * 2012-06-18 2015-05-27 海信集团有限公司 Video search system and method based on natural interaction input
CN103970791B (zh) * 2013-02-01 2018-01-23 华为技术有限公司 Method and apparatus for recommending videos from a video library
CN104240700B (zh) * 2014-08-26 2018-09-07 智歌科技(北京)有限公司 Global voice interaction method and system for vehicle-mounted terminal devices
US9886958B2 (en) * 2015-12-11 2018-02-06 Microsoft Technology Licensing, Llc Language and domain independent model based approach for on-screen item selection
CN106776872A (zh) * 2016-11-29 2017-05-31 暴风集团股份有限公司 Method and system for performing voice search according to semantics defined by voice
CN108549655A (zh) * 2018-03-09 2018-09-18 阿里巴巴集团控股有限公司 Method, apparatus and device for producing film and television works
CN109089133B (zh) 2018-08-07 2020-08-11 北京市商汤科技开发有限公司 Video processing method and apparatus, electronic device, and storage medium
CN110968736B (zh) * 2019-12-04 2021-02-02 深圳追一科技有限公司 Video generation method and apparatus, electronic device, and storage medium
CN113596602A (zh) * 2021-07-28 2021-11-02 深圳创维-Rgb电子有限公司 Intelligent matching method, television, and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131060A1 (en) * 2010-11-24 2012-05-24 Robert Heidasch Systems and methods performing semantic analysis to facilitate audio information searches
CN102750366A (zh) * 2012-06-18 2012-10-24 海信集团有限公司 Video search system and method and video search server based on natural interaction input

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059522A1 (en) * 2006-08-29 2008-03-06 International Business Machines Corporation System and method for automatically creating personal profiles for video characters
CN100565532C (zh) * 2008-05-28 2009-12-02 叶睿智 Multimedia resource retrieval method based on audio content retrieval
CN101382937B (zh) * 2008-07-01 2011-03-30 深圳先进技术研究院 Multimedia resource processing method based on speech recognition and online teaching system thereof
CN101404035A (zh) * 2008-11-21 2009-04-08 北京得意音通技术有限责任公司 Information search method based on text or speech
CN102063476B (zh) * 2010-12-13 2013-07-10 百度时代网络技术(北京)有限公司 Video search method and system
CN102262624A (zh) * 2011-08-08 2011-11-30 中国科学院自动化研究所 Cross-language communication system and method based on multimodal assistance

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131060A1 (en) * 2010-11-24 2012-05-24 Robert Heidasch Systems and methods performing semantic analysis to facilitate audio information searches
CN102750366A (zh) * 2012-06-18 2012-10-24 海信集团有限公司 Video search system and method and video search server based on natural interaction input

Also Published As

Publication number Publication date
CN102750366B (zh) 2015-05-27
CN102750366A (zh) 2012-10-24

Similar Documents

Publication Publication Date Title
WO2013189156A1 (fr) Video search system, method and video search server based on natural interaction input
US11842727B2 (en) Natural language processing with contextual data representing displayed content
CN106575293B (zh) 孤立话语检测系统和方法
US20180121547A1 (en) Systems and methods for providing information discovery and retrieval
US9972358B2 (en) Interactive video generation
CN111666416B (zh) 用于生成语义匹配模型的方法和装置
CN112000820A (zh) 一种媒资推荐方法及显示设备
US10650814B2 (en) Interactive question-answering apparatus and method thereof
US9519355B2 (en) Mobile device event control with digital images
US11782979B2 (en) Method and apparatus for video searches and index construction
CN111432282B (zh) 一种视频推荐方法及装置
JP7140913B2 (ja) 映像配信時効の決定方法及び装置
CN111522909A (zh) 一种语音交互方法及服务器
CN112182196A (zh) 应用于多轮对话的服务设备及多轮对话方法
CN108256071B (zh) 录屏文件的生成方法、装置、终端及存储介质
TWI748266B (zh) 搜索方法、電子裝置及非暫時性電腦可讀記錄媒體
CN115273840A (zh) 语音交互设备和语音交互方法
CN116758362A (zh) 图像处理方法、装置、计算机设备及存储介质
KR102122918B1 (ko) 대화형 질의응답 장치 및 그 방법
CN114781365A (zh) 端到端模型训练方法、语义理解方法、装置、设备和介质
CN116030375A (zh) 视频特征提取、模型训练方法、装置、设备及存储介质
CN114840711A (zh) 一种智能设备与主题构建方法
CN111950288B (zh) 一种命名实体识别中的实体标注方法及智能设备
CN115114931A (zh) 模型训练方法、短视频召回方法、装置、设备和介质
CN111344664B (zh) 电子设备及其控制方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12879332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12879332

Country of ref document: EP

Kind code of ref document: A1