CN112860979A - Resource searching method, device, equipment and storage medium

Resource searching method, device, equipment and storage medium

Info

Publication number
CN112860979A
CN112860979A
Authority
CN
China
Prior art keywords
resource
index file
emoticons
search
text
Prior art date
Legal status
Granted
Application number
CN202110181662.3A
Other languages
Chinese (zh)
Other versions
CN112860979B (en)
Inventor
邓广伟 (Deng Guangwei)
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110181662.3A
Publication of CN112860979A
Application granted
Publication of CN112860979B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/9532 Query formulation (querying, e.g. by the use of web search engines)
    • G06F16/2228 Indexing structures
    • G06F16/23 Updating
    • G06F16/2445 Data retrieval commands; View definitions
    • G06F16/2468 Fuzzy queries
    • G06F16/432 Query formulation (multimedia data)
    • G06F16/951 Indexing; Web crawling techniques
    • G06F40/216 Parsing using statistical methods
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a resource searching method, apparatus, device, and storage medium, and belongs to the field of internet technologies. The embodiments provide a method that supports emoticon-based search: an emoticon input by a user is processed into a keyword, and the keyword is then queried against an index file in a text-retrieval manner, so that resources related to the emoticon are found. In this way, an emoticon-based search service is realized; at the same time, the emoticon dimension is added to the search service, meeting the user's need to search with emoticons.

Description

Resource searching method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a resource search method, apparatus, device, and storage medium.
Background
Search technology is a vital technology in internet applications. The basic flow of a search is roughly as follows: when a user wants to search for a resource, the user inputs content describing the resource in the client. The client sends the content input by the user to the server. The server queries for matching resources according to the content input by the user and returns them to the user, thereby helping the user quickly find the desired resource.
However, current search technology is limited to searching by dimensions such as name, author, and tag, and cannot realize emoticon-based search.
Disclosure of Invention
The present disclosure provides a resource search method, apparatus, device, and storage medium to at least solve the problem in the related art that emoticon-based search cannot be realized. The technical solution of the present disclosure is as follows.
According to a first aspect of the embodiments of the present disclosure, there is provided a resource search method, including:
receiving a search request from a client device, the search request including an emoticon for searching a resource;
obtaining a keyword by processing the emoticon;
querying an index file according to the keyword to obtain a resource identifier, the index file indicating the correspondence between keywords and resource identifiers;
and sending a target resource corresponding to the resource identifier to the client device.
In some embodiments, the processing according to the emoticon to obtain a keyword includes:
converting the emoticon into text;
and processing the text to obtain a keyword.
In some embodiments, said converting said emoticon into text comprises:
querying an expression information base according to the emoticon to obtain the text, wherein the expression information base includes correspondences between at least one group of emoticons and texts; or,
encoding the emoticon to obtain the text; or,
inputting the emoticon into a machine learning model, identifying the emoticon through the machine learning model, and outputting the text, wherein the machine learning model is used for identifying text according to emoticons.
In some embodiments, the processing the text to obtain a keyword includes:
and performing lexical analysis, syntactic analysis and language processing on the text to obtain the keywords.
In some embodiments, the method further comprises:
and updating the index file every preset time period.
In some embodiments, the updating the index file includes:
converting the emoticons in the first resource newly added in the preset time period into texts to obtain a second resource;
and constructing an updated index file according to the second resource.
In some embodiments, the method further comprises:
if the updated index file is successfully constructed, replacing the index file before updating with the updated index file; or,
if construction of the updated index file fails, continuing to provide the search service with the index file before updating.
In some embodiments, the target resource is material used to generate a multimedia file.
According to a second aspect of the embodiments of the present disclosure, there is provided a resource search apparatus, including:
a receiving unit configured to receive a search request from a client device, the search request including an emoticon for searching a resource;
the processing unit is configured to execute processing according to the emoticons to obtain keywords;
the query unit is configured to perform query to obtain a resource identifier from an index file according to the keyword, wherein the index file indicates the corresponding relation between the keyword and the resource identifier;
a sending unit configured to execute sending of a target resource corresponding to the resource identifier to the client device.
In some embodiments, the processing unit is configured to perform converting the emoticon into text; and processing the text to obtain a keyword.
In some embodiments, the processing unit is configured to perform query to obtain the text from an emoticon database according to the emoticon, where the emoticon database includes at least one set of correspondences between emoticons and texts; or, coding the expression symbol to obtain the text; or inputting the emoticons into a machine learning model, identifying the emoticons through the machine learning model, and outputting the text, wherein the machine learning model is used for identifying the text according to the emoticons.
In some embodiments, the processing unit is configured to perform lexical analysis, syntactic analysis, and linguistic processing on the text to obtain the keyword.
In some embodiments, the processing unit is further configured to update the index file every preset time period.
In some embodiments, the processing unit is configured to perform conversion of the emoticons in the newly added first resource into texts in the preset time period, so as to obtain a second resource; and constructing an updated index file according to the second resource.
In some embodiments, the processing unit is further configured to, if the updated index file is successfully constructed, replace the index file before updating with the updated index file; or, if construction of the updated index file fails, continue to provide the search service with the index file before updating.
In some embodiments, the target resource is material used to generate a multimedia file.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
one or more processors;
one or more memories for storing the processor-executable program code;
wherein the one or more processors are configured to execute the program code to implement the resource search method described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having program code embodied therein, which when executed by a processor of an electronic device, enables the electronic device to perform the above-described resource search method.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above-described resource search method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the embodiment provides a method for supporting emoticon-based search, which is characterized in that emoticons input by a user are processed into keywords, and then the keywords are queried in an index file in a text retrieval mode, so that resources related to the emoticons are found. By this method, an emoticon-based search service is realized. Meanwhile, the dimension of the emoticons is added for the search service, and the requirement of a user for searching by using the emoticons is met.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a block diagram illustrating the architecture of a resource search system in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method of resource searching in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of resource searching in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating a resource search apparatus in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a server in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
First, some concepts related to the embodiments of the present application will be explained.
MySQL is a relational database management system originally developed by the Swedish company MySQL AB and now an Oracle product. MySQL is one of the most popular relational database management systems; for World Wide Web applications, it is among the best RDBMS (Relational Database Management System) software.
Oracle Database is a relational database management system from Oracle Corporation and a long-time leading product in the database field. It is currently one of the most popular relational database management systems in the world, with good portability, ease of use, and rich functionality, and it is suitable for large, medium, and small machine environments. Oracle Database is an efficient, reliable, high-throughput database solution.
Lucene is a sub-project of the Apache Software Foundation's Jakarta project group and is an open-source full-text search engine toolkit. Lucene is not a complete full-text search engine, but rather a full-text search engine framework: it provides a complete query engine and index engine, and a partial text analysis engine.
Inverted index: a retrieval structure. Colloquially, a forward index finds the value by the key, whereas an inverted index finds the key by the value. When inverted indexing is adopted, a dictionary and an inverted table are constructed.
Dictionary: contains many terms; a dictionary can be understood as a collection of terms.
Term: after a piece of text is processed by the analyzer, a sequence of terms is output; each item in this sequence is called a term.
Inverted table (posting list): a document list that records all documents in which a term appears and the positions at which the term appears in those documents. Each record in the inverted table is called a posting. From the posting list it is known which documents contain a given term. The inverted table includes at least a document ID (document number), and optionally a term frequency (the number of times the term occurs), an offset, and the like. The document IDs in the inverted table identify the documents in which terms of the dictionary appear.
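To make the relationship between the dictionary and the posting lists concrete, the following minimal Java sketch (an illustration only, not part of the patent text) maps each term to the sorted set of document IDs that contain it:

import java.util.*;

// Minimal inverted-index illustration: the key set is the dictionary,
// each value is the posting list (document IDs) for that term.
public class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split a document into terms and record its ID under each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Inverted lookup: from a term back to the documents that contain it.
    public Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "smile template");
        index.add(2, "smile birthday template");
        System.out.println(index.lookup("smile")); // [1, 2]
    }
}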
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-user full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache license, and it is a popular enterprise-grade search engine.
Guava Cache is a local Cache component developed by Google.
An application scenario of the embodiment of the present application is described below.
As the number of video templates in a video editing application grows, it becomes difficult for a user to find the desired video template without searching. Therefore, it is necessary to provide a method for searching video templates by multiple factors, such as the template name, the template description, the template author, and the template's classification tags. The basic principle of searching for a video template is as follows: the user inputs the template content to be queried in the client of the video editing application. The client sends the content input by the user to the server of the video editing application. The server computes the similarity between the content input by the user and the content of each template, sorts the results from high to low similarity, and returns them to the user, thereby helping the user quickly find the desired template.
In the search technology, there are several processing methods in the industry for the character string content sent by the user.
(1) Fuzzy query is carried out in the MySQL/Oracle database, and then the query result is returned to the client. The pseudo code of the query is as follows:
SELECT * FROM table WHERE col LIKE '%name%'
This method is common in internal search systems and is simple to implement. However, once the data volume is large, the query conditions are complicated, or the queries per second (QPS) from users are high, this MySQL/Oracle query method becomes time-consuming.
(2) Search method based on ElasticSearch. ElasticSearch is a distributed, highly scalable, highly real-time search engine; its disadvantage is that it requires separately deploying and maintaining a set of ElasticSearch services and ensuring their high availability.
(3) The search was performed using Lucene.
Lucene's index is updated incrementally. Specifically, the index data needs to be updated whenever the data used to construct the Lucene index is updated. In distributed Lucene search, it must be ensured that the indexes of all nodes are updated after the data is updated. The disadvantages of this scheme are that, on the one hand, in distributed Lucene search, part of the index updates may fail when data is updated; on the other hand, since no special processing is performed on the text when the index is constructed, emoticons cannot be searched.
The above schemes (1) to (3) all have problems. For example, the fuzzy query in a MySQL/Oracle database of scheme (1) is generally used for internal systems, and MySQL/Oracle cannot support user queries at high QPS; the ElasticSearch-based search of scheme (2) requires separately deploying and maintaining a set of ElasticSearch services; and if Lucene is used for search as in scheme (3), index updates may fail in a distributed service. Some embodiments of the application therefore optimize on the basis of the Lucene scheme and support emoticon search.
Hereinafter, a hardware environment of the embodiments of the present disclosure is exemplified.
FIG. 1 is a block diagram illustrating the architecture of a resource search system in accordance with an exemplary embodiment. The resource search system includes: a client device 101 and a resource platform 110.
The client device 101 is connected to the resource platform 110 through a wireless or wired network. The client device 101 may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, or a laptop computer. The client device 101 has application client software installed and running that supports resource search. The application client software may be a video editing application, a live-streaming application, a multimedia application, a short-video application, and the like. Illustratively, the client device 101 is a terminal used by a user, and a user account is logged in to the application client software.
The resource platform 110 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The resource platform 110 is used to provide background services for application client software. Optionally, the resource platform 110 and the client device 101 work in cooperation in searching for resources. For example, the resource platform 110 undertakes primary work and the client device 101 undertakes secondary work; alternatively, the resource platform 110 undertakes secondary work and the client device 101 undertakes primary work; alternatively, the resource platform 110 or the client device 101 may be responsible for the work separately.
Optionally, the resource platform 110 includes: access server, search server 1101, and database 1102. The access server is used to provide access services for the client device 101. The search server 1101 is used for providing background services related to resource search, such as processing emoticons input by a user, constructing an index file, updating the index file, and the like. The search server 1101 may be one or more. When the search servers 1101 are multiple, at least two search servers 1101 are present for providing different services, and/or at least two search servers 1101 are present for providing the same service, for example, providing the same service in a load balancing manner, which is not limited by the embodiment of the present disclosure. Database 1102 may be used to store resources. The database 1102 may provide the stored resources to the search server 1101 when needed.
The client device 101 may be broadly referred to as one of a plurality of client devices, and the present embodiment is illustrated with the client device 101 only.
One skilled in the art will appreciate that the number of client devices 101 may be greater or fewer. For example, the number of the client devices 101 may be only one, or the number of the client devices 101 may be tens or hundreds, or more, in which case the resource search system further includes other client devices. The number and the device type of the client devices are not limited by the embodiments of the present disclosure.
FIG. 2 is a flow diagram illustrating a method of resource searching in accordance with an exemplary embodiment. As shown in fig. 2, the resource search method is interactively performed by a client device and a server, and comprises the following steps.
In step S21, the client device sends a search request to the server, the search request including an emoticon for searching resources.
Emoticons are symbols used to express emotions. The types of emoticons include, without limitation, pictures and character strings. When the emoticon is a picture, the specific type includes, but is not limited to, a still picture or a moving picture (e.g., a GIF animation).
Types of resources include, without limitation, video, audio, pictures, text, and the like. In some embodiments, the resources are materials used to generate multimedia files. For example, a resource is a video template used to generate a video. In other embodiments, the resource is a commodity, a topic, a microblog post, or the like.
In some embodiments, the emoticon is a search keyword in a search request. For example, the search request includes a query statement, and the search terms in the query statement include emoticons. Optionally, in the scenario of searching in combination with emoticons along with other dimensions, the search request further includes other search keywords besides emoticons. For example, the search request includes, in addition to the emoticon, an author of the resource, a name of the resource, a tag of the resource, a type of the resource, description information of the resource, and so forth.
In some embodiments, the search request is triggered based on an input operation by the user. In particular, the client device supports the function of inputting emoticons. In the process of searching resources, a user performs input operation on the client device and inputs emoticons. The client device responds to the input operation of the user and generates a search request according to the emoticons input by the user.
Initiating a search request containing emoticons involves a variety of scenarios, which are exemplified below in connection with scenarios one through three.
Scenario one: the client device displays a search interface in a multimedia application, and the search interface includes a search box. After the user taps the search box, the multimedia application displays an emoticon selection control. After the user triggers the emoticon selection control, the multimedia application displays an emoticon interface that includes a plurality of emoticons. After the user triggers an operation on one emoticon in the emoticon interface, the multimedia application generates and initiates a search request according to the triggered emoticon.
Scenario two: the user performs an input operation through input method software and inputs the pinyin corresponding to an emoticon. The input method software determines the corresponding emoticon according to the pinyin input by the user and provides the emoticon to the multimedia application. The multimedia application initiates a search request according to the emoticon provided by the input method software.
Scenario three: the user downloads emoticons from the internet in advance, or edits emoticons manually, and stores them locally on the client device. The client device initiates a search request according to an emoticon in local storage.
In step S22, the server receives a search request from the client device.
The server obtains the emoticon from the search request, and performs subsequent steps according to the emoticon to provide a search service.
In step S23, the server converts the emoticon into text.
The text converted from an emoticon is used to describe the emoticon. In some embodiments, the text converted from an emoticon is a character string (e.g., a word). In some embodiments, the text converted from an emoticon is the Unicode encoding of the emoticon. For example, the "laugh and cry" emoticon is converted into "laughcry:", and the "dog head" emoticon is converted into "dog:".
There are many implementations for converting emoticons into text, which are exemplified by modes one to three below.
Mode one: the server queries an expression information base according to the emoticon to obtain the text.
The expression information base is used for storing information related to the preconfigured emoticons. The expression information base comprises corresponding relations between at least one group of emoticons and texts. For example, the emoticon information library includes a correspondence between an emoticon 1 and a text 1, and a correspondence between an emoticon 2 and a text 2. If the emoticon in the search request hits emoticon 1, the server acquires text 1 as converted text.
In one possible implementation, the operation and maintenance personnel perform configuration operations on the server in advance, for example, inputting at least one set of emoticons and text through a command line interface or a web interface. The server obtains at least one group of emoticons and texts according to the configuration operation, and an emotion information base is created according to the at least one group of emoticons and texts.
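As an illustration of mode one only (the class and entry names below are assumptions, not taken from the patent), the expression information base can be sketched as a simple in-memory lookup populated from configuration:

import java.util.Map;
import java.util.Optional;

// Sketch of mode one: look the emoticon up in a preconfigured
// expression information base (here a plain map of emoticon -> text).
public class EmoticonInfoBase {
    private final Map<String, String> emoticonToText;

    public EmoticonInfoBase(Map<String, String> entries) {
        this.emoticonToText = entries;
    }

    public Optional<String> toText(String emoticon) {
        return Optional.ofNullable(emoticonToText.get(emoticon));
    }

    public static void main(String[] args) {
        EmoticonInfoBase base = new EmoticonInfoBase(Map.of(
                "😂", "laughcry",   // emoticon 1 -> text 1
                "🐶", "dog"));      // emoticon 2 -> text 2
        base.toText("😂").ifPresent(System.out::println); // laughcry
    }
}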
Mode two: the server encodes the emoticon to obtain the text.
For example, the server performs Unicode encoding on the emoticons, and the converted text is encoded Unicode encoding.
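A minimal sketch of mode two (the output format "emoji_<codepoint>" is an illustrative choice, not something the patent prescribes) derives a plain-text token from the emoticon's Unicode code points so that an ordinary text index can store and match it:

// Sketch of mode two: build a searchable token from the emoticon's
// Unicode code points.
public class EmoticonEncoder {
    public static String encode(String emoticon) {
        StringBuilder sb = new StringBuilder("emoji");
        emoticon.codePoints()
                .forEach(cp -> sb.append('_').append(Integer.toHexString(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F602 "face with tears of joy" becomes a plain-text keyword.
        System.out.println(encode("\uD83D\uDE02")); // emoji_1f602
    }
}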
Mode three: the server inputs the emoticon into a machine learning model, identifies the emoticon through the machine learning model, and outputs the text.
The machine learning model is used to recognize text from the emoticons. For example, in the case where the emoticon is a picture, the machine learning model is an image recognition model. As another example, where the emoticon is a character string, the machine learning model is a text recognition model. Machine learning models include, without limitation, convolutional neural networks, cyclic neural networks, decision tree models, random forest models, and the like.
Optionally, the machine learning model is trained by the server according to the sample emoticons. For example, the server obtains a sample set, the sample set including a plurality of sample emoticons. The label of the sample emoticon is text. The server performs model training using the sample set, thereby obtaining a machine learning model.
By means of mode three, unknown emoticons can be converted into text. This is suitable for scenarios in which emoticons on the network are updated frequently, and it reduces the complexity of preconfiguring the text corresponding to each emoticon.
In this way, multiple implementations of converting emoticons into text are provided; a suitable mode can be adopted according to actual requirements, giving high flexibility.
In step S24, the server processes the text to obtain keywords.
In some embodiments, the server processes the text based on Natural Language Processing (NLP) techniques. Specifically, the server performs lexical analysis, syntactic analysis and language processing on the text to obtain keywords. In one possible implementation, the server first performs lexical analysis on the text, then performs syntactic analysis according to the result of the lexical analysis, and then performs language processing according to the result of the syntactic analysis.
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens, i.e., segmenting a string of contiguous characters into words. In this embodiment, lexical analysis mainly identifies words or phrases.
Syntactic analysis combines the word sequence into grammatical phrases on the basis of lexical analysis. In this embodiment, syntactic analysis mainly forms a syntax tree according to the grammar rules of the query statement.
Language processing mainly performs language-related processing on the obtained tokens, and is optionally performed by a language processing component (linguistic processor) run by the server. For example, when the text converted from the emoticon is an English word, the language processing component does the following: first, the English word is changed to lowercase. Second, the English word is reduced to its root form, e.g., "cars" is treated as "car"; this operation is called stemming. Third, the English word is converted into its root form, e.g., "drove" is treated as "drive".
Processing the text through lexical analysis, syntactic analysis, and language processing standardizes the form of the text corresponding to the emoticon and avoids the influence of redundant information, letter case, singular/plural forms, and other such factors on subsequent searching.
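As an illustration only (assuming a recent Lucene version is on the classpath; the field name is arbitrary), a stock Lucene English analyzer performs exactly this kind of tokenization, lowercasing, stop-word removal, and stemming:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Sketch of lexical analysis plus language processing with Lucene's
// EnglishAnalyzer: tokenize, lowercase, drop stop words, stem.
public class AnalyzeSketch {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new EnglishAnalyzer();
        try (TokenStream ts = analyzer.tokenStream("content", "Cars and Trucks")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term); // prints "car" then "truck"
            }
            ts.end();
        }
    }
}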
The above steps S23 and S24 provide an alternative implementation of obtaining keywords according to emoticon processing, and in other embodiments, the emoticons are processed in a manner other than steps S23 and S24 to obtain keywords. For example, the server stores a keyword list including preset correspondence between emoticons and keywords. And the server queries the keyword table according to the emoticons to obtain the keywords.
In step S25, the server queries the index file for the resource identifier according to the keyword.
The resource identifier is used to identify the corresponding resource. For example, the resource identification is a name, number, keyword, etc. of the resource.
The index file is also called an index repository. The index file indicates a correspondence between the keyword and the resource identifier. In one example, the index file indicates a correspondence between keyword a and a resource identification of resource a, and a correspondence between keyword B and a resource identification of resource B. And if the keyword obtained according to the search request hits the keyword A, the inquired resource identifier is the resource identifier of the resource A.
In some embodiments, the index file is constructed based on an inverted index technique. The index file includes a dictionary and an inverted table. The dictionary includes words transformed from emoticons in the resource. Optionally, the dictionary further includes words corresponding to other contents besides the emoticons in the resource. For example, the database includes N resources. The N resources include M emoticons in total, and the N resources also include P texts. Then, the dictionary includes at least M words into which M emoticons are converted, and optionally also includes words corresponding to P texts.
The inverted list is used for indicating resources in which the emoticons corresponding to the words in the dictionary appear, and the inverted list comprises resource identifications of the resources in which the corresponding emoticons appear. For example, resource 1 and resource 2 both have an emoticon representing a smile, which translates to the word smile. Then the dictionary would include smile and the inverted list would include the resource 1 ID and the resource 2 ID, thus noting that the smiley emoticon (smile) appears in both resource 1 and resource 2.
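As a sketch of step S25 (the index path and the field names "content" and "resourceId" are illustrative assumptions, not details given in the patent), querying a Lucene index file for the keyword and collecting the stored resource identifiers might look as follows:

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

// Query the index for a keyword derived from the emoticon and read back
// the resource identifiers stored with each matching document.
public class SearchSketch {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/tmp/resource-index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("content", "smile")), 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                Document doc = searcher.doc(hit.doc);
                System.out.println(doc.get("resourceId")); // resource identifier
            }
        }
    }
}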
In step S26, the server sends the target resource corresponding to the resource identifier to the client device.
The target resource refers to a resource serving as a search result. The server obtains the target resource from the resource library according to the resource identifier.
The resource library comprises a corresponding relation between the resource identification and the resource. Optionally, the repository is maintained in a local cache of the server. Specifically, the server loads the resource library into the local cache in advance. And after the server finds the resource identifier, the server accesses the local cache according to the resource identifier so as to obtain the target resource. Since the cache has the advantage of high access speed, the resources are acquired from the cache, so that search results can be provided to a user more quickly. In other embodiments, the resource library is stored in the cloud, the server sends a resource acquisition request carrying the resource identifier to the cloud, and the cloud acquires the target resource from the resource library according to the resource identifier and returns the target resource to the server. And the server receives the target resource returned by the cloud.
In some embodiments, if the server queries to obtain a plurality of target resources, the server ranks the plurality of target resources in order of weight from high to low according to the weights of the plurality of target resources, so that the target resource with the highest weight is ranked first. And the server sends the sorted target resources to the client equipment, so that the target resources with higher weights are displayed more preferentially.
In some embodiments, the weight of the target resource is determined according to the usage of the target resource. The larger the usage of the target resource, the larger the weight of the target resource. The usage amount of the target resource is, for example, the total number of times the target resource is used by the user, and the usage amount of the target resource reflects the popularity of the resource to some extent. In this way, the resource with large usage amount can be arranged at the front position in the search result, which is helpful for improving the accuracy of the search result.
In some embodiments, the weight of the target resource is determined according to the similarity of the target resource to the query statement. The greater the similarity of the target resource to the query statement, the greater the weight of the target resource.
Optionally, the weight of the target resource is a product of the usage amount of the target resource and the similarity of the target resource and the query statement.
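The following sketch illustrates this optional weighting rule (the TargetResource type and the numbers are hypothetical): weight = usage × similarity, with results sorted from highest weight to lowest:

import java.util.Comparator;
import java.util.List;

// Rank target resources by weight = usage count * similarity to the query.
public class RankSketch {
    record TargetResource(String id, long usageCount, double similarity) {
        double weight() { return usageCount * similarity; }
    }

    static List<TargetResource> rank(List<TargetResource> hits) {
        return hits.stream()
                .sorted(Comparator.comparingDouble(TargetResource::weight).reversed())
                .toList();
    }

    public static void main(String[] args) {
        List<TargetResource> ranked = rank(List.of(
                new TargetResource("template-1", 120, 0.8),
                new TargetResource("template-2", 500, 0.6)));
        // template-2 (weight 300.0) is ranked before template-1 (weight 96.0)
        ranked.forEach(r -> System.out.println(r.id() + " " + r.weight()));
    }
}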
In step S27, the client device receives the target resource and provides the target resource to the user.
The manner in which the client device provides the target asset includes, without limitation, presenting the target asset, playing the target asset, and the like. For example, in a scenario where a multimedia file is synthesized from some materials, the server searches for a target asset as one material used in synthesizing the multimedia file. And the server or the client equipment synthesizes the searched target resources and the resources provided by the user to obtain the multimedia file.
For example, in the scenario of making a video, the target resource searched by the server is, for example, a video template, and the video template is used for synthesizing the video. For example, after the server sends the video template to the client device, the client device displays the video template in the interface. And triggering a confirmation operation when the user agrees to use the video template to make the video. The client device or the server synthesizes the video template and the video or audio made by the user to obtain a video.
In some embodiments, the server may periodically update the index file asynchronously. Specifically, the server maintains a preset time period. And the server updates the index file every preset time period. For example, if the preset time period is set to T, the server starts a timer, and whenever the time passes by one T, the server constructs an updated index file according to the newly added resources in the T, so as to update the index file once. The server updates by adopting a timing asynchronous mode, so that the resource identification of the newly added resource is supplemented to the index file in time, and the newly added resource can be quickly searched.
In some embodiments, when the server updates the index file, the server converts the emoticons in the resource into texts, and then constructs the updated index file. Taking the resource before the emoticon conversion as a first resource and the resource after the emoticon conversion as a second resource as an example, for example, when a preset time period passes, the server converts the emoticon in the first resource newly added in the preset time period into a text to obtain the second resource; and the server constructs an updated index file according to the second resource. Wherein the second resource does not include an emoticon. The second resource comprises text corresponding to the emoticon in the first resource and other contents except the emoticon in the first resource. Similarly, under the condition that a plurality of resources are newly added in a preset time period, the emoticons in each newly added resource are converted into texts, and then the updated index file is constructed according to the plurality of newly added resources after conversion.
In some embodiments, if the updated index file is successfully constructed, the server replaces the pre-updated index file with the updated index file. Further, the server deletes the index file before update. And after receiving a search request sent by the client equipment next time, the server provides search service by using the updated index file. If the updated index file is failed to be constructed, the server does not replace the old index file with the new index file, does not delete the index file before updating, and continues to use the index file before updating to provide search service. In this way, the influence of the failure in updating the index file on the search service can be reduced.
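A minimal sketch of this timed, swap-on-success update (the SearchIndex type and the rebuild supplier are placeholders for whatever the service actually uses; they are not named in the patent): the old index keeps serving queries and is replaced only if the rebuild succeeds.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Rebuild the index every preset period; swap it in only on success.
public class IndexRefresher {
    interface SearchIndex { /* query methods omitted */ }

    private final AtomicReference<SearchIndex> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(long periodMinutes, java.util.function.Supplier<SearchIndex> rebuild) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                SearchIndex rebuilt = rebuild.get(); // full rebuild from all saved resources
                current.set(rebuilt);                // replace only on success
            } catch (Exception e) {
                // build failed: keep serving with the previous index
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }

    public SearchIndex activeIndex() {
        return current.get(); // used by the search path
    }
}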
In some embodiments, the server updates the index file with a full update. Specifically, each time the server updates the index file, the server builds the updated index file based on all resources already saved in the database. In other words, when the server constructs the updated index file, the server not only uses the newly added resources in the current time period, but also uses all the resources that have been saved in the previous time period. By adopting the mode of full update, the index of the newly added resource can be prevented from being lost, and the fault-tolerant capability of the system is improved. The principle of this effect is explained below with reference to an example.
For example, let the initial time point be t0 and the preset time period be T. The database holds n0 resources at time t0. The server constructs an initial index file from the n0 resources at time t0; the initial index file includes the identifications of the n0 resources, so that the n0 resources are searchable by the user. When the first preset time period has elapsed, at time (t0+T), the database stores (n0+Δ1) resources. The server constructs an updated index file from the (n0+Δ1) resources at time (t0+T); however, the index file construction fails, and the server continues to provide the search service using the original index file. When the second preset time period has elapsed, at time (t0+2T), the database stores (n0+Δ1+Δ2) resources. The server constructs an updated index file from the (n0+Δ1+Δ2) resources at (t0+2T); the updated index file includes the identifications of the (n0+Δ1+Δ2) resources. The server replaces the initial index file with the updated index file and provides the search service with it, so that the (n0+Δ1+Δ2) resources can be searched by the user. As this example shows, although updating the index file in the first time period failed, the Δ1 resources newly added in the first time period are not lost in the end. Because the full-update mode is adopted, when the index file is updated again in the second time period, the index file rebuilt in the second time period includes both the Δ2 resources newly added in the second time period and the Δ1 resources newly added in the first time period, so the Δ1 resources added in the first time period can still be searched promptly after the second time period.
This embodiment provides a method that supports emoticon-based search: an emoticon input by a user is processed into a keyword, and the keyword is then queried against an index file in a text-retrieval manner, so that resources related to the emoticon are found. In this way, an emoticon-based search service is realized; at the same time, the emoticon dimension is added to the search service, meeting the user's need to search with emoticons.
The method shown in FIG. 2 is described below with reference to an example and FIG. 3. The resources in the method shown in fig. 2 are templates in the following example. The candidate resource in the method shown in fig. 2 is a template of the MySQL database in the example described below. The target resource in the method shown in fig. 2 is a template returned to the client in the following example. The resource identification in the method shown in fig. 2 is a template ID in the following example. The text into which the emoticon is converted in the method shown in fig. 2 is a word in the following example. The following example is implemented using Lucene.
Example 1
The method comprises the following steps: the server builds an index file.
Specifically, after the server starts the search service, the Lucene index is constructed. In some embodiments, while an index file is still being built, the server evenly distributes user traffic only to Lucene nodes that have fully started and does not distribute traffic to Lucene nodes that are still building their index, so as to avoid routing searches to nodes that cannot yet provide the search service, which would cause search failures.
Optionally, the templates used to construct the index file come from a MySQL database. When the server constructs the index file, it first queries the templates from the MySQL database. The search service provides not only text search but also emoticon search; however, when Lucene builds an index, emoticons are deleted by default. Therefore, to provide emoticon search, as shown in fig. 3, the server converts the emoticons in a template into specific English words, performs lexical analysis, syntactic analysis, and language processing on the words, and constructs the index file from the processed template. In addition, to ensure that templates with large usage are ranked high in the search results, when constructing the index file the server assigns each template a weight according to its usage: templates with larger usage receive larger weights, and templates with smaller usage receive smaller weights.
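A sketch of this construction step under stated assumptions (the Template record, the emoticon-to-words converter, the index path, and the field names are illustrative; the patent does not prescribe them):

import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;
import java.util.List;

// Build the Lucene index from templates loaded out of MySQL, converting
// emoticons to words before indexing so the analyzer does not drop them.
public class IndexBuilder {
    record Template(long id, String name, String description, long usageCount) {}

    public static void build(List<Template> templates,
                             java.util.function.UnaryOperator<String> emojiToWords)
            throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new EnglishAnalyzer());
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/template-index")), config)) {
            for (Template t : templates) {
                Document doc = new Document();
                // Searchable text: emoticons already converted to words.
                doc.add(new TextField("content",
                        emojiToWords.apply(t.name() + " " + t.description()),
                        Field.Store.NO));
                doc.add(new StoredField("templateId", String.valueOf(t.id())));
                // Usage count stored so results can later be weighted by popularity.
                doc.add(new StoredField("usage", t.usageCount()));
                writer.addDocument(doc);
            }
        }
    }
}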
Step two: the server periodically and asynchronously updates the index file.
After the search service has fully started, in order to ensure that templates newly added to the MySQL database can be searched in time, the index file is updated in a timed asynchronous manner. After a new index file is successfully constructed, it replaces the original old index file, and the old index file is deleted. If construction of the new index file fails for whatever reason, the old index file is not replaced, which reduces the impact on the search service.
Step three: the server provides the search service for the user according to the index file.
The user inputs content related to the template to be searched in the video editing APP. The client of the video editing APP transmits the content input by the user to the server of the video editing APP. After receiving the content input by the user, the server performs the corresponding operations on it; the specific flow is steps (3-1) to (3-3).
(3-1) The server converts the emoticons in the content input by the user into specific words, and then performs lexical analysis, syntactic analysis, and language processing on the input content to obtain keywords.
(3-2) The server uses the keywords obtained in step (3-1) as the search index and retrieves the documents that conform to the syntax tree.
First, the server looks up, in the index file, the document linked list (posting list) of each keyword obtained by word segmentation. Second, the server merges the linked lists of these keywords, obtaining all documents that contain the keywords.
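Assuming the keywords are combined with AND semantics, the merge of sorted document-ID lists can be sketched as follows (illustrative only; Lucene performs this merging internally):

import java.util.ArrayList;
import java.util.List;

// Intersect the sorted document-ID lists of two keywords to keep only
// documents that contain both of them.
public class PostingMerge {
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = Integer.compare(a.get(i), b.get(j));
            if (cmp == 0) { out.add(a.get(i)); i++; j++; }
            else if (cmp < 0) i++;
            else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        // documents containing keyword 1 vs. documents containing keyword 2
        System.out.println(intersect(List.of(1, 2, 5, 9), List.of(2, 3, 9))); // [2, 9]
    }
}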
(3-3) The server sorts the retrieved template IDs according to the relevance between the retrieved documents and the query statement, and takes the sorted template IDs as the query result.
Specifically, template IDs are sorted by their relevance to the query statement; the more relevant the template, the earlier its ID appears in the query result.
Step four: the server returns the template to the client.
Specifically, step three obtains the template IDs, and step four queries the specific templates according to the template IDs. For example, the server stores the templates corresponding to the template IDs in the local Guava cache in advance. The server queries the specific template from the cache according to the template ID returned by the Lucene search and returns the template to the client.
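A minimal sketch of such a local cache with Guava (the TemplateDao and Template types, the cache size, and the expiry are assumptions for illustration, not values from the patent):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

// Resolve template IDs returned by the Lucene search to full templates
// through a local Guava cache that falls back to the database on a miss.
public class TemplateCache {
    interface Template {}
    interface TemplateDao { Template findById(long id); }

    private final LoadingCache<Long, Template> cache;

    public TemplateCache(TemplateDao dao) {
        this.cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(30, TimeUnit.MINUTES)
                .build(new CacheLoader<Long, Template>() {
                    @Override
                    public Template load(Long id) {
                        return dao.findById(id); // miss: read from the database
                    }
                });
    }

    public Template get(long templateId) throws Exception {
        return cache.get(templateId); // hit: served from local memory
    }
}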
With the above embodiment, since emoticons are specially processed, templates can be searched by emoticon. Because the index library is rebuilt in a timed asynchronous manner, templates newly added to the MySQL database can be searched quickly; and because the server quickly reads the corresponding template from the local cache according to the searched template ID, the user obtains results almost instantly when searching for templates.
FIG. 4 is a block diagram illustrating a resource search apparatus according to an example embodiment. Referring to fig. 4, the apparatus includes a receiving unit 501, a processing unit 502, a querying unit 503, and a transmitting unit 504.
A receiving unit 501 configured to receive a search request from a client device, the search request including an emoticon for searching a resource; a processing unit 502 configured to obtain a keyword by processing the emoticon; a query unit 503 configured to query an index file according to the keyword to obtain a resource identifier, where the index file indicates the correspondence between keywords and resource identifiers; and a sending unit 504 configured to send a target resource corresponding to the resource identifier to the client device.
In some embodiments, the processing unit 502 is configured to perform converting the emoticon into text; and processing the text to obtain a keyword.
In some embodiments, the processing unit 502 is configured to query an expression information base according to the emoticon to obtain the text, where the expression information base includes correspondences between at least one group of emoticons and texts; or to encode the emoticon to obtain the text; or to input the emoticon into a machine learning model, identify the emoticon through the machine learning model, and output the text, where the machine learning model is used to identify text according to emoticons.
In some embodiments, the processing unit 502 is configured to perform lexical analysis, syntactic analysis, and linguistic processing on the text, resulting in keywords.
In some embodiments, the processing unit 502 is further configured to perform updating the index file every preset time period.
In some embodiments, the processing unit 502 is configured to perform converting the emoticons in the newly added first resource into texts in a preset time period, so as to obtain a second resource; and constructing an updated index file according to the second resource.
In some embodiments, the processing unit 502 is further configured to perform, if the updated index file is successfully constructed, replacing the index file before updating with the updated index file; or if the updated index file fails to be constructed, the index file before updating is continuously used for providing the search service.
In some embodiments, the querying unit 503 is configured to perform: acquiring a resource identifier of at least one candidate resource from the index file according to the keyword; and selecting the resource identifier of the target resource from the resource identifiers of the at least one candidate resource according to the weight of the at least one candidate resource, wherein the weight of the candidate resource is related to the usage amount of the candidate resource, and the larger the usage amount of the candidate resource is, the larger the weight of the candidate resource is.
In some embodiments, the target resource is material used to generate a multimedia file.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present disclosure. The server 600 may vary considerably depending on its configuration and performance, and may include one or more processors (CPUs) 601 and one or more memories 602, where the memory 602 stores at least one piece of program code that is loaded and executed by the processor 601 to implement the resource search method provided by each of the foregoing method embodiments. The server may also have a wired or wireless network interface, an input/output interface, and other components to facilitate input and output, and may include other components for implementing device functions, which are not described here.
In an exemplary embodiment, there is also provided a storage medium, such as a memory, comprising program code executable by a processor of an electronic device to perform the above-described resource search method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The user information involved in the present disclosure is information authorized by the user or fully authorized by all parties.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for resource search, the method comprising:
receiving a search request from a client device, the search request comprising an emoticon used to search for a resource;
processing according to the emoticon to obtain a keyword;
querying an index file according to the keyword to obtain a resource identifier, wherein the index file indicates a correspondence between the keyword and the resource identifier;
and sending a target resource corresponding to the resource identifier to the client device.
2. The resource search method of claim 1, wherein the processing according to the emoticon to obtain a keyword comprises:
converting the emoticon into text;
and processing the text to obtain a keyword.
3. The resource search method of claim 2, wherein said converting the emoticon into text comprises:
querying an emoticon information base according to the emoticon to obtain the text, wherein the emoticon information base comprises at least one correspondence between an emoticon and text; or,
encoding the emoticon to obtain the text; or,
inputting the emoticon into a machine learning model, identifying the emoticon through the machine learning model, and outputting the text, wherein the machine learning model is used for identifying text according to emoticons.
4. The resource search method of claim 2, wherein processing the text to obtain the keyword comprises:
and performing lexical analysis, syntactic analysis and language processing on the text to obtain the keywords.
5. The resource search method of claim 1, further comprising:
updating the index file every preset time period.
6. The resource search method of claim 5, wherein the updating the index file comprises:
converting emoticons in a first resource newly added within the preset time period into text, to obtain a second resource;
and constructing an updated index file according to the second resource.
7. The resource search method of claim 6, further comprising:
if the updated index file is constructed successfully, replacing the index file before updating with the updated index file; or,
if construction of the updated index file fails, continuing to provide the search service using the index file before updating.
8. An apparatus for resource search, the apparatus comprising:
a receiving unit configured to receive a search request from a client device, the search request comprising an emoticon used to search for a resource;
a processing unit configured to process the emoticon to obtain a keyword;
a querying unit configured to query an index file according to the keyword to obtain a resource identifier, wherein the index file indicates a correspondence between the keyword and the resource identifier; and
a sending unit configured to send a target resource corresponding to the resource identifier to the client device.
9. An electronic device, comprising:
one or more processors;
one or more memories for storing program code executable by the one or more processors;
wherein the one or more processors are configured to execute the program code to implement the resource search method of any of claims 1 to 7.
10. A storage medium, wherein program code in the storage medium, when executed by a processor of an electronic device, enables the electronic device to perform the resource search method of any one of claims 1 to 7.
CN202110181662.3A 2021-02-09 2021-02-09 Resource searching method, device, equipment and storage medium Active CN112860979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110181662.3A CN112860979B (en) 2021-02-09 2021-02-09 Resource searching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112860979A (en) 2021-05-28
CN112860979B CN112860979B (en) 2024-03-26

Family

ID=75989548

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant