CN115203445A

CN115203445A - Multimedia resource searching method, device, equipment and medium

Info

Publication number: CN115203445A
Application number: CN202210855628.4A
Authority: CN
Inventors: 朱运; 乔建秀
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-07-20
Filing date: 2022-07-20
Publication date: 2022-10-18

Abstract

The invention relates to the technical field of artificial intelligence and provides a multimedia resource searching method, device, equipment and medium. Text content is extracted from texts to obtain text segments. The text segments are stored in a preset database and segmented into words to obtain first keywords. An inverted index table is constructed from the first keywords, and the classification labels of the text segments are stored in the inverted index table to construct a multimedia library. A second keyword is extracted from a query request. According to the inverted index table and the second keyword, the classification labels of the first keywords associated with the second keyword are looked up in the multimedia library, and text segments are read from the preset database according to those classification labels. The similarity of the text segments is scored and ranked, and text segments are selected in ranking order. The selected segments are rendered into their corresponding texts and output to the user side. The invention also relates to the technical field of blockchain: the first keyword and the second keyword may also be stored in a node of a blockchain.

Description

Multimedia resource searching method, device, equipment and medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a multimedia resource searching method, device, equipment and medium.

Background

With the rapid development of the internet, multimedia resource search has become an important topic. Typically, a multimedia resource search system builds a separate content library for each type of text (web page text, PDF text, picture text, video text). When a user searches by keyword, the back end searches each content library separately and returns all texts of the different types that are associated with the keyword. The user must then switch back and forth between text types in the display interface and spend time screening each text for the desired text segments. This wastes the user's time, and because the screening is manual, the accuracy of the retrieved text segments is low.

Disclosure of Invention

In view of the above, the present invention provides a method, an apparatus, a device and a medium for searching multimedia resources, and aims to solve the technical problems of low efficiency and low accuracy in searching various different types of text segments in the prior art.

In order to achieve the above object, the present invention provides a multimedia resource searching method, which comprises:

extracting text content from each of a plurality of different types of texts to obtain one or more text segments, storing the text segments in a preset database, and performing word segmentation on each text segment to obtain the first keywords of each text segment;

constructing an inverted index table for word search according to the first keywords, and storing the classification label of each text segment in the inverted index table to construct a multimedia library;

receiving a query request sent by a user side, extracting a second keyword from the query request, searching the multimedia library, according to the inverted index table and the second keyword, for the classification labels of the text segments corresponding to the first keywords associated with the second keyword, and reading the corresponding text segments from the preset database according to the retrieved classification labels;

and scoring the similarity of the text segments, sorting the resulting score values in a preset order, selecting a preset number of text segments in that order, rendering them into their corresponding texts, and outputting the texts to the user side.

Preferably, the extracting text content from the texts of different types to obtain one or more text segments and storing the one or more text segments in a preset database includes:

dividing each type of text into a format part and a text content part, and performing segment division on the text content part to obtain one or more text segments and storing the text segments in a preset database.

Preferably, the segmenting each text segment to obtain the first keyword of each text segment includes:

dividing the long sentences of each text segment according to a preset word segmentation algorithm to obtain a plurality of phrases;

and calculating the similarity value between adjacent phrases, and taking the phrase with the similarity value smaller than a preset threshold value as a first keyword.

Preferably, after the constructing the inverted index table of the word search according to the first keyword, the method further includes:

counting word frequency values of the first keywords appearing in the corresponding text segments;

comparing the word frequency value with a preset word frequency value, and if the word frequency value is greater than or equal to the preset word frequency value, filling the first keyword into a high-frequency word queue in the inverted index table;

and if the word frequency value is smaller than a preset word frequency value, filling the first keyword into a low-frequency word queue in the inverted index table.

Preferably, before the storing the classification label of each text segment to the inverted index table to construct a multimedia library, the method further includes:

reading the text sequence of the first keywords of each text segment, and inputting the text sequence into a preset classification model for token embedding to obtain word vector features;

matching the classification label of each text segment from the label module of the preset classification model according to the word vector features, and establishing a mapping relationship between the classification label and the first keywords of the text segment.

Preferably, the extracting the second keyword from the query request includes:

performing word segmentation on the content of the query request to obtain a plurality of segmented words;

and generating a dictionary tree (trie) from a pre-constructed dictionary vocabulary, and inputting the segmented words into the dictionary tree for traversal to obtain the second keyword.

Preferably, searching the multimedia library for the classification labels and reading the corresponding text segments includes the following. The search is made according to the inverted index table and the second keyword, and looks for the classification labels of the text segments corresponding to the first keywords associated with the second keyword. The corresponding text segments are then read from the preset database according to the retrieved classification labels. The steps are:

inputting the second keyword into the search engine of the inverted index table;

traversing the first keywords in the inverted index table by means of the search engine to obtain the first keywords associated with the second keyword;

and reading the classification labels of the associated first keywords according to the mapping relationship, and reading the corresponding text segments from the preset database according to the retrieved classification labels.

In order to achieve the above object, the present invention further provides a multimedia resource search apparatus, comprising:

an extraction module, configured to extract text content from each of a plurality of different types of texts to obtain one or more text segments, store the text segments in a preset database, and perform word segmentation on each text segment to obtain the first keywords of each text segment;

a storage module, configured to construct an inverted index table for word search according to the first keywords, and store the classification label of each text segment in the inverted index table to construct a multimedia library;

a query module, configured to receive a query request sent by a user side, extract a second keyword from the query request, search the multimedia library, according to the inverted index table and the second keyword, for the classification labels of the text segments corresponding to the first keywords associated with the second keyword, and read the corresponding text segments from the preset database according to the retrieved classification labels;

an output module, configured to score the similarity of the text segments, sort the resulting score values in a preset order, select a preset number of text segments in that order, render them into their corresponding texts, and output the texts to the user side.

To achieve the above object, the present invention also provides an electronic device, including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a program executable by the at least one processor, and the program, when executed by the at least one processor, enables the at least one processor to perform the multimedia resource searching method according to any one of claims 1 to 7.

To achieve the above object, the present invention further provides a computer-readable storage medium storing a multimedia resource search program which, when executed by a processor, implements the steps of the multimedia resource searching method according to any one of claims 1 to 7.

The method extracts the first keywords and text segments of texts of different types. It constructs an inverted index table for word search from all the first keywords, and stores the classification labels of all the text segments in the inverted index table to construct a multimedia library. The contents of different types of text are thus searched under a unified index architecture, which reduces both the cost of building multiple content libraries and the search time.

The multimedia library is searched according to the inverted index table and the second keywords queried by the user. This yields the text segments whose first keywords are associated with the second keywords. The similarity of these text segments is scored and ranked, and the top-ranked text segments are selected, rendered into their corresponding texts, and output to the user side. Text segments thereby serve as the search results, and texts of different types are displayed together in the display interface. This reduces manual operation by the user and improves search accuracy and efficiency.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating a preferred embodiment of a multimedia resource searching method according to the present invention;

FIG. 2 is a block diagram of a multimedia resource searching apparatus according to a preferred embodiment of the present invention;

FIG. 3 is a diagram of an electronic device according to a preferred embodiment of the present invention;

The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiments of the invention can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

The invention provides a multimedia resource searching method. Fig. 1 is a schematic method flow diagram of an embodiment of the multimedia resource searching method of the present invention. The method may be performed by an electronic device, which may be implemented by software and/or hardware. The multimedia resource searching method comprises the following steps S10-S40:

Step S10: extract text content from each of a plurality of different types of texts to obtain one or more text segments, store the text segments in a preset database, and perform word segmentation on each text segment to obtain the first keywords of each text segment.

In this embodiment, the different types of text include, but are not limited to, web page text, PDF text, picture text, and video text. The method of extracting text content differs for each type. The extracted text content is often long, so it is divided into at least one text segment according to punctuation marks (periods, exclamation marks, semicolons) or paragraphs. A text segment is the unit of text delimited by the pauses that arise from transitions, emphasis, breaks and the like as the author expresses ideas; it is commonly called a "natural paragraph". Each text segment is segmented into words, and the words that carry the segment's important meanings and semantics are taken as its first keywords. First keywords are one of the main means of searching and indexing the multimedia library. They are also the specific names of the enterprise's products, services and the like that users want to learn about.
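The division of extracted text content into text segments by punctuation or paragraphs can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that paragraphs are separated by blank lines and that both ASCII and full-width Chinese forms of the named punctuation marks occur. The function name `split_into_segments` is hypothetical.

```python
import re

# Sentence-ending marks named in the text (period, exclamation mark,
# semicolon); both ASCII and full-width Chinese forms are assumed.
_END_MARKS = "[.!;。！；]"

def split_into_segments(text: str, by: str = "paragraph") -> list[str]:
    """Divide extracted text content into text segments.

    by="paragraph": one segment per blank-line-separated paragraph.
    by="punct":     one segment per sentence-ending punctuation mark.
    """
    if by == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif by == "punct":
        # Zero-width split just after each mark keeps the mark in its segment.
        parts = re.split(f"(?<={_END_MARKS})", text)
    else:
        raise ValueError(f"unknown split mode: {by}")
    return [p.strip() for p in parts if p.strip()]


print(split_into_segments("神舟飞船发射成功。航天员进驻空间站！", by="punct"))
# → ['神舟飞船发射成功。', '航天员进驻空间站！']
```

Splitting on a zero-width lookbehind rather than on the mark itself keeps each punctuation mark attached to the segment it closes.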

In one embodiment, the extracting text content from the different types of texts to obtain one or more text segments and storing the one or more text segments in a preset database includes:

The formats of the different types of text are as follows. For web page text, the format is its HTML code. For PDF text and picture text, the format is the coordinate information of the text content. For video text, the format is the start time period on the playback timeline.

Dividing each type of text into a format part and a text content part is the basic condition for searching text segments within the texts. It is also a precondition for two further goals: displaying the different types of text together in the user interface, and reducing the time users spend screening each text for the segments they want.

Dividing each type of text into a format part and a text content part specifically comprises the following:

Web page text: the HTML code part and the text content part of the web page are separated. For example, open a web page (e.g., https://www.163.com/dy/arrow…) and select "View page source" from the right-click menu. The page then shows its HTML code and text content mixed together, e.g., "<title>Quick look: the final complete form of China's space station! Manned spacecraft rocket NetEase Subscription</title>…HTML…". The HTML code (format) "<title></title>" is separated from the text "Quick look: the final complete form of China's space station! Manned spacecraft astronaut Shenzhou rocket NetEase Subscription". The text content is divided into a set of at least one text segment according to its titles and paragraphs. The HTML code (format) and the set of text segments are stored separately in the preset database.
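The separation of a web page into its HTML code (format) part and its text content part can be sketched with Python's standard `html.parser` module. This is only an illustration under simplifying assumptions. Tag attributes are dropped from the recorded format part, and text inside `<script>` and `<style>` is treated as non-content. The class `FormatTextSplitter` and the function `split_html` are hypothetical names.

```python
from html.parser import HTMLParser

class FormatTextSplitter(HTMLParser):
    """Separate an HTML page into its format part (tags) and its text part."""

    def __init__(self) -> None:
        super().__init__()
        self.format_part: list[str] = []  # tag skeleton, e.g. "<title>", "</title>"
        self.text_part: list[str] = []    # visible text runs, in document order
        self._skip = 0                    # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        self.format_part.append(f"<{tag}>")   # attributes omitted for brevity
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        self.format_part.append(f"</{tag}>")
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_part.append(data.strip())

def split_html(html: str) -> tuple[list[str], list[str]]:
    parser = FormatTextSplitter()
    parser.feed(html)
    parser.close()
    return parser.format_part, parser.text_part
```

The returned text runs would then be grouped into text segments by title and paragraph. The format list would be stored alongside them so that the original page can be re-rendered later.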

PDF text: the text content part of the PDF text and the coordinate information of that text content are extracted by an OCR (optical character recognition) algorithm. The text content is divided into a set of at least one text segment according to its titles and paragraphs. The coordinate information and the set of text segments are stored separately in the preset database. The coordinate information is recorded per line of text and consists of four values: the x-axis and y-axis coordinates of a vertex of the line's bounding rectangle, and the length and width of that rectangle. OCR uses a preset, trained character recognition model to determine which regions of the PDF text may contain text, and then performs character recognition on those regions. For example, for a PDF text, the model first generates candidate rectangular boxes. It then estimates the likelihood that each box contains text, and finally recognizes the text inside the boxes.
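The per-line coordinate record described above can be represented as a small data structure. One possible way to group OCR lines into text segments is sketched below. The grouping rule is an assumption, not taken from the text: a new segment starts when the blank space above a line exceeds the previous line's height. So is the coordinate convention (top-left vertex, y growing downward). `OcrLine`, `group_lines` and `gap_factor` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class OcrLine:
    """One recognized line of text plus its four coordinate values."""
    text: str
    x: float       # x coordinate of the box's top-left vertex (assumed)
    y: float       # y coordinate of the top-left vertex, growing downward (assumed)
    length: float  # horizontal extent of the bounding box
    width: float   # vertical extent of the bounding box (line height)

def group_lines(lines: list[OcrLine], gap_factor: float = 1.0) -> list[list[OcrLine]]:
    """Group OCR lines into text segments: start a new segment when the blank
    space above a line exceeds gap_factor times the previous line's height."""
    segments: list[list[OcrLine]] = []
    for line in sorted(lines, key=lambda ln: (ln.y, ln.x)):
        if segments:
            prev = segments[-1][-1]
            gap = line.y - (prev.y + prev.width)
            if gap <= gap_factor * prev.width:
                segments[-1].append(line)
                continue
        segments.append([line])
    return segments
```

Each grouped line keeps its own coordinates, so the stored format part is still available when the segment is later rendered.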

Picture text: the text content part of the picture text and the coordinate information of that text content are extracted by an OCR (optical character recognition) algorithm. The text content is divided into a set of at least one text segment according to its titles and paragraphs. The coordinate information and the set of text segments are stored separately in the preset database.

Video text: the subtitles and speech in the video are recognized by an ASR (automatic speech recognition) algorithm to extract the text content part. The text content is divided into a set of at least one text segment according to the keyword similarity of the subtitles and/or the pauses in the speech. The start time period of each text segment on the playback timeline is read. The start time periods and the set of text segments are stored in the preset database.
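Dividing a video transcript into text segments at speech pauses, while keeping each segment's time period on the playback timeline, can be sketched as below. The sketch makes three assumptions. First, the ASR output is available as words with start and end times in seconds. Second, a pause longer than a fixed `max_pause` ends a segment. Third, words are joined with `sep`; pass `sep=""` for Chinese. The keyword-similarity criterion for subtitles is not modelled. `AsrWord` and `split_by_pause` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class AsrWord:
    text: str
    start: float  # seconds on the playback timeline
    end: float

def split_by_pause(words: list[AsrWord], max_pause: float = 0.8,
                   sep: str = " ") -> list[dict]:
    """Cut an ASR transcript into text segments wherever the silence between
    consecutive words exceeds max_pause seconds; keep each segment's time span."""
    segments: list[dict] = []
    current: list[AsrWord] = []

    def flush() -> None:
        segments.append({
            "text": sep.join(w.text for w in current),
            "start": current[0].start,   # start of the segment's time period
            "end": current[-1].end,
        })

    for word in words:
        if current and word.start - current[-1].end > max_pause:
            flush()
            current = []
        current.append(word)
    if current:
        flush()
    return segments
```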

In an embodiment, the segmenting each text segment to obtain the first keyword of each text segment includes:

dividing the long sentences of each text segment according to a preset word segmentation algorithm to obtain a plurality of phrases;

The preset word segmentation algorithm includes, but is not limited to, a greedy algorithm and a blocking algorithm. The long sentences of each text segment are divided to obtain a word sequence vector containing the phrases into which the segment is split. The similarity value between every two adjacent phrases is then calculated and compared with a preset threshold (for example, 1). The phrases whose similarity values are below the threshold are taken as first keywords.
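The two steps above can be illustrated with a minimal sketch. Forward maximum matching is used as one example of a greedy segmentation algorithm. The similarity measure between adjacent phrases is not specified in the text, so character-set Jaccard similarity is assumed here. The rule that a phrase is kept when its similarity to the *preceding* phrase is below the threshold is also an assumption. With the example threshold of 1, this mainly drops adjacent repeats. All function names are hypothetical.

```python
def forward_max_match(sentence: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy segmentation: at each position take the longest vocabulary word
    of at most max_len characters, falling back to a single character."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

def jaccard(a: str, b: str) -> float:
    """Character-set overlap between two phrases, in [0, 1] (assumed measure)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def first_keywords(phrases: list[str], threshold: float = 1.0) -> list[str]:
    """Keep a phrase when its similarity to the preceding phrase is below the
    threshold (the first phrase is always kept); drop repeats."""
    keywords: list[str] = []
    for i, p in enumerate(phrases):
        if (i == 0 or jaccard(phrases[i - 1], p) < threshold) and p not in keywords:
            keywords.append(p)
    return keywords
```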

The first keywords extracted from each text segment represent that segment's central theme and core idea, so the corresponding text segments can be screened out through their first keywords. The more relevant the extracted first keywords are, the higher the search efficiency and accuracy.

Step S20: construct an inverted index table for word search according to the first keywords, and store the classification label of each text segment in the inverted index table to construct a multimedia library.

In this embodiment, the inverted index table records, for each first keyword, the list of text segments that contain it. The classification labels of all text segments are stored in the queues of the corresponding first keywords in the inverted index table to construct the multimedia library. Through the inverted index table, the contents of texts of various types can be searched under a unified index architecture. This reduces both the cost of building multiple content libraries and the search time.

Many text segments in the set contain the same first keyword. For each text segment, the inverted index table records information about each first keyword under the segment's document number (DocID). This information includes, for example, the keyword's sequence number in the inverted index table and the number of text segments sharing it. The table also records the number of times the first keyword occurs in the text segment (the term frequency, TF) and its positions in the segment. The information relating one text segment to a keyword forms an inverted index entry (a posting). The series of postings for all first keywords forms the structure of the inverted index table.

The core of the inverted index table consists of two parts, the dictionary vocabulary and the inverted list:

1. Dictionary vocabulary: a list recording all first keywords. The splitting granularity of the first keywords can be chosen according to specific requirements. The dictionary vocabulary is generally large. It can be implemented with a B+ tree or a hash table with chaining, which supports high-performance lookup and insertion as well as custom editing (e.g., deleting, adding, and modifying first keywords).

2. Inverted list: records the relationship between each first keyword and its corresponding text segments. Each such relationship is called an inverted index entry. An entry contains the DocID, the term frequency, and the positions (start and end) of the first keyword in the text segment. The term frequency is the number of times the first keyword appears in the text segment and can be used to calculate relevance.
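The two parts of the inverted index table can be sketched in a few lines. This is an illustrative sketch, not the patented data layout. The dictionary vocabulary is a Python `dict`, which is itself a hash table. Each posting holds the DocID, the term frequency and the (start, end) positions. Positions are character offsets computed by concatenating a segment's keywords, which assumes the segment text has no separators between words, as in Chinese. `Posting` and `build_inverted_index` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Posting:
    doc_id: int                                   # DocID of the text segment
    tf: int = 0                                   # term frequency in that segment
    positions: list[tuple[int, int]] = field(default_factory=list)  # (start, end)

def build_inverted_index(segments: dict[int, list[str]]) -> dict[str, dict[int, Posting]]:
    """segments maps DocID -> first keywords in order of appearance.
    Returns the dictionary vocabulary: keyword -> {DocID -> Posting}."""
    index: dict[str, dict[int, Posting]] = defaultdict(dict)
    for doc_id, words in segments.items():
        offset = 0
        for w in words:
            posting = index[w].setdefault(doc_id, Posting(doc_id))
            posting.tf += 1
            posting.positions.append((offset, offset + len(w)))
            offset += len(w)
    return dict(index)
```

With this layout, `len(index[w])` gives the number of text segments sharing keyword `w`.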

In one embodiment, after the constructing the inverted index table of the word search according to the first keyword, the method further comprises:

Word frequency statistics for each first keyword can be computed with a programming model such as MapReduce. The frequencies are compared with a preset word frequency value (for example, 3). First keywords whose frequency is greater than or equal to the preset value are treated as high-frequency words and filled into the high-frequency word queue of the inverted index table. Those below it are treated as low-frequency words and filled into the low-frequency word queue. A separate index is generated for each queue, which can improve the precision and speed of searching for first keywords and reduce the resources consumed by the search engine. For example, the high-frequency word queue may use an inverted index and the low-frequency word queue a forward index, or vice versa. Alternatively, both queues may use the same kind of index, both inverted or both forward. The choice depends on the actual service scenario and is not limited herein.
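The high/low-frequency split can be sketched as follows. It assumes the frequency is counted across all text segments, which the text does not state explicitly. A `collections.Counter` stands in for a MapReduce word count, and the default of 3 mirrors the example value above. `split_by_frequency` is a hypothetical name.

```python
from collections import Counter

def split_by_frequency(segment_keywords: dict[int, list[str]], min_freq: int = 3):
    """Count occurrences of each first keyword across all text segments and
    route it to the high-frequency (>= min_freq) or low-frequency queue."""
    # Counter plays the role of a MapReduce word count: the map step emits
    # (word, 1) pairs and the reduce step sums them per word.
    counts = Counter(w for words in segment_keywords.values() for w in words)
    high = [w for w, c in counts.items() if c >= min_freq]
    low = [w for w, c in counts.items() if c < min_freq]
    return high, low
```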

In one embodiment, before storing the classification label of each text segment in the inverted index table to construct the multimedia library, the method further comprises:

The preset classification model is obtained in three steps. A sample set of text segments containing different keywords is collected, the samples are labeled manually, and a preset model (a BERT model) is trained on the sample set.

For example, the text sequence of each first keyword of a text segment A is read and input into the preset classification model for token embedding. An encoder converts the text sequence into a matrix representation and outputs the word vector features of each first keyword. A feature-representation fusion layer and a fully connected layer then perform similarity matching on these features. The label module of the classification model outputs the label with the highest similarity to the word vector features as the classification label of text segment A. A mapping relationship is then established between this classification label and each first keyword of text segment A. With this mapping, a search only needs to determine the first keyword, and the corresponding text segments are found through the classification label. The text segments therefore need not be searched for every keyword and only have to be stored in the preset database, which speeds up operations on the inverted index table.
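The label-matching and mapping step can be sketched without a BERT model. This is deliberately simplified, not the described architecture. The encoder, fusion layer and fully connected layer are replaced by mean-pooling the keyword vectors and comparing the result with one prototype vector per label using cosine similarity. The vectors are toy inputs, and `assign_label` and `_cosine` are hypothetical names.

```python
import math

def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_label(keyword_vecs: dict[str, list[float]],
                 label_vecs: dict[str, list[float]]) -> tuple[str, dict[str, str]]:
    """Pick the label most similar to the segment's keyword vectors and map
    every first keyword of the segment to that label."""
    dim = len(next(iter(keyword_vecs.values())))
    # Mean-pool the keyword vectors into one segment vector (stand-in for the
    # encoder plus fusion layers described in the text).
    seg_vec = [sum(v[k] for v in keyword_vecs.values()) / len(keyword_vecs)
               for k in range(dim)]
    label = max(label_vecs, key=lambda name: _cosine(seg_vec, label_vecs[name]))
    return label, {kw: label for kw in keyword_vecs}
```

The returned keyword-to-label dictionary corresponds to the mapping relationship used later during retrieval.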

Step S30: receiving a query request sent by a user side, extracting a second keyword from the query request, searching a classification label of a text segment corresponding to a first keyword associated with the second keyword in the multimedia library according to the inverted index table and the second keyword, and reading the corresponding text segment from the preset database according to the classification label obtained by searching.

In this embodiment, the user enters the content of a query request in the interface of the multimedia library's search engine (the search engine of the inverted index table) at the user side and clicks the "Search" button. The search engine program then processes the content: it performs Chinese-specific word segmentation, removes stop words, determines whether to start an integrated search, and checks for spelling errors or wrongly written characters. The query request is thereby parsed and the second keywords are extracted. The search engine program then performs matching. It finds in the inverted index table all first keywords with the same or similar semantics as the second keywords. It then retrieves the classification labels associated with those first keywords through the mapping relationship, and reads the text segments of those first keywords from the preset database according to the classification labels.

In one embodiment, the extracting the second keyword from the query request includes:

The content of the query request is segmented with a preset word segmentation algorithm (for example, the TextRank algorithm). The relevant words in the query request are extracted, stop words are removed, and spelling errors or wrongly written characters in the query content are corrected. A correlation matrix of the words is constructed from the relevant words, and an importance score is computed for each word with the algorithm's formula. A preset number of the highest-scoring words are then selected as the segmented words.
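The TextRank scoring step can be sketched as follows, as a minimal unweighted TextRank over a co-occurrence graph. The window size, damping factor of 0.85, iteration count and `top_k` are conventional defaults assumed for illustration. Spelling correction is not modelled, and `textrank_keywords` is a hypothetical name.

```python
from collections import defaultdict

def textrank_keywords(words: list[str], stop_words: set[str], top_k: int = 5,
                      window: int = 2, damping: float = 0.85,
                      iters: int = 50) -> list[str]:
    """Rank query words by TextRank importance and return the top_k."""
    tokens = [w for w in words if w not in stop_words]      # drop stop words
    neighbors: dict[str, set[str]] = defaultdict(set)
    for i, w in enumerate(tokens):                          # co-occurrence graph
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    nodes = list(dict.fromkeys(tokens))                     # unique, first-seen order
    score = {w: 1.0 for w in nodes}
    for _ in range(iters):                                  # power iteration
        score = {w: (1 - damping) + damping *
                 sum(score[v] / len(neighbors[v]) for v in neighbors[w])
                 for w in nodes}
    # sorted() is stable, so ties keep first-seen order.
    return sorted(nodes, key=lambda w: -score[w])[:top_k]
```

Words linked to many other words in the query accumulate higher scores. For a query like "shenzhou rocket launch rocket", "rocket" is ranked first.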

A dictionary vocabulary is built from all pre-recorded first keywords by maximum-prefix matching of the first-keyword word list. A tree-structured dictionary tree (trie) is then generated with the key value and character string of each first keyword in the dictionary vocabulary as nodes. The word frequencies of segmented words in all users' historical query requests are counted in advance. The character-prefix features of the segmented words are read, and traversal matching starts from the root node of the trie. Words whose node character-prefix features match those of a segmented word are taken as second keywords. The query content is matched against the dictionary vocabulary, combined with users' historical search behavior, to obtain the second keywords. This addresses the problems of wrongly written characters, ungrammatical input, and unclear expression in the content entered by the user.
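The prefix traversal over the dictionary tree can be sketched with a small trie. Two details are assumptions, not taken from the text. Matching uses only the first `prefix_len` characters of each segmented word, which tolerates typos later in the word. When several first keywords share the prefix, the one most frequent in historical queries wins. `Trie`, `second_keywords` and `history_freq` are hypothetical names.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def words_with_prefix(self, prefix: str) -> list[str]:
        """All vocabulary words below the node reached by `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        out, stack = [], [(node, prefix)]
        while stack:
            cur, path = stack.pop()
            if cur.is_word:
                out.append(path)
            for ch, child in cur.children.items():
                stack.append((child, path + ch))
        return sorted(out)

def second_keywords(tokens, trie, history_freq, prefix_len=2):
    """Map each segmented word to the first keyword sharing its prefix,
    preferring the one most frequent in historical queries."""
    result = []
    for t in tokens:
        candidates = trie.words_with_prefix(t[:prefix_len])
        if candidates:
            result.append(max(candidates, key=lambda w: history_freq.get(w, 0)))
    return result
```

For example, the misspelled tokens "rockt" and "shenzou" would map to "rocket" and "shenzhou" when both are in the vocabulary.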

In one embodiment, searching the multimedia library for the classification labels and reading the corresponding text segments includes the following. The search is made according to the inverted index table and the second keyword, and looks for the classification labels of the text segments corresponding to the first keywords associated with the second keyword. The corresponding text segments are then read from the preset database according to the retrieved classification labels. The steps are:

inputting the second keyword into a search engine of the inverted index table;

The second keyword is input into the search engine of the inverted index table. The engine obtains the first keywords associated with it by exploiting the different data-reading characteristics of the table's high-frequency and low-frequency word queues. For example, it traverses the high-frequency word queue through an inverted index and the low-frequency word queue through a forward index. An associated first keyword is one with the same or similar semantics as the second keyword. Which indexing mode is used is set according to the actual service scenario and is not limited herein. The classification labels of the associated first keywords are read according to the mapping relationship established in step S20, which yields the text segments of those first keywords. Using different indexing modes addresses a problem of the prior art. There, a single indexing mode occupies more of the search engine's physical space, and the index must be maintained dynamically whenever data in the inverted index table is added, deleted, or modified, which slows data maintenance. The approach therefore saves physical space and makes data maintenance more convenient.
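The retrieval path can be sketched with plain dictionaries: second keyword, then associated first keyword, then classification label, then text segments. Two assumptions apply. "Associated" is reduced to exact keyword match, so semantic similarity is not modelled. The high-frequency queue is looked up through an inverted index and the low-frequency queue is scanned through a forward index, following the example in the text. All parameter names and `retrieve_segments` are hypothetical.

```python
def retrieve_segments(second_kws, high_inverted, low_forward, kw_to_label,
                      label_to_docs, database):
    """Look up text segments for the second keywords.

    high_inverted: high-frequency first keyword -> set of DocIDs (inverted index)
    low_forward:   DocID -> set of low-frequency first keywords (forward index)
    kw_to_label:   first keyword -> classification label (mapping from step S20)
    label_to_docs: classification label -> DocIDs stored under that label
    database:      DocID -> text segment (the preset database)
    """
    related = set()
    for q in second_kws:
        if q in high_inverted:                                # direct key lookup
            related.add(q)
        elif any(q in kws for kws in low_forward.values()):   # forward scan
            related.add(q)
    labels = {kw_to_label[k] for k in related if k in kw_to_label}
    doc_ids = sorted({d for lab in labels for d in label_to_docs.get(lab, ())})
    return [database[d] for d in doc_ids]
```

The inverted lookup costs one hash probe per keyword. The forward scan touches every document, which is acceptable only because the low-frequency queue is assumed to be small or rarely queried.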

Step S40: score the similarity of the text segments, sort the resulting score values in a preset order, select a preset number of text segments in that order, render them into their corresponding texts, and output the texts to the user side.

In this embodiment, the text segments obtained for the first keywords may include segments of different types, such as web page text, PDF text, picture text, and video text. Similarity is calculated for these segments and scored with a preset scoring algorithm. The resulting score values are sorted in a preset order (for example, from high to low), and a preset number (for example, 10) of the top-ranked text segments are selected. The formats of these 10 segments are read from the preset database, rendered into the corresponding texts, and output to the user side.
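The score-sort-select step can be sketched under the assumption that the preset scoring algorithm is BM25, the form that the symbol definitions for this step are consistent with. Segments are given as token lists, and `k1=1.5` and `b=0.75` are common default parameters, not values from the text. The IDF uses the `log(1 + …)` variant, which is never negative. `bm25_scores` and `top_k_segments` are hypothetical names.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], segments: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized text segment against the query terms (BM25)."""
    if not segments:
        return []
    n_docs = len(segments)
    avgdl = sum(len(s) for s in segments) / n_docs          # average segment length
    df = Counter(w for s in segments for w in set(s))       # segments containing w
    scores = []
    for seg in segments:
        tf = Counter(seg)
        total = 0.0
        for q in query:
            if q not in tf:
                continue
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))   # w_i
            rel = tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(seg) / avgdl))
            total += idf * rel
        scores.append(total)
    return scores

def top_k_segments(query, segments, k=10):
    """Indices of the k highest-scoring segments, best first (ties keep order)."""
    scores = bm25_scores(query, segments)
    return sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)[:k]
```

Sorting indices rather than segments keeps the link back to each segment's stored format, which the rendering step needs.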

For example, suppose the first keywords obtained are 'shenzhou' and 'rocket' and the selected text segments are web page texts. All text segments related to these two first keywords are returned, the 10 top-ranked text segments are selected after scoring, their corresponding HTML code is retrieved, and the original web page texts are rendered at the user side and displayed to the user. If the selected text segments are PDF texts or picture texts, the corresponding text segments and their coordinate information are read for rendering. If a selected text segment is a video text, the corresponding text segment and the start of its time period on the video timeline are read, and playback is rendered from that point.
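The type-dependent rendering just described amounts to a dispatch on the segment type. The Python sketch below shows one way to assemble what the user side would need; the `Segment` record, its field names and the payload keys are hypothetical, chosen only to mirror the HTML, coordinate and timeline information named in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """Hypothetical record for a text segment read from the preset database."""
    label: str
    kind: str  # "web", "pdf", "picture" or "video"
    text: str
    html: Optional[str] = None  # web pages: original HTML code
    bbox: Optional[Tuple[float, float, float, float]] = None  # PDF/picture: coordinates
    start_seconds: Optional[float] = None  # video: start of the matching time period


def render_payload(segment: Segment) -> dict:
    """Choose what the user side needs to render, based on the segment type."""
    if segment.kind == "web":
        return {"type": "html", "html": segment.html}
    if segment.kind in ("pdf", "picture"):
        return {"type": "overlay", "text": segment.text, "bbox": segment.bbox}
    if segment.kind == "video":
        return {"type": "video", "text": segment.text, "seek": segment.start_seconds}
    raise ValueError(f"unknown segment type: {segment.kind}")


print(render_payload(Segment("video:3", "video", "Shenzhou launch", start_seconds=12.5)))
# {'type': 'video', 'text': 'Shenzhou launch', 'seek': 12.5}
```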

The preset scoring algorithm is as follows:

where T is the score value of a text segment, n is the number of first keywords in the text segment, i denotes the i-th first keyword of the text segment, w_i is the IDF value of the i-th first keyword, q is the second keyword in the user's query, d denotes the d-th common word of the text segment in the dictionary vocabulary, and R(q, d) is the similarity between the text segment and the second keyword queried by the user.
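The formula itself is missing from this copy. Read only against the symbol list above, the symbols match the weighted-sum shape of BM25-style relevance scoring, in which each keyword's IDF weight multiplies a similarity term. Under that assumption, and not as the patent's exact expression, the score would take the form:

```latex
T = \sum_{i=1}^{n} w_i \, R(q, d)
```

As defined above, R(q, d) carries no index i; in the usual BM25 form the similarity term is computed per keyword, so that each w_i weights its own term.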

By scoring the similarity among the text segments and rendering the top-scoring text segments for the user's query request in their own formats, the text segments the user wants to view are obtained automatically and quickly. The user no longer needs to switch back and forth between text types in the display interface or spend time picking out the wanted text segments from each text, which reduces the time needed to obtain and render the texts. Text segments thus serve as the search results, and texts of different types are displayed together in a single display interface.

Referring to fig. 2, a functional block diagram of the multimedia resource searching apparatus 100 according to the present invention is shown.

The multimedia resource searching apparatus 100 of the present invention may be installed in an electronic device. According to the functions implemented, the multimedia resource searching apparatus 100 may include an extraction module 110, a storage module 120, a query module 130 and an output module 140. A module in the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by its processor to perform a fixed function.

In this embodiment, the functions of the modules/units are as follows:

the extraction module 110: used for extracting text content from a plurality of different types of texts respectively to obtain one or more text segments, storing the one or more text segments in a preset database, and segmenting each text segment into words to obtain the first keyword of each text segment;

the storage module 120: used for constructing an inverted index table for word search according to the first keywords, and storing the classification labels of the text segments in the inverted index table to construct a multimedia library;

the query module 130: used for receiving a query request sent by the user side, extracting a second keyword from the query request, searching the multimedia library, according to the inverted index table and the second keyword, for the classification label of the text segment corresponding to a first keyword associated with the second keyword, and reading the corresponding text segment from the preset database according to the retrieved classification label;

the output module 140: used for scoring the similarity among the text segments, sorting the resulting score values in a preset sorting order, selecting a preset number of text segments according to that order, rendering them into the corresponding texts, and outputting the texts to the user side.

In one embodiment, extracting text content from the plurality of different types of texts to obtain one or more text segments and storing the text segments in the preset database includes:

In one embodiment, before storing the classification tags of the text segments in the inverted index table to construct the multimedia library, the method further comprises:

inputting the second keyword into a search engine of the inverted index table;

Fig. 3 is a schematic diagram of an electronic device 1 according to a preferred embodiment of the invention.

The electronic device 1 includes, but is not limited to, a memory 11, a processor 12, a display 13 and a network interface 14. The electronic device 1 is connected to a network through the network interface 14 to obtain raw data. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi or a telephone network.

The memory 11 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk or an optical disk. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as its hard disk or internal memory. In other embodiments, the memory 11 may be an external storage device equipped on the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card. The memory 11 may also comprise both an internal storage unit and an external storage device of the electronic device 1. In this embodiment, the memory 11 is generally used to store the operating system and various application software installed on the electronic device 1, such as the program code of the multimedia resource search 10. The memory 11 may also be used to temporarily store various types of data that have been output or are to be output.

Processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 12 is typically used for controlling the overall operation of the electronic device 1, such as performing data interaction or communication related control and processing. In this embodiment, the processor 12 is configured to run a program code stored in the memory 11 or process data, for example, a program code of the multimedia resource search 10.

The display 13 may be referred to as a display screen or display unit. In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch panel, or the like. The display 13 is used for displaying information processed in the electronic device 1 and for displaying a visual work interface, e.g. displaying the results of data statistics.

The network interface 14 may optionally comprise a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic devices.

Fig. 3 only shows the electronic device 1 with components 11-14 and the multimedia resource search 10, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.

Optionally, the electronic device 1 may further comprise a user interface, which may include a display and an input unit such as a keyboard; the user interface may further include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

The electronic device 1 may further comprise Radio Frequency (RF) circuitry, sensors, audio circuitry, etc., which will not be described in detail herein.

In the above embodiment, the processor 12 may implement the following steps when executing the multimedia resource search 10 stored in the memory 11:

The storage device may be the memory 11 of the electronic device 1, or may be another storage device communicatively connected to the electronic device 1.

For a detailed description of the above steps, refer to the description of Fig. 2 (the functional block diagram of an embodiment of the multimedia resource searching apparatus 100) and of Fig. 1 (the flowchart of an embodiment of the multimedia resource searching method).

In addition, an embodiment of the present invention further provides a computer-readable medium, which may be non-volatile or volatile. The computer-readable medium may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable medium includes a data storage area and a program storage area; the data storage area stores data created according to the use of the blockchain node, and the program storage area stores the multimedia resource search 10, which implements the following operations when executed by a processor:

The specific implementation of the computer readable medium of the present invention is substantially the same as the specific implementation of the multimedia resource searching method, and is not described herein again.

In another embodiment, to further ensure the privacy and security of the data, all of the data may be stored in nodes of a blockchain. For example, the first keyword and the second keyword may be stored in a blockchain node.

It should be noted that the blockchain in the present invention is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
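The cryptographic linking of data blocks can be shown with a toy hash chain in Python. This sketch illustrates only the linking and tamper-evidence idea; it has no consensus mechanism, networking or distribution, and the use of SHA-256 over sorted-key JSON and the record fields are assumptions, since the source names no algorithm or block layout.

```python
import hashlib
import json


def block_hash(prev_hash: str, records: list) -> str:
    """SHA-256 over the previous block's hash and this block's records."""
    body = json.dumps({"prev_hash": prev_hash, "records": records}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(prev_hash: str, records: list) -> dict:
    """Create a block that is linked to its predecessor through `prev_hash`."""
    return {"prev_hash": prev_hash, "records": records, "hash": block_hash(prev_hash, records)}


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["prev_hash"], block["records"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block("0" * 64, [{"first_keyword": "rocket"}])
chain = [genesis, make_block(genesis["hash"], [{"second_keyword": "shenzhou"}])]
print(verify_chain(chain))  # True
chain[0]["records"][0]["first_keyword"] = "tampered"
print(verify_chain(chain))  # False
```

Altering any stored keyword changes that block's recomputed hash, which is what makes the stored records tamper-evident.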

It should be noted that the numbering of the embodiments above is for description only and does not indicate the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, apparatus, article or method that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product stored in a medium as described above (such as a ROM/RAM, magnetic disk or optical disk), including instructions that enable a terminal device (such as a mobile phone, a computer, an electronic device or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for searching multimedia resources, the method comprising:

2. The method for searching multimedia resources according to claim 1, wherein the plurality of different types of texts include web page texts, PDF texts, picture texts, and video texts, and the extracting text contents from the plurality of different types of texts respectively to obtain one or more text segments and storing the one or more text segments in a preset database includes:

3. The method for searching for multimedia resources according to claim 1, wherein the segmenting words for each text segment to obtain the first keyword of each text segment comprises:

and calculating similarity values between adjacent phrases, and taking the phrases with the similarity values smaller than a preset threshold value as first keywords.

4. The method for searching for multimedia resources according to claim 1, wherein after said constructing an inverted index table of word searches based on said first keyword, the method further comprises:

5. The method of claim 1, wherein before storing the classification tags of the text segments in the inverted index table to construct a multimedia library, the method further comprises:

6. The method for searching for multimedia resources according to claim 1, wherein said extracting the second keyword from the query request comprises:

segmenting the information of the query request into words to obtain a plurality of tokens;

7. The method for searching for multimedia resources according to claim 1, wherein said searching for the category label of the text segment corresponding to the first keyword associated with the second keyword in the multimedia library according to the inverted index table and the second keyword, and reading the corresponding text segment from the preset database according to the retrieved category label comprises:

inputting the second keyword into a search engine of the inverted index table;

traversing the first keywords in the inverted index table according to the search engine to obtain first keywords associated with the second keywords;

8. An apparatus for searching multimedia resources, the apparatus comprising:

an extraction module: configured to extract text content from a plurality of different types of texts respectively to obtain one or more text segments, store the one or more text segments in a preset database, and segment each text segment into words to obtain a first keyword of each text segment;

a storage module: configured to construct an inverted index table for word search according to the first keywords, and store the classification labels of the text segments in the inverted index table to construct a multimedia library;

a query module: configured to receive a query request sent by a user side, extract a second keyword from the query request, search the multimedia library, according to the inverted index table and the second keyword, for the classification label of the text segment corresponding to a first keyword associated with the second keyword, and read the corresponding text segment from the preset database according to the retrieved classification label;

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

10. A computer-readable medium, characterized in that the computer-readable medium stores a multimedia resource search program which, when executed by a processor, implements the multimedia resource searching method according to any one of claims 1 to 7.