CN110309324B - Searching method and related device - Google Patents

Searching method and related device

Info

Publication number
CN110309324B
CN110309324B CN201810195845.9A
Authority
CN
China
Prior art keywords
video
audio
file
target file
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810195845.9A
Other languages
Chinese (zh)
Other versions
CN110309324A (en)
Inventor
丁文彪
孙玉玺
沈炎军
常庆丰
潘达
周泽南
苏雪峰
佟子健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201810195845.9A priority Critical patent/CN110309324B/en
Publication of CN110309324A publication Critical patent/CN110309324A/en
Application granted granted Critical
Publication of CN110309324B publication Critical patent/CN110309324B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a searching method and a related device. The searching method comprises the following steps: responding to a search request of a user, acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file; acquiring a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file; and determining the audio-visual information corresponding to the target file according to the description text. Therefore, in the embodiment of the invention, there is no need to extract a large number of audio-visual clips from audio files or video files to build indexes; instead, the corresponding audio-visual information can be determined by extracting and analyzing the description text of web pages. Since the description text occupies fewer storage resources than files such as pictures, audio clips and video clips, the requirement on storage capacity is reduced.

Description

Searching method and related device
Technical Field
The invention relates to the technical field of Internet, in particular to a searching method and a related device.
Background
With the rapid development of internet technology, search technology has been widely used. The user may search for information of interest using a search technique.
Current searching technology usually only supports the user in searching for text and similar characters. However, when the user browses a file such as a picture, an audio clip or a video clip, the user may want to search for audio-visual information corresponding to the file, such as an audio-visual name, an actor name or a director name. For example, when a user browses a frame of a movie on a web page, the user may want to search for the movie to which that frame belongs.
The inventor has found that, in the related art, if the corresponding audio-visual information is to be searched for through files such as pictures, audio clips and video clips, a large number of audio-visual clips must be extracted from the audio files or video files, indexes between the extracted audio-visual clips and the audio-visual information must be established, and searching is performed according to the established indexes. For example, to search for the movie to which a frame belongs, a large number of key frames must be extracted from the movie, an index between the key frames and the movie name is created, and the search is performed based on the created index. Because files such as pictures, audio clips and video clips often occupy considerable storage resources, this searching approach places a high requirement on storage capacity.
Disclosure of Invention
The technical problem addressed by the invention is to provide a searching method and a related device that can search for the audio-visual information corresponding to a picture, an audio file or a video file while reducing the requirement on storage capacity.
To this end, the technical solution for solving the above technical problem is as follows:
the embodiment of the invention provides a searching method, which comprises the following steps:
responding to a search request of a user, and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
acquiring a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
and determining the video and audio information corresponding to the target file according to the description text.
Optionally, the obtaining the description text corresponding to the target file in the associated web page includes:
inquiring an associated webpage corresponding to the target file according to a corresponding relation between a pre-established webpage file and a webpage comprising the webpage file; the webpage file comprises one or more of a picture, an audio file and a video file;
and acquiring the description text corresponding to the target file from the associated webpage.
Optionally, the querying, according to a correspondence between a pre-established web page file and a web page including the web page file, an associated web page corresponding to the target file includes:
determining standard files matched with the target file in a standard file library;
and inquiring the associated web page corresponding to the standard file according to the corresponding relation.
Optionally, the determining the standard file matched with the target file in the standard file library includes:
acquiring a feature vector of the target file, wherein the feature vector comprises a plurality of vector elements;
in the standard file library, according to the index relation between the vector elements of the standard file and the standard file, acquiring a standard file set matched with the vector elements of the target file;
and taking the intersection of the standard file sets as the standard files matched with the target file in the standard file library.
Optionally, the obtaining the description text corresponding to the target file in the associated web page includes:
inquiring the description text corresponding to the target file according to the corresponding relation between the pre-established webpage file and the description text in the webpage comprising the webpage file; the web page file includes one or more of a picture, an audio file, and a video file.
Optionally, the obtaining the description text corresponding to the target file in the associated web page includes:
determining an associated webpage according to the target file;
and acquiring the description text corresponding to the target file from the associated webpage.
Optionally, the determining, according to the description text, the video and audio information corresponding to the target file includes:
inputting the description text into a pre-trained video and audio recognition model;
and obtaining the video and audio information corresponding to the target file through the video and audio identification model.
Optionally, the determining, according to the description text, the video and audio information corresponding to the target file includes:
acquiring audio-visual candidate words according to the description text;
and determining first audio-visual information corresponding to the target file according to the audio-visual candidate words.
Optionally, the obtaining the audio-visual candidate word according to the description text includes:
and extracting keywords from the description text, and matching the keywords with the audio-video candidate word stock to obtain audio-video candidate words.
Optionally, determining the first audio-visual information corresponding to the target file according to the audio-visual candidate word includes:
screening the audio-visual candidate words according to at least one of the following parameters: the number of times the audio-visual candidate words occur in the description text, the webpage attribute of the associated webpage where the audio-visual candidate words are located, and the context information of the associated webpage where the audio-visual candidate words are located.
Optionally, the method further comprises:
and acquiring second video and audio information corresponding to the target file according to the first video and audio information.
Optionally, the obtaining the target file in response to the search request of the user includes:
receiving a search request of a user, and acquiring an address of a target file carried in the search request;
and acquiring the target file according to the address of the target file.
The embodiment of the invention provides a searching method, which comprises the following steps:
detecting an operation of a user on a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
sending a search request of the user to a server, wherein the search request carries the identification of the target file;
and receiving the video and audio information corresponding to the target file returned by the server.
Optionally, detecting the operation of the user on the target file includes: detecting the operation of a user on a target file through a browser plug-in;
sending a search request of the user to a server, including: and sending a search request of a user to a server through the browser plug-in, wherein the search request carries the address of the target file.
Optionally, the method further comprises:
And displaying and/or playing the video and audio information.
The embodiment of the invention provides a searching device, which comprises:
the first acquisition unit is used for responding to a search request of a user and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
the second acquisition unit is used for acquiring descriptive text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
and the determining unit is used for determining the video and audio information corresponding to the target file according to the description text.
Optionally, the second obtaining unit includes:
the query unit is used for querying the associated webpage corresponding to the target file according to the corresponding relation between the pre-established webpage file and the webpage comprising the webpage file; the webpage file comprises one or more of a picture, an audio file and a video file;
and the third acquisition unit is used for acquiring the description text corresponding to the target file from the associated webpage.
Optionally, the query unit includes:
the first determining subunit is used for determining standard files matched with the target file in the standard file library;
And the first inquiring subunit is used for inquiring the associated webpage corresponding to the standard file according to the corresponding relation.
Optionally, the first determining subunit is specifically configured to obtain a feature vector of the target file, where the feature vector includes a plurality of vector elements;
in the standard file library, according to the index relation between the vector elements of the standard file and the standard file, acquiring a standard file set matched with the vector elements of the target file; and taking the intersection of the standard file sets as the standard files matched with the target file in the standard file library.
Optionally, the second obtaining unit includes:
the second query subunit is used for querying the description text corresponding to the target file according to the corresponding relation between the pre-established webpage file and the description text in the webpage comprising the webpage file; the web page file includes one or more of a picture, an audio file, and a video file.
Optionally, the second obtaining unit includes:
the second determining subunit is used for determining the associated webpage according to the target file;
the first acquisition subunit is used for acquiring the description text corresponding to the target file from the associated webpage.
Optionally, the determining unit includes:
the input subunit is used for inputting the description text into a pre-trained video and audio recognition model;
and the identification subunit is used for obtaining the video and audio information corresponding to the target file through the video and audio identification model.
Optionally, the determining unit includes:
the second acquisition subunit is used for acquiring the audio-visual candidate words according to the description text;
and the third determining subunit is used for determining the first audio-visual information corresponding to the target file according to the audio-visual candidate words.
Optionally, the second obtaining subunit is specifically configured to extract a keyword from the description text, and match the keyword with the audio-video candidate word bank to obtain an audio-video candidate word.
Optionally, the third determining subunit is specifically configured to screen the audio-visual candidate words according to at least one of the following parameters: the number of times the audio-visual candidate words occur in the description text, the webpage attribute of the associated webpage where the audio-visual candidate words are located, and the context information of the associated webpage where the audio-visual candidate words are located.
Optionally, the apparatus further comprises: a third acquisition subunit, used for acquiring second audio-visual information corresponding to the target file according to the first audio-visual information.
Optionally, the first obtaining unit is specifically configured to receive a search request of a user, and obtain an address of a target file carried in the search request; and acquiring the target file according to the address of the target file.
The embodiment of the invention provides a searching device, which comprises:
the detection unit is used for detecting the operation of a user on a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
a sending unit, configured to send a search request of the user to a server, where the search request carries an identifier of the target file;
and the receiving unit is used for receiving the video and audio information corresponding to the target file returned by the server.
Optionally, the detecting unit is specifically configured to detect, by using a browser plug-in, an operation of a user on a target file; the sending unit is specifically configured to send a search request of a user to a server through the browser plug-in, where the search request carries an address of the target file.
Optionally, the apparatus further comprises: a display unit used for displaying the audio-visual information, and a play unit used for playing the audio-visual information.
An embodiment of the invention provides a device for searching, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and are configured to be executed by one or more processors, the one or more programs comprising instructions for:
responding to a search request of a user, and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
acquiring a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
and determining the video and audio information corresponding to the target file according to the description text.
An embodiment of the invention provides a device for searching, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and are configured to be executed by one or more processors, the one or more programs comprising instructions for:
detecting an operation of a user on a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
Sending a search request of the user to a server, wherein the search request carries the identification of the target file;
and receiving the video and audio information corresponding to the target file returned by the server.
Embodiments of the present invention provide a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the search method embodiments described in any one or more of the above.
According to the above technical solution, in order to search for the audio-visual information corresponding to a target file, where the target file comprises one or more of a picture, an audio file and a video file, the target file is obtained in response to a search request of a user, an associated webpage associated with the target file is determined, and the corresponding audio-visual information is analyzed out of the description text corresponding to the target file in the associated webpage. Therefore, in the embodiment of the invention, there is no need to extract a large number of audio-visual clips from audio files or video files to build indexes; instead, the corresponding audio-visual information can be determined by extracting and analyzing the description text of web pages. Since the description text occupies fewer storage resources than files such as pictures, audio clips and video clips, the requirement on storage capacity is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those of ordinary skill in the art may obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic architecture diagram of an application scenario provided in an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a search method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another embodiment of a search method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a user triggering a selection operation for a picture;
FIG. 5 is a flowchart illustrating another embodiment of a search method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an embodiment of a device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another embodiment of a device according to the embodiments of the present application;
FIG. 8 is a block diagram of an apparatus for searching, according to an example embodiment;
fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
When a user browses a picture related to a movie (such as a movie frame or a movie poster) and is interested in it, the user may want to search for the movie corresponding to the picture. However, if the user does not know related information about the movie (such as the movie name or the director), the user cannot describe the movie and therefore cannot search for the movie, or its related information, in a search engine.
To solve the above problem, there is a method of searching for a video based on a picture, which requires extracting a large number of key frames from the video in advance, building an index between the key frames and the video, and searching according to the built index. It can be seen that this approach requires a large number of key frames to be stored and therefore a large amount of storage. For example, establishing an index between a video and its key frames requires at least 586Mb of storage resources.
In addition, a similar approach can be used to search for audio-visual information based on files such as pictures, audio clips and video clips; specifically, an index between the audio-visual clips extracted from audio files or video files and the audio-visual information needs to be established, which likewise suffers from the technical problem of a high requirement on storage capacity.
In order to reduce the requirement on storage capacity while a user searches for corresponding audio-visual information, such as an audio-visual name, an actor name, a director name or an audio-visual resource, through pictures, audio files or video files, the embodiment of the invention provides a searching method in which the audio-visual information corresponding to a target file selected by the user is determined according to description text of web pages, which occupies fewer storage resources; the target file includes one or more of pictures, audio files and video files. Taking the application scenario shown in fig. 1 as an example, when the user 101 browses to a target file of interest on a web page of the terminal 102, the user 101 may perform an operation on the terminal 102 for the target file; the terminal 102 responds to the operation and transmits a search request of the user to the server 103; the server 103 responds to the search request of the user, acquires the target file on which the user 101 performed the operation, and acquires the description text corresponding to the target file in an associated webpage, where the associated webpage is at least one webpage associated with the target file, and then determines the audio-visual information corresponding to the target file according to the description text. Therefore, in the process of obtaining the audio-visual information corresponding to the target file selected by the user, there is no need to establish an index relationship between key frames and the corresponding video; the corresponding audio-visual information can be obtained by extracting and analyzing the description text of web pages, which occupies fewer storage resources, so that the required storage capacity is reduced. For example, suppose that after a user selects a picture, the audio-visual information corresponding to the picture can be determined through the description texts of 100 web pages; the storage resources occupied by the description texts of 100 web pages are only about 0.5Mb-5Mb, so the requirement on storage capacity is reduced.
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, shall fall within the scope of the invention.
Referring to fig. 2 together, fig. 2 shows a flowchart of a specific embodiment of a search method according to an embodiment of the present invention, where the method includes:
s201: and responding to the search request of the user, and acquiring the target file.
The target file may be a resource segment of the audio-visual information that the user wants to search for, or may be an audio-visual resource associated with that audio-visual information; specifically, it may include any one or more of a picture, an audio file and a video file. For example, the target file selected by the user may be a picture, such as an album cover of a song, a frame of a movie, or an animated image in Graphics Interchange Format (GIF); or a video file, such as a one-minute video clip, a trailer, or a multi-frame segment of a movie; or an audio file, such as thirty seconds of audio from a song.
In the embodiment of the invention, when the user browses to a target file through a browser or the like and wants to search for the audio-visual information corresponding to the target file, the user can perform an operation on the target file, for example a single-click or double-click operation with a mouse, or selecting the target file on a touch screen. According to the user's selection operation, a search request of the user can be generated, where the search request is used to request a search for the audio-visual information corresponding to the target file.
In this embodiment, the address of the target file may be carried in the user's search request, and in response to the search request the target file is obtained according to that address. For example, the terminal may send the search request to the server, where the search request carries the address, and the server reads the target file stored at that address. Alternatively, the search request may carry not an address but a feature vector of the target file; for example, when the terminal executes the searching method of this embodiment, the terminal responds to the search request by obtaining the feature vector of the target file directly from the request, and the target file can be determined according to the feature vector, without needing to look the target file up by its address.
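By way of non-limiting illustration, the following Python sketch shows this step; the field names file_address and feature_vector and the in-memory file store are assumptions made for the example and are not part of the embodiment.

```python
# Minimal illustrative sketch: resolving the target file from a search request.
# The field names "file_address" and "feature_vector" and the in-memory file
# store are assumptions made for this example only.

def resolve_target_file(search_request: dict, file_store: dict):
    """Return the target file (or its feature vector) referenced by the request."""
    if "feature_vector" in search_request:
        # The request already carries the feature vector; no address lookup needed.
        return search_request["feature_vector"]
    # Otherwise read the file stored at the address carried in the request.
    address = search_request["file_address"]
    return file_store[address]

# Usage: a request that carries only the address of the picture the user selected.
file_store = {"http://example.com/frame_001.jpg": b"<binary picture data>"}
request = {"file_address": "http://example.com/frame_001.jpg"}
print(resolve_target_file(request, file_store))
```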
S202: and acquiring a description text corresponding to the target file in the associated webpage.
In this embodiment, the associated web page refers to at least one web page associated with the target file selected by the user. Specifically, the associated web pages include web pages displaying or playing audio-visual files having the same content as the target file and/or web pages displaying or playing audio-visual files having similar content to the target file. The following is a detailed description:
when the target file is a video file, the associated webpage plays a video file with the same content or similar content as the video file, or the associated webpage displays pictures which are the same as or similar to a plurality of frames in the video file; when the target file is an audio file, the associated webpage plays an audio file with the same content or similar content as the audio file; when the target file is a picture, the associated web page displays a picture with the same or similar content as the picture.
On a web page associated with a target file, the target file is generally described by texts such as a title, a body, a tag text and the like of the web page, for example, the video and audio information such as a video and audio name corresponding to the target file is described. Therefore, after determining the associated web page associated with the target file selected by the user, the embodiment obtains the description text corresponding to the target file from the texts such as the title, the text, the label text and the like of the associated web page.
S203: and determining the video and audio information corresponding to the target file according to the description text.
Because the content of the obtained description text usually describes the target file selected by the user, the audio-visual information corresponding to the target file selected by the user can be determined by analyzing the obtained description text.
In the embodiment of the invention, the audio-visual information may include related information such as an audio-visual name and audio-visual details, for example the actors or director of a video file, or the singer and album name of an audio file. The audio-visual information may also include audio-visual resources; for example, if the target file is a video clip of a movie, the audio-visual information may include the video resource of the movie to which that clip belongs.
Further, after the video and audio information corresponding to the target file is determined, the video and audio information can be displayed and/or played. For example, when the audio-visual information is an audio-visual name or information of an actor, a director, etc., the audio-visual name or information of the actor, the director, etc. can be displayed. For another example, when the audio-visual information is an audio resource or a video resource, the audio resource or the video resource may be played.
As can be seen from the above technical solution, in this embodiment a target file is obtained in response to a search request of a user, where the target file includes any one or more of a picture, an audio file and a video file; an associated webpage associated with the target file is determined, and the corresponding audio-visual information is analyzed out of the description text corresponding to the target file in the associated webpage. As this process shows, when searching for the audio-visual information corresponding to the target file, the audio-visual information is determined by extracting and analyzing the description text of web pages, and there is no need to extract a large number of audio-visual clips from audio files or video files to build indexes; since the description text of a webpage occupies fewer storage resources than files such as pictures, audio clips and video clips, the requirement on storage capacity is reduced.
In addition, the target file in the embodiment of the invention can be an audio-visual clip and the audio-visual information can be an audio-visual resource, so the searching method of the embodiment of the invention can be used to search for the audio-visual resource to which an audio-visual clip belongs, or an audio-visual resource related to it. For example, the corresponding movie can be searched for according to a single frame of the movie, or according to a related poster or trailer of the movie, so the search scope is wider.
In one implementation of the embodiment of the present invention, the associated webpage may be acquired first, and the description text corresponding to the target file is then acquired from the associated webpage; a specific way of acquiring the associated webpage is described below by way of example. In the embodiment of the invention, after the target file is acquired, the associated webpage can be determined according to the target file, for example by querying for web pages that display or play an audio-visual file with the same or similar content as the target file, and taking these as the associated web pages. Alternatively, in the embodiment of the present invention, a correspondence between a webpage file and the web pages including that webpage file may be established in advance, and the associated webpage corresponding to the target file may be queried according to this correspondence, as described in detail below.
The pre-established correspondence may be specifically a correspondence between an identifier of a web page file and a web page including the web page file, where the web page file may include any one or more of a picture, an audio file, and a video file. For example, a plurality of available pictures are extracted from a large number of web pages, and a correspondence relationship between the identification of the picture and the web page on which the picture is displayed is established.
Since webpage files extracted from different web pages may have identical content, a globally unified identifier may be employed for webpage files with identical content. For example, a picture a is displayed on a webpage A, and a picture b with the same content as picture a is displayed on a webpage B; it can be determined from the feature vectors of picture a and picture b that the two pictures have the same content, so the two pictures can correspond to one unified picture identifier, and a correspondence between that picture identifier and the plurality of web pages is obtained. Specifically, a standard file library is established, a plurality of standard files are stored in the standard file library, and each standard file has a globally unified identifier. When the associated webpage is queried, the target file can be matched against the standard files in the standard file library to determine the standard file matched with the target file, and the associated webpage corresponding to that standard file is then queried according to the correspondence between the webpage file and the web pages including the webpage file. The standard file may include any one or more of a picture, an audio file and a video file. For example, a plurality of standard pictures are stored in a standard picture library, each standard picture corresponding to a unique picture identifier; the target picture selected by the user is matched against the standard picture library to determine the matched standard picture, and the identifier of that standard picture is used to query the correspondence between the identifier of a webpage picture and the web pages displaying it, thereby obtaining the associated web pages.
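As a non-limiting illustration, the correspondence between a globally unified picture identifier and the web pages displaying that picture might be kept as a simple mapping; the Python sketch below (identifiers and URLs are invented for the example) shows how an associated webpage would be queried from it.

```python
# Illustrative sketch: pictures with identical content share one globally
# unified identifier, and that identifier maps to every web page on which
# the picture is displayed. Identifiers and URLs are invented for the example.

picture_to_webpages = {
    "ID01": ["http://site-a.example/page1", "http://site-b.example/page7"],
}

def query_associated_webpages(matched_standard_id: str) -> list:
    """Return the associated web pages recorded for a matched standard picture."""
    return picture_to_webpages.get(matched_standard_id, [])

print(query_associated_webpages("ID01"))
```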
When the standard files are stored in the standard file library, their feature vectors may specifically be stored, and when determining the matched standard file, the feature vector of the target file is compared with the feature vectors of the standard files. In order to reduce the number of comparisons and improve the matching speed, the audio-visual features of each standard file can be extracted and converted into a one-dimensional or multi-dimensional feature vector according to a specific algorithm; the vector elements in the feature vector are used as entries to establish an inverted index of the standard files, and the inverted index is stored in the standard file library, as specifically described below.
Specifically, the feature vector of a standard file includes a plurality of vector elements, and an index relationship between each vector element of the standard file and the standard file is established. More specifically, an index relationship between each vector element and the identifier of the standard file may be established. For example, if the feature vector of a standard file A is (a1, a2, a3) and the identifier of standard file A is ID01, the established index relationships include: the index relationship between vector element a1 and ID01, the index relationship between vector element a2 and ID01, and the index relationship between vector element a3 and ID01. The feature vector may include any one or more of a texture feature vector, a color feature vector, a shape feature vector and a spatial relationship feature vector, which is not limited in the embodiment of the present invention.
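A minimal sketch of such an inverted index, assuming the vector elements can be used directly as dictionary keys, might look as follows; it mirrors the (a1, a2, a3) -> ID01 example above.

```python
# Sketch of the inverted index described above: each vector element of a
# standard file becomes an entry pointing back to that file's identifier.

from collections import defaultdict

def build_inverted_index(standard_files: dict) -> dict:
    """standard_files maps a standard-file identifier to its feature vector."""
    index = defaultdict(set)
    for file_id, feature_vector in standard_files.items():
        for element in feature_vector:
            index[element].add(file_id)  # entry: vector element -> file identifiers
    return index

# Example mirroring the text: standard file A has feature vector (a1, a2, a3).
index = build_inverted_index({"ID01": ["a1", "a2", "a3"]})
print(index["a1"])  # {'ID01'}
```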
When matching a target file against the standard files by using the inverted index, specifically, the feature vector of the target file is obtained, where the feature vector comprises a plurality of vector elements; for each vector element of the target file, the identical vector element is looked up in the standard file library, the set of standard files matched with that vector element is determined according to the index relationship, and the intersection of these standard file sets is taken as the standard file matched with the target file in the standard file library. The following is an illustration.
After the target file H selected by the user is obtained, its feature vector is extracted as [m, n]. The set X of identifiers of standard files in the standard file library matched by vector element m is {ID01, ID02, ID03}, and the set Y of identifiers of standard files matched by vector element n is {ID02, ID04}. Taking the intersection of set X and set Y gives {ID02}, and ID02 in the intersection is the identifier of the standard file in the standard file library that matches the target file H.
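The matching step can then be sketched as an intersection of the identifier sets returned by the inverted index; the example below reproduces the sets X and Y above and is only illustrative.

```python
# Sketch of the matching step: look up each vector element of the target file
# in the inverted index and intersect the resulting sets of identifiers.

def match_standard_files(target_vector, inverted_index):
    candidate_sets = [inverted_index.get(element, set()) for element in target_vector]
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)

# Example from the text: element m matches {ID01, ID02, ID03},
# element n matches {ID02, ID04}; the intersection is {ID02}.
inverted_index = {"m": {"ID01", "ID02", "ID03"}, "n": {"ID02", "ID04"}}
print(match_standard_files(["m", "n"], inverted_index))  # {'ID02'}
```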
In another implementation of the embodiment of the present invention, the associated webpage does not need to be acquired; instead, a correspondence between the webpage file and the description text in the web pages including the webpage file is established in advance, and the description text corresponding to the target file is queried directly according to this correspondence, where the queried description text is in fact the description text corresponding to the target file in the associated webpage. This correspondence can be obtained from the correspondence between the webpage file and the web pages including the webpage file, together with the correspondence between a webpage and the description text in that webpage. For example, the identifier ID01 of picture A corresponds to the address a of the webpage including picture A, and the address a of the webpage corresponds to the description text M1 of that webpage; from these correspondences, the correspondence between the identifier ID01 of picture A and the description text M1 can be obtained. Therefore, in this embodiment, the description text corresponding to the target file is queried directly according to the correspondence between the webpage file and the description text, which can improve the query speed of the description text and further save storage resources.
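A sketch of composing the two correspondences into a single "picture identifier -> description text" mapping, using the ID01 / address a / text M1 example above (all values are illustrative):

```python
# Sketch of the pre-computed correspondence described above: the mapping
# "picture identifier -> description text" is obtained by composing
# "picture identifier -> web page address" with "web page address -> description text".
# All values follow the ID01 / address a / text M1 example.

picture_to_page = {"ID01": "address_a"}        # picture A appears on web page a
page_to_description = {"address_a": "M1"}      # web page a carries description text M1

def build_picture_to_description(pic_to_page: dict, page_to_desc: dict) -> dict:
    return {pic: page_to_desc[page]
            for pic, page in pic_to_page.items()
            if page in page_to_desc}

picture_to_description = build_picture_to_description(picture_to_page, page_to_description)
print(picture_to_description["ID01"])  # 'M1', queried without touching the web page itself
```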
In S203 in the embodiment of the present invention, the audio-visual information corresponding to the target file is determined according to the description text, and a specific implementation manner of determining the audio-visual information according to the description text is described in the following.
In the embodiment of the invention, an audio-visual recognition model can be trained in advance through deep learning or similar means; the description text is input into the pre-trained audio-visual recognition model, and the audio-visual information corresponding to the target file is obtained through the audio-visual recognition model. The audio-visual information output by the model can include audio-visual names, audio-visual detail information and the like. For example, the description text is input into the audio-visual recognition model, and the model outputs one or more audio-visual names.
When determining the audio-visual information, in an optional implementation, audio-visual candidate words are obtained from the description text, and the first audio-visual information corresponding to the target file is determined according to the audio-visual candidate words. The first audio-visual information may be an audio-visual name or audio-visual detail information (such as a director or actor name). After the first audio-visual information is determined, second audio-visual information corresponding to the target file may be searched for further according to the first audio-visual information; for example, related information such as a poster, director, brief introduction, play link or singer name may be searched for in an audio-visual encyclopedia or audio-visual information index according to the audio-visual name.
When extracting the audio-visual candidate words, specifically, keywords are extracted from the description text, for example audio-visual names, and the keywords are matched with an audio-visual candidate word library to obtain the audio-visual candidate words. The keywords may be extracted in a number of ways; for example, the keywords may be extracted from the description text by the audio-visual recognition model, or the description text may be segmented into words and filtered to obtain the extracted keywords. The audio-visual candidate word library may include a plurality of audio-visual candidate words, for example a plurality of movie names, and the keywords extracted from the description text are matched with the movie names in the candidate word library so that a matched movie name is determined as the finally extracted movie name. The audio-visual candidate word library may also include a correspondence between audio-visual candidate words and audio-visual detail information, for example a correspondence between actor names and movie names; the keywords extracted from the description text are matched against this correspondence, specifically against the audio-visual detail information, and the corresponding audio-visual candidate word is determined from the correspondence according to the matched detail information. For example, the actor names Z and S are extracted from the description text, and the corresponding movie name GG is determined according to the correspondence between actor names and movie names.
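A minimal, illustrative sketch of this candidate-word matching, assuming a toy candidate word library and using the Z, S and GG names from the example above:

```python
# Illustrative sketch of candidate-word matching: keywords extracted from the
# description text are matched against a toy movie-name list and against a
# correspondence between actor names and movie names (the Z / S -> GG example).
# The keyword extraction itself is assumed to have already happened.

movie_names = {"GG", "HH"}
actor_to_movie = {"Z": "GG", "S": "GG"}

def match_candidates(keywords):
    candidates = set()
    for keyword in keywords:
        if keyword in movie_names:          # the keyword is itself a movie name
            candidates.add(keyword)
        if keyword in actor_to_movie:       # the keyword is an actor name; map it
            candidates.add(actor_to_movie[keyword])
    return candidates

print(match_candidates(["Z", "S"]))  # {'GG'}
```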
When multiple audio-visual candidate words are extracted, the multiple candidate words can be screened, and the one or more screened candidate words are used as the first audio-visual information corresponding to the target file. At the time of screening, the following parameters may be used: the number of times an audio-visual candidate word occurs in the description text, the webpage attribute of the associated webpage where the candidate word is located, and the context information of the associated webpage where the candidate word is located. These are described separately below.
The more often an audio-visual candidate word occurs in the description text, the more likely it is to be the corresponding audio-visual information, so the candidate words can be screened according to their numbers of occurrences; for example, the candidate words whose occurrence counts rank highest, or whose occurrence counts are greater than a preset threshold, are taken as the screened candidate words. For example, matching the keywords extracted from the description text with the audio-visual candidate word library yields candidate word A, candidate word B and candidate word C. Assuming that candidate word A occurs 45 times in the description text, candidate word B occurs 60 times and candidate word C occurs 55 times, then candidate word B, with the largest number of occurrences, may be taken as the screened candidate word, or candidate words B and C, whose occurrence counts are greater than a preset threshold (e.g. 50), may be taken as the screened candidate words.
The webpage attribute of the associated webpage where an audio-visual candidate word is located may include any one or more of attributes such as the webpage category, the number of times the webpage has been browsed, and webpage tags, and the candidate words can be screened by these attributes. Taking the webpage category as an example: suppose the associated web pages include a novel-category webpage and a film-and-television-category webpage; matching the keywords extracted from the film-and-television-category webpage with the audio-visual candidate word library yields candidate word D, and matching the keywords extracted from the novel-category webpage yields candidate word E; according to the webpage category, candidate word D is taken as the screened candidate word.
In addition, the audio-visual candidate words can be screened using the context information on the associated webpage where they are located. For example, if the context information of the associated webpage where a candidate word is located includes audio-visual related words such as "director", "actor", "scenario introduction" or "singer", that candidate word is taken as a screened candidate word.
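For illustration, the sketch below combines two of the above screening parameters, the occurrence count (with the 45/60/55 counts and the threshold 50 from the example) and the context-word check; it is only one possible way of applying the screening.

```python
# Illustrative sketch combining two of the screening parameters: keep candidate
# words whose occurrence count in the description text exceeds a preset
# threshold and whose surrounding context contains audio-visual related words.

RELATED_WORDS = {"director", "actor", "scenario introduction", "singer"}

def screen_candidates(counts: dict, contexts: dict, threshold: int = 50):
    kept = []
    for word, count in counts.items():
        if count <= threshold:
            continue                                  # occurrence count too low
        context = contexts.get(word, "")
        if any(rel in context for rel in RELATED_WORDS):
            kept.append(word)                         # context mentions director/actor/...
    return kept

# Candidate word A occurs 45 times, B 60 times, C 55 times, as in the example.
counts = {"A": 45, "B": 60, "C": 55}
contexts = {"B": "director X, starring actor Y", "C": "novel chapter list"}
print(screen_candidates(counts, contexts))  # ['B']
```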
The embodiment of the searching method can be applied to a terminal or a server. Specifically, when the method is applied to a terminal, the terminal can detect a search request of a user generated based on the operation of the user on a target file, acquire the target file in response to the search request, and then determine audio-visual information corresponding to the target file from a description text corresponding to the target file in an associated webpage; when the method is applied to the server, the server can receive a search request of a user sent by the terminal, acquire a target file according to the search request of the user, further determine audio-visual information corresponding to the target file from description texts corresponding to the target file in the associated webpage, and return the audio-visual information to the terminal.
When the above embodiment of the search method is applied to the server, the implementation procedure of the terminal side may refer to the flow shown in fig. 3, which is specifically as follows:
s301: and detecting the operation of the user on the target file.
Wherein the target file includes one or more of a picture, an audio file, and a video file.
S302: and sending a search request of a user to a server, wherein the search request carries the identification of the target file and is used for requesting the server to search the video and audio information corresponding to the target file.
The identification of the target file may include an address of the target file, or may include a feature vector of the target file, etc.
S303: and receiving the video and audio information corresponding to the target file returned by the server.
For the process by which the server obtains the audio-visual information, reference may be made to the embodiment corresponding to fig. 2, which is not described again here.
As an exemplary embodiment, when a user browses to and is interested in a target file being displayed or played on a web page of a terminal, the user may perform a selection operation on the target file to request to search for audio-visual information corresponding to the target file. Therefore, when detecting the operation of the user on the target file, the terminal can respond to the selection operation of the user, acquire the identification of the target file, such as the feature vector of the target file or the address of the target file, from the webpage, and send the search request to the server after generating the search request containing the identification of the target file, so that the server can search the video and audio information corresponding to the target file according to the identification of the target file in the user search request. The terminal can further display and/or play the video and audio information.
In one embodiment, the browser plug-in can be installed on the terminal to detect and respond to the operation of the user on the target file, so that the user can quickly and conveniently select the target file of interest and search, the learning cost and the using cost of the user are low, and the user experience is improved. For example, as shown in fig. 4, taking a target file as a picture, a user moves a cursor around the picture selected by the user through a mouse, a touch screen or the like, then clicks a right button, and selects a corresponding item from a pop-up menu box, for example, clicks a "picture search engine" in the menu box, and after detecting the above operation of the user, the browser plug-in sends a search request to the server in response to the above operation, where the search request carries an address of the picture selected by the user or carries a feature vector of the picture selected by the user. The user may also implement the operation on the picture in other manners, for example, by clicking the left button for a long time, which is not limited in the embodiment of the present invention.
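As a purely illustrative sketch of the server side of this interaction, the snippet below uses Flask and a field named picture_address, both of which are assumptions for the example and not part of the embodiment; the matching and text-analysis steps are left as a placeholder.

```python
# Purely illustrative server-side sketch of receiving the plug-in's search
# request. The use of Flask and the field name "picture_address" are
# assumptions for this example and are not part of the embodiment.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/search", methods=["POST"])
def search():
    body = request.get_json()
    picture_address = body["picture_address"]   # address of the picture the user selected
    # ... fetch the picture, match it against the standard picture library,
    # query the corresponding description text and determine the audio-visual
    # information; a placeholder result is returned below ...
    audiovisual_info = {"movie_name": "GG", "picture_address": picture_address}
    return jsonify(audiovisual_info)

if __name__ == "__main__":
    app.run()
```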
Further, in the embodiment of the invention, the field and/or type of the video and audio information to be searched can be determined through the operation of the user on the target file, and the field and/or type is used as screening information. When the audio-visual information is determined in step S203, the audio-visual information corresponding to the target file and the filtering information at the same time is specifically determined. Wherein, the screening information can be determined according to the operation of the user.
For example, a user selects a picture as a target file, and according to the picture, video and audio information in a plurality of fields such as a video field and a variety field can be determined, and if the user selects the video field, the video and audio information in the video field such as a movie file corresponding to the picture can be determined; if the user selects the variety field, the video and audio information of the variety field such as a picture in the variety program corresponding to the picture can be determined. For example, when the user selects an audio clip as the target file, various types of audio-video information such as text, video, audio and the like can be determined according to the audio clip, if the user selects the text, the audio-video information such as album name, singer and the like corresponding to the audio clip can be determined, and if the user selects the audio clip, the audio-video information such as the audio file and the like to which the audio clip belongs can be determined.
The method may, in response to the user's search operation, provide the user with audio-visual information of all fields and/or all types; for example, in response to the user's right-click operation, a menu box pops up, and when the user clicks "picture search engine" in the menu box, a submenu box pops up providing the searchable fields and/or types of information. Alternatively, a preset number of searched fields and/or types of information may be provided first, with other fields and/or types of information provided further according to a trigger operation of the user.
In the following, a specific scenario is taken as an example to describe a specific embodiment of the method for searching audio-visual information. In this scenario, a user browses a web page in a browser and is interested in a still picture of "actor Q" (hereinafter the "actor Q" still) being displayed on the web page, and wants to obtain the movie name corresponding to the "actor Q" still by triggering a search for it.
Referring to fig. 5, fig. 5 is a flow chart illustrating another embodiment of a searching method according to an embodiment of the invention. The method specifically comprises the following steps:
s501: the user moves the cursor to the "actor Q" episode being displayed on the web page by moving the mouse and right-clicking the mouse to cause the web page to pop up a menu box.
S502: the user selects a corresponding item in the pop-up menu box to trigger a search for the movie name corresponding to the "actor Q" episode.
S503: and the browser plug-in responds to the selection operation of the user and acquires the address of the 'actor Q' scenario.
The browser plug-in can be installed in a WebKit-based browser, such as Chrome or the Sogou browser.
S504: the browser plug-in generates a search request for the user, wherein the search request for the user carries an address of the "actor Q" episode.
S505: the browser plug-in sends the search request of the user to the server.
S506: and the server acquires the 'actor Q' scenario according to the address of the 'actor Q' scenario in the search request of the user.
S507: the server extracts the feature vector of the "actor Q" episode.
S508: and the server determines a standard picture matched with the 'actor Q' scenario from a picture library according to the feature vector of the 'actor Q' scenario.
The standard pictures are pictures stored in a standard picture library, and each standard picture has a globally unique identifier. And the matched standard pictures are pictures with the same or similar contents as the 'actor Q' drama in the standard picture library.
When the matched standard pictures are determined, the feature vector of the 'actor Q' scenario can be obtained, the feature vector comprises a plurality of vector elements, then the set of the identifications of the standard pictures corresponding to each vector element is obtained from a picture library, the set of the identifications of the standard pictures corresponding to each vector element is subjected to intersection, and the identifications of the standard pictures in the intersection are used as the identifications of the standard pictures matched with the 'actor Q' scenario selected by a user.
S509: and the server inquires the description text corresponding to the standard picture.
As an example, the server may first obtain a first correspondence between the identifier of the standard picture and the web page address, query the web page address corresponding to the identifier of the matched standard picture according to the first correspondence, obtain a second correspondence between the web page address and the description text of the web page, and obtain the description text corresponding to the matched standard picture by using the second correspondence and the queried web page address; in another example, the server may obtain a correspondence between the identifier of the standard picture and the descriptive text of the web page, and query the descriptive text corresponding to the identifier of the matched standard picture using the correspondence.
S510: the server analyzes the queried description text and determines the movie name corresponding to the "actor Q" still.
In an exemplary embodiment, after the description text is obtained, it may be input into a trained audio-video recognition model to obtain the movie name output by the model. The audio-video recognition model can extract keywords from the description text, match the extracted keywords against an audio-video candidate word library to obtain a plurality of matched audio-video candidate words, and then screen these matched candidates to select one audio-video candidate word as the determined movie name.
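The keyword extraction, candidate matching, and screening can be sketched as follows; the tokenization, the contents of the candidate word library, and the choice of screening by occurrence count alone are assumptions for illustration (the embodiment also allows screening by web page attributes or context information):

# Sketch of step S510: extract keywords from the description text, match them
# against an audio-video candidate word library, and screen the matched
# candidates by how often each occurs in the text.
import re
from collections import Counter
from typing import Optional

# Illustrative candidate word library (e.g., known movie titles). A real
# library would also associate detail information with each candidate word.
CANDIDATE_LIBRARY = {"MovieM", "MovieN"}

def extract_keywords(description_text: str) -> list[str]:
    # Rough stand-in for keyword extraction: word-like tokens of length >= 2.
    return re.findall(r"\w{2,}", description_text)

def determine_movie_name(description_text: str) -> Optional[str]:
    keywords = extract_keywords(description_text)
    matched = [kw for kw in keywords if kw in CANDIDATE_LIBRARY]
    if not matched:
        return None
    # Screen the matched candidates by occurrence count in the description text.
    return Counter(matched).most_common(1)[0][0]

print(determine_movie_name("Still of actor Q from MovieM; MovieM premiered in ..."))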
S511: the server searches for information related to the movie according to the determined movie name.
S512: the server sends the relevant information of the movie to the browser plug-in.
S513: the browser plug-in displays relevant information of the movie on the page and/or plays the movie.
On the one hand, when the user comes across an interesting "actor Q" still while browsing a web page, the user can directly select the still to trigger a search for the movie corresponding to it, so the target file is selected conveniently and quickly, and the learning and usage costs for the user are low. On the other hand, the server does not need to extract and store a large number of key frames from the movie in advance; instead, it analyzes the description text corresponding to the standard picture matched with the "actor Q" still to obtain the movie name corresponding to the still. Since description text occupies far fewer storage resources than pictures, audio clips, or video clips, the requirement on storage capacity is reduced.
Corresponding to the method embodiment shown in fig. 2, the embodiment of the present invention further provides a corresponding device embodiment, which is specifically described below.
Referring to fig. 6, an embodiment of the present invention provides an apparatus embodiment of a search apparatus, including: a first acquisition unit 601, a second acquisition unit 602, and a determination unit 603.
The first obtaining unit 601 is configured to obtain, in response to a search request of a user, a target file, where the target file includes one or more of a picture, an audio file, and a video file.
The second obtaining unit 602 is configured to obtain a description text corresponding to the target file in an associated web page, where the associated web page is at least one web page associated with the target file.
The determining unit 603 is configured to determine, according to the description text, the audio-visual information corresponding to the target file.
Optionally, the second obtaining unit includes:
the query unit is used for querying the associated webpage corresponding to the target file according to the corresponding relation between the pre-established webpage file and the webpage comprising the webpage file; the webpage file comprises one or more of a picture, an audio file and a video file;
and the third acquisition unit is used for acquiring the description text corresponding to the target file from the associated webpage.
Optionally, the query unit includes:
The first determining subunit is used for determining standard files matched with the target file in the standard file library;
and the first inquiring subunit is used for inquiring the associated webpage corresponding to the standard file according to the corresponding relation.
Optionally, the first determining subunit is specifically configured to obtain a feature vector of the target file, where the feature vector includes a plurality of vector elements;
obtain, in the standard file library, according to the index relation between the vector elements of the standard files and the standard files, a standard file set matched with each vector element of the target file; and take the intersection of the standard file sets as the standard files in the standard file library that match the target file.
Optionally, the second obtaining unit includes:
the second query subunit is used for querying the description text corresponding to the target file according to the corresponding relation between the pre-established webpage file and the description text in the webpage comprising the webpage file; the web page file includes one or more of a picture, an audio file, and a video file.
Optionally, the second obtaining unit includes:
the second determining subunit is used for determining the associated webpage according to the target file;
The first acquisition subunit is used for acquiring the description text corresponding to the target file from the associated webpage.
Optionally, the determining unit includes:
the input subunit is used for inputting the description text into a pre-trained video and audio recognition model;
and the identification subunit is used for obtaining the video and audio information corresponding to the target file through the video and audio identification model.
Optionally, the determining unit includes:
the second acquisition subunit is used for acquiring the video and audio candidate words according to the description text;
and the third determination subunit is used for determining the first video information corresponding to the target file according to the video candidate words.
Optionally, the second obtaining subunit is specifically configured to extract a keyword from the description text, and match the keyword with the audio-video candidate word bank to obtain an audio-video candidate word.
Optionally, the third determining subunit is specifically configured to screen the audio-video candidate words according to at least one of the following parameters: the number of occurrences of the audio-video candidate words in the description text, the web page attribute of the associated web page where the audio-video candidate words are located, and the context information of the associated web page where the audio-video candidate words are located.
Optionally, the determining unit further comprises: a third acquisition subunit, configured to acquire the second video and audio information corresponding to the target file according to the first video and audio information.
Optionally, the first obtaining unit is specifically configured to receive a search request of a user, and obtain an address of a target file carried in the search request; and acquiring the target file according to the address of the target file.
The specific manner in which the respective units perform the operations in the apparatus of the above embodiment has been described in detail in the embodiment of the method corresponding to fig. 2, and will not be described in detail here.
Corresponding to the method embodiment shown in fig. 3, the embodiment of the present invention further provides a corresponding device embodiment, which is specifically described below.
Referring to fig. 7, another embodiment of a search apparatus is provided, including: a detection unit 701, a transmission unit 702, and a reception unit 703.
The detection unit 701 is configured to detect an operation of a user on a target file, where the target file includes one or more of a picture, an audio file, and a video file.
The sending unit 702 is configured to send a search request of the user to a server, where the search request carries an identifier of the target file.
The receiving unit 703 is configured to receive the video and audio information corresponding to the target file returned by the server.
Optionally, the detecting unit is specifically configured to detect, by using a browser plug-in, an operation of a user on a target file; the sending unit is specifically configured to send a search request of a user to a server through the browser plug-in, where the search request carries an address of the target file.
Optionally, the apparatus further comprises a display unit and/or a play unit, where the display unit is configured to display the video and audio information and the play unit is configured to play the video and audio information.
The specific manner in which the respective units perform the operations in the apparatus of the above embodiment has been described in detail in the embodiment of the method corresponding to fig. 3, and will not be described in detail here.
Fig. 8 is a block diagram illustrating an apparatus 800 for searching according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 8, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect the on/off state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or of one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in the temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the apparatus 800 and other devices, either in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions stored thereon which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a search method, the method comprising:
responding to a search request of a user, and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
acquiring a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
and determining the video and audio information corresponding to the target file according to the description text.
A non-transitory computer readable storage medium having instructions stored thereon which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a search method, the method comprising:
detecting an operation of a user on a target file, wherein the target file comprises one or more of a picture, an audio file and a video file;
sending a search request of the user to a server, wherein the search request carries the identification of the target file;
and receiving the video and audio information corresponding to the target file returned by the server.
Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage media 1930 may be transient or persistent storage. The programs stored in the storage media 1930 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage media 1930 and to execute, on the server 1900, the series of instruction operations stored in the storage media 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings and described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (35)

1. A search method, comprising:
responding to a search request of a user, and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file; the search request is generated by a browser plug-in installed on the terminal in response to a user selection operation of the target file being displayed or played on a web page;
Acquiring a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
according to the description text, determining the video and audio information corresponding to the target file specifically comprises the following steps:
extracting keywords from the descriptive text;
matching the keywords with the audio and video detail information in the audio and video candidate word library, and determining the matched audio and video candidate words from the corresponding relation between the audio and video detail information in the audio and video candidate word library according to the matched audio and video detail information; the audio-video candidate word library comprises the corresponding relation between the audio-video detail information and the audio-video candidate words;
determining first audio-visual information corresponding to the target file according to the matched audio-visual candidate words, and sending the first audio-visual information to the browser plug-in, so that the browser plug-in displays and/or plays the first audio-visual information on a page, wherein the first audio-visual information comprises audio-visual resources to which the target file belongs or is associated;
and acquiring second video and audio information corresponding to the target file according to the first video and audio information.
2. The method of claim 1, wherein the obtaining the descriptive text corresponding to the target file in the associated web page includes:
Inquiring an associated webpage corresponding to the target file according to a corresponding relation between a pre-established webpage file and a webpage comprising the webpage file; the webpage file comprises one or more of a picture, an audio file and a video file;
and acquiring the description text corresponding to the target file from the associated webpage.
3. The searching method according to claim 2, wherein the querying the associated web page corresponding to the target file according to the correspondence between the pre-established web page file and the web page including the web page file includes:
determining standard files matched with the target file in a standard file library;
and inquiring the associated web page corresponding to the standard file according to the corresponding relation.
4. A search method according to claim 3, wherein said determining a standard file in a standard file library to which said target file matches comprises:
acquiring a feature vector of the target file, wherein the feature vector comprises a plurality of vector elements;
in the standard file library, according to the index relation between the vector elements of the standard file and the standard file, acquiring a standard file set matched with the vector elements of the target file;
And taking the intersection of the standard file sets as the standard files matched with the target file in the standard file library.
5. The method of claim 1, wherein the obtaining the descriptive text corresponding to the target file in the associated web page includes:
inquiring the description text corresponding to the target file according to the corresponding relation between the pre-established webpage file and the description text in the webpage comprising the webpage file; the web page file includes one or more of a picture, an audio file, and a video file.
6. The method of claim 1, wherein the obtaining the descriptive text corresponding to the target file in the associated web page includes:
determining an associated webpage according to the target file;
and acquiring the description text corresponding to the target file from the associated webpage.
7. The method of claim 1, wherein determining the video and audio information corresponding to the target file according to the description text comprises:
inputting the description text into a pre-trained video and audio recognition model;
and obtaining the video and audio information corresponding to the target file through the video and audio identification model.
8. The method of claim 1, wherein the determining the first audio-visual information corresponding to the target file according to the matched audio-visual candidate word includes:
screening the matched video candidate words according to at least one of the following parameters: the number of occurrences of the matched video candidate words in the descriptive text, the webpage attribute of the associated webpage where the matched video candidate words are located, and the context information of the associated webpage where the matched video candidate words are located.
9. The search method according to claim 1, wherein acquiring the target file in response to the search request of the user includes:
receiving a search request of a user, and acquiring an address of a target file carried in the search request;
and acquiring the target file according to the address of the target file.
10. A search method, comprising:
detecting the selection operation of a user on a target file displayed or played on a webpage through a browser plug-in installed on a terminal, and generating a search request of the user, wherein the target file comprises one or more of a picture, an audio file and a video file;
Sending a search request of the user to a server so that the server obtains a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file; extracting keywords from the descriptive text; matching the keywords with the audio and video detail information in the audio and video candidate word library, and determining the matched audio and video candidate words from the corresponding relation between the audio and video detail information in the audio and video candidate word library according to the matched audio and video detail information; the audio-video candidate word library comprises the corresponding relation between the audio-video detail information and the audio-video candidate words; determining first video information corresponding to the target file according to the matched video candidate words, and sending the first video information to the browser plug-in; the search request carries the identification of the target file;
receiving first video and audio information corresponding to the target file returned by the server through the browser plug-in, and displaying and/or playing the first video and audio information on a page, wherein the first video and audio information comprises video and audio resources to which the target file belongs or is associated with; the server is also used for obtaining second video and audio information corresponding to the target file according to the first video and audio information.
11. The method of searching according to claim 10, wherein,
sending a search request of the user to a server includes: sending the search request of the user to the server through the browser plug-in, wherein the search request carries the address of the target file.
12. A search apparatus, comprising:
the first acquisition unit is used for responding to a search request of a user and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file; the search request is generated by a browser plug-in installed on the terminal in response to a user selection operation of the target file being displayed or played on a web page;
the second acquisition unit is used for acquiring descriptive text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
the determining unit is used for determining the video and audio information corresponding to the target file according to the description text;
the determination unit includes:
a second obtaining subunit, configured to extract a keyword from the description text; matching the keywords with the audio and video detail information in the audio and video candidate word library, and determining the matched audio and video candidate words from the corresponding relation between the audio and video detail information in the audio and video candidate word library according to the matched audio and video detail information; the audio-video candidate word library comprises the corresponding relation between the audio-video detail information and the audio-video candidate words;
a third determining subunit, configured to determine, according to the matched audio-video candidate words, first audio-video information corresponding to the target file and to send the first audio-video information to the browser plug-in, so that the browser plug-in displays and/or plays the first audio-video information on a page, where the first audio-video information comprises an audio-video resource to which the target file belongs or with which it is associated;
and the third acquisition subunit is used for acquiring the second video and audio information corresponding to the target file according to the first video and audio information.
13. The search device according to claim 12, wherein the second acquisition unit includes:
the query unit is used for querying the associated webpage corresponding to the target file according to the corresponding relation between the pre-established webpage file and the webpage comprising the webpage file; the webpage file comprises one or more of a picture, an audio file and a video file;
and the third acquisition unit is used for acquiring the description text corresponding to the target file from the associated webpage.
14. The search apparatus of claim 13, wherein the query unit comprises:
the first determining subunit is used for determining standard files matched with the target file in the standard file library;
And the first inquiring subunit is used for inquiring the associated webpage corresponding to the standard file according to the corresponding relation.
15. The apparatus according to claim 14, wherein the first determining subunit is specifically configured to obtain a feature vector of the target file, where the feature vector includes a plurality of vector elements;
in the standard file library, according to the index relation between the vector elements of the standard file and the standard file, acquiring a standard file set matched with the vector elements of the target file; and taking the intersection of the standard file sets as the standard files matched with the target file in the standard file library.
16. The search device according to claim 12, wherein the second acquisition unit includes:
the second query subunit is used for querying the description text corresponding to the target file according to the corresponding relation between the pre-established webpage file and the description text in the webpage comprising the webpage file; the web page file includes one or more of a picture, an audio file, and a video file.
17. The search device according to claim 12, wherein the second acquisition unit includes:
The second determining subunit is used for determining the associated webpage according to the target file;
the first acquisition subunit is used for acquiring the description text corresponding to the target file from the associated webpage.
18. The search apparatus according to claim 12, wherein the determination unit includes:
the input subunit is used for inputting the description text into a pre-trained video and audio recognition model;
and the identification subunit is used for obtaining the video and audio information corresponding to the target file through the video and audio identification model.
19. The search device of claim 12, wherein the third determination subunit is specifically configured to screen the matched video candidate words according to at least one of the following parameters: the number of occurrences of the matched video candidate words in the descriptive text, the webpage attribute of the associated webpage where the matched video candidate words are located, and the context information of the associated webpage where the matched video candidate words are located.
20. The apparatus according to claim 12, wherein the first obtaining unit is specifically configured to receive a search request from a user, and obtain an address of a target file carried in the search request; and acquiring the target file according to the address of the target file.
21. A search apparatus, comprising:
the detection unit is used for detecting the selection operation of a user on a target file which is displayed or played on a webpage through a browser plug-in installed on the terminal and generating a search request of the user, wherein the target file comprises one or more of pictures, audio files and video files;
the sending unit is used for sending a search request of the user to a server so that the server obtains a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file; extracting keywords from the descriptive text; matching the keywords with the audio and video detail information in the audio and video candidate word library, and determining the matched audio and video candidate words from the corresponding relation between the audio and video detail information in the audio and video candidate word library according to the matched audio and video detail information; the audio-video candidate word library comprises the corresponding relation between the audio-video detail information and the audio-video candidate words; determining first video information corresponding to the target file according to the matched video candidate words, and sending the first video information to the browser plug-in; the search request carries the identification of the target file;
The receiving unit is used for receiving first video and audio information corresponding to the target file returned by the server through the browser plug-in; the server is also used for acquiring second video and audio information corresponding to the target file according to the first video and audio information;
the apparatus further comprises: a display unit and/or a play unit;
the display unit is used for displaying the first video and audio information;
the playing unit is used for playing the first video and audio information, and the first video and audio information comprises video and audio resources to which the target file belongs or is associated.
22. The apparatus according to claim 21, wherein the sending unit is specifically configured to send a search request of a user to a server through the browser plug-in, where the search request carries an address of the target file.
23. An apparatus for searching, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
responding to a search request of a user, and acquiring a target file, wherein the target file comprises one or more of a picture, an audio file and a video file; the search request is generated by a browser plug-in installed on the terminal in response to a user selection operation of the target file being displayed or played on a web page;
Acquiring a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file;
according to the description text, determining the video and audio information corresponding to the target file specifically comprises the following steps:
extracting keywords from the descriptive text;
matching the keywords with the audio and video detail information in the audio and video candidate word library, and determining the matched audio and video candidate words from the corresponding relation between the audio and video detail information in the audio and video candidate word library according to the matched audio and video detail information; the audio-video candidate word library comprises the corresponding relation between the audio-video detail information and the audio-video candidate words;
determining first audio-visual information corresponding to the target file according to the matched audio-visual candidate words, and sending the first audio-visual information to the browser plug-in, so that the browser plug-in displays and/or plays the first audio-visual information on a page, wherein the first audio-visual information comprises audio-visual resources to which the target file belongs or is associated;
and acquiring second video and audio information corresponding to the target file according to the first video and audio information.
24. The apparatus of claim 23, wherein the processor obtaining descriptive text corresponding to the target file in an associated web page comprises:
Inquiring an associated webpage corresponding to the target file according to a corresponding relation between a pre-established webpage file and a webpage comprising the webpage file; the webpage file comprises one or more of a picture, an audio file and a video file;
and acquiring the description text corresponding to the target file from the associated webpage.
25. The apparatus of claim 24, wherein the processor queries the associated web page corresponding to the target file based on a pre-established correspondence between web page files and web pages comprising the web page files, comprising:
determining standard files matched with the target file in a standard file library;
and inquiring the associated web page corresponding to the standard file according to the corresponding relation.
26. The apparatus of claim 25, wherein the processor determining a standard file for which the target file matches in a standard file library comprises:
acquiring a feature vector of the target file, wherein the feature vector comprises a plurality of vector elements;
in the standard file library, according to the index relation between the vector elements of the standard file and the standard file, acquiring a standard file set matched with the vector elements of the target file;
And taking the intersection of the standard file sets as the standard files matched with the target file in the standard file library.
27. The apparatus of claim 23, wherein the processor obtaining descriptive text corresponding to the target file in an associated web page comprises:
inquiring the description text corresponding to the target file according to the corresponding relation between the pre-established webpage file and the description text in the webpage comprising the webpage file; the web page file includes one or more of a picture, an audio file, and a video file.
28. The apparatus of claim 23, wherein the processor obtaining descriptive text corresponding to the target file in an associated web page comprises:
determining an associated webpage according to the target file;
and acquiring the description text corresponding to the target file from the associated webpage.
29. The apparatus of claim 23, wherein the processor determines the audiovisual information corresponding to the target file based on the descriptive text, comprising:
inputting the description text into a pre-trained video and audio recognition model;
and obtaining the video and audio information corresponding to the target file through the video and audio identification model.
30. The apparatus of claim 23, wherein the processor determining the first audiovisual information corresponding to the target file from the matched audiovisual candidate word comprises:
screening the matched video candidate words according to at least one of the following parameters: the number of occurrences of the matched video candidate words in the descriptive text, the webpage attribute of the associated webpage where the matched video candidate words are located, and the context information of the associated webpage where the matched video candidate words are located.
31. The apparatus of claim 23, wherein the processor, in response to a search request from a user, obtains the target file, comprising:
receiving a search request of a user, and acquiring an address of a target file carried in the search request;
and acquiring the target file according to the address of the target file.
32. An apparatus for searching, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
Detecting the selection operation of a user on a target file displayed or played on a webpage through a browser plug-in installed on a terminal, and generating a search request of the user, wherein the target file comprises one or more of a picture, an audio file and a video file;
sending a search request of the user to a server so that the server obtains a description text corresponding to the target file in an associated webpage, wherein the associated webpage is at least one webpage associated with the target file; extracting keywords from the descriptive text; matching the keywords with the audio and video detail information in the audio and video candidate word library, and determining the matched audio and video candidate words from the corresponding relation between the audio and video detail information in the audio and video candidate word library according to the matched audio and video detail information; the audio-video candidate word library comprises the corresponding relation between the audio-video detail information and the audio-video candidate words; determining first video information corresponding to the target file according to the matched video candidate words, and sending the first video information to the browser plug-in; the search request carries the identification of the target file;
Receiving first video and audio information corresponding to the target file returned by the server through the browser plug-in, and displaying and/or playing the first video and audio information on a page, wherein the first video and audio information comprises video and audio resources to which the target file belongs or is associated with; the server is also used for obtaining second video and audio information corresponding to the target file according to the first video and audio information.
33. The apparatus of claim 32, wherein
sending a search request of the user to a server includes: sending the search request of the user to the server through the browser plug-in, wherein the search request carries the address of the target file.
34. A machine readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the search method of one or more of claims 1 to 9.
35. A machine readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the search method of one or more of claims 10 to 11.
CN201810195845.9A 2018-03-09 2018-03-09 Searching method and related device Active CN110309324B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810195845.9A CN110309324B (en) 2018-03-09 2018-03-09 Searching method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810195845.9A CN110309324B (en) 2018-03-09 2018-03-09 Searching method and related device

Publications (2)

Publication Number Publication Date
CN110309324A CN110309324A (en) 2019-10-08
CN110309324B true CN110309324B (en) 2024-03-22

Family

ID=68073844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810195845.9A Active CN110309324B (en) 2018-03-09 2018-03-09 Searching method and related device

Country Status (1)

Country Link
CN (1) CN110309324B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460231A (en) * 2020-03-10 2020-07-28 华为技术有限公司 Electronic device, search method for electronic device, and medium
CN112165634B (en) * 2020-09-29 2022-09-16 北京百度网讯科技有限公司 Method for establishing audio classification model and method and device for automatically converting video
CN113609319A (en) * 2021-07-26 2021-11-05 阿里巴巴(中国)有限公司 Commodity searching method, device and equipment
CN114090515B (en) * 2022-01-21 2022-07-05 亿次网联(杭州)科技有限公司 File searching method, terminal device and storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19990015328A (en) * 1997-08-05 1999-03-05 정선종 How to Implement Speech Recognizer on Web Browser
WO2000033575A1 (en) * 1998-11-30 2000-06-08 Yuen Henry C Search engine for video and graphics
KR20080064364A (en) * 2007-01-04 2008-07-09 삼성전자주식회사 Method and apparatus for scene search
WO2008092400A1 (en) * 2007-01-25 2008-08-07 Beijing Sogou Technology Development Co., Ltd. An easy information search method, system and a character input system
CN101207807A (en) * 2007-12-18 2008-06-25 孟智平 Method for processing video and system thereof
CN101673266A (en) * 2008-09-12 2010-03-17 未序网络科技(上海)有限公司 Method for searching audio and video contents
CN102609458A (en) * 2012-01-12 2012-07-25 北京搜狗信息服务有限公司 Method and device for picture recommendation
WO2015062380A1 (en) * 2013-11-01 2015-05-07 北京奇虎科技有限公司 Method and device for playing webpage video
CN103593423A (en) * 2013-11-04 2014-02-19 北京奇虎科技有限公司 Method and device for loading video file information in browser
CN103605696A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Method and device for acquiring audio-video file addresses
CN106708823A (en) * 2015-07-20 2017-05-24 阿里巴巴集团控股有限公司 Search processing method, apparatus and system
CN106294596A (en) * 2016-07-29 2017-01-04 北京小米移动软件有限公司 The method and device of information search
CN106708940A (en) * 2016-11-11 2017-05-24 百度在线网络技术(北京)有限公司 Method and device used for processing pictures

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Sam S. Tsai, Huizhong Chen, David Chen, Ramakrishna Vedantham, Radek Grzeszczuk, Bernd Girod. Mobile visual search using image and text features. 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2012, pp. 845-849. *
张弘弦, 田玉玲. Research on the implementation process of Web vertical search engines (Web垂直搜索引擎实现过程的研究). 现代电子技术, vol. 39, no. 8, 2016-04-15, pp. 55-63. *
于文超, 刘菲. Research on Web multimedia resource search and related text extraction (Web多媒体资源搜索与相关文本提取研究). 中国科技资源导刊, vol. 41, no. 6, pp. 37-40. *
崔诚煜, 冉晓旻. A Chinese keyword extraction algorithm based on frequent pattern mining (基于频繁模式挖掘的中文关键词提取算法). 太赫兹科学与电子信息学报, no. 2, 2015-04-25, pp. 279-284. *

Also Published As

Publication number Publication date
CN110309324A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
RU2614137C2 (en) Method and apparatus for obtaining information
CN108932253B (en) Multimedia search result display method and device
CN110309324B (en) Searching method and related device
CN110232137B (en) Data processing method and device and electronic equipment
CN105095427A (en) Search recommendation method and device
CN107423296B (en) Searching method and device for searching
CN108073606B (en) News recommendation method and device for news recommendation
CN110598098A (en) Information recommendation method and device and information recommendation device
CN108334623B (en) Song display method, device and system
CN107515870B (en) Searching method and device and searching device
CN104111979A (en) Search recommendation method and device
CN105373580A (en) Method and device for displaying subjects
CN112765375A (en) Multimedia resource information display method and device, electronic equipment and storage medium
CN110110207B (en) Information recommendation method and device and electronic equipment
CN112784142A (en) Information recommendation method and device
CN105160009B (en) Resource downloading method and device
CN110020106B (en) Recommendation method, recommendation device and device for recommendation
CN111629270A (en) Candidate item determination method and device and machine-readable medium
CN112307294B (en) Data processing method and device
CN111241844A (en) Information recommendation method and device
CN110110046B (en) Method and device for recommending entities with same name
CN107784037B (en) Information processing method and device, and device for information processing
CN110147426B (en) Method for determining classification label of query text and related device
CN110020335B (en) Favorite processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220721

Address after: Room 01, floor 9, Sohu Internet building, building 9, No. 1 yard, Zhongguancun East Road, Haidian District, Beijing 100190

Applicant after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 100084. Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Applicant before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Applicant before: SOGOU (HANGZHOU) INTELLIGENT TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant