CN111708911B - Searching method, searching device, electronic equipment and computer-readable storage medium - Google Patents

Searching method, searching device, electronic equipment and computer-readable storage medium

Info

Publication number
CN111708911B
Authority
CN
China
Prior art keywords
search
keyword
language
participle
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010555041.2A
Other languages
Chinese (zh)
Other versions
CN111708911A (en)
Inventor
王鑫宇
张永华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010555041.2A
Publication of CN111708911A
Application granted
Publication of CN111708911B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation

Abstract

The disclosure provides a search method, a search apparatus, an electronic device and a computer-readable storage medium, and relates to the field of information processing. The method comprises: acquiring a search request, the search request comprising a first search keyword in a first language; searching based on the first search keyword to obtain a search result list, the search result list comprising at least one of a first search result obtained according to the first search keyword and a second search result obtained according to a second search keyword, the second search keyword being in a second language and corresponding to the first search keyword; and displaying the search result list. When a search based on the first search keyword returns few results, a large number of search results can still be obtained, which ensures that the user receives enough search results and improves search efficiency. Meanwhile, acquiring the second search keyword and searching based on it are transparent to the user, which improves the user experience.

Description

Searching method, searching device, electronic equipment and computer-readable storage medium
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a search method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
Technological progress has made terminals increasingly powerful, and users can meet various needs by installing different types of application programs, for example, installing a video playing application to watch videos.
In the prior art, a user can search for desired videos in a video playing application. However, when the search keyword entered by the user is in a minority language, few or even no search results are returned because the amount of searchable resources is small, so the user's search needs cannot be met and the user experience suffers.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The present disclosure provides a search method, apparatus, electronic device, and computer-readable storage medium, which can solve the problem that few resources are found when a video search is performed using a search keyword in a minority language.
The technical scheme is as follows:
in a first aspect, a search method is provided, and the method includes:
acquiring a search request, wherein the search request comprises a first search keyword in a first language;
searching based on the first search keyword to obtain a search result list; the search result list comprises at least one of a first search result obtained according to a first search keyword and a second search result obtained according to a second search keyword, wherein the second search keyword is in a second language and corresponds to the first search keyword;
and displaying the search result list.
In a second aspect, a search apparatus is provided, the apparatus comprising:
the receiving module is used for acquiring a search request, wherein the search request comprises a first search keyword in a first language;
the search module is used for searching based on the first search keyword to obtain a search result list; the search result list comprises at least one of a first search result obtained according to a first search keyword and a second search result obtained according to a second search keyword, wherein the second search keyword is in a second language and corresponds to the first search keyword;
and the display module is used for displaying the search result list.
In a third aspect, an electronic device is provided, which includes:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to perform, by invoking the operation instructions, the operations corresponding to the search method shown in the first aspect of the present disclosure.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the search method shown in the first aspect of the present disclosure.
The technical scheme provided by the disclosure has the following beneficial effects:
in the embodiments of the present disclosure, a search request comprising a first search keyword in a first language is acquired first; a search is then performed based on the first search keyword to obtain a search result list, where the search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to a second search keyword, and the second search keyword is in a second language and corresponds to the first search keyword; finally, the search result list is displayed. Thus, when a search based on the first search keyword in the first language returns few results, a second search keyword with the same semantics as the first search keyword can be acquired and used for a further search. Because the second search keyword is in a more widely used language, searching with it can return a large number of results. As a result, even when the first search keyword yields few results, a large number of search results can still be obtained, which ensures that the user receives enough search results and improves search efficiency. Meanwhile, acquiring the second search keyword and searching based on it are transparent to the user and require no user operation, which improves the user experience.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of a search method according to an embodiment of the present disclosure;
FIG. 2 is a schematic view of a search interface for entering search keywords in the present disclosure;
FIG. 3 is a first interface diagram showing search results in the present disclosure;
FIG. 4 is a second interface diagram showing search results in the present disclosure;
FIG. 5 is a third interface diagram showing search results in the present disclosure;
FIG. 6 is a schematic structural diagram of a search apparatus according to yet another embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device for searching according to yet another embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are used only to distinguish different devices, modules or units. They do not require these devices, modules or units to be different from one another, and they do not limit the order of, or the interdependence between, the functions performed by these devices, modules or units.
It should be noted that modifiers such as "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure provides a searching method, apparatus, electronic device and computer-readable storage medium, which aim to solve the above technical problems of the prior art.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
In one embodiment, a search method is provided, as shown in fig. 1, the method comprising:
step S101, obtaining a search request; the search request comprises a first search keyword in a first language;
The present disclosure can be applied to video search, in particular to video search in minority languages. As the name implies, a minority language is a language that, in contrast to widely used languages, is used in only a few countries; examples include German, Italian, Swedish and Czech.
In practical applications, a user can install a video playing application on a terminal, and a search bar can be provided in a display interface of the application. The user can enter a search keyword in the first language in the search bar; the search keyword may be one word or multiple words. When the user then clicks a search button such as "confirm" or "search", a search instruction based on the search keyword is triggered. After receiving the search instruction, the terminal generates a search request based on the search keyword in the current language and sends the search request to a server.
Step S102, searching based on a first search keyword to obtain a search result list; the search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to the second search keyword, and the second search keyword is of a second language and corresponds to the first search keyword;
after receiving the search request, the server extracts a first search keyword in a first language from the search request, and then searches based on the first search keyword to obtain a corresponding search result list. The search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to the second search keyword, and the second search keyword is in a second language and corresponds to the first search keyword; the correspondence between the second search keyword and the first search keyword can be related, semantically identical, semantically similar, and the like.
That is, the search result list may include only the first search result obtained according to the first search keyword, only the second search result obtained according to the second search keyword, or both. When no first search result is obtained according to the first search keyword, that is, when the number of first search results is 0, the search result list may contain only the second search result obtained according to the second search keyword.
And step S103, displaying the search result list.
In the embodiments of the present disclosure, a search request comprising a first search keyword in a first language is acquired first; a search is then performed based on the first search keyword to obtain a search result list, where the search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to a second search keyword, and the second search keyword is in a second language and corresponds to the first search keyword; finally, the search result list is displayed. Thus, when a search based on the first search keyword in the first language returns few results, a second search keyword with the same semantics as the first search keyword can be acquired and used for a further search. Because the second search keyword is in a more widely used language, searching with it can return a large number of results. As a result, even when the first search keyword yields few results, a large number of search results can still be obtained, which ensures that the user receives enough search results and improves search efficiency. Meanwhile, acquiring the second search keyword and searching based on it are transparent to the user and require no user operation, which improves the user experience.
In another embodiment, the search method shown in fig. 1 is described in further detail.
Step S101, obtaining a search request; the search request comprises a first search keyword in a first language;
Specifically, the present disclosure can be applied to video search, in particular to video search in minority languages. As the name implies, a minority language is a language that, in contrast to widely used languages, is used in only a few countries; examples include German, Italian, Swedish and Czech. In countries where such a minority language is used, a video search with a minority-language search keyword may return few or no results.
In practical applications, a user can install a video playing application on a terminal, and a search bar can be provided in a display interface of the application. The user can enter a search keyword in the first language in the search bar; the search keyword may be one word or multiple words. For example, "Crayon Shin-chan" (one word) and "braised pork recipe" (two words) can both be search keywords. When the user then clicks a search button such as "confirm" or "search", a search instruction based on the search keyword is triggered. After receiving the search instruction, the terminal generates a search request based on the search keyword in the current language and sends the search request to the server.
The terminal may have the following characteristics:
(1) In terms of hardware architecture, the device has a central processing unit, a memory, an input unit and an output unit; that is, the device is typically a microcomputer device with communication functions. It may also offer multiple input modes, such as a keyboard, a mouse, a touch screen, a microphone and a camera, which can be adjusted as required. Likewise, the device often has multiple output modes, such as a receiver and a display screen, which can also be adjusted as required;
(2) In terms of the software system, the device must have an operating system, such as Windows Mobile, Symbian, Palm, Android or iOS. These operating systems are becoming increasingly open, and personalized applications developed on these open platforms keep emerging, such as address books, calendars, notepads, calculators and various games, which largely meets the needs of individual users;
(3) In terms of communication capability, the device has flexible access modes and high-bandwidth communication performance, and can automatically adjust the communication mode according to the selected service and the environment, which is convenient for users. The device can support GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division Multiple Access), CDMA2000 (Code Division Multiple Access 2000), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), Wi-Fi (Wireless Fidelity), WiMAX (Worldwide Interoperability for Microwave Access) and the like, so that it adapts to various types of networks and supports not only voice services but also various wireless data services;
(4) In terms of functions, the device places more emphasis on being user-friendly, personalized and multifunctional. With the development of computer technology, devices have moved from a device-centered mode to a human-centered mode, integrating embedded computing, control technology, artificial intelligence technology, biometric authentication technology and the like, which fully embodies the human-oriented purpose. Thanks to the development of software technology, the device can be adjusted and configured according to individual needs and is thus more personalized. Meanwhile, the device integrates a large amount of software and hardware, and its functions are increasingly powerful.
Step S102, searching based on a first search keyword to obtain a search result list; the search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to the second search keyword, and the second search keyword is of a second language and corresponds to the first search keyword;
after receiving the search request, the server extracts a first search keyword in a first language from the search request, and then searches based on the first search keyword to obtain a corresponding search result list. The search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to the second search keyword, and the second search keyword is in a second language and corresponds to the first search keyword; the correspondence between the second search keyword and the first search keyword can be related, semantically identical, semantically similar, and the like.
That is, the search result list may include only the first search result obtained according to the first search keyword, only the second search result obtained according to the second search keyword, or both. When no first search result is obtained according to the first search keyword, that is, when the number of first search results is 0, the search result list may contain only the second search result obtained according to the second search keyword.
In a preferred embodiment of the present disclosure, the searching for the search result list based on the first search keyword includes:
searching based on the first search keyword to obtain a corresponding first search result;
when the number of the first search results is smaller than the number threshold, acquiring at least one second search keyword of a second language corresponding to the first search keyword based on a preset rule;
searching based on each second search keyword to obtain a corresponding second search result;
taking the first search result and each second search result as a search result list;
and when the number of the first search results is not less than the number threshold value, using the first search results as a search result list.
Specifically, after receiving a search request, the server extracts the first search keyword in the first language from the search request and performs a search based on it to obtain the first search results. When the number of first search results is smaller than the number threshold, the results may not contain what the user needs. In that case, at least one second search keyword in the second language corresponding to the first search keyword is acquired based on the preset rule, a search is performed based on each second search keyword, and the first search results together with the second search results are used as the search result list.
Further, if the number of first search results is not less than the number threshold, there is no need to acquire a second search keyword or to obtain second search results based on it, and the first search results are used as the search result list.
For example, suppose the preset number threshold is 10 and the first search keyword is A. If a search based on A returns 15 first search results, these 15 search results are used as the search result list. If it returns only 5, a second search keyword A' corresponding to the first search keyword is acquired, a search based on A' returns 30 second search results, and all 35 search results are used as the search result list. Further, if the search based on A returns 0 results and the search based on A' returns 30, the search result list consists of these 30 search results and contains no first search result.
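The threshold logic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `search_with_fallback`, the callables `search` and `get_second_keywords`, and the toy index are all hypothetical stand-ins for the server-side search and the preset rule.

```python
from typing import Callable, List


def search_with_fallback(
    first_keyword: str,
    search: Callable[[str], List[str]],
    get_second_keywords: Callable[[str], List[str]],
    number_threshold: int = 10,
) -> List[str]:
    """Return the search result list for a first-language keyword."""
    first_results = search(first_keyword)
    # Enough first search results: use them as the search result list.
    if len(first_results) >= number_threshold:
        return first_results
    # Too few: add the results of every corresponding second-language keyword.
    result_list = list(first_results)
    for second_keyword in get_second_keywords(first_keyword):
        result_list.extend(search(second_keyword))
    return result_list


# Toy data mirroring the example: A returns 5 results, A' returns 30.
toy_index = {
    "A": [f"first-{i}" for i in range(5)],
    "A'": [f"second-{i}" for i in range(30)],
}


def toy_search(keyword: str) -> List[str]:
    return toy_index.get(keyword, [])


def toy_second_keywords(keyword: str) -> List[str]:
    return ["A'"] if keyword == "A" else []


results = search_with_fallback("A", toy_search, toy_second_keywords)
print(len(results))
```

With the toy data, the 5 first search results fall below the threshold of 10, so the 30 second search results are appended to them.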
In a preferred embodiment of the present disclosure, searching for a corresponding first search result based on a first search keyword includes:
performing word segmentation processing on the first search keyword to obtain at least one participle (word segment);
and searching based on each participle and a preset rule to obtain a corresponding first search result.
When searching with a search keyword, natural language processing such as word segmentation may be performed on the search keyword to obtain at least one participle, and each participle is then searched according to a preset rule to obtain the search results.
Searching based on each participle and the preset rule to obtain the corresponding first search result includes:
when the number of participles does not exceed the participle number threshold, searching with at least one of the participles to obtain the corresponding first search result.
Specifically, a participle number threshold may be preset. When the number of participles obtained by word segmentation of the search keyword does not exceed the participle number threshold, at least one of the participles may be used for searching to obtain the search results.
For example, suppose the preset participle number threshold is 3 and word segmentation of the search keyword "braised pork recipe" yields the 2 participles "braised pork" and "recipe". At least one of the 2 participles may then be used for searching, that is, the three search keywords "braised pork", "recipe" and "braised pork recipe" are used. Any video whose information, such as its name or brief introduction, contains any one of the three search keywords can be taken as a search result, which yields the search results corresponding to "braised pork recipe".
Further, the complete search keyword has the highest priority during the search. In the preceding example, among the three search keywords "braised pork", "recipe" and "braised pork recipe", "braised pork recipe" is searched first, followed by "braised pork" and "recipe", which helps the search results better match the user's needs.
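As a rough illustration of the case where the number of participles does not exceed the threshold, the sketch below searches with the complete keyword first and then with each participle. The function name `search_videos`, the `title` and `intro` fields, and the substring matching are assumptions made for this example; the source only states that a video matches when its information contains one of the search keywords.

```python
from typing import Dict, List


def search_videos(
    full_keyword: str,
    participles: List[str],
    videos: List[Dict[str, str]],
    participle_threshold: int = 3,
) -> List[Dict[str, str]]:
    """Match videos against the full keyword first, then each participle."""
    if len(participles) > participle_threshold:
        raise ValueError("too many participles; word-frequency pruning applies instead")
    # The complete search keyword has the highest priority.
    queries = [full_keyword] + [p for p in participles if p != full_keyword]
    results: List[Dict[str, str]] = []
    for query in queries:
        for video in videos:
            text = video["title"] + " " + video["intro"]
            if query in text and video not in results:
                results.append(video)
    return results


videos = [
    {"title": "recipe for noodles", "intro": ""},
    {"title": "braised pork recipe in 10 minutes", "intro": ""},
    {"title": "travel vlog", "intro": "we ate braised pork"},
    {"title": "cat video", "intro": ""},
]
hits = search_videos("braised pork recipe", ["braised pork", "recipe"], videos)
print([v["title"] for v in hits])
```

Videos matching the complete keyword come first in the returned list, which is one simple way to realize the stated priority.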
Searching based on each participle and the preset rule to obtain the corresponding first search result includes:
when the number of participles exceeds the participle number threshold, acquiring the word frequency of each participle, where the word frequency is the frequency with which the participle appears in the preset multilingual lexicon;
taking a preset number of participles with the lowest word frequencies among the participles as first target participles, and calculating the product of the word frequencies of the first target participles to obtain a calculation result;
when the calculation result is smaller than the product threshold, removing the first target participle with the lowest word frequency from the first target participles to obtain second target participles;
and taking the second target participles as the current first target participles and repeating the steps of calculating the product and, when the calculation result is smaller than the product threshold, removing the first target participle with the lowest word frequency, until the current calculation result is not smaller than the product threshold; then searching with at least one of the resulting second target participles to obtain the corresponding first search result.
Specifically, when the number of participles exceeds the participle number threshold, the word frequency of each participle is obtained, where the word frequency of a participle is the frequency with which it appears in the preset multilingual lexicon. Because a participle with a higher word frequency carries less information and contributes less to the search results, the participles with the lowest word frequencies, as many as the participle number threshold, are selected from the participles as target participles. For example, if word segmentation of the search keyword yields 5 participles and the participle number threshold is 3, the 3 participles with the lowest word frequencies are selected from the 5 as target participles.
Next, the product of the word frequencies of all target participles is calculated. If the calculation result is smaller than the product threshold, a search using all current target participles may return no results. In that case, the participle with the lowest word frequency among the current target participles is removed, and the product of the word frequencies of the remaining target participles is calculated again. If the result is still smaller than the product threshold, these steps are repeated until the product of the word frequencies of the remaining target participles is not smaller than the product threshold.
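The selection and pruning loop described above might look like the following sketch. The function name `select_target_participles` and the sample frequencies are hypothetical, and stopping once a single participle remains is an assumption added here, since the source does not say what happens in that case.

```python
import math
from typing import Dict, List


def select_target_participles(
    word_freq: Dict[str, float],
    participle_threshold: int,
    product_threshold: float,
) -> List[str]:
    """Pick the lowest-frequency participles, then prune by frequency product."""
    # Keep the `participle_threshold` participles with the lowest word frequency.
    targets = sorted(word_freq, key=lambda p: word_freq[p])[:participle_threshold]
    # While the product of frequencies is below the threshold, drop the
    # participle with the lowest frequency (assumption: keep at least one).
    while len(targets) > 1 and math.prod(word_freq[p] for p in targets) < product_threshold:
        targets.remove(min(targets, key=lambda p: word_freq[p]))
    return targets


# Hypothetical word frequencies for 5 participles.
freqs = {"how": 0.5, "to": 0.6, "make": 0.3, "braised": 0.01, "pork": 0.05}
targets = select_target_participles(freqs, participle_threshold=3, product_threshold=0.01)
print(targets)
```

Here the three lowest-frequency participles are "braised", "pork" and "make"; their product 0.00015 is below 0.01, so "braised" is removed, and the remaining product 0.015 is no longer below the threshold.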
In a preferred embodiment of the present disclosure, obtaining at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule includes:
querying and matching the first search keyword in a preset multilingual lexicon to obtain at least one first-language keyword matching the first search keyword, and obtaining the second-language keyword associated with each first-language keyword.
Specifically, the preset multilingual lexicon includes a plurality of keyword pairs, each keyword pair includes a first language keyword and a corresponding second language keyword, the first language keyword and the second language keyword have an association relationship therebetween, and meanwhile, the plurality of keyword pairs may have an association relationship therebetween.
Furthermore, the first search keyword is queried and matched in the multilingual lexicon to obtain at least one corresponding first language keyword and each second language keyword corresponding to each first language keyword.
For example, the multilingual lexicon includes three keyword pairs: "red-cooked pork" - "Braised pork", "method" - "practice", and "red-cooked pork method" - "how to cook braised pork". Within each pair, the first language keyword and the second language keyword have an association relationship, and the three keyword pairs also have an association relationship with each other.
When the first search keyword is "red-cooked pork method", it is queried and matched in the multilingual lexicon to obtain three first language keywords, "red-cooked pork", "method" and "red-cooked pork method", and the three second language keywords "Braised pork", "practice" and "how to cook braised pork" are obtained based on the association relationship.
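The lookup in the example above can be illustrated with a small in-memory structure. This is a sketch under assumptions: the disclosure does not specify how the lexicon is stored, so the dictionary layout, the name `LEXICON` and the function `second_language_keywords` are hypothetical, and the first language keywords are shown in English rendering only for readability.

```python
# Hypothetical layout: a historical search keyword maps to its associated
# (first language keyword, second language keyword) pairs.
LEXICON = {
    "red-cooked pork method": [
        ("red-cooked pork", "Braised pork"),
        ("method", "practice"),
        ("red-cooked pork method", "how to cook braised pork"),
    ],
}


def second_language_keywords(first_search_keyword, lexicon=LEXICON):
    """Return the second language keywords associated with the query,
    or an empty list when the query is not in the lexicon."""
    pairs = lexicon.get(first_search_keyword, [])
    return [second for _first, second in pairs]
```

Because the pairs are looked up from a prebuilt table, no translation call is needed at query time.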
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is an offline database constructed based on historical search keywords. The multilingual lexicon includes at least one first language keyword, a second language keyword, and a recall number corresponding to the second language keyword, and the first language keyword, the second language keyword, and the recall number have an association relationship with each other. The historical search keywords are search keywords that users have used when searching.
Further, a preset multilingual word stock is generated in the following way:
obtaining historical search keywords of a first language, and performing word segmentation on the historical search keywords to obtain at least one first language keyword; the first language keyword comprises at least one word segmentation result;
translating each first language keyword to obtain a second language keyword corresponding to each first language keyword;
recalling the search results for each second language keyword to obtain the number of the recalls of the search results corresponding to each second language keyword;
and establishing the association relationship between the historical search keywords and each first language keyword, each second language keyword and each recall quantity, and storing each first language keyword, each second language keyword, each recall quantity and the association relationship into a multi-language lexicon.
Specifically, each time the user finishes a search using a search keyword in the first language, the application program may, offline, first obtain the search keyword and then perform word segmentation on it to obtain at least one first language keyword, where each first language keyword includes at least one word segmentation result. For example, if the historical search keyword is "red-cooked pork method", word segmentation yields the two words "red-cooked pork" and "method", so that together with "red-cooked pork method" itself there are 3 first language keywords in total.
Then, each first language keyword is translated to obtain a corresponding second language keyword, and the association relationship between each first language keyword and the corresponding second language keyword is established. For example, "red-cooked pork" is translated to obtain "Braised pork", "method" is translated to obtain "practice", and "red-cooked pork method" is translated to obtain "how to cook braised pork"; the association relationships between "red-cooked pork" and "Braised pork", between "method" and "practice", and between "red-cooked pork method" and "how to cook braised pork" are then established, giving 3 keyword pairs.
Then, search results are recalled for each second language keyword to obtain the recall number of the search results corresponding to each second language keyword; the recall number is the number of search results obtained when the keyword is used for searching. For example, the recall number of "Braised pork" is 334, the recall number of "practice" is 125889, and the recall number of "how to cook braised pork" is 3; that is, a search using "Braised pork" returns 334 search results.
And then establishing the incidence relation between each second language keyword and the corresponding recall quantity thereof and between each keyword pair, and storing each keyword pair and the corresponding recall quantity thereof.
Furthermore, since the historical search keywords of each user are obtained, repeated keywords appear in the obtained historical search keywords, and the number of times that the keywords are repeated in the multilingual word stock can be counted as the word frequency of the keywords.
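The offline construction steps above (segmentation, translation, recall counting, association, and word frequency counting) can be sketched as follows. The function `build_lexicon` and its parameters `segment`, `translate` and `count_recalls` are hypothetical placeholders for whatever word segmenter, translation service and search backend are actually used; none of them is named in the disclosure.

```python
from collections import Counter


def build_lexicon(history, segment, translate, count_recalls):
    """Build an offline lexicon from historical first language search keywords.

    history:       iterable of historical search keywords (may contain repeats)
    segment:       callable, keyword -> list of word segmentation results
    translate:     callable, first language keyword -> second language keyword
    count_recalls: callable, second language keyword -> number of search results
    """
    lexicon = {}
    word_freq = Counter()
    for query in history:
        # First language keywords: each segment plus the full query, deduplicated.
        first_lang = list(dict.fromkeys(list(segment(query)) + [query]))
        word_freq.update(first_lang)  # repeats across history raise the frequency
        if query in lexicon:
            continue  # already translated and counted
        entries = []
        for keyword in first_lang:
            second = translate(keyword)
            entries.append({
                "first": keyword,
                "second": second,
                "recalls": count_recalls(second),
            })
        lexicon[query] = entries
    return lexicon, word_freq
```

Keeping the three callables as parameters means the sketch makes no claim about which segmentation or translation method the application uses.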
Thus, in the offline case, the application may generate an offline multilingual lexicon based on the user's historical search keywords. When a user performs an online search using a first search keyword to obtain first search results, and the number of first search results is smaller than a number threshold, a second search keyword corresponding to the first search keyword can be obtained from the offline multilingual lexicon, and second search results are obtained based on the second search keyword, so that the number of search results is greatly increased.
In a preferred embodiment of the present disclosure, the obtaining of the respective corresponding second search result based on the search of each second search keyword includes:
acquiring the number of recalls of the search results corresponding to each second language keyword; the number of recalls of the search results is the number of search results obtained by searching according to any second language keyword;
determining at least one target second language keyword based on each recall quantity and the recall quantity threshold;
and determining, from the set of search results corresponding to the target second language keywords, a number of second search results equal to the recall number threshold.
Specifically, before searching with each second search keyword, the recall number corresponding to each second search keyword may be obtained from the multilingual lexicon. At least one target second language keyword is then determined based on each recall number and a preset recall number threshold, and a number of second search results equal to the recall number threshold is determined from the set of search results corresponding to the target second language keywords (that is, from all search results corresponding to the target second language keywords).
In a preferred embodiment of the present disclosure, determining at least one target second language keyword based on the number of recalls and the number of recalls threshold includes:
sequencing the second language keywords in an ascending order based on the number of recalls to obtain the sequenced second language keywords;
taking the first-ranked keywords in the second language as detection objects, and determining the total recall quantity of the detection objects;
judging whether the total number of recalls exceeds a threshold value of the number of recalls;
if not, the detection object and the next second language keyword in the detection object sequence are simultaneously used as the current detection object, the steps of determining the total recall quantity of the detection object and judging whether the total recall quantity exceeds a recall quantity threshold value are repeatedly executed, and at least one second language keyword corresponding to the current detection object is used as a target second language keyword when the total recall quantity exceeds the recall quantity threshold value.
Assume that the recall number threshold is 70 and that the recall numbers of the second language keywords A, B, C, D and E are 10, 40, 20, 30 and 50, respectively. Sorting the five keywords in ascending order of recall number yields A-C-D-B-E. A is first taken as the detection object, and the total recall number of A is determined and compared with the recall number threshold. Because 10 < 70, A and the next keyword in the sequence, C, are together taken as the current detection object, and the steps of determining the total recall number of the detection object and judging whether it exceeds the recall number threshold are repeated. Because 10 + 20 = 30 < 70, the steps are repeated with A, C and D (10 + 20 + 30 = 60 < 70), and then with A, C, D and B. Because 10 + 20 + 30 + 40 = 100 > 70, A, C, D and B are taken as the target second language keywords.
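The selection procedure in this example can be sketched as a cumulative sum over the keywords sorted by recall number. The function name `select_target_keywords` is hypothetical. The disclosure does not state what happens when the total never exceeds the threshold; the sketch assumes that all keywords are then used.

```python
def select_target_keywords(recalls, recall_threshold):
    """Pick second language keywords in ascending order of recall number
    until their total recall number exceeds the threshold.

    recalls: dict mapping second language keyword -> recall number.
    """
    ordered = sorted(recalls, key=recalls.get)  # ascending by recall number
    targets, total = [], 0
    for keyword in ordered:
        targets.append(keyword)
        total += recalls[keyword]
        if total > recall_threshold:
            break  # total recall number now exceeds the threshold
    return targets
```

With the recall numbers A=10, B=40, C=20, D=30, E=50 and a threshold of 70, the running totals are 10, 30, 60 and 100, so the function returns A, C, D and B.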
In a preferred embodiment of the present disclosure, determining the second search result with the recall number threshold from the set of search results corresponding to each target keyword in the second language includes:
sorting the search results based on the ascending sort of the target second language keywords and the preset sort of the search results corresponding to the target second language keywords to obtain the sorted search results;
taking the top N search results in the sorted search results as second search results; where N is a recall number threshold.
In practical applications, the second search results obtained according to the second language keywords are also ranked, for example, the number of recalls of a is 10, which means that 10 second search results can be obtained according to a, and then the 10 second search results are also ranked.
Therefore, after the target second language keywords A, C, D and B are determined, all search results are ranked based on the ranking of the target second language keywords and the ranking of the search results corresponding to each target second language keyword, so as to obtain ranked search results. Then, starting from the first-ranked search result, the top N search results are selected as the second search results, where N is the recall number threshold.
For example, in the above example, the 100 search results corresponding to A, C, D and B are ranked, and the top 70 search results are used as the second search results; that is, the second search results include all search results corresponding to A, C and D, and the top 10 of the search results corresponding to B.
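The ranking and truncation step can be sketched as a concatenation followed by a slice. The function name `top_n_results` and the `results_by_keyword` mapping are hypothetical; each per-keyword list is assumed to be already in its preset ranking order.

```python
def top_n_results(targets, results_by_keyword, n):
    """Merge per-keyword result lists in target order and keep the top n.

    targets:            target second language keywords, ascending by recall number
    results_by_keyword: dict mapping keyword -> ranked list of search results
    n:                  recall number threshold
    """
    merged = []
    for keyword in targets:
        merged.extend(results_by_keyword[keyword])  # keeps each list's own order
    return merged[:n]
```

In the example, the merged list holds 10 + 20 + 30 + 40 = 100 results; slicing to 70 keeps everything for A, C and D (60 results) plus the first 10 results for B.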
In practical applications, the smaller the number of search results corresponding to a search keyword, the higher the matching degree between the search results and the search keyword. For example, for the first search keyword "red-cooked pork method" in the previous example, three second search keywords are obtained from the multilingual lexicon: "Braised pork", "practice" and "how to cook braised pork". The recall number of "Braised pork" is 334, the recall number of "practice" is 125889, and the recall number of "how to cook braised pork" is 3. "how to cook braised pork" has the smallest recall number, but it matches "red-cooked pork method" most closely, and its 3 search results are also the ones best suited to the user's needs.
And step S103, displaying the search result list.
For example, if the number of first search results is 50 and the number of second search results is 150, the number of search results returned to the terminal device is 200, and the terminal device displays a search result list including 200 search results to the user.
Further, take Simplified Chinese as the first language and English as the second language as an example. In the search interface shown in fig. 2, the user enters "braised pork practice" in the search bar, and 5 first search results are obtained; assuming that the number threshold is 3, the 5 first search results are displayed as shown in fig. 3.
If only 2 first search results are obtained, three second search keywords are obtained based on the multilingual lexicon, a search is performed based on the three second search keywords to obtain 6 second search results, and finally all 8 search results are displayed, as shown in fig. 4.
If the number of the first search results is 0 and 3 second search results are obtained, 3 second search results are displayed, as shown in fig. 5.
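The three display cases above follow from a single threshold check, which can be sketched as follows. The function name `build_result_list` and the callable `fetch_second_results` are hypothetical; the callable stands in for the lexicon lookup and second-language search, and is only invoked when the first search returns too few results.

```python
def build_result_list(first_results, number_threshold, fetch_second_results):
    """Return the search result list to display.

    first_results:        results found with the first search keyword
    number_threshold:     minimum number of first results considered sufficient
    fetch_second_results: callable returning results for the second search keywords
    """
    if len(first_results) >= number_threshold:
        return list(first_results)  # enough results, no second-language search
    return list(first_results) + list(fetch_second_results())
```

With a number threshold of 3, 5 first results are shown as they are, 2 first results plus 6 second results give a list of 8, and 0 first results plus 3 second results give a list of 3.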
In the embodiment of the present disclosure, a plurality of keywords in the first language may be obtained based on the historical search keywords in the first language of the user, each keyword in the first language is translated to obtain a keyword in the second language, the number of search results that can be recalled for the keyword in the second language may be further determined, then each keyword in the first language is associated with the corresponding keyword in the second language and the number of recalled keywords in the second language to form a keyword pair, and then the keyword pair is stored in a multi-language library. Thus, in the offline case, the application may generate an offline multilingual thesaurus based on the user's historical search keywords. When a user carries out online search by adopting a first search keyword to obtain a first search result, and the first search result is smaller than a quantity threshold value, a second search keyword corresponding to the first search keyword can be obtained through an offline multilingual word stock, and a second search result is obtained based on the second search keyword, so that the quantity of the search results is greatly increased, online instant translation of the first search keyword is not needed, and search delay caused by online instant translation is avoided, thereby improving the search efficiency and improving the user experience.
Further, when a corresponding first search result is obtained based on the first search keyword, word segmentation can be performed on the first search keyword to obtain at least one participle, and which participle or participles are used for searching is then determined based on the number of participles and the frequency with which each participle occurs in the multilingual lexicon. Moreover, after the second search keywords are obtained, if there is more than one second search keyword, which one or more second search keywords to use for the search may be further determined based on the recall number corresponding to each second search keyword. Therefore, the hit rate of the search can be improved by searching with the common participles in the first search keyword, and the hit rate of the search is further improved by searching with the second search keywords that have a large number of recalls.
Fig. 6 is a schematic structural diagram of a search apparatus according to another embodiment of the present disclosure, and as shown in fig. 6, the apparatus of this embodiment may include:
a receiving module 601, configured to obtain a search request; the search request comprises a first search keyword in a first language;
a search module 602, configured to search for a search result list based on a first search keyword; the search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to the second search keyword, and the second search keyword is of a second language and corresponds to the first search keyword;
and a display module 603, configured to display the search result list.
In a preferred embodiment of the present disclosure, the search module includes:
the first search submodule is used for obtaining a corresponding first search result based on the first search keyword;
the obtaining sub-module is used for obtaining at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule when the number of the first search results is smaller than a number threshold;
the second searching submodule is used for searching to obtain a second searching result corresponding to each second searching keyword based on each second searching keyword;
and the determining submodule is used for taking the first search result and each second search result as a search result list.
In a preferred embodiment of the present disclosure, the determining sub-module is further configured to:
and when the number of the first search results is not less than the number threshold value, using the first search results as a search result list.
In a preferred embodiment of the present disclosure, the obtaining sub-module is specifically configured to:
and inquiring and matching the first search keyword in a preset multilingual word stock to obtain at least one first language keyword matched with the first search keyword and obtain each second language keyword which is respectively associated with each first language keyword.
In a preferred embodiment of the present disclosure, the second search submodule includes:
the recall number acquisition unit is used for acquiring the recall number of the search results corresponding to each second language keyword; the number of recalls of the search results is the number of search results obtained by searching according to any second language keyword;
a first determining unit, configured to determine at least one target second-language keyword based on each recall number and the recall number threshold;
and the second determining unit is used for determining, from the set of search results corresponding to the target second language keywords, a number of second search results equal to the recall number threshold.
In a preferred embodiment of the present disclosure, the first determining unit includes:
the first sequencing subunit is used for sequencing the second language keywords in an ascending order based on the number of recalls to obtain the sequenced second language keywords;
the total recall number determining subunit is used for taking the keywords of the second language with the first sequence as the detection objects and determining the total recall number of the detection objects;
a judging subunit, configured to judge whether the total recall number exceeds a recall number threshold;
and the processing subunit is used for taking the detection object and the second language keyword next to the detection object in the sequence as the current detection object at the same time, repeatedly calling the total recall quantity determining subunit and the judging subunit, and taking at least one second language keyword corresponding to the current detection object as the target second language keyword until the total recall quantity exceeds a recall quantity threshold value.
In a preferred embodiment of the present disclosure, the second determination unit includes:
the second sorting subunit is used for sorting the search results based on the ascending sorting of the target second language keywords and the preset sorting of the search results corresponding to the target second language keywords to obtain the sorted search results;
the determining subunit is used for taking the first N search results in the sorted search results as second search results; where N is a recall number threshold.
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is an offline database constructed based on historical search keywords. The multilingual lexicon includes at least one first language keyword, a second language keyword, and a recall number corresponding to the second language keyword, and the first language keyword, the second language keyword, and the recall number have an association relationship with each other.
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is generated as follows:
obtaining historical search keywords of a first language, and performing word segmentation on the historical search keywords to obtain at least one first language keyword; the first language keyword comprises at least one word segmentation result;
translating each first language keyword to obtain a second language keyword corresponding to each first language keyword;
recalling search results for each second language keyword to obtain the number of recalls of the search results corresponding to each second language keyword;
and establishing the association relationship between the historical search keywords and each first language keyword, each second language keyword and each recall quantity, and storing each first language keyword, each second language keyword, each recall quantity and the association relationship into a multi-language lexicon.
In a preferred embodiment of the present disclosure, the first search sub-module includes:
the word segmentation unit is used for carrying out word segmentation processing on the first search keyword to obtain at least one word segmentation;
and the processing unit is used for searching to obtain a corresponding first search result based on each participle and a preset rule.
In a preferred embodiment of the present disclosure, the processing unit is specifically configured to:
and when the number of the participles does not exceed the participle number threshold, at least one of the participles is adopted for searching to obtain a corresponding first search result.
In a preferred embodiment of the present disclosure, the processing unit includes:
the word frequency obtaining subunit is used for obtaining the word frequency of each participle when the number of the participles exceeds a participle number threshold; the word frequency is the frequency of the occurrence of the participles in a preset multilingual word bank;
the calculation subunit is used for taking a preset number of participles with the lowest word frequencies among the participles as first target participles, and calculating the product of the word frequencies of the first target participles to obtain a calculation result;
the filtering subunit is used for removing the first target participle with the minimum word frequency in each first target participle to obtain each second target participle when the calculation result is smaller than the product threshold;
and respectively taking each second target word segmentation as a current first target word segmentation, repeatedly calling the calculation subunit and the filtering subunit until the current calculation result is not less than the product threshold value, and searching by adopting at least one of the second target word segmentation to obtain a corresponding first search result.
The search apparatus of this embodiment can perform the search methods shown in the first embodiment and the second embodiment of the present disclosure, and the implementation principles thereof are similar, and are not described herein again.
In the embodiment of the disclosure, a search request is obtained first; the search request comprises a first search keyword of a first language, and then a search result list is obtained based on the first search keyword; the search result list comprises at least one of search results containing first search keywords and search results containing second search keywords, and the second search keywords are in a second language and have the same semantics as the first search keywords; and displaying the search result list. Thus, when the search result obtained by the search based on the first search keyword of the first language is less, the second search keyword with the same semantic meaning as the first search keyword can be obtained, and the search is carried out based on the second search keyword to obtain the search result containing the second search keyword, because the second search keyword is a language with wider universality, the search based on the second search keyword can obtain a great number of search results containing the second search keyword, so that when the search result obtained by the search based on the first search keyword is less, a great number of search results can still be finally obtained by the search, the user can be ensured to obtain enough search results, the search efficiency is improved, meanwhile, the processes of obtaining the second search keyword and the search result based on the second search keyword are not sensible to the user, and the user does not need to participate in the operation, thereby improving the user experience.
Furthermore, a plurality of first language keywords can be obtained based on historical search keywords of a user in a first language, each first language keyword is translated to obtain a second language keyword, the number of search results which can be recalled by the second language keywords can be further determined, then each first language keyword is associated with the corresponding second language keyword and the number of recall of the second language keyword to form a keyword pair, and then the keyword pair is stored in a multi-language word library. Thus, in the offline case, the application may generate an offline multilingual thesaurus based on the user's historical search keywords. When a user carries out online search by adopting a first search keyword to obtain a first search result, and the first search result is smaller than a quantity threshold value, a second search keyword corresponding to the first search keyword can be obtained through an offline multilingual word stock, and a second search result is obtained based on the second search keyword, so that the quantity of the search results is greatly increased, online instant translation of the first search keyword is not needed, and search delay caused by online instant translation is avoided, thereby improving the search efficiency and improving the user experience.
Furthermore, when a corresponding first search result is obtained based on the first search keyword, word segmentation can be performed on the first search keyword to obtain at least one participle, and which participle or participles are used for searching is then determined based on the number of participles and the frequency with which each participle occurs in the multilingual lexicon. Moreover, after the second search keywords are obtained, if there is more than one second search keyword, which one or more second search keywords to use for the search may be further determined based on the recall number corresponding to each second search keyword. Therefore, the hit rate of the search can be improved by searching with the common participles in the first search keyword, and the hit rate of the search is further improved by searching with the second search keywords that have a large number of recalls.
Referring now to FIG. 7, shown is a schematic diagram of an electronic device 700 suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The electronic device includes a memory and a processor. The processor may be referred to hereinafter as the processing device 701, and the memory may include at least one of a Read Only Memory (ROM) 702, a Random Access Memory (RAM) 703 and a storage device 708, as described in detail below. As shown in fig. 7, the electronic device 700 may include a processing device (e.g., central processing unit, graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in the Read Only Memory (ROM) 702 or a program loaded from the storage device 708 into the Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a search request, wherein the search request comprises a first search keyword in a first language; search based on the first search keyword to obtain a search result list, wherein the search result list comprises at least one of a first search result obtained according to the first search keyword and a second search result obtained according to a second search keyword, and the second search keyword is in a second language and corresponds to the first search keyword; and display the search result list.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module or unit does not, in some cases, constitute a limitation on the module or unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure [ example one ] there is provided a search method comprising:
acquiring a search request, wherein the search request comprises a first search keyword in a first language;
searching based on the first search keyword to obtain a search result list; the search result list comprises at least one of a first search result obtained according to a first search keyword and a second search result obtained according to a second search keyword, wherein the second search keyword is in a second language and corresponds to the first search keyword;
and displaying the search result list.
In a preferred embodiment of the present disclosure, the searching for the search result list based on the first search keyword includes:
obtaining a corresponding first search result based on the first search keyword;
when the number of the first search results is smaller than a number threshold, acquiring at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule;
searching based on each second search keyword to obtain a corresponding second search result;
and taking the first search result and each second search result as the search result list.
In a preferred embodiment of the present disclosure, the method further includes:
when the number of the first search results is not less than the number threshold, the first search results are used as the search result list.
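The fallback flow described above can be illustrated with a minimal Python sketch. The function name `build_result_list`, the `search` and `to_second_language` callables, and the toy index are hypothetical and introduced only for illustration; the disclosure does not prescribe any particular interface.

```python
from typing import Callable, List


def build_result_list(
    first_keyword: str,
    search: Callable[[str], List[str]],
    to_second_language: Callable[[str], List[str]],
    number_threshold: int,
) -> List[str]:
    """Return first-language results, supplemented by second-language results if too few."""
    first_results = search(first_keyword)
    if len(first_results) >= number_threshold:
        # Not less than the number threshold: the first results alone form the list.
        return list(first_results)
    results = list(first_results)
    for second_keyword in to_second_language(first_keyword):
        # Each second search keyword contributes its own second search results.
        results.extend(search(second_keyword))
    return results


# Toy data (assumed): a tiny index and a one-entry translation table.
index = {"gato": ["doc-es-1"], "cat": ["doc-en-1", "doc-en-2"]}
translations = {"gato": ["cat"]}


def toy_search(keyword: str) -> List[str]:
    return index.get(keyword, [])


def toy_translate(keyword: str) -> List[str]:
    return translations.get(keyword, [])


result = build_result_list("gato", toy_search, toy_translate, number_threshold=2)
print(result)
```

With a number threshold of 2, the single first-language result is too few, so the results for the second-language keyword are appended; with a threshold of 1, only the first-language result would be returned.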
In a preferred embodiment of the present disclosure, obtaining at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule includes:
querying the preset multilingual lexicon for the first search keyword to obtain at least one first language keyword matching the first search keyword, and obtaining each second language keyword associated with each of the first language keywords.
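As a rough illustration of this lookup, the sketch below models the multilingual lexicon as a Python dictionary. The layout (first language keyword mapped to pairs of second language keyword and recall number), the function name `lookup_second_language`, and the substring-based matching rule are all assumptions; the disclosure does not specify how matching is performed.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: first-language keyword -> [(second-language keyword, recall number)]
Lexicon = Dict[str, List[Tuple[str, int]]]


def lookup_second_language(first_search_keyword: str, lexicon: Lexicon) -> List[Tuple[str, int]]:
    """Collect second-language keywords for every lexicon entry matching the query."""
    matches: List[Tuple[str, int]] = []
    for first_language_keyword, entries in lexicon.items():
        # Assumption: an entry matches when it occurs inside the search keyword.
        if first_language_keyword in first_search_keyword:
            matches.extend(entries)
    return matches


lexicon: Lexicon = {
    "movil": [("mobile phone", 120), ("cellphone", 35)],
    "funda": [("case", 400)],
}
found = lookup_second_language("funda movil", lexicon)
print(found)
```

A production system would more likely match on word segmentation results than on raw substrings; the substring test is used here only to keep the example short.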
In a preferred embodiment of the present disclosure, the obtaining of the respective corresponding second search result based on the search of each second search keyword includes:
acquiring the number of recalls of the search results corresponding to each second language keyword; the number of recalls of the search results is the number of search results obtained by searching according to any second language keyword;
determining at least one target second language keyword based on each recall quantity and the recall quantity threshold;
and determining second search results, equal in number to the recall number threshold, from the sets of search results respectively corresponding to the target second language keywords.
In a preferred embodiment of the present disclosure, the determining at least one target keyword in the second language based on the number of recalls and the number of recalls threshold includes:
performing ascending sorting on each second language keyword based on each recall quantity to obtain each second language keyword after sorting;
using the first-ranked keywords in the second language as detection objects, and determining the total recall quantity of the detection objects;
determining whether the total number of recalls exceeds the recall number threshold;
if not, taking the detection object together with the next second language keyword in the sorted order as the current detection object, and repeating the steps of determining the total recall number of the detection object and judging whether the total recall number exceeds the recall number threshold, until the total recall number exceeds the recall number threshold, whereupon the at least one second language keyword contained in the current detection object is taken as the target second language keyword.
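The selection of target second language keywords can be sketched as a cumulative scan over the keywords sorted by ascending recall number. The function name `select_target_keywords` and the sample recall numbers are hypothetical. The behavior when the total never exceeds the threshold (all keywords are kept) is an assumption, since the text does not address that case.

```python
from typing import List, Tuple


def select_target_keywords(
    candidates: List[Tuple[str, int]], recall_threshold: int
) -> List[Tuple[str, int]]:
    """Grow the detection object in ascending recall order until the total exceeds the threshold."""
    ordered = sorted(candidates, key=lambda item: item[1])  # ascending by recall number
    detection_object: List[Tuple[str, int]] = []
    total_recall = 0
    for keyword, recall in ordered:
        detection_object.append((keyword, recall))
        total_recall += recall
        if total_recall > recall_threshold:
            break  # total recall number exceeds the threshold: stop extending
    # Assumption: if the threshold is never exceeded, every keyword is a target.
    return detection_object


candidates = [("mobile phone", 120), ("cellphone", 35), ("case", 400)]
targets = select_target_keywords(candidates, recall_threshold=100)
print(targets)
```

In this example the first keyword alone recalls 35 results, which does not exceed 100, so the next keyword is added and the total of 155 ends the scan.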
In a preferred embodiment of the present disclosure, the determining the second search result of the recall number threshold from the set of search results corresponding to each target second language keyword includes:
sorting the search results based on the ascending sort of the target second language keywords and the preset sort of the search results corresponding to the target second language keywords to obtain the sorted search results;
taking the first N search results in the sorted search results as second search results; wherein N is the recall number threshold.
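One possible reading of this ordering step is sketched below: the result sets are concatenated in the ascending order of their target keywords, each set keeps its own preset order, and the first N items are kept. The function name `take_second_results` and the sample data are hypothetical.

```python
from typing import Dict, List


def take_second_results(
    targets_in_ascending_order: List[str],
    results_by_keyword: Dict[str, List[str]],
    recall_threshold: int,
) -> List[str]:
    """Concatenate per-keyword results in keyword order, then keep the first N."""
    merged: List[str] = []
    for keyword in targets_in_ascending_order:
        # Each list is assumed to be already in its preset (e.g. relevance) order.
        merged.extend(results_by_keyword.get(keyword, []))
    return merged[:recall_threshold]  # N is the recall number threshold


results_by_keyword = {
    "cellphone": ["c1", "c2"],
    "mobile phone": ["m1", "m2", "m3"],
}
second_results = take_second_results(["cellphone", "mobile phone"], results_by_keyword, 4)
print(second_results)
```

Placing keywords with fewer recalls first means that rarer, and often more specific, translations are not crowded out by a high-recall keyword when the list is truncated.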
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is an offline database constructed based on historical search keywords, the multilingual lexicon includes at least one keyword in a first language, a keyword in a second language, and a recall number corresponding to the keyword in the second language, and the keyword in the first language, the keyword in the second language, and the recall number have an association relationship with each other.
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is generated as follows:
obtaining historical search keywords of a first language, and performing word segmentation on the historical search keywords to obtain at least one first language keyword; the first language keyword comprises at least one word segmentation result;
translating each first language keyword to obtain a second language keyword corresponding to each first language keyword;
recalling the search results for each second language keyword to obtain the number of the recalls of the search results corresponding to each second language keyword;
and establishing the association relationship between the historical search keywords and each first language keyword, each second language keyword and each recall quantity, and storing each first language keyword, each second language keyword, each recall quantity and the association relationship to the multilingual word stock.
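The offline construction steps above might be sketched as follows. The function name `build_lexicon`, the three callables (`segment`, `translate`, `count_recalls`), and the dictionary layout of each entry are assumptions made for illustration; a real system would use an actual word segmenter, a translation service, and the search backend.

```python
from typing import Callable, Dict, List


def build_lexicon(
    history_keywords: List[str],
    segment: Callable[[str], List[str]],
    translate: Callable[[str], str],
    count_recalls: Callable[[str], int],
) -> Dict[str, dict]:
    """Build first-language keyword -> {second keyword, recall number, source queries}."""
    lexicon: Dict[str, dict] = {}
    for history_keyword in history_keywords:
        for first_keyword in segment(history_keyword):
            if first_keyword not in lexicon:
                second_keyword = translate(first_keyword)
                lexicon[first_keyword] = {
                    "second_keyword": second_keyword,
                    "recall_number": count_recalls(second_keyword),
                    "history_keywords": [],
                }
            # Record the association with the historical search keyword.
            lexicon[first_keyword]["history_keywords"].append(history_keyword)
    return lexicon


# Toy stand-ins (assumed) for the segmenter, translator and recall counter.
toy_dictionary = {"funda": "case", "movil": "mobile phone", "cargador": "charger"}
toy_recalls = {"case": 400, "mobile phone": 120, "charger": 80}

built = build_lexicon(
    ["funda movil", "cargador movil"],
    segment=str.split,
    translate=lambda word: toy_dictionary[word],
    count_recalls=lambda word: toy_recalls[word],
)
print(built["movil"])
```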
In a preferred embodiment of the present disclosure, the searching for the corresponding first search result based on the first search keyword includes:
performing word segmentation processing on the first search keyword to obtain at least one word segmentation;
and searching to obtain a corresponding first search result based on each participle and a preset rule.
In a preferred embodiment of the present disclosure, the searching for obtaining a corresponding first search result based on each segmented word and a preset rule includes:
and when the number of the participles does not exceed the participle number threshold, at least one of the participles is adopted for searching to obtain a corresponding first search result.
In a preferred embodiment of the present disclosure, the searching based on each participle and a preset rule to obtain a corresponding first search result includes:
when the number of the participles exceeds a participle number threshold value, acquiring the word frequency of each participle; the word frequency is the frequency of the occurrence of the participle in a preset multilingual word bank;
taking a preset number of participles with the lowest word frequencies among the participles as the first target participles, and calculating the product of the word frequencies of the first target participles to obtain a calculation result;
when the calculation result is smaller than the product threshold value, removing the first target participle with the minimum word frequency in each first target participle to obtain each second target participle;
and taking each second target participle as a current first target participle, and repeating the steps of calculating the product of the word frequencies of the first target participles to obtain a calculation result and, when the calculation result is smaller than the product threshold, removing the first target participle with the minimum word frequency to obtain the second target participles, until the current calculation result is not smaller than the product threshold; and then searching with at least one of the second target participles to obtain the corresponding first search result.
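The pruning loop above can be sketched in Python as follows. The sketch assumes that word frequencies are relative frequencies in the range (0, 1], so that removing the rarest participle raises the product; it also assumes the loop stops once a single participle remains. Neither assumption is stated in the text, and the function name `prune_participles` and the sample frequencies are hypothetical.

```python
from typing import Dict, List


def frequency_product(participles: List[str], word_freq: Dict[str, float]) -> float:
    """Multiply the word frequencies of the given participles."""
    product = 1.0
    for participle in participles:
        product *= word_freq[participle]
    return product


def prune_participles(
    participles: List[str],
    word_freq: Dict[str, float],
    participle_count_threshold: int,
    preset_count: int,
    product_threshold: float,
) -> List[str]:
    """Return the participles to search with after the frequency-product pruning."""
    if len(participles) <= participle_count_threshold:
        return list(participles)  # few enough participles: no pruning
    # First target participles: the preset number of lowest-frequency participles.
    targets = sorted(participles, key=lambda p: word_freq[p])[:preset_count]
    # Assumption: keep at least one participle so that a search remains possible.
    while len(targets) > 1 and frequency_product(targets, word_freq) < product_threshold:
        targets = targets[1:]  # drop the participle with the minimum word frequency
    return targets


word_freq = {"red": 0.2, "leather": 0.01, "phone": 0.5, "case": 0.4}
kept = prune_participles(
    ["red", "leather", "phone", "case"],
    word_freq,
    participle_count_threshold=3,
    preset_count=3,
    product_threshold=0.05,
)
print(kept)
```

Here the three rarest participles give a product of 0.0008, which is below 0.05, so the rarest one is removed; the remaining two give roughly 0.08, which is not below the threshold, and the loop ends.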
According to one or more embodiments of the present disclosure, [ example two ] there is provided a search apparatus corresponding to the method of example one, comprising:
the receiving module is used for acquiring a search request; the search request comprises a first search keyword in a first language;
the search module is used for searching based on the first search keyword to obtain a search result list; the search result list comprises at least one of a first search result obtained according to a first search keyword and a second search result obtained according to a second search keyword, wherein the second search keyword is in a second language and corresponds to the first search keyword;
and the display module is used for displaying the search result list.
In a preferred embodiment of the present disclosure, the search module includes:
the first search submodule is used for obtaining a corresponding first search result based on the first search keyword;
the obtaining sub-module is used for obtaining at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule when the number of the first search results is smaller than a number threshold;
the second searching submodule is used for searching to obtain a second searching result corresponding to each second searching keyword based on each second searching keyword;
and the determining submodule is used for taking the first search result and each second search result as the search result list.
In a preferred embodiment of the present disclosure, the determining sub-module is further configured to:
when the number of the first search results is not less than the number threshold, the first search results are used as the search result list.
In a preferred embodiment of the present disclosure, the obtaining sub-module is specifically configured to:
and inquiring and matching the first search keyword in a preset multilingual word stock to obtain at least one first language keyword matched with the first search keyword and obtain each second language keyword which is respectively associated with each first language keyword.
In a preferred embodiment of the present disclosure, the second search submodule includes:
the recall number acquisition unit is used for acquiring the recall number of the search results corresponding to each second language keyword; the number of recalls of the search results is the number of search results obtained by searching according to any second language keyword;
a first determining unit, configured to determine at least one target second-language keyword based on each recall number and the recall number threshold;
and the second determining unit is used for determining the second search result of the recall quantity threshold from the set of search results corresponding to the target second language keywords.
In a preferred embodiment of the present disclosure, the first determining unit includes:
the first sequencing subunit is used for sequencing the second language keywords in an ascending order based on the number of recalls to obtain the sequenced second language keywords;
a total recall number determining subunit, configured to use the first-ranked keyword in the second language as a detection object, and determine a total recall number of the detection object;
a judging subunit, configured to judge whether the total number of recalls exceeds the recall number threshold;
and the processing subunit is configured to use the detection object and the second-language keyword next to the detection object in the sequence as the current detection object at the same time, and repeatedly call the total recall number determining subunit and the judging subunit until the total recall number exceeds the recall number threshold, and use at least one second-language keyword corresponding to the current detection object as a target second-language keyword.
In a preferred embodiment of the present disclosure, the second determining unit includes:
the second sorting subunit is used for sorting the search results based on the ascending sorting of the target second language keywords and the preset sorting of the search results corresponding to the target second language keywords to obtain the sorted search results;
the determining subunit is used for taking the first N search results in the sorted search results as second search results; wherein N is the recall number threshold.
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is an offline database constructed based on historical search keywords, the multilingual lexicon includes at least one keyword in a first language, a keyword in a second language, and a recall number corresponding to the keyword in the second language, and the keyword in the first language, the keyword in the second language, and the recall number have an association relationship with each other.
In a preferred embodiment of the present disclosure, the preset multilingual lexicon is generated as follows:
obtaining historical search keywords of a first language, and performing word segmentation on the historical search keywords to obtain at least one first language keyword; the first language keyword comprises at least one word segmentation result;
translating each first language keyword to obtain a second language keyword corresponding to each first language keyword;
recalling search results for each second language keyword to obtain the number of recalls of the search results corresponding to each second language keyword;
and establishing an association relation between the historical search keywords and each first language keyword, each second language keyword and each recall quantity, and storing each first language keyword, each second language keyword, each recall quantity and the association relation to the multilingual word stock.
In a preferred embodiment of the present disclosure, the first search submodule includes:
the word segmentation unit is used for carrying out word segmentation processing on the first search keyword to obtain at least one word segmentation;
and the processing unit is used for searching to obtain a corresponding first search result based on each participle and a preset rule.
In a preferred embodiment of the present disclosure, the processing unit is specifically configured to:
and when the number of the participles does not exceed the participle number threshold, at least one of the participles is adopted for searching to obtain a corresponding first search result.
In a preferred embodiment of the present disclosure, the processing unit includes:
the word frequency obtaining subunit is used for obtaining the word frequency of each participle when the number of the participles exceeds a participle number threshold; the word frequency is the frequency of the occurrence of the participles in a preset multilingual word bank;
the calculation subunit is used for taking the participle with the lowest word frequency and the preset participle number in each participle as each first target participle, and calculating the product of the word frequencies of each first target participle to obtain a calculation result;
the filtering subunit is used for removing the first target participle with the minimum word frequency in each first target participle to obtain each second target participle when the calculation result is smaller than the product threshold;
and respectively taking each second target word segmentation as a current first target word segmentation, repeatedly calling the calculation subunit and the filtering subunit until the current calculation result is not less than the product threshold value, and searching by adopting at least one of the second target word segmentation to obtain a corresponding first search result.
The foregoing description is only illustrative of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, it encompasses technical solutions formed by replacing the above features with (but not limited to) features disclosed in this disclosure that have similar functions.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (12)

1. A method of searching, comprising:
acquiring a search request, wherein the search request comprises a first search keyword in a first language;
performing word segmentation processing on the first search keyword to obtain at least one word segmentation;
when the number of the participles exceeds a participle number threshold value, acquiring the word frequency of each participle; the word frequency is the frequency of the occurrence of the participle in a preset multilingual word bank;
taking the participles with the lowest word frequency and the preset participle number in each participle as each first target participle, and calculating the product of the word frequencies of each first target participle to obtain a calculation result;
when the calculation result is smaller than the product threshold value, removing the first target participle with the minimum word frequency in each first target participle to obtain each second target participle;
taking each second target participle as a current first target participle, and repeating the steps of calculating the product of the word frequencies of the first target participles to obtain a calculation result and, when the calculation result is smaller than the product threshold, removing the first target participle with the minimum word frequency to obtain the second target participles, until the current calculation result is not smaller than the product threshold, and searching with at least one of the second target participles to obtain a corresponding first search result;
when the number of the first search results is smaller than a number threshold, acquiring at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule;
searching based on each second search keyword to obtain a corresponding second search result;
at least one of the first search result and each second search result is used as a search result list;
and displaying the search result list.
2. The search method of claim 1, further comprising:
when the number of the first search results is not less than the number threshold, the first search results are used as the search result list.
3. The method according to claim 1, wherein obtaining at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule comprises:
and inquiring and matching the first search keyword in a preset multilingual word stock to obtain at least one first language keyword matched with the first search keyword and obtain each second language keyword which is respectively associated with each first language keyword.
4. The searching method according to claim 1 or 3, wherein the searching based on each second search keyword to obtain a corresponding second search result comprises:
acquiring the number of recalls of the search results corresponding to each second language keyword; the number of the recalls of the search results is the number of the search results obtained by searching according to any second language keyword;
determining at least one target second language keyword based on each recall quantity and the recall quantity threshold;
and determining the second search result of the recall quantity threshold value from the set of search results corresponding to the target second language key words respectively.
5. The search method of claim 4, wherein said determining at least one target second language keyword based on respective recall numbers and recall number thresholds comprises:
performing ascending sorting on each second language keyword based on each recall quantity to obtain each second language keyword after sorting;
taking the first-ranked second language keywords as detection objects, and determining the total recall quantity of the detection objects;
determining whether the total number of recalls exceeds the recall number threshold;
if not, taking the detection object together with the next second language keyword in the sorted order as the current detection object, and repeating the steps of determining the total recall quantity of the detection object and judging whether the total recall quantity exceeds the recall quantity threshold, until the total recall quantity exceeds the recall quantity threshold, and then taking at least one second language keyword corresponding to the current detection object as a target second language keyword.
6. The search method according to claim 4, wherein the determining the second search result of the recall number threshold from the set of search results corresponding to each target keyword in the second language comprises:
sorting the search results based on the ascending sort of the target second language keywords and the preset sort of the search results corresponding to the target second language keywords to obtain the sorted search results;
taking the first N search results in the sorted search results as second search results; wherein N is the recall number threshold.
7. The searching method according to claim 1 or 3, wherein the preset multilingual thesaurus is an off-line database constructed based on historical search keywords, the multilingual thesaurus includes at least one keyword in a first language, a keyword in a second language, and a recall number corresponding to the keyword in the second language, and the keyword in the first language, the keyword in the second language, and the recall number are related to each other.
8. The search method according to claim 1 or 3, wherein the predetermined multilingual lexicon is generated by:
obtaining historical search keywords of a first language, and performing word segmentation on the historical search keywords to obtain at least one first language keyword; the first language keyword comprises at least one word segmentation result;
translating each first language keyword to obtain a second language keyword corresponding to each first language keyword;
recalling the search results for each second language keyword to obtain the number of the recalls of the search results corresponding to each second language keyword;
and establishing the association relationship between the historical search keywords and each first language keyword, each second language keyword and each recall quantity, and storing each first language keyword, each second language keyword, each recall quantity and the association relationship to the multilingual word stock.
9. The search method of claim 1, wherein when the number of segments does not exceed the number of segments threshold, the method further comprises:
and searching by adopting at least one of the multiple segmented words to obtain a corresponding first search result.
10. A search apparatus, comprising:
the device comprises a receiving module, a search module and a display module, wherein the receiving module is used for acquiring a search request which comprises a first search keyword in a first language;
the search module comprises a first search submodule, an acquisition submodule, a second search submodule and a determination submodule;
the first searching submodule comprises a word segmentation unit and a processing unit, wherein the processing unit comprises a word frequency acquisition subunit, a calculation subunit and a filtering subunit; the word segmentation unit carries out word segmentation processing on the first search keyword to obtain at least one word segmentation; the word frequency obtaining subunit is configured to obtain the word frequency of each participle when the number of the participles exceeds a participle number threshold; the word frequency is the frequency of the occurrence of the participle in a preset multilingual word bank; the calculating subunit is configured to use the participle with the lowest word frequency and the preset participle number in each participle as each first target participle, and calculate a product of the word frequencies of each first target participle to obtain a calculation result; the filtering subunit is configured to, when the calculation result is smaller than a product threshold, remove a first target participle with a smallest word frequency in each first target participle to obtain each second target participle; respectively taking each second target word segmentation as a current first target word segmentation, repeatedly calling the calculation subunit and the filtering subunit until the current calculation result is not less than the product threshold value, and searching by adopting at least one of the second target word segmentations to obtain a corresponding first search result;
the obtaining sub-module is used for obtaining at least one second search keyword in a second language corresponding to the first search keyword based on a preset rule when the number of the first search results is smaller than a number threshold;
the second search sub-module is used for searching to obtain respective corresponding second search results based on each second search keyword;
the determining submodule is used for taking at least one of the first search result and each second search result as the search result list;
and the display module is used for displaying the search result list.
11. An electronic device, comprising:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to execute the search method according to any one of claims 1 to 9 by calling the operation instruction.
12. A computer-readable medium for storing computer instructions which, when executed on a computer, cause the computer to perform the search method of any one of claims 1-9.
CN202010555041.2A 2020-06-17 2020-06-17 Searching method, searching device, electronic equipment and computer-readable storage medium Active CN111708911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010555041.2A CN111708911B (en) 2020-06-17 2020-06-17 Searching method, searching device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010555041.2A CN111708911B (en) 2020-06-17 2020-06-17 Searching method, searching device, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN111708911A (en) 2020-09-25
CN111708911B (en) 2022-06-24

Family

ID=72541108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010555041.2A Active CN111708911B (en) 2020-06-17 2020-06-17 Searching method, searching device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111708911B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177061B (en) * 2021-05-25 2023-05-16 马上消费金融股份有限公司 Searching method and device and electronic equipment
CN113392308A (en) * 2021-06-22 2021-09-14 北京字节跳动网络技术有限公司 Content search method, device, equipment and medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956740A (en) * 1996-10-23 1999-09-21 Iti, Inc. Document searching system for multilingual documents
US6009422A (en) * 1997-11-26 1999-12-28 International Business Machines Corporation System and method for query translation/semantic translation using generalized query language
KR20040059240A (en) * 2002-12-28 2004-07-05 엔에이치엔(주) A method for providing multi-language translation service and a system of enabling the method
CN1940925A (en) * 2006-07-03 2007-04-04 魏新成 Method for instantlly reminding search result list during process of keywork search
CN102043845B (en) * 2010-12-08 2013-08-21 百度在线网络技术(北京)有限公司 Method and equipment for extracting core keywords based on query sequence cluster
US8918308B2 (en) * 2012-07-06 2014-12-23 International Business Machines Corporation Providing multi-lingual searching of mono-lingual content
IN2013CH01237A (en) * 2013-03-21 2015-08-14 Infosys Ltd
US9348870B2 (en) * 2014-02-06 2016-05-24 International Business Machines Corporation Searching content managed by a search engine using relational database type queries
US9569526B2 (en) * 2014-02-28 2017-02-14 Ebay Inc. Automatic machine translation using user feedback
CN106599206A (en) * 2016-12-15 2017-04-26 北京小米移动软件有限公司 Method and device for searching information
US20190108282A1 (en) * 2017-10-09 2019-04-11 Facebook, Inc. Parsing and Classifying Search Queries on Online Social Networks
CN109948036B (en) * 2017-11-15 2022-10-04 腾讯科技(深圳)有限公司 Method and device for calculating weight of participle term
CN111143659A (en) * 2018-11-02 2020-05-12 声音猎手公司 System and method for performing intelligent cross-domain search

Also Published As

Publication number Publication date
CN111708911A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
CN110096655B (en) Search result sorting method, device, equipment and storage medium
CN110619076B (en) Search term recommendation method and device, computer and storage medium
CN110516159B (en) Information recommendation method and device, electronic equipment and storage medium
CN111708911B (en) Searching method, searching device, electronic equipment and computer-readable storage medium
CN111898643A (en) Semantic matching method and device
CN112037792A (en) Voice recognition method and device, electronic equipment and storage medium
CN112287206A (en) Information processing method and device and electronic equipment
CN112532507B (en) Method and device for presenting an emoticon, and for transmitting an emoticon
CN111160029A (en) Information processing method and device, electronic equipment and computer readable storage medium
CN114445754A (en) Video processing method and device, readable medium and electronic equipment
CN113011169B (en) Method, device, equipment and medium for processing conference summary
CN112819512B (en) Text processing method, device, equipment and medium
CN111694985B (en) Search method, search device, electronic equipment and computer-readable storage medium
CN110738048A (en) keyword extraction method and device and terminal equipment
CN110598049A (en) Method, apparatus, electronic device and computer readable medium for retrieving video
CN114063795A (en) Interaction method, interaction device, electronic equipment and storage medium
CN111782895B (en) Retrieval processing method and device, readable medium and electronic equipment
CN113420723A (en) Method and device for acquiring video hotspot, readable medium and electronic equipment
CN113127718A (en) Text search method and device, readable medium and electronic equipment
CN110209939B (en) Method and device for acquiring recommendation information, electronic equipment and readable storage medium
CN112669816A (en) Model training method, speech recognition method, device, medium and equipment
CN111444813A (en) Method, device, equipment and storage medium for identifying attribute classification of target object
CN112596617A (en) Message content input method and device and electronic equipment
CN112231444A (en) Processing method and device for corpus data combining RPA and AI and electronic equipment
CN111737572A (en) Search statement generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.