CN110909245A

CN110909245A - Multi-label webpage searching method, browser, server and storage medium

Info

Publication number: CN110909245A
Application number: CN201911205658.5A
Authority: CN
Inventors: 陈顺利
Original assignee: Beijing Hanzi Technology Co Ltd
Current assignee: Beijing Hanzi Technology Co Ltd
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2020-03-24

Abstract

The embodiment of the invention discloses a multi-label webpage searching method, a browser, a server and a storage medium. The method comprises the following steps: acquiring voice information of a user; matching the voice information with the content information of each webpage pointed by the plurality of labels respectively, and confirming the webpage with the maximum similarity with the voice information; and displaying the webpage with the maximum similarity. According to the technical scheme of the embodiment of the invention, the voice information is acquired and matched with the content information of each webpage, so that a user can conveniently and quickly find the content which the user wants to find when opening a large number of webpage labels, and the searching efficiency is improved.

Description

Multi-label webpage searching method, browser, server and storage medium

Technical Field

The embodiment of the invention relates to a web browser technology, in particular to a multi-label web searching method, a browser, a server and a storage medium.

Background

When searching for information with a web browser, a user often opens a large number of tags and must then switch among them laboriously to find the desired content. It is therefore difficult for the user to locate the tag that holds that content, and the searching efficiency is low.

Disclosure of Invention

The embodiment of the invention provides a multi-label webpage searching method, a browser, a server and a storage medium, so that contents which a user wants to search can be quickly found when a large number of webpage labels are opened, and the searching efficiency is improved.

In a first aspect, an embodiment of the present invention provides a multi-tag webpage searching method, including:

acquiring voice information of a user;

matching the voice information with the content information of each webpage pointed by the plurality of labels respectively, and confirming the webpage with the maximum similarity with the voice information;

and displaying the webpage with the maximum similarity.

Optionally, the matching the voice information with the content information of each webpage pointed to by the plurality of tags, and confirming the webpage with the maximum similarity to the voice information, includes:

calculating the similarity between the voice information and the content information of each webpage pointed to by the plurality of tags;

and confirming the webpage with the maximum similarity with the voice information according to the similarity.

Optionally, the calculating the similarity between the voice information and the content information of each webpage pointed to by the plurality of tags includes:

converting the voice information into a sentence vector;

converting the content information of each webpage into a vector respectively;

multiplying the sentence vector by the content-information vector of each webpage to obtain the similarities;

the confirming the webpage with the maximum similarity with the voice information according to the similarity comprises the following steps:

taking the maximum value among the similarities, and confirming the webpage corresponding to that maximum value as the webpage with the maximum similarity to the voice information.

Optionally, the method further includes:

after displaying the webpage with the maximum similarity, acquiring description keywords input by a user in a plurality of searching processes and/or description keywords of reading webpage marks corresponding to the plurality of searching processes;

inputting the description keywords into a training model trained in advance, and outputting result keywords;

and displaying the result keywords on the webpage with the maximum similarity or a preset area of the current page.

Optionally, before obtaining the description keyword input by the user in the multiple search processes and/or the description keyword of the reading webpage mark corresponding to the multiple search processes, the training of the training model further includes:

collecting a large number of description keywords and result keywords of a specific field;

marking the description keywords by using the result keywords to generate a training sample set;

and inputting each description keyword of the training sample set into a training model for training.

Optionally, after the training of the training model, detecting the training model further includes:

marking the description keywords by using the result keywords to generate a detection sample set;

inputting each description keyword of the detection sample set into a training model for detection so as to output a detection result;

and confirming whether the training model needs to be trained continuously or not according to the matching degree of the detection result and the result keyword.

In a second aspect, an embodiment of the present invention further provides a browser, including:

the acquisition unit is used for acquiring voice information of a user;

the matching unit is used for matching the voice information with the content information of each webpage pointed by the labels respectively and confirming the webpage with the maximum similarity with the voice information;

and the display unit is used for displaying the webpage with the maximum similarity.

In a third aspect, an embodiment of the present invention further provides a server, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the multi-tag web page searching method described in any of the foregoing embodiments when executing the computer program.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the multi-tag web page searching method in any one of the foregoing embodiments.

According to the technical scheme of the embodiment of the invention, the voice information is acquired and matched with the content information of each webpage, so that a user can conveniently and quickly find the content which the user wants to find when opening a large number of webpage labels, and the searching efficiency is improved.

Drawings

Fig. 1 is a schematic flowchart of a multi-tag web page searching method according to a first embodiment of the present invention;

Fig. 2 is a schematic flowchart of a multi-tag web page searching method according to a second embodiment of the present invention;

Fig. 3 is a schematic diagram of a model for generating answers according to the second embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a browser in a third embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a server in a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first tag may be termed a second tag, and, similarly, a second tag may be termed a first tag, without departing from the scope of the present application. The first label and the second label are both labels, but they are not the same label. The terms "first", "second", etc. are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Example one

In the technical scheme of the first embodiment of the invention, the multi-tag webpage searching problem is handled by matching a vector derived from the user's voice against the webpages, which solves the problem that the content of the pages cannot be seen from the tags alone and provides an intelligent, question-style webpage searching experience. Fig. 1 is a schematic flowchart of a multi-tag webpage searching method according to the first embodiment of the present invention, which is applicable to webpage searching. The method of the embodiment of the invention can be executed by a multi-tag webpage searching device, which can be implemented in software and/or hardware and can generally be integrated in a browser, a server or a terminal device. Referring to Fig. 1, the multi-tag webpage searching method according to this embodiment specifically includes the following steps:

Step S110, acquiring voice information of the user.

Specifically, the voice information of the user can be acquired through a mobile phone APP or the microphone of a computer, and voice recognition is then performed. For example, while browsing, a user may have opened a large pile of webpages such as "qin", "chess", "book" and "drawing". The user suddenly wants to find the "qin" webpage but does not know where its tag is, and checking the tags one by one is too troublesome, so the user can speak "qin", or voice information related to "qin", into the microphone of the computer.

Step S120, matching the voice information with the content information of each webpage pointed to by the plurality of tags, and confirming the webpage with the maximum similarity to the voice information.

After the voice information of the user is acquired, it is converted into character information in a specific format, symbol information of a specific type, or the like. Similarly, the content information of each webpage pointed to by the plurality of tags is converted into character information in the specific format or symbol information of the specific type. The two are then matched to confirm the webpage with the maximum similarity to the voice information. For example, after the user speaks the voice information "qin" into the microphone, the voice information is recognized and converted into character information, the character information is matched against the content information of each webpage pointed to by the plurality of tags, and the webpage with the maximum similarity to "qin" is found.

The matching mode of the present invention is described below, specifically as follows:

Step S1201, calculating the similarity between the voice information and the content information of each webpage pointed to by the plurality of tags.

Step A, converting the voice information into a sentence vector.

Specifically, after the microphone acquires the voice information, the voice signal is converted into an electrical signal, and the electrical signal is then converted into a sentence vector.

Step B, converting the content information of each webpage into a vector respectively.

Specifically, the content information of each webpage is extracted; it may include text information, picture information, video information, and the like, or may be text information only. The content information of webpage 1, webpage 2, ..., webpage n is converted into one vector per webpage.

Step C, multiplying the sentence vector by the content-information vector of each webpage to obtain the similarity.

Specifically, once the sentence vector and the n webpage vectors have been obtained, the sentence vector is multiplied by each webpage vector in turn, yielding n similarities that correspond to webpage 1, webpage 2, ..., webpage n respectively.

Step S1202, confirming the webpage with the maximum similarity to the voice information according to the similarity.

Specifically, the maximum value is taken from among the similarities, and the webpage corresponding to that maximum value is confirmed as the webpage with the maximum similarity to the voice information.
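The vector multiplication and maximum selection of steps S1201 and S1202 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the source does not specify how the vectors are built, so the bag-of-words `embed` function, the fixed `VOCAB` list and the sample page texts are assumptions made for the example. Only the multiply-then-take-the-maximum logic comes from the description.

```python
from typing import List, Sequence

# Hypothetical vocabulary; the source does not say how vectors are produced.
VOCAB = ["qin", "chess", "book", "drawing", "music", "board"]


def embed(text: str) -> List[float]:
    """Toy bag-of-words embedding: count each vocabulary word in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Multiply two vectors (inner product) to obtain a similarity score."""
    return sum(x * y for x, y in zip(a, b))


def most_similar_page(query: str, pages: Sequence[str]) -> int:
    """Return the index of the page whose vector has the largest similarity."""
    sentence_vector = embed(query)
    similarities = [dot(sentence_vector, embed(page)) for page in pages]
    return max(range(len(pages)), key=lambda i: similarities[i])


pages = [
    "chess board openings and chess tactics",
    "qin music history and qin tuning",
    "book reviews",
]
best = most_similar_page("qin music", pages)
print(best)  # index of the "qin" page in the list above
```

A raw inner product favors long pages with many repeated words; normalizing the vectors first (cosine similarity) is a common alternative, though the source itself only describes multiplication.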

Step S130, displaying the webpage with the maximum similarity.

Specifically, once the webpage with the maximum similarity is found, it can be popped up directly, or it can be indicated to the user, for example by a different color, so that the user decides whether to display it.

Example two

An exploratory information search requires a large amount of domain-specific vocabulary. A user who cannot phrase a question in the terms of the specific domain therefore finds it difficult to retrieve useful information. Fig. 2 is a schematic flowchart of a multi-tag webpage searching method according to a second embodiment of the present invention. The method of the embodiment of the invention can be executed by a multi-tag webpage searching device, which can be implemented in software and/or hardware and can generally be integrated in a server or a terminal device. Referring to Fig. 2, the multi-tag webpage searching method according to this embodiment specifically includes the following steps:

Step S210, acquiring the voice information of the user.

Step S220, matching the voice information with the content information of each webpage pointed to by the plurality of tags, and determining the webpage with the maximum similarity to the voice information.

Step S230, displaying the webpage with the maximum similarity.

Step S240, after displaying the webpage with the maximum similarity, obtaining the description keywords input by the user in multiple search processes and/or the description keywords of the reading webpage marks corresponding to the multiple search processes.

Specifically, a description keyword is what the user types when, during an exploratory search, the user wants to enter a term from some field of expertise but does not know what the term is and instead tries to describe it. For example, the user wants to find out what a "blockchain" is but does not know the word, and so inputs information such as "a decentralized distributed ledger database" and "chained text records that are linked and content-protected by cryptography". In this case, "a decentralized distributed ledger database" and "chained text records that are linked and content-protected by cryptography" are the description keywords. In the embodiment of the present invention, d denotes the vector of the plurality of domain-specific description keywords that the user attempts to input, and r denotes the vector of the answer keyword the user is looking for. It can be understood that the time at which the description keywords input by the user in the multiple search processes are acquired is not limited: they may be acquired after the webpage with the maximum similarity is displayed, as in the first embodiment, or at other times.

Step S250, inputting the description keywords into a training model trained in advance, and outputting result keywords.

Specifically, the description keywords are input into a training model trained in advance, and result keywords are output. Based on the model, when a user inputs several descriptors specific to a professional field, the answer vector r sought by the user is matched and found automatically. The annotation data for d come from the descriptors the user has entered over multiple searches of a question, and the annotation data for r come from the user's labeling of the results. The model can thus be expressed as a mapping from d to r, wherein d represents the description keywords input by the user, and r represents the result keyword to be output.

Alternatively, the model can be represented schematically as shown in Fig. 3, where d₁ represents "a decentralized distributed ledger database", d₂ represents a further description keyword, and r represents the answer, namely what a blockchain is. The model is a recommendation model that automatically generates an answer from multiple description inputs, and it uses the description marks and answer marks contributed by multiple users as its training set.

Step S260, displaying the result keywords on the webpage with the maximum similarity or in a preset area of the current page.

After the result keywords are generated, the result keywords can be displayed on the webpage with the maximum similarity or in a preset area of the current page. The preset area can be a user-defined area or a system default area.

Generally, before the training model is used to output result keywords, it needs to be trained; training adjusts the calculation parameters of the model so that the result keywords are output more accurately in use. While the user uses the model, the input description keywords and the output result keywords are recorded in the background for use in training the model. The model is trained with the domain description keyword marks d and the result keyword marks r, which helps the user quickly predict the answer keyword of a search. Training the training model comprises:

Step A, collecting a large number of description keywords and result keywords in a specific field.

Specifically, the keywords can be collected from what users input during use, and each term searched by a user is recorded for the training sample set.

Step B, marking the description keywords by using the result keywords to generate a training sample set.

Step C, inputting each description keyword of the training sample set into a training model for training.
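Steps A to C can be sketched as below. The source does not name a model architecture, so the token-counting `KeywordModel` class, its `train` and `predict` methods and the sample data are hypothetical stand-ins chosen for brevity. Only the idea of labeling description keywords d with result keywords r and feeding them to a model comes from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Training sample: a description keyword d labeled with a result keyword r.
Sample = Tuple[str, str]


class KeywordModel:
    """Hypothetical stand-in model: counts which tokens co-occur with each result."""

    def __init__(self) -> None:
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, samples: List[Sample]) -> None:
        """Accumulate token counts per result keyword from the labeled samples."""
        for description, result in samples:
            for token in description.lower().split():
                self.token_counts[result][token] += 1

    def predict(self, descriptions: List[str]) -> str:
        """Return the result keyword whose training tokens best overlap the inputs."""
        tokens = [t for d in descriptions for t in d.lower().split()]
        scores = {
            result: sum(counts[t] for t in tokens)
            for result, counts in self.token_counts.items()
        }
        return max(scores, key=scores.get)


training_set: List[Sample] = [
    ("a decentralized distributed ledger database", "blockchain"),
    ("chained records protected by cryptography", "blockchain"),
    ("network that learns weights from labeled examples", "neural network"),
]

model = KeywordModel()
model.train(training_set)
print(model.predict(["distributed ledger", "cryptography records"]))
```

The counting model is used here only because it fits in a few lines; any classifier that maps several descriptions to one label would fill the same role.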

After training of the training model is completed, the model also needs to be tested. The detection of the training model comprises:

Step a, collecting a large number of description keywords and result keywords of specific fields.

Specifically, the collected sample data for detection is different from the sample data for training, and the sample size for detection can be smaller. For example, 70% of the data is used for training and 30% of the data is used for detection, which can be adjusted according to actual conditions.

Step b, marking the description keywords by using the result keywords to generate a detection sample set.

Step c, inputting each description keyword of the detection sample set into the training model for detection so as to output a detection result.

Step d, confirming whether the training model needs to be trained further according to the matching degree between the detection result and the result keyword.

Specifically, if the obtained detection result is not much different from the result keyword, it indicates that the training model does not need to be trained continuously; if the obtained detection result is greatly different from the result keywords, the training model needs to be trained continuously.
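The detection procedure can be sketched as follows. The 70/30 split is the example ratio given in the text. The `matching_degree` function, the 0.8 threshold and the lookup-table stand-in for a trained model are assumptions made for illustration, since the source does not define how the matching degree is measured or how large a difference must be before training continues.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (description keyword d, result keyword r)


def split_samples(
    samples: List[Sample], train_ratio: float = 0.7, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the labeled samples and split them into training and detection sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def matching_degree(predict: Callable[[str], str], detection_set: List[Sample]) -> float:
    """Fraction of detection samples whose output equals the labeled result keyword."""
    if not detection_set:
        return 0.0
    hits = sum(1 for d, r in detection_set if predict(d) == r)
    return hits / len(detection_set)


def needs_more_training(degree: float, threshold: float = 0.8) -> bool:
    """Assumed rule: keep training while the matching degree is below the threshold."""
    return degree < threshold


samples = [(f"description {i}", f"result {i % 2}") for i in range(10)]
train_set, detection_set = split_samples(samples)

# Hypothetical stand-in for a trained model: a lookup over all labeled samples.
lookup = dict(samples)
degree = matching_degree(lambda d: lookup.get(d, ""), detection_set)
print(len(train_set), len(detection_set), degree, needs_more_training(degree))
```

Keeping the detection set disjoint from the training set, as the text requires, is what makes the matching degree a meaningful signal for deciding whether to continue training.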

According to the technical scheme of the embodiment of the invention, the user can be helped to quickly predict the search answer through the labeling data of the multiple descriptors and the result keywords.

Example three

The multi-tag webpage searching device provided by the embodiment of the invention can execute the webpage searching method provided by any embodiment of the invention, has corresponding functional modules and beneficial effects of the execution method, can be realized in a software and/or hardware (integrated circuit) mode, and can be generally integrated in a browser, a server or terminal equipment. Fig. 4 is a schematic structural diagram of a multi-tag web page searching device or browser according to a third embodiment of the present invention. Referring to fig. 4, a multi-tag web page searching apparatus or browser according to an embodiment of the present invention may specifically include:

an obtaining unit 410, configured to obtain voice information of a user;

a matching unit 420, configured to match the voice information with the content information of each webpage pointed to by the plurality of tags, and determine the webpage with the largest similarity to the voice information;

the display unit 430 is configured to display the web page with the largest similarity.
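The three units above can be pictured as a small pipeline. In this sketch the class and method names are hypothetical, speech recognition is replaced by an already-recognized text string, and similarity is a simple shared-word count; the source specifies only the responsibilities of units 410, 420 and 430.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tab:
    title: str
    content: str


class ObtainingUnit:
    """Unit 410: obtains the user's voice information (stubbed as recognized text)."""

    def obtain(self, recognized_text: str) -> str:
        return recognized_text.strip().lower()


class MatchingUnit:
    """Unit 420: finds the tab whose content is most similar to the voice information."""

    def match(self, voice_text: str, tabs: List[Tab]) -> Tab:
        query = set(voice_text.split())
        # Assumed similarity: number of query words that appear in the tab content.
        return max(tabs, key=lambda tab: len(query & set(tab.content.lower().split())))


class DisplayUnit:
    """Unit 430: displays the matched page (here, returns a message to show)."""

    def display(self, tab: Tab) -> str:
        return f"Switching to tab: {tab.title}"


tabs = [Tab("Chess", "chess openings and tactics"), Tab("Qin", "qin music and tuning")]
voice = ObtainingUnit().obtain("  Qin music ")
message = DisplayUnit().display(MatchingUnit().match(voice, tabs))
print(message)
```

Separating the units this way mirrors the structure in Fig. 4 and lets the matching strategy be swapped without touching acquisition or display.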

Optionally, the matching unit 420 is further configured to: calculate the similarity between the voice information and the content information of each webpage pointed to by the plurality of tags; and confirm the webpage with the maximum similarity to the voice information according to the similarity. Specifically, the matching unit 420:

converts the voice information into a sentence vector;

converts the content information of each webpage into a vector respectively;

multiplies the sentence vector by the content-information vector of each webpage to obtain the similarities;

and takes the maximum value among the similarities, confirming the webpage corresponding to that maximum value as the webpage with the maximum similarity to the voice information.

Optionally, the apparatus further comprises:

the description unit is used for acquiring the description keywords input by the user in the multiple searching processes and/or the description keywords of the reading webpage marks corresponding to the multiple searching processes after the webpage with the maximum similarity is displayed;

the result unit is used for inputting the description keywords into a training model trained in advance and outputting result keywords;

and the display unit is used for displaying the result keywords on the webpage with the maximum similarity or in a preset area of the current page.

Optionally, before obtaining the description keyword input by the user in the multiple search processes and/or the description keyword of the reading webpage mark corresponding to the multiple search processes, the training of the training model further includes: collecting a large number of description keywords and result keywords of a specific field; marking the description keywords by using the result keywords to generate a training sample set; and inputting each description keyword of the training sample set into a training model for training.

Optionally, after the training of the training model, detecting the training model further includes: collecting a large number of description keywords and result keywords of a specific field; marking the description keywords by using the result keywords to generate a detection sample set; inputting each description keyword of the detection sample set into a training model for detection so as to output a detection result; and confirming whether the training model needs to be trained continuously or not according to the matching degree of the detection result and the result keyword.

Example four

Fig. 5 is a schematic structural diagram of a server according to a fourth embodiment of the present invention, as shown in fig. 5, the server includes a processor 510, a memory 520, an input device 530, and an output device 540; the number of the processors 510 in the server may be one or more, and one processor 510 is taken as an example in fig. 5; the processor 510, the memory 520, the input device 530 and the output device 540 in the server may be connected by a bus or other means, and the bus connection is exemplified in fig. 5.

The memory 520 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the multi-tag web page searching method in the embodiment of the present invention (for example, the obtaining unit 410, the matching unit 420, and the display unit 430 in the multi-tag web page searching apparatus). The processor 510 executes various functional applications of the server and data processing by executing software programs, instructions and modules stored in the memory 520, that is, implements the multi-tag web page searching method described above.

Namely:

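acquiring voice information of a user;

matching the voice information with the content information of each webpage pointed to by the plurality of tags, and confirming the webpage with the maximum similarity to the voice information;

and displaying the webpage with the maximum similarity.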

Of course, the processor of the server provided in the embodiment of the present invention is not limited to execute the method operations described above, and may also execute related operations in the multi-tag web page searching method provided in any embodiment of the present invention.

The memory 520 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 520 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 520 may further include memory located remotely from processor 510, which may be connected to a server over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the server. The output device 540 may include a display device such as a display screen.

Example five

An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a multi-tag web page lookup method, where the method includes:

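acquiring voice information of a user;

matching the voice information with the content information of each webpage pointed to by the plurality of tags, and confirming the webpage with the maximum similarity to the voice information;

and displaying the webpage with the maximum similarity.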

Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the multi-tag web page searching method provided by any embodiment of the present invention.

The computer-readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A multi-label web page searching method is characterized by comprising the following steps:

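acquiring voice information of a user;

matching the voice information with the content information of each webpage pointed to by the plurality of tags, and confirming the webpage with the maximum similarity to the voice information;

and displaying the webpage with the maximum similarity.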

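2. The method for searching for a multi-tag webpage according to claim 1, wherein the matching the voice information with the content information of each webpage pointed to by the plurality of tags, and confirming the webpage with the maximum similarity to the voice information, comprises:

calculating the similarity between the voice information and the content information of each webpage pointed to by the plurality of tags;

and confirming the webpage with the maximum similarity with the voice information according to the similarity.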

3. The method for searching for a multi-tag web page according to claim 2, wherein the calculating the similarity between the voice information and the content information of each web page to which the plurality of tags point respectively comprises:

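converting the voice information into a sentence vector;

converting the content information of each webpage into a vector respectively;

multiplying the sentence vector by the content-information vector of each webpage to obtain the similarities;

and the confirming the webpage with the maximum similarity with the voice information according to the similarity comprises:

taking the maximum value among the similarities, and confirming the webpage corresponding to that maximum value as the webpage with the maximum similarity to the voice information.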

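4. The multi-tag web page lookup method of claim 1, further comprising:

after displaying the webpage with the maximum similarity, acquiring description keywords input by a user in a plurality of searching processes and/or description keywords of reading webpage marks corresponding to the plurality of searching processes;

inputting the description keywords into a training model trained in advance, and outputting result keywords;

and displaying the result keywords on the webpage with the maximum similarity or in a preset area of the current page.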

5. The method for finding the multi-label web page according to claim 4, wherein before obtaining the description keywords input by the user in the multiple search processes and/or the description keywords of the reading web page tags corresponding to the multiple search processes, the training of the training model further comprises:

collecting a large number of description keywords and result keywords of a specific field;

marking the description keywords by using the result keywords to generate a training sample set;

and inputting each description keyword of the training sample set into a training model for training.

6. The method for searching for a multi-label web page according to claim 5, wherein after the training of the training model, the method further comprises detecting the training model, and the detecting the training model comprises:

marking the description keywords by using the result keywords to generate a detection sample set;

inputting each description keyword of the detection sample set into the training model for detection so as to output a detection result;

and confirming whether the training model needs to be trained further according to the matching degree between the detection result and the result keyword.

7. A browser, comprising:

the acquisition unit is used for acquiring voice information of a user;

the matching unit is used for matching the voice information with the content information of each webpage pointed to by the plurality of tags and confirming the webpage with the maximum similarity to the voice information;

and the display unit is used for displaying the webpage with the maximum similarity.

8. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the multi-tag web page lookup method according to any one of claims 1-6 when executing the computer program.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a multi-tag web page lookup method according to any one of claims 1-6.