CN112307073A

CN112307073A - Information query method, device, equipment and storage medium

Info

Publication number: CN112307073A
Application number: CN201910818419.0A
Authority: CN
Inventors: Not disclosed (inventor name withheld from publication)
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2021-02-02

Abstract

The embodiments of the disclosure disclose an information query method, device, equipment, and storage medium. The method comprises: constructing a corpus based on pre-collected data and counting the word frequency of each word in the corpus; acquiring a voice query instruction input by a user and performing user intention recognition on it to obtain a character object corresponding to the user intention; searching the corpus according to the pinyin and tone of the character object to obtain at least one retrieval result; reading the word frequency corresponding to each retrieval result and sorting the at least one retrieval result according to the word frequency; displaying the at least one retrieval result in the sorted order for selection by the user; and, in response to a trigger operation by the user on one of the retrieval results, navigating to a next-level page to perform an information query. The embodiments make it possible to look up characters by voice and present characters with the same pronunciation to the user in order of word frequency for selection, thereby improving query efficiency.

Description

Information query method, device, equipment and storage medium

Technical Field

The embodiment of the disclosure relates to the technical field of computers, and in particular, to an information query method, an information query device, information query equipment and a storage medium.

Background

In daily life, people often come across unfamiliar characters or characters they have forgotten how to write, and they usually look such characters up in a dictionary by manual input. In Chinese, however, many characters have several readings or several meanings, and many different characters share the same pronunciation. When a user wants to know how to write such a character, a dictionary search returns many candidate characters or words, so the target character the user needs cannot be identified quickly and accurately, and identification efficiency is low.

BRIEF SUMMARY OF THE PRESENT DISCLOSURE

The embodiments of the disclosure provide an information query method, device, equipment, and storage medium for quickly and accurately identifying the characters a user needs.

In a first aspect, an embodiment of the present disclosure provides an information query method, where the method includes:

constructing a corpus based on pre-collected data, and counting the word frequency of each word in the corpus;

acquiring a voice query instruction input by a user, and performing user intention recognition on the voice query instruction to obtain a character object corresponding to the user intention;

searching the corpus according to the pinyin and tone of the character object to obtain at least one retrieval result, wherein each retrieval result is a word with the same pronunciation as the character object;

reading the word frequency corresponding to each retrieval result, and sorting the at least one retrieval result according to the word frequency;

displaying the at least one retrieval result in the sorted order for selection by the user; and

in response to a trigger operation by the user on one of the retrieval results, navigating to a next-level page to perform an information query.

In a second aspect, an embodiment of the present disclosure further provides an information query apparatus, where the apparatus includes:

a building module, configured to construct a corpus based on pre-collected data and count the word frequency of each word in the corpus;

an acquisition and recognition module, configured to acquire a voice query instruction input by a user and perform user intention recognition on it to obtain a character object corresponding to the user intention;

a retrieval module, configured to search the corpus according to the pinyin and tone of the character object to obtain at least one retrieval result, wherein each retrieval result is a word with the same pronunciation as the character object;

a sorting module, configured to read the word frequency corresponding to each retrieval result and sort the at least one retrieval result according to the word frequency;

a display module, configured to display the at least one retrieval result in the sorted order for selection by the user; and

a response module, configured to navigate, in response to a trigger operation by the user on one of the retrieval results, to a next-level page to perform an information query.

In a third aspect, an embodiment of the present disclosure further provides an apparatus, including:

one or more processors;

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the information query method according to any embodiment of the disclosure.

In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the information query method according to any embodiment of the present disclosure.

In the embodiments, after the user's query voice is obtained, the pinyin and tone of the character the user intends to query are determined through speech recognition, all words with the same pronunciation are retrieved from the corpus according to the pinyin and tone, and these words are displayed to the user in order of word frequency for selection. Voice-based querying is thereby achieved, all words with the same pronunciation are presented for the user to choose from, and character identification efficiency is improved.

Drawings

FIG. 1 is a flow chart of a method of querying information in an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of an information query device in an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of an apparatus in an embodiment of the disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the disclosure. It should be further noted that, for the convenience of description, only some of the structures relevant to the present disclosure are shown in the drawings, not all of them.

It should be noted that the terms "system" and "network" are often used interchangeably in this disclosure. Reference to "and/or" in the embodiments of the present disclosure is intended to include any and all combinations of one or more of the associated listed items. The terms "first", "second", and the like in the description, claims, and drawings of the present disclosure are used to distinguish between different objects, not to indicate a particular order.

It should also be noted that the following embodiments of the present disclosure may be implemented individually or in combination with one another; no specific limitation is imposed in this respect.

FIG. 1 is a flow chart of an information query method provided by an embodiment of the present disclosure. The method is mainly applicable to querying information by voice, for example, asking by voice how a certain Chinese character is written. The method may be executed by a corresponding information query device, which may be implemented in software and/or hardware and may be configured on equipment that has a speech recognition function and a display, for example, a mobile terminal.

As shown in fig. 1, the method specifically includes the following steps:

S101, a corpus is constructed based on data collected in advance, and the word frequency of each word in the corpus is counted.

To build the corpus, data such as texts and compositions from a preset number of primary and secondary school textbooks, as well as web text, are collected, for example by manual download or by a web crawler. The collected data are segmented into words, and stop words or meaningless words, such as conjunctions and modal particles, are removed to obtain the corpus. When the corpus is constructed, it is also given the ability to be queried by a combination of pinyin and tone.
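As a rough illustration of S101, the Python sketch below assumes the collected text has already been segmented into words (a real system would run a Chinese word segmenter first) and that a pinyin lookup table is available. The stop-word set `STOP_WORDS`, the table `PINYIN`, and the function `build_corpus` are hypothetical names, not part of the disclosure. Indexing each word under a `(pinyin, tones)` key is one possible way to support the combined pinyin-and-tone query described above.

```python
from collections import defaultdict

# Hypothetical stop-word list (conjunctions and modal particles).
STOP_WORDS = {"和", "的", "吗", "呢", "了"}

# Hypothetical pinyin table: word -> (toneless pinyin, tone digits 1-4).
PINYIN = {
    "奇异": ("qiyi", "24"),
    "歧义": ("qiyi", "24"),
    "起义": ("qiyi", "34"),
    "奇怪": ("qiguai", "24"),
}

def build_corpus(segmented_docs):
    """Remove stop words and index the remaining words by (pinyin, tones)."""
    corpus = []               # documents after stop-word removal
    index = defaultdict(set)  # (pinyin, tones) -> set of words
    for doc in segmented_docs:
        words = [w for w in doc if w not in STOP_WORDS]
        corpus.append(words)
        for w in words:
            if w in PINYIN:   # words without a pinyin entry are kept but not indexed
                index[PINYIN[w]].add(w)
    return corpus, dict(index)

docs = [["奇异", "的", "现象"], ["这", "句话", "有", "歧义", "吗"], ["农民", "起义"]]
corpus, index = build_corpus(docs)
print(corpus[0])                       # ['奇异', '现象']
print(sorted(index[("qiyi", "24")]))   # ['奇异', '歧义']
```

How words missing from the pinyin table should be handled is not specified in the disclosure; here they simply stay in the corpus without an index entry.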

In the embodiments of the disclosure, after the corpus is established, word frequency statistics are computed for each word in the corpus, that is, the frequency with which each character or word appears in the corpus is determined. For example, the statistics may be computed with the TF-IDF (term frequency-inverse document frequency) algorithm; alternatively, to reduce computation, the number of times a word appears in the corpus may be used directly as its word frequency. The word frequency statistics are stored in the established corpus in the form of a data list.
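The two word-frequency options can be sketched in Python as follows. `raw_frequency` is the cheaper raw-count option; `tfidf_scores` is one TF-IDF variant, using the smoothed IDF `ln((1 + N) / (1 + df)) + 1` and, as an assumption, keeping each word's maximum score over all documents, because the disclosure does not say how per-document TF-IDF values are combined into a single word frequency. All names are hypothetical.

```python
import math
from collections import Counter

def raw_frequency(corpus):
    """Cheaper option: total occurrences of each word across the whole corpus."""
    return Counter(w for doc in corpus for w in doc)

def tfidf_scores(corpus):
    """TF-IDF with smoothed IDF; each word keeps its highest score over all documents."""
    n_docs = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))  # documents containing each word
    scores = {}
    for doc in corpus:
        for w, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1
            scores[w] = max(scores.get(w, 0.0), tf * idf)
    return scores

corpus = [["奇异", "现象"], ["歧义", "奇异"], ["奇异", "起义", "起义"]]
# The "data list" stored with the corpus: (word, raw frequency) pairs.
freq_list = sorted(raw_frequency(corpus).items())
print(raw_frequency(corpus)["奇异"])  # 3
```

With the smoothed IDF, a word that appears in every document still receives a non-zero weight; with the unsmoothed `ln(N / df)` its weight would fall to zero, which matters if the score is used to rank common homophones.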

S102, a voice query instruction input by a user is obtained, user intention recognition is carried out on the voice query instruction, and a character object corresponding to the user intention is obtained.

In the embodiments of the disclosure, after the user's voice query instruction is acquired, it is recognized to obtain the corresponding text information, and the text information is matched against a pre-stored intention list to determine the user intention and the character object corresponding to it. For example, if the text recognized from the user's voice query instruction is "How do you write the word 'strange'?", matching it against the intention list shows that the user intends to ask how a word is written, and the character object corresponding to this intention is "strange". Further, once the character object is obtained, its pinyin and tone are identified and stored in a word slot together with the character object, so that subsequent queries can be performed directly on the pinyin and tone in the word slot. In the embodiments, the digits 1 to 4 may be used to represent the four tones of Hanyu Pinyin: yinping (first tone), yangping (second tone), shangsheng (third tone), and qusheng (fourth tone).
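A minimal sketch of intention matching and word-slot filling, assuming the speech recognizer has already produced text. Here the intention list is modeled as regular expressions; `INTENT_LIST`, `PINYIN`, `WordSlot`, and `recognize_intent` are all hypothetical, and a real system could equally use a trained intent classifier. Tones are stored as the digits 1 to 4 described above.

```python
import re
from dataclasses import dataclass

# Hypothetical pre-stored intention list: (intent name, pattern capturing the character object).
INTENT_LIST = [
    ("query_writing", re.compile(r"^(?:请问)?(.+?)(?:这个词|这个字)?怎么写[?？]?$")),
]

# Hypothetical pinyin table: word -> (toneless pinyin, tone digits),
# where 1 = yinping, 2 = yangping, 3 = shangsheng, 4 = qusheng.
PINYIN = {"奇怪": ("qiguai", "24"), "奇异": ("qiyi", "24")}

@dataclass
class WordSlot:
    intent: str
    char_object: str
    pinyin: str
    tones: str

def recognize_intent(text):
    """Match recognized text against the intention list and fill the word slot."""
    for intent, pattern in INTENT_LIST:
        m = pattern.match(text)
        if m:
            obj = m.group(1)
            pinyin, tones = PINYIN.get(obj, ("", ""))  # empty if the word is unknown
            return WordSlot(intent, obj, pinyin, tones)
    return None  # no intention matched

slot = recognize_intent("请问奇怪怎么写？")
print(slot.char_object, slot.pinyin, slot.tones)  # 奇怪 qiguai 24
```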

S103, the corpus is searched according to the pinyin and tone of the character object to obtain at least one retrieval result, wherein each retrieval result is a word with the same pronunciation as the character object.

In the embodiments of the disclosure, once established, the corpus may be stored on the local device or on a network server. If it is stored locally, retrieval can be performed directly according to the pinyin and tone of the character object in the word slot to obtain at least one word with the same pronunciation as the character object. If it is stored on a network server, the pinyin and tone in the word slot can be sent to the server, which performs the retrieval and returns the results to the device. For example, when the character object recognized by the device is the word 奇异 ("singular" or "strange"), the query uses its pinyin "qiyi" together with its tones "24", and all words pronounced the same as 奇异 are obtained. Because the tone is included in the query, the number of retrieval results is reduced and retrieval accuracy is improved.
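The effect of adding the tone to the query can be illustrated with a small in-memory index. This is a local-lookup sketch only; `CORPUS_PINYIN` and `retrieve` are hypothetical, and with a server-side corpus the same `(pinyin, tones)` pair would be sent over the network instead.

```python
# Hypothetical corpus index: word -> (toneless pinyin, tone digits).
CORPUS_PINYIN = {
    "奇异": ("qiyi", "24"),
    "歧义": ("qiyi", "24"),
    "起义": ("qiyi", "34"),
    "意义": ("yiyi", "44"),
}

def retrieve(pinyin, tones=None):
    """Return words whose pinyin matches; if tones is given, the tones must match too."""
    return sorted(
        w for w, (p, t) in CORPUS_PINYIN.items()
        if p == pinyin and (tones is None or t == tones)
    )

print(retrieve("qiyi"))        # ['奇异', '歧义', '起义']
print(retrieve("qiyi", "24"))  # ['奇异', '歧义']
```

Searching by pinyin alone also returns 起义 (qǐyì, tones 3 and 4); supplying the tones "24" filters it out, which is the narrowing effect described above.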

S104, the word frequency corresponding to each retrieval result is read, and the at least one retrieval result is sorted according to the word frequency.

Through steps S101 to S103, all words with the same pronunciation as the character object can be retrieved. To ensure that the user can accurately find the desired word, all retrieval results are displayed together on the display of the device (for example, on a touch screen) so that the user can select the one needed.

Further, to improve the efficiency and accuracy of obtaining the information the user needs, after the retrieval results are obtained, the word frequency of each retrieval result can be read by matching it against the data list that stores the word frequency statistics. The retrieval results can then be sorted by word frequency, optionally from high to low.
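Sorting by the stored word frequency can be sketched as below. The data list `FREQ_LIST` holds illustrative numbers rather than real statistics, and treating a word missing from the list as frequency 0 is an assumption the disclosure does not address; `sort_by_frequency` is a hypothetical name.

```python
# Hypothetical data list from S101: (word, frequency); the numbers are illustrative only.
FREQ_LIST = [("奇异", 120), ("歧义", 85)]

def sort_by_frequency(results, freq_list):
    """Sort retrieval results from high to low word frequency.

    Words absent from the data list are treated as frequency 0 (an assumption).
    """
    freq = dict(freq_list)  # word -> frequency, for O(1) lookups
    return sorted(results, key=lambda w: freq.get(w, 0), reverse=True)

print(sort_by_frequency(["歧义", "奇异"], FREQ_LIST))  # ['奇异', '歧义']
```

Python's sort stays stable with `reverse=True`, so results with equal frequency keep their retrieval order.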

S105, the at least one retrieval result is displayed in the sorted order for selection by the user.

The higher a word's frequency, the more likely it is the word the user wants. Displaying the retrieval results in the sorted order therefore lets the user quickly select the desired word. Further, to draw the user's attention, the first-ranked retrieval result may be highlighted, or each of the top N retrieval results may be highlighted, where N may be preset. In addition, if there are many retrieval results, a scroll bar control may be provided on the display interface of the device so that the user can slide it to browse all the results quickly.
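The highlighting rule can be sketched as a function that pairs each ranked result with a highlight flag; `top_n=1` corresponds to highlighting only the first-ranked result. The names `build_display_rows` and `visible_rows`, and modeling the scroll bar as a slice of the row list, are assumptions for illustration only.

```python
def build_display_rows(sorted_results, top_n=1):
    """Pair each ranked result with a highlight flag; the first top_n rows are highlighted."""
    return [(word, rank < top_n) for rank, word in enumerate(sorted_results)]

def visible_rows(rows, offset, page_size):
    """Rows shown at a given scroll offset (a simple stand-in for the scroll bar control)."""
    return rows[offset:offset + page_size]

# Homophones of shì (fourth tone), already sorted by word frequency.
rows = build_display_rows(["是", "事", "市"], top_n=2)
print(rows)  # [('是', True), ('事', True), ('市', False)]
```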

S106, in response to a trigger operation by the user on one of the retrieval results, the device navigates to a next-level page to perform an information query.

After identifying the retrieval result they need, the user can select that word through a trigger operation, which may be a single click, a double click, or another operation; no specific limitation is imposed here. In response to the user's trigger operation on a retrieval result, the device navigates to a next-level page for information query, for example, to look up detailed information about the selected word, such as its interpretation and usage.

In this embodiment, after the user's query voice is obtained, the pinyin and tone of the character the user intends to query are determined through speech recognition, and all characters or words with the same pronunciation are retrieved from the corpus according to the pinyin and tone and displayed to the user for selection. Voice-based querying is thereby achieved, and retrieval results with high word frequency are shown to the user first for easy selection, which improves character identification efficiency.

Fig. 2 is a schematic structural diagram of an information query apparatus in an embodiment of the present disclosure. As shown in fig. 2, the apparatus includes:

the building module 201 is configured to build a corpus based on pre-collected data, and count word frequency of each word in the corpus;

an acquisition and recognition module 202, configured to acquire a voice query instruction input by a user and perform user intention recognition on it to obtain a character object corresponding to the user intention;

a retrieval module 203, configured to search the corpus according to the pinyin and tone of the character object to obtain at least one retrieval result, wherein each retrieval result is a word with the same pronunciation as the character object;

a sorting module 204, configured to read the word frequency corresponding to each retrieval result and sort the at least one retrieval result according to the word frequency;

a display module 205, configured to display the at least one retrieval result in the sorted order for selection by the user; and

a response module 206, configured to navigate, in response to a trigger operation by the user on one of the retrieval results, to a next-level page for information query.
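To show how modules 201 to 206 might fit together, here is a hypothetical Python class that wires them into one pipeline. It assumes speech-to-text has already run, uses raw counts rather than TF-IDF for the word frequency, and uses a single regular expression in place of the intention list; the class and method names are illustrative and do not come from the disclosure.

```python
import re
from collections import Counter

class InfoQueryApparatus:
    """Hypothetical sketch of modules 201-206; speech-to-text is assumed to run upstream."""

    def __init__(self, pinyin_table):
        self.pinyin_table = pinyin_table  # word -> (toneless pinyin, tone digits)
        self.freq = Counter()             # word-frequency statistics

    def build(self, segmented_docs, stop_words=frozenset()):
        """Building module 201: count word frequency, skipping stop words."""
        for doc in segmented_docs:
            self.freq.update(w for w in doc if w not in stop_words)

    def recognize(self, text):
        """Acquisition and recognition module 202: extract the character object."""
        m = re.match(r"^(?:请问)?(.+?)怎么写[?？]?$", text)
        return m.group(1) if m else None

    def retrieve(self, char_object):
        """Retrieval module 203: corpus words with the same pinyin and tones."""
        key = self.pinyin_table.get(char_object)
        if key is None:
            return []
        return [w for w in self.freq if self.pinyin_table.get(w) == key]

    def sort(self, results):
        """Sorting module 204: high to low word frequency."""
        return sorted(results, key=lambda w: self.freq[w], reverse=True)

    def display(self, ranked):
        """Display module 205: (word, highlighted) rows; the first-ranked word is highlighted."""
        return [(w, i == 0) for i, w in enumerate(ranked)]

    def respond(self, selected):
        """Response module 206: placeholder for navigating to the next-level page."""
        return f"detail:{selected}"

    def query(self, text):
        obj = self.recognize(text)
        return self.display(self.sort(self.retrieve(obj))) if obj else []

app = InfoQueryApparatus({"奇异": ("qiyi", "24"), "歧义": ("qiyi", "24"), "起义": ("qiyi", "34")})
app.build([["歧义", "的", "句子"], ["奇异", "现象"], ["奇异", "歧义"], ["奇异"]], stop_words={"的"})
rows = app.query("请问歧义怎么写？")
print(rows)  # [('奇异', True), ('歧义', False)]
```

Because 起义 never occurs in this toy corpus, it is not retrieved even though it appears in the pinyin table.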

In this embodiment, after the user's query voice is obtained, the pinyin and tone of the character the user intends to query are determined through speech recognition, and all characters or words with the same pronunciation are retrieved from the corpus according to the pinyin and tone. Voice-based querying is thereby achieved, all words with the same pronunciation are displayed for the user to select in order of word frequency from high to low, and character identification efficiency is improved.

On the basis of the above embodiment, the building module includes:

a building unit, configured to perform word segmentation on the collected data and remove stop words or meaningless words from the data to obtain the corpus; and

a counting unit, configured to compute word frequency statistics based on the TF-IDF algorithm and store the statistics in the corpus in the form of a data list.

On the basis of the above embodiment, the apparatus further includes:

a highlight processing module, configured to highlight the retrieval result ranked first.

On the basis of the above embodiment, the acquisition and recognition module includes:

a speech recognition unit, configured to recognize the user's voice query instruction to obtain the text information corresponding to it; and

an intention matching unit, configured to match the text information against a pre-stored intention list to determine the user intention and the character object corresponding to the user intention.

The information query device provided by the embodiments of the disclosure can execute the information query method provided by any embodiment of the disclosure, and has the functional modules and beneficial effects corresponding to that method.

FIG. 3 is a schematic structural diagram of an apparatus suitable for implementing the embodiments of the present disclosure. The apparatus shown in FIG. 3 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.

As shown in FIG. 3, the apparatus 300 may include a processor 301 (for example, a central processing unit or a graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303, for example, implementing the information query method provided by the embodiments of the present disclosure, the method including:

constructing a corpus based on pre-collected data, and counting the word frequency of each word in the corpus; acquiring a voice query instruction input by a user, and performing user intention recognition on the voice query instruction to obtain a character object corresponding to the user intention; searching the corpus according to the pinyin and tone of the character object to obtain at least one retrieval result, wherein each retrieval result is a word with the same pronunciation as the character object; reading the word frequency corresponding to each retrieval result, and sorting the at least one retrieval result according to the word frequency; displaying the at least one retrieval result in the sorted order for selection by the user; and, in response to a trigger operation by the user on one of the retrieval results, navigating to a next-level page to perform an information query.

In the RAM 303, various programs and data necessary for the operation of the apparatus 300 are also stored. The processor 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.

Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 307 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 308 including, for example, magnetic tape and hard disk; and a communication device 309. The communication device 309 may allow the apparatus 300 to communicate with other devices, wirelessly or by wire, to exchange data. Although FIG. 3 shows the apparatus 300 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowchart may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308 or from the ROM 302. When the computer program is executed by the processor 301, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be included in the apparatus, or may exist separately without being assembled into the apparatus.

The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to execute the information query method provided by the embodiments, the method including: constructing a corpus based on pre-collected data, and counting the word frequency of each word in the corpus; acquiring a voice query instruction input by a user, and performing user intention recognition on the voice query instruction to obtain a character object corresponding to the user intention; searching the corpus according to the pinyin and tone of the character object to obtain at least one retrieval result, wherein each retrieval result is a word with the same pronunciation as the character object; reading the word frequency corresponding to each retrieval result, and sorting the at least one retrieval result according to the word frequency; displaying the at least one retrieval result in the sorted order for selection by the user; and, in response to a trigger operation by the user on one of the retrieval results, navigating to a next-level page to perform an information query.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the display module may also be described as a "module for displaying at least one retrieval result".

In accordance with one or more embodiments of the present disclosure, the following is also disclosed:

A1. An information query method, comprising:

A2. The method of A1, wherein constructing a corpus based on pre-collected data and counting the word frequency of each word in the corpus comprises:

performing word segmentation on the collected data, and removing stop words or meaningless words from the data to obtain the corpus;

and computing word frequency statistics based on the TF-IDF algorithm, and storing the statistics in the corpus in the form of a data list.

A3. The method of A1, further comprising:

highlighting the retrieval result ranked first.

A4. The method of A1, wherein performing user intention recognition on the voice query instruction to obtain a character object corresponding to the user intention comprises:

recognizing the user's voice query instruction to obtain the text information corresponding to it;

and matching the text information against a pre-stored intention list to determine the user intention and the character object corresponding to the user intention.

B1. An information query apparatus, comprising:

B2. The apparatus of B1, wherein the building module comprises:

B3. The apparatus of B1, further comprising:

B4. The apparatus of B1, wherein the acquisition and recognition module comprises:

C. An apparatus, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the information query method of any one of A1-A4.

D. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the information query method of any one of A1-A4.

The foregoing description covers only preferred embodiments of the disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of those features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. An information query method, comprising:

2. The method of claim 1, wherein constructing a corpus based on pre-collected data and counting the word frequency of each word in the corpus comprises:

3. The method of claim 1, further comprising:

highlighting the retrieval result ranked first.

4. The method of claim 1, wherein performing user intention recognition on the voice query instruction to obtain a character object corresponding to the user intention comprises:

5. An information query apparatus, comprising:

6. The apparatus of claim 5, wherein the building module comprises:

7. The apparatus of claim 5, further comprising:

8. The apparatus of claim 5, wherein the acquisition and recognition module comprises:

9. An apparatus, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the information query method of any one of claims 1-4.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the information query method according to any one of claims 1 to 4.