CN108093124B - Audio positioning method and device and mobile terminal - Google Patents


Info

Publication number
CN108093124B
CN108093124B (application CN201711132035.0A)
Authority
CN
China
Prior art keywords
audio data
sub
key information
content
audio
Prior art date
Legal status
Active
Application number
CN201711132035.0A
Other languages
Chinese (zh)
Other versions
CN108093124A (en)
Inventor
王亚运
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN201711132035.0A priority Critical patent/CN108093124B/en
Publication of CN108093124A publication Critical patent/CN108093124A/en
Application granted granted Critical
Publication of CN108093124B publication Critical patent/CN108093124B/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M 1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Library & Information Science (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides an audio positioning method, an audio positioning device, and a mobile terminal. The method comprises the following steps: receiving first audio data; extracting first key information from the first audio data; searching second audio data for sub-audio data that matches the first key information; and adding a feature mark to the sub-audio data. Using the feature marks, a user can quickly find the required content in the second audio data, which improves the efficiency of repeatedly listening to a recording and the user's experience when doing so.

Description

Audio positioning method and device and mobile terminal
Technical Field
The invention relates to the technical field of mobile terminals, in particular to an audio positioning method and device and a mobile terminal.
Background
With the development of science and technology, mobile terminals have become indispensable communication tools in people's lives, and the recording function is one of their essential features. Many users rely on recording, and a good recording experience is also an important factor in the working efficiency of professionals such as meeting recorders.
Current improvements to the recording experience mostly focus on enhancing sound quality, and do little to improve the efficiency with which users repeatedly listen to recordings.
Disclosure of Invention
Embodiments of the present invention provide an audio positioning method, an audio positioning device, and a mobile terminal, aiming to solve the problem of low efficiency in locating recorded content when a user repeatedly listens to a recording.
In order to solve the above technical problem, an embodiment of the present invention provides an audio positioning method applied to a mobile terminal, where the method includes:
receiving first audio data;
extracting first key information from the first audio data;
searching second audio data for sub-audio data that matches the first key information;
adding a feature mark to the sub-audio data.
The embodiment of the invention also provides an audio positioning device, which is deployed on a mobile terminal, and comprises:
the audio data receiving module is used for receiving first audio data;
the key information extraction module is used for extracting first key information from the first audio data;
the sub-audio data searching module is used for searching second audio data for sub-audio data that matches the first key information;
and the characteristic mark adding module is used for adding characteristic marks to the sub audio data.
An embodiment of the present invention further provides a mobile terminal, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the computer program, when executed by the processor, implements the steps of the audio positioning method as described above.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the audio positioning method described above.
In the embodiment of the invention, the mobile terminal extracts the first key information from the first audio data, matches the second audio data against the first key information, and adds a feature mark to the sub-audio data in the second audio data that matches the first key information. The user can then quickly find the required content in the second audio data according to the feature marks, which improves the efficiency of repeatedly listening to the recording and the user's experience when doing so.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the invention, as well as the above and other objects, features, and advantages, can be understood more clearly.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described here show only some embodiments of the invention; those skilled in the art can derive other drawings from them without inventive labor.
Fig. 1 is a flowchart illustrating the steps of an audio positioning method according to a first embodiment of the present invention;
Fig. 2 is a flowchart illustrating the steps of an audio positioning method according to a second embodiment of the present invention;
Fig. 3 is a block diagram of an audio positioning apparatus according to a third embodiment of the present invention;
Fig. 4 is a second block diagram of an audio positioning apparatus according to the third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments derived from them by those skilled in the art without creative effort fall within the protection scope of the invention.
Example one
Fig. 1 is a flowchart illustrating steps of an audio positioning method according to an embodiment of the present invention. The method comprises the following steps:
step 101, receiving first audio data.
In this embodiment, a user inputs first audio data containing search content into the mobile terminal, and the mobile terminal receives the first audio data submitted by the user. The first audio data may contain one word or several words. For example, the user inputs first audio data containing the search content "the second content is Y", and the mobile terminal receives it. The embodiment does not limit the first audio data in detail; it may be set according to the actual situation.
Step 102, extracting first key information from the first audio data.
In this embodiment, after the first audio data is received, the first key information is extracted from it. Specifically, the first audio data may be converted into text content and displayed on the screen of the mobile terminal, and the first key information is then received as the portion the user selects from the displayed text. For example, the first audio data is converted into the text "the second content is Y", and the "Y" selected by the user is received as the first key information. Alternatively, the first audio data may be converted into text content, the text split into several phrases, and the resulting phrases used as the first key information. For example, the first audio data is converted into the text "the second content is Y", the phrases "second" and "Y" are split from it, and "second" and "Y" are used as the first key information. The embodiment does not limit how the first key information is extracted; it may be set according to the actual situation.
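The phrase-splitting variant described above could be sketched roughly as follows. This is an illustrative assumption, not the patent's implementation: the stop-word list, function name, and English tokenization all stand in for whatever real extraction the terminal would use.

```python
# Hypothetical sketch of the phrase-splitting variant of step 102: the
# transcript of the first audio data is split into words and common stop
# words are discarded, leaving candidate key information. The stop-word
# list below is an illustrative assumption.
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "content"}

def extract_key_information(transcript: str) -> list[str]:
    """Split a transcript into candidate key phrases (lowercased)."""
    words = transcript.lower().replace(",", " ").split()
    return [w for w in words if w not in STOP_WORDS]

print(extract_key_information("the second content is Y"))
# → ['second', 'y']
```

This reproduces the example in the text: from "the second content is Y", the key information "second" and "Y" remains.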
Step 103, searching the sub audio data matched with the first key information from the second audio data.
In this embodiment, the second audio data is the audio data to be identified. After the first key information is extracted from the first audio data, the second audio data is searched for sub-audio data that matches the first key information. Specifically, the second audio data may be converted into text content, a phrase matching the first key information found in that text, and the audio data where the matching phrase is located taken as the found sub-audio data. For example, the second audio data converts into the following text content: today's conference includes three pieces of content X, Y, Z; the first content is X …; the second content is Y …; the third content is Z …. The phrases "second" and "Y" matching the first key information are found in the converted text, so "the second content is Y …" is the found sub-audio data. The embodiment does not limit how the sub-audio data is delimited; it may be set according to the actual situation.
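The search in step 103 can be sketched as a scan over recognizer output that carries timestamps. The segment format and function name below are assumptions; real speech recognizers expose timestamps differently.

```python
def find_matching_segment(segments, key_info):
    """Return the first segment whose text contains every key phrase.

    `segments` is a list of (start_sec, end_sec, text) tuples, as a
    speech recognizer with timestamps might produce (assumed format).
    """
    for start, end, text in segments:
        lowered = text.lower()
        if all(k.lower() in lowered for k in key_info):
            return (start, end, text)
    return None

segments = [
    (0, 30, "today's conference includes three contents X, Y, Z"),
    (30, 120, "the first content is X ..."),
    (120, 300, "the second content is Y ..."),
]
print(find_matching_segment(segments, ["second", "Y"]))
# → (120, 300, 'the second content is Y ...')
```

Because the segment carries its start and end times, the match in the text maps directly back to a span of the second audio data.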
And 104, adding a characteristic mark to the sub-audio data.
In this embodiment, a feature mark is added to the found sub-audio data. For example, if the found sub-audio data lies between 0:31'45" and 0:57'11" of the second audio data, that span may be marked with a special color to distinguish it from the rest of the audio; a label may also be displayed at 0:31'45". The embodiment does not limit how the feature mark is added; it may be set according to the actual situation.
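A minimal data model for step 104 might look like the sketch below. The class and field names are hypothetical; the point is only that a feature mark pairs the sub-audio time span with its rendering (a color and/or a label at the start position).

```python
from dataclasses import dataclass, field

@dataclass
class FeatureMark:
    """One feature mark on a recording: a time span plus rendering hints."""
    start_sec: int
    end_sec: int
    color: str = "yellow"
    label: str = ""

@dataclass
class Recording:
    duration_sec: int
    marks: list = field(default_factory=list)

    def add_mark(self, start_sec, end_sec, **kwargs):
        self.marks.append(FeatureMark(start_sec, end_sec, **kwargs))

# Mark the span 0:31'45" - 0:57'11" of the second audio data.
rec = Recording(duration_sec=3600)
rec.add_mark(31 * 60 + 45, 57 * 60 + 11, color="yellow", label="Y")
print(rec.marks[0].start_sec)  # → 1905
```

The UI layer would then draw each mark's span in its color on the playback timeline and show the label at `start_sec`.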
In summary, in the embodiment of the present invention, the mobile terminal extracts the first key information from the first audio data, searches for the sub-audio data in the second audio data that matches the first key information, and adds the feature tag to the searched sub-audio data, so that the user can quickly find the required content from the second audio data according to the feature tag, thereby improving the efficiency of repeatedly listening to the recording, and improving the user experience when the user repeatedly listens to the recording.
Example two
Fig. 2 is a flowchart illustrating steps of an audio positioning method according to an embodiment of the present invention. The method comprises the following steps:
step 201, receiving first audio data.
Step 202, extracting first key information from the first audio data.
Step 203, identifying the audio content of the second audio data.
In this embodiment, the second audio data is subjected to speech recognition, and the audio content of the second audio data is recognized. For example, the second audio data is subjected to speech recognition, and the recognized audio content includes: today's conferences include three pieces of content X, Y, Z; the first content is X … …; the second content is Y … …; the third is Z … …. The embodiment of the invention does not limit the speech recognition in detail, and can be set according to the actual situation.
Step 204, dividing the second audio data into a plurality of sub audio data according to the audio content.
In this embodiment, the second audio data is divided according to the audio content. For example, the second audio data is divided into four pieces of sub-audio data according to the audio content: the first sub-audio data contains "today's conference includes three contents X, Y, Z"; the second contains "the first content is X …"; the third contains "the second content is Y …"; and the fourth contains "the third content is Z …".
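Steps 203–204 can be sketched as a pass over the recognizer's timed word stream that starts a new segment at each ordinal marker. This simplistic heuristic and all names are assumptions standing in for real content analysis:

```python
def divide_by_content(timed_words):
    """Group (time_sec, word) pairs into sub-audio segments, starting a
    new segment whenever an ordinal marker appears (a simplistic,
    assumed heuristic standing in for real content analysis)."""
    markers = {"first", "second", "third", "fourth"}
    segments, current = [], []
    for t, w in timed_words:
        if w.lower().strip(",.") in markers and current:
            segments.append(current)   # close the previous segment
            current = []
        current.append((t, w))
    if current:
        segments.append(current)
    return segments

timed = [(0, "today's"), (1, "meeting"), (2, "covers"), (3, "X"), (4, "Y"), (5, "Z"),
         (10, "first"), (11, "content"), (12, "X"),
         (20, "second"), (21, "content"), (22, "Y"),
         (30, "third"), (31, "content"), (32, "Z")]
print(len(divide_by_content(timed)))  # → 4
```

Each resulting segment keeps its word timestamps, so the sub-audio data's start and end times fall out of the division for free.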
Step 205, searching for the sub audio data matched with the first key information from the plurality of sub audio data.
In this embodiment, searching for the sub audio data matched with the first key information from the plurality of sub audio data may specifically include the following steps:
and the first substep is to extract second key information for each sub audio data.
In this embodiment, after the second audio data is divided into a plurality of sub audio data, the second key information is extracted from each sub audio data. For example, second key information is extracted for the first sub audio data, the second key information being "three", "X", "Y", "Z"; extracting second key information from the second sub-audio data, wherein the second key information is 'first' and 'X'; extracting second key information from the third sub-audio data, wherein the second key information is 'second', 'Y'; and extracting second key information from the fourth sub audio data, wherein the second key information is 'third' and 'Z'.
And a second substep of performing word sense matching on the first key information and the second key information and determining the matching degree of the content of the first key information and the content of each piece of sub-audio data.
In this embodiment, word sense matching is performed between the first key information and the second key information. For example, the first key information "second" and "Y" is word-sense matched against the second key information "three", "X", "Y", "Z" of the first sub-audio data, against "first" and "X" of the second sub-audio data, against "second" and "Y" of the third sub-audio data, and against "third" and "Z" of the fourth sub-audio data. Optionally, word sense matching includes at least one of same-word matching, near-synonym matching, and multilingual matching; word banks of near-synonyms, similar words, and cross-language equivalents can be established to support it. The embodiment does not limit this in detail; it may be set according to the actual situation.
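Such a word bank could be sketched as sets of interchangeable phrases; two phrases match if they are identical or share a set. The entries below are illustrative assumptions only:

```python
# Hypothetical word bank: each set groups phrases that should be treated
# as matching (same word, near-synonyms, or cross-language equivalents).
WORD_BANK = [
    {"second", "2nd", "no. 2"},
    {"first", "1st", "no. 1"},
]

def word_sense_match(a: str, b: str) -> bool:
    """Two phrases match if equal or if some word-bank set holds both."""
    a, b = a.lower(), b.lower()
    return a == b or any(a in s and b in s for s in WORD_BANK)

print(word_sense_match("second", "2nd"))    # → True
print(word_sense_match("second", "third"))  # → False
```

A multilingual deployment would simply add cross-language equivalents to the same sets.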
After word sense matching, the matching degree between the content of the first key information and the content of each piece of sub-audio data is determined. For example, the second key information "Y" of the first sub-audio data matches the "Y" in the first key information, i.e. one of the two items of first key information, so the matching degree is determined to be 50%; the second key information of the second sub-audio data does not match the first key information, so the matching degree is 0; the second key information "second" and "Y" of the third sub-audio data matches the "second" and "Y" in the first key information, so the matching degree is 100%; and the second key information of the fourth sub-audio data does not match, so the matching degree is 0. As another example, if the second key information extracted from the third sub-audio data comprises ten occurrences of "second", eight occurrences of "Y", and one "A", and the match count after word sense matching against the first key information is 9, the matching degree is determined to be 90%. The embodiment does not limit how the matching degree is determined; it may be set according to the actual situation.
And a third substep of searching the sub-audio data with the matching degree meeting the preset condition.
In this embodiment, after the matching degree between the first key information and each piece of sub-audio data is determined, the sub-audio data whose matching degree meets a preset condition is selected. For example, if the preset condition is a matching degree greater than 80%, only the third sub-audio data is found; if the preset condition is a matching degree greater than or equal to 50%, the first and third sub-audio data are found. The embodiment does not limit the preset condition in detail; it may be set according to the actual situation.
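The second and third substeps together might be sketched as below, with exact lowercase comparison standing in for full word-sense matching; the function names, the per-segment list format, and the default threshold are all assumptions.

```python
def matching_degree(first_key_info, second_key_info):
    """Fraction of first-key-information items found among a sub-audio
    segment's second key information (exact lowercase comparison stands
    in for full word-sense matching)."""
    if not first_key_info:
        return 0.0
    second = {k.lower() for k in second_key_info}
    matched = sum(1 for k in first_key_info if k.lower() in second)
    return matched / len(first_key_info)

def find_matching_sub_audio(first_key_info, per_segment_key_info, threshold=0.8):
    """Indices of sub-audio segments whose matching degree exceeds the
    preset threshold."""
    return [i for i, keys in enumerate(per_segment_key_info)
            if matching_degree(first_key_info, keys) > threshold]

per_segment = [["three", "X", "Y", "Z"], ["first", "X"],
               ["second", "Y"], ["third", "Z"]]
print(find_matching_sub_audio(["second", "Y"], per_segment))  # → [2]
print(find_matching_sub_audio(["second", "Y"], per_segment, threshold=0.49))
# → [0, 2]
```

This reproduces the worked example in the text: degrees of 50%, 0, 100%, and 0 for the four sub-audio data, so a >80% condition finds only the third segment and a ≥50% condition finds the first and third.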
Step 206, adding corresponding feature marks to the found sub-audio data.
In this embodiment, a feature mark is added to each piece of found sub-audio data. For example, if the first and third sub-audio data are found, a red mark may be added to the first sub-audio data and a yellow mark to the third; alternatively, a mark containing "Y" may be added to each of them. The embodiment does not limit the feature marks in detail; they may be set according to the actual situation.
And step 207, receiving a click operation instruction for the feature tag.
In this embodiment, when the user finds the content to be listened to again according to the feature tag, the user clicks the feature tag, and the mobile terminal receives an operation instruction for clicking the feature tag. For example, the mobile terminal receives an operation instruction of clicking a yellow mark.
And step 208, playing the sub-audio data corresponding to the feature tag.
In this embodiment, after receiving an operation instruction for clicking the feature tag, the sub-audio data corresponding to the feature tag is played. For example, after receiving an operation instruction of clicking the yellow mark, the third sub-audio data "the second content is Y … …" corresponding to the yellow mark is played. Therefore, the user can quickly find the required content from the second audio data according to the feature marks, and the efficiency of listening to the recording again is improved.
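Steps 207–208 amount to a dispatch from the tapped feature mark to a playback call. A minimal sketch, with the mark list, callback, and names all assumed:

```python
def on_mark_clicked(marks, clicked_index, play):
    """Handle a tap on a feature mark by playing the corresponding
    sub-audio span. `marks` holds (start_sec, end_sec) pairs and `play`
    is the terminal's playback callback (both assumed interfaces)."""
    start_sec, end_sec = marks[clicked_index]
    play(start_sec, end_sec)

played = []
marks = [(0, 30), (120, 300)]  # e.g. the red mark and the yellow mark
on_mark_clicked(marks, 1, lambda s, e: played.append((s, e)))
print(played)  # → [(120, 300)]
```

Tapping the yellow mark thus seeks straight to the matched sub-audio span rather than requiring the user to scrub through the whole recording.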
In summary, in the embodiment of the present invention, the mobile terminal extracts the first key information from the first audio data, matches the second audio data with the first key information, and adds the feature tag to the sub-audio data in the second audio data that matches the first key information, so that the user can quickly find the required content from the second audio data according to the feature tag, thereby improving the efficiency of repeatedly listening to the recording and improving the user experience when the user repeatedly listens to the recording.
Example three
Fig. 3 shows a block diagram of an audio positioning apparatus according to an embodiment of the present invention. The audio positioning device is deployed on a mobile terminal, and comprises an audio data receiving module 301, a key information extracting module 302, a sub audio data searching module 303 and a feature tag adding module 304:
an audio data receiving module 301, configured to receive first audio data;
a key information extraction module 302, configured to extract first key information from the first audio data;
a sub audio data searching module 303, configured to search, from second audio data, sub audio data that matches the first key information;
a feature label adding module 304, configured to add a feature label to the sub audio data.
On the basis of fig. 3, optionally, the sub audio data searching module 303 includes an audio content identifying sub-module 3031, an audio data dividing sub-module 3032, and a sub audio data searching sub-module 3033, as shown in fig. 4:
an audio content identification submodule 3031, configured to identify audio content of the second audio data;
an audio data dividing sub-module 3032, configured to divide the second audio data into a plurality of sub-audio data according to the audio content;
the sub audio data searching sub-module 3033 is configured to search sub audio data matching the first key information from the plurality of sub audio data.
On the basis of fig. 4, optionally, the sub audio data searching sub-module 3033 includes a key information extracting unit, a matching degree determining unit, and a sub audio data searching unit:
a key information extraction unit for extracting second key information for each sub audio data, respectively;
a matching degree determining unit, configured to perform word sense matching on the first key information and the second key information, and determine a matching degree between the content of the first key information and the content of each piece of sub-audio data;
and the sub audio data searching unit is used for searching the sub audio data of which the matching degree meets the preset condition.
On the basis of fig. 4, optionally, the feature label adding module 304 is specifically configured to add corresponding feature labels to the multiple found sub audio data respectively.
On the basis of fig. 3, optionally, after the feature label adding module 304, the apparatus further includes an operation instruction receiving module 305 and a sub audio data playing module 306, see fig. 4:
an operation instruction receiving module 305, configured to receive a click operation instruction for the feature tag;
a sub audio data playing module 306, configured to play the sub audio data corresponding to the feature tag.
The audio positioning apparatus provided in the embodiment of the present invention can implement each process implemented by the method embodiments of fig. 1 to fig. 2, and is not described herein again to avoid repetition. According to the method and the device, the user can quickly find the required content from the second audio data according to the characteristic marks, the efficiency of repeatedly listening to the recording is improved, and the use experience of the user in repeatedly listening to the recording is improved.
Fig. 5 is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present invention, where the mobile terminal 400 includes, but is not limited to: radio frequency unit 401, network module 402, audio output unit 403, input unit 404, sensor 405, display unit 406, user input unit 407, interface unit 408, memory 409, processor 410, and power supply 411. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 5 is not intended to be limiting of mobile terminals, and that a mobile terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
A processor 410 for receiving first audio data; extracting first key information from the first audio data; searching sub audio data matched with the first key information from second audio data; adding a feature label to the sub-audio data.
According to the method and the device, the user can quickly find the required content from the second audio data according to the feature marks, the efficiency of repeatedly listening to the recording is improved, and the use experience of the user in repeatedly listening to the recording is improved.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 401 may be used to receive and send signals during messaging or a call; specifically, it receives downlink data from a base station and forwards it to the processor 410 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 401 can communicate with a network and other devices through a wireless communication system.
The mobile terminal provides the user with wireless broadband internet access through the network module 402, such as helping the user send and receive e-mails, browse web pages, and access streaming media.
The audio output unit 403 may convert audio data received by the radio frequency unit 401 or the network module 402 or stored in the memory 409 into an audio signal and output as sound. Also, the audio output unit 403 may also provide audio output related to a specific function performed by the mobile terminal 400 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 403 includes a speaker, a buzzer, a receiver, and the like.
The input unit 404 is used to receive audio or video signals. The input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042; the graphics processor 4041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The processed image frames may be displayed on the display unit 406, stored in the memory 409 (or another storage medium), or transmitted via the radio frequency unit 401 or the network module 402. The microphone 4042 can receive sound and process it into audio data. In phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 401.
The mobile terminal 400 also includes at least one sensor 405, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 4061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 4061 and/or the backlight when the mobile terminal 400 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 405 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which will not be described in detail herein.
The display unit 406 is used to display information input by the user or information provided to the user. The Display unit 406 may include a Display panel 4061, and the Display panel 4061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 407 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 407 includes a touch panel 4071 and other input devices 4072. Touch panel 4071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near touch panel 4071 using a finger, a stylus, or any suitable object or attachment). The touch panel 4071 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 410, receives a command from the processor 410, and executes the command. In addition, the touch panel 4071 can be implemented by using various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 4071, the user input unit 407 may include other input devices 4072. Specifically, the other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a track ball, a mouse, and a joystick, which are not described herein again.
Further, the touch panel 4071 can be overlaid on the display panel 4061. When the touch panel 4071 detects a touch operation on or near it, the operation is transmitted to the processor 410 to determine the type of the touch event, and the processor 410 then provides a corresponding visual output on the display panel 4061 according to that type. Although in Fig. 5 the touch panel 4071 and the display panel 4061 are shown as two separate components implementing the input and output functions of the mobile terminal, in some embodiments the touch panel 4071 and the display panel 4061 may be integrated to implement these functions; this is not limited herein.
The interface unit 408 is an interface through which an external device is connected to the mobile terminal 400. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 408 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 400 or may be used to transmit data between the mobile terminal 400 and external devices.
The memory 409 may be used to store software programs as well as various data. The memory 409 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phonebook). Further, the memory 409 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 410 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal through various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 409 and invoking data stored in the memory 409, thereby performing overall monitoring of the mobile terminal. The processor 410 may include one or more processing units. Preferably, the processor 410 may integrate an application processor, which mainly handles the operating system, user interfaces, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 410.
The mobile terminal 400 may further include a power supply 411 (e.g., a battery) for supplying power to various components, and preferably, the power supply 411 may be logically connected to the processor 410 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the mobile terminal 400 includes some functional modules that are not shown, and thus, are not described in detail herein.
Preferably, an embodiment of the present invention further provides a mobile terminal, including a processor 410, a memory 409, and a computer program stored in the memory 409 and executable on the processor 410. When executed by the processor 410, the computer program implements each process of the above embodiment of the audio positioning method and can achieve the same technical effects; to avoid repetition, the details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above embodiment of the audio positioning method and can achieve the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), including instructions that enable a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods described in the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. An audio positioning method, applied to a mobile terminal, the method comprising:
receiving first audio data;
extracting first key information from the first audio data;
searching second audio data for sub-audio data matching the first key information; and
adding a feature mark to the sub-audio data;
wherein the searching second audio data for sub-audio data matching the first key information comprises:
identifying audio content of the second audio data;
dividing the second audio data into a plurality of pieces of sub-audio data according to the audio content; and
searching the plurality of pieces of sub-audio data for sub-audio data matching the first key information;
wherein the searching for sub-audio data matching the first key information comprises:
extracting second key information from each piece of sub-audio data;
performing word-sense matching between the first key information and the second key information, and determining a degree of matching between the content of the first key information and the content of each piece of sub-audio data; and
searching for the sub-audio data whose degree of matching satisfies a preset condition;
wherein the determining a degree of matching between the content of the first key information and the content of each piece of sub-audio data comprises:
calculating the proportion of the second key information in the first key information; and
determining, according to the proportion, the degree of matching between the content of the first key information and the content of each piece of sub-audio data.
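The matching procedure recited in claim 1 can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the claimed implementation: key information is approximated by simple token sets rather than real speech recognition, the claim's "proportion of the second key information in the first key information" is interpreted as the share of the first key information covered by the second, and the preset condition is taken to be a 0.5 threshold:

```python
def extract_key_info(text):
    """Toy keyword extraction: lowercase tokens minus stop words.
    (A real implementation would use speech recognition plus NLP.)"""
    stop = {"the", "a", "an", "of", "to", "and", "in", "is", "at", "about"}
    return {w for w in text.lower().split() if w not in stop}

def matching_degree(first_key, second_key):
    """Degree of matching, interpreted here as the share of the first key
    information that also appears in the second key information. The claim's
    wording is ambiguous about the denominator; this choice is an assumption."""
    if not first_key:
        return 0.0
    return len(first_key & second_key) / len(first_key)

def find_matching_sub_audio(first_text, sub_audio_texts, threshold=0.5):
    """Return indices of sub-audio segments whose matching degree
    satisfies the preset condition (here assumed to be >= threshold)."""
    first_key = extract_key_info(first_text)
    results = []
    for i, text in enumerate(sub_audio_texts):
        second_key = extract_key_info(text)
        if matching_degree(first_key, second_key) >= threshold:
            results.append(i)
    return results
```

With a query "play the birthday song recording" against two transcribed segments, only the segment sharing the "birthday song" keywords crosses the threshold and would receive a feature mark.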
2. The method according to claim 1, wherein the adding a feature mark to the sub-audio data comprises:
adding a corresponding feature mark to each of the plurality of pieces of found sub-audio data.
3. The method according to claim 1, wherein after the adding a feature mark to the sub-audio data, the method further comprises:
receiving a click operation instruction on the feature mark; and
playing the sub-audio data corresponding to the feature mark.
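Claims 2 and 3 describe per-segment marking and tap-to-play behavior. A minimal sketch of that bookkeeping follows; the segment boundaries in seconds, the `mark_id` field, and the `play` callback standing in for the terminal's audio player are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class FeatureMark:
    """A feature mark attached to one matched sub-audio segment
    (start/end are hypothetical positions in seconds)."""
    mark_id: int
    start: float
    end: float

def add_feature_marks(segments):
    """Give each found (start, end) segment its own feature mark (claim 2)."""
    return [FeatureMark(i, s, e) for i, (s, e) in enumerate(segments)]

def on_mark_clicked(marks, mark_id, play):
    """On a click operation instruction for a mark, play its sub-audio
    segment (claim 3). Returns False if no such mark exists."""
    for m in marks:
        if m.mark_id == mark_id:
            play(m.start, m.end)
            return True
    return False
```

Tapping the second mark of `add_feature_marks([(3.0, 7.5), (12.0, 15.0)])` would hand the (12.0, 15.0) range to the player callback.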
4. An audio positioning apparatus, disposed in a mobile terminal, the apparatus comprising:
an audio data receiving module, configured to receive first audio data;
a key information extraction module, configured to extract first key information from the first audio data;
a sub-audio data searching module, configured to search second audio data for sub-audio data matching the first key information; and
a feature mark adding module, configured to add a feature mark to the sub-audio data;
wherein the sub-audio data searching module comprises:
an audio content identification sub-module, configured to identify audio content of the second audio data;
an audio data dividing sub-module, configured to divide the second audio data into a plurality of pieces of sub-audio data according to the audio content; and
a sub-audio data searching sub-module, configured to search the plurality of pieces of sub-audio data for sub-audio data matching the first key information;
wherein the sub-audio data searching sub-module comprises:
a key information extraction unit, configured to extract second key information from each piece of sub-audio data;
a matching degree determination unit, configured to perform word-sense matching between the first key information and the second key information, and to determine a degree of matching between the content of the first key information and the content of each piece of sub-audio data; and
a sub-audio data searching unit, configured to search for the sub-audio data whose degree of matching satisfies a preset condition;
wherein the matching degree determination unit is further configured to:
calculate the proportion of the second key information in the first key information; and
determine, according to the proportion, the degree of matching between the content of the first key information and the content of each piece of sub-audio data.
5. The apparatus according to claim 4, wherein the feature mark adding module is specifically configured to add a corresponding feature mark to each of the plurality of pieces of found sub-audio data.
6. The apparatus according to claim 4, further comprising:
an operation instruction receiving module, configured to receive a click operation instruction on the feature mark; and
a sub-audio data playing module, configured to play the sub-audio data corresponding to the feature mark.
7. A mobile terminal, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the audio positioning method according to any one of claims 1 to 3.
CN201711132035.0A 2017-11-15 2017-11-15 Audio positioning method and device and mobile terminal Active CN108093124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711132035.0A CN108093124B (en) 2017-11-15 2017-11-15 Audio positioning method and device and mobile terminal


Publications (2)

Publication Number Publication Date
CN108093124A CN108093124A (en) 2018-05-29
CN108093124B true CN108093124B (en) 2021-01-08

Family

ID=62172683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711132035.0A Active CN108093124B (en) 2017-11-15 2017-11-15 Audio positioning method and device and mobile terminal

Country Status (1)

Country Link
CN (1) CN108093124B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408717B (en) * 2018-10-23 2022-03-29 广东小天才科技有限公司 Content recommendation method and system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103414948A (en) * 2013-08-01 2013-11-27 王强 Method and device for playing video
CN104967907A (en) * 2014-06-09 2015-10-07 腾讯科技(深圳)有限公司 Video playing positioning method and system
CN106776890A (en) * 2016-11-29 2017-05-31 北京小米移动软件有限公司 The method of adjustment and device of video playback progress
CN107333185A (en) * 2017-07-27 2017-11-07 上海与德科技有限公司 A kind of player method and device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US6973256B1 (en) * 2000-10-30 2005-12-06 Koninklijke Philips Electronics N.V. System and method for detecting highlights in a video program using audio properties
CN102262890A (en) * 2010-05-31 2011-11-30 鸿富锦精密工业(深圳)有限公司 Electronic device and marking method thereof
JP5708445B2 (en) * 2011-10-31 2015-04-30 富士通株式会社 Registration method, registration program, and registration apparatus
CN103400592A (en) * 2013-07-30 2013-11-20 北京小米科技有限责任公司 Recording method, playing method, device, terminal and system
CN103647761B (en) * 2013-11-28 2017-04-12 小米科技有限责任公司 Method and device for marking audio record, and terminal, server and system
CN106131324A (en) * 2016-06-28 2016-11-16 广东欧珀移动通信有限公司 Audio data processing method, device and terminal
CN106571137A (en) * 2016-10-28 2017-04-19 努比亚技术有限公司 Terminal voice dotting control device and method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant