WO2019128593A1 - Method and apparatus for searching for audio - Google Patents
- Publication number
- WO2019128593A1 (PCT/CN2018/117509, priority CN2018117509W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- sequence
- reference time
- time series
- time point
- Prior art date
Classifications
- G06F16/686 — Retrieval of audio data characterised by using manually generated metadata, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
- G06F16/436 — Querying multimedia data; filtering based on biological or physiological data of a human being, e.g. blood pressure, facial expression, gestures
- G06F16/583 — Retrieval of still image data using metadata automatically derived from the content
- G06F16/683 — Retrieval of audio data using metadata automatically derived from the content
- G06F16/685 — Retrieval of audio data using an automatically derived transcript of the audio data, e.g. lyrics
- G06F3/0484 — GUI interaction techniques for the control of specific functions or operations, e.g. selecting or manipulating an object, setting a parameter value or selecting a range
- G06F3/04883 — GUI interaction techniques using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
- G06F3/0482 — Interaction with lists of selectable items, e.g. menus
Definitions
- the present disclosure relates to the field of electronic technology, and more particularly to a method and apparatus for searching for audio.
- a user can enjoy a wide variety of audio using a music application installed in a mobile phone.
- the user can search for corresponding audio data based on song information such as song name, lyrics, and the like, and play.
- a method of searching for audio comprising:
- each time the preset trigger event is detected, the time point at which the trigger event occurs is recorded; when a preset end event is detected, the recorded time points are acquired to obtain a time point sequence;
- the target audio data corresponding to the target reference time series is determined according to a correspondence between the pre-stored audio data and the reference time series.
- selecting a target reference time series that matches the time point sequence including:
- the determining the difference between each pre-stored reference time series and the time point sequence respectively includes:
- the edit distance of each pre-stored reference time series and the time point sequence is separately calculated as the degree of difference.
- the detecting the preset trigger event includes any one of the following:
- a touch signal is detected by a preset area of the touch screen of the device.
- the environment audio data is acquired by the audio collection component of the device, and the preset audio feature information is identified in the environment audio data.
- the audio unit is an audio segment corresponding to a note, or the audio unit is an audio segment corresponding to a word in a lyric corresponding to the audio data.
- the time information includes a start time point of the audio unit or a duration of the audio unit.
- the time information is a duration of the audio unit, and in the pre-stored reference time series, selecting a target reference time series that matches the time point sequence, including:
- a target reference time series that matches the time difference sequence is selected.
- an apparatus for searching for audio comprising:
- a detecting module configured to receive a trigger instruction for searching for audio, and detect a preset trigger event
- a recording module configured to record a time point at which the triggering event occurs when the preset triggering event is detected, until a preset ending event is detected, obtain each time point of the recording, and obtain a sequence of time points;
- a selection module configured to select, in pre-stored reference time series, a target reference time series that matches the sequence of time points, wherein a reference time series is a sequence of time information of a plurality of consecutive audio units included in audio data;
- a determining module configured to determine target audio data corresponding to the target reference time series according to the correspondence between the pre-stored audio data and the reference time series.
- the selecting module is configured to:
- the selecting module is configured to:
- the edit distance of each pre-stored reference time series and the time point sequence is separately calculated as the degree of difference.
- the detecting module is used in any of the following situations:
- a touch signal is detected by a preset area of the touch screen of the device.
- the environment audio data is acquired by the audio collection component of the device, and the preset audio feature information is identified in the environment audio data.
- the audio unit is an audio segment corresponding to a note, or the audio unit is an audio segment corresponding to a word in a lyric corresponding to the audio data.
- the time information includes a start time point of the audio unit or a duration of the audio unit.
- the time information is a duration of the audio unit
- the selecting module includes:
- a determining unit configured to determine a time difference of each two adjacent time points in the sequence of time points based on the sequence of time points, to obtain a time difference sequence
- a selecting unit configured to select, in a pre-stored reference time series, a target reference time series that matches the time difference sequence.
- a terminal comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the method of searching for audio as described above.
- a computer readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method of searching for audio as described above.
- the method provided in this embodiment receives a trigger instruction for searching for audio and detects a preset trigger event; each time the preset trigger event is detected, the time point at which it occurs is recorded, until a preset end event is detected and the recorded time points are acquired to obtain a time point sequence; in pre-stored reference time series, a target reference time series matching the time point sequence is selected, where a reference time series is a sequence of time information of a plurality of consecutive audio units included in audio data; and target audio data corresponding to the target reference time series is determined according to the correspondence between pre-stored audio data and reference time series.
- in this way, the user can input a time point sequence reflecting how the audio changes through the operation corresponding to the preset trigger event (such as tapping the screen of the mobile phone), and search for the corresponding audio data based on that sequence, so that a search is possible even when the user does not know song information such as the song name or lyrics.
- FIG. 1 is a flow chart showing a method of searching for audio according to an exemplary embodiment
- FIG. 2 is a schematic diagram of a search interface of a music application, according to an exemplary embodiment
- FIG. 3 is a schematic structural diagram of an apparatus for searching audio according to an exemplary embodiment
- FIG. 4 is a schematic structural diagram of a terminal according to an exemplary embodiment.
- Embodiments of the present disclosure provide a method of searching for audio, which may be implemented by a terminal.
- the terminal may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, or the like.
- the terminal can include components such as a processor, a memory, and the like.
- the processor may be a CPU (Central Processing Unit) or the like, and may be used to record the time point at which a trigger event occurs each time a preset trigger event is detected, until the preset end event is detected, and to acquire the recorded time points to obtain a time point sequence, and so on.
- the memory may be a RAM (Random Access Memory), a flash memory, etc., and may be used to store received data, data required for processing, and data generated during processing, such as reference time series.
- the terminal may also include a transceiver, an input component, a display component, an audio output component, and the like.
- the transceiver can be used for data transmission with the server, for example, to receive reference time series sent by the server, and may include a Bluetooth component, a WiFi (Wireless Fidelity) component, an antenna, a matching circuit, a modem, and the like.
- the input component can be a touch screen, a keyboard, a mouse, or the like.
- the audio output unit can be a speaker, a headphone, or the like.
- System programs and applications can be installed in the terminal. Users use a variety of applications based on their own needs in the process of using the terminal. An application with music playback capability can be installed in the terminal.
- An exemplary embodiment of the present disclosure provides a method for searching for audio. As shown in FIG. 1, the processing flow of the method may include the following steps:
- Step S110 receiving a trigger instruction for searching for audio, and detecting a preset trigger event.
- the audio can be searched for by entering a time series into the terminal at the search interface.
- the search interface may display a prompt such as "Please shake the mobile phone according to the rhythm of the song you want to search".
- the terminal can start detecting the preset trigger event.
- detecting the preset trigger event may include any one of the following:
- a touch signal is detected through a preset area of the touch screen of the device.
- the terminal may detect the touch signal when the user performs the tapping operation.
- the preset user action can be an action such as shaking the head, clapping, swinging, jumping, or kicking. Whether the user has performed these actions can be detected by recognizing multiple frames of images.
- the preset audio feature information may be an audio feature extracted from audio acquired while the user sings a cappella, or an audio feature of a tapped rhythm.
- the trigger event may also be an event of tapping the keyboard or the like.
- Step S120 When a preset trigger event is detected, the time point at which the trigger event occurs is recorded, until a preset end event is detected, and each time point of the record is acquired to obtain a time point sequence.
- each time a preset trigger event is detected, the time point at which the trigger event occurs is recorded. For example, whenever the terminal is detected to have been shaken once, or shaken past a preset angle, that time point is recorded. When the preset end event is detected, the recorded time points are acquired and combined to obtain a time point sequence.
- the preset end event may be the elapse of a preset detection duration. For example, if the default detection duration is 30 s, timing starts when the trigger instruction for searching for audio is received, and after 30 s have elapsed, detection of the preset trigger event ends.
- the preset end event may be the detection of an input-end trigger instruction. For example, a button for ending input can be provided in the search interface; when the user presses it after finishing input, the terminal detects the input-end trigger instruction and ends detection of the preset trigger event.
- the preset end event may also be a countdown that starts after each preset trigger event is detected; if no further preset trigger event is detected before the countdown expires, detection of the preset trigger event ends.
- when the end event is detected, the recorded time points are acquired to obtain a time point sequence. For example, if five preset trigger events are detected, acquiring the recorded time points yields the sequence: 0.20 s, 0.54 s, 0.84 s, 1.12 s, 1.37 s.
- an actual time point sequence typically contains more data than this example, which merely illustrates how the embodiment works.
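For illustration only (not part of the disclosed terminal implementation), the recording step above can be sketched in Python; the fixed 30 s window models the preset end event, and the timestamps are assumed example inputs rather than real touch events:

```python
def record_time_points(event_times, window_s=30.0):
    """Keep the time point of each detected trigger event (seconds, measured
    from the search trigger instruction) that falls inside the preset
    detection window; expiry of the window acts as the end event."""
    return [round(t, 2) for t in sorted(event_times) if t <= window_s]

# Five taps, as in the example above (hypothetical timestamps); the sixth
# event arrives after the window has elapsed and is dropped.
sequence = record_time_points([0.20, 0.54, 0.84, 1.12, 1.37, 31.0])
```

The resulting `sequence` is the time point sequence passed to the matching step.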
- Step S130 in the pre-stored reference time series, select a target reference time series that matches the time point sequence.
- the reference time series is a sequence consisting of time information of a plurality of consecutive audio units included in the audio data.
- the audio unit may be an audio segment corresponding to the note, or the audio unit may be an audio segment corresponding to a word in the lyrics corresponding to the audio data.
- a song can include multiple sentences, and each sentence can include multiple notes or multiple lyric words. Therefore, the audio data of a song can be segmented, for example divided by sentence, with each sentence including multiple consecutive audio units.
- the time information may include the start time point of the audio unit or the duration of the audio unit.
- the time information of an audio unit may be the start time point of the audio segment corresponding to each note. Since the audio of a note lasts for a period of time, the start time point may be chosen for recording.
- the time information of an audio unit may also be the start time point of the audio segment corresponding to a word in the lyrics corresponding to the audio data. Likewise, since the audio of a lyric word lasts for some time, the start time point may be chosen for recording. Alternatively, instead of start time points, the time difference between every two adjacent audio units, i.e., the duration of the audio unit, may be recorded.
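As a sketch of this alternative, converting recorded start time points into audio-unit durations is an adjacent difference (illustrative code, not the patented implementation):

```python
def starts_to_durations(start_points):
    """Turn the start time points of consecutive audio units (notes or
    lyric words) into durations: the gap between adjacent start points."""
    return [round(b - a, 2) for a, b in zip(start_points, start_points[1:])]
```

For example, start points at 0.0 s, 0.5 s, 1.1 s, and 1.4 s yield the duration sequence 0.5, 0.6, 0.3.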
- the reference time series in the database can be appropriately expanded. That is, each song can be divided in different ways, and the time information of different types of audio units can be recorded. Different types of audio units include audio segments corresponding to notes, or audio segments corresponding to words in lyrics corresponding to audio data.
- the reference time series in the database can also be appropriately reduced. For example, only the time information of the consecutive audio units corresponding to the climax or main verse of each song is recorded, and the time information of the audio units corresponding to the intro or interlude is not recorded, because users usually input a time point sequence corresponding to the climax or main verse.
- step S130 may include: determining a time difference of every two adjacent time points in the sequence of time points based on the sequence of time points, obtaining a time difference sequence; in a pre-stored reference time series , select the target reference time series that matches the time difference sequence.
- the time point sequence input by the user may be converted into a time difference sequence and then processed. If the reference time series pre-stored in the database are audio unit durations, they can be used directly; if they are audio unit start time points, those start time points are likewise first converted into time differences for subsequent processing. Specifically, the time difference of every two adjacent time points in the time point sequence is determined to obtain the time difference sequence, and this sequence is then matched against each reference time difference sequence. The time difference sequence converted from the user input t_1, t_2, …, t_(n+1) can be written as X = {x_1, x_2, …, x_n}, where x_j = t_(j+1) − t_j, and the i-th reference time difference sequence can be written as Y_i = {y_1, y_2, …, y_m}.
- step S130 may include: respectively determining a difference between each pre-stored reference time series and a time point sequence, and selecting a target reference time series having the smallest degree of difference from the time point sequence.
- the target reference time series with the smallest degree of difference from the time point sequence may be expressed as i* = argmin_i D(X, Y_i), where D(X, Y_i) denotes the degree of difference between the user's sequence X and the i-th reference sequence Y_i.
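The argmin selection can be sketched as follows; `difference_fn` stands for any of the difference measures discussed below (edit distance, cross-correlation, EMD), and the reference store is a hypothetical dict used only for illustration:

```python
def select_target(user_seq, references, difference_fn):
    """Return the key of the reference time series with the smallest
    degree of difference from the user's sequence (argmin over the store)."""
    return min(references, key=lambda k: difference_fn(user_seq, references[k]))

# Hypothetical reference store: id -> reference time difference sequence.
refs = {"song_a": [0.34, 0.30, 0.28, 0.25],
        "song_b": [0.80, 0.80, 0.80, 0.80]}

# A simple absolute-difference measure as the plug-in difference_fn.
best = select_target([0.34, 0.30, 0.28, 0.25], refs,
                     lambda x, y: sum(abs(a - b) for a, b in zip(x, y)))
```

Any of the measures described next can be substituted for the lambda without changing the selection logic.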
- the step of respectively determining the difference between each pre-stored reference time series and the time point sequence may include separately calculating an edit distance of each pre-stored reference time series and a time point sequence as the difference degree.
- abs() is an absolute value operator.
- a and b are weighting constants and can be valued empirically.
- c[i][j] is an edit distance matrix of size n × m, and the edit distance between X and the i-th reference time difference sequence Y_i is c[n][m], which may be computed recursively, for example as c[i][j] = min{ c[i−1][j] + a, c[i][j−1] + a, c[i−1][j−1] + b·abs(x_i − y_j) }.
- n is the number of time differences included in X.
- m is the number of time differences included in the i-th reference time difference sequence Y_i.
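A sketch of such a weighted edit distance between two time difference sequences; the original formula is not reproduced in this text, so the recurrence below (insertion/deletion cost a, substitution cost b·abs(x_i − y_j)) is an assumption consistent with the weighting constants a and b and the abs() operator described above:

```python
def weighted_edit_distance(x, y, a=1.0, b=1.0):
    """c[i][j] is the DP matrix over sequences x (length n) and y (length m);
    the returned c[n][m] is used as the degree of difference."""
    n, m = len(x), len(y)
    c = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c[i][0] = i * a                      # delete all of x[:i]
    for j in range(1, m + 1):
        c[0][j] = j * a                      # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i][j] = min(c[i - 1][j] + a,   # deletion
                          c[i][j - 1] + a,   # insertion
                          c[i - 1][j - 1] + b * abs(x[i - 1] - y[j - 1]))  # substitution
    return c[n][m]
```

Identical sequences give a distance of 0; each extra or missing tap adds roughly the cost a.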
- alternatively, the cross-correlation degree between each pre-stored reference time series and the time point sequence may be calculated by a cross-correlation function and used as the degree of difference.
- a normalized cross-correlation of the following form may be used: R(X, Y_i) = Σ_j x_j·y_j / ( √(Σ_j x_j²) · √(Σ_j y_j²) ).
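A normalized cross-correlation sketch; since the patent's exact formula is not shown in this text, treating the value as a similarity (e.g. using its negation as a difference degree) is an illustrative assumption:

```python
import math

def cross_correlation(x, y):
    """Normalized cross-correlation of two sequences (truncated to equal
    length); 1.0 means the sequences are proportional, i.e. maximally similar."""
    n = min(len(x), len(y))
    num = sum(a * b for a, b in zip(x[:n], y[:n]))
    den = (math.sqrt(sum(a * a for a in x[:n]))
           * math.sqrt(sum(b * b for b in y[:n])))
    return num / den if den else 0.0
```

Because the measure is scale-invariant, a user who taps the right rhythm at a slightly different overall tempo still scores close to 1.0.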
- alternatively, an EMD (Earth Mover's Distance) algorithm may be used to calculate the degree of difference.
- Step S140 determining target audio data corresponding to the target reference time series according to the correspondence between the pre-stored audio data and the reference time series.
- the song that the user is looking for is the target audio data corresponding to the target reference time difference sequence.
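The final lookup is a straightforward mapping from the matched target reference series back to its audio data; the correspondence table below is hypothetical, standing in for the pre-stored correspondence between audio data and reference time series:

```python
# Hypothetical correspondence: reference series id -> audio data.
CORRESPONDENCE = {
    "song_a_chorus": {"title": "Song A", "segment": "chorus"},
    "song_b_verse": {"title": "Song B", "segment": "verse"},
}

def target_audio_for(reference_id):
    """Determine the target audio data corresponding to the matched
    target reference time series (None if no correspondence exists)."""
    return CORRESPONDENCE.get(reference_id)
```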
- the method provided in this embodiment receives a trigger instruction for searching for audio and detects a preset trigger event; each time the preset trigger event is detected, the time point at which it occurs is recorded, until a preset end event is detected and the recorded time points are acquired to obtain a time point sequence; in pre-stored reference time series, a target reference time series matching the time point sequence is selected, where a reference time series is a sequence of time information of a plurality of consecutive audio units included in audio data; and target audio data corresponding to the target reference time series is determined according to the correspondence between pre-stored audio data and reference time series.
- in this way, the user can input a time point sequence reflecting how the audio changes through the operation corresponding to the preset trigger event (such as tapping the screen of the mobile phone), and search for the corresponding audio data based on that sequence, so that a search is possible even when the user does not know song information such as the song name or lyrics.
- Yet another exemplary embodiment of the present disclosure provides an apparatus for searching for audio. As shown in FIG. 3, the apparatus includes:
- the detecting module 310 is configured to receive a trigger instruction for searching for audio, and detect a preset trigger event;
- the recording module 320 is configured to record a time point at which the trigger event occurs every time the preset trigger event is detected, until a preset end event is detected, obtain each time point of the record, and obtain a time point sequence. ;
- the selecting module 330 is configured to select, in a pre-stored reference time series, a target reference time series that matches the time point sequence, wherein the reference time series is a plurality of consecutive audio units included in the audio data. a sequence of time information;
- the determining module 340 is configured to determine target audio data corresponding to the target reference time series according to the correspondence between the pre-stored audio data and the reference time series.
- the selecting module 330 is configured to:
- the selecting module 330 is configured to:
- the edit distance of each pre-stored reference time series and the time point sequence is separately calculated as the degree of difference.
- the detecting module 310 is used in any of the following situations:
- a touch signal is detected by a preset area of the touch screen of the device.
- the environment audio data is acquired by the audio collection component of the device, and the preset audio feature information is identified in the environment audio data.
- the audio unit is an audio segment corresponding to a note, or the audio unit is an audio segment corresponding to a word in a lyric corresponding to the audio data.
- the time information includes a start time point of the audio unit or a duration of the audio unit.
- the time information is a duration of the audio unit
- the selecting module 330 includes:
- a determining unit configured to determine a time difference of each two adjacent time points in the sequence of time points based on the sequence of time points, to obtain a time difference sequence
- a selecting unit configured to select, in a pre-stored reference time series, a target reference time series that matches the time difference sequence.
- in this way, the user can input a time point sequence reflecting how the audio changes through the operation corresponding to the preset trigger event (such as tapping the screen of the mobile phone), and search for the corresponding audio data based on that sequence, so that a search is possible even when the user does not know song information such as the song name or lyrics.
- when the apparatus for searching for audio provided by the above embodiment searches for audio, the division into the functional modules described above is merely illustrative. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the terminal may be divided into different functional modules to complete all or part of the functions described above.
- the apparatus for searching for audio provided by the foregoing embodiment belongs to the same concept as the method for searching for audio; its specific implementation process is described in detail in the method embodiments and is not repeated here.
- FIG. 4 is a schematic structural diagram of a terminal 1800 provided by an exemplary embodiment of the present application.
- the terminal 1800 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
- Terminal 1800 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, and the like.
- the terminal 1800 includes a processor 1801 and a memory 1802.
- the processor 1801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
- the processor 1801 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
- the processor 1801 may also include a main processor and a coprocessor.
- the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
- the processor 1801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
- the processor 1801 may further include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
- Memory 1802 may include one or more computer-readable storage media, which may be non-transitory. Memory 1802 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in memory 1802 is used to store at least one instruction to be executed by processor 1801 to implement the method for searching for audio provided by the method embodiments of the present application.
- the terminal 1800 optionally further includes: a peripheral device interface 1803 and at least one peripheral device.
- the processor 1801, the memory 1802, and the peripheral device interface 1803 may be connected by a bus or a signal line.
- Each peripheral device can be connected to the peripheral device interface 1803 via a bus, signal line or circuit board.
- the peripheral device includes at least one of a radio frequency circuit 1804, a touch display screen 1805, a camera 1806, an audio circuit 1807, a positioning component 1808, and a power source 1809.
- the peripheral device interface 1803 can be used to connect at least one peripheral device associated with an I/O (Input/Output) to the processor 1801 and the memory 1802.
- in some embodiments, the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802, and the peripheral device interface 1803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
- the RF circuit 1804 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal.
- the RF circuit 1804 communicates with the communication network and other communication devices via electromagnetic signals.
- the RF circuit 1804 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
- the RF circuit 1804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
- Radio frequency circuitry 1804 can communicate with other terminals via at least one wireless communication protocol.
- the wireless communication protocols include, but are not limited to, the World Wide Web, a metropolitan area network, an intranet, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
- the RF circuit 1804 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
- the display 1805 is used to display a UI (User Interface).
- the UI can include graphics, text, icons, video, and any combination thereof.
- when the display 1805 is a touch display, it also has the ability to acquire touch signals on or above the surface of the display 1805.
- the touch signal can be input to the processor 1801 as a control signal for processing.
- the display 1805 can also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
- in some embodiments, there may be one display screen 1805, disposed on the front panel of the terminal 1800; in other embodiments, there may be at least two display screens 1805, disposed on different surfaces of the terminal 1800 or in a folded design; in still other embodiments, the display screen 1805 may be a flexible display screen disposed on a curved or folded surface of the terminal 1800. The display screen 1805 may even be set in a non-rectangular irregular shape, i.e., a shaped screen.
- the display 1805 can be made of a material such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
- Camera component 1806 is used to capture images or video.
- camera assembly 1806 includes a front camera and a rear camera.
- the front camera is placed on the front panel of the terminal, and the rear camera is placed on the back of the terminal.
- in some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background-blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting or other fused shooting functions.
- camera assembly 1806 can also include a flash.
- the flash can be a monochrome temperature flash or a two-color temperature flash.
- the two-color temperature flash is a combination of a warm flash and a cool flash that can be used for light compensation at different color temperatures.
- the audio circuit 1807 can include a microphone and a speaker.
- the microphone is used to collect sound waves of the user and the environment and convert them into electrical signals that are input to the processor 1801 for processing or to the RF circuit 1804 for voice communication.
- the microphones may be multiple, and are respectively disposed at different parts of the terminal 1800.
- the microphone can also be an array microphone or an omnidirectional acquisition microphone.
- the speaker is then used to convert electrical signals from the processor 1801 or the RF circuit 1804 into sound waves.
- the speaker can be a conventional film speaker or a piezoelectric ceramic speaker.
- audio circuit 1807 can also include a headphone jack.
- the positioning component 1808 is configured to locate the current geographic location of the terminal 1800 to implement navigation or LBS (Location Based Service).
- the positioning component 1808 can be a positioning component based on a US-based GPS (Global Positioning System), a Chinese Beidou system, or a Russian Galileo system.
- a power supply 1809 is used to power various components in the terminal 1800.
- the power source 1809 may use alternating current, direct current, disposable batteries, or rechargeable batteries.
- the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
- a wired rechargeable battery is a battery that is charged by a wired line
- a wireless rechargeable battery is a battery that is charged by a wireless coil.
- the rechargeable battery can also be used to support fast charging technology.
- terminal 1800 also includes one or more sensors 1810.
- the one or more sensors 1810 include, but are not limited to, an acceleration sensor 1811, a gyro sensor 1812, a pressure sensor 1813, a fingerprint sensor 1814, an optical sensor 1815, and a proximity sensor 1816.
- the acceleration sensor 1811 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the terminal 1800.
- the acceleration sensor 1811 can be used to detect components of gravity acceleration on three coordinate axes.
- the processor 1801 can control the touch display screen 1805 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 1811.
- the acceleration sensor 1811 can also be used for the acquisition of game or user motion data.
- the gyro sensor 1812 can detect the body direction and the rotation angle of the terminal 1800, and the gyro sensor 1812 can cooperate with the acceleration sensor 1811 to collect the 3D action of the user on the terminal 1800. Based on the data collected by the gyro sensor 1812, the processor 1801 can implement functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation.
- the pressure sensor 1813 can be disposed on a side border of the terminal 1800 and/or a lower layer of the touch display screen 1805.
- when the pressure sensor 1813 is disposed on the side frame of the terminal 1800, it can detect the user's holding signal, and the processor 1801 performs left/right-hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 1813.
- when the pressure sensor 1813 is disposed at the lower layer of the touch display screen 1805, the processor 1801 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1805.
- the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
- the fingerprint sensor 1814 is used to collect the user's fingerprint; the processor 1801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1814, or the fingerprint sensor 1814 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 1801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like.
- the fingerprint sensor 1814 may be disposed on the front, back, or side of the terminal 1800. When a physical button or a manufacturer logo is provided on the terminal 1800, the fingerprint sensor 1814 may be integrated with the physical button or the manufacturer logo.
- Optical sensor 1815 is used to collect ambient light intensity.
- the processor 1801 can control the display brightness of the touch display 1805 based on the ambient light intensity acquired by the optical sensor 1815. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1805 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 1805 is lowered.
- the processor 1801 can also dynamically adjust the shooting parameters of the camera assembly 1806 based on the ambient light intensity acquired by the optical sensor 1815.
- Proximity sensor 1816, also referred to as a distance sensor, is typically disposed on the front panel of terminal 1800.
- Proximity sensor 1816 is used to capture the distance between the user and the front of terminal 1800.
- when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 gradually decreases, the processor 1801 controls the touch display 1805 to switch from the screen-on state to the screen-off state; when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 gradually increases, the processor 1801 controls the touch display 1805 to switch from the screen-off state to the screen-on state.
- those skilled in the art will understand that the structure shown in FIG. 4 does not constitute a limitation on terminal 1800, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Biomedical Technology (AREA)
- Physiology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present disclosure relates to a method and an apparatus for searching for audio, and belongs to the field of electronic technology. The method includes: receiving a trigger instruction for searching for audio, and detecting a preset trigger event; each time the preset trigger event is detected, recording the time point at which the trigger event occurs, until a preset end event is detected, and then obtaining the recorded time points to form a time point sequence; selecting, from pre-stored reference time series, a target reference time series that matches the time point sequence; and determining, based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series. In this way, the user can input a time point sequence reflecting the rhythmic characteristics of the audio through an operation corresponding to the preset trigger event (such as tapping the screen of a mobile phone), and the corresponding audio data is searched for based on that sequence, so that audio data can be found even when the song title is unknown.
Description
This application claims priority to Chinese Patent Application No. 201711474934.9, entitled "Method and apparatus for searching for audio" and filed on December 29, 2017, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of electronic technology, and in particular to a method and an apparatus for searching for audio.
In the related art, a user can enjoy a wide variety of audio using a music application installed on a mobile phone, and can search for and play the corresponding audio data based on song information such as the song title or the lyrics.
In the process of implementing the present disclosure, the inventors found at least the following problem:
In some cases, if the user cannot remember the song title or the lyrics, the corresponding audio data cannot be searched for.
SUMMARY
To overcome the problems existing in the related art, the present disclosure provides the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a method for searching for audio is provided, the method including:
receiving a trigger instruction for searching for audio, and detecting a preset trigger event;
each time the preset trigger event is detected, recording the time point at which the trigger event occurs, until a preset end event is detected, and then obtaining the recorded time points to form a time point sequence;
selecting, from pre-stored reference time series, a target reference time series that matches the time point sequence, where a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and
determining, based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series.
Optionally, selecting, from the pre-stored reference time series, a target reference time series that matches the time point sequence includes:
determining the degree of difference between each pre-stored reference time series and the time point sequence, and selecting the target reference time series with the smallest degree of difference from the time point sequence.
Optionally, determining the degree of difference between each pre-stored reference time series and the time point sequence includes:
calculating the edit distance between each pre-stored reference time series and the time point sequence as the degree of difference.
Optionally, detecting the preset trigger event includes any one of the following:
detecting that the device is shaken;
detecting a touch signal in a preset area of the touch screen of the device;
acquiring multiple frames of images through an image acquisition component of the device, and detecting an image of a preset user action in the multiple frames of images;
acquiring ambient audio data through an audio acquisition component of the device, and recognizing preset audio feature information in the ambient audio data.
Optionally, an audio unit is an audio segment corresponding to a musical note, or an audio unit is an audio segment corresponding to a character in the lyrics corresponding to the audio data.
Optionally, the time information includes the start time point of an audio unit or the duration of an audio unit.
Optionally, when the time information is the duration of an audio unit, selecting, from the pre-stored reference time series, a target reference time series that matches the time point sequence includes:
determining, based on the time point sequence, the time difference between each two adjacent time points in the time point sequence to obtain a time difference sequence; and
selecting, from the pre-stored reference time series, a target reference time series that matches the time difference sequence.
According to a second aspect of the embodiments of the present disclosure, an apparatus for searching for audio is provided, the apparatus including:
a detection module configured to receive a trigger instruction for searching for audio and detect a preset trigger event;
a recording module configured to record, each time the preset trigger event is detected, the time point at which the trigger event occurs, until a preset end event is detected, and then obtain the recorded time points to form a time point sequence;
a selecting module configured to select, from pre-stored reference time series, a target reference time series that matches the time point sequence, where a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and
a determining module configured to determine, based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series.
Optionally, the selecting module is configured to:
determine the degree of difference between each pre-stored reference time series and the time point sequence, and select the target reference time series with the smallest degree of difference from the time point sequence.
Optionally, the selecting module is configured to:
calculate the edit distance between each pre-stored reference time series and the time point sequence as the degree of difference.
Optionally, the detection module is configured for any one of the following cases:
detecting that the device is shaken;
detecting a touch signal in a preset area of the touch screen of the device;
acquiring multiple frames of images through an image acquisition component of the device, and detecting an image of a preset user action in the multiple frames of images;
acquiring ambient audio data through an audio acquisition component of the device, and recognizing preset audio feature information in the ambient audio data.
Optionally, an audio unit is an audio segment corresponding to a musical note, or an audio unit is an audio segment corresponding to a character in the lyrics corresponding to the audio data.
Optionally, the time information includes the start time point of an audio unit or the duration of an audio unit.
Optionally, when the time information is the duration of an audio unit, the selecting module includes:
a determining unit configured to determine, based on the time point sequence, the time difference between each two adjacent time points in the time point sequence to obtain a time difference sequence; and
a selecting unit configured to select, from the pre-stored reference time series, a target reference time series that matches the time difference sequence.
According to a third aspect of the embodiments of the present disclosure, a terminal is provided. The terminal includes a processor and a memory, and the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the above method for searching for audio.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the above method for searching for audio.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
In the method provided by this embodiment, a trigger instruction for searching for audio is received, and a preset trigger event is detected; each time the preset trigger event is detected, the time point at which the trigger event occurs is recorded, until a preset end event is detected, and then the recorded time points are obtained to form a time point sequence; from pre-stored reference time series, a target reference time series that matches the time point sequence is selected, where a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series is determined. In this way, the user can input a time point sequence reflecting the rhythmic characteristics of the audio through an operation corresponding to the preset trigger event (such as tapping the screen of a mobile phone), and the corresponding audio data is searched for based on that sequence, so that audio data can be found even when the song title is unknown.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and serve, together with the specification, to explain the principles of the present disclosure. In the drawings:
FIG. 1 is a flowchart of a method for searching for audio according to an exemplary embodiment;
FIG. 2 is a schematic diagram of a search interface of a music application according to an exemplary embodiment;
FIG. 3 is a schematic structural diagram of an apparatus for searching for audio according to an exemplary embodiment;
FIG. 4 is a schematic structural diagram of a terminal according to an exemplary embodiment.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. These drawings and the textual description are not intended to limit the scope of the disclosed concept in any way, but rather to explain the concept of the present disclosure to those skilled in the art by reference to specific embodiments.
Exemplary embodiments are described in detail here, and examples thereof are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
An embodiment of the present disclosure provides a method for searching for audio, which may be implemented by a terminal.
The terminal may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, or the like.
The terminal may include components such as a processor and a memory. The processor, which may be a CPU (Central Processing Unit) or the like, may be used to record the time point at which a trigger event occurs each time the preset trigger event is detected and, when a preset end event is detected, to obtain the recorded time points to form a time point sequence, among other processing. The memory, which may be RAM (Random Access Memory), flash memory, or the like, may be used to store received data, data required for processing, and data generated during processing, such as the reference time series.
The terminal may further include a transceiver, an input component, a display component, an audio output component, and the like. The transceiver may be used for data transmission with a server, for example, to receive reference time series delivered by the server, and may include a Bluetooth component, a WiFi (Wireless-Fidelity) component, an antenna, a matching circuit, a modem, and the like. The input component may be a touch screen, a keyboard, a mouse, or the like. The audio output component may be a speaker, an earphone, or the like.
A system program and applications may be installed on the terminal. Users run various applications on their terminals according to their different needs, and an application with a music playback function may be installed on the terminal.
An exemplary embodiment of the present disclosure provides a method for searching for audio. As shown in FIG. 1, the processing flow of the method may include the following steps:
Step S110: receive a trigger instruction for searching for audio, and detect a preset trigger event.
In implementation, after opening the music application, the user can search for audio by inputting a time sequence to the terminal on the search interface. As shown in FIG. 2, the search interface displays "Please shake the phone according to the content of the song you are searching for". When the user touches the "Search" button, the terminal can begin to detect the preset trigger event.
Optionally, detecting the preset trigger event may include any one of the following:
1) Detecting that the device is shaken.
2) Detecting a touch signal in a preset area of the touch screen of the device.
3) Acquiring multiple frames of images through an image acquisition component of the device, and detecting an image of a preset user action in the multiple frames of images.
4) Acquiring ambient audio data through an audio acquisition component of the device, and recognizing preset audio feature information in the ambient audio data.
In implementation, for 2), the detected operation may be the user tapping a preset area of the touch screen; when the user taps, the terminal can detect a touch signal. For 3), the preset user action may be shaking the head, clapping, waving, jumping, stomping, or the like; whether the user performs these actions can be detected by recognizing the multiple frames of images. For 4), the preset audio feature information may be audio features extracted from audio acquired while the user hums, audio features of a tapped-out rhythm, or the like. In addition to the above preset trigger events, if the terminal is a PC (Personal Computer), the trigger event may also be a keyboard tapping event or the like.
Step S120: each time the preset trigger event is detected, record the time point at which the trigger event occurs, until a preset end event is detected, and then obtain the recorded time points to form a time point sequence.
In implementation, each time the preset trigger event is detected, the time point at which it occurs is recorded. For example, each time the terminal is detected to have been shaken once, the time point of the shake, or the time point at which the terminal was shaken by a preset angle, is recorded. When a preset end event is detected, the recorded time points are obtained and combined into a time point sequence. The preset end event may be the elapse of a preset duration; for example, with a default detection duration of 30 s, timing starts when the trigger instruction for searching for audio is received, and detection of the preset trigger event ends after 30 s have elapsed. Alternatively, the preset end event may be the detection of an input-end trigger instruction: a button for ending input may be provided on the search interface, and when the user touches it, the terminal detects the input-end trigger instruction and stops detecting the preset trigger event. Alternatively, the preset end event may be that a countdown is started after each detection of the preset trigger event, and if the next preset trigger event has not occurred when the countdown expires, detection of the preset trigger event ends.
Finally, the recorded time points are obtained to form the time point sequence. For example, if the preset trigger event is detected five times, the recorded time points form the sequence 0.20 s, 0.54 s, 0.84 s, 1.12 s, 1.37 s. Of course, in practice a time point sequence contains more data than this example, which merely illustrates how this embodiment can be implemented.
Step S130: select, from pre-stored reference time series, a target reference time series that matches the time point sequence.
Here, a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data.
In implementation, an audio unit may be an audio segment corresponding to a musical note, or an audio segment corresponding to a character in the lyrics corresponding to the audio data. A song may include multiple lines, each line may include multiple notes or multiple lyric words, and the words in turn include multiple characters. The audio data of a song can therefore be segmented line by line, and each line may include multiple consecutive audio units.
The time information may include the start time point of an audio unit or the duration of an audio unit. The time information of an audio unit may be the start time point of the audio segment corresponding to each note; since the audio of a note lasts for a period of time, its start time point can be recorded. The time information may also be the start time point of the audio segment corresponding to a character in the lyrics; likewise, since the audio of a lyric character lasts for a period of time, its start time point can be recorded. Alternatively, instead of recording start time points, the time difference between every two adjacent audio units, i.e. the duration of an audio unit, may be recorded.
To match the time point sequence input by the user more effectively, the reference time series in the database can be appropriately expanded: each song can be segmented in different ways, and the time information of different types of audio units can be recorded. The different types of audio units include audio segments corresponding to notes and audio segments corresponding to characters in the lyrics of the audio data.
To reduce the amount of calculation, the reference time series in the database can be appropriately pruned. For example, only the time information of the multiple consecutive audio units corresponding to the climax or verse of each song is recorded, and the time information of the audio units corresponding to the intro or refrain is not recorded, because users generally choose to input the time point sequence corresponding to the climax or verse.
Optionally, when the time information is the duration of an audio unit, step S130 may include: determining, based on the time point sequence, the time difference between each two adjacent time points in the time point sequence to obtain a time difference sequence; and selecting, from the pre-stored reference time series, a target reference time series that matches the time difference sequence.
In implementation, to simplify the calculation, the time point sequence input by the user can be converted into a time difference sequence before subsequent processing. If the reference time series pre-stored in the database consist of audio unit durations, they can be used directly; if they consist of audio unit start time points, those start time points are likewise first converted into time differences. Specifically, based on the time point sequence, the time difference between each two adjacent time points is determined to obtain the time difference sequence, which is then matched against the reference time difference series. Denoting the time point sequence input by the user as T = {t_1, t_2, …, t_{m+1}}, the user's time difference sequence can be written as A = {a_1, a_2, …, a_m}, where a_i = t_{i+1} − t_i, and a reference time difference series can be written as B = {b_1, b_2, …, b_n}.
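The conversion from a recorded time point sequence to a time difference sequence can be sketched as follows; the tap times used here are hypothetical illustrative values:

```python
# Illustrative sketch of converting a recorded time point sequence into a
# time difference sequence, as described for step S130.

def to_time_differences(time_points):
    """Return the differences between each two adjacent time points."""
    return [round(later - earlier, 2)
            for earlier, later in zip(time_points, time_points[1:])]

# Hypothetical tap time points, in seconds.
taps = [0.20, 0.54, 0.84, 1.12, 1.37]
print(to_time_differences(taps))  # [0.34, 0.3, 0.28, 0.25]
```

The resulting duration sequence is what gets matched against the pre-stored reference time difference series.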
Optionally, step S130 may include: determining the degree of difference between each pre-stored reference time series and the time point sequence, and selecting the target reference time series with the smallest degree of difference from the time point sequence. That is, among the reference time difference series B^(1), …, B^(Q), the one with the smallest degree of difference from A is selected, where Q is the number of reference time series.
Optionally, determining the degree of difference between each pre-stored reference time series and the time point sequence may include: calculating the edit distance between each pre-stored reference time series and the time point sequence as the degree of difference.
In implementation, let c[i][j] denote the edit distance between the first i elements of A and the first j elements of B. When i = 0 or j = 0:
c[i][j] = 0 (Expression 4)
When i > 0 and j > 0, the standard edit-distance recurrence applies:
c[i][j] = min(c[i−1][j] + 1, c[i][j−1] + 1, c[i−1][j−1] + d(a_i, b_j)) (Expression 5)
where d(a_i, b_j) = 0 when a_i and b_j are regarded as matching (for example, when their difference is within a preset threshold), and d(a_i, b_j) = 1 otherwise.
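The edit-distance calculation can be sketched in Python. Because the source text does not reproduce the full recurrence, the i > 0, j > 0 case below uses the standard edit-distance recurrence as an assumption, and the matching tolerance `tol` is likewise an assumed parameter:

```python
# Hedged sketch of edit distance between two time difference sequences.
# The base case follows Expression 4 (c[i][j] = 0 when i == 0 or j == 0);
# the recursive case and the tolerance-based matching rule are assumptions.

def edit_distance(a, b, tol=0.05):
    """Edit distance between time difference sequences a and b.

    Two durations are treated as matching when they differ by at most tol.
    """
    m, n = len(a), len(b)
    # Base case: c[i][j] = 0 when i == 0 or j == 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subst = 0 if abs(a[i - 1] - b[j - 1]) <= tol else 1
            c[i][j] = min(c[i - 1][j] + 1,          # delete a[i-1]
                          c[i][j - 1] + 1,          # insert b[j-1]
                          c[i - 1][j - 1] + subst)  # match / substitute
    return c[m][n]

print(edit_distance([0.34, 0.30, 0.28, 0.25], [0.34, 0.30, 0.28, 0.25]))  # 0
```

A smaller edit distance indicates a smaller degree of difference between the user's rhythm and the reference series.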
Optionally, the cross-correlation between each pre-stored reference time series and the time point sequence may also be calculated by a cross-correlation function and used as the degree of difference.
In implementation, the cross-correlation may be calculated with a normalized formula of the form:
R(A, B) = (Σ_i a_i·b_i) / (√(Σ_i a_i²) · √(Σ_i b_i²))
Optionally, the degree of difference between each pre-stored reference time series and the time point sequence may also be calculated by the EMD (Earth Mover's Distance) algorithm.
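A minimal sketch of a cross-correlation measure follows. The exact formula is not reproduced in the source, so plain normalized cross-correlation over equal-length sequences is used here as an assumption:

```python
# Hedged sketch: normalized cross-correlation between two equal-length
# time difference sequences; a value near 1.0 indicates similar rhythms.
import math

def cross_correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

print(cross_correlation([0.34, 0.30, 0.28], [0.34, 0.30, 0.28]))  # close to 1.0
```

Unlike edit distance, a larger correlation means a smaller degree of difference, so the selection step would maximize rather than minimize this value.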
Step S140: determine, based on the pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series.
In implementation, when the time difference sequence input by the user matches a target reference time difference series, the song the user is looking for can be considered to be the target audio data corresponding to that target reference time difference series.
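The overall flow of steps S130 and S140 can be sketched end to end. The song names, reference durations, and the simple difference function below are hypothetical; any of the difference measures from the embodiments (edit distance, cross-correlation, EMD) could be substituted:

```python
# Hedged end-to-end sketch: match the user's time difference sequence
# against pre-stored reference series and return the associated audio data.

def difference(a, b):
    # Simple per-element absolute difference, penalizing length mismatch;
    # stands in for the edit-distance / correlation measures of the text.
    return sum(abs(x - y) for x, y in zip(a, b)) + abs(len(a) - len(b))

def search_audio(user_diffs, reference_series):
    """Return the audio data whose reference series differs least."""
    return min(reference_series,
               key=lambda item: difference(user_diffs, item[1]))[0]

# Hypothetical correspondence between audio data and reference series.
references = [
    ("song_a.mp3", [0.50, 0.50, 0.50, 0.50]),
    ("song_b.mp3", [0.34, 0.30, 0.28, 0.25]),
]
print(search_audio([0.33, 0.31, 0.28, 0.26], references))  # song_b.mp3
```

The user's slightly imprecise tapping still selects the closest stored rhythm, which is the point of matching by degree of difference rather than exact equality.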
In the method provided by this embodiment, a trigger instruction for searching for audio is received, and a preset trigger event is detected; each time the preset trigger event is detected, the time point at which the trigger event occurs is recorded, until a preset end event is detected, and then the recorded time points are obtained to form a time point sequence; from pre-stored reference time series, a target reference time series that matches the time point sequence is selected, where a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series is determined. In this way, the user can input a time point sequence reflecting the rhythmic characteristics of the audio through an operation corresponding to the preset trigger event (such as tapping the screen of a mobile phone), and the corresponding audio data is searched for based on that sequence, so that audio data can be found even when the song title is unknown.
A further exemplary embodiment of the present disclosure provides an apparatus for searching for audio. As shown in FIG. 3, the apparatus includes:
a detection module 310 configured to receive a trigger instruction for searching for audio and detect a preset trigger event;
a recording module 320 configured to record, each time the preset trigger event is detected, the time point at which the trigger event occurs, until a preset end event is detected, and then obtain the recorded time points to form a time point sequence;
a selecting module 330 configured to select, from pre-stored reference time series, a target reference time series that matches the time point sequence, where a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and
a determining module 340 configured to determine, based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series.
Optionally, the selecting module 330 is configured to:
determine the degree of difference between each pre-stored reference time series and the time point sequence, and select the target reference time series with the smallest degree of difference from the time point sequence.
Optionally, the selecting module 330 is configured to:
calculate the edit distance between each pre-stored reference time series and the time point sequence as the degree of difference.
Optionally, the detection module 310 is configured for any one of the following cases:
detecting that the device is shaken;
detecting a touch signal in a preset area of the touch screen of the device;
acquiring multiple frames of images through an image acquisition component of the device, and detecting an image of a preset user action in the multiple frames of images;
acquiring ambient audio data through an audio acquisition component of the device, and recognizing preset audio feature information in the ambient audio data.
Optionally, an audio unit is an audio segment corresponding to a musical note, or an audio unit is an audio segment corresponding to a character in the lyrics corresponding to the audio data.
Optionally, the time information includes the start time point of an audio unit or the duration of an audio unit.
Optionally, when the time information is the duration of an audio unit, the selecting module 330 includes:
a determining unit configured to determine, based on the time point sequence, the time difference between each two adjacent time points in the time point sequence to obtain a time difference sequence; and
a selecting unit configured to select, from the pre-stored reference time series, a target reference time series that matches the time difference sequence.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the method and will not be elaborated here.
In this way, the user can input a time point sequence reflecting the rhythmic characteristics of the audio through an operation corresponding to the preset trigger event (such as tapping the screen of a mobile phone), and the corresponding audio data is searched for based on that sequence, so that audio data can be found even when the song title is unknown.
It should be noted that when the apparatus for searching for audio provided by the above embodiment searches for audio, the division into the functional modules described above is merely illustrative; in practical applications, the above functions may be assigned to different functional modules as needed, i.e., the internal structure of the terminal may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for searching for audio provided by the above embodiment belongs to the same concept as the embodiment of the method for searching for audio; its specific implementation process is detailed in the method embodiment and is not repeated here.
FIG. 4 shows a schematic structural diagram of a terminal 1800 provided by an exemplary embodiment of the present application. The terminal 1800 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, the terminal 1800 includes a processor 1801 and a memory 1802.
The processor 1801 may include one or more processing cores, for example a 4-core or an 8-core processor. The processor 1801 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1801 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1801 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 1802 may include one or more computer-readable storage media, which may be non-transitory. The memory 1802 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1802 is used to store at least one instruction to be executed by the processor 1801 to implement the method for searching for audio provided by the method embodiments of the present application.
In some embodiments, the terminal 1800 may optionally further include a peripheral device interface 1803 and at least one peripheral device. The processor 1801, the memory 1802, and the peripheral device interface 1803 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 1803 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1804, a touch display screen 1805, a camera 1806, an audio circuit 1807, a positioning component 1808, and a power supply 1809.
The peripheral device interface 1803 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1801 and the memory 1802. In some embodiments, the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802, and the peripheral device interface 1803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1804 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1804 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission and converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1804 can communicate with other terminals via at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1804 may also include NFC (Near Field Communication)-related circuitry, which is not limited in this application.
The display screen 1805 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1805 is a touch display screen, it also has the ability to acquire touch signals on or above its surface; a touch signal can be input to the processor 1801 as a control signal for processing. In this case, the display screen 1805 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1805, disposed on the front panel of the terminal 1800; in other embodiments, there may be at least two display screens 1805, disposed on different surfaces of the terminal 1800 or in a folded design; in still other embodiments, the display screen 1805 may be a flexible display screen disposed on a curved or folded surface of the terminal 1800. The display screen 1805 may even be set in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 1805 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera component 1806 is used to capture images or video. Optionally, the camera component 1806 includes a front camera and a rear camera. Usually, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background-blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting or other fused shooting functions. In some embodiments, the camera component 1806 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1807 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment and convert them into electrical signals that are input to the processor 1801 for processing or to the radio frequency circuit 1804 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, disposed at different parts of the terminal 1800. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert electrical signals from the processor 1801 or the radio frequency circuit 1804 into sound waves. The speaker may be a conventional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1807 may also include a headphone jack.
The positioning component 1808 is used to locate the current geographic location of the terminal 1800 to implement navigation or LBS (Location Based Service). The positioning component 1808 may be a positioning component based on the US GPS (Global Positioning System), the Chinese Beidou system, or the Russian Galileo system.
The power supply 1809 is used to supply power to the various components in the terminal 1800. The power supply 1809 may use alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 1809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 1800 further includes one or more sensors 1810, including but not limited to an acceleration sensor 1811, a gyroscope sensor 1812, a pressure sensor 1813, a fingerprint sensor 1814, an optical sensor 1815, and a proximity sensor 1816.
The acceleration sensor 1811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1800. For example, the acceleration sensor 1811 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1801 can control the touch display screen 1805 to display the user interface in landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1811. The acceleration sensor 1811 can also be used to collect motion data for games or for the user.
The gyroscope sensor 1812 can detect the body orientation and rotation angle of the terminal 1800, and can cooperate with the acceleration sensor 1811 to collect the user's 3D actions on the terminal 1800. Based on the data collected by the gyroscope sensor 1812, the processor 1801 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1813 may be disposed on the side frame of the terminal 1800 and/or at the lower layer of the touch display screen 1805. When the pressure sensor 1813 is disposed on the side frame of the terminal 1800, it can detect the user's holding signal, and the processor 1801 performs left/right-hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 1813. When the pressure sensor 1813 is disposed at the lower layer of the touch display screen 1805, the processor 1801 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1805. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1814 is used to collect the user's fingerprint; the processor 1801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1814, or the fingerprint sensor 1814 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 1801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1814 may be disposed on the front, back, or side of the terminal 1800. When a physical button or a manufacturer logo is provided on the terminal 1800, the fingerprint sensor 1814 may be integrated with the physical button or the manufacturer logo.
The optical sensor 1815 is used to collect the ambient light intensity. In one embodiment, the processor 1801 can control the display brightness of the touch display screen 1805 according to the ambient light intensity collected by the optical sensor 1815: when the ambient light intensity is high, the display brightness of the touch display screen 1805 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1805 is decreased. In another embodiment, the processor 1801 can also dynamically adjust the shooting parameters of the camera component 1806 according to the ambient light intensity collected by the optical sensor 1815.
The proximity sensor 1816, also called a distance sensor, is usually disposed on the front panel of the terminal 1800 and is used to collect the distance between the user and the front of the terminal 1800. In one embodiment, when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 gradually decreases, the processor 1801 controls the touch display screen 1805 to switch from the screen-on state to the screen-off state; when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 gradually increases, the processor 1801 controls the touch display screen 1805 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in FIG. 4 does not constitute a limitation on the terminal 1800, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (16)
- A method for searching for audio, characterized in that the method comprises: receiving a trigger instruction for searching for audio, and detecting a preset trigger event; each time the preset trigger event is detected, recording the time point at which the trigger event occurs, until a preset end event is detected, and then obtaining the recorded time points to form a time point sequence; selecting, from pre-stored reference time series, a target reference time series that matches the time point sequence, wherein a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and determining, based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series.
- The method according to claim 1, characterized in that selecting, from the pre-stored reference time series, a target reference time series that matches the time point sequence comprises: determining the degree of difference between each pre-stored reference time series and the time point sequence, and selecting the target reference time series with the smallest degree of difference from the time point sequence.
- The method according to claim 2, characterized in that determining the degree of difference between each pre-stored reference time series and the time point sequence comprises: calculating the edit distance between each pre-stored reference time series and the time point sequence as the degree of difference.
- The method according to claim 1, characterized in that detecting the preset trigger event comprises any one of the following: detecting that the device is shaken; detecting a touch signal in a preset area of the touch screen of the device; acquiring multiple frames of images through an image acquisition component of the device, and detecting an image of a preset user action in the multiple frames of images; acquiring ambient audio data through an audio acquisition component of the device, and recognizing preset audio feature information in the ambient audio data.
- The method according to claim 1, characterized in that an audio unit is an audio segment corresponding to a musical note, or an audio unit is an audio segment corresponding to a character in the lyrics corresponding to the audio data.
- The method according to claim 1, characterized in that the time information comprises the start time point of an audio unit or the duration of an audio unit.
- The method according to claim 1, characterized in that the time information is the duration of an audio unit, and selecting, from the pre-stored reference time series, a target reference time series that matches the time point sequence comprises: determining, based on the time point sequence, the time difference between each two adjacent time points in the time point sequence to obtain a time difference sequence; and selecting, from the pre-stored reference time series, a target reference time series that matches the time difference sequence.
- An apparatus for searching for audio, characterized in that the apparatus comprises: a detection module configured to receive a trigger instruction for searching for audio and detect a preset trigger event; a recording module configured to record, each time the preset trigger event is detected, the time point at which the trigger event occurs, until a preset end event is detected, and then obtain the recorded time points to form a time point sequence; a selecting module configured to select, from pre-stored reference time series, a target reference time series that matches the time point sequence, wherein a reference time series is a sequence composed of the time information of multiple consecutive audio units included in audio data; and a determining module configured to determine, based on a pre-stored correspondence between audio data and reference time series, the target audio data corresponding to the target reference time series.
- The apparatus according to claim 8, characterized in that the selecting module is configured to: determine the degree of difference between each pre-stored reference time series and the time point sequence, and select the target reference time series with the smallest degree of difference from the time point sequence.
- The apparatus according to claim 9, characterized in that the selecting module is configured to: calculate the edit distance between each pre-stored reference time series and the time point sequence as the degree of difference.
- The apparatus according to claim 8, characterized in that the detection module is configured for any one of the following cases: detecting that the device is shaken; detecting a touch signal in a preset area of the touch screen of the device; acquiring multiple frames of images through an image acquisition component of the device, and detecting an image of a preset user action in the multiple frames of images; acquiring ambient audio data through an audio acquisition component of the device, and recognizing preset audio feature information in the ambient audio data.
- The apparatus according to claim 8, characterized in that an audio unit is an audio segment corresponding to a musical note, or an audio unit is an audio segment corresponding to a character in the lyrics corresponding to the audio data.
- The apparatus according to claim 8, characterized in that the time information comprises the start time point of an audio unit or the duration of an audio unit.
- The apparatus according to claim 8, characterized in that the time information is the duration of an audio unit, and the selecting module comprises: a determining unit configured to determine, based on the time point sequence, the time difference between each two adjacent time points in the time point sequence to obtain a time difference sequence; and a selecting unit configured to select, from the pre-stored reference time series, a target reference time series that matches the time difference sequence.
- A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the method for searching for audio according to any one of claims 1-7.
- A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for searching for audio according to any one of claims 1-7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18897526.2A EP3629198A4 (en) | 2017-12-29 | 2018-11-26 | SOUND SEARCHING METHOD AND DEVICE |
US16/617,936 US11574009B2 (en) | 2017-12-29 | 2018-11-26 | Method, apparatus and computer device for searching audio, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711474934.9A CN108090210A (zh) | 2017-12-29 | 2017-12-29 | 搜索音频的方法和装置 |
CN201711474934.9 | 2017-12-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019128593A1 true WO2019128593A1 (zh) | 2019-07-04 |
Family
ID=62180566
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/117509 WO2019128593A1 (zh) | 2017-12-29 | 2018-11-26 | 搜索音频的方法和装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11574009B2 (zh) |
EP (1) | EP3629198A4 (zh) |
CN (1) | CN108090210A (zh) |
WO (1) | WO2019128593A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111489757A (zh) * | 2020-03-26 | 2020-08-04 | 北京达佳互联信息技术有限公司 | 音频处理方法、装置、电子设备及可读存储介质 |
CN112946304A (zh) * | 2019-11-26 | 2021-06-11 | 深圳市帝迈生物技术有限公司 | 样本检测的插入方法、样本检测设备以及存储介质 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090210A (zh) | 2017-12-29 | 2018-05-29 | 广州酷狗计算机科技有限公司 | 搜索音频的方法和装置 |
CN109815360B (zh) * | 2019-01-28 | 2023-12-29 | 腾讯科技(深圳)有限公司 | 音频数据的处理方法、装置和设备 |
CN111081276B (zh) * | 2019-12-04 | 2023-06-27 | 广州酷狗计算机科技有限公司 | 音频段的匹配方法、装置、设备及可读存储介质 |
CN112069350A (zh) * | 2020-09-09 | 2020-12-11 | 广州酷狗计算机科技有限公司 | 歌曲推荐方法、装置、设备以及计算机存储介质 |
CN112529871B (zh) * | 2020-12-11 | 2024-02-23 | 杭州海康威视系统技术有限公司 | 评价图像的方法、装置及计算机存储介质 |
CN113157968B (zh) * | 2021-04-07 | 2024-10-11 | 腾讯音乐娱乐科技(深圳)有限公司 | 获取同旋律音频组方法、终端及存储介质 |
CN115702993B (zh) * | 2021-08-12 | 2023-10-31 | 荣耀终端有限公司 | 跳绳状态的检测方法及电子设备 |
CN113920473B (zh) * | 2021-10-15 | 2022-07-29 | 宿迁硅基智能科技有限公司 | 完整事件确定方法、存储介质及电子装置 |
CN117221016B (zh) * | 2023-11-09 | 2024-01-12 | 北京亚康万玮信息技术股份有限公司 | 一种远程连接过程中数据安全传输方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177722A (zh) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | 一种基于音色相似度的歌曲检索方法 |
US20170162205A1 (en) * | 2015-12-07 | 2017-06-08 | Semiconductor Components Industries, Llc | Method and apparatus for a low power voice trigger device |
CN107229629A (zh) * | 2016-03-24 | 2017-10-03 | 腾讯科技(深圳)有限公司 | 音频识别方法及装置 |
CN108090210A (zh) * | 2017-12-29 | 2018-05-29 | 广州酷狗计算机科技有限公司 | 搜索音频的方法和装置 |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9918611D0 (en) | 1999-08-07 | 1999-10-13 | Sibelius Software Ltd | Music database searching |
US6188010B1 (en) * | 1999-10-29 | 2001-02-13 | Sony Corporation | Music search by melody input |
US20070143499A1 (en) * | 2003-12-30 | 2007-06-21 | Ting-Mao Chang | Proximity triggered job scheduling system and method |
GB0406512D0 (en) * | 2004-03-23 | 2004-04-28 | British Telecomm | Method and system for semantically segmenting scenes of a video sequence |
JP5225548B2 (ja) * | 2005-03-25 | 2013-07-03 | ソニー株式会社 | コンテンツ検索方法、コンテンツリスト検索方法、コンテンツ検索装置、コンテンツリスト検索装置および検索サーバ |
US20070254271A1 (en) * | 2006-04-28 | 2007-11-01 | Volodimir Burlik | Method, apparatus and software for play list selection in digital music players |
US8116746B2 (en) | 2007-03-01 | 2012-02-14 | Microsoft Corporation | Technologies for finding ringtones that match a user's hummed rendition |
JP5023752B2 (ja) | 2007-03-22 | 2012-09-12 | ソニー株式会社 | コンテンツ検索装置、コンテンツ検索方法及びコンテンツ検索プログラム |
US8831946B2 (en) | 2007-07-23 | 2014-09-09 | Nuance Communications, Inc. | Method and system of indexing speech data |
US7904530B2 (en) * | 2008-01-29 | 2011-03-08 | Palo Alto Research Center Incorporated | Method and apparatus for automatically incorporating hypothetical context information into recommendation queries |
US20130297599A1 (en) * | 2009-11-10 | 2013-11-07 | Dulcetta Inc. | Music management for adaptive distraction reduction |
US20110295843A1 (en) * | 2010-05-26 | 2011-12-01 | Apple Inc. | Dynamic generation of contextually aware playlists |
US10459972B2 (en) * | 2012-09-07 | 2019-10-29 | Biobeats Group Ltd | Biometric-music interaction methods and systems |
JP6149365B2 (ja) * | 2012-09-20 | 2017-06-21 | カシオ計算機株式会社 | 情報生成装置、情報生成方法及びプログラム |
US9213819B2 (en) * | 2014-04-10 | 2015-12-15 | Bank Of America Corporation | Rhythm-based user authentication |
US9860286B2 (en) * | 2014-09-24 | 2018-01-02 | Sonos, Inc. | Associating a captured image with a media item |
JP6794990B2 (ja) * | 2015-09-30 | 2020-12-02 | ヤマハ株式会社 | 楽曲検索方法および楽曲検索装置 |
US20170300531A1 (en) * | 2016-04-14 | 2017-10-19 | Sap Se | Tag based searching in data analytics |
CN105808996A (zh) * | 2016-05-12 | 2016-07-27 | Beijing Xiaomi Mobile Software Co., Ltd. | Terminal screen unlocking method and device |
CN106790997A (zh) * | 2016-11-23 | 2017-05-31 | Beijing Xiaomi Mobile Software Co., Ltd. | Ringtone setting method, device, and electronic equipment |
- 2017-12-29: CN application CN201711474934.9A, published as CN108090210A (active, Pending)
- 2018-11-26: EP application EP18897526.2A, published as EP3629198A4 (not active, Ceased)
- 2018-11-26: US application US16/617,936, published as US11574009B2 (active)
- 2018-11-26: WO application PCT/CN2018/117509, published as WO2019128593A1 (status unknown)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177722A (zh) * | 2013-03-08 | 2013-06-26 | Beijing Institute of Technology | Song retrieval method based on timbre similarity |
US20170162205A1 (en) * | 2015-12-07 | 2017-06-08 | Semiconductor Components Industries, Llc | Method and apparatus for a low power voice trigger device |
CN107229629A (zh) * | 2016-03-24 | 2017-10-03 | Tencent Technology (Shenzhen) Co., Ltd. | Audio recognition method and apparatus |
CN108090210A (zh) * | 2017-12-29 | 2018-05-29 | Guangzhou Kugou Computer Technology Co., Ltd. | Method and apparatus for searching audio |
Non-Patent Citations (1)
Title |
---|
See also references of EP3629198A4 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112946304A (zh) * | 2019-11-26 | 2021-06-11 | Shenzhen Dymind Biotechnology Co., Ltd. | Insertion method for sample detection, sample detection device, and storage medium |
CN111489757A (zh) * | 2020-03-26 | 2020-08-04 | Beijing Dajia Internet Information Technology Co., Ltd. | Audio processing method, apparatus, electronic device, and readable storage medium |
CN111489757B (zh) * | 2020-03-26 | 2023-08-18 | Beijing Dajia Internet Information Technology Co., Ltd. | Audio processing method, apparatus, electronic device, and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
EP3629198A1 (en) | 2020-04-01 |
CN108090210A (zh) | 2018-05-29 |
US20200104320A1 (en) | 2020-04-02 |
US11574009B2 (en) | 2023-02-07 |
EP3629198A4 (en) | 2020-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019128593A1 (zh) | Method and apparatus for searching audio | |
WO2019105351A1 (zh) | Method and apparatus for determining a karaoke score | |
CN107908929B (zh) | Method and apparatus for playing audio data | |
CN107978323B (zh) | Audio recognition method, apparatus, and storage medium | |
CN108829881B (zh) | Video title generation method and apparatus | |
CN110491358B (zh) | Method, apparatus, device, system, and storage medium for audio recording | |
CN109327608B (zh) | Song sharing method, terminal, server, and system | |
CN110209871B (zh) | Song comment publishing method and apparatus | |
WO2020103550A1 (zh) | Audio signal scoring method, apparatus, terminal device, and computer storage medium | |
CN111061405B (zh) | Method, apparatus, device, and storage medium for recording song audio | |
WO2019127899A1 (zh) | Method and apparatus for adding lyrics | |
CN109192218B (zh) | Audio processing method and apparatus | |
WO2022111168A1 (zh) | Video classification method and apparatus | |
CN108922506A (zh) | Song audio generation method, apparatus, and computer-readable storage medium | |
CN111711838B (zh) | Video switching method, apparatus, terminal, server, and storage medium | |
WO2022134634A1 (zh) | Video processing method and electronic device | |
CN111081277B (zh) | Audio evaluation method, apparatus, device, and storage medium | |
WO2020253129A1 (zh) | Song display method, apparatus, device, and storage medium | |
CN108831423B (zh) | Method, apparatus, terminal, and storage medium for extracting the main melody track from audio data | |
CN112118482A (zh) | Audio file playback method, apparatus, terminal, and storage medium | |
CN111611430A (zh) | Song playback method, apparatus, terminal, and storage medium | |
CN109005359B (zh) | Video recording method, apparatus, and storage medium | |
CN109003627B (zh) | Method, apparatus, terminal, and storage medium for determining an audio score | |
CN108806730B (zh) | Audio processing method, apparatus, and computer-readable storage medium | |
CN109491636A (zh) | Music playback method, apparatus, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18897526; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2018897526; Country of ref document: EP; Effective date: 20191129 |
| NENP | Non-entry into the national phase | Ref country code: DE |