CN110209877A - Video analysis method and device - Google Patents
- Publication number
- CN110209877A (application CN201810118854.8A)
- Authority
- CN
- China
- Prior art keywords
- shot segment
- label
- segmentation
- shot
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Abstract
This disclosure relates to a video analysis method and device. The method comprises: performing shot segmentation on a video to obtain the shot segments of the video; for any shot segment, performing speech recognition on the audio of the shot segment to obtain the text content of the audio; determining a first group of labels for the shot segment according to the text content of the audio; performing target recognition on the video frames of the shot segment to obtain a target recognition result for the shot segment; determining a second group of labels for the shot segment according to the target recognition result; and determining the labels of the shot segment according to the first group of labels and the second group of labels. The disclosure can automatically determine the labels of each shot segment of a video, without requiring a person to watch the video before labelling it, which improves the accuracy and efficiency of label determination. Moreover, because labels are determined per shot segment, a user can locate the video clip corresponding to a label and directly watch the clip relevant to that label.
Description
Technical field
The present disclosure relates to the field of video technology, and in particular to a video analysis method and device.
Background technique
Analyzing a video and attaching labels to it helps users find the video by searching for keywords that match those labels. In the related art, after a video is analyzed, labels are usually determined for the video as a whole. In that case, the user cannot tell which clips within the video are relevant to a given label, and therefore usually has to watch the video from the beginning instead of jumping directly to the clip relevant to the label.
Summary of the invention
In view of this, the present disclosure proposes a video analysis method and device.
According to one aspect of the disclosure, a video analysis method is provided, comprising:
performing shot segmentation on a video to obtain the shot segments of the video;
for any shot segment, performing speech recognition on the audio of the shot segment to obtain the text content of the audio;
determining a first group of labels for the shot segment according to the text content of the audio;
performing target recognition on the video frames of the shot segment to obtain a target recognition result for the shot segment;
determining a second group of labels for the shot segment according to the target recognition result;
determining the labels of the shot segment according to the first group of labels and the second group of labels.
In one possible implementation, after determining the labels of the shot segment, the method further includes:
when the labels of multiple consecutive shot segments match, merging the multiple consecutive shot segments into a combined shot segment;
determining the labels of the combined shot segment according to the labels of the multiple consecutive shot segments.
In one possible implementation, the method further includes:
when multiple consecutive shot segments have N or more identical labels, determining that the labels of the multiple consecutive shot segments match, where N is a positive integer.
In one possible implementation, performing target recognition on the video frames of the shot segment to obtain the target recognition result for the shot segment comprises:
performing target recognition on the key frames of the shot segment to obtain the target recognition result for the shot segment.
In one possible implementation, the target recognition includes one or both of object recognition and face recognition.
In one possible implementation, determining the first group of labels for the shot segment according to the text content of the audio comprises:
performing word segmentation on the text content of the audio to obtain a word segmentation result;
taking the keywords in the word segmentation result as the first group of labels for the shot segment.
According to another aspect of the disclosure, a video analysis device is provided, comprising:
a shot segmentation module, configured to perform shot segmentation on a video to obtain the shot segments of the video;
a speech recognition module, configured to, for any shot segment, perform speech recognition on the audio of the shot segment to obtain the text content of the audio;
a first determining module, configured to determine a first group of labels for the shot segment according to the text content of the audio;
a target recognition module, configured to perform target recognition on the video frames of the shot segment to obtain a target recognition result for the shot segment;
a second determining module, configured to determine a second group of labels for the shot segment according to the target recognition result;
a third determining module, configured to determine the labels of the shot segment according to the first group of labels and the second group of labels.
In one possible implementation, the device further includes:
a merging module, configured to merge multiple consecutive shot segments into a combined shot segment when the labels of the multiple consecutive shot segments match;
a fourth determining module, configured to determine the labels of the combined shot segment according to the labels of the multiple consecutive shot segments.
In one possible implementation, the device further includes:
a fifth determining module, configured to determine that the labels of multiple consecutive shot segments match when the multiple consecutive shot segments have N or more identical labels, where N is a positive integer.
In one possible implementation, the target recognition module is configured to:
perform target recognition on the key frames of the shot segment to obtain the target recognition result for the shot segment.
In one possible implementation, the target recognition includes one or both of object recognition and face recognition.
In one possible implementation, the first determining module includes:
a word segmentation submodule, configured to perform word segmentation on the text content of the audio to obtain a word segmentation result;
a determining submodule, configured to take the keywords in the word segmentation result as the first group of labels for the shot segment.
According to another aspect of the disclosure, a video analysis device is provided, comprising: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to perform the above method.
According to another aspect of the disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the above method.
In the video analysis method and device of the aspects of the disclosure, shot segmentation is performed on a video to obtain its shot segments, and speech recognition and target recognition are performed on each shot segment to obtain its labels. The labels of each shot segment can therefore be determined automatically, without a person watching the video before labelling it, which improves the accuracy and efficiency of label determination. Moreover, because labels are determined per shot segment, a user can locate the video clip corresponding to a label and watch the clip relevant to that label directly.
Other features and aspects of the disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features and aspects of the disclosure together with the specification, and serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of a video analysis method according to an embodiment of the disclosure.
Fig. 2 shows an illustrative flow chart of a video analysis method according to an embodiment of the disclosure.
Fig. 3 shows an illustrative flow chart of step S13 of a video analysis method according to an embodiment of the disclosure.
Fig. 4 shows a block diagram of a video analysis device according to an embodiment of the disclosure.
Fig. 5 shows an illustrative block diagram of a video analysis device according to an embodiment of the disclosure.
Fig. 6 is a block diagram of a device 800 for video analysis according to an exemplary embodiment.
Fig. 7 is a block diagram of a device 1900 for video analysis according to an exemplary embodiment.
Specific embodiment
Various exemplary embodiments, features and aspects of the disclosure are described in detail below with reference to the accompanying drawings. Identical reference signs in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" here means "serving as an example, embodiment or illustration". Any embodiment described here as "exemplary" should not be construed as preferred over or more advantageous than other embodiments.
In addition, numerous specific details are given in the following detailed description in order to better illustrate the disclosure. Those skilled in the art will understand that the disclosure can equally be practised without certain of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the disclosure.
Fig. 1 shows a flow chart of a video analysis method according to an embodiment of the disclosure. The method may be applied in a terminal device or in a server, without limitation. As shown in Fig. 1, the method comprises steps S11 to S16.
In step S11, shot segmentation is performed on the video to obtain the shot segments of the video.
In this embodiment, basic information about the video may be extracted before shot segmentation. For example, the basic information of the video may include one or more of its duration, container format and coding format.
In one possible implementation, performing shot segmentation on the video to obtain its shot segments may include: extracting features of each key frame of the video; calculating the similarity of adjacent key frames according to their features; determining that a shot change exists between adjacent key frames when their similarity is less than a first threshold; and determining that adjacent key frames belong to the same shot when their similarity is greater than or equal to the first threshold. When a shot change exists between adjacent key frames, the later of the two key frames may be taken as the start frame of a new shot. For example, if key frame 2 is the key frame after key frame 1 and a shot change exists between key frame 1 and key frame 2, key frame 2 may be taken as the start frame of a new shot. In this embodiment, the video clip corresponding to one shot of the video may be determined as one shot segment.
In another possible implementation, performing shot segmentation on the video to obtain its shot segments may include: tracking moving targets in the video, and determining the shot segments of the video using moving-target tracking techniques.
In step S12, for any shot segment, speech recognition is performed on the audio of the shot segment to obtain the text content of the audio.
In one possible implementation, deep learning techniques may be used to perform speech recognition on the audio of the shot segment and obtain the text content of the audio.
In step S13, a first group of labels for the shot segment is determined according to the text content of the audio.
This embodiment determines labels for a shot segment based on the text content of its audio, so that the labels of the segment reflect its audio content. This further improves the accuracy of the determined labels and makes them better reflect the video content of the segment.
In step S14, target recognition is performed on the video frames of the shot segment to obtain a target recognition result for the shot segment.
In one possible implementation, performing target recognition on the video frames of the shot segment to obtain the target recognition result may include: performing target recognition on the key frames of the shot segment to obtain the target recognition result for the shot segment.
In this implementation, target recognition is performed only on the key frames of the shot segment rather than on all of its video frames, which can greatly improve the speed of target recognition while preserving its accuracy.
In one possible implementation, the target recognition includes one or both of object recognition and face recognition.
In step S15, a second group of labels for the shot segment is determined according to the target recognition result.
For example, when the target recognition includes face recognition, the second group of labels for the shot segment may include names of persons.
In step S16, the labels of the shot segment are determined according to the first group of labels and the second group of labels.
In one possible implementation, the first group of labels and the second group of labels may be aggregated by an aggregator to obtain the labels of the shot segment.
In one possible implementation, duplicate labels in the first group and the second group may be removed to obtain a deduplicated result, and the deduplicated result may be taken as the labels of the shot segment.
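The deduplication variant of step S16 can be sketched as below; the label values are illustrative examples, not taken from the disclosure.

```python
# Sketch of step S16: aggregate both label groups, removing duplicates
# while keeping the first occurrence of each label.
def merge_label_groups(first_group, second_group):
    """Concatenate the two groups of labels, deduplicated in order."""
    seen = set()
    merged = []
    for label in first_group + second_group:
        if label not in seen:
            seen.add(label)
            merged.append(label)
    return merged

first = ["football", "goal"]          # from speech recognition (step S13)
second = ["football", "player name"]  # from target recognition (step S15)
print(merge_label_groups(first, second))  # → ['football', 'goal', 'player name']
```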
It should be noted that the steps in this embodiment may be executed in the order S11, S12, S13, S14, S15, S16, or in the order S11, S14, S15, S12, S13, S16.
In this embodiment, shot segmentation is performed on a video to obtain its shot segments, and speech recognition and target recognition are performed on each shot segment to obtain its labels. The labels of each shot segment of the video can therefore be determined automatically, without a person watching the video before labelling it, which improves the accuracy and efficiency of label determination. In addition, because labels are determined per shot segment, a user searching for a video can find the video clip corresponding to a label and watch the clip relevant to that label directly, without watching the entire video, thereby saving the user's time.
Fig. 2 shows an illustrative flow chart of a video analysis method according to an embodiment of the disclosure. As shown in Fig. 2, the method may include steps S11 to S18.
In step S11, shot segmentation is performed on the video to obtain the shot segments of the video.
In step S12, for any shot segment, speech recognition is performed on the audio of the shot segment to obtain the text content of the audio.
In step S13, a first group of labels for the shot segment is determined according to the text content of the audio.
In step S14, target recognition is performed on the video frames of the shot segment to obtain a target recognition result for the shot segment.
In step S15, a second group of labels for the shot segment is determined according to the target recognition result.
In step S16, the labels of the shot segment are determined according to the first group of labels and the second group of labels.
In step S17, when the labels of multiple consecutive shot segments match, the multiple consecutive shot segments are merged into a combined shot segment.
In this embodiment, when the labels of multiple consecutive shot segments match, the contents of those segments can be considered highly correlated; therefore, the multiple consecutive shot segments may be merged into a combined shot segment.
In step S18, the labels of the combined shot segment are determined according to the labels of the multiple consecutive shot segments.
In one possible implementation, the labels of the multiple consecutive shot segments may be merged to obtain a merged result, and duplicate labels in the merged result may be removed to obtain the labels of the combined shot segment.
For example, when shot segment 1, shot segment 2 and shot segment 3 are consecutive shot segments and their labels match, shot segments 1, 2 and 3 may be merged into a combined shot segment, and the labels of the combined shot segment may be determined according to the labels of shot segments 1, 2 and 3.
In this example, when the labels of multiple consecutive shot segments match, the segments are merged into a combined shot segment, and the labels of the combined shot segment are determined according to the labels of the individual segments. A user searching for a video can then find the combined shot segment corresponding to a label and watch the combined segment relevant to that label directly, without watching the entire video, thereby saving the user's time.
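Steps S17 and S18 together can be sketched as a single pass over the segment list. The `(start, end, labels)` representation and the example label values are illustrative assumptions; the matching rule is injected as a predicate so that the "N or more shared labels" variant described below can be plugged in.

```python
# Sketch of steps S17-S18: merge consecutive shot segments whose labels
# match into a combined segment labelled with the deduplicated union.
def merge_consecutive(segments, labels_match):
    """segments: list of (start_seconds, end_seconds, labels) tuples."""
    merged = [list(segments[0])]
    for start, end, labels in segments[1:]:
        prev = merged[-1]
        if labels_match(prev[2], labels):
            prev[1] = end  # extend the combined shot segment in time
            prev[2] = prev[2] + [l for l in labels if l not in prev[2]]
        else:
            merged.append([start, end, labels])
    return [tuple(seg) for seg in merged]

share_a_label = lambda a, b: bool(set(a) & set(b))
segments = [(0, 5, ["cat"]), (5, 9, ["cat", "sofa"]), (9, 12, ["street"])]
print(merge_consecutive(segments, share_a_label))
# → [(0, 9, ['cat', 'sofa']), (9, 12, ['street'])]
```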
In one possible implementation, the method may also include: when multiple consecutive shot segments have N or more identical labels, determining that the labels of the multiple consecutive shot segments match, where N is a positive integer.
For example, if N equals 1, then when shot segment 1, shot segment 2 and shot segment 3 are consecutive shot segments and all have label A, the labels of shot segments 1, 2 and 3 are determined to match.
As another example, if N equals 2, then when shot segment 1, shot segment 2 and shot segment 3 are consecutive shot segments and all have both label A and label B, the labels of shot segments 1, 2 and 3 are determined to match.
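The matching rule above reduces to counting labels shared by all of the consecutive segments. A minimal sketch, with illustrative label values:

```python
# Sketch of the "N or more identical labels" matching rule: the labels of
# consecutive shot segments match when all segments share at least n labels.
def labels_match(label_groups, n=1):
    """True when every segment's labels share at least n common labels."""
    common = set(label_groups[0])
    for labels in label_groups[1:]:
        common &= set(labels)
    return len(common) >= n

seg1, seg2, seg3 = ["A", "B"], ["A", "B", "C"], ["A", "B"]
print(labels_match([seg1, seg2, seg3], n=1))   # → True  (all share A)
print(labels_match([seg1, seg2, seg3], n=2))   # → True  (all share A and B)
print(labels_match([seg1, ["C"], seg3], n=1))  # → False
```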
Fig. 3 shows an illustrative flow chart of step S13 of a video analysis method according to an embodiment of the disclosure. As shown in Fig. 3, step S13 may include steps S131 and S132.
In step S131, word segmentation is performed on the text content of the audio to obtain a word segmentation result.
In this example, any word segmentation technique in the related art may be used to perform word segmentation on the text content of the audio and obtain the word segmentation result.
In step S132, the keywords in the word segmentation result are taken as the first group of labels for the shot segment.
In one possible implementation, the keywords in the word segmentation result may be determined according to nouns of a specified type in the result. For example, if the specified type of noun is a person's name, a name may be determined to be a keyword of the word segmentation result when the number of its occurrences in the result is greater than a second threshold.
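Steps S131 and S132 can be sketched as below for an English transcript. A trivial whitespace tokenizer stands in for the word segmentation technique (which the disclosure leaves unspecified), and the list of specified nouns and the threshold value are illustrative assumptions.

```python
# Sketch of steps S131-S132: segment the transcript into words, then keep
# a specified noun as a keyword when it occurs more than the threshold.
from collections import Counter

def extract_keywords(text, specified_nouns, second_threshold=1):
    tokens = text.lower().split()  # stand-in word segmentation result
    counts = Counter(tokens)
    return [noun for noun in specified_nouns
            if counts[noun] > second_threshold]

transcript = "alice passes to bob and alice scores"
print(extract_keywords(transcript, ["alice", "bob"], second_threshold=1))
# → ['alice']
```

The returned keywords would then serve as the first group of labels for the shot segment.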
Fig. 4 shows a block diagram of a video analysis device according to an embodiment of the disclosure. As shown in Fig. 4, the device includes: a shot segmentation module 41, configured to perform shot segmentation on a video to obtain the shot segments of the video; a speech recognition module 42, configured to, for any shot segment, perform speech recognition on the audio of the shot segment to obtain the text content of the audio; a first determining module 43, configured to determine a first group of labels for the shot segment according to the text content of the audio; a target recognition module 44, configured to perform target recognition on the video frames of the shot segment to obtain a target recognition result for the shot segment; a second determining module 45, configured to determine a second group of labels for the shot segment according to the target recognition result; and a third determining module 46, configured to determine the labels of the shot segment according to the first group of labels and the second group of labels.
Fig. 5 shows an illustrative block diagram of a video analysis device according to an embodiment of the disclosure. As shown in Fig. 5:
In one possible implementation, the device further includes: a merging module 47, configured to merge multiple consecutive shot segments into a combined shot segment when their labels match; and a fourth determining module 48, configured to determine the labels of the combined shot segment according to the labels of the multiple consecutive shot segments.
In one possible implementation, the device further includes: a fifth determining module 49, configured to determine that the labels of multiple consecutive shot segments match when the multiple consecutive shot segments have N or more identical labels, where N is a positive integer.
In one possible implementation, the target recognition module 44 is configured to: perform target recognition on the key frames of the shot segment to obtain the target recognition result for the shot segment.
In one possible implementation, the target recognition includes one or both of object recognition and face recognition.
In one possible implementation, the first determining module 43 includes: a word segmentation submodule 431, configured to perform word segmentation on the text content of the audio to obtain a word segmentation result; and a determining submodule 432, configured to take the keywords in the word segmentation result as the first group of labels for the shot segment.
In this embodiment, shot segmentation is performed on a video to obtain its shot segments, and speech recognition and target recognition are performed on each shot segment to obtain its labels. The labels of each shot segment of the video can therefore be determined automatically, without a person watching the video before labelling it, which improves the accuracy and efficiency of label determination. Moreover, because labels are determined per shot segment, a user can find the video clip corresponding to a label and watch the clip relevant to that label directly.
Fig. 6 is a block diagram of a device 800 for video analysis according to an exemplary embodiment. For example, the device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical equipment, fitness equipment, personal digital assistant, or the like.
Referring to Fig. 6, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 to execute instructions, so as to perform all or part of the steps of the method described above. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, video, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disc.
The power component 806 provides power to the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 800.
The multimedia component 808 includes a screen providing an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operation mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button and a lock button.
The sensor component 814 includes one or more sensors to provide status assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, for example of the display and keypad of the device 800; the sensor component 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example, the memory 804 including computer program instructions, which can be executed by the processor 820 of the device 800 to complete the above method.
Fig. 7 is a block diagram of a device 1900 for video analysis according to an exemplary embodiment. For example, the device 1900 may be provided as a server. Referring to Fig. 7, the device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as an application program. The application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.
The device 1900 may also include a power supply component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example, the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the device 1900 to complete the above method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer-readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, and the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the scenario involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to implement aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowchart and/or block diagram.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices, so as to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus, or other devices implement the functions/acts specified in one or more blocks of the flowchart and/or block diagram.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (14)
1. A video analysis method, characterized by comprising:
performing shot segmentation on a video to obtain shot segments of the video;
for any one shot segment, performing speech recognition on the audio of the shot segment to obtain the text content of the audio;
determining a first group of labels of the shot segment according to the text content of the audio;
performing target recognition on video frames of the shot segment to obtain a target recognition result of the shot segment;
determining a second group of labels of the shot segment according to the target recognition result;
determining the labels of the shot segment according to the first group of labels and the second group of labels.
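The pipeline of claim 1 can be sketched end to end. This is a minimal illustration under stated assumptions, not the patented implementation: `detect_shots`, `transcribe`, `recognize_targets`, and `extract_keywords` are hypothetical stand-ins for whatever shot-segmentation, speech-recognition, target-recognition, and keyword components an implementation would plug in.

```python
# Sketch of claim 1: label each shot segment from its audio transcript
# (first group of labels) and from target recognition on its frames
# (second group of labels), then combine both groups.
# All four callables are hypothetical stand-ins, injected as parameters.

def label_segments(video, detect_shots, transcribe, recognize_targets,
                   extract_keywords):
    labeled = []
    for segment in detect_shots(video):              # shot segmentation
        text = transcribe(segment.audio)             # speech recognition
        first_group = set(extract_keywords(text))    # labels from text content
        targets = recognize_targets(segment.frames)  # target recognition
        second_group = set(targets)                  # labels from recognition result
        # one simple way to "determine the labels according to both groups"
        labeled.append((segment, first_group | second_group))
    return labeled
```

With dummy components wired in, each segment comes back paired with the union of its text-derived and recognition-derived labels.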
2. The method according to claim 1, wherein after determining the labels of the shot segment, the method further comprises:
in a case where the labels of multiple consecutive shot segments match, merging the multiple consecutive shot segments into a combined shot segment;
determining the labels of the combined shot segment according to the labels of the multiple consecutive shot segments.
3. The method according to claim 2, wherein the method further comprises:
in a case where multiple consecutive shot segments have N or more identical labels, determining that the labels of the multiple consecutive shot segments match, wherein N is a positive integer.
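Claims 2 and 3 can be sketched together: consecutive segments whose label sets share at least N identical labels are treated as matching and merged into one combined segment. The data shapes are illustrative, and taking the union of the merged segments' labels is one plausible reading of claim 2's "determined according to", not the patent's prescribed rule.

```python
def merge_consecutive(segments, n):
    """Merge runs of consecutive segments that share at least n labels.

    segments: list of (segment_id, label_set) pairs (illustrative shape).
    The combined segment keeps the union of the merged labels.
    """
    merged = []
    for seg_id, labels in segments:
        # claim 3's matching test: >= N identical labels with the previous run
        if merged and len(merged[-1][1] & labels) >= n:
            prev_ids, prev_labels = merged[-1]
            merged[-1] = (prev_ids + [seg_id], prev_labels | labels)
        else:
            merged.append(([seg_id], set(labels)))
    return merged
```

Raising `n` makes merging stricter: with a higher threshold, segments that share only one label stay separate.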
4. The method according to claim 1, wherein performing target recognition on the video frames of the shot segment to obtain the target recognition result of the shot segment comprises:
performing target recognition on key frames of the shot segment to obtain the target recognition result of the shot segment.
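Claim 4 restricts recognition to key frames rather than every frame. The claim does not specify how key frames are chosen; uniform sampling below is purely an illustrative strategy.

```python
def sample_key_frames(frames, max_keys=5):
    """Pick up to max_keys evenly spaced frames as key frames.

    Evenly spaced sampling is an assumed, illustrative selection rule;
    real systems might instead pick frames by content change or sharpness.
    """
    if len(frames) <= max_keys:
        return list(frames)
    step = len(frames) / max_keys
    return [frames[int(i * step)] for i in range(max_keys)]
```

Running the recognizer on a handful of key frames instead of the full segment trades a little recall for a large reduction in compute.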
5. The method according to claim 1, wherein the target recognition includes one or both of object recognition and face recognition.
6. The method according to claim 1, wherein determining the first group of labels of the shot segment according to the text content of the audio comprises:
performing word segmentation on the text content of the audio to obtain a word segmentation result;
using keywords in the word segmentation result as the first group of labels of the shot segment.
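Claim 6 maps the transcript to labels via word segmentation followed by keyword selection. A sketch under assumptions: whitespace tokenization and a stop-word filter stand in for a real segmenter and keyword extractor (the claim names neither; for Chinese text a segmenter such as jieba would replace the whitespace split).

```python
# Illustrative stop-word list; a real keyword criterion is not specified
# in the claim, so filtering common function words is an assumption.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

def labels_from_transcript(text):
    """Word-segment the transcript and keep the keywords as labels."""
    tokens = text.lower().split()  # word segmentation result
    return {t for t in tokens if t not in STOP_WORDS and t.isalpha()}
```

The surviving content words become the shot segment's first group of labels.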
7. A video analysis device, characterized by comprising:
a shot segmentation module, configured to perform shot segmentation on a video to obtain shot segments of the video;
a speech recognition module, configured to, for any one shot segment, perform speech recognition on the audio of the shot segment to obtain the text content of the audio;
a first determining module, configured to determine a first group of labels of the shot segment according to the text content of the audio;
a target recognition module, configured to perform target recognition on video frames of the shot segment to obtain a target recognition result of the shot segment;
a second determining module, configured to determine a second group of labels of the shot segment according to the target recognition result;
a third determining module, configured to determine the labels of the shot segment according to the first group of labels and the second group of labels.
8. The device according to claim 7, further comprising:
a merging module, configured to, in a case where the labels of multiple consecutive shot segments match, merge the multiple consecutive shot segments into a combined shot segment;
a fourth determining module, configured to determine the labels of the combined shot segment according to the labels of the multiple consecutive shot segments.
9. The device according to claim 8, further comprising:
a fifth determining module, configured to, in a case where multiple consecutive shot segments have N or more identical labels, determine that the labels of the multiple consecutive shot segments match, wherein N is a positive integer.
10. The device according to claim 7, wherein the target recognition module is configured to:
perform target recognition on key frames of the shot segment to obtain the target recognition result of the shot segment.
11. The device according to claim 7, wherein the target recognition includes one or both of object recognition and face recognition.
12. The device according to claim 7, wherein the first determining module comprises:
a word segmentation submodule, configured to perform word segmentation on the text content of the audio to obtain a word segmentation result;
a determining submodule, configured to use keywords in the word segmentation result as the first group of labels of the shot segment.
13. A video analysis device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 6.
14. A non-volatile computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810118854.8A CN110209877A (en) | 2018-02-06 | 2018-02-06 | Video analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110209877A true CN110209877A (en) | 2019-09-06 |
Family
ID=67778535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810118854.8A Pending CN110209877A (en) | 2018-02-06 | 2018-02-06 | Video analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209877A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191537A (en) * | 2019-12-19 | 2020-05-22 | 中译语通文娱科技(青岛)有限公司 | Video analysis method |
CN111506771A (en) * | 2020-04-22 | 2020-08-07 | 上海极链网络科技有限公司 | Video retrieval method, device, equipment and storage medium |
CN111711849A (en) * | 2020-06-30 | 2020-09-25 | 浙江同花顺智能科技有限公司 | Method, device and storage medium for displaying multimedia data |
CN113032342A (en) * | 2021-03-03 | 2021-06-25 | 北京车和家信息技术有限公司 | Video labeling method and device, electronic equipment and storage medium |
CN114302231A (en) * | 2021-12-31 | 2022-04-08 | 中国传媒大学 | Video processing method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104506947A (en) * | 2014-12-24 | 2015-04-08 | 福州大学 | Video fast forward/fast backward speed self-adaptive regulating method based on semantic content |
CN105677735A (en) * | 2015-12-30 | 2016-06-15 | 腾讯科技(深圳)有限公司 | Video search method and apparatus |
CN106649713A (en) * | 2016-12-21 | 2017-05-10 | 中山大学 | Movie visualization processing method and system based on content |
CN106776890A (en) * | 2016-11-29 | 2017-05-31 | 北京小米移动软件有限公司 | The method of adjustment and device of video playback progress |
CN106878632A (en) * | 2017-02-28 | 2017-06-20 | 北京知慧教育科技有限公司 | A kind for the treatment of method and apparatus of video data |
CN106919652A (en) * | 2017-01-20 | 2017-07-04 | 东北石油大学 | Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning |
CN107133266A (en) * | 2017-03-31 | 2017-09-05 | 北京奇艺世纪科技有限公司 | The detection method and device and database update method and device of video lens classification |
- 2018-02-06: CN 201810118854.8A filed in China; published as CN110209877A (status: Pending)
Non-Patent Citations (1)
Title |
---|
Wang Chong: 《现代信息检索技术基本原理教程》 (Tutorial on the Basic Principles of Modern Information Retrieval Technology), Xidian University Press, 30 November 2013 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191537A (en) * | 2019-12-19 | 2020-05-22 | 中译语通文娱科技(青岛)有限公司 | Video analysis method |
CN111506771A (en) * | 2020-04-22 | 2020-08-07 | 上海极链网络科技有限公司 | Video retrieval method, device, equipment and storage medium |
CN111711849A (en) * | 2020-06-30 | 2020-09-25 | 浙江同花顺智能科技有限公司 | Method, device and storage medium for displaying multimedia data |
CN113032342A (en) * | 2021-03-03 | 2021-06-25 | 北京车和家信息技术有限公司 | Video labeling method and device, electronic equipment and storage medium |
CN113032342B (en) * | 2021-03-03 | 2023-09-05 | 北京车和家信息技术有限公司 | Video labeling method and device, electronic equipment and storage medium |
CN114302231A (en) * | 2021-12-31 | 2022-04-08 | 中国传媒大学 | Video processing method and device, electronic equipment and storage medium |
CN114302231B (en) * | 2021-12-31 | 2023-08-18 | 中国传媒大学 | Video processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109189987A (en) | Video searching method and device | |
CN110209877A (en) | Video analysis method and device | |
CN109089133A (en) | Method for processing video frequency and device, electronic equipment and storage medium | |
CN110121093A (en) | The searching method and device of target object in video | |
CN109729435A (en) | The extracting method and device of video clip | |
CN108833939A (en) | Generate the method and device of the poster of video | |
CN106792170A (en) | Method for processing video frequency and device | |
CN108985176A (en) | image generating method and device | |
CN107948708A (en) | Barrage methods of exhibiting and device | |
CN110121083A (en) | The generation method and device of barrage | |
CN108924644A (en) | Video clip extracting method and device | |
CN108259991A (en) | Method for processing video frequency and device | |
CN110519655A (en) | Video clipping method and device | |
CN106960014A (en) | Association user recommends method and device | |
CN109146789A (en) | Picture splicing method and device | |
CN108540850A (en) | Barrage display methods and device | |
CN109862421A (en) | A kind of video information recognition methods, device, electronic equipment and storage medium | |
CN110121106A (en) | Video broadcasting method and device | |
CN108062364A (en) | Information displaying method and device | |
CN109544716A (en) | Student registers method and device, electronic equipment and storage medium | |
CN107943550A (en) | Method for showing interface and device | |
CN108833952A (en) | The advertisement placement method and device of video | |
CN109359218A (en) | Multimedia resource methods of exhibiting and device | |
CN109344703A (en) | Method for checking object and device, electronic equipment and storage medium | |
CN108650543A (en) | The caption editing method and device of video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20200521 Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province Applicant after: Alibaba (China) Co.,Ltd. Address before: 200241 room 1162, building 555, Dongchuan Road, Shanghai, Minhang District Applicant before: SHANGHAI QUAN TOODOU CULTURAL COMMUNICATION Co.,Ltd. |
|
TA01 | Transfer of patent application right | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190906 |
|
RJ01 | Rejection of invention patent application after publication |