CN106528894B - Method and device for setting label information - Google Patents

Method and device for setting label information

Info

Publication number
CN106528894B
Authority
CN
China
Prior art keywords
keyword
probability
information
subject information
multimedia file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611235463.1A
Other languages
Chinese (zh)
Other versions
CN106528894A (en
Inventor
高阳
丁晓亮
刘爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201611235463.1A
Publication of CN106528894A
Application granted
Publication of CN106528894B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method and device for setting label information, and belongs to the field of Internet technology. The method includes: obtaining caption information of a target multimedia file; segmenting the caption information into words to obtain a first keyword set; analyzing each keyword in the first keyword set to obtain label information of the target multimedia file; and setting the label information for the target multimedia file. By performing semantic analysis on the caption information of the target multimedia file, the disclosure extracts the label information of the target multimedia file and sets it for that file, which improves both the efficiency and the accuracy of setting label information.

Description

Method and device for setting label information
Technical field
The present disclosure relates to the field of Internet technology, and in particular to a method and device for setting label information.
Background
With the arrival of the information age, more and more video files are stored on servers, and it is increasingly difficult for a user to find video files of interest on a server. To reduce this difficulty, the server can set label information for each video file, so that the user can select video files of interest from the server according to their label information.
Currently, the label information of a video file is usually defined by a person who watches the video file and then sets the label information for it. The label information includes the subject information to which the video file belongs; for example, the label information may be "emotion" or "comedy".
Summary of the invention
To overcome the problems in the related art, the present disclosure provides a method and device for setting label information. The technical solution is as follows:
According to a first aspect of the embodiments of the present disclosure, a method for setting label information is provided, the method comprising:
Obtaining caption information of a target multimedia file;
Segmenting the caption information into words to obtain a first keyword set;
Analyzing each keyword in the first keyword set to obtain label information of the target multimedia file;
Setting the label information for the target multimedia file.
In the embodiments of the present disclosure, semantic analysis is performed on the caption information of the target multimedia file to extract its label information, and the label information is then set for the target multimedia file. This improves both the efficiency and the accuracy of setting label information.
In a possible implementation, analyzing each keyword in the first keyword set to obtain the label information of the target multimedia file comprises:
Obtaining the probability of each keyword in the caption information, and obtaining the probability that each keyword belongs to each piece of subject information in a subject information library, the subject information library being used to store multiple preset pieces of subject information;
Determining, according to the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information, the probability that the target multimedia file belongs to each piece of subject information;
Selecting, according to the probability that the target multimedia file belongs to each piece of subject information, the preset number of pieces of subject information with the highest probabilities;
Forming the label information of the target multimedia file from the selected preset number of pieces of subject information.
In the embodiments of the present disclosure, the probability that the target multimedia file belongs to each piece of subject information is determined from the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information, and the preset number of pieces of subject information with the highest probabilities are then selected, which improves the accuracy of setting label information.
In a possible implementation, determining, according to the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information, the probability that the target multimedia file belongs to each piece of subject information comprises:
Forming a first probability matrix from the probability of each keyword in the caption information, and forming a second probability matrix from the probability that each keyword belongs to each piece of subject information;
Multiplying the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix;
Obtaining, from the third probability matrix, the probability that the target multimedia file belongs to each piece of subject information.
In the embodiments of the present disclosure, the first probability matrix is formed from the probability of each keyword in the caption information, the second probability matrix is formed from the probability that each keyword belongs to each piece of subject information, and the probability that the target multimedia file belongs to each piece of subject information is determined from the two matrices. This improves the accuracy of that probability and, in turn, the accuracy of setting label information.
In a possible implementation, obtaining the probability that each keyword belongs to each piece of subject information in the subject information library comprises:
For each piece of subject information, obtaining the predetermined keyword set corresponding to the subject information;
Determining the probability that each keyword belongs to the subject information according to the probability of the keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set contains.
In a possible implementation, determining the probability that each keyword belongs to the subject information according to the probability of the keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set contains comprises:
If the predetermined keyword set contains the keyword, taking the ratio of the probability of the keyword in the caption information to the number of keywords in the predetermined keyword set as the probability that the keyword belongs to the subject information;
If the predetermined keyword set does not contain the keyword, determining that the probability that the keyword belongs to the subject information is zero.
In the embodiments of the present disclosure, the ratio of the probability of each keyword in the caption information to the number of keywords in the predetermined keyword set is taken as the probability that the keyword belongs to the subject information. Because this takes the probability of each keyword in the caption information into account, it improves the accuracy of the probability that each keyword belongs to the subject information and, in turn, the accuracy of setting label information.
In a possible implementation, segmenting the caption information into words to obtain the first keyword set comprises:
Segmenting the caption information into words, and forming a second keyword set from the segments that the caption information contains;
Removing keywords of a preset type from the second keyword set to obtain the first keyword set.
In the embodiments of the present disclosure, removing keywords of the preset type from the second keyword set not only reduces the amount of computation but also improves the accuracy of setting label information.
According to a second aspect of the embodiments of the present disclosure, a device for setting label information is provided, the device comprising:
An obtaining module, configured to obtain caption information of a target multimedia file;
A word segmentation module, configured to segment the caption information into words to obtain a first keyword set;
An analysis module, configured to analyze each keyword in the first keyword set to obtain label information of the target multimedia file;
A setting module, configured to set the label information for the target multimedia file.
In a possible implementation, the analysis module comprises:
A first obtaining unit, configured to obtain the probability of each keyword in the caption information;
A second obtaining unit, configured to obtain the probability that each keyword belongs to each piece of subject information in a subject information library, the subject information library being used to store multiple preset pieces of subject information;
A determination unit, configured to determine, according to the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information, the probability that the target multimedia file belongs to each piece of subject information;
A selection unit, configured to select, according to the probability that the target multimedia file belongs to each piece of subject information, the preset number of pieces of subject information with the highest probabilities;
A first forming unit, configured to form the label information of the target multimedia file from the selected preset number of pieces of subject information.
In a possible implementation, the determination unit is further configured to form a first probability matrix from the probability of each keyword in the caption information, form a second probability matrix from the probability that each keyword belongs to each piece of subject information, multiply the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix, and obtain from the third probability matrix the probability that the target multimedia file belongs to each piece of subject information.
In a possible implementation, the second obtaining unit is further configured to obtain, for each piece of subject information, the predetermined keyword set corresponding to the subject information, and to determine the probability that each keyword belongs to the subject information according to the probability of the keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set contains.
In a possible implementation, the second obtaining unit is further configured to: if the predetermined keyword set contains the keyword, take the ratio of the probability of the keyword in the caption information to the number of keywords in the predetermined keyword set as the probability that the keyword belongs to the subject information; and if the predetermined keyword set does not contain the keyword, determine that the probability that the keyword belongs to the subject information is zero.
In a possible implementation, the word segmentation module comprises:
A segmentation unit, configured to segment the caption information into words;
A second forming unit, configured to form a second keyword set from the segments that the caption information contains;
A removal unit, configured to remove keywords of a preset type from the second keyword set to obtain the first keyword set.
According to a third aspect of the embodiments of the present disclosure, a device for setting label information is provided, the device comprising:
A processor;
A memory for storing instructions executable by the processor;
Wherein the processor is configured to:
Obtain caption information of a target multimedia file;
Segment the caption information into words to obtain a first keyword set;
Analyze each keyword in the first keyword set to obtain label information of the target multimedia file;
Set the label information for the target multimedia file.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
In the embodiments of the present disclosure, semantic analysis is performed on the caption information of the target multimedia file to extract its label information, and the label information is then set for the target multimedia file. This improves both the efficiency and the accuracy of setting label information.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification, show embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a method for setting label information according to an exemplary embodiment;
Fig. 2 is a flowchart of a method for setting label information according to an exemplary embodiment;
Fig. 3 is a block diagram of a device for setting label information according to an exemplary embodiment;
Fig. 4 is a block diagram of an analysis module according to an exemplary embodiment;
Fig. 5 is a block diagram of a word segmentation module according to an exemplary embodiment;
Fig. 6 is a block diagram of a device for setting label information according to an exemplary embodiment.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the drawings.
Exemplary embodiments are described in detail here, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are only examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
To make it easier for a user to find video files of interest on a server, the server needs to set label information for the video files it stores; the label information may include the subject information to which a video file belongs. The user can then select video files of interest from the server according to their label information.
In the related art, the label information of a video file is usually defined by a person who watches the video file and then sets the label information for it. However, the number of video files on a server is very large, and each video file is relatively long, usually about a hundred minutes, so setting label information manually is inefficient. Moreover, manually set label information is affected by the user's subjective factors and may therefore be inaccurate.
In the embodiments of the present disclosure, the server performs semantic analysis on the caption information of a multimedia file, extracts the label information of the multimedia file, and sets the label information for the multimedia file. This improves both the efficiency and the accuracy of setting label information.
Fig. 1 is a flowchart of a method for setting label information according to an exemplary embodiment. The method may be executed by a server and, as shown in Fig. 1, includes the following steps.
In step S101, caption information of a target multimedia file is obtained.
In step S102, the caption information is segmented into words to obtain a first keyword set.
In step S103, each keyword in the first keyword set is analyzed to obtain label information of the target multimedia file.
In step S104, the label information is set for the target multimedia file.
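The four steps of Fig. 1 can be read as a simple pipeline. The following is a minimal sketch, not the patent's implementation: the function name `set_label_information` and the four callables passed to it are hypothetical placeholders for whatever caption source, word segmentation tool, analysis routine and label store a server actually uses.

```python
from typing import Callable, List


def set_label_information(
    media_id: str,
    get_captions: Callable[[str], str],
    segment: Callable[[str], List[str]],
    analyze: Callable[[List[str]], List[str]],
    store_labels: Callable[[str, List[str]], None],
) -> List[str]:
    """Run steps S101-S104 for one multimedia file (all callables are assumptions)."""
    captions = get_captions(media_id)   # S101: obtain the caption information
    keywords = segment(captions)        # S102: segment into the first keyword set
    labels = analyze(keywords)          # S103: analyze keywords to get label information
    store_labels(media_id, labels)      # S104: set the label information for the file
    return labels
```

Passing the steps in as callables keeps the sketch independent of any particular segmentation tool or storage backend.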
In a possible implementation, analyzing each keyword in the first keyword set to obtain the label information of the target multimedia file includes:
Obtaining the probability of each keyword in the caption information, and obtaining the probability that each keyword belongs to each piece of subject information in a subject information library, the subject information library being used to store multiple preset pieces of subject information;
Determining, according to the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information, the probability that the target multimedia file belongs to each piece of subject information;
Selecting, according to the probability that the target multimedia file belongs to each piece of subject information, the preset number of pieces of subject information with the highest probabilities;
Forming the label information of the target multimedia file from the selected preset number of pieces of subject information.
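The last two steps above, selecting the preset number of most probable pieces of subject information and using them as the label information, can be sketched as below. This is an illustrative sketch only; the function name `select_labels` and the subject names in the usage note are assumptions, and the per-subject probabilities are taken as already computed.

```python
import heapq
from typing import Dict, List


def select_labels(subject_probs: Dict[str, float], preset_number: int) -> List[str]:
    """Return the `preset_number` subjects with the highest probability."""
    # heapq.nlargest iterates over the dict's keys and ranks them by probability.
    return heapq.nlargest(preset_number, subject_probs, key=subject_probs.get)
```

For instance, with probabilities 0.5 for "friendship", 0.3 for "emotion" and 0.2 for "love" and a preset number of 2, the selected label information would consist of "friendship" and "emotion".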
In a possible implementation, determining, according to the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information, the probability that the target multimedia file belongs to each piece of subject information includes:
Forming a first probability matrix from the probability of each keyword in the caption information, and forming a second probability matrix from the probability that each keyword belongs to each piece of subject information;
Multiplying the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix;
Obtaining, from the third probability matrix, the probability that the target multimedia file belongs to each piece of subject information.
In a possible implementation, obtaining the probability that each keyword belongs to each piece of subject information in the subject information library includes:
For each piece of subject information, obtaining the predetermined keyword set corresponding to the subject information;
Determining the probability that each keyword belongs to the subject information according to the probability of the keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set contains.
In a possible implementation, determining the probability that each keyword belongs to the subject information according to the probability of the keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set contains includes:
If the predetermined keyword set contains the keyword, taking the ratio of the probability of the keyword in the caption information to the number of keywords in the predetermined keyword set as the probability that the keyword belongs to the subject information;
If the predetermined keyword set does not contain the keyword, determining that the probability that the keyword belongs to the subject information is zero.
In a possible implementation, segmenting the caption information into words to obtain the first keyword set includes:
Segmenting the caption information into words, and forming a second keyword set from the segments that the caption information contains;
Removing keywords of a preset type from the second keyword set to obtain the first keyword set.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.
Fig. 2 is a flowchart of a method for setting label information according to an exemplary embodiment. The method may be executed by a server and, as shown in Fig. 2, includes the following steps.
In step S201, the server obtains caption information of a target multimedia file.
A large number of multimedia files are stored on the server, and the server selects from them a multimedia file for which no label information has been set as the target multimedia file. The server also stores a correspondence between multimedia file identifiers and subtitle files; accordingly, this step may be:
The server obtains the subtitle file of the target multimedia file from the correspondence between multimedia file identifiers and subtitle files according to the identifier of the target multimedia file, and obtains the caption information of the target multimedia file from that subtitle file.
The target multimedia file may be a video file or an audio file. The identifier of the target multimedia file may be its title, its number, or the like. The embodiments of the present disclosure do not specifically limit the identifier of the target multimedia file.
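Step S201 amounts to a lookup followed by text extraction. The sketch below assumes, purely for illustration, that the correspondence is an in-memory dictionary from identifier to subtitle file contents and that subtitle files use the SubRip (SRT) layout of index lines, timestamp lines and caption lines; the source does not specify either, and the function names are hypothetical.

```python
import re
from typing import Dict

# Matches an SRT timestamp line such as "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def caption_text_from_srt(srt: str) -> str:
    """Keep only the caption lines of an SRT document (assumed format)."""
    kept = []
    for raw in srt.splitlines():
        line = raw.strip()
        # Skip blank lines, cue index lines and timestamp lines.
        # Note: a caption consisting only of digits would also be skipped.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


def get_caption_information(media_id: str, subtitle_index: Dict[str, str]) -> str:
    """Look up the subtitle file by identifier and extract its caption text."""
    return caption_text_from_srt(subtitle_index[media_id])
```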
In step S202, the server segments the caption information into words to obtain a first keyword set.
In this step, the server may segment the caption information and form the first keyword set from the segments that the caption information contains. Alternatively, the server may obtain the first keyword set through the following steps (1)-(2):
(1): The server segments the caption information into words and forms a second keyword set from the segments that the caption information contains.
The server segments the caption information with a preset word segmentation tool to obtain each segment that the caption information contains, and forms the second keyword set from those segments.
For example, suppose the caption information is "The person who understands you best is not your friend, but your enemy." Segmenting the caption information with the preset word segmentation tool yields the segments "most", "understanding", "you", "people", "not", "you", "friend", "but", "you" and "enemy", so the second keyword set is {"most", "understanding", "you", "people", "not", "you", "friend", "but", "you", "enemy"}.
The preset word segmentation tool may be StandardAnalyzer (a standard word segmentation tool), ChineseAnalyzer (a Chinese word segmentation tool), CJKAnalyzer (a CJK word segmentation tool) or IKAnalyzer (the IK word segmentation tool). The embodiments of the present disclosure do not specifically limit the preset word segmentation tool.
Keywords such as "you", "people", "not", "but" and "most" contribute little to the label information. Therefore, to reduce the amount of computation and improve the accuracy of setting label information, in this step the server may also remove such keywords from the second keyword set through the following step (2).
(2): The server removes keywords of a preset type from the second keyword set to obtain the first keyword set.
Keywords of the preset type may be modal particles, auxiliary words, or the like. This step may then be: the server tags the part of speech of each keyword in the second keyword set, finds the keywords of the preset type in the second keyword set according to the part of speech of each keyword, and removes them from the second keyword set to obtain the first keyword set.
For example, the server removes "most", "you", "people", "not" and "but" from the second keyword set {"most", "understanding", "you", "people", "not", "you", "friend", "but", "you", "enemy"} and obtains the first keyword set {"understanding", "friend", "enemy"}.
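Step (2) can be sketched as a filter over part-of-speech-tagged segments. This is a minimal illustration under stated assumptions: the tagging itself is taken as given, and the function name `remove_preset_types` and the tag names such as "modal_particle" and "auxiliary" are hypothetical, since the source does not name a tagger or a tag set.

```python
from typing import Iterable, List, Set, Tuple


def remove_preset_types(
    tagged_keywords: Iterable[Tuple[str, str]], preset_types: Set[str]
) -> List[str]:
    """Drop every (keyword, part_of_speech) pair whose tag is a preset type."""
    return [word for word, pos in tagged_keywords if pos not in preset_types]


# Hypothetical tagged second keyword set; the tags are illustrative only.
second_keyword_set = [
    ("understanding", "verb"),
    ("ah", "modal_particle"),
    ("friend", "noun"),
    ("of", "auxiliary"),
    ("enemy", "noun"),
]
first_keyword_set = remove_preset_types(
    second_keyword_set, {"modal_particle", "auxiliary"}
)
```

Order and repeated keywords are preserved on purpose, so that occurrence counts can still be taken from the result in the later probability step.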
In a possible implementation, the first keyword set may contain synonyms or near-synonyms; for example, "capital" and "Beijing" are synonyms. Therefore, to reduce the amount of computation, after obtaining the first keyword set the server may also merge multiple synonyms or near-synonyms in the first keyword set into one keyword. Because this reduces the number of keywords in the first keyword set, it reduces the amount of computation on the server and thus improves the efficiency of setting label information.
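The synonym merge can be sketched as a mapping from each synonym to one canonical keyword. The mapping itself is an assumption here: the source does not say where synonym information comes from, so the dictionary and the function name `merge_synonyms` below are hypothetical.

```python
from typing import Dict, List


def merge_synonyms(keywords: List[str], canonical: Dict[str, str]) -> List[str]:
    """Replace each keyword by its canonical form; unknown keywords are kept."""
    return [canonical.get(word, word) for word in keywords]


# Using the source's example pair: "capital" is merged into "Beijing".
merged = merge_synonyms(["capital", "Beijing", "friend"], {"capital": "Beijing"})
```

Each occurrence is rewritten rather than deduplicated, so the merged keyword's occurrence count is the sum of the counts of its synonyms.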
In step S203, the server analyzes each keyword in the first keyword set to obtain the label information of the target multimedia file.
This step can be implemented in either a first manner or a second manner. In the first manner, this step can be implemented through the following steps (1)-(3):
(1): The server obtains the probability of each keyword in the caption information.
The server obtains the number of times each keyword occurs in the caption information, calculates the sum of the occurrence counts of all keywords, and takes the ratio of each keyword's occurrence count to that sum as the probability of the keyword in the caption information.
It should be noted that if the server has merged multiple synonyms or near-synonyms in the first keyword set into one keyword, then when obtaining the probability of that keyword in the caption information, the server obtains the total number of times the keyword's synonyms or near-synonyms occur in the caption information, calculates the sum of the occurrence counts of all keywords, and takes the ratio of the former to the latter as the probability of the keyword in the caption information.
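Step (1) is a relative-frequency computation, which can be sketched as follows. The function name `keyword_probabilities` is hypothetical; the input is assumed to be the list of keyword occurrences left after filtering (and after any synonym merge, so that merged synonyms are counted together).

```python
from collections import Counter
from typing import Dict, List


def keyword_probabilities(keyword_occurrences: List[str]) -> Dict[str, float]:
    """Probability of each keyword = its occurrence count / total occurrence count."""
    counts = Counter(keyword_occurrences)
    total = sum(counts.values())
    if total == 0:
        return {}  # no keywords, so there are no probabilities to report
    return {word: count / total for word, count in counts.items()}
```

For the occurrences "friend", "enemy", "friend", "understanding", the keyword "friend" gets probability 2/4 and the other two keywords get 1/4 each.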
(2): The server obtains the probability that each keyword belongs to each piece of subject information in the subject information library, the subject information library being used to store multiple preset pieces of subject information.
The preset subject information may be "friendship", "emotion", "love" and so on. This step can be implemented through the following steps (2-1)-(2-2):
(2-1): For each piece of subject information, the server obtains the predetermined keyword set corresponding to the subject information.
For each piece of subject information in the subject information library, the server stores a correspondence between the subject information and a predetermined keyword set; accordingly, this step may be:
The server obtains the predetermined keyword set corresponding to the subject information from the correspondence between subject information and predetermined keyword sets according to the subject information. The predetermined keyword set contains multiple preset keywords that belong to the subject information.
For example, the server obtains the predetermined keyword set {friend, friendship, brotherhood} corresponding to the subject information "friendship".
(2-2): The server determines the probability that each keyword belongs to the subject information according to the probability of the keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set contains.
For each keyword, the server checks whether the predetermined keyword set contains the keyword. If the predetermined keyword set contains the keyword, the server takes the ratio of the probability of the keyword in the caption information to the number of keywords in the predetermined keyword set as the probability that the keyword belongs to the subject information.
If the predetermined keyword set does not contain the keyword, the server determines that the probability that the keyword belongs to the subject information is zero.
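The rule in step (2-2) can be written as a two-branch function. This is a minimal sketch; the function name `subject_probability` is hypothetical, and the example set reuses the source's predetermined keyword set for "friendship".

```python
from typing import Set


def subject_probability(
    keyword: str, keyword_prob: float, predetermined_keywords: Set[str]
) -> float:
    """Probability that `keyword` belongs to the subject owning the given set."""
    if keyword in predetermined_keywords:
        # Ratio of the keyword's caption probability to the size of the set.
        return keyword_prob / len(predetermined_keywords)
    return 0.0  # the keyword is not listed for this subject


friendship_keywords = {"friend", "friendship", "brotherhood"}
p_friend = subject_probability("friend", 0.5, friendship_keywords)
p_enemy = subject_probability("enemy", 0.25, friendship_keywords)
```

Here `p_friend` is 0.5 divided by 3, because the set contains three keywords, and `p_enemy` is zero because "enemy" is not in the set.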
(3): The server determines the probability that the target multimedia file belongs to each piece of subject information according to the probability of each keyword in the caption information and the probability that each keyword belongs to each piece of subject information.
This step can be implemented through the following steps (3-1)-(3-3):
(3-1): The server forms a first probability matrix from the probability of each keyword in the caption information, and forms a second probability matrix from the probability that each keyword belongs to each piece of subject information.
The server takes the probability of each keyword in the caption information as one row to form the first probability matrix. For each keyword, the server takes the probabilities that the keyword belongs to each piece of subject information as one row to form the second probability matrix.
The first probability matrix is an n × 1 matrix and the second probability matrix is an n × m matrix, where n is the number of keywords in the first keyword set and m is the number of preset pieces of subject information in the subject information library.
For example, each keyword is respectively A, B and C;A, probability of the B and C in the caption information is respectively PA、PBAnd PC, The each subject information for including in subject information library is the theme 1, theme 2, theme 3 and theme 4 respectively;Keyword A belongs to each The probability of subject information is respectively A1, A2, A3 and A4, and the probability that keyword B belongs to each subject information is respectively B1, B2, B3 And it is respectively C1, C2, C3 and C4 that B4, keyword C, which belong to the probability of each subject information,.
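Then the first probability matrix is
$$\begin{bmatrix} P_A \\ P_B \\ P_C \end{bmatrix}$$
and the second probability matrix is
$$\begin{bmatrix} A1 & A2 & A3 & A4 \\ B1 & B2 & B3 & B4 \\ C1 & C2 & C3 & C4 \end{bmatrix}.$$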
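The construction in step (3-1) can be sketched with nested lists. The function name, the dictionary layout and all numeric values below are assumptions chosen for illustration; only the shapes (n × 1 and n × m) come from the text.

```python
def build_probability_matrices(keywords, topics, caption_prob, topic_prob):
    """Return (first, second) probability matrices as nested lists.

    first  is n x 1: one row per keyword, holding its caption probability.
    second is n x m: one row per keyword, one column per subject.
    """
    first = [[caption_prob[kw]] for kw in keywords]
    second = [[topic_prob[kw][t] for t in topics] for kw in keywords]
    return first, second


keywords = ["A", "B", "C"]
topics = ["theme 1", "theme 2", "theme 3", "theme 4"]
# Illustrative numbers only, not taken from the source.
caption_prob = {"A": 0.5, "B": 0.3, "C": 0.2}
topic_prob = {
    "A": {"theme 1": 0.25, "theme 2": 0.0, "theme 3": 0.0, "theme 4": 0.0},
    "B": {"theme 1": 0.0, "theme 2": 0.1, "theme 3": 0.0, "theme 4": 0.0},
    "C": {"theme 1": 0.0, "theme 2": 0.0, "theme 3": 0.05, "theme 4": 0.0},
}
first, second = build_probability_matrices(keywords, topics, caption_prob, topic_prob)
```

Keeping the keyword order fixed in both matrices ensures that row i of each matrix refers to the same keyword.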
(3-2): The server multiplies the inverse matrix of the second probability matrix by the first probability matrix to obtain the third probability matrix.
The server determines the inverse matrix of the second probability matrix from the second probability matrix, and multiplies that inverse matrix by the first probability matrix to obtain the third probability matrix. The third probability matrix is an m × 1 matrix, and each row of data in it is the probability that the target multimedia file belongs to one subject information.
For example, the third probability matrix obtained by the server is
$$\begin{bmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \end{bmatrix}.$$
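The dimensions in step (3-2) can be checked as follows. The symbols $M_1$, $M_2$ and $M_3$ (first, second and third probability matrix) are labels introduced here, and reading "inverse matrix" as a generalized inverse $M_2^{+}$ is an assumption of this sketch: the text only says "inverse matrix", and an ordinary inverse exists only when n = m and the second probability matrix is non-singular.

```latex
M_3 = M_2^{+} M_1, \qquad
M_2 \in \mathbb{R}^{n \times m},\quad
M_2^{+} \in \mathbb{R}^{m \times n},\quad
M_1 \in \mathbb{R}^{n \times 1}
\;\Longrightarrow\;
M_3 \in \mathbb{R}^{m \times 1}
```

With three keywords and four themes (n = 3, m = 4), $M_2^{+}$ would be 4 × 3 and $M_3$ would be 4 × 1, one entry per subject information.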
(3-3): The server obtains, from the third probability matrix, the probability that the target multimedia file belongs to each subject information.
Each row of data in the third probability matrix is the probability that the target multimedia file belongs to one subject information, so the server can read these probabilities directly from the third probability matrix.
For example, if the third probability matrix is $\begin{bmatrix} P_1 & P_2 & P_3 & P_4 \end{bmatrix}^{T}$, then $P_1$ is the probability that the target multimedia file belongs to subject information 1, $P_2$ is the probability that it belongs to subject information 2, $P_3$ is the probability that it belongs to subject information 3, and $P_4$ is the probability that it belongs to subject information 4.
(4): According to the probability that the target multimedia file belongs to each subject information, the server selects from the subject information a preset number of subject information entries with the highest probabilities.
For ease of distinction, the preset number here is called the first preset number. The first preset number can be set and changed as needed and is not specifically limited in the embodiments of the present disclosure; for example, the first preset number can be 1 or 2.
(5): The server forms the label information of the target multimedia file from the selected first preset number of subject information entries.
For example, if the selected subject information is comedy and love, the label information of the multimedia file is comedy and love.
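Steps (4) and (5) amount to a top-k selection over the per-subject probabilities. A minimal sketch follows; the function name, the default of 2 and the probability values are assumptions made for illustration.

```python
def select_labels(topic_probs, first_preset_number=2):
    """Return the subjects with the highest probabilities, best first.

    topic_probs         -- mapping from subject name to probability
    first_preset_number -- how many subjects to keep (1 or 2 in the text's example)
    """
    ranked = sorted(topic_probs.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, _ in ranked[:first_preset_number]]


# Illustrative probabilities only.
probs = {"action": 0.15, "comedy": 0.5, "horror": 0.05, "love": 0.3}
labels = select_labels(probs)
```

Ties are resolved by the insertion order of the mapping, since Python's sort is stable; the text does not say how ties should be handled.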
For the second implementation, this step can be:
The server obtains the probability of each keyword in the caption information; according to these probabilities, selects from the keywords a second preset number of keywords with the highest probabilities; obtains the subject information to which the selected keywords belong; and forms the label information of the target multimedia file from that subject information.
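The second implementation can be sketched in the same style. The mapping from keywords to subjects, the function name and the sample values are assumptions; the text does not specify how the subject of a keyword is looked up.

```python
def labels_from_top_keywords(caption_prob, keyword_to_topics, second_preset_number=2):
    """Pick the most probable keywords, then collect the subjects they belong to."""
    top_keywords = sorted(caption_prob, key=caption_prob.get, reverse=True)
    top_keywords = top_keywords[:second_preset_number]
    labels = []
    for keyword in top_keywords:
        for topic in keyword_to_topics.get(keyword, []):
            if topic not in labels:  # avoid duplicate labels
                labels.append(topic)
    return labels


# Illustrative data only.
caption_prob = {"friend": 0.4, "kiss": 0.35, "car": 0.25}
keyword_to_topics = {"friend": ["friendship"], "kiss": ["love"], "car": ["action"]}
result = labels_from_top_keywords(caption_prob, keyword_to_topics)
```

Unlike the first implementation, this variant needs no matrix operations, at the cost of ignoring keywords outside the top few.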
In step S204, the server sets the label information for the target multimedia file.
In the embodiments of the present disclosure, the server performs semantic analysis on the caption information of the target multimedia file, extracts the label information of the multimedia file, and sets the label information for the multimedia file. This improves both the efficiency and the accuracy of setting label information.
Fig. 3 is a block diagram of a device for setting label information according to an exemplary embodiment. Referring to Fig. 3, the device includes an acquisition module 301, a word segmentation module 302, an analysis module 303 and a setting module 304.
The acquisition module 301 is configured to obtain the caption information of the target multimedia file;
The word segmentation module 302 is configured to segment the caption information into words to obtain the first keyword set;
The analysis module 303 is configured to analyze each keyword in the first keyword set to obtain the label information of the target multimedia file;
The setting module 304 is configured to set the label information for the target multimedia file.
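The four modules form a simple pipeline, which can be sketched as below. The class name, field names and the stand-in lambdas are hypothetical; they only illustrate how the modules hand data to each other and do not reproduce the actual analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class LabelSettingDevice:
    acquire: Callable[[str], str]            # acquisition module 301
    segment: Callable[[str], Set[str]]       # word segmentation module 302
    analyze: Callable[[Set[str]], List[str]] # analysis module 303
    store: Callable[[str, List[str]], None]  # setting module 304

    def run(self, file_id: str) -> List[str]:
        caption = self.acquire(file_id)
        keywords = self.segment(caption)
        labels = self.analyze(keywords)
        self.store(file_id, labels)
        return labels


# Stand-in modules for illustration only.
tags: Dict[str, List[str]] = {}
device = LabelSettingDevice(
    acquire=lambda file_id: "friend love friend",
    segment=lambda caption: set(caption.split()),
    analyze=lambda keywords: sorted(keywords),
    store=lambda file_id, labels: tags.__setitem__(file_id, labels),
)
returned = device.run("movie-1")
```

Passing the modules in as callables mirrors the remark in the description that functions may be redistributed among modules as needed.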
In a possible implementation, referring to Fig. 4, the analysis module 303 includes:
The first acquisition unit 3031 is configured to obtain the probability of each keyword in the caption information;
The second acquisition unit 3032 is configured to obtain the probability that each keyword belongs to each subject information in the subject information library, the subject information library being configured to store multiple preset subject information;
The determination unit 3033 is configured to determine the probability that the target multimedia file belongs to each subject information according to the probability of each keyword in the caption information and the probability that each keyword belongs to each subject information;
The selection unit 3034 is configured to select, according to the probability that the target multimedia file belongs to each subject information, a preset number of subject information with the highest probabilities from the subject information;
The first composition unit 3035 is configured to form the label information of the target multimedia file from the selected preset number of subject information.
In a possible implementation, the determination unit 3033 is further configured to form the first probability matrix from the probability of each keyword in the caption information, form the second probability matrix from the probability that each keyword belongs to each subject information, multiply the inverse matrix of the second probability matrix by the first probability matrix to obtain the third probability matrix, and obtain from the third probability matrix the probability that the target multimedia file belongs to each subject information.
In a possible implementation, the second acquisition unit 3032 is further configured to, for each subject information, obtain the predetermined keyword set corresponding to the subject information, and determine the probability that each keyword belongs to the subject information according to the probability of each keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set includes.
In a possible implementation, the second acquisition unit 3032 is further configured to, for each keyword: if the predetermined keyword set contains the keyword, take the ratio of the keyword's probability in the caption information to the number of keywords in the predetermined keyword set as the probability that the keyword belongs to the subject information; and if the predetermined keyword set does not contain the keyword, determine that the probability that the keyword belongs to the subject information is zero.
In a possible implementation, referring to Fig. 5, the word segmentation module 302 includes:
The segmentation unit 3021 is configured to segment the caption information into words;
The second composition unit 3022 is configured to form the second keyword set from the segmented words that the caption information includes;
The removal unit 3023 is configured to remove keywords of a preset type from the second keyword set to obtain the first keyword set.
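The three units of the word segmentation module can be sketched as follows. Whitespace splitting stands in for a real word segmenter, and treating the "preset type" as a list of function words is an assumption; neither detail is specified in this passage.

```python
def first_keyword_set(caption_text, preset_type_words):
    """Segment the caption, build the second keyword set, then filter it.

    caption_text      -- caption information as plain text
    preset_type_words -- keywords of the preset type to remove (hypothetical input)
    """
    tokens = caption_text.lower().split()      # segmentation unit (stand-in)
    second_set = set(tokens)                   # second composition unit
    return second_set - set(preset_type_words) # removal unit


remaining = first_keyword_set("The friend and the friend", {"the", "and"})
```

A production system for Chinese captions would replace the `split()` call with a dedicated segmenter, since Chinese text has no whitespace between words.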
In the embodiments of the present disclosure, the server performs semantic analysis on the caption information of the target multimedia file, extracts the label information of the multimedia file, and sets the label information for the multimedia file. This improves both the efficiency and the accuracy of setting label information.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present disclosure, which are not described here one by one.
It should be noted that when the device for setting label information provided by the above embodiment sets label information, the division into the above functional modules is only an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for setting label information provided by the above embodiment and the method embodiment for setting label information belong to the same concept; the specific implementation process is detailed in the method embodiment and is not repeated here.
Fig. 6 is a block diagram of a device 600 for setting label information according to an exemplary embodiment. For example, the device 600 may be provided as a server. Referring to Fig. 6, the device 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 622 is configured to execute the instructions to perform the above method of setting label information.
The device 600 may also include a power supply component 626 configured to perform power management of the device 600, a wired or wireless network interface 650 configured to connect the device 600 to a network, and an input/output (I/O) interface 658. The device 600 can operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A method for setting label information, characterized in that the method comprises:
obtaining caption information of a target multimedia file;
segmenting the caption information into words, and merging synonyms or near-synonyms among the multiple keywords obtained after segmentation, to obtain a first keyword set;
obtaining the probability of each keyword in the first keyword set in the caption information, and obtaining the probability that each keyword belongs to each subject information in a subject information library, the subject information library being used to store multiple preset subject information; forming a first probability matrix from the probability of each keyword in the caption information, and forming a second probability matrix from the probability that each keyword belongs to each subject information; multiplying the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix; obtaining, from the third probability matrix, the probability that the target multimedia file belongs to each subject information; selecting, according to the probability that the target multimedia file belongs to each subject information, a preset number of subject information with the highest probabilities from the subject information; and forming label information of the target multimedia file from the selected preset number of subject information;
setting the label information for the target multimedia file.
2. The method according to claim 1, characterized in that obtaining the probability that each keyword belongs to each subject information in the subject information library comprises:
for each subject information, obtaining the predetermined keyword set corresponding to the subject information;
determining the probability that each keyword belongs to the subject information according to the probability of each keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set includes.
3. The method according to claim 2, characterized in that determining the probability that each keyword belongs to the subject information according to the probability of each keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set includes comprises:
if the predetermined keyword set contains the keyword, taking the ratio of the keyword's probability in the caption information to the number of keywords that the predetermined keyword set includes as the probability that the keyword belongs to the subject information;
if the predetermined keyword set does not contain the keyword, determining that the probability that the keyword belongs to the subject information is zero.
4. The method according to claim 1, characterized in that segmenting the caption information into words to obtain the first keyword set comprises:
segmenting the caption information into words, and forming a second keyword set from the segmented words that the caption information includes;
removing keywords of a preset type from the second keyword set to obtain the first keyword set.
5. A device for setting label information, characterized in that the device comprises:
an acquisition module, for obtaining caption information of a target multimedia file;
a word segmentation module, for segmenting the caption information into words, and merging synonyms or near-synonyms among the multiple keywords obtained after segmentation, to obtain a first keyword set;
an analysis module, for analyzing each keyword in the first keyword set to obtain label information of the target multimedia file;
a setting module, for setting the label information for the target multimedia file;
wherein the analysis module comprises:
a first acquisition unit, for obtaining the probability of each keyword in the caption information;
a second acquisition unit, for obtaining the probability that each keyword belongs to each subject information in a subject information library, the subject information library being used to store multiple preset subject information;
a determination unit, for determining the probability that the target multimedia file belongs to each subject information according to the probability of each keyword in the caption information and the probability that each keyword belongs to each subject information;
a selection unit, for selecting, according to the probability that the target multimedia file belongs to each subject information, a preset number of subject information with the highest probabilities from the subject information;
a first composition unit, for forming the label information of the target multimedia file from the selected preset number of subject information;
wherein the determination unit is further used for forming a first probability matrix from the probability of each keyword in the caption information, forming a second probability matrix from the probability that each keyword belongs to each subject information, multiplying the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix, and obtaining, from the third probability matrix, the probability that the target multimedia file belongs to each subject information.
6. The device according to claim 5, characterized in that the second acquisition unit is further used for, for each subject information, obtaining the predetermined keyword set corresponding to the subject information, and determining the probability that each keyword belongs to the subject information according to the probability of each keyword in the caption information, the predetermined keyword set, and the number of keywords that the predetermined keyword set includes.
7. The device according to claim 6, characterized in that
the second acquisition unit is further used for: if the predetermined keyword set contains the keyword, taking the ratio of the keyword's probability in the caption information to the number of keywords that the predetermined keyword set includes as the probability that the keyword belongs to the subject information; and if the predetermined keyword set does not contain the keyword, determining that the probability that the keyword belongs to the subject information is zero.
8. The device according to claim 5, characterized in that the word segmentation module comprises:
a segmentation unit, for segmenting the caption information into words;
a second composition unit, for forming a second keyword set from the segmented words that the caption information includes;
a removal unit, for removing keywords of a preset type from the second keyword set to obtain the first keyword set.
9. A device for setting label information, characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
obtain caption information of a target multimedia file;
segment the caption information into words, and merge synonyms or near-synonyms among the multiple keywords obtained after segmentation, to obtain a first keyword set;
obtain the probability of each keyword in the first keyword set in the caption information, and obtain the probability that each keyword belongs to each subject information in a subject information library, the subject information library being used to store multiple preset subject information; form a first probability matrix from the probability of each keyword in the caption information, and form a second probability matrix from the probability that each keyword belongs to each subject information; multiply the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix; obtain, from the third probability matrix, the probability that the target multimedia file belongs to each subject information; select, according to the probability that the target multimedia file belongs to each subject information, a preset number of subject information with the highest probabilities from the subject information; and form label information of the target multimedia file from the selected preset number of subject information;
set the label information for the target multimedia file.
CN201611235463.1A 2016-12-28 2016-12-28 The method and device of label information is set Active CN106528894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611235463.1A CN106528894B (en) 2016-12-28 2016-12-28 The method and device of label information is set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611235463.1A CN106528894B (en) 2016-12-28 2016-12-28 The method and device of label information is set

Publications (2)

Publication Number Publication Date
CN106528894A CN106528894A (en) 2017-03-22
CN106528894B true CN106528894B (en) 2019-11-15

Family

ID=58339089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611235463.1A Active CN106528894B (en) 2016-12-28 2016-12-28 The method and device of label information is set

Country Status (1)

Country Link
CN (1) CN106528894B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107656958B (en) * 2017-06-09 2019-07-19 平安科技(深圳)有限公司 A kind of classifying method and server of multi-data source data
CN107295375A (en) * 2017-06-13 2017-10-24 中国传媒大学 Variety show content characteristic obtains system and application system
CN109213841B (en) * 2017-06-29 2021-01-01 武汉斗鱼网络科技有限公司 Live broadcast theme sample extraction method, storage medium, electronic device and system
CN107832287A (en) * 2017-09-26 2018-03-23 晶赞广告(上海)有限公司 A kind of label identification method and device, storage medium, terminal
CN108595660A (en) * 2018-04-28 2018-09-28 腾讯科技(深圳)有限公司 Label information generation method, device, storage medium and the equipment of multimedia resource
CN109753563B (en) * 2019-03-28 2019-09-10 深圳市酷开网络科技有限公司 Tag extraction method, apparatus and computer readable storage medium based on big data
CN110650364B (en) * 2019-09-27 2022-04-01 北京达佳互联信息技术有限公司 Video attitude tag extraction method and video-based interaction method
CN116092063B (en) * 2022-12-09 2024-05-17 湖南润科通信科技有限公司 Short video keyword extraction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853250A (en) * 2009-04-03 2010-10-06 华为技术有限公司 Method and device for classifying documents
CN102855312B (en) * 2012-08-24 2013-08-14 武汉大学 Domain-and-theme-oriented Web service clustering method
CN103744835A (en) * 2014-01-02 2014-04-23 上海大学 Text keyword extracting method based on subject model
CN104239373A (en) * 2013-06-24 2014-12-24 腾讯科技(深圳)有限公司 Document tag adding method and document tag adding device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853250A (en) * 2009-04-03 2010-10-06 华为技术有限公司 Method and device for classifying documents
CN102855312B (en) * 2012-08-24 2013-08-14 武汉大学 Domain-and-theme-oriented Web service clustering method
CN104239373A (en) * 2013-06-24 2014-12-24 腾讯科技(深圳)有限公司 Document tag adding method and document tag adding device
CN103744835A (en) * 2014-01-02 2014-04-23 上海大学 Text keyword extracting method based on subject model

Also Published As

Publication number Publication date
CN106528894A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
CN106528894B (en) The method and device of label information is set
CN106649818B (en) Application search intention identification method and device, application search method and server
TWI653542B (en) Method, system and device for discovering and tracking hot topics based on network media data flow
JP5449628B2 (en) Determining category information using multistage
US10552422B2 (en) Extended search method and apparatus
US20150278359A1 (en) Method and apparatus for generating a recommendation page
CN103294778B (en) A kind of method and system pushing information
CN108595679B (en) Label determining method, device, terminal and storage medium
CN104751354B (en) A kind of advertisement crowd screening technique
CN103136228A (en) Image search method and image search device
US10346496B2 (en) Information category obtaining method and apparatus
CN110909120B (en) Resume searching/delivering method, device and system and electronic equipment
US20190266406A1 (en) Automatically detecting contents expressing emotions from a video and enriching an image index
CN111241389A (en) Sensitive word filtering method and device based on matrix, electronic equipment and storage medium
CN108376164B (en) Display method and device of potential anchor
CN104915426B (en) Information sorting method, the method and device for generating information sorting model
CN104915359B (en) Theme label recommended method and device
CN110968789B (en) Electronic book pushing method, electronic equipment and computer storage medium
CN105574030B (en) A kind of information search method and device
Wicaksono et al. Automatic extraction of advice-revealing sentences for advice mining from online forums
de Oliveira et al. FS-NER: A lightweight filter-stream approach to named entity recognition on twitter data
Jeon et al. Hashtag recommendation based on user tweet and hashtag classification on twitter
CN103559313B (en) Searching method and device
CN103902596B (en) High frequency content of pages clustering method and system
CN105159927B (en) Method and device for selecting subject term of target text and terminal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant