CN112800183B - Content name data processing method and terminal equipment - Google Patents

Content name data processing method and terminal equipment

Info

Publication number
CN112800183B
CN112800183B CN202110212680.3A
Authority
CN
China
Prior art keywords
initial
vector
name data
feature vectors
target feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110212680.3A
Other languages
Chinese (zh)
Other versions
CN112800183A (en)
Inventor
范辉
马天祥
曾四鸣
贾伯岩
罗蓬
李卓
彭鹏
王彬志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110212680.3A
Publication of CN112800183A
Application granted
Publication of CN112800183B

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application is applicable to the technical field of data processing, and provides a content name data processing method and terminal equipment. The method comprises: acquiring content name data to be processed, and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; performing feature extraction on the initial matrix vector to obtain a first number of initial feature vectors; respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors; and linearly combining the first number of target feature vectors to obtain a target code value. By further integrating the target feature vectors obtained after feature extraction and dimension reduction, the method retains semantic information to the greatest extent, reduces data storage consumption, and can meet the application requirements of a variety of scenarios.

Description

Content name data processing method and terminal equipment
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a content name data processing method and terminal equipment.
Background
The content-centric network is a new Internet technology developed in recent years. It carries out communication using variable-length, borderless content name data instead of IP addresses; its unique routing-node cache mechanism and dedicated forwarding-plane structure achieve data sharing in the true sense, satisfy communication requirements such as mobility and high reliability, and greatly improve the data transmission efficiency of mobile communication.
In the prior art, methods for processing variable-length, borderless content name data either lose a large amount of content name information or incur heavy storage consumption, and therefore cannot meet actual demands.
Disclosure of Invention
In view of this, the embodiments of the application provide a content name data processing method and terminal equipment, so as to solve the problems in the prior art that methods for processing variable-length, borderless content name data lose a large amount of information, consume excessive storage, and cannot meet actual demands.
A first aspect of an embodiment of the present application provides a content name data processing method, including:
acquiring the name data of the content to be processed, and converting the name data of the content to be processed into an initial matrix vector according to a preset query dictionary;
extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors;
and linearly combining the first number of target feature vectors to obtain target code values.
A second aspect of an embodiment of the present application provides a content name data processing apparatus, including:
the coding module is used for acquiring the name data of the content to be processed and converting the name data of the content to be processed into an initial matrix vector according to a preset query dictionary;
the feature extraction module is used for extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module is used for respectively reducing the dimension of each initial feature vector to obtain a first number of target feature vectors;
and the linear combination module is used for carrying out linear combination on the first number of target feature vectors to obtain target code values.
A third aspect of the embodiments of the present application provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the content name data processing method provided in the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the content name data processing method as provided in the first aspect of the embodiments of the present application.
The embodiment of the application provides a content name data processing method, comprising: acquiring content name data to be processed, and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; performing feature extraction on the initial matrix vector to obtain a first number of initial feature vectors; respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors; and linearly combining the first number of target feature vectors to obtain a target code value. By further integrating the target feature vectors obtained after feature extraction and dimension reduction, the embodiment of the application retains semantic information to the greatest extent, reduces data storage consumption, and can meet the application requirements of a variety of scenarios.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic implementation flow diagram of a content name data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a content name data processing device according to an embodiment of the present application;
fig. 3 is a schematic diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation rather than limitation, specific details such as particular system architectures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solutions of the application, the following description is given by way of specific examples.
Referring to fig. 1, an embodiment of the present application provides a content name data processing method, including:
s101: acquiring the name data of the content to be processed, and converting the name data of the content to be processed into an initial matrix vector according to a preset query dictionary;
s102: extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
s103: respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors;
s104: and linearly combining the first number of target feature vectors to obtain target code values.
All possible characters are encoded, each character corresponding to a unique code value, to form the preset query dictionary. In the embodiment of the application, the characters in the content name data to be processed are matched one by one, in order, against the preset query dictionary to obtain the code value corresponding to each character. For example, each character in the preset query dictionary is represented as an N-dimensional vector containing only 0s and 1s; character a may be represented as a 1×N-dimensional vector (1, 0 …, 0), and each character corresponds to a unique vector. If the actual character string of the content name data to be processed has length M, character-by-character matching through the query dictionary converts the input content name data to be processed into an M×N-dimensional initial matrix vector.
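By way of illustration only — the embodiment discloses no source code — the dictionary-based conversion could be sketched in Python as follows; the character set, its size N, and all identifiers are assumptions for demonstration:

```python
# Illustrative sketch of S101: dictionary-based one-hot conversion.
# The alphabet, N, and the function name are assumptions, not part of the patent.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789/-._"  # assumed character set
N = len(ALPHABET)                                       # vector width per character
QUERY_DICT = {ch: i for i, ch in enumerate(ALPHABET)}   # preset query dictionary

def to_initial_matrix(name: str) -> np.ndarray:
    """Convert M-character content name data into an M x N one-hot matrix."""
    matrix = np.zeros((len(name), N))
    for row, ch in enumerate(name):
        matrix[row, QUERY_DICT[ch]] = 1.0  # each character maps to a unique 1 x N vector
    return matrix

X = to_initial_matrix("video/news-01.mp4")
print(X.shape)  # (17, 40): M = 17 characters, N = 40 dictionary entries
```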
The linear combination further integrates the target feature vectors: it analyzes the proportion that different feature vectors contribute to the expression of semantic information, removes repeated features, distinguishes the importance of each feature vector, expands the matrix vector, and linearly combines the features to obtain the most compact target code value. The linear combination mode can be set according to practical application requirements.
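A minimal sketch of one possible linear combination is given below; because the combination mode is left configurable by the embodiment, the scalar reduction and the importance weights used here are assumptions:

```python
# Illustrative sketch of S104: weight each target feature vector by an assumed
# importance and sum into a single code value. The reduction (sum) and the
# weights are placeholders; the patent leaves the combination mode open.
import numpy as np

def linear_combine(target_features: list, weights: np.ndarray) -> float:
    """Reduce each target feature vector to a scalar, then form a weighted sum."""
    scalars = np.array([f.sum() for f in target_features])
    return float(np.dot(weights, scalars))

rng = np.random.default_rng(0)
features = [rng.standard_normal((7, 19)) for _ in range(8)]  # stand-in target features
weights = np.linspace(1.0, 0.1, num=8)                       # assumed importance weights
code_value = linear_combine(features, weights)
```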
In the embodiment of the application, the content name data to be processed is first encoded, and the target feature vectors obtained after feature extraction and dimension reduction are then further integrated. Semantic information is retained to the greatest extent while different features are combined, so that the code values of different content name data are clearly distinguished according to their semantic information, data storage consumption is reduced, and retrieval efficiency is improved. Meanwhile, because the embodiment of the application adopts character encoding and is applied at the forwarding plane of the content-centric network, it can meet the application requirements of a variety of scenarios.
In some embodiments, S102 may include:
s1021: and carrying out convolution operation on the initial matrix vectors to obtain a first number of initial feature vectors.
In the embodiment of the application, feature extraction may be performed on the initial matrix vector through a convolution operation to obtain a fixed first number of initial feature vectors with different connotations. For example, if the initial matrix vector is an M×N-dimensional matrix vector, convolving it step by step with x (the first number) convolution kernels of k×k dimensions yields x initial feature vectors of (M-k+1)×(N-k+1) dimensions, thereby extracting different features of the initial matrix vector. Different numbers of convolution kernels can be set according to actual application requirements to perform multiple convolutions, moderately reducing the dimension of the matrix vector and further extracting different features of the initial matrix vector.
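A sketch of this convolution step, under stated assumptions (valid, no-padding convolution with randomly initialized kernels; the embodiment does not specify how kernel weights are obtained), might read:

```python
# Illustrative sketch of S1021: valid 2-D convolution of the M x N one-hot
# matrix with x k x k kernels, giving x feature maps of (M-k+1) x (N-k+1)
# dimensions. Random kernels and the stand-in input are assumptions.
import numpy as np

def convolve_valid(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    M, N = X.shape
    k = W.shape[0]
    S = np.zeros((M - k + 1, N - k + 1))
    for i in range(M - k + 1):
        for j in range(N - k + 1):
            # S(i,j) = sum_m sum_n X(i+m, j+n) * W(m,n)
            S[i, j] = np.sum(X[i:i + k, j:j + k] * W)
    return S

rng = np.random.default_rng(0)
X = (rng.random((16, 40)) < 0.05).astype(float)  # stand-in 16 x 40 one-hot-like matrix
kernels = rng.standard_normal((8, 3, 3))         # x = 8 kernels of 3 x 3
initial_features = [convolve_valid(X, W) for W in kernels]
print(initial_features[0].shape)                 # (14, 38) = (M-k+1, N-k+1)
```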
In some embodiments, before S102, the method may further include:
s105: and expanding the initial matrix vector according to the size of the convolution kernel in the convolution operation to obtain an expanded initial matrix vector.
Accordingly, S1021 may include:
and performing convolution operation on the expanded initial matrix vectors to obtain a first number of initial feature vectors.
The dimension of the initial matrix vector should be adapted to the size of the convolution kernel; to improve the applicability of the method, the size of the convolution kernel is fixed in the embodiment of the present application. Therefore, the maximum length L1 over a large volume of real name data and the number L2 of all distinct characters are counted, and the initial matrix vector is expanded, with positions beyond the original matrix filled with zero vectors, to form an L1×L2-dimensional matrix vector.
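The expansion step could be sketched as follows; the concrete values of L1 and L2 are assumptions, since in the embodiment they are derived from statistics over real name data:

```python
# Illustrative sketch of S105: expand the M x N matrix to fixed bounds L1 x L2
# by zero-filling beyond the original entries. L1 = 64 and L2 = 40 are assumed.
import numpy as np

def expand(X: np.ndarray, L1: int, L2: int) -> np.ndarray:
    padded = np.zeros((L1, L2))
    padded[:X.shape[0], :X.shape[1]] = X  # positions beyond X stay zero vectors
    return padded

X = np.eye(17, 40)                 # stand-in 17 x 40 initial matrix vector
X_expanded = expand(X, L1=64, L2=40)
print(X_expanded.shape)            # (64, 40)
```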
In some embodiments, the size of the convolution kernel in the convolution operation is 3×3.
The choice of parameters in the convolution operation determines the breadth and depth of feature extraction. The number of convolution kernels determines the number of extracted feature vectors: too many kernels lead to redundant feature vectors, while too few lead to incomplete feature extraction. The semantic information of the content name data in the practical application therefore needs to be analyzed, and a reasonable number of convolution kernels selected. In the embodiment of the application, 3×3 convolution kernels are adopted for feature extraction, which ensures the completeness of feature extraction.
In some embodiments, the calculation formula for the initial feature vector is:
S(i,j) = Σ_m Σ_n X(i+m, j+n)·W(m,n)
where X is the initial matrix vector, W is the convolution kernel, i and j index the dimensions of matrix vector X, and m and n index the dimensions of convolution kernel W. In the embodiment of the application, a two-dimensional convolution operation is adopted for feature extraction.
In some embodiments, S103 may include:
s1031: and respectively carrying out pooling operation on each initial feature vector to obtain a first number of target feature vectors.
A maximum pooling operation with pooling region size p is performed on the x (M-k+1)×(N-k+1)-dimensional matrix vectors obtained after convolution, finally yielding x matrix vectors of [(M-k+1)×(N-k+1)]/p elements. The pooling operation performs feature screening, redundancy removal, and fusion; its parameters can be set according to actual needs, so that redundant information is removed and the feature vector dimension is reduced while semantic information is retained as much as possible.
In some embodiments, the size of the pooling window in the pooling operation is 2×2.
The choice of parameters in the pooling operation determines the feature extraction capacity and the degree of redundancy elimination. The size of the pooling window determines how much redundancy is removed during feature extraction: if the window is too large, some important vector values are lost; if it is too small, too many redundant values are retained. An appropriate window size is therefore needed to remove redundancy while retaining important information to the greatest extent, for example a pooling window size of 2×2.
In some embodiments, the calculation formula of the target feature vector is:
P(r,s) = max_p{S(q,l)}
wherein S is the initial feature vector, P is the target feature vector, p is the pooling region size, r and s are the dimensions of the target feature vector, and q and l are the dimensions of the initial feature vector.
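A sketch of this pooling-based dimension reduction is given below, assuming non-overlapping 2×2 windows and truncation of ragged edges (both assumptions made for simplicity):

```python
# Illustrative sketch of S1031: 2 x 2 max pooling over each initial feature
# map, i.e. P(r, s) is the maximum of S over the pooling region.
import numpy as np

def max_pool(S: np.ndarray, p: int = 2) -> np.ndarray:
    rows, cols = S.shape[0] // p, S.shape[1] // p
    P = np.zeros((rows, cols))
    for r in range(rows):
        for s in range(cols):
            P[r, s] = S[r * p:(r + 1) * p, s * p:(s + 1) * p].max()
    return P

rng = np.random.default_rng(0)
initial_features = [rng.standard_normal((14, 38)) for _ in range(8)]  # stand-ins
target_features = [max_pool(S) for S in initial_features]
print(target_features[0].shape)  # (7, 19)
```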
In some embodiments, after S103, the content name data processing method further includes:
s106: screening the first number of target feature vectors, removing the same target feature vectors, and obtaining a second number of target feature vectors;
accordingly, S104 may include:
s1041: and linearly combining the second number of target feature vectors to obtain target code values.
In the embodiment of the application, the target feature vectors are screened and redundant data removed, improving computational efficiency.
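The screening step could be sketched as below; exact byte-level equality as the duplicate test is an assumption, the embodiment only requiring that identical target feature vectors be removed:

```python
# Illustrative sketch of S106: drop identical target feature vectors before
# the linear combination, keeping the first occurrence of each.
import numpy as np

def screen(target_features: list) -> list:
    unique, seen = [], set()
    for f in target_features:
        key = f.tobytes()  # hashable fingerprint of the vector's contents
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

a = np.arange(6.0).reshape(2, 3)
second_features = screen([a, a.copy(), a + 1.0])  # duplicate removed
print(len(second_features))  # 2
```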
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Referring to fig. 2, an embodiment of the present application further provides a content name data processing apparatus, including:
the encoding module 21 is configured to obtain content name data to be processed, and convert the content name data to be processed into an initial matrix vector according to a preset query dictionary;
the feature extraction module 22 is configured to perform feature extraction on the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module 23 is configured to reduce dimensions of each initial feature vector to obtain a first number of target feature vectors;
the linear combination module 24 is configured to perform linear combination on the first number of target feature vectors to obtain a target code value.
In some embodiments, feature extraction module 22 may include:
the convolution unit 221 is configured to perform a convolution operation on the initial matrix vectors to obtain a first number of initial feature vectors.
In some embodiments, the size of the convolution kernel in the convolution operation is 3×3.
In some embodiments, the calculation formula for the initial feature vector is:
S(i,j) = Σ_m Σ_n X(i+m, j+n)·W(m,n)
where X is the initial matrix vector, W is the convolution kernel, i and j index the dimensions of matrix vector X, and m and n index the dimensions of convolution kernel W.
In some embodiments, the dimension reduction module 23 may include:
the pooling unit 231 is configured to perform pooling operation on each initial feature vector, so as to obtain a first number of target feature vectors.
In some embodiments, the size of the pooling window in the pooling operation is 2×2.
In some embodiments, the calculation formula of the target feature vector is:
P(r,s) = max_p{S(q,l)}
wherein S is the initial feature vector, P is the target feature vector, p is the pooling region size, r and s are the dimensions of the target feature vector, and q and l are the dimensions of the initial feature vector.
In some embodiments, the content name data processing apparatus may further include:
a screening module 26, configured to screen the first number of target feature vectors and remove identical target feature vectors to obtain a second number of target feature vectors;
correspondingly, the linear combination module 24 is further configured to perform linear combination on the second number of target feature vectors to obtain a target code value.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated by example; in practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the terminal device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiment may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above device, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 3 is a schematic block diagram of a terminal device according to an embodiment of the present application. As shown in fig. 3, the terminal device 4 of this embodiment includes: one or more processors 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40. The steps in the respective embodiments of the content name data processing method described above, such as steps S101 to S104 shown in fig. 1, are implemented when the processor 40 executes the computer program 42. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units of the embodiment of the content name data processing device described above, such as the functions of the modules 21 to 24 shown in fig. 2.
Illustratively, the computer program 42 may be partitioned into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be partitioned into an encoding module 21, a feature extraction module 22, a dimension reduction module 23, and a linear combination module 24.
The encoding module 21 is configured to obtain content name data to be processed, and convert the content name data to be processed into an initial matrix vector according to a preset query dictionary;
the feature extraction module 22 is configured to perform feature extraction on the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module 23 is configured to reduce dimensions of each initial feature vector to obtain a first number of target feature vectors;
the linear combination module 24 is configured to perform linear combination on the first number of target feature vectors to obtain a target code value.
The terminal device 4 includes, but is not limited to, a processor 40 and a memory 41. It will be appreciated by those skilled in the art that fig. 3 is only an example of a terminal device and does not constitute a limitation of the terminal device 4, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device 4 may also include an input device, an output device, a network access device, a bus, and the like.
The processor 40 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 41 may also be an external storage device of the terminal device, such as a plug-in hard disk provided on the terminal device, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like. Further, the memory 41 may also include both an internal storage unit of the terminal device and an external storage device. The memory 41 is used for storing a computer program 42 and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed terminal device and method may be implemented in other manners. For example, the above-described terminal device embodiments are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods in the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (5)

1. A content name data processing method, characterized by comprising:
acquiring content name data to be processed, and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; the content name data to be processed comprises M characters, the preset query dictionary comprises a representation mode corresponding to each character, and the representation mode is a 1×N-dimensional vector;
extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
respectively performing dimension reduction on each initial feature vector to obtain the first number of target feature vectors;
linearly combining the first number of target feature vectors to obtain a target code value;
the converting the content name data to be processed into an initial matrix vector according to a preset query dictionary includes:
matching each character in the content name data to be processed with the characters in the preset query dictionary to obtain the representation modes of the M characters;
combining the representation modes of the M characters to obtain an M×N-dimensional initial matrix vector;
the feature extraction is performed on the initial matrix vectors to obtain a first number of initial feature vectors, including:
performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors comprises:
expanding the M×N-dimensional initial matrix vector according to the size of the convolution kernel in the convolution operation to obtain an expanded initial matrix vector, and performing a convolution operation on the expanded initial matrix vector to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the step of performing dimension reduction on each initial feature vector to obtain the first number of target feature vectors includes:
respectively performing a pooling operation on each initial feature vector to obtain the first number of target feature vectors; the calculation formula of the target feature vector is:
P(r,s) = max_p{S(M-k+1, N-k+1)}; wherein S represents the initial feature vector, M-k+1 and N-k+1 represent the dimensions of the initial feature vector, P represents the target feature vector, p represents the size of the pooling region, max represents the pooling operation, and r and s represent the dimensions of the target feature vector;
the linear combination is used for analyzing the proportion of different target feature vectors in semantic information expression, distinguishing the importance of each target feature vector, integrating each target feature vector to obtain a target code value so as to keep the semantic information of the content name data;
after the dimension reduction is performed on each initial feature vector to obtain the first number of target feature vectors, the content name data processing method further includes:
screening the first number of target feature vectors, removing the same target feature vectors, and obtaining a second number of target feature vectors;
correspondingly, the linearly combining the first number of target feature vectors to obtain a target code value includes:
and linearly combining the second number of target feature vectors to obtain the target code value.
2. The content name data processing method according to claim 1, wherein a size of a pooling window in the pooling operation is 2×2.
3. A content name data processing apparatus, characterized by comprising:
the coding module is used for acquiring content name data to be processed and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; the content name data to be processed comprises M characters, the preset query dictionary comprises a representation mode corresponding to each character, and the representation mode is a 1×N-dimensional vector;
the feature extraction module is used for extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module is used for respectively reducing the dimension of each initial feature vector to obtain the first number of target feature vectors;
the linear combination module is used for carrying out linear combination on the first number of target feature vectors to obtain target code values;
the converting the content name data to be processed into an initial matrix vector according to a preset query dictionary includes:
matching each character in the content name data to be processed with the characters in the preset query dictionary to obtain the representation modes of the M characters;
combining the representation modes of the M characters to obtain an M×N-dimensional initial matrix vector;
the feature extraction is performed on the initial matrix vectors to obtain a first number of initial feature vectors, including:
performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors comprises:
expanding the M×N-dimensional initial matrix vector according to the size of the convolution kernel in the convolution operation to obtain an expanded initial matrix vector, and performing a convolution operation on the expanded initial matrix vector to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the dimension reduction module is used for respectively performing a pooling operation on each initial feature vector to obtain the first number of target feature vectors; the calculation formula of the target feature vector is:
P(r,s) = max_p{S(M-k+1, N-k+1)}; wherein S represents the initial feature vector, M-k+1 and N-k+1 represent the dimensions of the initial feature vector, P represents the target feature vector, p represents the size of the pooling region, max represents the pooling operation, and r and s represent the dimensions of the target feature vector;
The linear combination is used for analyzing the proportion of different target feature vectors in semantic information expression, distinguishing the importance of each target feature vector, integrating each target feature vector to obtain a target code value so as to keep the semantic information of the content name data;
the screening module is used for screening the first number of target feature vectors, removing the same target feature vectors and obtaining a second number of target feature vectors;
correspondingly, the linear combination module is further configured to perform linear combination on the second number of target feature vectors to obtain the target code value.
4. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the content name data processing method according to any of claims 1 to 2 when the computer program is executed.
5. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the content name data processing method according to any one of claims 1 to 2.
CN202110212680.3A 2021-02-25 2021-02-25 Content name data processing method and terminal equipment Active CN112800183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110212680.3A CN112800183B (en) 2021-02-25 2021-02-25 Content name data processing method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110212680.3A CN112800183B (en) 2021-02-25 2021-02-25 Content name data processing method and terminal equipment

Publications (2)

Publication Number Publication Date
CN112800183A (en) 2021-05-14
CN112800183B (en) 2023-09-26

Family

ID=75815847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110212680.3A Active CN112800183B (en) 2021-02-25 2021-02-25 Content name data processing method and terminal equipment

Country Status (1)

Country Link
CN (1) CN112800183B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510525B * 2022-04-18 2022-08-30 Shenzhen Fengshang Smart Agriculture and Animal Husbandry Technology Co., Ltd. Data format conversion method and device, computer equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11366990B2 (en) * 2017-05-15 2022-06-21 International Business Machines Corporation Time-series representation learning via random time warping
US20190251480A1 (en) * 2018-02-09 2019-08-15 NEC Laboratories Europe GmbH Method and system for learning of classifier-independent node representations which carry class label information
US11182559B2 (en) * 2019-03-26 2021-11-23 Siemens Aktiengesellschaft System and method for natural language processing

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1345441A (en) * 1999-12-17 2002-04-17 Sony Corp Information processor and processing method and information storage medium
CN107562729A (en) * 2017-09-14 2018-01-09 Yunnan University Party-building document representation method based on neural network and topic enhancement
CN110019793A (en) * 2017-10-27 2019-07-16 Alibaba Group Holding Ltd Text semantic encoding method and device
CN107908757A (en) * 2017-11-21 2018-04-13 Hengan Jiaxin (Beijing) Technology Co., Ltd. Website classification method and system
CN108055529A (en) * 2017-12-25 2018-05-18 State Grid Corporation of China Artificial intelligence analysis system for normalized image data from electric power unmanned aerial vehicles and robots
JP2019149161A (en) * 2018-02-27 2019-09-05 Ricoh Co., Ltd. Method for generating word expression, device, and computer-readable storage medium
CN108804423A (en) * 2018-05-30 2018-11-13 Ping An Medical and Healthcare Management Co., Ltd. Medical text feature extraction and automatic matching method and system
CN109213975A (en) * 2018-08-23 2019-01-15 Chongqing University of Posts and Telecommunications Tweet text representation method based on character-level convolutional variational autoencoding
CN109255377A (en) * 2018-08-30 2019-01-22 Beijing Xinlifang Technology Development Co., Ltd. Instrument recognition method, device, electronic equipment and storage medium
CN111666482A (en) * 2019-03-06 2020-09-15 Gree Electric Appliances Inc. of Zhuhai Query method and device, storage medium and processor
WO2020224219A1 (en) * 2019-05-06 2020-11-12 Ping An Technology (Shenzhen) Co., Ltd. Chinese word segmentation method and apparatus, electronic device and readable storage medium
CN110162601A (en) * 2019-05-22 2019-08-23 Jilin University Deep-learning-based biomedical publication submission recommendation system
CN112149710A (en) * 2019-06-28 2020-12-29 Intel Corp Machine-generated content naming in information-centric networks
CN110557439A (en) * 2019-08-07 2019-12-10 China United Network Communications Group Co., Ltd. Network content management method and blockchain content network platform
CN111339775A (en) * 2020-02-11 2020-06-26 Ping An Technology (Shenzhen) Co., Ltd. Named entity identification method, device, terminal equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Simulation of a data optimization mining model for mobile terminals in content-centric networks; Li Xiaodong; Wei Huiru; Bulletin of Science and Technology (10); full text *
Xu Jiepan. Introduction to Artificial Intelligence. China Railway Publishing House Co., Ltd., 2019, pp. 116-117. *
Hu Panpan. Natural Language Processing: From Introduction to Practice. China Railway Publishing House Co., Ltd., 2020, pp. 54-56. *
Multi-domain segment routing mechanism for software-defined content-centric networks; Li Gen; Yi Peng; Zhang Zhen; Application Research of Computers (09); full text *

Also Published As

Publication number Publication date
CN112800183A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN106549673B (en) Data compression method and device
CN110830435A (en) Method and device for extracting network flow space-time characteristics and detecting abnormity
JP6681313B2 (en) Method, computer program and system for encoding data
CN106849956B (en) Compression method, decompression method, device and data processing system
CN112800183B (en) Content name data processing method and terminal equipment
CN114614829A (en) Satellite data frame processing method and device, electronic equipment and readable storage medium
CN110769263A (en) Image compression method and device and terminal equipment
CN108880559B (en) Data compression method, data decompression method, compression equipment and decompression equipment
CN106293542B (en) Method and device for decompressing file
CN111178513B (en) Convolution implementation method and device of neural network and terminal equipment
CN111384972A (en) Optimization method and device of multi-system LDPC decoding algorithm and decoder
WO2023159820A1 (en) Image compression method, image decompression method, and apparatuses
CN104682966B (en) The lossless compression method of table data
WO2022179355A1 (en) Data processing method and apparatus for sample adaptive offset sideband compensating mode
CN115765756A (en) Lossless data compression method, system and device for high-speed transparent transmission
CN111224674B (en) Decoding method, device and decoder for multi-system LDPC code
CN111049836A (en) Data processing method, electronic device and computer readable storage medium
CN110913220A (en) Video frame coding method and device and terminal equipment
CN102395031B (en) Data compression method
CN113595557B (en) Data processing method and device
CN112686966B (en) Lossless image compression method and device
CN108989813A (en) Efficient compression/decompression method, computer device and storage medium
CN115062673B (en) Image processing method, image processing device, electronic equipment and storage medium
CN112669396B (en) Lossless image compression method and device
CN112200301B (en) Convolution computing device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant