CN112800183B - Content name data processing method and terminal equipment - Google Patents

Content name data processing method and terminal equipment

Info

Publication number
CN112800183B
CN112800183B CN202110212680.3A
Authority
CN
China
Prior art keywords
initial
vector
name data
feature vectors
target feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110212680.3A
Other languages
Chinese (zh)
Other versions
CN112800183A (en)
Inventor
范辉
马天祥
曾四鸣
贾伯岩
罗蓬
李卓
彭鹏
王彬志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202110212680.3A
Publication of CN112800183A
Application granted
Publication of CN112800183B

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application is applicable to the technical field of data processing, and provides a content name data processing method and terminal equipment. The method comprises: acquiring content name data to be processed, and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; performing feature extraction on the initial matrix vector to obtain a first number of initial feature vectors; respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors; and linearly combining the first number of target feature vectors to obtain a target code value. By further integrating the target feature vectors obtained after feature extraction and dimension reduction, the method retains semantic information to the greatest extent, reduces data storage consumption, and can meet the application requirements of a variety of scenarios.

Description

Content name data processing method and terminal equipment
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a content name data processing method and terminal equipment.
Background
The content-centric network is a new Internet technology developed in recent years. It carries out communication using variable-length, borderless content name data instead of IP addresses; its unique routing-node cache mechanism and dedicated forwarding-plane structure achieve data sharing in the true sense, satisfy communication requirements such as mobility and high reliability, and greatly improve the data transmission efficiency of mobile communication.
In the prior art, methods for processing variable-length, borderless content name data either lose a large amount of content name information or incur heavy storage consumption, and therefore cannot meet actual demands.
Disclosure of Invention
In view of this, the embodiments of the application provide a content name data processing method and terminal equipment, so as to solve the problems in the prior art that methods for processing variable-length, borderless content name data lose a large amount of information, consume excessive storage, and cannot meet actual demands.
A first aspect of an embodiment of the present application provides a content name data processing method, including:
acquiring the name data of the content to be processed, and converting the name data of the content to be processed into an initial matrix vector according to a preset query dictionary;
extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors;
and linearly combining the first number of target feature vectors to obtain target code values.
A second aspect of an embodiment of the present application provides a content name data processing apparatus, including:
the coding module is used for acquiring the name data of the content to be processed and converting the name data of the content to be processed into an initial matrix vector according to a preset query dictionary;
the feature extraction module is used for extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module is used for respectively reducing the dimension of each initial feature vector to obtain a first number of target feature vectors;
and the linear combination module is used for carrying out linear combination on the first number of target feature vectors to obtain target code values.
A third aspect of the embodiments of the present application provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the content name data processing method provided in the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the content name data processing method as provided in the first aspect of the embodiments of the present application.
The embodiment of the application provides a content name data processing method, comprising: acquiring content name data to be processed, and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; performing feature extraction on the initial matrix vector to obtain a first number of initial feature vectors; respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors; and linearly combining the first number of target feature vectors to obtain a target code value. By further integrating the target feature vectors obtained after feature extraction and dimension reduction, the embodiment of the application retains semantic information to the greatest extent, reduces data storage consumption, and can meet the application requirements of a variety of scenarios.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic implementation flow diagram of a content name data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a content name data processing device according to an embodiment of the present application;
fig. 3 is a schematic diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation rather than limitation, specific details such as particular system architectures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solutions of the application, the following description is given by way of specific examples.
Referring to fig. 1, an embodiment of the present application provides a content name data processing method, including:
s101: acquiring the name data of the content to be processed, and converting the name data of the content to be processed into an initial matrix vector according to a preset query dictionary;
s102: extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
s103: respectively performing dimension reduction on each initial feature vector to obtain a first number of target feature vectors;
s104: and linearly combining the first number of target feature vectors to obtain target code values.
All possible characters are encoded, each character corresponding to a unique code value, to form the preset query dictionary. In the embodiment of the application, the characters in the content name data to be processed are matched one by one, in order, against the preset query dictionary to obtain the code value corresponding to each character. For example, each character in the preset query dictionary is represented as an N-dimensional vector containing only 0s and 1s; character a may be represented as a 1×N-dimensional vector (1, 0 …, 0), and each character corresponds to a unique vector. If the actual character string of the content name data to be processed has length M, character-by-character matching through the query dictionary converts the input content name data to be processed into an M×N-dimensional initial matrix vector.
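By way of illustration only — the embodiment discloses no source code — the dictionary-based conversion could be sketched in Python as follows; the character set, its size N, and all identifiers are assumptions for demonstration:

```python
# Illustrative sketch of S101: dictionary-based one-hot conversion.
# The alphabet, N, and the function name are assumptions, not part of the patent.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789/-._"  # assumed character set
N = len(ALPHABET)                                       # vector width per character
QUERY_DICT = {ch: i for i, ch in enumerate(ALPHABET)}   # preset query dictionary

def to_initial_matrix(name: str) -> np.ndarray:
    """Convert M-character content name data into an M x N one-hot matrix."""
    matrix = np.zeros((len(name), N))
    for row, ch in enumerate(name):
        matrix[row, QUERY_DICT[ch]] = 1.0  # each character maps to a unique 1 x N vector
    return matrix

X = to_initial_matrix("video/news-01.mp4")
print(X.shape)  # (17, 40): M = 17 characters, N = 40 dictionary entries
```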
The linear combination further integrates the target feature vectors: it analyzes the proportion that different feature vectors contribute to the expression of semantic information, removes repeated features, distinguishes the importance of each feature vector, expands the matrix vector, and linearly combines the features to obtain the most compact target code value. The linear combination mode can be set according to practical application requirements.
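A minimal sketch of one possible linear combination is given below; because the combination mode is left configurable by the embodiment, the scalar reduction and the importance weights used here are assumptions:

```python
# Illustrative sketch of S104: weight each target feature vector by an assumed
# importance and sum into a single code value. The reduction (sum) and the
# weights are placeholders; the patent leaves the combination mode open.
import numpy as np

def linear_combine(target_features: list, weights: np.ndarray) -> float:
    """Reduce each target feature vector to a scalar, then form a weighted sum."""
    scalars = np.array([f.sum() for f in target_features])
    return float(np.dot(weights, scalars))

rng = np.random.default_rng(0)
features = [rng.standard_normal((7, 19)) for _ in range(8)]  # stand-in target features
weights = np.linspace(1.0, 0.1, num=8)                       # assumed importance weights
code_value = linear_combine(features, weights)
```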
In the embodiment of the application, the content name data to be processed is first encoded, and the target feature vectors obtained after feature extraction and dimension reduction are then further integrated. Semantic information is retained to the greatest extent while different features are combined, so that the code values of different content name data are clearly distinguished according to their semantic information, data storage consumption is reduced, and retrieval efficiency is improved. Meanwhile, because the embodiment of the application adopts character encoding and is applied at the forwarding plane of the content-centric network, it can meet the application requirements of a variety of scenarios.
In some embodiments, S102 may include:
s1021: and carrying out convolution operation on the initial matrix vectors to obtain a first number of initial feature vectors.
In the embodiment of the application, feature extraction may be performed on the initial matrix vector through a convolution operation to obtain a fixed first number of initial feature vectors with different connotations. For example, if the initial matrix vector is an M×N-dimensional matrix vector, convolving it step by step with x (the first number) convolution kernels of k×k dimensions yields x initial feature vectors of (M-k+1)×(N-k+1) dimensions, thereby extracting different features of the initial matrix vector. Different numbers of convolution kernels can be set according to actual application requirements to perform multiple convolutions, moderately reducing the dimension of the matrix vector and further extracting different features of the initial matrix vector.
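A sketch of this convolution step, under stated assumptions (valid, no-padding convolution with randomly initialized kernels; the embodiment does not specify how kernel weights are obtained), might read:

```python
# Illustrative sketch of S1021: valid 2-D convolution of the M x N one-hot
# matrix with x k x k kernels, giving x feature maps of (M-k+1) x (N-k+1)
# dimensions. Random kernels and the stand-in input are assumptions.
import numpy as np

def convolve_valid(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    M, N = X.shape
    k = W.shape[0]
    S = np.zeros((M - k + 1, N - k + 1))
    for i in range(M - k + 1):
        for j in range(N - k + 1):
            # S(i,j) = sum_m sum_n X(i+m, j+n) * W(m,n)
            S[i, j] = np.sum(X[i:i + k, j:j + k] * W)
    return S

rng = np.random.default_rng(0)
X = (rng.random((16, 40)) < 0.05).astype(float)  # stand-in 16 x 40 one-hot-like matrix
kernels = rng.standard_normal((8, 3, 3))         # x = 8 kernels of 3 x 3
initial_features = [convolve_valid(X, W) for W in kernels]
print(initial_features[0].shape)                 # (14, 38) = (M-k+1, N-k+1)
```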
In some embodiments, before S102, the method may further include:
s105: and expanding the initial matrix vector according to the size of the convolution kernel in the convolution operation to obtain an expanded initial matrix vector.
Accordingly, S1021 may include:
and performing convolution operation on the expanded initial matrix vectors to obtain a first number of initial feature vectors.
The dimension of the initial matrix vector should be adapted to the size of the convolution kernel; to improve the applicability of the method, the size of the convolution kernel is fixed in the embodiment of the present application. Therefore, the maximum length L1 over a large volume of real name data and the number L2 of all distinct characters are counted, and the initial matrix vector is expanded, with positions beyond the original matrix filled with zero vectors, to form an L1×L2-dimensional matrix vector.
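The expansion step could be sketched as follows; the concrete values of L1 and L2 are assumptions, since in the embodiment they are derived from statistics over real name data:

```python
# Illustrative sketch of S105: expand the M x N matrix to fixed bounds L1 x L2
# by zero-filling beyond the original entries. L1 = 64 and L2 = 40 are assumed.
import numpy as np

def expand(X: np.ndarray, L1: int, L2: int) -> np.ndarray:
    padded = np.zeros((L1, L2))
    padded[:X.shape[0], :X.shape[1]] = X  # positions beyond X stay zero vectors
    return padded

X = np.eye(17, 40)                 # stand-in 17 x 40 initial matrix vector
X_expanded = expand(X, L1=64, L2=40)
print(X_expanded.shape)            # (64, 40)
```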
In some embodiments, the size of the convolution kernel in the convolution operation is 3×3.
The choice of parameters in the convolution operation determines the breadth and depth of feature extraction. The number of convolution kernels determines the number of extracted feature vectors: too many kernels lead to redundant feature vectors, while too few lead to incomplete feature extraction. The semantic information of the content name data in the practical application therefore needs to be analyzed, and a reasonable number of convolution kernels selected. In the embodiment of the application, 3×3 convolution kernels are adopted for feature extraction, which ensures the completeness of feature extraction.
In some embodiments, the calculation formula for the initial feature vector is:
S(i,j) = Σ_m Σ_n X(i+m, j+n)·W(m,n)
where X is the initial matrix vector, W is the convolution kernel, i and j index the dimensions of matrix vector X, and m and n index the dimensions of convolution kernel W. In the embodiment of the application, a two-dimensional convolution operation is adopted for feature extraction.
In some embodiments, S103 may include:
s1031: and respectively carrying out pooling operation on each initial feature vector to obtain a first number of target feature vectors.
A maximum pooling operation with pooling region size p is performed on the x (M-k+1)×(N-k+1)-dimensional matrix vectors obtained after convolution, finally yielding x matrix vectors of [(M-k+1)×(N-k+1)]/p elements. The pooling operation performs feature screening, redundancy removal, and fusion; its parameters can be set according to actual needs, so that redundant information is removed and the feature vector dimension is reduced while semantic information is retained as much as possible.
In some embodiments, the size of the pooling window in the pooling operation is 2×2.
The choice of parameters in the pooling operation determines the feature extraction capacity and the degree of redundancy elimination. The size of the pooling window determines how much redundancy is removed during feature extraction: if the window is too large, some important vector values are lost; if it is too small, too many redundant values are retained. An appropriate window size is therefore needed to remove redundancy while retaining important information to the greatest extent, for example a pooling window size of 2×2.
In some embodiments, the calculation formula of the target feature vector is:
P(r,s) = max_p{S(q,l)}
wherein S is the initial feature vector, P is the target feature vector, p is the pooling region size, r and s are the dimensions of the target feature vector, and q and l are the dimensions of the initial feature vector.
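A sketch of this pooling-based dimension reduction is given below, assuming non-overlapping 2×2 windows and truncation of ragged edges (both assumptions made for simplicity):

```python
# Illustrative sketch of S1031: 2 x 2 max pooling over each initial feature
# map, i.e. P(r, s) is the maximum of S over the pooling region.
import numpy as np

def max_pool(S: np.ndarray, p: int = 2) -> np.ndarray:
    rows, cols = S.shape[0] // p, S.shape[1] // p
    P = np.zeros((rows, cols))
    for r in range(rows):
        for s in range(cols):
            P[r, s] = S[r * p:(r + 1) * p, s * p:(s + 1) * p].max()
    return P

rng = np.random.default_rng(0)
initial_features = [rng.standard_normal((14, 38)) for _ in range(8)]  # stand-ins
target_features = [max_pool(S) for S in initial_features]
print(target_features[0].shape)  # (7, 19)
```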
In some embodiments, after S103, the content name data processing method further includes:
s106: screening the first number of target feature vectors, removing the same target feature vectors, and obtaining a second number of target feature vectors;
accordingly, S104 may include:
s1041: and linearly combining the second number of target feature vectors to obtain target code values.
In the embodiment of the application, the target feature vectors are screened and redundant data removed, improving computational efficiency.
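The screening step could be sketched as below; exact byte-level equality as the duplicate test is an assumption, the embodiment only requiring that identical target feature vectors be removed:

```python
# Illustrative sketch of S106: drop identical target feature vectors before
# the linear combination, keeping the first occurrence of each.
import numpy as np

def screen(target_features: list) -> list:
    unique, seen = [], set()
    for f in target_features:
        key = f.tobytes()  # hashable fingerprint of the vector's contents
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

a = np.arange(6.0).reshape(2, 3)
second_features = screen([a, a.copy(), a + 1.0])  # duplicate removed
print(len(second_features))  # 2
```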
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Referring to fig. 2, an embodiment of the present application further provides a content name data processing apparatus, including:
the encoding module 21 is configured to obtain content name data to be processed, and convert the content name data to be processed into an initial matrix vector according to a preset query dictionary;
the feature extraction module 22 is configured to perform feature extraction on the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module 23 is configured to reduce dimensions of each initial feature vector to obtain a first number of target feature vectors;
the linear combination module 24 is configured to perform linear combination on the first number of target feature vectors to obtain a target code value.
In some embodiments, feature extraction module 22 may include:
the convolution unit 221 is configured to perform a convolution operation on the initial matrix vectors to obtain a first number of initial feature vectors.
In some embodiments, the size of the convolution kernel in the convolution operation is 3×3.
In some embodiments, the calculation formula for the initial feature vector is:
S(i,j) = Σ_m Σ_n X(i+m, j+n)·W(m,n)
where X is the initial matrix vector, W is the convolution kernel, i and j index the dimensions of matrix vector X, and m and n index the dimensions of convolution kernel W.
In some embodiments, the dimension reduction module 23 may include:
the pooling unit 231 is configured to perform pooling operation on each initial feature vector, so as to obtain a first number of target feature vectors.
In some embodiments, the size of the pooling window in the pooling operation is 2×2.
In some embodiments, the calculation formula of the target feature vector is:
P(r,s) = max_p{S(q,l)}
wherein S is the initial feature vector, P is the target feature vector, p is the pooling region size, r and s are the dimensions of the target feature vector, and q and l are the dimensions of the initial feature vector.
In some embodiments, the content name data processing apparatus may further include:
a screening module 26, configured to screen the first number of target feature vectors and remove identical target feature vectors to obtain a second number of target feature vectors;
correspondingly, the linear combination module 24 is further configured to perform linear combination on the second number of target feature vectors to obtain a target code value.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated by example; in practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the terminal device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiment may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above device, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 3 is a schematic block diagram of a terminal device according to an embodiment of the present application. As shown in fig. 3, the terminal device 4 of this embodiment includes: one or more processors 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40. The steps in the respective embodiments of the content name data processing method described above, such as steps S101 to S104 shown in fig. 1, are implemented when the processor 40 executes the computer program 42. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units of the embodiment of the content name data processing device described above, such as the functions of the modules 21 to 24 shown in fig. 2.
Illustratively, the computer program 42 may be partitioned into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be partitioned into an encoding module 21, a feature extraction module 22, a dimension reduction module 23, and a linear combination module 24.
The encoding module 21 is configured to obtain content name data to be processed, and convert the content name data to be processed into an initial matrix vector according to a preset query dictionary;
the feature extraction module 22 is configured to perform feature extraction on the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module 23 is configured to reduce dimensions of each initial feature vector to obtain a first number of target feature vectors;
the linear combination module 24 is configured to perform linear combination on the first number of target feature vectors to obtain a target code value.
The terminal device 4 includes, but is not limited to, a processor 40 and a memory 41. It will be appreciated by those skilled in the art that fig. 3 is only an example of a terminal device and does not constitute a limitation of the terminal device 4, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device 4 may also include an input device, an output device, a network access device, a bus, and the like.
The processor 40 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 41 may also be an external storage device of the terminal device, such as a plug-in hard disk provided on the terminal device, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like. Further, the memory 41 may also include both an internal storage unit of the terminal device and an external storage device. The memory 41 is used for storing a computer program 42 and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed terminal device and method may be implemented in other manners. For example, the above-described terminal device embodiments are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods in the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (5)

1. A content name data processing method, characterized by comprising:
acquiring content name data to be processed, and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; the content name data to be processed comprises M characters, the preset query dictionary comprises a representation mode corresponding to each character, and the representation mode is a 1×N-dimensional vector;
extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
respectively performing dimension reduction on each initial feature vector to obtain the first number of target feature vectors;
linearly combining the first number of target feature vectors to obtain a target code value;
the converting the content name data to be processed into an initial matrix vector according to a preset query dictionary includes:
matching each character in the content name data to be processed with the characters in the preset query dictionary to obtain the representation modes of the M characters;
combining the representation modes of the M characters to obtain an M×N-dimensional initial matrix vector;
the feature extraction is performed on the initial matrix vectors to obtain a first number of initial feature vectors, including:
performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors comprises:
expanding the M×N-dimensional initial matrix vector according to the size of the convolution kernel in the convolution operation to obtain an expanded initial matrix vector, and performing a convolution operation on the expanded initial matrix vector to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the step of performing dimension reduction on each initial feature vector to obtain the first number of target feature vectors includes:
respectively performing a pooling operation on each initial feature vector to obtain the first number of target feature vectors; the calculation formula of the target feature vector is:
P(r,s) = max_p{S(M-k+1, N-k+1)}; wherein S represents the initial feature vector, M-k+1 and N-k+1 represent the dimensions of the initial feature vector, P represents the target feature vector, p represents the size of the pooling region, max represents the pooling operation, and r and s represent the dimensions of the target feature vector;
the linear combination is used for analyzing the proportion of different target feature vectors in semantic information expression, distinguishing the importance of each target feature vector, integrating each target feature vector to obtain a target code value so as to keep the semantic information of the content name data;
after the dimension reduction is performed on each initial feature vector to obtain the first number of target feature vectors, the content name data processing method further includes:
screening the first number of target feature vectors, removing the same target feature vectors, and obtaining a second number of target feature vectors;
correspondingly, the linearly combining the first number of target feature vectors to obtain a target code value includes:
and linearly combining the second number of target feature vectors to obtain the target code value.
2. The content name data processing method according to claim 1, wherein a size of a pooling window in the pooling operation is 2×2.
3. A content name data processing apparatus, characterized by comprising:
the coding module is used for acquiring content name data to be processed and converting the content name data to be processed into an initial matrix vector according to a preset query dictionary; the content name data to be processed comprises M characters, the preset query dictionary comprises a representation mode corresponding to each character, and the representation mode is a 1×N-dimensional vector;
the feature extraction module is used for extracting features of the initial matrix vectors to obtain a first number of initial feature vectors;
the dimension reduction module is used for respectively reducing the dimension of each initial feature vector to obtain the first number of target feature vectors;
the linear combination module is used for carrying out linear combination on the first number of target feature vectors to obtain target code values;
the converting the content name data to be processed into an initial matrix vector according to a preset query dictionary includes:
matching each character in the content name data to be processed with the characters in the preset query dictionary to obtain the representation modes of the M characters;
combining the representation modes of the M characters to obtain an M×N-dimensional initial matrix vector;
the feature extraction is performed on the initial matrix vectors to obtain a first number of initial feature vectors, including:
performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the performing a convolution operation on the M×N-dimensional initial matrix vector with the first number of k×k-dimensional convolution kernels to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors comprises:
expanding the M×N-dimensional initial matrix vector according to the size of the convolution kernel in the convolution operation to obtain an expanded initial matrix vector, and performing a convolution operation on the expanded initial matrix vector to obtain the first number of (M-k+1)×(N-k+1)-dimensional initial feature vectors;
the dimension reduction module is used for respectively performing a pooling operation on each initial feature vector to obtain the first number of target feature vectors; the calculation formula of the target feature vector is:
P(r,s) = max_p{S(M-k+1, N-k+1)}; wherein S represents the initial feature vector, M-k+1 and N-k+1 represent the dimensions of the initial feature vector, P represents the target feature vector, p represents the size of the pooling region, max represents the pooling operation, and r and s represent the dimensions of the target feature vector;
The linear combination is used for analyzing the proportion of different target feature vectors in semantic information expression, distinguishing the importance of each target feature vector, integrating each target feature vector to obtain a target code value so as to keep the semantic information of the content name data;
the screening module is used for screening the first number of target feature vectors, removing the same target feature vectors and obtaining a second number of target feature vectors;
correspondingly, the linear combination module is further configured to perform linear combination on the second number of target feature vectors to obtain the target code value.
4. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the content name data processing method according to any of claims 1 to 2 when the computer program is executed.
5. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the content name data processing method according to any one of claims 1 to 2.
CN202110212680.3A 2021-02-25 2021-02-25 Content name data processing method and terminal equipment Active CN112800183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110212680.3A CN112800183B (en) 2021-02-25 2021-02-25 Content name data processing method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110212680.3A CN112800183B (en) 2021-02-25 2021-02-25 Content name data processing method and terminal equipment

Publications (2)

Publication Number Publication Date
CN112800183A (en) 2021-05-14
CN112800183B (en) 2023-09-26

Family

ID=75815847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110212680.3A Active CN112800183B (en) 2021-02-25 2021-02-25 Content name data processing method and terminal equipment

Country Status (1)

Country Link
CN (1) CN112800183B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510525B * 2022-04-18 2022-08-30 Shenzhen Fengshang Smart Agriculture and Animal Husbandry Technology Co., Ltd. Data format conversion method and device, computer equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11366990B2 (en) * 2017-05-15 2022-06-21 International Business Machines Corporation Time-series representation learning via random time warping
US20190251480A1 (en) * 2018-02-09 2019-08-15 NEC Laboratories Europe GmbH Method and system for learning of classifier-independent node representations which carry class label information
US11182559B2 (en) * 2019-03-26 2021-11-23 Siemens Aktiengesellschaft System and method for natural language processing

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1345441A (en) * 1999-12-17 2002-04-17 Sony Corp Information processor and processing method and information storage medium
CN107562729A (en) * 2017-09-14 2018-01-09 Yunnan University Party-building document representation method based on neural network and topic enhancement
CN110019793A (en) * 2017-10-27 2019-07-16 Alibaba Group Holding Ltd Text semantic encoding method and device
CN107908757A (en) * 2017-11-21 2018-04-13 Hengan Jiaxin (Beijing) Technology Co., Ltd. Website classification method and system
CN108055529A (en) * 2017-12-25 2018-05-18 State Grid Corporation of China Artificial intelligence analysis system for normalized image data from electric power unmanned aerial vehicles and robots
JP2019149161A (en) * 2018-02-27 2019-09-05 Ricoh Co., Ltd. Method for generating word expression, device, and computer-readable storage medium
CN108804423A (en) * 2018-05-30 2018-11-13 Ping An Medical and Healthcare Management Co., Ltd. Medical text feature extraction and automatic matching method and system
CN109213975A (en) * 2018-08-23 2019-01-15 Chongqing University of Posts and Telecommunications Tweet text representation method based on character-level convolutional variational autoencoding
CN109255377A (en) * 2018-08-30 2019-01-22 Beijing Xinlifang Technology Development Co., Ltd. Instrument recognition method, device, electronic equipment and storage medium
CN111666482A (en) * 2019-03-06 2020-09-15 Gree Electric Appliances Inc. of Zhuhai Query method and device, storage medium and processor
WO2020224219A1 (en) * 2019-05-06 2020-11-12 Ping An Technology (Shenzhen) Co., Ltd. Chinese word segmentation method and apparatus, electronic device and readable storage medium
CN110162601A (en) * 2019-05-22 2019-08-23 Jilin University Deep-learning-based biomedical publication submission recommendation system
CN112149710A (en) * 2019-06-28 2020-12-29 Intel Corp Machine-generated content naming in information-centric networks
CN110557439A (en) * 2019-08-07 2019-12-10 China United Network Communications Group Co., Ltd. Network content management method and blockchain content network platform
CN111339775A (en) * 2020-02-11 2020-06-26 Ping An Technology (Shenzhen) Co., Ltd. Named entity identification method, device, terminal equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Simulation of a data optimization mining model for mobile terminals in content-centric networks; Li Xiaodong; Wei Huiru; Bulletin of Science and Technology (10); full text *
Xu Jiepan. Introduction to Artificial Intelligence. China Railway Publishing House Co., Ltd., 2019, pp. 116-117. *
Hu Panpan. Natural Language Processing: From Introduction to Practice. China Railway Publishing House Co., Ltd., 2020, pp. 54-56. *
Multi-domain segment routing mechanism for software-defined content-centric networks; Li Gen; Yi Peng; Zhang Zhen; Application Research of Computers (09); full text *

Also Published As

Publication number Publication date
CN112800183A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN106549673B (en) Data compression method and device
CN110830435A (en) Method and device for extracting network flow space-time characteristics and detecting abnormity
JP6681313B2 (en) Method, computer program and system for encoding data
CN106849956B (en) Compression method, decompression method, device and data processing system
CN112800183B (en) Content name data processing method and terminal equipment
CN114614829A (en) Satellite data frame processing method and device, electronic equipment and readable storage medium
CN110769263A (en) Image compression method and device and terminal equipment
CN108880559B (en) Data compression method, data decompression method, compression equipment and decompression equipment
CN106293542B (en) Method and device for decompressing file
CN111178513B (en) Convolution implementation method and device of neural network and terminal equipment
CN111384972A (en) Optimization method and device of multi-system LDPC decoding algorithm and decoder
WO2023159820A1 (en) Image compression method, image decompression method, and apparatuses
CN104682966B (en) The lossless compression method of table data
WO2022179355A1 (en) Data processing method and apparatus for sample adaptive offset sideband compensating mode
CN115765756A (en) Lossless data compression method, system and device for high-speed transparent transmission
CN111224674B (en) Decoding method, device and decoder for multi-system LDPC code
CN111049836A (en) Data processing method, electronic device and computer readable storage medium
CN110913220A (en) Video frame coding method and device and terminal equipment
CN102395031B (en) Data compression method
CN113595557B (en) Data processing method and device
CN112686966B (en) Lossless image compression method and device
CN108989813A (en) Efficient compression/decompression method, computer device and storage medium
CN115062673B (en) Image processing method, image processing device, electronic equipment and storage medium
CN112669396B (en) Lossless image compression method and device
CN112200301B (en) Convolution computing device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant