CN114238304A - A label generation method, device, computer equipment and storage medium - Google Patents

A label generation method, device, computer equipment and storage medium

Publication number
CN114238304A
Authority
CN
China
Prior art keywords
label
tag
tags
original
data table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111601380.0A
Other languages
Chinese (zh)
Other versions
CN114238304B (en
Inventor
刘新宇
王霏
王彪
胡玉玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinguodu Digital Technology Co ltd
Original Assignee
Shenzhen Xinguodu Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xinguodu Digital Technology Co ltd
Priority to CN202111601380.0A
Publication of CN114238304A
Application granted
Publication of CN114238304B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a label generation method, device, computer equipment and storage medium. The method includes: acquiring historical data records containing labels of different categories, and constructing a label metadata table based on the labels in the historical data records; summarizing the labels in the historical data records into label configuration information in the label metadata table; extracting each label in the label metadata table based on the label configuration information, and constructing an original label data table from the extracted labels; and performing label cleaning on the original label data table through a preset label mapping table, and setting the cleaned original label data table as the final label data table. By summarizing the labels in the historical data records into label configuration information, constructing the original label data table from it, and then cleaning the original label data table to obtain the final label data table, the invention can improve label generation efficiency and label management.

Description

Label generation method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer software technologies, and in particular, to a tag generation method and apparatus, a computer device, and a storage medium.
Background
Existing label generation techniques generally rely on manually defining and organizing labels at the database field level, configuring them in the system back end, and implementing the generation logic of each label in corresponding code, which consumes considerable labor. In addition, label extraction and calculation are performed on a traditional relational database, which suffers from insufficient storage space and low computational efficiency when the data volume is large. Moreover, label management requires the metadata of each label to be maintained manually, which can lead to omissions and errors when the number of labels is large.
Disclosure of Invention
The embodiment of the invention provides a label generation method, a label generation device, computer equipment and a storage medium, and aims to improve label generation efficiency and label management effect.
In a first aspect, an embodiment of the present invention provides a tag generation method, including:
acquiring a historical data record containing different types of labels, and constructing a label metadata table based on the labels in the historical data record;
summarizing the labels in the historical data records into label configuration information in the label metadata table;
extracting each label in the label metadata table based on the label configuration information, and constructing an original label data table according to the extracted label;
and carrying out label cleaning treatment on the original label data table through a preset label mapping table, and setting the original label data table after label cleaning as a final label data table.
In a second aspect, an embodiment of the present invention provides a tag generation apparatus, including:
the data acquisition unit is used for acquiring historical data records containing different types of labels and constructing a label metadata table based on the labels in the historical data records;
the summarization unit is used for summarizing the labels in the historical data record into label configuration information in the label metadata table;
the tag extraction unit is used for extracting each tag in the tag metadata table based on the tag configuration information and constructing an original tag data table according to the extracted tag;
and the label cleaning unit is used for carrying out label cleaning treatment on the original label data table through a preset label mapping table and setting the original label data table after label cleaning as a final label data table.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the tag generation method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the tag generation method according to the first aspect.
The embodiment of the invention provides a label generation method, a label generation device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a historical data record containing different types of labels, and constructing a label metadata table based on the labels in the historical data record; summarizing the labels in the historical data records into label configuration information in the label metadata table; extracting each label in the label metadata table based on the label configuration information, and constructing an original label data table according to the extracted labels; and carrying out label cleaning treatment on the original label data table through a preset label mapping table, and setting the original label data table after label cleaning as a final label data table. According to the embodiment of the invention, the labels in the historical data record are summarized into label configuration information, the original label data table is constructed according to that configuration information, and the original label data table is then cleaned using the label mapping table to obtain the final label data table, so that label generation efficiency and label management can be improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a tag generation method according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of a tag generation apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of a tag generation method according to an embodiment of the present invention, which specifically includes: steps S101 to S104.
S101, acquiring historical data records containing different types of labels, and constructing a label metadata table based on the labels in the historical data records;
S102, summarizing the labels in the historical data record into label configuration information in the label metadata table;
S103, extracting each label in the label metadata table based on the label configuration information, and constructing an original label data table according to the extracted labels;
S104, carrying out label cleaning treatment on the original label data table through a preset label mapping table, and setting the original label data table after label cleaning as a final label data table.
In this embodiment, the labels in the historical data records are summarized into label configuration information, an original label data table is constructed according to that configuration information, and the original label data table is then cleaned using the label mapping table to obtain the final label data table, which can improve label generation efficiency and label management.
In addition, this embodiment also allows labels to be summarized into the label configuration information as custom labels, which covers complex labels that the other label types cannot define. During generation, labels with complex logic can be produced through an SQL statement.
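As a rough illustration of a custom label whose logic is an SQL statement, the sketch below uses Python's built-in sqlite3 module as a stand-in for the SQL engine. The table `enterprise_revenue`, its columns, the label name 'high-growth enterprise' and the 20% threshold are all hypothetical and are not taken from the embodiment.

```python
import sqlite3

# Hypothetical custom-label definition: the label name and its SQL rule are
# illustrative assumptions, not part of the described embodiment.
CUSTOM_TAG = {
    "tag_name": "high-growth enterprise",
    "tag_sql": (
        "SELECT enterprise_name FROM enterprise_revenue "
        "WHERE revenue_prev > 0 "
        "AND (revenue_curr - revenue_prev) * 1.0 / revenue_prev >= 0.2"
    ),
}


def compute_custom_tag(conn, tag):
    """Run the configured SQL and return (enterprise, tag_name) rows."""
    rows = conn.execute(tag["tag_sql"]).fetchall()
    return [(name, tag["tag_name"]) for (name,) in rows]


def demo():
    """Build a small in-memory table and apply the custom label to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enterprise_revenue "
        "(enterprise_name TEXT, revenue_prev REAL, revenue_curr REAL)"
    )
    conn.executemany(
        "INSERT INTO enterprise_revenue VALUES (?, ?, ?)",
        [("A Co", 100.0, 130.0), ("B Co", 100.0, 105.0), ("C Co", 0.0, 50.0)],
    )
    result = compute_custom_tag(conn, CUSTOM_TAG)
    conn.close()
    return result
```

Keeping the rule as data (a label name plus an SQL string) means a new complex label could be added by editing configuration rather than code, which is the point the paragraph above makes.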
In one embodiment, the step S102 includes:
when a label in the historical data records corresponds to multiple data values, categorizing the label as a one-column multi-label;
when a label in the historical data records corresponds to a single data value, categorizing the label as a one-column single-label;
when a label in the historical data records corresponds to an identification code, categorizing the label as a one-hot encoded label;
when a label in the historical data records is a table header name, categorizing the label as a table label;
when the data value corresponding to a label in the historical data records is obtained through calculation, categorizing the label as a custom label;
setting the one-column multi-labels, one-column single-labels, one-hot encoded labels, table labels and custom labels as the label configuration information.
In this embodiment, each label is categorized as a one-column multi-label, one-column single-label, one-hot encoded label, table label or custom label according to its data values or identifiers in the historical data records, and these five categories together form the label configuration information.
For example, suppose one column in the historical data records is 'enterprise type' and its data value is 'five hundred strong enterprise, listed company'. The value holds two labels (five hundred strong enterprise; listed company) separated by the separator ','. By configuring the separator, the labels can be split automatically, so this kind of label is defined as a 'one-column multi-label'. If one column in the historical data records is 'green channel enterprise', a qualifying enterprise stores the value 'green channel' in the corresponding field, so this kind of label can be defined as a 'one-column single-label'. If a column in the historical data records is named 'is China 500 strong' and stores 0 or 1 (1 if the enterprise is among the China 500 strong, otherwise 0), the label can be defined as a 'one-hot encoded label'. If a table in the historical data records is named 'green channel enterprise list', the table itself is the label, that is, a 'table label'. Label data that must be obtained through calculation can be defined as a 'custom label', and its specific calculation rule is defined by configuring the label calculation logic (SQL).
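The label types in the example above can be sketched as configuration entries applied to a single record. This is a minimal sketch: the `TagType` enum, the configuration keys (`type`, `column`, `separator`, `tag_name`) and the record field names are hypothetical, and custom labels are left out because they come from separately configured SQL.

```python
from enum import Enum


class TagType(Enum):
    ONE_COLUMN_MULTI = "one-column multi-label"
    ONE_COLUMN_SINGLE = "one-column single-label"
    ONE_HOT = "one-hot encoded label"
    TABLE = "table label"
    CUSTOM = "custom label"


def extract_tags(record, config):
    """Return the labels that one configuration entry yields for one record."""
    tag_type = config["type"]
    if tag_type is TagType.TABLE:
        # The table itself is the label: every record in it carries the label.
        return [config["tag_name"]]
    if tag_type is TagType.CUSTOM:
        # Custom labels come from configured calculation logic, not a column.
        raise NotImplementedError("custom labels are computed separately")
    value = record.get(config["column"])
    if value is None or value == "":
        return []
    if tag_type is TagType.ONE_COLUMN_MULTI:
        # Split on the configured separator and drop empty fragments.
        parts = str(value).split(config["separator"])
        return [p.strip() for p in parts if p.strip()]
    if tag_type is TagType.ONE_COLUMN_SINGLE:
        return [str(value).strip()]
    if tag_type is TagType.ONE_HOT:
        # A stored 1 means the label applies; anything else means it does not.
        return [config["tag_name"]] if str(value).strip() == "1" else []
    raise ValueError(f"unknown tag type: {tag_type}")


record = {
    "enterprise_type": "five hundred strong enterprise, listed company",
    "green_channel": "green channel",
    "is_china_500_strong": 1,
}
multi = extract_tags(
    record,
    {"type": TagType.ONE_COLUMN_MULTI, "column": "enterprise_type", "separator": ","},
)
```

Here `multi` holds the two labels split out of the 'enterprise type' value; the other types follow the same call pattern with their own configuration entry.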
In one embodiment, the step S103 includes:
extracting the labels in the label metadata table in batches based on the one-column multi-labels, one-column single-labels, one-hot encoded labels, table labels and custom labels;
and arranging the labels extracted in batches in order to construct the original label data table.
In this embodiment, once the labels have been categorized, every label in the label metadata table corresponds to its own label configuration information, that is, the type it belongs to. The labels in the label metadata table are then extracted in batches by type according to the label configuration information, the extracted labels are arranged by region or in extraction order, and the resulting table is set as the original label data table. The order of the different label types in the original label data table can be set freely; for example, one-column multi-labels may come before one-column single-labels, or one-hot encoded labels may come after custom labels.
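The batch extraction and ordering described above might look like the following sketch, assuming each metadata row is a dictionary with a hypothetical `tag_type` field and that the batch order is a configurable sequence of type names.

```python
from collections import defaultdict

# Hypothetical default batch order; the embodiment notes it can be set freely.
DEFAULT_ORDER = (
    "one-column multi-label",
    "one-column single-label",
    "one-hot encoded label",
    "table label",
    "custom label",
)


def build_original_tag_table(metadata_rows, type_order=DEFAULT_ORDER):
    """Group rows into one batch per label type, then concatenate the batches."""
    batches = defaultdict(list)
    for row in metadata_rows:
        batches[row["tag_type"]].append(row)
    unknown = set(batches) - set(type_order)
    if unknown:
        raise ValueError(f"label types missing from type_order: {sorted(unknown)}")
    table = []
    for tag_type in type_order:
        # Rows keep their original relative order inside each batch.
        table.extend(batches.get(tag_type, []))
    return table


rows = [
    {"tag": "a", "tag_type": "custom label"},
    {"tag": "b", "tag_type": "one-column multi-label"},
    {"tag": "c", "tag_type": "custom label"},
]
default_table = build_original_tag_table(rows)
custom_first = build_original_tag_table(
    rows, type_order=("custom label", "one-column multi-label")
)
```

Passing a different `type_order` is all that is needed to place, say, custom labels ahead of the other types.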
In an embodiment, the tag generation method further includes:
and acquiring an enterprise standard information table, and correcting and completing the enterprise basic information in the original label data table based on the enterprise standard information table so as to correctly associate each label in the original label data table with an enterprise.
In this embodiment, the enterprise basic information in the original label data table (such as the enterprise name, unified code and registration number) is corrected and completed against an existing enterprise standard information table, which contains real and accurate enterprise basic information of the same kind. This ensures that the labels in the original label data table are associated with the correct enterprises and enterprise information.
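One way to picture the correction and completion step is a lookup against the standard table by whichever identifier a row already has. In this sketch the field names (`unified_code`, `registration_no`, `name`) and the matching priority are assumptions; the embodiment does not specify how rows are matched.

```python
# Hypothetical identifying fields, tried in this order when matching a row.
KEY_FIELDS = ("unified_code", "registration_no", "name")


def correct_and_complete(tag_rows, standard_rows):
    """Overwrite or fill enterprise basic fields from the standard table."""
    # One lookup index per identifying field of the standard table.
    indexes = {
        field: {s[field]: s for s in standard_rows if s.get(field)}
        for field in KEY_FIELDS
    }
    fixed = []
    for row in tag_rows:
        match = None
        for field in KEY_FIELDS:
            match = indexes[field].get(row.get(field))
            if match is not None:
                break
        merged = dict(row)
        if match is not None:
            for field in KEY_FIELDS:
                if match.get(field):
                    merged[field] = match[field]
        # Rows without any match are kept unchanged.
        fixed.append(merged)
    return fixed


standard = [
    {"unified_code": "91440300X", "registration_no": "4403001",
     "name": "Example Tech Co., Ltd."},
]
rows = [
    {"unified_code": "91440300X", "registration_no": None,
     "name": "Exampel Tech", "tag": "listed company"},
    {"unified_code": None, "registration_no": None,
     "name": "Unknown Co", "tag": "green channel"},
]
out = correct_and_complete(rows, standard)
```

The first row has its misspelled name corrected and its missing registration number filled in, while the unmatched second row passes through untouched.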
In one embodiment, the step S104 includes:
based on the label mapping table, mapping labels with different names and the same meaning in the original label data table into a standard label;
and acquiring labels with the same name in the original label data table, and performing duplicate removal processing on the labels with the same name according to the enterprise standard information table.
In this embodiment, the label mapping table is configured with information on how labels from different data tables should be cleaned, together with the standard name of each label. For example, an enterprise may be labeled 'China 500 strong' in one data table and 'five hundred strong' in another. The two labels have different names but the same meaning, so mapping both to the standard label 'China five hundred strong' in the label mapping table ensures the consistency and accuracy of the final label. In one embodiment, a label mapping system may be constructed from the labels in the historical records and from past label configuration experience; this system determines whether a label name should be mapped and, if so, to which standard name.
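The mapping and de-duplication steps can be sketched as follows, assuming each row is an (enterprise identifier, label) pair whose identifier has already been corrected against the enterprise standard information table. The mapping entries reuse the example names above; the case-insensitive lookup is an added assumption.

```python
# Example mapping from variant label names to one standard label.
TAG_MAPPING = {
    "China 500 strong": "China five hundred strong",
    "five hundred strong": "China five hundred strong",
}


def clean_tags(rows, mapping):
    """Map variant names to standard labels, then drop repeated pairs."""
    normalized = {key.casefold(): value for key, value in mapping.items()}
    seen = set()
    cleaned = []
    for enterprise_id, tag in rows:
        name = tag.strip()
        # Labels without a mapping entry keep their own name.
        standard = normalized.get(name.casefold(), name)
        key = (enterprise_id, standard)
        if key in seen:
            continue  # same enterprise already carries this standard label
        seen.add(key)
        cleaned.append(key)
    return cleaned


raw = [
    ("E1", "china 500 strong"),
    ("E1", "five hundred strong"),
    ("E1", "listed company"),
    ("E2", "Five Hundred Strong"),
]
final = clean_tags(raw, TAG_MAPPING)
```

After cleaning, enterprise E1 carries the standard label once instead of two differently named copies.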
Further, in an embodiment, the tag generation method further includes:
setting an effective period for the label after mapping processing and de-duplication processing;
and disabling the failed label and the expired label in the original label data table.
In this embodiment, each label is given a validity status and a validity period, so that the label can be used only within that period. Once the period is exceeded, the label loses its validity and becomes a failed or expired label. Failed and expired labels can be disabled promptly, which prevents users from discovering only after use that a label was invalid or expired and thus improves the user experience.
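A minimal sketch of the disabling step, assuming each label row carries a hypothetical `valid_until` date (or None for no expiry) and an optional boolean `valid` flag marking failed labels:

```python
from datetime import date


def disable_invalid_tags(rows, today):
    """Return copies of the rows with an 'enabled' flag set per label."""
    result = []
    for row in rows:
        valid_until = row.get("valid_until")
        expired = valid_until is not None and valid_until < today
        failed = not row.get("valid", True)
        result.append({**row, "enabled": not (expired or failed)})
    return result


tags = [
    {"tag": "green channel", "valid_until": date(2024, 12, 31)},
    {"tag": "listed company", "valid_until": None},
    {"tag": "China five hundred strong", "valid_until": date(2030, 1, 1), "valid": False},
]
checked = disable_invalid_tags(tags, today=date(2025, 6, 1))
```

Passing `today` explicitly keeps the check deterministic, which makes the expiry rule easy to test.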
In an embodiment, the tag generation method further includes:
loading a Hive configuration file based on Spark SQL, and acquiring the metadata information of the Hive;
storing the tag data table as a Hive table through the metadata information of the Hive;
and correspondingly operating the Hive table based on Spark SQL.
In this embodiment, big data technology is used: the label data table is stored and processed with Hive (a data warehouse tool) and Spark (a computing engine), so that label calculation can run concurrently, the storage space occupied by the data is reduced, label generation efficiency is improved, and performance can scale approximately linearly by horizontally scaling hardware resources. Specifically, the Hive configuration file is loaded through Spark SQL to obtain the corresponding metadata information, the label data table is stored as a Hive table, and operations such as queries and updates can then be performed on the Hive table through Spark SQL. With distributed computing, millions of data labels can be generated in a very short time. These steps help ensure the accuracy and timeliness of the generated label data while requiring very little manual intervention, which greatly improves efficiency compared with existing label generation techniques.
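The Spark SQL and Hive steps could be sketched with PySpark as below. This is an assumption-laden sketch, not the embodiment's implementation: it presumes PySpark is installed, that a `hive-site.xml` is available in Spark's configuration directory, and that the table name `tag_data` and the column names are hypothetical. PySpark is imported inside the function so the helper above it can be used without a Spark installation.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name):
    """Reject table names that are not plain SQL identifiers."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def save_and_query_tags(rows, table_name="tag_data"):
    """Store (enterprise_id, tag) rows as a Hive table and count rows per tag."""
    from pyspark.sql import SparkSession  # requires a PySpark installation

    table = validate_identifier(table_name)
    spark = (
        SparkSession.builder
        .appName("tag-generation-sketch")
        .enableHiveSupport()  # uses the Hive configuration found by Spark
        .getOrCreate()
    )
    df = spark.createDataFrame(rows, schema=["enterprise_id", "tag"])
    # Persist the label data table through the Hive metastore.
    df.write.mode("overwrite").saveAsTable(table)
    # Operate on the stored Hive table through Spark SQL.
    query = f"SELECT tag, COUNT(*) AS n FROM {table} GROUP BY tag"
    return spark.sql(query).collect()
```

A call such as `save_and_query_tags([("E1", "listed company")])` would need a running Spark environment with Hive support, so it is not executed here.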
Fig. 2 is a schematic block diagram of a tag generation apparatus 200 according to an embodiment of the present invention, where the apparatus 200 includes:
the data acquisition unit 201 is used for acquiring historical data records containing different types of tags and constructing a tag metadata table based on the tags in the historical data records;
a summarization unit 202, configured to summarize the tags in the historical data record into tag configuration information in the tag metadata table;
the tag extraction unit 203 is configured to extract each tag in the tag metadata table based on the tag configuration information, and construct an original tag data table according to the extracted tag;
and the label cleaning unit 204 is configured to perform label cleaning processing on the original label data table through a preset label mapping table, and set the original label data table after label cleaning as a final label data table.
In one embodiment, the summarization unit 202 comprises:
the first induction unit is used for categorizing a label as a one-column multi-label when the label corresponds to multiple data values in the historical data record;
the second induction unit is used for categorizing a label as a one-column single-label when the label corresponds to a single data value in the historical data record;
the third induction unit is used for categorizing a label as a one-hot encoded label when the label in the historical data record corresponds to an identification code;
a fourth induction unit, configured to categorize a label as a table label when the label in the historical data record is a table header name;
a fifth induction unit, configured to categorize a label as a custom label when the data value corresponding to the label in the historical data record is obtained through calculation;
and the summarizing unit is used for setting the one-column multi-labels, one-column single-labels, one-hot encoded labels, table labels and custom labels as the label configuration information.
In one embodiment, the tag extracting unit 203 includes:
the batch extraction unit is used for extracting the tags in the tag metadata table in batches based on the one-column multi-labels, one-column single-labels, one-hot encoded labels, table labels and custom labels;
and the sequence setting unit is used for arranging the labels extracted in batches in order so as to construct the original label data table.
In one embodiment, the tag generation apparatus 200 further comprises:
and the correction and completion unit is used for acquiring an enterprise standard information table, and correcting and completing the enterprise basic information in the original label data table based on the enterprise standard information table so as to correctly associate each label in the original label data table with the enterprise.
In one embodiment, the label cleaning unit 204 includes:
the label mapping unit is used for mapping labels with different names and the same meaning in the original label data table into a standard label based on the label mapping table;
and the label duplication removing unit is used for acquiring labels with the same name in the original label data table and carrying out duplication removing processing on the labels with the same name according to the enterprise standard information table.
In one embodiment, the tag generation apparatus 200 further comprises:
the time limit setting unit is used for setting the valid time limit for the label after the mapping processing and the de-duplication processing;
and the disabling unit is used for disabling the failed label and the expired label in the original label data table.
In one embodiment, the tag generation apparatus 200 further comprises:
the file loading unit is used for loading the Hive configuration file based on Spark SQL and acquiring the metadata information of the Hive;
a data storage unit, configured to store the tag data table as a Hive table through the metadata information of Hive;
and the table operation unit is used for carrying out corresponding operation on the Hive table based on Spark SQL.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A label generation method, characterized by comprising:
acquiring historical data records containing labels of different categories, and constructing a label metadata table based on the labels in the historical data records;
summarizing the labels in the historical data records into label configuration information in the label metadata table;
extracting each label in the label metadata table based on the label configuration information, and constructing an original label data table from the extracted labels;
performing label cleaning on the original label data table using a preset label mapping table, and setting the cleaned original label data table as the final label data table.

2. The label generation method according to claim 1, characterized in that summarizing the labels in the historical data records into label configuration information in the label metadata table comprises:
when a label in the historical data records corresponds to multiple data values, classifying the label as a one-column multi-label;
when a label in the historical data records corresponds to a single data value, classifying the label as a one-column single-label;
when a label in the historical data records corresponds to an identification code, classifying the label as a one-hot encoded label;
when a label in the historical data records is a table header name, classifying the label as a table label;
when the data value corresponding to a label in the historical data records is obtained by calculation, classifying the label as a custom label;
setting the one-column multi-labels, one-column single-labels, one-hot encoded labels, table labels and custom labels as the label configuration information.

3. The label generation method according to claim 2, characterized in that extracting each label in the label metadata table based on the label configuration information, and constructing an original label data table from the extracted labels, comprises:
extracting the labels in the label metadata table in batches, based on the one-column multi-labels, one-column single-labels, one-hot encoded labels, table labels and custom labels;
arranging the batch-extracted labels in order, thereby constructing the original label data table.

4. The label generation method according to claim 1, characterized by further comprising:
acquiring an enterprise standard information table, and correcting and completing the basic enterprise information in the original label data table based on the enterprise standard information table, so that each label in the original label data table is correctly associated with its enterprise.

5. The label generation method according to claim 4, characterized in that performing label cleaning on the original label data table using a preset label mapping table, and setting the cleaned original label data table as the final label data table, comprises:
mapping, based on the label mapping table, labels in the original label data table that have different names but the same meaning to one standard label;
acquiring labels with the same name in the original label data table, and deduplicating the labels with the same name according to the enterprise standard information table.

6. The label generation method according to claim 5, characterized by further comprising:
setting a validity period for the labels that have undergone mapping and deduplication;
disabling invalid labels and expired labels in the original label data table.

7. The label generation method according to claim 1, characterized by further comprising:
loading the Hive configuration file via Spark SQL, and obtaining the Hive metadata information;
storing the label data table as a Hive table using the Hive metadata information;
performing corresponding operations on the Hive table via Spark SQL.

8. A label generation device, characterized by comprising:
a data acquisition unit, configured to acquire historical data records containing labels of different categories, and to construct a label metadata table based on the labels in the historical data records;
a summarization unit, configured to summarize the labels in the historical data records into label configuration information in the label metadata table;
a label extraction unit, configured to extract each label in the label metadata table based on the label configuration information, and to construct an original label data table from the extracted labels;
a label cleaning unit, configured to perform label cleaning on the original label data table using a preset label mapping table, and to set the cleaned original label data table as the final label data table.

9. A computer device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the label generation method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the label generation method according to any one of claims 1 to 7.

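
The cleaning steps of claims 5 and 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the row layout (`enterprise`, `label`, `expires`), the sample mapping entries, and the function name `clean_labels` are all hypothetical, and deduplication is simplified here to "one standard label per enterprise" rather than a lookup against an enterprise standard information table.

```python
from datetime import date

# Hypothetical label mapping table: variant label name -> standard label name.
LABEL_MAPPING = {
    "hi-tech enterprise": "high-tech enterprise",
    "high tech company": "high-tech enterprise",
}


def clean_labels(rows, mapping, today):
    """Map synonymous labels to a standard label, drop duplicates, flag expiry.

    rows:    list of dicts with keys 'enterprise', 'label', 'expires'
             ('expires' is a datetime.date or None for "no expiry").
    mapping: dict from variant label name to standard label name.
    today:   datetime.date used to decide whether a label has expired.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Labels with different names but the same meaning share one standard name.
        standard = mapping.get(row["label"], row["label"])
        key = (row["enterprise"], standard)
        if key in seen:
            # Same enterprise, same standard label: keep only the first occurrence.
            continue
        seen.add(key)
        expires = row.get("expires")
        cleaned.append({
            "enterprise": row["enterprise"],
            "label": standard,
            "expires": expires,
            # Expired labels are kept but marked as disabled.
            "enabled": expires is None or expires >= today,
        })
    return cleaned


# Hypothetical sample rows standing in for the original label data table.
sample_rows = [
    {"enterprise": "A", "label": "hi-tech enterprise", "expires": None},
    {"enterprise": "A", "label": "high tech company", "expires": None},
    {"enterprise": "B", "label": "listed company", "expires": date(2021, 6, 30)},
]
final_rows = clean_labels(sample_rows, LABEL_MAPPING, date(2022, 1, 1))
```

In this sketch the two variant labels for enterprise A collapse into a single standard label, and the label for enterprise B is retained but marked disabled because its assumed validity period ended before the reference date.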
CN202111601380.0A 2021-12-24 2021-12-24 A label generation method, device, computer equipment and storage medium Active CN114238304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111601380.0A CN114238304B (en) 2021-12-24 2021-12-24 A label generation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111601380.0A CN114238304B (en) 2021-12-24 2021-12-24 A label generation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114238304A (en) 2022-03-25
CN114238304B (en) 2025-03-28

Family

ID=80762770

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202111601380.0A Active CN114238304B (en) 2021-12-24 2021-12-24 A label generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114238304B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060067334A1 (en) * 2004-08-18 2006-03-30 Ougarov Andrei V System and methods for dynamic generation of point / tag configurations
CN101266679A (en) * 2008-02-15 2008-09-17 上海申通轨道交通研究咨询有限公司 A track traffic auxiliary decision system and method
US20150142764A1 (en) * 2013-11-20 2015-05-21 International Business Machines Corporation Language tag management on international data storage
CN108614862A (en) * 2018-03-28 2018-10-02 国家计算机网络与信息安全管理中心 Real-time tag treating method and apparatus based on stream calculation engine
US20200026710A1 (en) * 2018-07-19 2020-01-23 Bank Of Montreal Systems and methods for data storage and processing
CN111061733A (en) * 2019-12-10 2020-04-24 北京明略软件系统有限公司 Data processing method, apparatus, electronic device and computer-readable storage medium
CN111324647A (en) * 2020-01-21 2020-06-23 北京东方金信科技有限公司 Method and device for generating ETL code
CN111813770A (en) * 2020-09-03 2020-10-23 平安国际智慧城市科技股份有限公司 Data model construction method and device and computer readable storage medium
CN112579638A (en) * 2019-09-29 2021-03-30 北京国双科技有限公司 Behavior tag information processing method and device, computer equipment and storage medium
CN112613567A (en) * 2020-12-28 2021-04-06 江苏满运物流信息有限公司 User label management method, system, device and storage medium
CN113485987A (en) * 2021-06-30 2021-10-08 中国建设银行股份有限公司 Enterprise information tag generation method and device

Also Published As

Publication number Publication date
CN114238304B (en) 2025-03-28

Similar Documents

Publication Title
CN112199366B (en) Data table processing method, device and equipment
CN103995879B (en) Data query method, apparatus and system based on OLAP system
CN110704411A (en) Knowledge graph building method and device suitable for art field and electronic equipment
CN111639066A (en) Data cleaning method and device
CN103514201A (en) Method and device for querying data in non-relational database
WO2019165671A1 (en) Method for rapidly importing big data, apparatus, terminal device, and storage medium
CN112559726A (en) Resume information filtering method, model training method, device, equipment and medium
CN112241445B (en) Labeling method and device, electronic equipment and storage medium
CN108255877B (en) Storage method and device of referee document
CN112948473A (en) Data processing method, device and system of data warehouse and storage medium
US8438153B2 (en) Performing database joins
CN113326276A (en) Graph database updating method and device
CN112416904A (en) Electric power data standardization processing method and device
CN115470290A (en) Increment synchronization method and device based on materialized view logs and computer equipment
CN112241399B (en) NoSQL-based PSD-BPA data analysis and management method and system
CN110147396B (en) A method and device for generating a mapping relationship
CN114238304A (en) A label generation method, device, computer equipment and storage medium
CN113672653A (en) Method and apparatus for identifying private data in a database
CN112612810A (en) Slow SQL statement identification method and system
CN116662327B (en) Data fusion cleaning method for database
CN118013364A (en) A method for intelligent identification of multidimensional data
CN108647243B (en) Industrial big data storage method based on time series
CN111797108A (en) A method and device for updating an analysis database
CN104573095A (en) Large-scale object recognition method based on Hadoop frame
CN106599112A (en) Massive incomplete data storage and operation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant