CN117807177A - Data processing method, device, equipment and storage medium
- Publication number
- CN117807177A (application number CN202211157393.8A)
- Authority
- CN
- China
- Prior art keywords
- features
- metadata
- data set
- data
- determining
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Data Mining & Analysis (AREA)
- Accounting & Taxation (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Machine Translation (AREA)
Abstract
Description
Technical field
The embodiments of the present application relate to the field of data processing technology, and in particular to a data processing method, apparatus, device, and storage medium.
Background
Data from multiple systems can form a data set. The data set can be processed to build a resource directory, through which the data in those systems can be shared.
In the related art, after a data set is obtained, technicians process it manually to determine the corresponding resource directory. However, when the data set is large, technicians cannot determine the resource directory from it within a short time, so the resource directory is determined inefficiently.
Summary of the invention
Embodiments of the present application provide a data processing method, apparatus, device, and storage medium to address the low efficiency of determining a resource directory.
In a first aspect, an embodiment of the present application provides a data processing method, including:
obtaining a data set to be processed, the data set including data obtained from multiple devices;
performing feature extraction on the data set to obtain metadata features and semantic features of the data set;
obtaining user features and preset policy features corresponding to the data set, the user features being features of users who use the data set; and
determining a target resource directory corresponding to the data set according to the metadata features, the semantic features, the user features, and the policy features.
In a possible implementation, performing feature extraction on the data set to obtain the metadata features and semantic features of the data set includes:
determining business metadata and technical metadata in the data set;
determining the metadata features according to the business metadata and the technical metadata;
determining structured data in the data set; and
determining the semantic features according to the structured data.
In a possible implementation, determining the metadata features according to the business metadata and the technical metadata includes:
extracting data features from the business metadata to obtain business metadata features, the business metadata features including a regional topic; and
extracting data features from the technical metadata to obtain technical metadata features;
wherein the metadata features include the business metadata features and the technical metadata features.
In a possible implementation, determining the semantic features according to the structured data includes:
performing word segmentation on the structured data to obtain multiple words;
determining high-frequency words among the multiple words; and
determining the semantic features according to the high-frequency words.
In a possible implementation, determining the high-frequency words among the multiple words includes:
obtaining the word frequency of each word; and
determining words whose word frequency is greater than a preset threshold as the high-frequency words; or sorting the multiple words by word frequency and determining the top N words in the sorted order as the high-frequency words, where N is an integer greater than or equal to 1.
In a possible implementation, determining the semantic features according to the high-frequency words includes:
obtaining the term frequency-inverse document frequency (TF-IDF) value of each high-frequency word; and
determining the semantic features according to the TF-IDF values of the high-frequency words.
In a possible implementation, determining the target resource directory corresponding to the data set according to the metadata features, the semantic features, the user features, and the policy features includes:
determining an application scenario;
determining an initial feature model according to the application scenario, the metadata features, and the semantic features; and
correcting the initial feature model according to the user features and the policy features to obtain the target resource directory.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including a first acquisition module, a processing module, a second acquisition module, and a determination module, wherein:
the first acquisition module is configured to obtain a data set to be processed, the data set including data obtained from multiple devices;
the processing module is configured to perform feature extraction on the data set to obtain metadata features and semantic features of the data set;
the second acquisition module is configured to obtain user features and preset policy features corresponding to the data set, the user features being features of users who use the data set; and
the determination module is configured to determine a target resource directory corresponding to the data set according to the metadata features, the semantic features, the user features, and the policy features.
In a possible implementation, the processing module is configured to:
determine business metadata and technical metadata in the data set;
determine the metadata features according to the business metadata and the technical metadata;
determine structured data in the data set; and
determine the semantic features according to the structured data.
In a possible implementation, the processing module is configured to:
extract data features from the business metadata to obtain business metadata features, the business metadata features including a regional topic; and
extract data features from the technical metadata to obtain technical metadata features;
wherein the metadata features include the business metadata features and the technical metadata features.
In a possible implementation, the processing module is configured to:
perform word segmentation on the structured data to obtain multiple words;
determine high-frequency words among the multiple words; and
determine the semantic features according to the high-frequency words.
In a possible implementation, the processing module is configured to:
obtain the word frequency of each word; and
determine words whose word frequency is greater than a preset threshold as the high-frequency words; or sort the multiple words by word frequency and determine the top N words in the sorted order as the high-frequency words, where N is an integer greater than or equal to 1.
In a possible implementation, the processing module is configured to:
obtain the term frequency-inverse document frequency (TF-IDF) value of each high-frequency word; and
determine the semantic features according to the TF-IDF values of the high-frequency words.
In a possible implementation, the determination module is configured to:
determine an application scenario;
determine an initial feature model according to the application scenario, the metadata features, and the semantic features; and
correct the initial feature model according to the user features and the policy features to obtain the target resource directory.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor;
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the data processing method of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method of any implementation of the first aspect.
With the data processing method, apparatus, device, and storage medium provided by the embodiments of the present application, after the data set to be processed is obtained, its metadata features and semantic features are extracted, a resource directory of the data set is built from those features, and the resource directory is then corrected with user features and policy features to obtain the target resource directory. Technicians no longer need to perform tedious feature analysis on the data set by hand, which improves the efficiency of determining the resource directory.
Brief description of the drawings
The drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the present application and their descriptions explain the present application and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a data processing flow provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application. Referring to FIG. 1, the scenario includes multiple data storage devices 101, a data processing device 102, and multiple user devices 103. The data storage devices 101 store data and may be located in different geographical locations; for example, the stored data may include elderly care data, community data, and the like. The data processing device 102 obtains data from each data storage device 101 to form the data set to be processed, and processes the data set to obtain the corresponding target resource directory. The user devices 103 access the data set through the target resource directory.
In the related art, after a data set is obtained, technicians process it manually to determine the corresponding resource directory. However, when the data set is large, technicians cannot determine the resource directory from it within a short time, so the resource directory is determined inefficiently.
In the embodiments of the present application, after the data set to be processed is obtained, its metadata features and semantic features are extracted, a resource directory of the data set is built from those features, and the resource directory is then corrected with user features and policy features to obtain the target resource directory. In this process, technicians do not need to perform tedious feature analysis on the data set; instead, a device extracts the metadata features and semantic features and then corrects the resource directory according to the user features and policy features, which improves the efficiency of determining the resource directory.
The method of the present application is described below through specific embodiments. Note that the following embodiments may stand alone or be combined with one another; content that is the same or similar is not repeated across embodiments.
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application. Referring to FIG. 2, the method may include:
S201. Obtain a data set to be processed.
The embodiments of the present application may be executed by a data processing device, or by a data processing apparatus arranged in the data processing device. The data processing apparatus may be implemented in software or in a combination of software and hardware. The data processing device may be the data processing device 102 in the embodiment of FIG. 1.
The data set includes data obtained from multiple devices.
The multiple devices may be located at different geographical locations; for example, they may be the data storage devices 101 in the embodiment of FIG. 1.
For example, if the multiple devices are device A, device B, and device C, the data set includes the data obtained from device A, the data obtained from device B, and the data obtained from device C.
For example, the data in the data set may include elderly care data, community data, and the like.
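As a minimal sketch of how such a data set might be assembled, the Python snippet below merges records from three devices and tags each record with its origin. The function name `build_dataset`, the record fields, and the `source_device` key are hypothetical and do not come from the application.

```python
from typing import Dict, List


def build_dataset(sources: Dict[str, List[dict]]) -> List[dict]:
    """Merge records from several devices into one data set.

    Each record is copied and tagged with the (hypothetical) key
    "source_device" so its origin stays traceable after merging.
    """
    dataset = []
    for device_id, records in sources.items():
        for record in records:
            dataset.append({**record, "source_device": device_id})
    return dataset


# Illustrative records only; the field names are assumptions.
sources = {
    "A": [{"topic": "elderly care", "region": "east"}],
    "B": [{"topic": "community", "region": "west"}],
    "C": [{"topic": "community", "region": "east"}],
}
dataset = build_dataset(sources)
```

Copying each record rather than mutating it leaves the data held by the source devices unchanged.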
S202. Perform feature extraction on the data set to obtain metadata features and semantic features of the data set.
The data set may include business metadata and technical metadata.
Technical metadata describes characteristics of the data set such as its data structure, data source information, and value ranges.
Business metadata describes characteristics such as the business domain the data set belongs to, the relationships between business topics, and the rules of the business topics.
The data set may cover multiple business topics. A multi-level tree structure can be used to divide the business topics and thereby obtain the business metadata.
Feature extraction can be performed on the business metadata and technical metadata to obtain the metadata features.
The data set includes structured data, that is, data stored and arranged in a regular way. For example, structured data may include enterprise resource planning (ERP) system data, financial system data, approval system data, and the like.
The semantic features may be features of the structured data in the data set.
The semantic features can be determined as follows: perform word segmentation on the structured data to obtain multiple words; determine high-frequency words among the multiple words; and determine the semantic features according to the high-frequency words.
Feature extraction reduces the amount of data that must be processed afterwards, which improves data processing efficiency.
S203. Obtain user features and preset policy features corresponding to the data set.
The user features are features of the users who use the data set.
User features may include industry background, call frequency, data usage preferences, the relationships between user organizations, and the like.
Policy features may be data set features described by industry experts in light of current events.
S204. Determine the target resource directory corresponding to the data set according to the metadata features, semantic features, user features, and policy features.
The target resource directory can be determined as follows: determine a resource directory according to the application scenario of the resource directory, the metadata features, and the semantic features; then correct that resource directory according to the user features and policy features to obtain the target resource directory corresponding to the data set.
In the data processing method provided by this embodiment, a data set to be processed is first obtained from multiple devices, and feature extraction is performed on it to obtain its metadata features and semantic features. The user features and preset policy features corresponding to the data set are then obtained. Finally, the target resource directory corresponding to the data set is determined according to the metadata features, semantic features, user features, and policy features. In this process, an initial feature model can be determined from the metadata features and semantic features and then corrected with the user features and policy features to determine the target resource directory. Building the resource directory from extracted metadata and semantic features, and then correcting it with user and policy features, improves the efficiency of determining the target resource directory.
The data processing method of the embodiments of the present application is described in further detail below with reference to FIG. 3.
FIG. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present application. Referring to FIG. 3, the method may include:
S301. Obtain a data set to be processed.
For the execution of S301, refer to S201; details are not repeated here.
S302. Determine business metadata and technical metadata in the data set.
A multi-level tree structure can be used to divide the business topics of the data set and thereby determine its business metadata. The business topics may include a regional topic.
A multi-level tree structure is a data structure that can represent the relationships between data items.
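One way to picture such a multi-level tree is sketched below in Python. The class name `TopicNode`, its methods, and the topic names are illustrative assumptions; the application does not specify how the tree is represented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TopicNode:
    """One node in a multi-level tree of business topics (hypothetical)."""
    name: str
    children: List["TopicNode"] = field(default_factory=list)

    def add(self, name: str) -> "TopicNode":
        """Attach a child topic and return it so levels can be chained."""
        child = TopicNode(name)
        self.children.append(child)
        return child

    def leaf_paths(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield the path from the root to every leaf topic."""
        current = prefix + (self.name,)
        if not self.children:
            yield current
        for child in self.children:
            yield from child.leaf_paths(current)


# Illustrative topics only: a regional topic with two districts,
# plus one further business topic.
root = TopicNode("all topics")
region = root.add("region")
region.add("east district")
region.add("west district")
root.add("elderly care")
paths = list(root.leaf_paths())
```

Each root-to-leaf path could then serve as one entry of business metadata, though how the paths are used is left open here.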
Technical metadata may include the data used during development and day-to-day management of the data set.
S303. Determine the metadata features according to the business metadata and technical metadata.
The metadata features can be determined as follows: extract data features from the business metadata to obtain business metadata features, which may include a regional topic; extract data features from the technical metadata to obtain technical metadata features. The metadata features may include both the business metadata features and the technical metadata features.
The metadata features therefore have both a technical aspect and a business aspect.
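The two-part structure of the metadata features can be illustrated with a short Python sketch. The application does not say which attributes are extracted, so the function `extract_metadata_features` and every key used below (`region`, `domain`, `source`, `fields`) are assumptions made purely for illustration.

```python
from typing import Any, Dict


def extract_metadata_features(business_meta: Dict[str, Any],
                              technical_meta: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Combine business and technical metadata features into one structure."""
    # Business side: the regional topic and the business domain (assumed keys).
    business_features = {
        "regional_topic": business_meta.get("region"),
        "business_domain": business_meta.get("domain"),
    }
    # Technical side: where the data comes from and how wide its schema is.
    fields = technical_meta.get("fields", [])
    technical_features = {
        "data_source": technical_meta.get("source"),
        "field_count": len(fields),
    }
    return {"business": business_features, "technical": technical_features}


metadata_features = extract_metadata_features(
    {"region": "east district", "domain": "elderly care"},
    {"source": "device A", "fields": ["id", "name", "age"]},
)
```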
S304. Determine structured data in the data set.
Structured data can be identified as the data in the data set that is stored and arranged in a regular way.
S305. Perform word segmentation on the structured data to obtain multiple words.
The Institute of Computing Technology, Chinese Lexical Analysis System (ICTCLAS) can be used to segment the structured data into multiple words.
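The sketch below shows where segmentation sits in the flow. It does not call ICTCLAS: a simple regular-expression tokenizer stands in for the segmenter, which is only adequate for space-delimited text and would need to be replaced by a real Chinese segmenter in practice. The function name `segment_records` and the record fields are hypothetical.

```python
import re
from typing import Iterable, List


def segment_records(records: Iterable[dict]) -> List[str]:
    """Split the text fields of structured records into lowercase words.

    Stand-in for a real segmenter such as ICTCLAS: it only extracts runs
    of ASCII letters and digits, and skips non-string field values.
    """
    words = []
    for record in records:
        for value in record.values():
            if isinstance(value, str):
                words.extend(re.findall(r"[a-z0-9]+", value.lower()))
    return words


# Illustrative structured records; the field names are assumptions.
records = [
    {"title": "Elderly care subsidy", "region": "East District"},
    {"title": "Community care", "count": 12},
]
words = segment_records(records)
```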
S306. Determine high-frequency words among the multiple words.
High-frequency words can be determined as follows: obtain the word frequency of each word and determine the words whose frequency is greater than a preset threshold as high-frequency words; or sort the words by word frequency and determine the top N words in the sorted order as high-frequency words, where N is an integer greater than or equal to 1.
Word frequency refers to how often a word occurs in the data set.
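Both selection rules can be sketched in a few lines of Python using `collections.Counter`. The function name `high_frequency_words` and its parameters are illustrative, and word frequency is taken here to be a raw occurrence count, which is one reading of the text rather than something the application fixes.

```python
from collections import Counter
from typing import Iterable, List, Optional


def high_frequency_words(words: Iterable[str],
                         threshold: Optional[int] = None,
                         top_n: Optional[int] = None) -> List[str]:
    """Select high-frequency words by threshold or by rank.

    threshold: keep words whose count is strictly greater than this value.
    top_n:     keep the N most frequent words (N >= 1).
    """
    counts = Counter(words)
    if threshold is not None:
        return [word for word, count in counts.items() if count > threshold]
    if top_n is not None:
        if top_n < 1:
            raise ValueError("top_n must be an integer >= 1")
        return [word for word, _ in counts.most_common(top_n)]
    raise ValueError("provide either threshold or top_n")


words = ["care", "community", "care", "pension", "care", "community"]
by_threshold = high_frequency_words(words, threshold=1)
by_rank = high_frequency_words(words, top_n=1)
```

The threshold rule returns a variable number of words, while the top-N rule always returns at most N, so the choice depends on whether a fixed-size feature set is needed.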
S307. Determine the semantic features according to the high-frequency words.
Obtain the term frequency-inverse document frequency (TF-IDF) value of each high-frequency word.
Determine the semantic features according to the TF-IDF values of the high-frequency words.
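A small Python sketch of a TF-IDF calculation follows. The application does not state which TF-IDF variant is used or what counts as a "document", so two assumptions are made here: each structured record is treated as one document, and the score is the common form tf × log(N / df), where tf is the word's share of the document, N is the number of documents, and df is the number of documents containing the word.

```python
import math
from collections import Counter
from typing import Sequence


def tf_idf(term: str, document: Sequence[str],
           corpus: Sequence[Sequence[str]]) -> float:
    """Return tf * log(N / df) for one term in one document.

    Returns 0.0 when the document is empty or the term appears nowhere
    in the corpus, to avoid dividing by zero.
    """
    if not document:
        return 0.0
    tf = Counter(document)[term] / len(document)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Each inner list is the segmented text of one (assumed) record.
corpus = [
    ["care", "pension", "care"],
    ["community", "care"],
    ["community", "service"],
]
pension_score = tf_idf("pension", corpus[0], corpus)
care_score = tf_idf("care", corpus[0], corpus)
```

In this toy corpus "pension" scores higher than "care" in the first record even though "care" occurs more often there, because "care" also appears in another record and is therefore less distinctive.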
S308. Determine the application scenario.
An application scenario is a scenario in which the resource directory is used. For example, application scenarios may include building a recommendation system, pushing knowledge, and predicting indicators.
S309. Determine the initial feature model according to the application scenario, metadata features, and semantic features.
The initial feature model is a set of features suited to the application scenario. It may include a feature library and a label set.
The metadata features and semantic features can be normalized to obtain the feature library.
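The normalization method is not specified in the application. As one possibility, the Python sketch below applies min-max scaling so that numeric feature scores from different sources share the range [0, 1]. The function name `min_max_normalize` and the feature names are hypothetical.

```python
from typing import Dict


def min_max_normalize(features: Dict[str, float]) -> Dict[str, float]:
    """Scale feature scores linearly into [0, 1] (min-max normalization).

    If all scores are equal, every feature maps to 0.0 to avoid
    dividing by zero.
    """
    if not features:
        return {}
    lowest = min(features.values())
    highest = max(features.values())
    if highest == lowest:
        return {name: 0.0 for name in features}
    span = highest - lowest
    return {name: (score - lowest) / span for name, score in features.items()}


# Illustrative raw scores mixing a metadata feature and two semantic features.
raw_features = {"region": 2.0, "care": 6.0, "schema_depth": 4.0}
feature_library = min_max_normalize(raw_features)
```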
Dictionary matching can be performed on the high-frequency words from the semantic analysis to determine the label set. The dictionary may be a dictionary library preset for the application scenario.
The initial feature model can then be determined from the feature library and the label set.
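Dictionary matching and the assembly of the initial feature model might look like the following Python sketch. The dictionary contents, the function names `match_labels` and `build_initial_model`, and the representation of the model as a plain dictionary are all assumptions for illustration.

```python
from typing import Dict, Iterable, Set


def match_labels(high_freq_words: Iterable[str],
                 scenario_dictionary: Dict[str, str]) -> Set[str]:
    """Map high-frequency words to labels via a scenario-specific dictionary.

    Words that do not appear in the dictionary are ignored.
    """
    return {scenario_dictionary[word]
            for word in high_freq_words
            if word in scenario_dictionary}


def build_initial_model(feature_library: Dict[str, float],
                        label_set: Set[str]) -> dict:
    """Bundle the feature library and label set into one model object."""
    return {"feature_library": dict(feature_library), "label_set": set(label_set)}


# Hypothetical dictionary preset for one application scenario.
scenario_dictionary = {
    "care": "elderly services",
    "pension": "elderly services",
    "community": "community services",
}
label_set = match_labels(["care", "community", "budget"], scenario_dictionary)
initial_model = build_initial_model({"care": 1.0, "community": 0.5}, label_set)
```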
S310. Determine the user features and policy features.
Industry experts can analyze the users' industry background, call frequency, data usage preferences, and the like to determine the user features. The analysis methods may include clustering, classification, and the like.
Industry experts can determine the policy features in light of current events.
S311. Correct the initial feature model according to the user features and policy features to obtain the target resource directory.
The corresponding features in the initial feature model can be corrected according to the user features and policy features. For example, weighting can be used to strengthen the user features and policy features in the initial feature model, yielding the corrected target resource directory.
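A weighting-based correction could be sketched in Python as follows. The application does not give the weighting formula or say how the directory is derived from the corrected model, so two assumptions are made: user and policy weights multiply the matching feature scores, and the directory is the list of features ranked by corrected score. The names `apply_corrections` and `build_directory` are hypothetical.

```python
from typing import Dict, List


def apply_corrections(feature_library: Dict[str, float],
                      user_weights: Dict[str, float],
                      policy_weights: Dict[str, float]) -> Dict[str, float]:
    """Multiply each feature score by its user and policy weights.

    Features without an explicit weight keep a neutral weight of 1.0.
    """
    corrected = {}
    for name, score in feature_library.items():
        weight = user_weights.get(name, 1.0) * policy_weights.get(name, 1.0)
        corrected[name] = score * weight
    return corrected


def build_directory(corrected: Dict[str, float]) -> List[str]:
    """Rank features by corrected score, highest first (assumed ordering)."""
    return sorted(corrected, key=corrected.get, reverse=True)


# Illustrative scores and weights only.
feature_library = {"care": 0.5, "community": 0.375, "region": 0.25}
user_weights = {"community": 2.0}   # users call community data often
policy_weights = {"region": 2.5}    # experts emphasize a regional policy
corrected = apply_corrections(feature_library, user_weights, policy_weights)
directory = build_directory(corrected)
```

Before correction "care" ranks first; after the user and policy weights are applied, "community" and "region" move ahead of it, which is the kind of reordering the correction step is meant to produce.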
本申请实施例提供的是另一种数据处理方法,先获取待处理的多个设备的数据集;再根据数据集中的技术元数据和业务元数据,进行特征提取处理,得到数据集的技术元数据特征和业务元数据特征;根据技术元数据特征和业务元数据特征,确定元数据特征;在数据集中确定结构化数据;对结构化数据进行分词处理,得到多个词汇;在多个词汇中确定高频词汇,根据高频词汇确定语义特征;确定应用场景后,根据应用场景、元数据特征和语义特征,确定初始特征模型。确定用户特征和策略特征后,根据用户特征、策略特征对初始特征模型进行修正处理,得到目标资源目录。上述过程中,先提取数据集的技术元数据和业务元数据,得到元数据特征,再根据数据集的高频词汇得到语义特征,最后通过用户特征和策略特征修正,得到目标资源目录,提高了确定目标资源目录的效率。The embodiments of this application provide another data processing method, which first obtains the data sets of multiple devices to be processed; and then performs feature extraction processing based on the technical metadata and business metadata in the data set to obtain the technical metadata of the data set. Data characteristics and business metadata characteristics; determine metadata characteristics based on technical metadata characteristics and business metadata characteristics; determine structured data in the data set; perform word segmentation processing on structured data to obtain multiple words; among multiple words Determine high-frequency words and determine semantic features based on high-frequency words; after determining the application scenario, determine the initial feature model based on the application scenario, metadata features, and semantic features. After determining the user characteristics and policy characteristics, the initial feature model is modified according to the user characteristics and policy characteristics to obtain the target resource directory. In the above process, the technical metadata and business metadata of the data set are first extracted to obtain the metadata features, and then the semantic features are obtained based on the high-frequency vocabulary of the data set. Finally, the target resource directory is obtained through correction of user features and policy features, which improves Determine the efficiency of the target resource directory.
The data processing method of the embodiments of the present application is described in detail below through a specific example, with reference to FIG. 4.
FIG. 4 is a schematic diagram of the data processing procedure provided by an embodiment of the present application. Referring to FIG. 4, in practice, after the data set to be processed is obtained, business metadata, technical metadata, and structured data can be identified in it.
Feature extraction on the business metadata and technical metadata yields the metadata features. Processing the structured data yields the semantic features. The initial feature model is then determined from the metadata features and the semantic features.
User features (for example, user refinement and the user's industry background) and policy features (for example, regional features and industry features) can also be obtained and used to update the initial feature model, yielding the target resource directory.
FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present disclosure. Referring to FIG. 5, the data processing apparatus may include a first acquisition module 11, a processing module 12, a second acquisition module 13, and a determination module 14, wherein:
the first acquisition module 11 is configured to acquire a data set to be processed, the data set including data acquired from multiple devices;
the processing module 12 is configured to perform feature extraction on the data set to obtain metadata features and semantic features of the data set;
the second acquisition module 13 is configured to acquire user features corresponding to the data set and preset policy features, the user features being features of the users who use the data set;
the determination module 14 is configured to determine the target resource directory corresponding to the data set according to the metadata features, the semantic features, the user features, and the policy features.
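The division of work among the four modules can be sketched as a small composition of callables. This is only an illustration under stated assumptions: the class name `DataProcessingApparatus`, its field names, and the stub lambdas are hypothetical, and the source does not prescribe any particular software structure for the modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class DataProcessingApparatus:
    # Each field stands in for one module from FIG. 5 (names are assumptions).
    acquire_dataset: Callable[[], Any]                      # first acquisition module 11
    extract_features: Callable[[Any], Tuple[Any, Any]]      # processing module 12
    acquire_user_policy: Callable[[Any], Tuple[Any, Any]]   # second acquisition module 13
    determine_catalog: Callable[[Any, Any, Any, Any], Any]  # determination module 14

    def run(self) -> Any:
        dataset = self.acquire_dataset()
        metadata, semantic = self.extract_features(dataset)
        user, policy = self.acquire_user_policy(dataset)
        return self.determine_catalog(metadata, semantic, user, policy)


# Stub modules, used only to show how data flows between the four parts.
apparatus = DataProcessingApparatus(
    acquire_dataset=lambda: ["record-1", "record-2"],
    extract_features=lambda ds: ({"records": len(ds)}, ["order"]),
    acquire_user_policy=lambda ds: (["finance"], ["north"]),
    determine_catalog=lambda m, s, u, p: {
        "metadata": m, "semantic": s, "user": u, "policy": p,
    },
)
catalog = apparatus.run()
```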
The data processing apparatus provided by the embodiments of the present application can execute the technical solutions shown in the above method embodiments. Its implementation principles and beneficial effects are similar and are not repeated here.
In a possible implementation, the processing module 12 is specifically configured to:
determine business metadata and technical metadata in the data set;
determine the metadata features according to the business metadata and the technical metadata;
determine structured data in the data set;
determine the semantic features according to the structured data.
In a possible implementation, the processing module 12 is specifically configured to:
perform data feature extraction on the business metadata to obtain business metadata features, the business metadata features including a regional theme;
perform data feature extraction on the technical metadata to obtain technical metadata features;
wherein the metadata features include the business metadata features and the technical metadata features.
In a possible implementation, the processing module 12 is specifically configured to:
perform word segmentation on the structured data to obtain multiple words;
determine high-frequency words among the multiple words;
determine the semantic features according to the high-frequency words.
In a possible implementation, the processing module 12 is specifically configured to:
obtain the word frequency of each word;
determine words whose word frequency is greater than a preset threshold as the high-frequency words; or sort the multiple words by word frequency and determine the first N words of the sorted list as the high-frequency words, where N is an integer greater than or equal to 1.
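The two selection rules above (a frequency threshold, or the top N words after sorting) can be sketched as follows. The function name `high_frequency_words` and its parameters are hypothetical, and the word list is assumed to be the output of a segmentation step that is not shown here.

```python
from collections import Counter


def high_frequency_words(words, threshold=None, top_n=None):
    """Select high-frequency words by one of the two rules in the text.

    threshold: keep words whose count is strictly greater than this value.
    top_n:     keep the N most frequent words (N >= 1).
    Exactly one of the two parameters is expected; both names are assumptions.
    """
    counts = Counter(words)
    if threshold is not None:
        return [word for word, count in counts.items() if count > threshold]
    if top_n is not None:
        if top_n < 1:
            raise ValueError("top_n must be an integer >= 1")
        return [word for word, _ in counts.most_common(top_n)]
    raise ValueError("provide either threshold or top_n")


# Hypothetical segmented words from structured data.
words = ["order", "customer", "order", "region", "customer", "order"]
by_threshold = high_frequency_words(words, threshold=1)
by_top_n = high_frequency_words(words, top_n=1)
```

The threshold rule returns a variable number of words, while the top-N rule fixes the size of the result, which may matter when the semantic features need a bounded length.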
In a possible implementation, the processing module 12 is specifically configured to:
obtain the term frequency-inverse document frequency (TF-IDF) value of each high-frequency word;
determine the semantic features according to the TF-IDF values of the high-frequency words.
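A TF-IDF value for a word can be computed as sketched below. The source does not state which TF-IDF variant is used or what counts as a document, so this sketch assumes one common form, term frequency as count divided by document length and inverse document frequency as log(N / df), and treats each list of segmented words as one document. The function name `tf_idf` and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """TF-IDF of `term` in `document`, relative to `corpus`.

    tf  = occurrences of term in document / number of words in document
    idf = log(number of documents / number of documents containing term)
    This unsmoothed variant is an assumption, not the source's definition.
    """
    if not document:
        return 0.0
    tf = Counter(document)[term] / len(document)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


# Hypothetical corpus: each inner list is the segmented text of one record.
corpus = [
    ["order", "customer", "order"],
    ["region", "customer"],
    ["order", "region", "region"],
]
score = tf_idf("customer", corpus[0], corpus)
```

With this variant, a word that occurs in every document scores zero, so words that are frequent but uninformative across the whole data set contribute nothing to the semantic features.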
In a possible implementation, the determination module 14 is specifically configured to:
determine an application scenario;
determine an initial feature model according to the application scenario, the metadata features, and the semantic features;
correct the initial feature model according to the user features and the policy features to obtain the target resource directory.
The data processing apparatus provided by the embodiments of the present application can execute the technical solutions shown in the above method embodiments. Its implementation principles and beneficial effects are similar and are not repeated here.
An embodiment of the present application provides a schematic structural diagram of an electronic device. Referring to FIG. 6, the electronic device 20 may include a processor 21 and a memory 22. For example, the processor 21 and the memory 22 are connected to each other through a bus 23.
The memory 22 stores computer-executable instructions;
the processor 21 executes the computer-executable instructions stored in the memory 22, so that the processor 21 performs the data processing method shown in the above method embodiments.
Correspondingly, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method described in the above method embodiments.
Correspondingly, an embodiment of the present application may further provide a computer program product including a computer program which, when executed by a processor, implements the data processing method shown in the above method embodiments.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211157393.8A CN117807177A (en) | 2022-09-22 | 2022-09-22 | Data processing method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211157393.8A CN117807177A (en) | 2022-09-22 | 2022-09-22 | Data processing method, device, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117807177A true CN117807177A (en) | 2024-04-02 |
Family
ID=90420582
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211157393.8A Pending CN117807177A (en) | 2022-09-22 | 2022-09-22 | Data processing method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117807177A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118939651A (en) * | 2024-07-16 | 2024-11-12 | 中国科学院软件研究所 | A threat data resource directory updating method and management system |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11409642B2 (en) | Automatic parameter value resolution for API evaluation | |
| US20250021884A1 (en) | Machine learning service | |
| TWI718643B (en) | Method and device for identifying abnormal groups | |
| US10262272B2 (en) | Active machine learning | |
| US10318882B2 (en) | Optimized training of linear machine learning models | |
| US9460117B2 (en) | Image searching | |
| US9740771B2 (en) | Information handling system and computer program product for deducing entity relationships across corpora using cluster based dictionary vocabulary lexicon | |
| US10963810B2 (en) | Efficient duplicate detection for machine learning data sets | |
| US11256712B2 (en) | Rapid design, development, and reuse of blockchain environment and smart contracts | |
| JP5534280B2 (en) | Text clustering apparatus, text clustering method, and program | |
| US11681817B2 (en) | System and method for implementing attribute classification for PII data | |
| CN111143578B (en) | Method, device and processor for extracting event relationship based on neural network | |
| CN107368489B (en) | Information data processing method and device | |
| CN111078776A (en) | Data table standardization method, device, equipment and storage medium | |
| US10394907B2 (en) | Filtering data objects | |
| CN112488557A (en) | Automatic calculation method, device and terminal based on grading standard objective scores | |
| CN110263184A (en) | A kind of data processing method and relevant device | |
| CN117807177A (en) | Data processing method, device, equipment and storage medium | |
| US20210034704A1 (en) | Identifying Ambiguity in Semantic Resources | |
| CN110399431A (en) | A kind of incidence relation construction method, device and equipment | |
| US11520764B2 (en) | Multicriteria record linkage with surrogate blocking keys | |
| CN115392389B (en) | Cross-modal information matching, processing method, device, electronic device and storage medium | |
| CN117709456A (en) | Knowledge graph construction method and device for financial data and electronic equipment | |
| CN116860969A (en) | A customer review analysis method, system, equipment and medium | |
| CN111552706B (en) | Public opinion information grouping method, device and equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||