
CN118967202B - A multi-source data processing method and system for supplier evaluation - Google Patents

A multi-source data processing method and system for supplier evaluation Download PDF

Info

Publication number
CN118967202B
CN118967202B (application CN202411444157.3A)
Authority
CN
China
Prior art keywords
data
evaluation
source
fusion
supplier
Prior art date
Legal status
Active
Application number
CN202411444157.3A
Other languages
Chinese (zh)
Other versions
CN118967202A
Inventor
向瑞
曹彧
晋高产
Current Assignee
Beijing Great Wall Electronic Commerce Co ltd
Original Assignee
Beijing Great Wall Electronic Commerce Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Great Wall Electronic Commerce Co ltd filed Critical Beijing Great Wall Electronic Commerce Co ltd
Priority to CN202411444157.3A
Publication of CN118967202A
Application granted
Publication of CN118967202B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention proposes a multi-source data processing method and system for supplier evaluation. The method belongs to the field of data fusion processing technology. The method comprises: collecting multi-source data from different sources and pre-processing the collected multi-source data; determining the main data source and related data sources according to the supplier evaluation requirements, extracting valid data from the main data source based on preset data screening rules, and associating the extracted valid data with the data in the related data sources. By collecting and pre-processing data from different sources, the quality of the data is ensured.

Description

Multi-source data processing method and system applied to supplier evaluation
Technical Field
The invention provides a multi-source data processing method and system applied to supplier evaluation, and belongs to the technical field of data fusion processing.
Background
With the acceleration of globalization and digitization, enterprise supply chain management has become increasingly complex. As a key link in supply chain management, the accuracy and comprehensiveness of supplier evaluation directly influence an enterprise's operational efficiency and market competitiveness. Traditional supplier evaluation methods rely mainly on a single data source, such as internal transaction data or information reported by the supplier itself; such methods suffer from narrow data coverage, one-sided information and susceptibility to subjective factors, and can hardly reflect a supplier's real performance comprehensively and objectively.
Multi-source data is an important characteristic of modern data processing technology; its large volume, wide range of sources and diverse structures open new possibilities for supplier evaluation. Multi-source data includes not only structured data such as internal transaction data and financial reports, but also unstructured or semi-structured data such as social media comments, industry reports and market research. Together, these data sources form a comprehensive information base for supplier evaluation and can reflect key elements such as a supplier's capability, quality and reputation more truly and completely.
However, processing multi-source data also faces a number of challenges. First, data from different sources differs in format, structure and quality standards, and preprocessing work such as data cleaning, conversion and standardization is required to ensure consistency and comparability. Second, complex relationships and potential conflicts exist between multi-source data, which must be integrated through suitable data fusion algorithms to form a comprehensive and accurate data set. Finally, the processing and analysis of multi-source data require advanced intelligent technologies such as machine learning and data mining to extract key evaluation indexes and characteristics, thereby providing solid support for supplier evaluation.
Disclosure of Invention
The invention provides a multi-source data processing method and a system applied to supplier evaluation, which are used for solving the problems in the background art:
the invention provides a multi-source data processing method applied to supplier evaluation, which comprises the following steps:
S1, collecting multi-source data of different sources, and preprocessing the collected multi-source data;
S2, determining a main data source and related data sources according to the evaluation requirements of suppliers, extracting effective data from the main data source based on preset data screening rules, and associating the extracted effective data with data in the related data sources;
S3, fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
S4, constructing a data processing model aiming at each evaluation dimension, carrying out deep analysis on multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation on suppliers by adopting a comprehensive evaluation algorithm based on the evaluation indexes and characteristics of each dimension;
S5, outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm, and feeding back the evaluation result to the supplier and related personnel.
Further, the step S1 includes:
S11, collecting multi-source data of different sources through an acquisition script or an API interface;
S12, the collected multi-source data are sent to a cloud space, and the cloud space stores the received multi-source data through a layered storage structure;
S13, preprocessing stored multi-source data, encrypting sensitive data, and desensitizing data which are not sensitive but related to privacy.
Further, the step S12 includes:
Compressing the uploaded multi-source data through a compression algorithm to obtain a compressed first data packet, performing redundancy backup on the compressed data packet to obtain a second data packet, and respectively transmitting the first data packet and the second data packet to a cloud space through a multi-channel transmission protocol;
after the cloud space receives the first data packet and the second data packet, the second data packet is stored in the second storage space;
decompressing the first data packet in the first storage space, and capturing the data sources that need to be processed in real time in the first storage space through a real-time data stream processing layer built on a stream processing framework;
rejecting erroneous or invalid data, analyzing the processed data in real time, and extracting key indexes or performing preliminary data aggregation;
constructing a layered historical data storage architecture in the first storage space, and automatically migrating data according to the data access frequency and importance by utilizing the combination of object storage and block storage;
And introducing a data archiving strategy, migrating data which is inactive for a long time and still needs to be reserved to a cold storage area, and establishing an index for historical data by utilizing a search engine technology.
Further, the step S2 includes:
S21, quantitatively scoring each data source based on preset thresholds and weights through a multi-dimensional data source quality evaluation framework, and identifying the main data source;
S22, for the main data source, automatically adjusting the extraction time interval based on the frequency of data updates, and identifying and extracting key information fields using regular expressions and a machine learning model;
S23, during extraction, comparing against historical data and verifying logical relationships and consistency based on a secondary data quality verification mechanism;
S24, when associating related data sources, matching through common identifiers in combination with a multi-dimensional association strategy, and dynamically adjusting the association rules according to data characteristics and business requirements;
S25, after association, carrying out consistency verification, establishing a feedback mechanism that feeds the consistency verification results back to the data collection and processing links, and continuously optimizing the association rules and verification algorithms according to the feedback results to form a closed-loop iterative optimization process.
Further, the step S22 includes:
Monitoring an update log or a time stamp of a main data source in real time or periodically, recording the time interval of each data update, and analyzing historical update frequency data to obtain an analysis result;
setting an initial data extraction time interval according to an analysis result, and automatically adjusting the extraction time interval to match a new update speed when the change of the data update frequency is monitored through a dynamic adjustment mechanism;
aiming at the structured data, positioning and extracting key information fields through regular expressions;
for unstructured or semi-structured data, identifying and extracting key information through an existing named entity recognition model;
combining the regular expressions and the named entity recognition model to construct a hybrid extraction strategy;
monitoring the extraction process in real time, recording key indicators of each extraction task, and, through an anomaly detection mechanism, triggering an alarm and taking corresponding emergency measures when an anomaly occurs during extraction;
carrying out preliminary verification of the extracted data through an extraction result verification mechanism, evaluating the effectiveness of the extraction strategy based on data quality analysis of the extraction results, and optimizing the hybrid strategy according to the analysis results.
Further, the step S3 includes:
S31, identifying and handling heterogeneity among the multi-source data, and converting data from different sources into a uniform format;
S32, designing a data weight distribution mechanism based on multi-dimensional indexes of the data, using the analytic hierarchy process and fuzzy comprehensive evaluation in combination with an expert database, and dynamically adjusting the weight distribution mechanism through a dynamic weight adjustment strategy;
S33, fusing the data through a fusion algorithm, with outlier detection and a data cleaning mechanism applied during fusion;
S34, after fusion is completed, verifying and evaluating the fused data against historical data, industry benchmarks and business logic;
S35, dividing the index system into several dimensions according to the requirements and targets of supplier evaluation, further refining the evaluation indexes under each dimension to form a multi-level index system structure, establishing an index dynamic adjustment mechanism, and dynamically adjusting the index system according to influencing factors.
Further, the step S33 includes:
Selecting a fusion algorithm according to the data characteristics (such as data volume, data distribution and degree of heterogeneity) and business requirements, re-detecting abnormal values in the data using statistical methods, eliminating or correcting the abnormal values, and cleaning the multi-source data after this re-detection;
formulating a fusion strategy according to the selected fusion algorithm and data characteristics, executing the data fusion operation according to the fusion strategy, and monitoring the fusion process in real time;
carrying out a preliminary evaluation of the quality of the fusion result by computing statistical indicators of the fusion result and through visual analysis;
And feeding back and adjusting the fusion strategy according to the preliminary evaluation result, and continuously optimizing the fusion algorithm and strategy through multiple iterations.
Further, the step S4 includes:
S41, constructing an advanced model by combining several ensemble learning methods with multiple machine learning algorithms, and carrying out parameter tuning and model fusion;
S42, evaluating model performance through cross-validation and ROC curve analysis, and selecting the model for deployment based on the evaluation results;
S43, selecting evaluation methods according to the evaluation requirements, forming a composite evaluation algorithm through weighted fusion and nonlinear combination, and automatically adjusting the weight of each evaluation method according to changes in the evaluation target and real-time data updates through a dynamic weight adjustment mechanism;
S44, constructing a risk assessment model through machine learning to quantitatively assess the risks of a supplier, automatically triggering an early warning when the risk assessment result exceeds a preset early-warning threshold, and responding through an early-warning response mechanism (an illustrative sketch of these steps follows this list).
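The patent text does not disclose concrete model code; the following is a minimal sketch, assuming scikit-learn and a tabular supplier dataset, of how S41, S42 and S44 could be realized: an ensemble (voting) classifier is built and fused, evaluated with cross-validation and ROC-AUC, and a preset risk threshold triggers an early warning. Column semantics, model choices and the threshold value are illustrative assumptions, not part of the patent.

```python
# Hedged sketch of S41/S42/S44: ensemble model, cross-validation/ROC evaluation,
# and threshold-based risk early warning. Features, labels and the threshold
# below are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

def build_risk_model():
    # S41: combine several learners into one ensemble ("model fusion").
    return VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("gb", GradientBoostingClassifier(random_state=0)),
        ],
        voting="soft",
    )

def evaluate_model(model, X, y):
    # S42: cross-validation plus ROC-AUC as the selection criterion.
    cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    model.fit(X, y)
    train_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    return cv_auc.mean(), train_auc

def risk_early_warning(model, X_new, threshold=0.7):
    # S44: quantitative risk score per supplier; scores above the preset
    # threshold trigger an early warning for the response mechanism.
    risk_scores = model.predict_proba(X_new)[:, 1]
    return [(i, s) for i, s in enumerate(risk_scores) if s >= threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))              # illustrative supplier features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # illustrative risk label
    model = build_risk_model()
    print("mean CV ROC-AUC:", evaluate_model(model, X, y)[0])
    print("flagged suppliers:", risk_early_warning(model, X, threshold=0.7)[:5])
```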
Further, the step S5 includes:
S51, constructing a trend prediction system based on time series analysis or a machine learning model, modeling the supplier's historical performance data, and predicting its future development trend (see the illustrative sketch after this list);
S52, comparing the supplier's evaluation results against industry benchmarks, and generating personalized improvement suggestions according to the supplier's specific performance;
S53, establishing an improvement-effect tracking mechanism, periodically collecting the supplier's performance data after the improvement suggestions are adopted, evaluating the improvement effect, and adjusting the suggestions according to the feedback;
S54, establishing a multi-channel user feedback collection mechanism, sorting and analyzing the collected user feedback through an iterative optimization process, formulating a corresponding optimization plan, and continuously iterating and optimizing system functions and performance based on the optimization strategy.
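As an illustration of S51 only (the patent names no specific algorithm), the following minimal sketch fits a linear trend to quarterly supplier scores with NumPy and extrapolates it; the data, horizon and linear-trend choice are assumptions made for the example.

```python
# Hedged sketch of S51: simple trend prediction over a supplier's historical
# performance scores. A linear trend is only one possible choice; the scores
# below are invented example data.
import numpy as np

def predict_trend(history, horizon=4):
    """Fit a linear trend to historical scores and extrapolate `horizon` steps."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)
    future_t = np.arange(len(history), len(history) + horizon)
    return slope * future_t + intercept

if __name__ == "__main__":
    quarterly_scores = [72.0, 74.5, 73.8, 76.2, 78.0, 79.1]  # example history
    forecast = predict_trend(quarterly_scores, horizon=4)
    print("forecast for next 4 quarters:", np.round(forecast, 1))
```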
The invention provides a system for realizing the above multi-source data processing method applied to supplier evaluation, which comprises the following modules:
The data collection module is used for collecting multi-source data of different sources and preprocessing the collected multi-source data;
The data association module is used for determining a main data source and related data sources according to the evaluation requirements of suppliers, extracting effective data from the main data source based on a preset data screening rule, and associating the extracted effective data with the data in the related data source;
The data fusion module is used for fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
the comprehensive evaluation module is used for constructing a data processing model aiming at each evaluation dimension, carrying out deep analysis on the multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation on suppliers by adopting a comprehensive evaluation algorithm based on the evaluation indexes and characteristics of each dimension;
and the result feedback module is used for outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm and feeding back the evaluation result to the supplier and related personnel.
The beneficial effects of the method are as follows. Data quality is ensured by collecting and preprocessing data from different sources. Cloud storage technology and the data processing mechanism enhance the security and reliability of the data, and sensitive information is protected by encrypting and desensitizing the data. The data source quality assessment framework identifies the most reliable and representative data sources, while dynamically adjusting the data extraction timing and rules ensures the timeliness of the data. Data fusion and standardization solve the problem of data heterogeneity, so that data can be effectively compared and analyzed on the same platform. Combining multiple machine learning algorithms with ensemble learning improves the accuracy and depth of data analysis, and the composite evaluation algorithm with its dynamic weight adjustment mechanism reflects supplier performance more comprehensively. The risk assessment model helps discover potential problems ahead of time and reduces the risk of supply chain disruption. The trend prediction system provides insight into future changes in supplier performance and helps the enterprise make smarter decisions, and comparative analysis against industry benchmarks identifies a supplier's strengths and weaknesses. The improvement-effect tracking mechanism ensures that improvement suggestions are implemented effectively and supports continuous optimization. Providing detailed evaluation reports to suppliers improves understanding and trust, and the user feedback collection mechanism ensures continuous improvement of the system to meet continuously changing business requirements.
Drawings
FIG. 1 is a diagram of the steps of the method of the present invention;
FIG. 2 is a block diagram of a system according to the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In one embodiment of the present invention, as shown in fig. 1, a multi-source data processing method applied to supplier evaluation includes:
S1, collecting multi-source data of different sources, and preprocessing the collected multi-source data;
S2, determining a main data source and related data sources according to the evaluation requirements of suppliers, wherein the main data source contains the core data of the evaluation process, such as financial statements, and the related data sources provide auxiliary data, such as social media evaluations; extracting effective data from the main data source based on preset data screening rules, and associating the extracted effective data with the data in the related data sources;
S3, fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
S4, constructing a data processing model aiming at each evaluation dimension, carrying out deep analysis on the multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation of suppliers with a comprehensive evaluation algorithm (such as the analytic hierarchy process or fuzzy comprehensive evaluation) based on the evaluation indexes and characteristics of each dimension;
S5, outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm, wherein the report includes the supplier's scores, ranking, strengths and weaknesses; feeding back the evaluation results to the supplier and related personnel to help them understand existing problems and shortcomings and formulate corresponding improvement measures; and continuously optimizing and perfecting the data processing model and the evaluation algorithm according to the evaluation results and feedback opinions.
The working principle of this technical scheme is as follows. Multi-source data related to supplier evaluation is collected through different channels (such as the enterprise's internal systems, public databases, social media and third-party evaluation platforms). The multi-source data covers multiple types, including financial reports, transaction records, customer evaluations and social media feedback. The collected data is preprocessed, and a main data source (such as financial reports) and related data sources (such as social media evaluations) are determined according to the evaluation requirements; the main data source is the core of the evaluation, while the related data sources provide auxiliary information. Effective data is extracted from the main data source based on preset data screening rules, which cover aspects such as the time range, completeness and reasonableness of the data, for example selecting only the latest data acquired within one week. The extracted effective data is then matched and associated with the data in the related data sources using common identifiers in the data (such as supplier ID and product number). The effective data and the associated related data are fused to form a comprehensive data set, and an index system for supplier evaluation is constructed based on the fused multi-source data. The index system covers multiple aspects of the supplier (such as product quality, delivery capability, service attitude and price competitiveness), with specific evaluation indexes determined for each aspect, for example whether product quality meets national standards or whether the service attitude is rated five stars. A data processing model is constructed for each evaluation dimension, the multi-source data is analyzed in depth, and key evaluation indexes and characteristics are extracted. Based on the evaluation indexes and characteristics of each dimension, a comprehensive evaluation algorithm (such as the analytic hierarchy process or fuzzy comprehensive evaluation) evaluates the supplier comprehensively, taking multiple factors into account to produce an overall score and ranking, and a comprehensive evaluation report is output according to the results. The report includes the supplier's scores, ranking, strengths and weaknesses; the evaluation results are fed back to the supplier and related personnel to help them understand existing problems and shortcomings and formulate corresponding improvement measures. Meanwhile, the data processing model and the evaluation algorithm are continuously optimized and perfected according to the evaluation results and feedback opinions.
The advantages of this technical scheme are as follows. By collecting and fusing multi-source data from different sources (such as financial statements, social media evaluations and customer feedback), the real situation of a supplier can be reflected more comprehensively, avoiding the one-sidedness or limitations of a single data source. The evaluation index system built on the fused multi-source data covers multiple aspects of the supplier (such as product quality, price, delivery capability, service attitude and innovation capability), enabling multi-dimensional, all-round evaluation. Preprocessing and screening the collected multi-source data effectively removes noise and outliers and improves the accuracy and reliability of the data, while precise matching through common identifiers (such as supplier ID and product number) during data association ensures data consistency and reduces evaluation errors caused by inconsistent data. The output comprehensive evaluation report, containing the supplier's scores, ranking, strengths and weaknesses, gives decision makers a clear and intuitive reference and helps them make more scientific and reasonable decisions. Feeding the evaluation results back to suppliers and related personnel helps them address existing problems and shortcomings and formulate targeted improvement measures, which improves overall efficiency, and continuously optimizing the data processing model and evaluation algorithm based on the evaluation results and feedback improves the accuracy of the evaluation and the transparency and performance of the evaluation system. Through the feedback mechanism, suppliers can learn of their deficiencies in time and take corrective measures, which not only improves their competitiveness but also helps the enterprise and its suppliers build a long-term, stable and win-win cooperative relationship.
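To make the identifier-based association described above concrete, the following minimal sketch (not taken from the patent) joins records from a main data source and a related data source on a shared supplier_id key using pandas; the column names are illustrative assumptions.

```python
# Hedged sketch: associating effective data from the main data source with a
# related data source via a common identifier. Column names (supplier_id,
# on_time_rate, sentiment) are illustrative assumptions, not from the patent.
import pandas as pd

# Main data source, e.g. screened financial / transaction records.
main_df = pd.DataFrame({
    "supplier_id": ["S001", "S002", "S003"],
    "on_time_rate": [0.96, 0.88, 0.91],
})

# Related data source, e.g. aggregated social media sentiment per supplier.
related_df = pd.DataFrame({
    "supplier_id": ["S001", "S002", "S004"],
    "sentiment": [0.72, 0.35, 0.80],
})

# Left join keeps every screened record from the main source and attaches
# auxiliary information where the common identifier matches.
combined = main_df.merge(related_df, on="supplier_id", how="left")
print(combined)
```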
In one embodiment of the present invention, the S1 includes:
S11, collecting multi-source data of different sources through an acquisition script or an API interface, wherein the multi-source data comprises financial reports, historical transaction records, logistics tracking data, social media evaluation and market research reports provided by suppliers;
S12, the collected multi-source data are sent to a cloud space, and the cloud space stores the received multi-source data through a layered storage structure;
S13, preprocessing stored multi-source data, encrypting sensitive data, and desensitizing data which are not sensitive but related to privacy.
The working principle of this technical scheme is as follows. Relevant data is acquired from various data sources (such as databases, websites and file systems) through acquisition scripts; for data sources that expose an API interface, the data is obtained directly by calling the API. The collected multi-source data includes financial reports provided by suppliers (reflecting their financial condition and business results), historical transaction records (recording transaction details and payment behavior), logistics tracking data (providing cargo transport status and location information), social media evaluations (reflecting public opinion on the supplier's products or services) and market research reports (containing information such as industry trends and competitor analysis). The collected multi-source data is sent to a cloud space for storage. The cloud space manages the data with a layered storage structure: according to factors such as access frequency, importance or storage cost, the data is divided into different tiers (such as a hot data tier, a warm data tier and a cold data tier) and stored on media with different performance, cost and availability; for example, frequently accessed data is assigned to the hot data tier and stored on higher-performance media. The multi-source data stored in the cloud space is preprocessed, including data cleansing (removing duplicate, erroneous or irrelevant data), data conversion (converting the data into a uniform format or structure) and data integration (integrating data from different sources into a unified data set). During preprocessing, sensitive data such as financial information and personal identity information is encrypted, which protects it during storage and transmission; data that is not sensitive but privacy-related (such as user names and telephone numbers) is desensitized. Desensitization reduces the privacy risk in the data by substitution, deletion or masking while preserving its analytical and evaluation value.
The advantages of this technical scheme are as follows. Collecting multi-source data through acquisition scripts or API interfaces adapts flexibly to the characteristics and access modes of different data sources and keeps the data comprehensive and timely; automating the collection process reduces manual intervention, improves collection efficiency and accuracy, and lowers the risk of human error. The collected multi-source data, including financial reports, historical transaction records, logistics tracking data, social media evaluations and market research reports, provides a rich information base for subsequent supplier evaluation. The cloud space offers nearly unlimited scalability and easily handles rapid data growth, guaranteeing flexible and efficient storage; its redundant storage and backup mechanisms ensure high data availability, so that integrity and accessibility are preserved even in the event of hardware failure or natural disaster. The layered storage structure allocates storage resources according to data access frequency and importance, reducing storage cost while improving storage efficiency. The preprocessing steps improve data quality and make the data easier to integrate and analyze; encrypting sensitive data guarantees its security, while desensitizing privacy-related data balances privacy protection with the analytical value of the data.
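The patent does not give collection code; as a minimal sketch under stated assumptions (the requests library, an invented endpoint URL and field names, and Fernet symmetric encryption from the cryptography package standing in for the unspecified encryption step), S11 and S13 could look like this:

```python
# Hedged sketch of S11/S13: collect data over an API, encrypt sensitive fields,
# and desensitize privacy-related fields. The endpoint URL, field names and the
# choice of Fernet encryption are illustrative assumptions, not from the patent.
import requests
from cryptography.fernet import Fernet

API_URL = "https://example.com/api/supplier-data"  # hypothetical endpoint
KEY = Fernet.generate_key()                        # in practice, managed securely
cipher = Fernet(KEY)

def collect(supplier_id: str) -> dict:
    # S11: pull one supplier's record from an API-based data source.
    resp = requests.get(API_URL, params={"supplier_id": supplier_id}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def mask_phone(phone: str) -> str:
    # Desensitization by substitution: keep only the last 4 digits.
    return "*" * max(len(phone) - 4, 0) + phone[-4:]

def preprocess(record: dict) -> dict:
    # S13: encrypt sensitive data, desensitize privacy-related data.
    out = dict(record)
    if "bank_account" in out:       # sensitive -> encrypt
        out["bank_account"] = cipher.encrypt(out["bank_account"].encode()).decode()
    if "contact_phone" in out:      # privacy-related -> desensitize
        out["contact_phone"] = mask_phone(out["contact_phone"])
    return out

if __name__ == "__main__":
    sample = {"supplier_id": "S001", "bank_account": "6222000011112222",
              "contact_phone": "13800001234"}
    print(preprocess(sample))       # runs without calling the hypothetical API
```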
In one embodiment of the present invention, the S12 includes:
Compressing the uploaded multi-source data through a compression algorithm to obtain a compressed first data packet, performing redundancy backup on the compressed data packet to obtain a second data packet, and respectively transmitting the first data packet and the second data packet to a cloud space through a multi-channel transmission protocol;
the cloud space stores the first data packet in the first storage space and the second data packet in the second storage space, where the second data packet serves as the data backup;
decompressing the first data packet in the first storage space, and, through a real-time data stream processing layer built on a stream processing framework (such as Apache Kafka or Apache Flink), capturing the data sources in the first storage space that need to be processed in real time;
rejecting erroneous or invalid data, analyzing the processed data in real time, and extracting key performance indicators (KPIs) or performing preliminary data aggregation;
constructing a layered historical data storage architecture in the first storage space, and automatically migrating data according to access frequency and importance by combining object storage (such as Amazon S3 or Alibaba Cloud OSS) with block storage (such as EBS);
introducing a data archiving strategy that migrates data which has been inactive for a long time but must still be retained (such as historical transaction records and old versions of market research reports) to a cold storage area to reduce storage cost, and building an index over the historical data with a search engine technology (such as Elasticsearch).
The working principle of this technical scheme is as follows. The uploaded multi-source data is first compressed by a compression algorithm and packaged into a first data packet; a redundant backup of the compressed data is then made, producing a second data packet. Both packets are sent to the cloud space over a multi-channel transmission protocol. After the cloud space receives them, the first data packet (the original compressed data) is stored in the first storage space for main data processing and analysis, while the second data packet (the redundant backup) is stored in the second storage space as a backup. The first data packet is decompressed in the first storage space, and a real-time data stream processing layer (built on stream processing frameworks such as Apache Kafka and Apache Flink) captures the data sources that need immediate processing. Data from the stream is captured, processed and analyzed in real time; erroneous or invalid data is removed during stream processing, and the processed data is analyzed in real time to extract key performance indicators (KPIs) or perform preliminary aggregation. The processing results can immediately be used for monitoring, alerting or preliminary business decision support. A layered historical data storage architecture is built in the first storage space, combining the strengths of object storage (such as Amazon S3 and Alibaba Cloud OSS) and block storage (such as EBS) to migrate data automatically according to its access frequency and importance. A data archiving strategy migrates data that has been inactive for a long time but must still be retained (such as historical transaction records and old versions of market research reports) to a cold storage area to further reduce storage cost. At the same time, the historical data is indexed with a search engine technology (such as Elasticsearch) so that it can be retrieved and queried quickly.
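As a minimal sketch of the compression-and-backup step described above (Python standard library only; the checksum-based integrity check is an added assumption, not part of the patent text):

```python
# Hedged sketch: compress uploaded multi-source data into a first data packet,
# create a redundant second packet, and verify integrity on receipt. The
# checksum field is an illustrative assumption, not from the patent.
import json
import zlib
import hashlib

def build_packets(records: list[dict]) -> tuple[dict, dict]:
    raw = json.dumps(records).encode("utf-8")
    compressed = zlib.compress(raw, level=6)
    first_packet = {
        "payload": compressed,
        "sha256": hashlib.sha256(compressed).hexdigest(),
    }
    second_packet = dict(first_packet)   # redundant backup of the compressed data
    return first_packet, second_packet

def restore(packet: dict) -> list[dict]:
    # Receiving side: check integrity, then decompress for real-time processing.
    if hashlib.sha256(packet["payload"]).hexdigest() != packet["sha256"]:
        raise ValueError("packet corrupted; fall back to the backup packet")
    return json.loads(zlib.decompress(packet["payload"]).decode("utf-8"))

if __name__ == "__main__":
    records = [{"supplier_id": "S001", "amount": 1200.5},
               {"supplier_id": "S002", "amount": 860.0}]
    primary, backup = build_packets(records)
    print(restore(primary) == records)   # True
```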
The advantages of this technical scheme are as follows. Compressing the multi-source data reduces its volume, lowering storage space usage and transmission bandwidth requirements and therefore storage and transmission costs. Making a redundant backup of the compressed packet and sending both packets to the cloud space over a multi-channel transmission protocol increases transmission redundancy, reduces the risk of data loss or corruption, and improves transmission reliability. Using a stream processing framework (such as Apache Kafka or Apache Flink) to capture and process the data sources that need immediate handling enables instant analysis and response, supporting fast business decisions and real-time monitoring. Removing erroneous or invalid data during real-time processing improves data quality and accuracy and provides a reliable basis for subsequent analysis and application, while analyzing the processed data in real time to extract key performance indicators (KPIs) or perform preliminary aggregation gives management an intuitive view of business conditions. The layered historical data storage architecture, combining object storage and block storage and automatically migrating data by access frequency and importance, balances storage cost against retrieval performance: frequently used data stays on fast media while infrequently accessed historical data moves to cheaper tiers. The data archiving strategy moves long-inactive data that must still be retained (such as historical transaction records and old versions of market research reports) to a cold storage area, further reducing cost and ensuring its safe retention, and indexing the historical data with a search engine technology (such as Elasticsearch) improves retrieval efficiency and availability while meeting the enterprise's compliance and business continuity requirements.
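To illustrate the access-frequency-based migration policy (the patent names no concrete rule), here is a minimal sketch with invented thresholds for hot, warm and cold tiers:

```python
# Hedged sketch: decide a storage tier for each data object from its access
# frequency and age. The thresholds (20/2 accesses per month, 365-day archive
# cutoff) are invented for illustration only.
from dataclasses import dataclass

@dataclass
class DataObject:
    object_id: str
    accesses_last_month: int
    days_since_last_access: int
    must_retain: bool = True

def choose_tier(obj: DataObject) -> str:
    if obj.days_since_last_access > 365 and obj.must_retain:
        return "cold_archive"   # long-inactive but retained data
    if obj.accesses_last_month >= 20:
        return "hot"            # high-frequency data on fast block storage
    if obj.accesses_last_month >= 2:
        return "warm"           # occasional access, standard object storage
    return "cold"

if __name__ == "__main__":
    objects = [
        DataObject("txn-2023-Q4", accesses_last_month=35, days_since_last_access=1),
        DataObject("report-2021", accesses_last_month=0, days_since_last_access=700),
    ]
    for obj in objects:
        print(obj.object_id, "->", choose_tier(obj))
```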
In one embodiment of the present invention, the S2 includes:
S21, through a multi-dimensional data source quality evaluation framework covering indexes such as data integrity, accuracy, timeliness, reliability and source authority, quantitatively scoring each data source based on preset thresholds and weights and identifying the main data source;
S22, for the main data source, automatically adjusting the extraction time interval based on the frequency of data updates, and identifying and extracting key information fields using regular expressions and a machine learning model;
S23, during extraction, comparing against historical data and verifying logical relationships and consistency (such as reconciliation relationships among financial data) based on a secondary data quality verification mechanism;
S24, when associating related data sources, matching through common identifiers (such as supplier ID and product number) in combination with a multi-dimensional association strategy that includes timestamp-based sequence matching, geographic-proximity analysis and text-similarity-based content association;
S25, after association, carrying out consistency verification, including cross-source data checks (such as sales figures in financial statements versus those in market research reports), business-logic reasonableness checks (such as verifying delivery punctuality against logistics tracking data) and statistical anomaly detection (such as identifying abnormal transaction records with a clustering algorithm); establishing a feedback mechanism that returns the consistency verification results to the data collection and processing links, and continuously optimizing the association rules and verification algorithms according to the feedback to form a closed-loop iterative optimization process.
The working principle of this technical scheme is as follows. The data sources are comprehensively evaluated with a quality evaluation framework covering data integrity, accuracy, timeliness, reliability and source authority; each source is scored quantitatively, and from the scores the highest-quality main data source, best suited as the basis for subsequent processing, is identified. The data extraction interval is adjusted automatically according to the update frequency of the main data source, and key information fields, including supplier names, product specifications and transaction amounts, are identified and extracted from it using regular expressions and a machine learning model. During extraction, the accuracy and reliability of the data are further verified by comparing against historical data and checking logical relationships and consistency (such as reconciliation relationships among financial data); any data quality problems found are recorded and fed back to the data collection and processing links in time. When associating related data sources, common identifiers are combined with a multi-dimensional association strategy (such as timestamp ordering, geographic proximity and text similarity), and the association rules are adjusted dynamically according to data characteristics or abnormal conditions, for example by introducing additional verification fields for auxiliary association or tuning sentiment analysis model parameters to filter noise. Cross-source verification (such as sales figures in financial statements against those in market research reports) ensures consistency between different data sources, while business-logic reasonableness checks (such as verifying delivery punctuality against logistics tracking data) and statistical anomaly detection (such as identifying abnormal transaction records with a clustering algorithm) further improve data quality. A feedback mechanism returns the consistency verification results to the data collection and processing links, and the association rules and verification algorithms are continuously optimized according to the feedback, forming a closed-loop iterative optimization process that steadily improves the accuracy and efficiency of data processing.
The advantages of this technical scheme are as follows. The multi-dimensional quality evaluation framework considers data integrity, accuracy, timeliness, reliability and source authority together, allowing the quality of each data source to be assessed comprehensively and ensuring a solid, reliable data foundation for subsequent processing, while quantitative scoring against preset thresholds and weights provides high-quality input for later extraction and association. Automatically adjusting the extraction interval to the data update frequency avoids unnecessary frequent extraction and improves processing efficiency, and regular expressions with a machine learning model automatically identify and extract key information fields, reducing manual intervention and raising the level of automation. Comparing against historical data and verifying logical relationships and consistency during extraction allows data errors to be found and corrected in time, ensuring accuracy and consistency. Common identifiers and the multi-dimensional association strategy achieve effective association between different data sources and strengthen data completeness and consistency, while dynamically adjusting the association rules according to data characteristics and business requirements copes flexibly with abnormal or special situations, such as duplicate supplier IDs or social media noise. Statistical anomaly detection, for example identifying abnormal transaction records with a clustering algorithm, allows anomalies to be found and handled promptly, safeguarding data accuracy and reliability. Finally, the feedback mechanism returns verification results to the collection and processing links, and the association rules and verification algorithms are continuously optimized according to the feedback, forming a closed-loop improvement process.
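A minimal sketch of the weighted quality scoring in S21 follows; the indicator names, weights, scores and qualification threshold are invented for illustration and are not prescribed by the patent.

```python
# Hedged sketch of S21: score each data source on several quality indicators
# with preset weights, and pick the main data source. Weights, scores and the
# qualification threshold below are illustrative assumptions.
WEIGHTS = {
    "integrity": 0.25, "accuracy": 0.25, "timeliness": 0.20,
    "reliability": 0.15, "authority": 0.15,
}
THRESHOLD = 0.70  # minimum weighted score to qualify as a candidate

def weighted_score(indicator_scores: dict) -> float:
    return sum(WEIGHTS[k] * indicator_scores[k] for k in WEIGHTS)

def select_main_source(sources: dict) -> str:
    qualified = {name: weighted_score(s) for name, s in sources.items()
                 if weighted_score(s) >= THRESHOLD}
    if not qualified:
        raise ValueError("no data source meets the quality threshold")
    return max(qualified, key=qualified.get)

if __name__ == "__main__":
    sources = {
        "erp_financials": {"integrity": 0.95, "accuracy": 0.92, "timeliness": 0.80,
                           "reliability": 0.90, "authority": 0.95},
        "social_media":   {"integrity": 0.60, "accuracy": 0.55, "timeliness": 0.95,
                           "reliability": 0.50, "authority": 0.40},
    }
    print("main data source:", select_main_source(sources))
```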
In one embodiment of the present invention, the step S22 includes:
Monitoring the update logs or timestamps of the main data source in real time or periodically, recording the interval of each data update, and analyzing the historical update frequency data to obtain analysis results, which include the periodic patterns of the data updates (such as daily, weekly or monthly updates) and abnormal fluctuations;
setting an initial data extraction interval according to the analysis results so that the latest data is captured in time while resource waste from excessively frequent extraction is avoided, and, through a dynamic adjustment mechanism, automatically adjusting the extraction interval to match the new update speed when a change in update frequency (such as acceleration or slowdown) is detected;
for structured data (such as tables in a database), locating and extracting key information fields with regular expressions;
for unstructured or semi-structured data (such as text reports and social media comments), identifying and extracting key information with an existing named entity recognition model;
combining the regular expressions and the named entity recognition model into a hybrid extraction strategy: regular expressions handle information that is easy to match by rules, while complex and variable information is handled by the named entity recognition model;
monitoring the extraction process in real time and recording key indicators of each extraction task, including execution time, success rate and error information; through an anomaly detection mechanism, triggering an alarm and taking corresponding emergency measures (such as retrying the extraction or switching to a standby data source) when an anomaly (such as a data format mismatch or network interruption) occurs during extraction;
carrying out preliminary verification of the extracted data through an extraction result verification mechanism, evaluating the effectiveness of the extraction strategy based on data quality analysis of the extraction results, and optimizing the hybrid strategy according to the analysis results.
The working principle of this technical scheme is as follows. By monitoring the update logs or timestamps of the main data source in real time or periodically, the interval of each data update is recorded and the historical update frequency data is analyzed to identify periodic patterns (such as daily, weekly or monthly updates) and possible abnormal fluctuations. Based on the analysis results, an initial extraction interval is set, for example one hour, and changes in update frequency are continuously monitored; when a change (such as acceleration or slowdown) is detected, the extraction interval is adjusted automatically to match the new update speed. For structured data (such as database tables), regular expressions precisely locate and extract key information fields such as dates, amounts and identifiers; for unstructured or semi-structured data (such as text reports and social media comments), an existing named entity recognition (NER) model identifies and extracts the key information. The strengths of both are combined into a hybrid extraction strategy: regular expressions handle information that can easily be matched by rules, while complex and variable information relies on the NER model. Key indicators of each extraction task, such as execution time, success rate and error information, are monitored in real time; when an anomaly (such as a data format mismatch or network interruption) occurs, an anomaly detection mechanism triggers an alarm and corresponding emergency measures are taken automatically or manually, such as retrying the extraction or switching to a standby data source. Finally, an extraction result verification mechanism performs a preliminary check of the extracted data and assesses its basic quality; based on the data quality analysis of the extraction results, the effectiveness of the extraction strategy is evaluated and the hybrid strategy is optimized to improve extraction accuracy and efficiency.
The advantages of this technical scheme are as follows. Analyzing the update frequency of the main data source and setting an initial extraction interval ensures that the latest data is captured in time while unnecessary frequent extraction, and thus computing and network resources, are saved; automatically adjusting the interval when the update frequency changes further improves the flexibility and efficiency of data extraction. The hybrid strategy combining regular expressions with a named entity recognition model extracts key information more accurately from structured, unstructured or semi-structured data: regular expressions suit data with explicit rules and fixed formats, the NER model handles complex, variable and semantically rich data, and their combination greatly improves extraction accuracy. Real-time monitoring of the extraction process and recording of key indicators such as execution time, success rate and error information allows potential problems to be found promptly; the anomaly detection mechanism triggers alarms quickly and enables automatic or manual emergency measures, such as retrying the extraction or switching to a standby data source, ensuring continuity and stability of extraction and reducing data loss or delay caused by anomalies. Preliminary verification of the extracted data assesses its quality and guarantees accurate, reliable results, and optimizing the hybrid strategy based on the data quality analysis continuously improves extraction precision and efficiency. Using machine learning models such as named entity recognition makes extraction intelligent, reduces manual intervention, and improves processing efficiency and accuracy; the whole pipeline, from update monitoring and interval adjustment to extraction and verification, is highly automated, reducing the risk of human error and improving overall data processing efficiency. The hybrid strategy can be flexibly adjusted and optimized for different data characteristics and business requirements, so the scheme adapts to different data sources and formats; when sources or formats change, the dynamic adjustment mechanism and strategy optimization allow rapid adaptation while keeping extraction stable and accurate.
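The hybrid extraction described above could be sketched as follows; regular expressions handle rule-matchable fields, and spaCy's small English model stands in for the unspecified NER model. The patterns, example text and the use of spaCy are assumptions for illustration only.

```python
# Hedged sketch of the hybrid extraction strategy: regular expressions for
# fields with fixed formats, an NER model for free text. The regex patterns,
# example text and the use of spaCy are illustrative assumptions.
import re
import spacy

AMOUNT_RE = re.compile(r"(?:USD|CNY)\s?([\d,]+(?:\.\d{2})?)")
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

nlp = spacy.load("en_core_web_sm")  # assumed NER model, installed separately

def extract_structured(text: str) -> dict:
    # Rule-matchable fields: amounts and ISO dates via regular expressions.
    return {
        "amounts": AMOUNT_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

def extract_entities(text: str) -> list:
    # Complex, variable information: organizations, places, etc. via NER.
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

if __name__ == "__main__":
    report = ("On 2024-03-15 Acme Components Ltd delivered goods worth "
              "USD 12,500.00 to the Shanghai warehouse.")
    print(extract_structured(report))
    print(extract_entities(report))
```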
In one embodiment of the present invention, the S3 includes:
S31, identifying and handling the heterogeneity problems among the multi-source data, and converting data from different sources into a unified format through methods such as data conversion, standardization and normalization;
S32, based on multi-dimensional indicators of the data, designing a data weight distribution mechanism using the analytic hierarchy process (AHP) and the fuzzy comprehensive evaluation method in combination with an expert database, wherein the multi-dimensional indicators include importance, reliability, timeliness and source authority;
S33, fusing the data through a fusion algorithm (such as weighted average, principal component analysis or a neural network), and applying outlier detection and data cleaning mechanisms during the fusion process;
S34, after fusion is completed, verifying and evaluating the fused data by comparing it against historical data, industry benchmarks and business logic;
S35, dividing the indicator system into a plurality of dimensions (such as financial condition, production capacity, product quality, on-time delivery rate and after-sales service) according to the requirements and targets of supplier evaluation, further refining the evaluation indicators under each dimension to form a multi-level indicator system structure, and establishing an indicator dynamic adjustment mechanism that dynamically adjusts the indicator system according to influencing factors including market changes, business requirements and data feedback. The adjustment includes adding or deleting evaluation indicators, adjusting indicator weights, optimizing indicator calculation methods, and the like;
The working principle of this technical scheme is as follows. First, the system identifies inconsistencies in data types, formats, units and semantics among data from different sources, and then converts the data into a unified format and standard through methods such as data conversion, standardization and normalization. Using multi-dimensional indicators of the data, such as importance, reliability, timeliness and source authority, a data weight distribution mechanism is designed with the analytic hierarchy process and the fuzzy comprehensive evaluation method in combination with the opinions of an expert database, and the weight distribution mechanism is adjusted in a timely manner through a dynamic adjustment strategy according to the actual performance of the data and changes in market and business requirements. Fusion algorithms such as weighted average, principal component analysis and neural networks are adopted to make full use of the advantages of each data source and improve the comprehensiveness and accuracy of the data, and an outlier detection and data cleaning mechanism is applied during fusion to remove erroneous or unreasonable data. After fusion is completed, the fused data is comprehensively verified and evaluated by comparing it against historical data, industry benchmarks and business logic. The indicator system is then divided into a plurality of dimensions according to the requirements and targets of supplier evaluation, the evaluation indicators are further refined under each dimension to form a multi-level indicator structure, and an indicator dynamic adjustment mechanism is established so that the indicator system can be flexibly adjusted according to market changes, business requirements and data feedback, including adding or deleting evaluation indicators, adjusting indicator weights and optimizing indicator calculation methods.
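As a concrete illustration of the format-unification step, the sketch below applies min-max normalization and z-score standardization to numeric columns coming from different sources. The column names and the pandas-based implementation are assumptions for illustration only.

```python
import pandas as pd


def unify_numeric_columns(df: pd.DataFrame,
                          minmax_cols: list[str],
                          zscore_cols: list[str]) -> pd.DataFrame:
    """Bring heterogeneous numeric fields onto comparable scales before fusion."""
    out = df.copy()
    for col in minmax_cols:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    for col in zscore_cols:
        mu, sigma = out[col].mean(), out[col].std()
        out[col] = (out[col] - mu) / sigma if sigma > 0 else 0.0
    return out


# Example with assumed supplier metrics on very different scales
raw = pd.DataFrame({
    "on_time_delivery_rate": [0.92, 0.85, 0.99],  # already a ratio
    "annual_revenue_cny": [1.2e8, 3.4e7, 9.8e8],  # large absolute values
})
print(unify_numeric_columns(raw,
                            minmax_cols=["annual_revenue_cny"],
                            zscore_cols=["on_time_delivery_rate"]))
```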
The beneficial effects of this technical scheme are as follows. By identifying and handling the heterogeneity problems among the multi-source data, inconsistencies in data types, formats, units and semantics are resolved, so that data from different sources can be converted into a unified format and a reliable basis is provided for subsequent data analysis and decision-making. The improved integration capability breaks through information silos and enables comprehensive interconnection and sharing of data. A weight distribution mechanism designed on the basis of multi-dimensional indicators (such as importance, reliability, timeliness and source authority) allows the contribution of each data source to be evaluated reasonably, so that quality differences among the data are fully considered during data fusion, and the dynamic adjustment strategy keeps the weight distribution timely and accurate as the actual performance of the data and the market and business requirements change. After fusion is completed, the fused data is comprehensively verified and evaluated by comparing it against historical data, industry benchmarks and business logic, ensuring the accuracy and applicability of the data. According to the requirements and targets of supplier evaluation, the indicator system is divided into a plurality of dimensions, and the evaluation indicators are further refined under each dimension to form a multi-level indicator structure. Such a multi-dimensional indicator system can comprehensively reflect the overall strength of a supplier and provides rich perspectives and bases for evaluation. An indicator dynamic adjustment mechanism is established so that the indicator system is flexibly adjusted according to influencing factors such as market changes, business requirements and data feedback; through this dynamic adjustment capability, the indicator system always stays in step with the actual situation, ensuring the timeliness and accuracy of the evaluation.
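The weight distribution mechanism can be grounded with a small analytic-hierarchy-process style calculation. The sketch below derives indicator weights from a pairwise comparison matrix using the normalized column-average approximation of the principal eigenvector; the example matrix over importance, reliability, timeliness and source authority is a hypothetical illustration, not values taken from the patent.

```python
import numpy as np

# Hypothetical pairwise comparison matrix over four data-quality indicators:
# importance, reliability, timeliness, source authority (Saaty 1-9 scale).
A = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1/2, 1.0, 2.0, 3.0],
    [1/3, 1/2, 1.0, 2.0],
    [1/4, 1/3, 1/2, 1.0],
])

# Normalized column-average approximation of the principal eigenvector
col_normalized = A / A.sum(axis=0)
weights = col_normalized.mean(axis=1)

# Simple consistency check via the principal eigenvalue (CR < 0.1 is the usual rule of thumb)
lambda_max = float((A @ weights / weights).mean())
n = A.shape[0]
ci = (lambda_max - n) / (n - 1)
cr = ci / 0.90  # 0.90 is the random consistency index for n = 4
print("weights:", np.round(weights, 3), "consistency ratio:", round(cr, 3))
```

In the scheme described above, these weights would then be refined by the fuzzy comprehensive evaluation step and by expert-database feedback rather than used directly.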
In one embodiment of the present invention, the step S33 includes:
Selecting a fusion algorithm according to data characteristics (such as data volume, data distribution and degree of heterogeneity) and business requirements; re-detecting outliers in the data by statistical methods and removing or correcting them; and cleaning the multi-source data after re-detection, including removing duplicate data, correcting erroneous data and handling missing values;
According to the selected fusion algorithm and data characteristics, a fusion strategy is formulated, including data alignment, fusion sequence, weight distribution and the like;
Preliminarily evaluating the quality of the fusion result by calculating statistical indicators of the fusion result (such as mean, variance and correlation coefficient) and through visual analysis (such as scatter plots and heat maps);
And feeding back and adjusting the fusion strategy according to the preliminary evaluation result, and continuously optimizing the fusion algorithm and strategy through multiple iterations.
The working principle of the above scheme is as follows. First, a suitable fusion algorithm is selected according to the data characteristics (such as data volume, data distribution and degree of heterogeneity) and the business requirements. For example, suppose a financial transaction data set containing millions of records is being processed and needs to be analyzed in real time to detect fraudulent activity; because the data volume is large and computing resources are limited, distributed algorithms that can handle large-scale data sets, such as the machine learning algorithms supported by APACHE SPARK, may be selected, since they allow the data to be processed in parallel on multiple machines and thereby increase processing speed. Next, a detailed fusion strategy is formulated according to the selected fusion algorithm and the data characteristics, including data alignment (ensuring that related data in different data sources can be matched to each other), fusion order (determining the sequence in which data is fused) and weight distribution (assigning weights according to factors such as the importance and reliability of the data). The data fusion operation is then executed according to the formulated strategy, integrating the data of multiple data sources into a unified data set, and the fusion process is monitored in real time. The quality of the fusion result is preliminarily evaluated by calculating its statistical indicators (such as mean, variance and correlation coefficient), and visualization tools (such as scatter plots and heat maps) are used to better understand the distribution and characteristics of the fused data. Based on the preliminary evaluation result, the fusion strategy is fed back and adjusted, and the fusion algorithm and strategy are continuously optimized through multiple iterations until a satisfactory fusion result is obtained.
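A minimal sketch of the weighted-average fusion step with a simple z-score outlier filter, assuming each source reports the same metric for the same set of suppliers. The 3-sigma cutoff and the example source weights are illustrative assumptions.

```python
import numpy as np


def fuse_scores(source_scores: dict[str, np.ndarray],
                source_weights: dict[str, float],
                z_cutoff: float = 3.0) -> np.ndarray:
    """Weighted-average fusion of per-supplier scores from several sources,
    ignoring values flagged as outliers within each source."""
    names = list(source_scores)
    values = np.vstack([source_scores[n] for n in names]).astype(float)  # (sources, suppliers)
    weights = np.array([source_weights[n] for n in names], dtype=float)[:, None]

    # Flag per-source outliers with a z-score test and mask them out of the average
    mu = values.mean(axis=1, keepdims=True)
    sigma = values.std(axis=1, keepdims=True) + 1e-12
    mask = np.abs((values - mu) / sigma) <= z_cutoff

    weighted = np.where(mask, values * weights, 0.0)
    effective_w = np.where(mask, weights, 0.0).sum(axis=0)
    return weighted.sum(axis=0) / np.maximum(effective_w, 1e-12)


fused = fuse_scores(
    {"financial": np.array([0.8, 0.6, 0.9]), "social": np.array([0.7, 0.65, 0.2])},
    {"financial": 0.7, "social": 0.3},
)
print(fused)
```

Principal component analysis or a neural network, as mentioned above, could replace the weighted average when the sources are strongly correlated or nonlinearly related.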
The beneficial effects of the above technical scheme are as follows. Selecting a suitable fusion algorithm according to the data characteristics and business requirements ensures that the data fusion process is more efficient and the results are more accurate; handling different data characteristics and business requirements with different algorithms guarantees the best integration effect, and re-detecting and handling outliers and cleaning duplicate, erroneous and missing data can significantly improve data quality and consistency. Formulating a fusion strategy according to the selected fusion algorithm and data characteristics, including data alignment, fusion order and weight distribution, makes it possible to flexibly cope with the complexity and heterogeneity of different data sources; this flexibility allows the scheme to adapt to a wide range of application scenarios and data environments. Monitoring the fusion process in real time and feeding back and adjusting the fusion strategy according to the preliminary evaluation result keeps the fusion process controllable and near-optimal, and repeated iterative optimization of the fusion algorithm and strategy continuously improves the quality and accuracy of the fusion result, so that the refined data fusion process produces results of high quality, consistency and accuracy. Based on such fusion results, strong support can be provided for enterprise decision-making, helping decision makers grasp market trends more accurately, evaluate supplier strength and optimize production processes, and the characteristics and trends of the fusion result can be displayed intuitively through the calculation of statistical indicators and visual analysis. The scheme also includes an outlier detection and handling mechanism that effectively identifies and removes outliers in the data, reducing the system errors and instability caused by data errors or noise and enhancing the robustness and reliability of the system. Optimizing the fusion algorithm and strategy through multiple iterations continuously improves system performance and stability, and the iterative optimization mechanism allows the system to keep adapting to changes in the data environment and business requirements while maintaining efficient and accurate operation.
In one embodiment of the present invention, the S4 includes:
S41, constructing an advanced model by combining multiple ensemble learning methods (such as Stacking and Boosting) with multiple machine learning algorithms, and performing parameter tuning and model fusion;
S42, evaluating the performance of the model through cross-validation and ROC curve analysis, and selecting the optimal model for deployment based on the evaluation results;
S43, selecting evaluation methods (such as the analytic hierarchy process, the fuzzy comprehensive evaluation method and the TOPSIS method) according to the evaluation requirements, forming a composite evaluation algorithm through weighted fusion and nonlinear combination, and automatically adjusting the weight of each evaluation method according to changes in the evaluation target and real-time data updates through a dynamic weight adjustment mechanism;
S44, constructing a risk assessment model through machine learning to quantitatively assess the risks that may exist for a supplier, presetting an early-warning threshold according to historical data and business rules, automatically triggering an early warning when the risk assessment result exceeds the preset threshold, and responding through an early-warning response mechanism, including early-warning notification, problem tracking and solution recommendation. The risk assessment result is obtained through the following formula:
R = \sum_{i=1}^{n} w_i \cdot f(s_i)

wherein R is the risk assessment result, w_i is the weight of the i-th risk factor, s_i is the specific score of the i-th risk factor, n is the total number of risk factors, and f(\cdot) is a function that converts s_i into a numerical value usable for calculation; T is the early-warning threshold, and an early warning is triggered if R > T.
The working principle of this technical scheme is as follows. Multiple ensemble learning methods (such as Stacking and Boosting) are combined with multiple machine learning algorithms (such as decision trees, random forests and SVMs) to construct an advanced model with higher prediction performance and generalization capability. The model combines the prediction results of multiple base models, and each model is tuned through grid search, random search or Bayesian optimization to find the optimal parameter combination so that it performs best on the validation set. Deep learning models (such as LSTM and CNN) are adopted to process complex data (such as time-series data and image data): deep feature representations are automatically extracted from the raw data and then used for predictive analysis, and training the deep learning model makes it possible to capture nonlinear relationships and complex patterns in the data. The prediction results of the multiple advanced models and the deep learning model are fused through weighted averaging, voting or a more complex integration strategy. The performance of the advanced model is evaluated by cross-validation (such as K-fold cross-validation): the data set is divided into a training set and a validation set (or test set), and the generalization ability of the model is assessed over multiple training-validation rounds. The classification performance of the model is analyzed with the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) under different thresholds to evaluate classification ability and robustness. Based on the results of cross-validation and ROC curve analysis, the best-performing model is selected for deployment. According to the evaluation requirements, suitable evaluation methods (such as the analytic hierarchy process, the fuzzy comprehensive evaluation method and the TOPSIS method) are selected, the results of the multiple evaluation methods are integrated into a composite evaluation algorithm through weighted fusion and nonlinear combination, and the weight of each evaluation method is automatically adjusted through a dynamic weight adjustment mechanism according to changes in the evaluation target and real-time data updates. A risk assessment model is constructed through machine learning algorithms to quantitatively assess the risks that may exist for a supplier, and a reasonable early-warning threshold is preset according to historical data and business rules, for example starting an early warning when the risk assessment result reaches sixty percent of the full risk score. Once an early warning is triggered, the system promptly notifies relevant personnel through early-warning notifications (such as e-mail, SMS and message push) and starts a problem tracking and solution recommendation mechanism to help relevant personnel quickly locate the problem and take corresponding measures.
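The ensemble construction and evaluation described above can be sketched with scikit-learn as follows. The choice of base learners, the logistic-regression meta-learner, the synthetic data and the 5-fold ROC-AUC scoring are illustrative assumptions, not prescriptions from the patent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for supplier feature vectors and binary risk labels
X, y = make_classification(n_samples=500, n_features=12, random_state=0)

# Stacking: base learners feed their predictions to a meta-learner
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# K-fold cross-validation with ROC-AUC as the selection metric
scores = cross_val_score(stack, X, y, cv=5, scoring="roc_auc")
print("ROC-AUC per fold:", np.round(scores, 3), "mean:", scores.mean().round(3))
```

Grid search or Bayesian optimization would normally be layered on top of this to tune the base-learner hyperparameters before the final model is deployed.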
The technical scheme has the advantages that a higher-level and more complex model can be constructed by combining a plurality of integrated learning methods and machine learning algorithms, deviation and variance of a single model can be reduced by integrating the advantages of a plurality of base models, so that prediction precision and generalization capability are improved, and deep learning models (such as LSTM and CNN) can automatically extract deep feature representations aiming at complex data (such as time sequences and images) to capture nonlinear relations and complex modes in the data. The method can more accurately understand the rules behind the data, improve the accuracy of predictive analysis, and comprehensively and objectively evaluate the performance of the model through cross verification and ROC curve analysis. The method is beneficial to avoiding the problems of over-fitting and under-fitting, ensuring that the selected model has good performance on unknown data, and selecting the optimal model for deployment based on scientific evaluation results. The method can ensure the stability and reliability of the system in practical application, and can construct a flexible and comprehensive composite evaluation algorithm by combining multiple evaluation methods through weighted fusion and nonlinearity. The method has the advantages that the comprehensive and accuracy of the evaluation are improved by fully considering the advantages and applicable scenes of different evaluation methods, and the weight of each evaluation method can be automatically adjusted by the evaluation algorithm based on the weight dynamic adjustment mechanism according to the change of the evaluation target and the real-time update of the data. The evaluation algorithm can be ensured to be consistent with the actual situation all the time, and the timeliness and the accuracy of the evaluation are improved; the risk assessment model constructed by machine learning enables quantitative assessment of risk that may exist for a provider. The method can help enterprises to discover potential risk factors in time and provide support for decision making, and when the risk assessment result exceeds a preset threshold value, an early warning mechanism is automatically triggered and responds in modes of early warning notification, problem tracking, solution recommendation and the like. The method is beneficial to the enterprises to rapidly cope with risk events and reduce loss, and provides comprehensive decision support for the enterprises by providing functions of advanced model prediction, scientific model evaluation, comprehensive evaluation system, effective risk evaluation and early warning mechanism and the like. This helps the enterprise make decisions more efficiently and accurately, improving market competitiveness. The risk assessment formula allows different weights to be given to different risk factors according to specific situations, so that the flexibility and the customizability of assessment can be realized. Under different industries and different service scenes, the importance of each risk factor may be different, and the actual situation can be reflected more accurately by adjusting the weight. The above formula ensures the comprehensiveness of the assessment by accumulating the scores of all risk factors after weighting. 
Each risk factor is taken into consideration, which avoids the one-sidedness of determining the overall risk from a single factor and makes the evaluation result more comprehensive and reliable. In the formula, the function f(\cdot) converts the specific score s_i of each risk factor into a numerical value that can be used for calculation, so that the risk is quantitatively evaluated; this helps turn subjective judgment into objective data and improves the accuracy and comparability of the evaluation. By presetting an early-warning threshold T and automatically triggering an early warning when the risk assessment result R exceeds the threshold, the formula ensures the effectiveness of the early-warning mechanism. The automatic early-warning response can discover potential risks in time, give the enterprise or organization timely countermeasures, and reduce the probability and loss of risk occurrence. The weights w_i in the formula can be adjusted automatically through the dynamic weight adjustment mechanism according to changes in the evaluation target and real-time data updates; this dynamic adjustment capability enables the risk assessment model to adapt to different business environments and changing requirements while maintaining the accuracy and timeliness of the assessment results. The risk assessment result R provides the decision maker with a quantitative indicator of the supplier's risk level, which helps the decision maker formulate purchasing strategies, risk management plans and the like more scientifically. Meanwhile, the early-warning notification, problem tracking and solution recommendation functions provided by the early-warning response mechanism further support decision-making and implementation.
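A minimal sketch of the risk scoring and early-warning trigger defined by the formula above. The mapping f, the factor weights and the threshold value are hypothetical examples; in practice they would come from the trained risk model, historical data and business rules.

```python
from dataclasses import dataclass


@dataclass
class RiskFactor:
    name: str
    weight: float  # w_i
    score: str     # raw, possibly qualitative score s_i


# Assumed mapping f(.) from qualitative scores to numbers in [0, 1]
SCORE_MAP = {"low": 0.2, "medium": 0.5, "high": 0.9}


def risk_assessment(factors: list[RiskFactor], threshold: float) -> tuple[float, bool]:
    """Compute R = sum_i w_i * f(s_i) and report whether R exceeds the threshold T."""
    r = sum(f.weight * SCORE_MAP[f.score] for f in factors)
    return r, r > threshold


factors = [
    RiskFactor("financial stability", 0.4, "medium"),
    RiskFactor("delivery reliability", 0.35, "high"),
    RiskFactor("compliance history", 0.25, "low"),
]
r, alert = risk_assessment(factors, threshold=0.6)
print(f"R = {r:.2f}, early warning triggered: {alert}")
```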
In one embodiment of the present invention, the step S5 includes:
S51, constructing a trend prediction system based on time-series analysis or machine learning models, modeling the historical performance data of suppliers, and predicting their future development trends;
S52, comparing and analyzing the evaluation result of the supplier against other competitors or industry benchmarks, identifying the supplier's own strengths and weaknesses, and providing a basis for formulating competition strategies; generating personalized improvement suggestions according to the specific performance of the supplier, for example providing concrete quality-control process optimization suggestions for a supplier whose product quality is deficient, and recommending the introduction of an advanced supply chain management system for the problem of slow supply chain response.
S53, establishing an improvement effect tracking mechanism, periodically collecting performance data of suppliers after adopting improvement suggestions, evaluating the improvement effect, and adjusting the suggestion content according to feedback;
S54, establishing a multi-channel user feedback collection mechanism, including online investigation, user interviews, customer service hotlines and the like, ensuring that user opinions can be collected comprehensively and timely, sorting and analyzing the collected user feedback through an iterative optimization flow, and making a corresponding optimization plan, and continuously iterating and optimizing system functions and performances based on an optimization strategy.
The working principle of this technical scheme is as follows. First, the historical performance data of the supplier is collected, covering multiple dimensions such as order quantity, on-time delivery rate, product quality pass rate and after-sales service evaluation. The data is then cleaned, denoised and normalized, and a trend prediction system is constructed based on time-series analysis or machine learning algorithms (such as ARIMA, LSTM or XGBoost). By analyzing the historical data and learning the patterns and trends in it, a model capable of predicting future development trends is established and used to predict the supplier's future development; the prediction result may cover order quantity, on-time delivery rate changes and product quality trends in a future period, for example a prediction that the order quantity will increase sharply in the coming month. The supplier's evaluation result is compared and analyzed against competitors or industry benchmarks to identify in which respects the supplier has advantages and in which it falls short; such comparative analysis helps the enterprise understand its position in the market and provides a basis for formulating competition strategies. Personalized improvement suggestions are generated according to the supplier's specific performance, covering aspects such as product quality, on-time delivery rate and after-sales service; for example, concrete quality-control process optimization suggestions can be provided for a supplier whose product quality is deficient, and the introduction of an advanced supply chain management system can be recommended for the problem of slow supply chain response. The performance data of the supplier after adopting the improvement suggestions is collected periodically and used to evaluate the effect of the improvement: by comparing the data before and after improvement, it is judged whether the improvement effect is significant. If the effect is significant, the improvement suggestion is effective; if the effect is not significant or has even worsened, the feasibility and implementation of the suggestion need to be reviewed again. The suggestion is then adjusted according to the evaluation result: effective suggestions continue to be promoted, while ineffective ones are reformulated or their implementation strategies adjusted. A multi-channel user feedback collection mechanism is established, including online surveys, user interviews and customer service hotlines, and the collected user feedback is sorted and analyzed. Key information and opinions in the user feedback are extracted through data analysis techniques (such as text mining and sentiment analysis), and a corresponding optimization plan is made based on the analysis results; the plan covers improvements to system functions, performance and user experience, and the system functions and performance are continuously and iteratively optimized according to it.
Through continuous user feedback collection and iterative optimization flow, the system is ensured to always meet the user requirements and keep a good running state.
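As an illustration of the trend-prediction step, the sketch below fits an ARIMA model to a supplier's monthly order quantities with statsmodels and forecasts the next three months. The series, the (1, 1, 1) order and the three-step horizon are assumptions for demonstration only; LSTM or XGBoost models mentioned above could be substituted for longer, more complex series.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly order quantities for one supplier
orders = pd.Series(
    [120, 132, 128, 140, 151, 149, 160, 172, 168, 181, 190, 205],
    index=pd.date_range("2023-10-01", periods=12, freq="MS"),
)

# Fit a simple ARIMA(1, 1, 1) model and forecast the next quarter
model = ARIMA(orders, order=(1, 1, 1))
fitted = model.fit()
forecast = fitted.forecast(steps=3)
print(forecast.round(1))
```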
The technical scheme has the effect that enterprises can accurately predict future development trends of suppliers by constructing a trend prediction system based on time sequence analysis or machine learning models. The method can help enterprises to know market changes in advance, and make prospective purchasing plans and supply chain strategies so as to avoid potential risks and losses, and the prediction results provide powerful decision support for the enterprises. The enterprise can adjust key links such as supplier selection, inventory management, production plan and the like according to the prediction result so as to cope with market fluctuation and change, and the supplier evaluation result is compared and analyzed with other competitors or industry targets so as to be beneficial to the enterprise to identify own advantages and defects. The method not only can help enterprises to know the positions of the enterprises in the supply chain by performing comparative analysis, but also can provide powerful basis for the enterprises to formulate differentiated competition strategies, and can generate personalized improvement suggestions according to the concrete performances of suppliers, thereby being beneficial to the enterprises to solve the problems in the supply chain in a targeted manner. The improvement proposal can be directly applied to the management and improvement process of the suppliers to improve the overall performance level of the suppliers, and the performance data of the suppliers after adopting the improvement proposal is collected periodically by establishing an improvement effect tracking mechanism to evaluate the improvement effect. The tracking mechanism ensures the effectiveness and sustainability of the improvement measures, is beneficial to the enterprise to continuously optimize the supply chain management flow, adjusts the recommended content according to the evaluation result and the feedback opinion, and can ensure that the improvement measures always meet the actual demands and market changes of the enterprise. By establishing a multi-channel user feedback collection mechanism, enterprises can be ensured to collect user opinions comprehensively and timely. The method is beneficial to the enterprises to know the user demands and market dynamics, provides powerful support for product development and system optimization, and sorts, analyzes and optimizes planning on the collected user feedback through iterative optimization flow. The system is continuously iterated and optimized based on the optimization strategy to ensure that the system always meets the demands of users and keeps the leading position, and the overall efficiency and the competitiveness of the supply chain are obviously improved through the comprehensive effects of a plurality of links such as accurate trend prediction, competitive advantage identification, continuous improvement, user feedback response and the like.
One embodiment of the present invention, as shown in fig. 2, is a system for implementing a multi-source data processing method applied to vendor evaluation, the system comprising:
The data collection module is used for collecting multi-source data of different sources and preprocessing the collected multi-source data;
The data association module is used for determining a main data source and related data sources according to the evaluation requirements of suppliers, wherein the main data source is core data in the evaluation process, such as a financial statement, the related data sources are auxiliary data, such as social media evaluation, extracting effective data from the main data source based on a preset data screening rule, and associating the extracted effective data with data in the related data sources;
The data fusion module is used for fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
The comprehensive evaluation module is used for constructing a data processing model for each evaluation dimension, carrying out deep analysis on the multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation on suppliers by adopting a comprehensive evaluation algorithm (such as a hierarchical analysis method, a fuzzy comprehensive evaluation method and the like) based on the evaluation indexes and characteristics of each dimension;
The result feedback module is used for outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm, where the report includes information such as the supplier's score, ranking, strengths and weaknesses, and for feeding the evaluation result back to the supplier and related personnel, helping the supplier understand its problems and shortcomings and formulate corresponding improvement measures. Meanwhile, according to the evaluation result and feedback opinions, the data processing model and evaluation algorithm are continuously optimized and improved to increase the accuracy and effectiveness of the evaluation.
The technical scheme has the working principle that multi-source data related to the supplier evaluation are collected through different channels (such as an enterprise internal system, a public database, social media, a third party evaluation platform and the like). The multi-source data comprises multiple types of financial reports, transaction records, customer evaluations, social media feedback and the like, the collected multi-source data is preprocessed, and a main data source (such as the financial reports) and related data sources (such as the social media evaluations) are determined according to evaluation requirements. The main data source is the core of evaluation, the related data source is used for providing auxiliary information, and the effective data is extracted from the main data source based on a preset data screening rule. The data screening rules relate to aspects of time range, integrity, rationality and the like of the data, such as screening the latest data acquired within one week, and matching and correlating the extracted effective data with the data in the related data sources by utilizing common identifiers (such as supplier ID, product number and the like) in the data. And fusing the effective data and the effective related data to form a comprehensive data set. And constructing an index system for evaluating suppliers based on the fused multi-source data. The index system covers multiple aspects of suppliers (such as product quality, delivery capacity, service attitude, price competitiveness and the like), determines specific evaluation indexes of each aspect, such as the product quality reaching national standards or the service attitude being five stars, builds a data processing model aiming at each evaluation dimension, carries out deep analysis on multi-source data, and extracts key evaluation indexes and characteristics. Based on the evaluation index and the characteristics of each dimension, comprehensive evaluation algorithms (such as a hierarchical analysis method, a fuzzy comprehensive evaluation method and the like) are adopted to comprehensively evaluate the suppliers, comprehensively consider a plurality of factors, give comprehensive scores and ranks, and output comprehensive evaluation reports of the suppliers according to the results of the comprehensive evaluation algorithms. The comprehensive evaluation report comprises information on the score, ranking, superiority, deficiency and the like of the suppliers, and the evaluation result is fed back to the suppliers and related personnel to help the suppliers and related personnel to know the problems and the deficiency of the suppliers and the related personnel and to formulate corresponding improvement measures. And meanwhile, continuously optimizing and perfecting the data processing model and the evaluation algorithm according to the evaluation result and the feedback opinion.
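The association of the main data source with related data sources via common identifiers can be illustrated with a simple pandas join. The column names (supplier_id) and the left-join choice are assumptions for illustration; the main source keeps every supplier even when a related source has no matching record.

```python
import pandas as pd

# Main data source: assumed extract from financial statements
financial = pd.DataFrame({
    "supplier_id": ["SUP-001", "SUP-002", "SUP-003"],
    "annual_revenue": [1.2e8, 3.4e7, 9.8e8],
    "debt_ratio": [0.45, 0.62, 0.38],
})

# Related data source: assumed aggregated social media sentiment per supplier
sentiment = pd.DataFrame({
    "supplier_id": ["SUP-001", "SUP-003"],
    "avg_sentiment": [0.71, 0.55],
})

# Left join on the common identifier keeps every supplier from the main source
combined = financial.merge(sentiment, on="supplier_id", how="left")
print(combined)
```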
The beneficial effects of this technical scheme are as follows. Collecting and fusing multi-source data from different origins (such as financial statements, social media evaluations and customer feedback) reflects the real situation of a supplier more comprehensively and avoids the one-sidedness or limitations caused by a single data source. The evaluation indicator system constructed on the basis of the fused multi-source data covers multiple aspects of the supplier (such as product quality, price, delivery capability, service attitude and innovation capability), realizing multi-dimensional, all-round evaluation. Preprocessing and screening the collected multi-source data effectively removes noise data and outliers and improves the accuracy and reliability of the data, and using common identifiers (such as supplier ID and product number) for precise matching during data association ensures data consistency and reduces evaluation errors caused by data inconsistency. The output comprehensive evaluation report, which includes information such as the supplier's score, ranking, strengths and weaknesses, provides decision makers with a clear and intuitive reference and helps them make more scientific and reasonable decisions. Feeding the evaluation results back to suppliers and related personnel helps them resolve existing problems and shortcomings and formulate targeted improvement measures, improving overall efficiency, while continuously optimizing the data processing model and evaluation algorithm according to the evaluation results and feedback opinions improves the accuracy of the evaluation and the transparency and performance of the evaluation system. Through the feedback mechanism, suppliers can learn about their deficiencies in time and take measures to improve, which not only helps improve the suppliers' competitiveness but also helps the enterprise and its suppliers establish long-term, stable cooperative relationships and achieve win-win results.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A multi-source data processing method for supplier evaluation, characterized in that the method comprises:
S1. Collect multi-source data from different sources and pre-process the collected multi-source data;
S2. Determine the main data source and related data sources according to the supplier evaluation requirements, extract valid data from the main data source based on preset data screening rules, and associate the extracted valid data with the data in the related data sources;
S3. Fuse the valid data and the valid related data to form a comprehensive data set, and build an indicator system for supplier evaluation based on the fused multi-source data;
S4. For each evaluation dimension, construct a data processing model, conduct in-depth analysis of the multi-source data, extract key evaluation indicators and features, and use a comprehensive evaluation algorithm to comprehensively evaluate suppliers based on the evaluation indicators and features of each dimension;
S5. Output a comprehensive evaluation report of the supplier based on the results of the comprehensive evaluation algorithm, and feed the evaluation results back to the supplier and relevant personnel;
wherein S2 comprises:
S21. Quantitatively score each data source through a multi-dimensional data-source quality assessment framework based on preset thresholds and weights, and identify the main data source;
S22. For the main data source, automatically adjust the extraction time interval based on the data update frequency, and use regular expressions and machine learning models to identify and extract key information fields;
S23. During extraction, compare the logical relationships and consistency between historical data and verification data based on a secondary data-quality verification mechanism;
S24. When associating related data sources, use common identifiers combined with multi-dimensional association strategies, and dynamically adjust the association rules according to data characteristics and business needs;
S25. After association, perform a consistency check; establish a feedback mechanism to feed the consistency-check results back to the data collection and processing links, and continuously optimize the association rules and verification algorithms according to the feedback results to form a closed-loop iterative optimization process;
wherein S22 comprises:
monitoring the update log or timestamp of the main data source in real time or periodically, recording the time interval of each data update, and analyzing the historical update-frequency data to obtain analysis results;
setting an initial data extraction time interval according to the analysis results, and, through a dynamic adjustment mechanism, automatically adjusting the extraction time interval to match the new update speed when a change in the data update frequency is detected;
for structured data, locating and extracting key information fields with regular expressions;
for unstructured or semi-structured data, identifying and extracting key information with an existing named entity recognition model;
combining regular expressions and the named entity recognition model to build a hybrid extraction strategy;
monitoring the extraction process in real time, recording the key indicators of each extraction task, and, through an anomaly detection mechanism, triggering an alarm and taking corresponding emergency measures when an anomaly occurs during extraction;
preliminarily verifying the extracted data through an extraction-result verification mechanism, evaluating the effectiveness of the extraction strategy based on data-quality analysis of the extraction results, and optimizing and adjusting the hybrid strategy according to the analysis results;
wherein S3 comprises:
S31. Identify and handle heterogeneity issues among the multi-source data, and convert data from different sources into a unified format;
S32. Based on multi-dimensional data indicators, design a data weight distribution mechanism through the analytic hierarchy process and the fuzzy comprehensive evaluation method combined with an expert database, and dynamically adjust the weight distribution mechanism through a dynamic weight adjustment strategy;
S33. Fuse the data through a fusion algorithm, applying outlier detection and data cleaning mechanisms during the fusion process;
S34. After fusion, verify and evaluate the fused data by comparing it against historical data, industry benchmarks and business logic;
S35. Divide the indicator system into multiple dimensions according to the needs and goals of supplier evaluation; further refine the evaluation indicators under each dimension to form a multi-level indicator system structure; establish an indicator dynamic adjustment mechanism to dynamically adjust the indicator system according to influencing factors;
wherein S4 comprises:
S41. Build advanced models by combining multiple ensemble learning methods with multiple machine learning algorithms, and perform parameter tuning and model fusion; use deep learning models to process complex data, extract deep features and perform predictive analysis;
S42. Evaluate model performance through cross-validation and ROC curve analysis, and select a model for deployment based on the evaluation results;
S43. Select evaluation methods according to the evaluation requirements, form a composite evaluation algorithm through weighted fusion and nonlinear combination, and automatically adjust the weight of each evaluation method according to changes in the evaluation objectives and real-time data updates through a dynamic weight adjustment mechanism;
S44. Build a risk assessment model through machine learning to quantitatively assess the risks faced by suppliers; preset an early-warning threshold, automatically trigger an early warning when the risk assessment result exceeds the preset threshold, and respond through an early-warning response mechanism.
2. The multi-source data processing method for supplier evaluation according to claim 1, characterized in that S1 comprises:
S11. Collect multi-source data from different sources through collection scripts or API interfaces;
S12. Send the collected multi-source data to a cloud space, where the cloud space stores the received multi-source data through a hierarchical storage structure;
S13. Preprocess the stored multi-source data, encrypt sensitive data, and desensitize non-sensitive but privacy-related data.
3. The multi-source data processing method for supplier evaluation according to claim 2, characterized in that S12 comprises:
compressing the uploaded multi-source data with a compression algorithm to obtain a compressed first data packet, making a redundant backup of the compressed data packet to obtain a second data packet, and sending the first data packet and the second data packet to the cloud space respectively through a multi-channel transmission protocol;
after receiving the first data packet and the second data packet, the cloud space stores the second data packet in a second storage space and stores the first data packet in a first storage space;
decompressing the first data packet in the first storage space, and, through a real-time data stream processing layer, using a stream processing framework to capture the data sources in the first storage space that need immediate processing;
eliminating erroneous or invalid data, analyzing the processed data in real time, and extracting key indicators or performing preliminary data aggregation;
building a layered historical data storage architecture in the first storage space, combining object storage and block storage, and automatically migrating data according to data access frequency and importance;
introducing a data archiving strategy to migrate data that has been inactive for a long time but still needs to be retained to a cold storage area, and using search engine technology to index the historical data.
4. The multi-source data processing method for supplier evaluation according to claim 1, characterized in that S33 comprises:
selecting a fusion algorithm based on data characteristics and business needs; using statistical methods to re-detect outliers in the data and remove or correct them; and cleaning the multi-source data after re-detection;
formulating a fusion strategy based on the selected fusion algorithm and data characteristics, performing data fusion operations according to the fusion strategy, and monitoring the fusion process in real time;
preliminarily evaluating the quality of the fusion results by calculating statistical indicators of the fusion results and through visual analysis;
providing feedback on and adjusting the fusion strategy based on the preliminary evaluation results, and continuously optimizing the fusion algorithm and strategy through multiple iterations.
5. The multi-source data processing method for supplier evaluation according to claim 1, characterized in that S5 comprises:
S51. Build a trend prediction system based on time-series analysis or machine learning models, model the supplier's historical performance data, and predict future development trends;
S52. Compare and analyze the supplier evaluation results against industry benchmarks, and generate personalized improvement suggestions based on the supplier's specific performance;
S53. Establish an improvement-effect tracking mechanism, regularly collect supplier performance data after the improvement suggestions are adopted, evaluate the improvement effect, and adjust the suggestions based on feedback;
S54. Establish a multi-channel user feedback collection mechanism, organize and analyze the collected user feedback through an iterative optimization process, formulate corresponding optimization plans, and continuously iterate and optimize system functions and performance based on the optimization strategy.
6. A system for implementing the multi-source data processing method for supplier evaluation according to claim 1, characterized in that the system comprises:
a data collection module: collecting multi-source data from different sources and pre-processing the collected multi-source data;
a data association module: determining the main data source and related data sources according to the supplier evaluation requirements, extracting valid data from the main data source based on preset data screening rules, and associating the extracted valid data with the data in the related data sources;
a data fusion module: fusing the valid data and the valid related data to form a comprehensive data set, and building an indicator system for supplier evaluation based on the fused multi-source data;
a comprehensive evaluation module: for each evaluation dimension, constructing a data processing model, conducting in-depth analysis of the multi-source data, extracting key evaluation indicators and features, and comprehensively evaluating suppliers with a comprehensive evaluation algorithm based on the evaluation indicators and features of each dimension;
a result feedback module: outputting the supplier's comprehensive evaluation report based on the results of the comprehensive evaluation algorithm, and feeding the evaluation results back to the supplier and relevant personnel.
CN202411444157.3A 2024-10-16 2024-10-16 A multi-source data processing method and system for supplier evaluation Active CN118967202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411444157.3A CN118967202B (en) 2024-10-16 2024-10-16 A multi-source data processing method and system for supplier evaluation

Publications (2)

Publication Number Publication Date
CN118967202A CN118967202A (en) 2024-11-15
CN118967202B true CN118967202B (en) 2025-01-24


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119671576A (en) * 2024-11-29 2025-03-21 国网江苏省电力有限公司物资分公司 A method for monitoring, analyzing and evaluating trusted data in power grid material supply chain
CN119941391A (en) * 2025-01-23 2025-05-06 南京思瑞利科技有限公司 Credit evaluation algorithm for international trade enterprises based on multi-domain trust data
CN120336763B (en) * 2025-06-06 2025-09-09 青岛理工大学 Evaluation method of geological characteristics of soil-rock composite strata based on multi-source data fusion
CN120562985B (en) * 2025-07-29 2025-09-26 中亿丰数字科技集团股份有限公司 New city building fusion project evaluation method and system based on multiple dimensions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143334A (en) * 2019-11-13 2020-05-12 深圳市华傲数据技术有限公司 Data quality closed-loop control method
CN115271496A (en) * 2022-08-05 2022-11-01 阳光慧碳科技有限公司 Double-carbon cooperative interconnection system
CN118395366A (en) * 2024-03-12 2024-07-26 北京国基科技股份有限公司 Multi-source data processing method and device
CN118521012A (en) * 2024-07-24 2024-08-20 浙江省国土空间规划研究院 Construction project planning and site selection and land pre-examination evaluation method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9965735B2 (en) * 2014-01-06 2018-05-08 Energica Advisory Services Pvt. Ltd. System and method for it sourcing management and governance covering multi geography, multi sourcing and multi vendor environments


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant