Disclosure of Invention
The invention provides a multi-source data processing method and a system applied to supplier evaluation, which are used for solving the problems in the background art:
The invention provides a multi-source data processing method applied to supplier evaluation, which comprises the following steps:
S1, collecting multi-source data of different sources, and preprocessing the collected multi-source data;
s2, determining a main data source and related data sources according to the evaluation requirements of suppliers, extracting effective data from the main data source based on preset data screening rules, and associating the extracted effective data with data in the related data sources;
S3, fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
S4, constructing a data processing model aiming at each evaluation dimension, carrying out deep analysis on the multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation on suppliers by adopting a comprehensive evaluation algorithm based on the evaluation indexes and characteristics of each dimension;
S5, outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm, and feeding back the evaluation result to the supplier and related personnel.
Further, the step S1 includes:
S11, collecting multi-source data of different sources through an acquisition script or an API interface;
S12, sending the collected multi-source data to a cloud space, wherein the cloud space stores the received multi-source data through a layered storage structure;
S13, preprocessing the stored multi-source data, encrypting sensitive data, and desensitizing data which are not sensitive but are related to privacy.
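The desensitization in step S13 can be illustrated with a minimal Python sketch; the field names and masking rules here are illustrative assumptions for demonstration, not part of the claimed method:

```python
def desensitize(record: dict) -> dict:
    """Mask privacy-related but non-sensitive fields (rules are illustrative)."""
    out = dict(record)
    if "phone" in out and len(out["phone"]) > 5:
        p = out["phone"]
        # keep the first 3 and last 2 digits, mask the middle
        out["phone"] = p[:3] + "*" * (len(p) - 5) + p[-2:]
    if "name" in out and out["name"]:
        # keep only the first character of the name
        out["name"] = out["name"][0] + "*" * (len(out["name"]) - 1)
    return out

masked = desensitize({"name": "Alice", "phone": "13812345678"})
```

A real deployment would drive the masking rules from a field-classification policy rather than hard-coded keys.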
Further, the step S12 includes:
Compressing the uploaded multi-source data through a compression algorithm to obtain a compressed first data packet, performing redundancy backup on the compressed data packet to obtain a second data packet, and respectively transmitting the first data packet and the second data packet to a cloud space through a multi-channel transmission protocol;
after the cloud space receives the first data packet and the second data packet, the first data packet is stored in a first storage space and the second data packet is stored in a second storage space;
decompressing the first data packet in the first storage space, and capturing data sources which need to be processed in real time in the first storage space by adopting a stream processing framework through a real-time data stream processing layer;
rejecting erroneous or invalid data, analyzing the processed data in real time, and extracting key indexes or performing preliminary data aggregation;
constructing a layered historical data storage architecture in the first storage space, and automatically migrating data according to the data access frequency and importance by utilizing the combination of object storage and block storage;
introducing a data archiving strategy, migrating data which are inactive for a long time but still need to be retained to a cold storage area, and establishing an index for historical data by utilizing search engine technology.
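The compress-then-backup idea of this step can be sketched as follows; zlib and a SHA-256 checksum stand in for whatever compression algorithm and integrity mechanism an actual implementation chooses:

```python
import zlib, hashlib

def prepare_packets(raw: bytes):
    """Compress the multi-source payload (first packet) and create a
    redundant backup copy (second packet) with a checksum for integrity."""
    first = zlib.compress(raw, level=6)
    second = bytes(first)  # redundant backup of the compressed data
    checksum = hashlib.sha256(first).hexdigest()
    return first, second, checksum

raw = b'{"supplier": "S-001", "records": [1, 2, 3]}' * 100
first, second, digest = prepare_packets(raw)
```

The two packets would then be sent over separate channels; on receipt, the cloud side recomputes the checksum before accepting either copy.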
Further, the step S2 includes:
S21, carrying out quantization scoring on each data source based on a preset threshold and weight through a multi-dimensional data source quality evaluation framework, and identifying a main data source;
S22, aiming at a main data source, automatically adjusting an extraction time interval based on the frequency of data updating, and identifying and extracting key information fields by using a regular expression and a machine learning model;
S23, in the extraction process, based on a data quality secondary verification mechanism, comparing the logic relationship and consistency between historical data and verification data;
S24, when associating related data sources, matching through common identifiers combined with a multi-dimensional association strategy, and dynamically adjusting association rules according to data characteristics and service requirements;
S25, after association, carrying out consistency verification, establishing a feedback mechanism, feeding back the consistency verification results to the data collection and processing links, and continuously optimizing the association rules and verification algorithms according to the feedback results to form a closed-loop iterative optimization process.
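The quantitative scoring of data sources in S21 reduces to a weighted sum compared against a threshold; the metric names, weights and threshold below are illustrative assumptions:

```python
def score_source(metrics: dict, weights: dict, threshold: float = 0.75):
    """Weighted quality score over the evaluation framework's indexes."""
    score = sum(metrics[k] * weights[k] for k in weights)
    return score, score >= threshold  # candidate main data source if above threshold

weights = {"integrity": 0.3, "accuracy": 0.3, "timeliness": 0.2, "reliability": 0.2}
financial = {"integrity": 0.95, "accuracy": 0.9, "timeliness": 0.7, "reliability": 0.85}
score, is_primary = score_source(financial, weights)
```

Among all candidate sources, the highest-scoring one above the threshold would be taken as the main data source.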
Further, the step S22 includes:
Monitoring an update log or a time stamp of a main data source in real time or periodically, recording the time interval of each data update, and analyzing historical update frequency data to obtain an analysis result;
setting an initial data extraction time interval according to an analysis result, and automatically adjusting the extraction time interval to match a new update speed when the change of the data update frequency is monitored through a dynamic adjustment mechanism;
aiming at the structured data, positioning and extracting key information fields through regular expressions;
for unstructured or semi-structured data, identifying and extracting key information through an existing named entity recognition model;
combining the regular expressions and the named entity recognition model to construct a hybrid extraction strategy;
Monitoring the extraction process in real time, recording key indexes of each extraction task, triggering an alarm and taking corresponding emergency measures when an abnormality occurs in the extraction process through an abnormality detection mechanism;
carrying out primary verification on the extracted data through an extraction result verification mechanism, evaluating the effectiveness of the extraction strategy based on data quality analysis of the extraction results, and optimizing the hybrid strategy according to the analysis results.
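The regex half of the hybrid extraction strategy might look like the following; the field patterns are assumed formats for demonstration, and fields not matched here would be handed to the named entity recognition model (not shown):

```python
import re

# Illustrative field patterns for structured text (assumed formats)
PATTERNS = {
    "supplier_id": re.compile(r"supplier[_ ]?id[:=]\s*([^\s,]+)", re.I),
    "amount":      re.compile(r"amount[:=]\s*([\d.]+)", re.I),
}

def extract_fields(text: str) -> dict:
    """Regex pass for structured fields; unmatched fields fall through
    to the NER stage of the hybrid strategy."""
    found = {}
    for name, pat in PATTERNS.items():
        m = pat.search(text)
        if m:
            found[name] = m.group(1)
    return found

fields = extract_fields("supplier_id: S-042, amount: 1999.50")
```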
Further, the step S3 includes:
S31, identifying and resolving the heterogeneity among the multi-source data, and converting data from different sources into a unified format;
S32, designing a data weight distribution mechanism based on multidimensional indexes of the data through the analytic hierarchy process (AHP) and fuzzy comprehensive evaluation, combined with an expert database, and dynamically adjusting the weight distribution mechanism through a dynamic weight adjustment strategy;
S33, fusing the data through a fusion algorithm, and performing outlier detection and data cleaning during the fusion process;
S34, after fusion is completed, verifying and evaluating the fused data by comparing historical data, industry references and business logic;
S35, dividing the index system into a plurality of dimensions according to the requirements and targets of supplier evaluation, further refining the evaluation indexes under each dimension to form a multi-level index system structure, establishing an index dynamic adjustment mechanism, and dynamically adjusting the index system according to influencing factors.
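A toy sketch of a two-level index system with dynamic weight adjustment, as described in S32/S35; the dimensions, sub-indicators and weights are illustrative assumptions:

```python
def normalize(weights: dict) -> dict:
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

# Two-level index system (dimensions -> sub-indicators); values are assumptions
index_system = {
    "product_quality": {"weight": 0.4, "sub": normalize({"defect_rate": 2, "compliance": 1})},
    "delivery":        {"weight": 0.3, "sub": normalize({"on_time_rate": 1})},
    "service":         {"weight": 0.3, "sub": normalize({"response_time": 1, "rating": 1})},
}

def adjust(system: dict, dimension: str, new_weight: float) -> dict:
    """Dynamic adjustment: raise one dimension's weight, renormalize the rest."""
    top = {k: v["weight"] for k, v in system.items()}
    top[dimension] = new_weight
    top = normalize(top)
    return {k: {**v, "weight": top[k]} for k, v in system.items()}

adjusted = adjust(index_system, "delivery", 0.6)
```

Renormalizing after each adjustment keeps the dimension weights summing to one, so scores remain comparable across adjustments.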
Further, the step S33 includes:
selecting a fusion algorithm according to data characteristics (such as data volume, data distribution, degree of heterogeneity and the like) and service requirements, detecting abnormal values in the data using statistical methods, eliminating or correcting the abnormal values, and cleaning the multi-source data after detection;
according to the selected fusion algorithm and data characteristics, a fusion strategy is formulated, data fusion operation is executed according to the fusion strategy, and the fusion process is monitored in real time;
carrying out preliminary evaluation on the quality of the fusion results by calculating statistical indexes of the results and through visual analysis;
feeding back and adjusting the fusion strategy according to the preliminary evaluation results, and continuously optimizing the fusion algorithm and strategy through multiple iterations.
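The statistical outlier detection mentioned above can be sketched with a simple z-score filter; the 3-sigma rule is one common choice, and the threshold used here is an assumption:

```python
import statistics

def clean_outliers(values, z_max=3.0):
    """Flag values whose z-score exceeds z_max and drop them before fusion."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return list(values), []
    kept, rejected = [], []
    for v in values:
        (kept if abs(v - mean) / stdev <= z_max else rejected).append(v)
    return kept, rejected

kept, rejected = clean_outliers([10, 11, 9, 10, 12, 10, 11, 200], z_max=2.0)
```

Rejected values could alternatively be corrected (e.g. replaced by the median) rather than dropped, depending on the cleaning mechanism chosen.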
Further, the step S4 includes:
S41, constructing an advanced model by combining a plurality of ensemble learning methods with a plurality of machine learning algorithms, and carrying out parameter tuning and model fusion;
S42, evaluating the performance of the model through cross validation and ROC curve analysis, and selecting the model for deployment based on an evaluation result;
S43, selecting evaluation methods according to the evaluation requirements, forming a composite evaluation algorithm through weighted fusion and nonlinear combination, and automatically adjusting the weight of each evaluation method according to changes in the evaluation target and real-time data updates through a dynamic weight adjustment mechanism;
S44, constructing a risk assessment model through machine learning, quantitatively assessing the risks posed by suppliers, automatically triggering an early warning through a preset early-warning threshold when a risk assessment result exceeds the threshold, and responding through an early-warning response mechanism.
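The early-warning trigger of S44 reduces to a threshold comparison on the model's risk score; the threshold values and the response action named here are assumptions:

```python
def check_risk(risk_score: float, threshold: float = 0.7):
    """Trigger an early warning when the risk score exceeds the preset threshold."""
    if risk_score > threshold:
        return {"alert": True,
                "level": "high" if risk_score > 0.9 else "medium",
                "action": "notify procurement team"}
    return {"alert": False}

warning = check_risk(0.82)
```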
Further, the step S5 includes:
S51, constructing a trend prediction system based on time series analysis or machine learning models, modeling the historical performance data of suppliers, and predicting future development trends;
S52, comparing the evaluation results of a supplier with industry benchmarks, and generating personalized improvement suggestions according to the supplier's specific performance;
S53, establishing an improvement effect tracking mechanism, periodically collecting performance data of suppliers after the improvement suggestions are adopted, evaluating the improvement effect, and adjusting the suggestion content according to feedback;
S54, establishing a multi-channel user feedback collection mechanism, sorting and analyzing the collected user feedback through an iterative optimization flow, making a corresponding optimization plan, and continuously iterating and optimizing system functions and performance based on the optimization strategy.
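The trend prediction of S51 could be as simple as the following moving-average extrapolation, standing in for a full time-series or machine learning model; the quarterly supplier scores are hypothetical:

```python
def forecast_next(history, window=3):
    """Naive trend forecast: moving average plus the recent average step."""
    recent = history[-window:]
    avg = sum(recent) / len(recent)
    step = (recent[-1] - recent[0]) / (len(recent) - 1)
    return avg + step

scores = [72, 75, 74, 78, 81]  # hypothetical quarterly supplier scores
prediction = forecast_next(scores)
```

A production system would more likely use ARIMA-style models or a regressor trained on the historical performance data, as the step itself states.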
The invention further provides a system for realizing the multi-source data processing method applied to supplier evaluation, which comprises:
The data collection module is used for collecting multi-source data of different sources and preprocessing the collected multi-source data;
The data association module is used for determining a main data source and related data sources according to the evaluation requirements of suppliers, extracting effective data from the main data source based on a preset data screening rule, and associating the extracted effective data with the data in the related data source;
The data fusion module is used for fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
the comprehensive evaluation module is used for constructing a data processing model aiming at each evaluation dimension, carrying out deep analysis on the multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation on suppliers by adopting a comprehensive evaluation algorithm based on the evaluation indexes and characteristics of each dimension;
and the result feedback module is used for outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm and feeding back the evaluation result to the supplier and related personnel.
The method has the following beneficial effects. Data quality is ensured by collecting and preprocessing data from different sources. Adopting cloud storage technology and the data processing mechanism enhances the security and reliability of the data, and sensitive information is protected by encrypting and desensitizing the data. Utilizing the data source quality assessment framework, the most reliable and representative data sources can be identified, and dynamically adjusting the data extraction timing and rules ensures the timeliness of the data. The fusion and standardization of data solve the problem of data heterogeneity, so that data can be effectively compared and analyzed on the same platform. Combining multiple machine learning algorithms with ensemble learning improves the accuracy and depth of data analysis, and the composite evaluation algorithm with its dynamic weight adjustment mechanism reflects supplier performance more comprehensively. The risk assessment model helps discover potential problems ahead of time, reducing the risk of supply chain interruption, while the trend prediction system provides insight into future changes in supplier performance and helps the enterprise make better-informed decisions. Comparative analysis against industry benchmarks identifies a supplier's areas of strength and weakness, and the improvement effect tracking mechanism ensures effective implementation of improvement suggestions and supports continuous optimization. Providing detailed evaluation reports to suppliers improves both understanding and trust, and the user feedback collection mechanism ensures continuous improvement of the system to meet constantly changing business requirements.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In one embodiment of the present invention, as shown in fig. 1, a multi-source data processing method applied to supplier evaluation includes:
S1, collecting multi-source data of different sources, and preprocessing the collected multi-source data;
S2, determining a main data source and related data sources according to the evaluation requirements of suppliers, wherein the main data source contains the core data of the evaluation process, such as financial statements, and the related data sources contain auxiliary data, such as social media evaluations; extracting effective data from the main data source based on preset data screening rules, and associating the extracted effective data with the data in the related data sources;
S3, fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
S4, constructing a data processing model aiming at each evaluation dimension, carrying out deep analysis on the multi-source data, extracting key evaluation indexes and characteristics, and carrying out comprehensive evaluation on suppliers by adopting a comprehensive evaluation algorithm (such as the analytic hierarchy process or fuzzy comprehensive evaluation) based on the evaluation indexes and characteristics of each dimension;
S5, outputting a comprehensive evaluation report of the supplier according to the results of the comprehensive evaluation algorithm, wherein the evaluation report includes the supplier's scores, rankings, advantages and disadvantages, and feeding back the evaluation results to the supplier and related personnel, helping the supplier understand its own problems and shortcomings and formulate corresponding improvement measures. Meanwhile, the data processing model and the evaluation algorithm are continuously optimized and perfected according to the evaluation results and feedback opinions.
The working principle of this technical scheme is as follows. Multi-source data related to supplier evaluation are collected through different channels (such as enterprise internal systems, public databases, social media, third-party evaluation platforms and the like). The multi-source data comprise multiple types such as financial reports, transaction records, customer evaluations and social media feedback. The collected multi-source data are preprocessed, and a main data source (such as financial reports) and related data sources (such as social media evaluations) are determined according to the evaluation requirements. The main data source is the core of the evaluation, while the related data sources provide auxiliary information. Effective data are extracted from the main data source based on preset data screening rules, which address aspects such as the time range, integrity and rationality of the data, for example, screening for the latest data acquired within one week. The extracted effective data are matched and associated with the data in the related data sources by utilizing common identifiers in the data (such as supplier ID, product number and the like). The effective data and the effective related data are then fused to form a comprehensive data set, and an index system for evaluating suppliers is constructed based on the fused multi-source data. The index system covers multiple aspects of suppliers (such as product quality, delivery capacity, service attitude, price competitiveness and the like) and determines specific evaluation indexes for each aspect, such as product quality reaching national standards or service attitude rated five stars. A data processing model is constructed for each evaluation dimension, deep analysis is carried out on the multi-source data, and key evaluation indexes and characteristics are extracted.
Based on the evaluation indexes and characteristics of each dimension, a comprehensive evaluation algorithm (such as the analytic hierarchy process or fuzzy comprehensive evaluation) is adopted to comprehensively evaluate suppliers, considering multiple factors to give comprehensive scores and rankings, and a comprehensive evaluation report of each supplier is output according to the results of the comprehensive evaluation algorithm. The comprehensive evaluation report includes information such as the supplier's score, ranking, strengths and shortcomings, and the evaluation results are fed back to the supplier and related personnel to help them understand the existing problems and shortcomings and formulate corresponding improvement measures. Meanwhile, the data processing model and the evaluation algorithm are continuously optimized and perfected according to the evaluation results and feedback opinions.
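The weighted comprehensive scoring and ranking described above can be sketched as follows; the dimensions, weights and supplier scores are illustrative assumptions:

```python
def composite_score(indicators: dict, weights: dict) -> float:
    """Weighted linear combination of per-dimension indicator scores."""
    return sum(indicators[d] * weights[d] for d in weights)

weights = {"quality": 0.4, "delivery": 0.3, "service": 0.2, "price": 0.1}
suppliers = {
    "S-001": {"quality": 0.9, "delivery": 0.8, "service": 0.7, "price": 0.6},
    "S-002": {"quality": 0.7, "delivery": 0.9, "service": 0.8, "price": 0.8},
}
# rank suppliers by descending composite score
ranking = sorted(suppliers, key=lambda s: composite_score(suppliers[s], weights),
                 reverse=True)
```

Methods such as AHP or fuzzy comprehensive evaluation would derive the weights and possibly the combination rule, but the final scoring step has this general shape.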
This technical scheme has the following advantages. Collecting and fusing multi-source data from different sources (such as financial statements, social media evaluations, customer feedback and the like) reflects the real situation of a supplier more comprehensively and avoids the one-sidedness or limitations of a single data source. The evaluation index system constructed on the fused multi-source data covers multiple aspects of a supplier (such as product quality, price, delivery capacity, service attitude, innovation capacity and the like), realizing multi-dimensional, all-round evaluation. Preprocessing and screening the collected multi-source data effectively removes noise data and abnormal values, improving the accuracy and reliability of the data. Precise matching through common identifiers (such as supplier ID, product number and the like) during data association ensures the consistency and accuracy of the data and reduces evaluation errors caused by data inconsistency. The output comprehensive evaluation reports, which include information such as the supplier's scores, rankings, advantages and shortcomings, provide decision makers with a clear and intuitive reference and help them make more scientific and reasonable decisions. Feeding the evaluation results back to suppliers and related personnel helps address the problems and shortcomings of suppliers and supports targeted improvement measures, improving overall efficiency. Continuously optimizing the data processing model and evaluation algorithm according to the evaluation results and feedback improves the accuracy and transparency of the evaluation and the performance of the evaluation system. Through the feedback mechanism, suppliers can promptly learn of their own deficiencies and take measures to improve, which not only helps improve the competitiveness of suppliers, but also helps enterprises and suppliers establish long-term, stable cooperative relationships and achieve win-win outcomes.
In one embodiment of the present invention, the S1 includes:
S11, collecting multi-source data of different sources through an acquisition script or an API interface, wherein the multi-source data comprises financial reports, historical transaction records, logistics tracking data, social media evaluation and market research reports provided by suppliers;
S12, sending the collected multi-source data to a cloud space, wherein the cloud space stores the received multi-source data through a layered storage structure;
S13, preprocessing the stored multi-source data, encrypting sensitive data, and desensitizing data which are not sensitive but are related to privacy.
The technical scheme has the working principle that related data are acquired from various data sources (such as a database, a website, a file system and the like) through the acquisition script. For data sources supporting the API interface, the data is obtained directly by calling the API. The collected multi-source data comprises financial reports (reflecting financial conditions and business achievements) provided by suppliers, historical transaction records (recording transaction details and payment conditions), logistics tracking data (providing cargo transportation state and position information), social media evaluation (reflecting public opinion of products or services of the suppliers) and market research reports (containing information such as industry trend, competitor analysis and the like), and the collected multi-source data is sent to a cloud space for storage. The cloud space adopts a layered storage structure to manage data, the data is divided into different layers (such as a hot data layer, a warm data layer and a cold data layer) according to factors such as the access frequency, the importance or the storage cost of the data, and the data are respectively stored on storage media with different performance, cost, availability and the like, for example, the data with high access frequency can be divided into the hot data layer and stored on the storage media with better performance. The multi-source data stored in cloud space is preprocessed, including data cleansing (removing duplicate, erroneous, or extraneous data), data conversion (converting the data into a uniform format or structure), and data integration (integrating the data from different sources into a uniform data set). In the data preprocessing process, encryption processing is carried out on sensitive data. The sensitive data includes financial information, personal identity information, and the like. 
Encryption ensures the security of sensitive data during storage and transmission, while desensitization is carried out on data which are not sensitive but are related to privacy (such as user names, telephone numbers and the like). Desensitization reduces the privacy risk in the data by means of substitution, deletion, deformation and the like, while preserving the analytical and evaluative value of the data.
This technical scheme has the following advantages. Collecting multi-source data through acquisition scripts or API interfaces flexibly adapts to the characteristics and access modes of different data sources, ensuring the comprehensiveness and timeliness of the data; the automated data collection process reduces manual intervention, improves the efficiency and accuracy of data collection, and lowers the risk of human error. The collected multi-source data include financial reports, historical transaction records, logistics tracking data, social media evaluations, market research reports and the like, and these diversified data sources provide a rich information basis for subsequent supplier evaluation. The cloud space offers nearly unlimited scalability and can easily handle rapid growth in data volume, guaranteeing flexible and efficient data storage; it generally adopts redundant storage and backup mechanisms, ensuring high data availability so that the integrity and accessibility of the data are preserved even in the event of hardware failures or natural disasters. The layered storage structure allocates storage resources reasonably according to the access frequency and importance of the data, reducing storage cost and improving storage efficiency. Preprocessing improves data quality and allows data from different sources to be integrated and analyzed effectively, while encryption protects sensitive data and desensitization balances privacy protection against the analytical value of the data.
In one embodiment of the present invention, the S12 includes:
Compressing the uploaded multi-source data through a compression algorithm to obtain a compressed first data packet, performing redundancy backup on the compressed data packet to obtain a second data packet, and respectively transmitting the first data packet and the second data packet to a cloud space through a multi-channel transmission protocol;
the cloud space stores the first data packet in a first storage space and the second data packet in a second storage space, wherein the second data packet is used for backing up the data;
decompressing the first data packet in the first storage space, and capturing data sources which need to be processed in real time in the first storage space by adopting a stream processing framework (such as Apache Kafka or Apache Flink) through a real-time data stream processing layer;
rejecting erroneous or invalid data, analyzing the processed data in real time, and extracting key performance indicators (KPIs) or performing preliminary data aggregation;
constructing a layered historical data storage architecture in the first storage space, and automatically migrating data according to the data access frequency and importance by combining object storage (such as Amazon S3 or Alibaba Cloud OSS) with block storage (such as EBS);
introducing a data archiving strategy, migrating data which are inactive for a long time but still need to be retained (such as historical transaction records and old versions of market research reports) to a cold storage area to reduce storage cost, and establishing an index for historical data by utilizing search engine technology (such as Elasticsearch).
The working principle of this technical scheme is as follows. First, the uploaded multi-source data are compressed by a compression algorithm and packaged into a first data packet, and a redundant backup of the compressed first data packet is made to generate a second data packet. The first data packet and the second data packet are respectively sent to the cloud space through a multi-channel transmission protocol. After the cloud space receives the data packets, the first data packet (the original compressed data) is stored in the first storage space for main data processing and analysis, and the second data packet (the redundant backup) is stored in the second storage space as a backup. The first data packet in the first storage space is decompressed, and data sources needing immediate processing are captured through a real-time data stream processing layer (using stream processing frameworks such as Apache Kafka and Apache Flink). Data from the data streams are captured, processed and analyzed in real time; erroneous or invalid data are removed during stream processing, the processed data are analyzed in real time, and key performance indicators (KPIs) are extracted or preliminary data aggregation is performed. The processing results can be used immediately for monitoring, alerting or preliminary business decision support. A layered historical data storage architecture is built in the first storage space, and data are automatically migrated according to their access frequency and importance by combining the advantages of object storage (such as Amazon S3 or Alibaba Cloud OSS) and block storage (such as EBS).
A data archiving strategy is introduced to migrate data which are inactive for a long time but still need to be retained (such as historical transaction records and old versions of market research reports) to a cold storage area, further reducing storage cost. At the same time, historical data are indexed using search engine technology (such as Elasticsearch) so that the data can be quickly retrieved and queried.
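The Elasticsearch-style indexing of historical data can be illustrated, in miniature, with an in-memory inverted index; the record IDs and text below are hypothetical:

```python
from collections import defaultdict

def build_index(docs: dict) -> dict:
    """Minimal inverted index over archived records, standing in for the
    search-engine indexing described above."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

archive = {
    "tx-2019-001": "supplier S-001 delivery delayed",
    "tx-2019-002": "supplier S-002 delivery on time",
}
idx = build_index(archive)
hits = idx["delivery"]  # record IDs mentioning the query term
```

A real search engine adds tokenization, relevance scoring and persistence, but retrieval from cold storage follows this same term-to-record lookup pattern.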
The technical scheme has the advantages that: compressing the multi-source data with a compression algorithm reduces the data volume, lowering the occupation of storage space and the demand on transmission bandwidth and therefore reducing storage and transmission costs; performing redundancy backup on the compressed packet generates the second data packet, and sending the first and second data packets to the cloud space over a multi-channel transmission protocol increases the redundancy of data transmission, reduces the risk of data loss or corruption, and improves transmission reliability. A stream processing framework (such as Apache Kafka or Apache Flink) is adopted to capture and process the data sources that need immediate handling, realizing real-time analysis and response and supporting rapid business decisions and real-time monitoring; during real-time processing, erroneous or invalid data are eliminated, which improves the quality and accuracy of the data and provides a reliable data foundation for subsequent analysis and application; real-time analysis of the processed data, extraction of key performance indicators (KPIs) or preliminary aggregation, facilitates visualization of business conditions and provides visual support for the management layer. The layered storage structure keeps frequently accessed, business-critical data in hot storage, places lower-frequency data in warm storage, and migrates long-unused historical data into low-cost cold storage; a data migration engine moves data between tiers according to access frequency and data age, which optimizes storage cost while improving the retrieval efficiency and availability of historical data. Finally, a data archiving strategy ensures the safe storage of data that is inactive for a long time but still needs to be retained (such as historical transaction records and old versions of market research reports), meeting the compliance and business continuity requirements of enterprises.
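As an illustrative sketch of the compression and redundancy-backup step described above (the function names, the SHA-256 checksum, and the callable-based channel abstraction are assumptions for illustration, not part of the scheme itself):

```python
import gzip
import hashlib

def prepare_packets(raw: bytes) -> dict:
    """Compress multi-source data and create a redundant backup packet.

    Returns the compressed first packet, its backup copy (second packet),
    and a checksum the receiving cloud side could use to detect corruption.
    """
    first_packet = gzip.compress(raw)      # compressed first data packet
    second_packet = bytes(first_packet)    # redundant backup: second data packet
    checksum = hashlib.sha256(first_packet).hexdigest()
    return {"first": first_packet, "second": second_packet, "checksum": checksum}

def send_over_channels(packets: dict, channels: list) -> dict:
    """Dispatch the two packets over separate channels of a multi-channel
    protocol; here a 'channel' is simply a callable taking bytes."""
    results = {}
    results["first"] = channels[0](packets["first"])
    results["second"] = channels[1 % len(channels)](packets["second"])
    return results
```

In a real deployment the channel callables would wrap independent network paths or upload sessions, so losing one channel still leaves an intact copy.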
In one embodiment of the present invention, the S2 includes:
S21, through a multi-dimensional data source quality evaluation framework comprising indexes such as data integrity, accuracy, timeliness, reliability and source authority, quantitatively scoring each data source based on preset thresholds and weights, and identifying the main data source;
S22, aiming at a main data source, automatically adjusting an extraction time interval based on the frequency of data updating, and identifying and extracting key information fields by using a regular expression and a machine learning model;
S23, during extraction, verifying the data through a secondary data-quality verification mechanism, comparing historical data and checking the logical relationships and consistency among the data (such as audit relations among financial data);
S24, associating the relevant data sources through common identifiers (such as supplier ID and product number) combined with a multi-dimensional association strategy, wherein the multi-dimensional association strategy comprises sequence matching based on timestamps, proximity analysis based on geographic position, and content association based on text similarity;
S25, after association, performing consistency verification, including cross-data verification (such as comparing sales figures in financial statements with those in market research reports), rationality verification based on business logic (such as checking the on-time delivery rate against logistics tracking data), and anomaly detection based on statistical models (such as identifying abnormal transaction records with a clustering algorithm); establishing a feedback mechanism that feeds consistency-verification results back to the data collection and processing links, and continuously optimizing the association rules and verification algorithms according to the feedback results, forming a closed-loop iterative optimization process.
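The quantitative scoring of S21 can be sketched as a weighted sum over the five quality indexes; the index weights and the 0.8 identification threshold below are illustrative assumptions, not values fixed by the scheme:

```python
# Illustrative weights for the five quality indexes (assumed, must sum to 1).
QUALITY_WEIGHTS = {
    "integrity": 0.25, "accuracy": 0.30, "timeliness": 0.15,
    "reliability": 0.15, "authority": 0.15,
}

def score_source(metrics: dict) -> float:
    """Weighted quantitative score; each metric is assumed normalized to [0, 1]."""
    return sum(QUALITY_WEIGHTS[k] * metrics[k] for k in QUALITY_WEIGHTS)

def pick_primary(sources: dict, threshold: float = 0.8):
    """Return the highest-scoring source if it clears the preset threshold,
    otherwise (None, best_score) to signal no qualified main data source."""
    scored = {name: score_source(m) for name, m in sources.items()}
    best = max(scored, key=scored.get)
    if scored[best] >= threshold:
        return best, scored[best]
    return None, scored[best]
```

For example, an ERP feed with high integrity and authority would outscore a web-crawled feed and be identified as the main data source.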
The working principle of the technical scheme is that the data sources are comprehensively evaluated with a data source quality evaluation framework covering indexes such as data integrity, accuracy, timeliness, reliability and source authority; each data source is quantitatively scored, and the main data source with the highest quality, most suitable as the basis for subsequent processing, is identified from the scoring results. The time interval of data extraction is automatically adjusted according to the update frequency of the main data source, and key information fields, including supplier names, product specifications and transaction amounts, are identified and extracted from the main data source with regular expressions and a machine learning model. During extraction, the accuracy and reliability of the data are further verified by comparing historical data and checking logical relations and consistency (such as audit relations among financial data), and any data quality problems discovered are recorded and fed back to the data collection and processing links in time. The relevant data sources are then associated through common identifiers combined with a multi-dimensional association strategy (such as timestamp sequence matching, geographic-position proximity and text-similarity content association), with the association rules adjusted dynamically when abnormal or special situations arise in the data. For example, additional verification fields are introduced for auxiliary association, or sentiment-analysis model parameters are adjusted to filter noise. Data consistency among different sources is ensured through cross-source verification (such as comparing sales in financial statements and market research reports); data quality is further improved through rationality verification based on business logic (such as checking the on-time delivery rate against logistics tracking data) and anomaly detection based on statistical models (such as identifying abnormal transaction records with a clustering algorithm); and a feedback mechanism feeds the consistency-verification results back to the data collection and processing links, with the association rules and verification algorithms continuously optimized according to the feedback results, forming a closed-loop iterative optimization process that steadily improves the accuracy and efficiency of data processing.
Through a multi-dimensional data source quality evaluation framework, indexes such as data integrity, accuracy, timeliness, reliability and source authority are comprehensively considered, so that the quality of each data source can be evaluated in full, ensuring a firm and reliable data basis for subsequent processing. Quantitative scoring based on preset thresholds and weights provides high-quality data input for subsequent extraction and association. Automatically adjusting the extraction time interval according to the data update frequency avoids unnecessarily frequent extraction and improves data processing efficiency, while automatically identifying and extracting key information fields with regular expressions and a machine learning model reduces manual intervention and raises the level of automation. Comparing historical data and verifying the logical relations and consistency among the data during extraction allows data errors to be discovered and corrected in time, ensuring the accuracy and consistency of the data. Common identifiers and the multi-dimensional association strategy enable effective association among different data sources, strengthening data integrity and consistency; dynamically adjusting the association rules according to data characteristics and business needs allows abnormal or special situations in the data, such as duplicate supplier IDs or social media noise, to be handled flexibly. Anomaly detection based on statistical models, such as identifying abnormal transaction records with a clustering algorithm, allows anomalies to be discovered and handled in time, ensuring the accuracy and reliability of the data. Finally, the established feedback mechanism continuously optimizes the association rules and verification algorithms according to the feedback results.
In one embodiment of the present invention, the step S22 includes:
Monitoring update logs or time stamps of the main data source in real time or periodically, recording time intervals of each data update, and analyzing historical update frequency data to obtain analysis results, wherein the analysis results comprise periodic rules (such as daily, weekly or monthly update) of the data update and abnormal fluctuation;
Setting an initial data extraction time interval according to the analysis result, ensuring that the latest data can be captured in time while avoiding the resource waste caused by excessively frequent extraction; and, through a dynamic adjustment mechanism, automatically adjusting the extraction time interval to match the new update speed when a change in the data update frequency (such as acceleration or slowing down) is detected;
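A minimal sketch of this dynamic adjustment mechanism, assuming the update log yields a list of update timestamps; the median-gap heuristic, the "sample twice per update" target, and the interval bounds are illustrative assumptions:

```python
from statistics import median

def next_extraction_interval(update_timestamps, current_interval,
                             min_interval=300, max_interval=86400):
    """Adjust the extraction interval toward the observed update cadence.

    update_timestamps: ascending epoch seconds of recent source updates.
    Moves only halfway toward the target each call to avoid oscillation.
    """
    if len(update_timestamps) < 2:
        return current_interval  # not enough history; keep the current cadence
    gaps = [b - a for a, b in zip(update_timestamps, update_timestamps[1:])]
    target = median(gaps) / 2        # sample twice as often as updates arrive
    adjusted = (current_interval + target) / 2
    return int(min(max(adjusted, min_interval), max_interval))
```

When the source speeds up, the median gap shrinks and the extraction interval converges downward; when updates slow, the interval relaxes, saving resources.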
positioning and extracting key information fields through regular expressions aiming at structured data (such as table data in a database);
For unstructured or semi-structured data (such as text reports and social media comments), key information is identified and extracted through an existing named entity identification model;
A hybrid strategy is thus formed: regular expressions handle the information that is easy to match with rules, while complex and variable information is processed by the named entity recognition model.
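The hybrid extraction strategy can be sketched as follows; the field patterns are illustrative assumptions, and the NER model is represented as a pluggable callable (in practice it would be a trained named entity recognition model, which is stubbed here):

```python
import re

# Assumed, illustrative patterns for rule-friendly structured fields.
FIELD_PATTERNS = {
    "supplier_id": re.compile(r"\bSUP-\d{4,}\b"),
    "amount": re.compile(r"\b\d+(?:\.\d{2})?\s*(?:USD|CNY)\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def extract_fields(text: str, ner=None) -> dict:
    """Regex first for rule-matchable fields; anything the rules cannot
    cover is delegated to `ner`, a callable returning {field: value}."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        m = pattern.search(text)
        if m:
            found[field] = m.group(0)
    if ner is not None:
        for field, value in ner(text).items():
            found.setdefault(field, value)   # regex hits take precedence
    return found
```

The design choice is that deterministic patterns win over model output for fields they cover, keeping the model for the complex, variable information only.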
The method comprises the steps of monitoring an extraction process in real time, recording key indexes of each extraction task, wherein the key indexes comprise execution time, success rate and error information, triggering an alarm and taking corresponding emergency measures (such as retrying extraction, switching standby data sources and the like) when an abnormality (such as data format disagreement, network interruption and the like) occurs in the extraction process through an abnormality detection mechanism;
The extracted data are subjected to primary verification through an extraction-result verification mechanism; the effectiveness of the extraction strategy is evaluated through data-quality analysis of the extraction results, and the hybrid extraction strategy is optimized and adjusted according to the analysis results.
The working principle of the technical scheme is that the time interval of each data update is recorded in real time or periodically by monitoring an update log or a timestamp of a main data source, historical update frequency data is analyzed, periodicity rules (such as daily, weekly, monthly and the like) and possible abnormal fluctuation of the data update are identified, an initial data extraction time interval is set, for example, one hour based on an analysis result, change of the data update frequency is continuously monitored, when the change of the data update frequency is monitored (such as acceleration or slowing down), the data extraction time interval is automatically adjusted to match new update speed, key information fields such as date, amount and identifier are accurately positioned and extracted by using regular expressions for structured data (such as database tables), key information is identified and extracted by using the existing named entity identification (NER) model for unstructured or semi-structured data (such as text report and social media comment), and a mixed extraction strategy is constructed by combining advantages of the regular expressions and the named entity identification model. 
The method comprises the steps of using regular expressions for information which is easy to match through rules, processing the information which is complex and variable and depends on a named entity recognition model, monitoring key indexes of each extraction task in real time, such as execution time, success rate and error information, triggering alarm through an abnormality detection mechanism when abnormality (such as data format disagreement and network interruption) occurs in the extraction process, automatically or manually taking corresponding emergency measures, such as retrying extraction, switching standby data sources and the like, carrying out primary verification on the extracted data through an extraction result verification mechanism, evaluating the basic quality of the data, analyzing the data quality based on the extraction result, evaluating the effectiveness of an extraction strategy, and carrying out optimization adjustment on the mixed extraction strategy according to the analysis result so as to improve the accuracy and efficiency of data extraction.
The technical scheme has the advantages that the latest data can be captured in time by analyzing the update frequency of the main data source and setting the initial extraction time interval, and meanwhile unnecessary frequent extraction is avoided, so that computing resources and network resources are saved. When the change of the data updating frequency is monitored, the extraction time interval is automatically adjusted, the flexibility and the efficiency of data extraction are further improved, and key information can be extracted from structured, unstructured or semi-structured data more accurately by combining a mixed extraction strategy of a regular expression and a named entity recognition model. The regular expression is suitable for data with definite rules and fixed formats, the named entity recognition model is good at processing complex and changeable data with rich semantics, the accuracy of data extraction is greatly improved by combining the regular expression and the named entity recognition model, the extraction process is monitored in real time, key indexes such as execution time, success rate and error information are recorded, potential problems can be found timely, an alarm can be triggered rapidly when an abnormality occurs in the extraction process through an abnormality detection mechanism, corresponding emergency measures such as retrying extraction, switching standby data sources and the like can be adopted automatically or manually, continuity and stability of data extraction are ensured, data loss or delay caused by the abnormality is reduced, primary verification is carried out on the extracted data, data quality is evaluated, and accuracy and reliability of an extraction result are ensured. 
The method comprises the steps of carrying out optimization adjustment on a mixed extraction strategy based on the result of data quality analysis, continuously improving the precision and efficiency of data extraction, carrying out data extraction by using a machine learning model such as named entity recognition and the like, realizing intelligent processing, reducing manual intervention, improving the processing efficiency and accuracy, carrying out adjustment from data updating monitoring to extraction time interval, carrying out data extraction and verification, realizing high automation on the whole process, reducing the risk of human errors, improving the overall data processing efficiency, flexibly adjusting and optimizing the mixed extraction strategy according to the data characteristics and service requirements so as to adapt to different data sources and data formats, improving the expandability and adaptability of the scheme, and rapidly adapting to the changes through the dynamic adjustment mechanism and the optimization of the mixed extraction strategy when the data sources or the data formats change, and ensuring the stability and accuracy of data extraction.
In one embodiment of the present invention, the S3 includes:
S31, identifying and handling the heterogeneity among the multi-source data, and converting data from different sources into a uniform format through methods such as data conversion, standardization and normalization;
S32, designing a data weight distribution mechanism based on multi-dimensional indexes of the data, using the analytic hierarchy process and the fuzzy comprehensive evaluation method combined with an expert database, wherein the multi-dimensional indexes comprise importance, reliability, timeliness and source authority;
S33, fusing the data through a fusion algorithm (such as weighted averaging, principal component analysis or a neural network), and applying outlier detection and a data cleaning mechanism during the fusion process;
S34, after fusion is completed, verifying and evaluating the fused data by comparing historical data, industry references and business logic;
S35, dividing the index system into a plurality of dimensions (such as financial condition, production capacity, product quality, on-time delivery rate and after-sales service) according to the requirements and targets of supplier evaluation, refining the evaluation indexes under each dimension to form a multi-level index system structure, and establishing an index dynamic adjustment mechanism that adjusts the index system according to influencing factors including market changes, business requirements and data feedback, for example by adding or deleting evaluation indexes, adjusting index weights, or optimizing index calculation methods;
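A minimal sketch of such a multi-level index system with a dynamic weight adjustment step; the dimension names, sub-indexes and weights are illustrative assumptions:

```python
# Assumed two-level supplier evaluation index system: dimension weights sum
# to 1, and each dimension's sub-index weights sum to 1 within it.
INDEX_SYSTEM = {
    "finance":  {"weight": 0.3, "sub": {"liquidity": 0.5, "profitability": 0.5}},
    "quality":  {"weight": 0.4, "sub": {"defect_rate": 0.6, "returns": 0.4}},
    "delivery": {"weight": 0.3, "sub": {"on_time_rate": 1.0}},
}

def adjust_dimension_weight(system: dict, dimension: str, new_weight: float) -> dict:
    """Set one dimension's weight and renormalize the others so the
    dimension weights still sum to 1 (the dynamic adjustment step)."""
    others = [d for d in system if d != dimension]
    remaining = 1.0 - new_weight
    old_sum = sum(system[d]["weight"] for d in others)
    out = {d: {**v, "sub": dict(v["sub"])} for d, v in system.items()}
    out[dimension]["weight"] = new_weight
    for d in others:
        out[d]["weight"] = system[d]["weight"] / old_sum * remaining
    return out
```

Raising one dimension (say, quality after a spate of defects) automatically rescales the rest, so the index system stays a valid weighting scheme.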
The working principle of the technical scheme is as follows. First, the system identifies the inconsistencies in data type, format, unit and semantics among data from different sources, and converts the data into unified formats and standards through methods such as data conversion, standardization and normalization. A data weight distribution mechanism is then designed using the analytic hierarchy process and the fuzzy comprehensive evaluation method, combined with expert-database opinions, based on multi-dimensional indexes of the data such as importance, reliability, timeliness and source authority; the weight distribution is adjusted in time through a dynamic adjustment strategy according to the actual performance of the data and changes in market and business demands. Fusion algorithms such as weighted averaging, principal component analysis and neural networks make full use of the advantages of each data source and improve the comprehensiveness and accuracy of the data, while an outlier detection and data cleaning mechanism removes erroneous or unreasonable data during fusion. After fusion is completed, the fused data are comprehensively verified and evaluated by comparison against historical data, industry references and business logic. Finally, the index system is divided into multiple dimensions according to the requirements and targets of supplier evaluation, the evaluation indexes under each dimension are refined into a multi-level structure, and the index system is dynamically adjusted as the market and business requirements change, including adding or deleting evaluation indexes, adjusting index weights and optimizing index calculation methods.
The technical scheme has the advantages that identifying and handling the heterogeneity among the multi-source data resolves the inconsistencies among data types, formats, units and semantics, so that data from different sources can be converted into a unified format, providing a reliable basis for subsequent data analysis and decision-making. The improved integration capability breaks through information islands and realizes comprehensive interconnection and sharing of data, and the weight distribution mechanism designed on multi-dimensional indexes (such as importance, reliability, timeliness and source authority) allows the contribution of each data source to be evaluated reasonably, so that quality differences among the data are fully considered during fusion. The dynamic adjustment strategy keeps the weight distribution timely and accurate as the actual performance of the data and the market and business demands change. After fusion is completed, the fused data are comprehensively verified and evaluated against historical data, industry references and business logic, ensuring the accuracy and applicability of the data. Dividing the index system into multiple dimensions according to the requirements and targets of supplier evaluation, and refining the evaluation indexes under each dimension, forms a multi-level index system structure: the multi-dimensional index system comprehensively reflects the overall strength of suppliers and provides rich perspectives and grounds for evaluation, while the index dynamic adjustment mechanism flexibly adjusts the index system according to influencing factors such as market changes, business requirements and data feedback.
Through the dynamic adjustment capability, the index system can always keep synchronous with the actual situation, and the timeliness and accuracy of evaluation are ensured.
In one embodiment of the present invention, the step S33 includes:
Selecting a fusion algorithm according to the data characteristics (such as data volume, data distribution and degree of heterogeneity) and the business requirements; re-detecting outliers in the data with statistical methods and eliminating or correcting them; and cleaning the multi-source data after the re-detection, including removing duplicate data, correcting erroneous data and handling missing values;
According to the selected fusion algorithm and data characteristics, a fusion strategy is formulated, including data alignment, fusion sequence, weight distribution and the like;
Carrying out a preliminary evaluation of the quality of the fusion result by calculating statistical indexes of the result (such as mean, variance and correlation coefficients) and through visual analysis (such as scatter plots and heat maps);
And feeding back and adjusting the fusion strategy according to the preliminary evaluation result, and continuously optimizing the fusion algorithm and strategy through multiple iterations.
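The outlier re-detection and weighted fusion described in these steps can be sketched as follows; the 3-sigma z-score cutoff and the source weights are illustrative assumptions:

```python
from statistics import mean, stdev

def zscore_clean(values, z_max=3.0):
    """Drop values whose z-score exceeds z_max: a simple statistical
    outlier filter (the 3-sigma default is an assumed convention)."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_max]

def weighted_fuse(source_values: dict, weights: dict) -> float:
    """Weighted-average fusion of one indicator reported by several
    sources, normalized over the weights of the sources present."""
    total_w = sum(weights[s] for s in source_values)
    return sum(weights[s] * v for s, v in source_values.items()) / total_w

def fusion_stats(values) -> dict:
    """Statistical indexes used for the preliminary quality evaluation."""
    return {"mean": mean(values), "stdev": stdev(values) if len(values) > 1 else 0.0}
```

Cleaning before fusing keeps a single corrupted record from dragging the weighted average, and the returned statistics feed the iterative adjustment of the strategy.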
The principle of operation of the above solution is that, first, a suitable fusion algorithm is selected according to the data characteristics (such as data volume, data distribution and degree of heterogeneity) and the business demand. For example, suppose a financial transaction dataset containing millions of records is being processed and must be analyzed in real time to detect fraudulent activity: because the data volume is large and computing resources are limited, distributed algorithms that can handle large-scale datasets, such as the machine learning algorithms supported by Apache Spark, may be selected, since they process data in parallel on multiple machines and thereby increase processing speed. A detailed fusion strategy is then formulated according to the selected fusion algorithm and the data characteristics, covering data alignment (ensuring that related data in different sources can be matched to each other), fusion order (determining the sequence in which data are fused) and weight distribution (assigning weights according to factors such as the importance and reliability of the data). The data fusion operation is executed according to this strategy, integrating the data from multiple sources into a unified dataset, with real-time monitoring throughout the fusion process. Statistical indexes of the fusion result (such as mean, variance and correlation coefficients) are calculated to preliminarily evaluate its quality, and visual tools (such as scatter plots and heat maps) help in understanding the distribution and characteristics of the fused data. The fusion strategy is fed back and adjusted according to the preliminary evaluation result, and the fusion algorithm and strategy are continuously optimized through multiple iterations until a satisfactory fusion result is obtained.
The technical scheme has the advantages that a proper fusion algorithm is selected according to the data characteristics and the service requirements, so that the data fusion process can be ensured to be more efficient and the result is more accurate. The method can ensure the optimal integration effect of the data by processing different data characteristics and service requirements through different algorithms, and can remarkably improve the quality and consistency of the data by detecting and processing the abnormal values in the data again and cleaning repeated, wrong and missing data. The fusion strategy is formulated according to the selected fusion algorithm and data characteristics, including data alignment, fusion sequence, weight distribution and the like, and the complexity and the isomerism of different data sources can be flexibly dealt with. The flexibility enables the technical scheme to adapt to various different application scenes and data environments, monitors the fusion process in real time, feeds back and adjusts the fusion strategy according to the primary evaluation result, and can ensure the controllability and optimality of the fusion process. The quality and accuracy of the fusion result can be continuously improved through repeated iterative optimization of the fusion algorithm and the strategy, and the fusion result with high quality, consistency and accuracy can be generated through a refined data fusion process. Based on the fusion result, powerful support can be provided for enterprise decision making, a decision maker is helped to grasp market trend more accurately, evaluate supplier strength, optimize production flow and the like, and characteristics and trend of the fusion result can be intuitively displayed through calculation of statistical indexes and visual analysis. 
The technical scheme comprises a detection and processing mechanism for abnormal values, can effectively identify and remove the abnormal values in the data, and reduces system errors and instability caused by data errors or noise. The method is beneficial to enhancing the robustness and reliability of the system, and can continuously improve the performance and stability of the system by optimizing the fusion algorithm and strategy through multiple iterations. The system can continuously adapt to the change of the data environment and the change of the service requirement through an iterative optimization mechanism, and the high-efficiency and accurate running state is maintained.
In one embodiment of the present invention, the S4 includes:
S41, constructing an advanced model by combining multiple ensemble learning methods (such as Stacking and Boosting) with multiple machine learning algorithms, and performing parameter tuning and model fusion;
s42, evaluating the performance of the model through cross validation and ROC curve analysis, and selecting an optimal model for deployment based on an evaluation result;
S43, selecting evaluation methods (such as the analytic hierarchy process, the fuzzy comprehensive evaluation method and the TOPSIS method) according to the evaluation requirements, forming a composite evaluation algorithm through weighted fusion and nonlinear combination, and automatically adjusting the weight of each evaluation method through a weight dynamic adjustment mechanism according to changes in the evaluation target and real-time updates of the data;
and S44, constructing a risk assessment model through machine learning, quantitatively assessing the risks a supplier may present, presetting an early warning threshold according to historical data and business rules, automatically triggering an early warning when the risk assessment result exceeds the preset threshold, and responding through an early warning response mechanism that includes early warning notification, problem tracking and solution recommendation. The risk assessment result is obtained through the following formula:

R = Σ_{i=1}^{n} w_i · f(s_i)

wherein R is the risk assessment result, w_i is the weight of the i-th risk factor, s_i is the specific score of the i-th risk factor, n is the total number of risk factors, f is a function that converts s_i into a numerical value usable for calculation, and T is the early warning threshold; an early warning is triggered if R > T.
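A minimal sketch of this weighted-sum risk formula and threshold check; the logistic choice for the conversion function f (squashing raw scores into (0, 1)) is an illustrative assumption, since the scheme leaves f unspecified:

```python
import math

def risk_score(weights, scores, f=None):
    """R = sum_i w_i * f(s_i): weighted sum of converted risk-factor scores.

    f maps each raw score s_i to a calculable value; the logistic default
    here is an assumed placeholder for whatever conversion is chosen.
    """
    if f is None:
        f = lambda s: 1.0 / (1.0 + math.exp(-s))
    return sum(w * f(s) for w, s in zip(weights, scores))

def check_warning(weights, scores, threshold):
    """Trigger an early warning when R exceeds the preset threshold T."""
    r = risk_score(weights, scores)
    return r, r > threshold
```

The weights can be supplied by the weight dynamic adjustment mechanism described above, so the same function serves as the environment and evaluation targets change.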
The working principle of the technical scheme is that multiple ensemble learning methods (such as Stacking and Boosting) are combined with multiple machine learning algorithms (such as decision trees, random forests and SVMs) to construct an advanced model with higher prediction performance and generalization capability. The model combines the prediction results of multiple base models; parameter tuning is carried out on each model through grid search, random search or Bayesian optimization to find the optimal parameter combination so that the model performs best on the validation set, and deep learning models (such as LSTM and CNN) are adopted for complex data (such as time-series and image data). Deep feature representations are automatically extracted from the raw data by the advanced model and further used for predictive analysis; training the deep learning models captures the nonlinear relations and complex patterns in the data. The prediction results of the advanced models and the deep learning models are fused using weighted averaging, voting or a more complex integration strategy, and the performance of the advanced model is evaluated with cross-validation (such as K-fold cross-validation), dividing the dataset into training and validation (or test) sets and assessing generalization over multiple training-validation rounds.
The ROC curve evaluates the classification capability and robustness of the advanced model by plotting the true positive rate (TPR) against the false positive rate (FPR) under different thresholds; the best-performing advanced model is selected for deployment based on the results of cross-validation and ROC curve analysis, and suitable evaluation methods (such as the analytic hierarchy process, the fuzzy comprehensive evaluation method and the TOPSIS method) are chosen according to the evaluation requirements. The results of the multiple evaluation methods are integrated into a composite evaluation algorithm through weighted fusion and nonlinear combination, and the weight of each evaluation method is automatically adjusted through a weight dynamic adjustment mechanism according to changes in the evaluation target and real-time updates of the data. A risk assessment model is constructed through a machine learning algorithm to quantitatively assess the risks a supplier may present, and a reasonable early warning threshold is preset according to historical data and business rules, for example triggering an early warning once the risk assessment result reaches 60% of the maximum score. Once the early warning is triggered, the system informs related personnel in time through early warning notifications (such as mail, short message or message pushing), and simultaneously starts a problem tracking and solution recommendation mechanism that helps related personnel locate the problem quickly and take corresponding measures.
The technical scheme has the advantages that combining multiple ensemble learning methods and machine learning algorithms makes it possible to construct a higher-level, more complex model; integrating the strengths of multiple base models reduces the bias and variance of any single model, improving prediction accuracy and generalization capability, and deep learning models (such as LSTM and CNN) automatically extract deep feature representations from complex data (such as time series and images) to capture nonlinear relations and complex patterns in the data. This allows the rules behind the data to be understood more accurately and improves the accuracy of predictive analysis, while cross-validation and ROC curve analysis evaluate model performance comprehensively and objectively. That helps avoid over-fitting and under-fitting, ensures the selected model performs well on unknown data, and the optimal model is deployed on the basis of scientific evaluation results, guaranteeing the stability and reliability of the system in practical application. A flexible, comprehensive composite evaluation algorithm is constructed by combining multiple evaluation methods through weighted fusion and nonlinear combination: fully considering the strengths and applicable scenarios of the different evaluation methods improves the comprehensiveness and accuracy of the evaluation, and the weight dynamic adjustment mechanism lets the evaluation algorithm automatically adjust the weight of each evaluation method as the evaluation target changes and the data update in real time.
This ensures that the evaluation algorithm always remains consistent with the actual situation and improves the timeliness and accuracy of the evaluation; the risk assessment model constructed by machine learning enables quantitative assessment of the risks a supplier may present. The method can help enterprises discover potential risk factors in time and provide support for decision making; when the risk assessment result exceeds a preset threshold, an early warning mechanism is automatically triggered and responds through early warning notification, problem tracking, solution recommendation and the like. This helps enterprises respond to risk events quickly and reduce losses, and provides comprehensive decision support through functions such as advanced model prediction, scientific model evaluation, a comprehensive evaluation system, and an effective risk assessment and early warning mechanism. This helps the enterprise make decisions more efficiently and accurately, improving its market competitiveness. The risk assessment formula allows different weights to be given to different risk factors according to the specific situation, making the assessment flexible and customizable. In different industries and different business scenarios, the importance of each risk factor may differ, and adjusting the weights reflects the actual situation more accurately. The formula ensures the comprehensiveness of the assessment by accumulating the weighted scores of all risk factors. Because every risk factor is taken into consideration, the one-sidedness of determining overall risk from a single factor is avoided, and the evaluation result is more comprehensive and reliable.
In the formula, the scoring function is used to score each risk factor individually, converting it into a numerical value that can be used for calculation so that the risk is quantitatively evaluated. This helps convert subjective judgment into objective data, improving the accuracy and comparability of the evaluation. The formula ensures the effectiveness of the early warning mechanism by presetting an early warning threshold (T) and automatically triggering early warning when the risk assessment result (R) exceeds that threshold. The automatic early warning response can find potential risks in time, give enterprises or organizations timely countermeasures, and reduce the probability and loss of risk occurrence. The weights in the formula can be automatically adjusted by a dynamic weight adjustment mechanism as the evaluation target changes and the data is updated in real time. This dynamic adjustment capability enables the risk assessment model to adapt to different business environments and changing demands while maintaining the accuracy and timeliness of the assessment results. The risk assessment result (R) provides the decision maker with a quantitative index of the supplier's risk level, which helps the decision maker formulate purchasing strategies, risk management plans and the like more scientifically. Meanwhile, the early warning notification, problem tracking, and solution recommendation functions provided by the early warning response mechanism further support decision making and implementation.
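The weighted risk formula and threshold trigger described above can be sketched as follows. The factor names, score ranges, and the particular scoring function are assumptions made for illustration; the text only specifies a weighted accumulation of scored factors compared against a preset threshold.

```python
# Minimal sketch of the risk formula R = sum_i w_i * f(x_i) with an
# early-warning threshold T, as described in the text. Factor names,
# ranges, and the linear scoring map f(.) are invented for the example.

def score_factor(value, worst, best):
    """f(.): map a raw factor value onto a 0..1 risk score
    (1 = highest risk), clamped to the [best, worst] range."""
    if worst == best:
        return 0.0
    s = (value - best) / (worst - best)
    return max(0.0, min(1.0, s))

def assess_risk(factors, weights):
    """R: weighted sum of per-factor scores, weights normalized to 1."""
    total = sum(weights.values())
    return sum(weights[k] * score_factor(*factors[k])
               for k in factors) / total

T = 0.6  # preset early-warning threshold (the "sixty percent" example)
factors = {
    "late_delivery_rate": (0.25, 0.5, 0.0),  # (value, worst, best)
    "defect_rate":        (0.08, 0.2, 0.0),
}
weights = {"late_delivery_rate": 0.6, "defect_rate": 0.4}
R = assess_risk(factors, weights)
if R >= T:
    print("early warning triggered: notify related personnel")
```

Here R evaluates to 0.46, below the 0.6 threshold, so no warning fires; raising either factor's raw value pushes R toward the trigger.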
In one embodiment of the present invention, the step S5 includes:
S51, constructing a trend prediction system based on time series analysis or a machine learning model, modeling the historical performance data of suppliers, and predicting their future development trend;
S52, comparing the supplier's evaluation result with those of competitors or industry benchmarks, identifying its strengths and deficiencies, and providing a basis for formulating competition strategies; generating personalized improvement suggestions according to the supplier's specific performance, for example, providing specific quality control process optimization suggestions for a supplier's deficiency in product quality, or recommending the introduction of an advanced supply chain management system for a slow supply chain response speed;
S53, establishing an improvement effect tracking mechanism, periodically collecting performance data of suppliers after adopting improvement suggestions, evaluating the improvement effect, and adjusting the suggestion content according to feedback;
S54, establishing a multi-channel user feedback collection mechanism, including online surveys, user interviews, customer service hotlines and the like, to ensure that user opinions are collected comprehensively and in a timely manner; sorting and analyzing the collected feedback through an iterative optimization process, formulating a corresponding optimization plan, and continuously iterating and optimizing system functions and performance based on the optimization strategy.
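The benchmarking step S52 can be sketched as a per-dimension gap comparison. This is an illustrative sketch only: the dimension names, scores, and the tolerance band used to classify strengths versus weaknesses are all assumptions.

```python
# Illustrative sketch of step S52: compare a supplier's per-dimension
# scores against an industry benchmark and flag strengths and gaps.
# Dimension names, scores, and the tolerance are invented.

def benchmark_gap(supplier, industry, tolerance=2.0):
    """Return (strengths, weaknesses): dimensions where the supplier
    beats or trails the benchmark by more than `tolerance` points."""
    strengths = [d for d in supplier
                 if supplier[d] >= industry[d] + tolerance]
    weaknesses = [d for d in supplier
                  if supplier[d] <= industry[d] - tolerance]
    return strengths, weaknesses

supplier = {"quality": 92, "on_time_delivery": 78, "service": 85}
industry = {"quality": 85, "on_time_delivery": 88, "service": 84}
strengths, weaknesses = benchmark_gap(supplier, industry)
# quality comes out as a strength; on_time_delivery as a weakness
```

The weakness list is what the personalized improvement suggestions in S52 would then be keyed on, e.g. recommending supply chain management improvements for a low on-time delivery score.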
The working principle of the technical scheme is as follows. First, the historical performance data of a supplier are collected, covering multiple dimensions such as order quantity, on-time delivery rate, product quality pass rate, and after-sales service ratings. Next, the data are cleaned, denoised and normalized, and a trend prediction system is constructed based on time series analysis or machine learning algorithms (e.g., ARIMA, LSTM, XGBoost, etc.). A model capable of predicting future development trends is established by analyzing the historical data and learning the patterns and trends in it, and this model is then used to predict the supplier's future development trend. The prediction result may include the order quantity, the change in on-time delivery rate, the product quality trend and the like for a future period; for example, the prediction may indicate that the order quantity will increase greatly in the coming month. The supplier's evaluation result is then compared with competitors or industry benchmarks to identify in which respects the supplier has advantages and in which it has deficiencies. Such comparative analysis helps enterprises understand their position in the market, provides a basis for formulating competition strategies, and generates personalized improvement suggestions according to the supplier's specific performance. The improvement suggestions cover aspects such as product quality, on-time delivery rate, and after-sales service.
For example, specific quality control process optimization suggestions can be provided for a supplier's deficiency in product quality, and the introduction of an advanced supply chain management system can be recommended for a slow supply chain response speed; the supplier's performance data after adopting the improvement suggestions are then collected periodically. These data are used to evaluate the effect of the improvement suggestions: by comparing the data before and after improvement, it is judged whether the improvement is significant. If the effect is significant, the improvement suggestion is effective; if the effect is not significant or the situation has even worsened, the feasibility and implementation of the suggestion need to be reviewed again, and the suggestion is adjusted according to the evaluation result. If the advice is effective, it continues to be promoted; if it is ineffective or needs improvement, new advice is formulated or the implementation strategy is adjusted. A multi-channel user feedback collection mechanism is established, including online surveys, user interviews, customer service hotlines and the like, and the collected user feedback is sorted and analyzed. Key information and comments in the user feedback are extracted through data analysis techniques (such as text mining and sentiment analysis), and a corresponding optimization plan is formulated based on the analysis results. The optimization plan covers improvements to system functions, performance, and user experience, and the system functions and performance are continuously and iteratively optimized according to the plan. Continuous user feedback collection and iterative optimization ensure that the system always meets user requirements and maintains a good running state.
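As a minimal stand-in for the trend prediction step, the sketch below fits a linear trend to historical monthly order quantities by least squares and extrapolates one step ahead. A real deployment would use ARIMA, LSTM, or XGBoost as the text notes; the history values here are invented.

```python
# Minimal trend-prediction sketch: ordinary least-squares line through
# (t, y) pairs, extrapolated `steps` periods ahead. Stands in for the
# ARIMA/LSTM/XGBoost models named in the text; data is illustrative.

def linear_forecast(history, steps=1):
    """Fit y = a + b*t by least squares; predict at t = n-1+steps."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps)

orders = [100, 110, 120, 130, 140]    # monthly order quantities
next_month = linear_forecast(orders)  # extrapolated one month ahead
```

On this perfectly linear history the forecast is 150.0; real performance data would be cleaned, denoised and normalized first, as described above, before fitting.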
The effect of the technical scheme is that enterprises can accurately predict the future development trend of suppliers by constructing a trend prediction system based on time series analysis or machine learning models. This helps enterprises understand market changes in advance and make forward-looking purchasing plans and supply chain strategies to avoid potential risks and losses, and the prediction results provide powerful decision support. An enterprise can adjust key links such as supplier selection, inventory management and production planning according to the prediction results to cope with market fluctuations, and comparing the supplier evaluation results with those of competitors or industry benchmarks helps the enterprise identify its own strengths and deficiencies. Such comparative analysis not only helps enterprises understand their position in the supply chain, but also provides a solid basis for formulating differentiated competition strategies; personalized improvement suggestions generated from the supplier's specific performance help the enterprise solve supply chain problems in a targeted manner. The improvement suggestions can be applied directly to supplier management and improvement to raise the supplier's overall performance level, and by establishing an improvement effect tracking mechanism, the supplier's performance data after adopting the suggestions are collected periodically to evaluate the improvement effect.
The tracking mechanism ensures the effectiveness and sustainability of the improvement measures, helps the enterprise continuously optimize its supply chain management process, and, by adjusting the recommended content according to the evaluation results and feedback, ensures that the improvement measures always meet the enterprise's actual needs and market changes. Establishing a multi-channel user feedback collection mechanism ensures that user opinions are collected comprehensively and in a timely manner. This helps enterprises understand user needs and market dynamics and provides strong support for product development and system optimization; the collected user feedback is sorted, analyzed and turned into optimization plans through the iterative optimization process. The system is continuously iterated and optimized based on the optimization strategy to ensure that it always meets user needs and maintains a leading position, and the overall efficiency and competitiveness of the supply chain are significantly improved through the combined effect of accurate trend prediction, competitive advantage identification, continuous improvement, user feedback response and other links.
In one embodiment of the present invention, as shown in fig. 2, a system for implementing the multi-source data processing method applied to supplier evaluation comprises:
The data collection module is used for collecting multi-source data of different sources and preprocessing the collected multi-source data;
The data association module is used for determining a main data source and related data sources according to the evaluation requirements of suppliers, wherein the main data source contains the core data for the evaluation process, such as financial statements, and the related data sources contain auxiliary data, such as social media evaluations; extracting effective data from the main data source based on preset data screening rules; and associating the extracted effective data with data in the related data sources;
The data fusion module is used for fusing the effective data and the effective related data to form a comprehensive data set, and constructing an index system for evaluating suppliers based on the fused multi-source data;
The comprehensive evaluation module is used for constructing a data processing model for each evaluation dimension, deeply analyzing the multi-source data, and extracting key evaluation indexes and characteristics; and, based on the evaluation indexes and characteristics of each dimension, comprehensively evaluating suppliers by adopting a comprehensive evaluation algorithm (such as the analytic hierarchy process, the fuzzy comprehensive evaluation method and the like);
And the result feedback module is used for outputting a comprehensive evaluation report of the supplier according to the result of the comprehensive evaluation algorithm, wherein the report includes the supplier's score, ranking, strengths, deficiencies and other information; feeding the evaluation result back to the supplier and related personnel to help the supplier understand its problems and deficiencies and formulate corresponding improvement measures; and, according to the evaluation results and feedback opinions, continuously optimizing and perfecting the data processing model and evaluation algorithm to improve the accuracy and effectiveness of the evaluation.
The working principle of the technical scheme is that multi-source data related to supplier evaluation are collected through different channels (such as an enterprise's internal systems, public databases, social media, third-party evaluation platforms and the like). The multi-source data include multiple types such as financial reports, transaction records, customer evaluations, and social media feedback; the collected multi-source data are preprocessed, and a main data source (such as financial reports) and related data sources (such as social media evaluations) are determined according to the evaluation requirements. The main data source is the core of the evaluation, the related data sources provide auxiliary information, and effective data are extracted from the main data source based on preset data screening rules. The data screening rules cover aspects such as the time range, integrity, and rationality of the data, for example screening for the latest data acquired within one week, and the extracted effective data are matched and associated with the data in the related data sources by means of common identifiers (such as supplier ID, product number and the like). The effective data and the effective related data are fused to form a comprehensive data set, and an index system for evaluating suppliers is constructed based on the fused multi-source data. The index system covers multiple aspects of a supplier (such as product quality, delivery capacity, service attitude, price competitiveness and the like) and determines specific evaluation indexes for each aspect, such as the product quality reaching national standards or the service attitude being rated five stars; a data processing model is then constructed for each evaluation dimension, the multi-source data are deeply analyzed, and key evaluation indexes and characteristics are extracted.
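The identifier-based association described above can be sketched as a join on a shared key. The field names and record contents below are illustrative assumptions; the text only specifies matching on common identifiers such as supplier ID.

```python
# Sketch of the association step: left-join records from the related
# data source (e.g. social media evaluations) onto records from the
# main data source (e.g. financial data) by a shared supplier_id key.
# Field names and values are invented for the example.

def associate(main_records, related_records, key="supplier_id"):
    """Attach all related records sharing `key` to each main record."""
    index = {}
    for rec in related_records:
        index.setdefault(rec[key], []).append(rec)
    merged = []
    for rec in main_records:
        combined = dict(rec)
        combined["related"] = index.get(rec[key], [])
        merged.append(combined)
    return merged

main = [{"supplier_id": "S001", "revenue": 1_200_000}]
related = [{"supplier_id": "S001", "social_rating": 4.2}]
result = associate(main, related)
```

A main record with no match simply carries an empty related list, which keeps the main data source authoritative while the related sources remain auxiliary, as the text describes.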
Based on the evaluation indexes and characteristics of each dimension, comprehensive evaluation algorithms (such as the analytic hierarchy process, the fuzzy comprehensive evaluation method and the like) are adopted to comprehensively evaluate the suppliers, taking multiple factors into account to give comprehensive scores and rankings, and a comprehensive evaluation report of the supplier is output according to the results of the comprehensive evaluation algorithm. The comprehensive evaluation report includes the supplier's score, ranking, strengths, deficiencies and other information, and the evaluation result is fed back to the suppliers and related personnel to help them understand their problems and deficiencies and formulate corresponding improvement measures. Meanwhile, the data processing model and the evaluation algorithm are continuously optimized and perfected according to the evaluation results and feedback opinions.
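The analytic hierarchy process mentioned above derives dimension weights from a pairwise comparison matrix via its principal eigenvector. The sketch below uses power iteration on an invented 3x3 matrix; the dimensions and judgment values are assumptions for illustration, and a full AHP would also check the consistency ratio.

```python
# Hypothetical AHP sketch: extract dimension weights as the principal
# eigenvector of a pairwise comparison matrix via power iteration.
# The 3x3 judgments (quality vs delivery vs service) are invented.

def ahp_weights(matrix, iterations=100):
    """Power iteration: repeatedly apply the matrix to a weight
    vector and renormalize until it converges to the eigenvector."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n))
               for i in range(n)]
        s = sum(nxt)
        w = [v / s for v in nxt]
    return w

# Judgments: quality is 3x as important as delivery, 5x service, etc.
pairwise = [
    [1.0,   3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)  # quality receives the largest weight
```

The resulting weights sum to one and rank the dimensions in the order the judgments imply; these are the per-dimension weights the composite scoring step would then consume.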
The technical scheme has the following advantages. Collecting and fusing multi-source data from different sources (such as financial statements, social media evaluations, customer feedback and the like) reflects the real situation of a supplier more comprehensively and avoids the one-sidedness or limitations of a single data source. The evaluation index system constructed on the fused multi-source data covers multiple aspects of a supplier (such as product quality, price, delivery capacity, service attitude, innovation capacity and the like), realizing multi-dimensional, all-round evaluation. Preprocessing and screening the collected multi-source data effectively removes noise data and outliers, improving the accuracy and reliability of the data. Precise matching by means of common identifiers (such as supplier ID, product number and the like) during data association ensures the consistency and accuracy of the data and reduces evaluation errors caused by data inconsistency. The output comprehensive evaluation report, including the supplier's score, ranking, strengths, deficiencies and other information, provides decision makers with a clear and intuitive reference and helps them make more scientific and reasonable decisions. Feeding the evaluation results back to the suppliers and related personnel helps suppliers address their problems and deficiencies and formulate targeted improvement measures, and continuously optimizing the data processing model and evaluation algorithm according to the evaluation results and feedback further improves the accuracy and effectiveness of the evaluation. Through this feedback mechanism, suppliers can learn of their deficiencies in a timely manner and take corrective measures, which not only helps improve supplier competitiveness but also helps enterprises and suppliers establish a long-term, stable, win-win cooperative relationship.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.