CN116151618A - Evaluation feature classification method for scientific and technological enterprise rating model - Google Patents
- Publication number
- CN116151618A (also listed as CN202310112196.2A)
- Authority
- CN
- China
- Prior art keywords
- database
- data
- comparison result
- sub
- evaluation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Theoretical Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention relates to the technical field of big data analysis and processing, and in particular to an evaluation feature classification method for a rating model of scientific and technological enterprises.
Description
Technical Field
The invention relates to the technical field of big data analysis and processing, and in particular to an evaluation feature classification method for a rating model of scientific and technological enterprises.
Background
Rating the credit of scientific and technological enterprises with a classification model, and comprehensively analysing their ability to fulfil economic contracts and their overall credibility, effectively helps such enterprises predict and prevent risks and promotes the healthy development of the industry. Scientifically and reasonably classifying the credit data of these enterprises is an important step in determining how accurately the classification model evaluates them.
Chinese patent publication No. CN112819341A discloses a credit risk assessment method for small and micro scientific and technological enterprises, belonging to the technical field of big data analysis and processing. The method comprises: (1) constructing a "people + things + technological attributes" credit risk evaluation system for small and micro scientific and technological enterprises, and constructing an index-data mapping scheme that can be obtained in real time from the latest public network data; (2) after acquiring data from the latest public data sources, applying a supervised-merging C4.5 model, training a threshold on the data, and merging high-entropy branches that exceed the threshold; (3) training with the SM-C4.5 algorithm to obtain a risk rating model for small and micro enterprises over 23 secondary-dimension indexes. In experiments, the precision and recall for enterprise risk were 81.47% and 82.4% respectively, which is better than prior methods.
However, the prior art has the following problem:
the prior art does not consider classifying and distinguishing the sample data, nor adjusting the comparison thresholds preset in the classification model according to the evaluation results obtained after the model is trained on the sample data, so that the sample data is reclassified during training and the accuracy and scientific soundness of its classification are improved.
Disclosure of Invention
In order to solve the above problems, the present invention provides a classification method for evaluating characteristics of a rating model of a scientific enterprise, which includes:
step S1, establishing a plurality of databases for storing credit data of each scientific and technological enterprise, and judging the complexity level of each database based on the data amount of the credit data stored in each database;
step S2, obtaining credit data of a database belonging to a first complexity level, and storing the credit data into different sub-databases to classify the credit data, wherein the storing process comprises the steps of determining the sub-databases in which the credit data needs to be stored based on the matching result of the credit data and feature keywords prestored in a feature database;
step S3, judging whether each sub-database can be used for training a classification model according to the data quantity of the credit data stored in each sub-database;
step S4, calling the sub-databases that can be used for training the classification model one by one; inputting the credit data in the called sub-database into the classification model to classify the credit data; outputting a plurality of evaluation parameters based on the classification results; obtaining the evaluation result of the called sub-database based on the evaluation parameters; judging whether the evaluation result of the called sub-database is qualified based on a discrete parameter analysis of the evaluation result; and judging whether the evaluation result of the database is qualified based on the evaluation results of all of its sub-databases that can be used for training the classification model, wherein the evaluation parameters and the evaluation results are calculated by algorithms preset in the classification model;
step S5, judging whether the classification model is qualified according to the qualification rate of the evaluation result, and returning to the step S4 after adjusting the preset influence factor comparison threshold value in the classification model when the classification model is judged to be unqualified until the classification model is qualified;
step S6, obtaining the qualified classification model and using it to classify the credit data.
Further, in the step S1, a data amount Nc of the credit data stored in each database is obtained, the data amount Nc is compared with a preset database data amount comparison threshold Nc0, and the complexity level of the database is determined according to the comparison result, wherein,
if the comparison result meets the first data quantity comparison result, judging that the complexity level of the database is a first complexity level;
if the comparison result meets the second data quantity comparison result, judging that the complexity level of the database is a second complexity level;
the first data quantity comparison result is Nc ≥ Nc0, and the second data quantity comparison result is Nc < Nc0.
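The comparison in step S1 can be illustrated with a minimal Python sketch. The names `ComplexityLevel` and `complexity_level` are hypothetical and chosen only for illustration; the sketch assumes the data amount Nc is a simple record count and that the threshold Nc0 is positive.

```python
from enum import Enum


class ComplexityLevel(Enum):
    """Hypothetical labels for the two complexity levels of step S1."""
    FIRST = 1
    SECOND = 2


def complexity_level(nc: int, nc0: int) -> ComplexityLevel:
    """Return the complexity level of a database holding nc records.

    nc0 is the preset database data amount comparison threshold
    (assumed here to be a positive record count).
    """
    if nc0 <= 0:
        raise ValueError("nc0 must be positive")
    # Nc >= Nc0 -> first complexity level; Nc < Nc0 -> second complexity level.
    return ComplexityLevel.FIRST if nc >= nc0 else ComplexityLevel.SECOND
```

Only databases judged to be of the first complexity level go on to step S2.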
Further, in the step S2, an association relationship between each of the feature databases and the sub-databases is pre-established, the credit data is compared with the feature keywords in each of the feature databases, and the sub-databases in which the credit data is to be stored are determined according to the comparison result,
if the comparison result meets the preset storage condition, judging that the credit data is required to be stored in a sub-database associated with the characteristic database;
the preset storage condition is that the credit data is identical to the feature keywords in the feature database.
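As a rough illustration of the storage rule in step S2, the sketch below routes credit data items into sub-databases by exact match against feature keywords. It assumes each credit data item is a plain string and that each feature database is represented as a set of keywords keyed by the name of its associated sub-database; `route_credit_data` and both parameter names are hypothetical. Storing an item in every sub-database whose feature database contains it is also an assumption, since the text does not say how multiple matches are handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def route_credit_data(
    records: Iterable[str],
    feature_dbs: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Store each record in the sub-database(s) whose feature database
    contains an identical feature keyword.

    feature_dbs maps a sub-database name to the keyword set of the
    feature database associated with it (an assumed representation).
    """
    sub_dbs: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for sub_db_name, keywords in feature_dbs.items():
            # Preset storage condition: the credit data is identical
            # to a feature keyword in the feature database.
            if record in keywords:
                sub_dbs[sub_db_name].append(record)
    return dict(sub_dbs)
```

Records that match no feature keyword are simply not stored in any sub-database in this sketch.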
Further, in the step S3, the data amount Ne of the credit data stored in the sub-database is acquired, the data amount Ne is compared with a preset aggregate data amount comparison threshold Ne0, and whether the sub-database is usable for training a classification model is determined according to the comparison result, wherein,
if the comparison result meets the third data quantity comparison result, judging that the sub database can be used for training the classification model;
if the comparison result meets the fourth data quantity comparison result, judging that the sub database is not available for training the classification model;
the third data quantity comparison result is Ne ≥ Ne0, and the fourth data quantity comparison result is Ne < Ne0.
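The selection in step S3 amounts to filtering sub-databases by size. A minimal sketch, assuming sub-databases are held as a mapping from name to a list of records and that the data amount Ne is the record count (`trainable_sub_databases` is a hypothetical name):

```python
from typing import Dict, List


def trainable_sub_databases(
    sub_dbs: Dict[str, List[str]],
    ne0: int,
) -> Dict[str, List[str]]:
    """Keep only the sub-databases whose data amount Ne satisfies Ne >= Ne0."""
    return {
        name: records
        for name, records in sub_dbs.items()
        if len(records) >= ne0
    }
```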
Further, in said step S4, credit data in a sub-database usable for training a classification model is entered into said classification model for operation, wherein,
calculating an influence factor characteristic parameter F of the credit data according to a formula (1),
in the formula (1), S0 represents the sum of the data amounts of feature keywords in all feature databases;
comparing the influence factor characteristic parameter F with a first influence factor comparison threshold F1 and a second influence factor comparison threshold F2 preset in the classification model, classifying the credit data according to comparison results,
if the comparison result meets the first influence factor comparison result, judging the credit data to be of a first category, and storing the credit data into a first influence data set;
if the comparison result meets the comparison result of the second influence factor, judging the credit data to be of a second category, and storing the credit data into a second influence data set;
if the comparison result meets the comparison result of the third influence factor, judging the credit data to be of a third category, and storing the credit data into a third influence data set;
the first influence factor comparison result is F ≥ F2, the second influence factor comparison result is F1 ≤ F < F2, and the third influence factor comparison result is F < F1.
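The three-way comparison can be sketched as follows. The function takes an already computed value of F as input, because formula (1) is not reproduced in this text; `influence_category` is a hypothetical name, and the check 0 < F1 < F2 is an assumption consistent with the ordering of the comparison results.

```python
def influence_category(f: float, f1: float, f2: float) -> int:
    """Return 1, 2 or 3 for the first, second or third category.

    f  : influence factor characteristic parameter F (computed elsewhere)
    f1 : first influence factor comparison threshold F1
    f2 : second influence factor comparison threshold F2
    """
    if not 0 < f1 < f2:
        raise ValueError("expected 0 < F1 < F2")
    if f >= f2:
        return 1  # first influence data set
    if f >= f1:
        return 2  # second influence data set (F1 <= F < F2)
    return 3      # third influence data set (F < F1)
```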
Further, in the step S4, when the classification of the credit data is completed, the data amount of the credit data in each influencing data set is obtained, the evaluation parameter Re of the sub-database is calculated according to the formula (2),
Re=n1*A%+n2*B%+n3*C%(2)
in the formula (2), n1 represents the data amount of the credit data in the first influence data set, n2 represents the data amount of the credit data in the second influence data set, n3 represents the data amount of the credit data in the third influence data set, A% represents a preset first proportion parameter, B% represents a preset second proportion parameter, and C% represents a preset third proportion parameter;
obtaining a plurality of evaluation parameters Re output by the classification model, calculating an evaluation result Re' of the sub-database according to a formula (3),
in the formula (3), Re_i represents the i-th evaluation parameter of the sub-database, and a represents the number of evaluation parameters Re output by the classification model.
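Formula (2) is a weighted sum of the three data amounts. A small sketch of it, assuming the proportion parameters are passed as percentage numbers (for example 50 for 50%); `evaluation_parameter` and its argument names are hypothetical:

```python
def evaluation_parameter(
    n1: int, n2: int, n3: int,
    a_pct: float, b_pct: float, c_pct: float,
) -> float:
    """Compute Re = n1*A% + n2*B% + n3*C% as in formula (2).

    n1, n2, n3 are the data amounts of the first, second and third
    influence data sets; a_pct, b_pct, c_pct are the preset proportion
    parameters expressed in percent.
    """
    return (n1 * a_pct + n2 * b_pct + n3 * c_pct) / 100
```

For instance, with n1 = 10, n2 = 20, n3 = 30 and proportions of 50%, 30% and 20%, the call `evaluation_parameter(10, 20, 30, 50, 30, 20)` evaluates 10*0.5 + 20*0.3 + 30*0.2.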
Further, in the step S4, a discrete parameter E of the evaluation result of the sub-database is calculated according to the formula (4),
and comparing the discrete parameter E with a preset discrete comparison threshold E0, and judging whether the evaluation result Re' of the sub-database is qualified or not according to the comparison result, wherein,
if the comparison result meets the first discrete parameter comparison result, judging that the evaluation result Re' of the sub-database is qualified;
if the comparison result meets the second discrete parameter comparison result, judging that the evaluation result Re' of the sub-database is unqualified;
wherein the first discrete parameter comparison result is E < E0, and the second discrete parameter comparison result is E ≥ E0.
Whether the evaluation result RE of the database is qualified is judged according to the evaluation results Re' of its sub-databases, wherein,
under a first condition, the evaluation result RE of the database is judged to be qualified;
wherein the first condition is that the evaluation results Re' of all sub-databases in the database that can be used for training the classification model are qualified.
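The two qualification checks can be sketched as below. The discrete parameter E of each sub-database is taken as a given input, since formula (4) is not reproduced in this text. The function names are hypothetical, and treating a database with no trainable sub-database as unqualified is an assumption the text does not address.

```python
from typing import Mapping


def sub_database_qualified(e: float, e0: float) -> bool:
    """A sub-database's evaluation result Re' is qualified when E < E0."""
    return e < e0


def database_qualified(discrete_params: Mapping[str, float], e0: float) -> bool:
    """A database's evaluation result RE is qualified when every trainable
    sub-database is qualified.

    discrete_params maps each trainable sub-database name to its
    discrete parameter E.
    """
    if not discrete_params:
        # Assumption: no trainable sub-database means not qualified.
        return False
    return all(sub_database_qualified(e, e0) for e in discrete_params.values())
```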
Further, in the step S5, the qualification rate P of the evaluation result of the classification model is calculated according to the formula (5),
P=N/N0 (5)
in the formula (5), N represents the number of databases whose evaluation result is qualified among the databases of the first complexity level, and N0 represents the number of databases of the first complexity level.
Further, in the step S5, the qualification rate P of the evaluation result of the classification model is compared with a preset qualification rate comparison threshold P0, and whether the classification model is qualified is determined according to the comparison result, wherein,
if the comparison result meets the first qualification rate comparison result, judging that the classification model is qualified;
if the comparison result meets the second qualification rate comparison result, judging that the classification model is unqualified;
the first qualification rate comparison result is P ≥ P0, and the second qualification rate comparison result is P < P0.
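A minimal sketch of the qualification check in step S5, taking P as the ratio N/N0, which is how the description characterizes the qualification rate (the proportion of qualified first-complexity-level databases). The function names and the list-of-booleans input are assumptions made for illustration.

```python
from typing import Sequence


def qualification_rate(db_results: Sequence[bool]) -> float:
    """Return P = N / N0 for the databases of the first complexity level.

    db_results holds one boolean per first-complexity-level database:
    True if its evaluation result is qualified.
    """
    n0 = len(db_results)
    if n0 == 0:
        raise ValueError("no databases of the first complexity level")
    n = sum(1 for qualified in db_results if qualified)
    return n / n0


def model_qualified(p: float, p0: float) -> bool:
    """The classification model is qualified when P >= P0."""
    return p >= p0
```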
Further, in the step S5, an influence factor comparison threshold value preset in the classification model is adjusted when the classification model is determined to be unqualified, wherein,
the first influence factor adjustment mode is to increase the first influence factor comparison threshold value F1 to a first influence factor value F11 according to a preset influence factor adjustment parameter F, and increase the second influence factor comparison threshold value F2 to a second influence factor value F21 according to the preset influence factor adjustment parameter F;
the second influence factor adjustment mode is to reduce the first influence factor comparison threshold value F1 to a third influence factor value F12 according to a preset influence factor adjustment parameter F, and reduce the second influence factor comparison threshold value F2 to a fourth influence factor value F22 according to the preset influence factor adjustment parameter F;
wherein F12 is less than F22 and F11 is less than F21.
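The two adjustment modes can be sketched as shifting both thresholds by the same amount. The text does not state whether the adjustment parameter is applied additively or as a scaling factor, nor when each mode is chosen; the sketch below assumes an additive step and leaves the choice of mode to the caller. `adjust_thresholds`, `delta` and `increase` are hypothetical names.

```python
from typing import Tuple


def adjust_thresholds(
    f1: float, f2: float, delta: float, increase: bool,
) -> Tuple[float, float]:
    """Shift both influence factor comparison thresholds by delta.

    increase=True  -> first adjustment mode  (F1 -> F11, F2 -> F21)
    increase=False -> second adjustment mode (F1 -> F12, F2 -> F22)
    An equal additive shift keeps the ordering F1 < F2, matching
    F11 < F21 and F12 < F22.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    step = delta if increase else -delta
    return f1 + step, f2 + step
```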
Compared with the prior art, the invention judges the complexity level of each database from the data amount of the credit data it stores, classifies the credit data of databases of the first complexity level according to the feature keywords prestored in the feature databases, and stores the credit data in sub-databases. The sub-databases that can be used for training the classification model are determined from the data amount of the credit data stored in each sub-database, the credit data in those sub-databases is input into the classification model, and an evaluation result is obtained from the classification result. Whether the classification model is qualified is judged from the qualification rate of the evaluation results; when the model is judged unqualified, the influence factor comparison thresholds preset in the classification model are adjusted until a qualified classification model is obtained. The qualified model is then used to classify the credit data, which improves the accuracy and scientific soundness of the classification of the credit data of scientific and technological enterprises.
In particular, the complexity level of a database is judged from the data amount of the credit data stored in the database corresponding to a scientific and technological enterprise. In practice, this data amount reflects the scale of the enterprise and the complexity of its industrial structure, so it provides a sound basis for judging the complexity level of the corresponding database. Only the credit data of databases whose complexity exceeds a certain degree is classified, which ensures the quality and representativeness of the credit data used for training the classification model and hence the effect of classifying the credit data of scientific and technological enterprises.
In particular, the sub-database to which credit data belongs is judged from the feature keywords prestored in the feature databases. In practice, feature keywords represent the different features of credit data well: when credit data is identical to a feature keyword in a feature database, it can be judged to belong to the sub-database associated with that feature database. Credit data with different features can thus be distinguished and classified in a sound and reasonable way, which ensures the accuracy and scientific soundness of the classification.
In particular, whether a sub-database can be used for training the classification model is judged from the data amount of the credit data it stores. In practice, a sub-database is representative once the data amount of its credit data reaches a certain standard; selecting representative sub-databases for training ensures the training effect of the classification model and improves the subsequent classification of the credit data of scientific and technological enterprises by the model.
In particular, credit data is stored in the corresponding influence data set according to its influence factor characteristic parameter, and the evaluation result of each sub-database is determined from the data amount of the credit data in each influence data set. In practice, the influence factor characteristic parameter characterizes the degree of influence of the credit data, and credit data with a higher influence level carries a larger proportion in the evaluation result. The credit data in a sub-database is therefore classified by storing credit data of the same influence level in the same influence data set, and the data amounts stored in the influence data sets of different influence levels are multiplied by different percentages to calculate the evaluation result. This improves the accuracy and scientific soundness of classifying the credit data when the classification model is trained on the credit data of scientific and technological enterprises, and ensures the accuracy of the evaluation results of the classification model.
In particular, whether the classification model is qualified is judged from the qualification rate of its evaluation results. The qualification rate is the proportion of databases of the first complexity level whose evaluation result is qualified among all databases of the first complexity level. This proportion characterizes the applicability of the classification model to databases of the first complexity level; when it reaches a certain value, the applicability of the model is sufficient and the model can be judged qualified.
In particular, when the classification model is judged unqualified, the influence factor comparison thresholds preset in the classification model are adjusted until the model is qualified. In practice, adjusting the thresholds changes the influence data set to which credit data belongs and hence its influence level; that is, the credit data is reclassified. After reclassification, the evaluation results of the sub-databases are determined again from the data amount of the credit data in each influence data set, and this is repeated until the classification model is qualified, which ensures the effect of classifying the credit data of scientific and technological enterprises by the model.
Drawings
Fig. 1 is a schematic diagram of a classification method of evaluation characteristics of a rating model of a scientific enterprise according to an embodiment of the invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to fig. 1, which is a schematic diagram illustrating steps of a classification method for evaluating characteristics of a rating model of a scientific enterprise according to an embodiment of the present invention, the classification method for evaluating characteristics of a rating model of a scientific enterprise includes:
step S1, establishing a plurality of databases for storing credit data of each scientific and technological enterprise, and judging the complexity level of each database based on the data amount of the credit data stored in each database;
step S2, obtaining credit data of a database belonging to a first complexity level, and storing the credit data into different sub-databases to classify the credit data, wherein the storing process comprises the steps of determining the sub-databases in which the credit data needs to be stored based on the matching result of the credit data and feature keywords prestored in a feature database;
step S3, judging whether each sub-database can be used for training a classification model according to the data quantity of the credit data stored in each sub-database;
step S4, calling the sub-databases that can be used for training the classification model one by one; inputting the credit data in the called sub-database into the classification model to classify the credit data; outputting a plurality of evaluation parameters based on the classification results; obtaining the evaluation result of the called sub-database based on the evaluation parameters; judging whether the evaluation result of the called sub-database is qualified based on a discrete parameter analysis of the evaluation result; and judging whether the evaluation result of the database is qualified based on the evaluation results of all of its sub-databases that can be used for training the classification model, wherein the evaluation parameters and the evaluation results are calculated by algorithms preset in the classification model;
step S5, judging whether the classification model is qualified according to the qualification rate of the evaluation result, and returning to the step S4 after adjusting the preset influence factor comparison threshold value in the classification model when the classification model is judged to be unqualified until the classification model is qualified;
step S6, obtaining the qualified classification model and using it to classify the credit data.
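The loop between steps S4 and S5 can be outlined as a small driver function. This is only a sketch: `evaluate` and `adjust` are hypothetical callbacks standing in for the step S4 evaluation (returning the qualification rate P for the current thresholds) and the step S5 threshold adjustment, and the `max_rounds` cap is an added safeguard that the method itself does not mention.

```python
from typing import Callable, Tuple

Thresholds = Tuple[float, float]


def train_until_qualified(
    evaluate: Callable[[float, float], float],
    adjust: Callable[[float, float], Thresholds],
    f1: float,
    f2: float,
    p0: float,
    max_rounds: int = 100,
) -> Thresholds:
    """Repeat steps S4 and S5 until the qualification rate reaches p0.

    Returns the thresholds (F1, F2) of the qualified classification model.
    """
    for _ in range(max_rounds):
        p = evaluate(f1, f2)        # step S4: run the model, get P
        if p >= p0:                 # step S5: model judged qualified
            return f1, f2
        f1, f2 = adjust(f1, f2)     # step S5: adjust thresholds, retry S4
    raise RuntimeError("model not qualified within max_rounds")
```

A caller would supply its own evaluation and adjustment logic; the returned thresholds define the qualified model used in step S6.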
Specifically, in the step S1, the data amount Nc of the credit data stored in each database is acquired, the data amount Nc is compared with a preset database data amount comparison threshold Nc0, where Nc0 > 0, and the complexity level of the database is determined according to the comparison result, wherein,
if the comparison result meets the first data quantity comparison result, judging that the complexity level of the database is a first complexity level;
if the comparison result meets the second data quantity comparison result, judging that the complexity level of the database is a second complexity level;
the first data quantity comparison result is Nc ≥ Nc0, and the second data quantity comparison result is Nc < Nc0.
Specifically, the complexity level of a database is judged from the data amount of the credit data stored in the database corresponding to a scientific and technological enterprise. In practice, this data amount reflects the scale of the enterprise and the complexity of its industrial structure, so it provides a sound basis for judging the complexity level of the corresponding database. Only the credit data of databases whose complexity exceeds a certain degree is classified, which ensures the quality and representativeness of the credit data used for training the classification model and hence the effect of classifying the credit data of scientific and technological enterprises.
Specifically, in the step S2, an association relationship between each of the feature databases and the sub-databases is pre-established, the credit data is compared with feature keywords in each of the feature databases, and the sub-databases in which the credit data is to be stored are determined according to the comparison result,
if the comparison result meets the preset storage condition, judging that the credit data is required to be stored in a sub-database associated with the characteristic database;
the preset storage condition is that the credit data is identical to the feature keywords in the feature database.
Specifically, the sub-database to which credit data belongs is judged from the feature keywords prestored in the feature databases. In practice, feature keywords represent the different features of credit data well: when credit data is identical to a feature keyword in a feature database, it can be judged to belong to the sub-database associated with that feature database. Credit data with different features can thus be distinguished and classified in a sound and reasonable way, which ensures the accuracy and scientific soundness of the classification.
Specifically, the credit data in the invention may take various forms, for example an enterprise's financial transaction data or tax data obtained with authorization, and may be extended or replaced according to the specific type of classification model to be trained.
Specifically, the way the feature databases are set up is not specifically limited; a person skilled in the art can preset the feature keywords based on the model. For example, if the existing rating model covers several fields, such as the evaluation of financial risk, credit risk, or tax risk, the feature keywords can be set as a plurality of keywords related to financial risk, credit risk, and tax risk, so that more characteristic information is extracted from credit data acquired from big data.
Specifically, in the step S3, the data amount Ne of the credit data stored in the sub-database is acquired, the data amount Ne is compared with a preset aggregate data amount comparison threshold Ne0, where Ne0 > 0, and whether the sub-database is usable for training a classification model is determined according to the comparison result, wherein,
if the comparison result meets the third data quantity comparison result, judging that the sub database can be used for training the classification model;
if the comparison result meets the fourth data quantity comparison result, judging that the sub database is not available for training the classification model;
the third data quantity comparison result is Ne ≥ Ne0, and the fourth data quantity comparison result is Ne < Ne0.
Specifically, in the invention, whether a sub-database can be used for training the classification model is judged from the data amount of the credit data stored in it. In practice, a sub-database is representative only when its data amount reaches a certain standard; training the classification model on representative sub-databases ensures the training effect and in turn improves the subsequent classification of the credit data of scientific and technological enterprises by the classification model.
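The step S3 check reduces to a single threshold comparison. The sketch below is illustrative only; the function name, the sub-database names and the sizes are hypothetical values chosen for the example.

```python
def usable_for_training(ne: int, ne0: int) -> bool:
    """Return True when the sub-database data amount Ne satisfies Ne >= Ne0."""
    if ne0 <= 0:
        raise ValueError("Ne0 must be greater than 0")
    return ne >= ne0


# Hypothetical data amounts of three sub-databases and a hypothetical Ne0.
sub_database_sizes = {"financial_risk": 1200, "credit_risk": 80, "tax_risk": 500}
NE0 = 500

usable = [
    name for name, ne in sub_database_sizes.items() if usable_for_training(ne, NE0)
]
print(usable)  # ['financial_risk', 'tax_risk']
```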
Specifically, in the step S4, credit data in a sub-database which can be used for training a classification model is input into the classification model for operation, wherein,
calculating an influence factor characteristic parameter F of the credit data according to a formula (1),
in the formula (1), S0 represents the sum of the data amounts of feature keywords in all feature databases;
comparing the influence factor characteristic parameter F with a first influence factor comparison threshold F1 and a second influence factor comparison threshold F2 preset in the classification model, wherein 0 < F1 < F2, and classifying the credit data according to the comparison result,
if the comparison result meets the first influence factor comparison result, judging the credit data to be of a first category, and storing the credit data into a first influence data set;
if the comparison result meets the comparison result of the second influence factor, judging the credit data to be of a second category, and storing the credit data into a second influence data set;
if the comparison result meets the comparison result of the third influence factor, judging the credit data to be of a third category, and storing the credit data into a third influence data set;
the first influence factor comparison result is F ≥ F2, the second influence factor comparison result is F1 ≤ F < F2, and the third influence factor comparison result is F < F1.
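The three-way comparison can be sketched as below, reading the comparison results as F ≥ F2, F1 ≤ F < F2 and F < F1. Formula (1) is not reproduced in this text, so the sketch takes the influence factor characteristic parameter F as an already computed input; the function name and the sample threshold values are hypothetical.

```python
def classify_by_influence(f: float, f1: float, f2: float) -> int:
    """Return the category (1, 2 or 3) of a credit data item.

    f  : influence factor characteristic parameter F (computed elsewhere)
    f1 : first influence factor comparison threshold F1
    f2 : second influence factor comparison threshold F2, with 0 < F1 < F2
    """
    if not 0 < f1 < f2:
        raise ValueError("thresholds must satisfy 0 < F1 < F2")
    if f >= f2:
        return 1  # first influence data set
    if f >= f1:
        return 2  # second influence data set
    return 3      # third influence data set


# Hypothetical thresholds and F values, used only for illustration.
F1, F2 = 0.3, 0.6
influence_sets = {1: [], 2: [], 3: []}
for f in [0.9, 0.6, 0.45, 0.3, 0.1]:
    influence_sets[classify_by_influence(f, F1, F2)].append(f)
print(influence_sets)
```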
Specifically, in the step S4, when the classification of the credit data is completed, the data amount of the credit data in each influencing data set is obtained, the evaluation parameter Re of the sub-database is calculated according to the formula (2),
Re = n1*A% + n2*B% + n3*C%  (2)
in the formula (2), n1 represents the data amount of the credit data in the first influence data set, n2 represents the data amount of the credit data in the second influence data set, n3 represents the data amount of the credit data in the third influence data set, A% represents a preset first proportion parameter with 50% ≤ A% < 100%, B% represents a preset second proportion parameter with 25% ≤ B% < 50%, and C% represents a preset third proportion parameter with 0 < C% < 25%;
obtaining a plurality of evaluation parameters Re output by the classification model, calculating an evaluation result Re' of the sub-database according to a formula (3),
in the formula (3), Re_i represents the i-th evaluation parameter of the sub-database, and a represents the number of evaluation parameters Re output by the classification model.
Specifically, in the invention, credit data is stored into the corresponding influence data set according to its influence factor characteristic parameter, and the evaluation result of each sub-database is determined from the data amount of the credit data in each influence data set. In practice, the influence factor characteristic parameter characterizes the degree of influence of the credit data, and credit data with a higher influence level should account for a larger proportion of the evaluation result. The credit data in a sub-database is therefore classified by storing credit data of the same influence level in the same influence data set, and the evaluation result is calculated by multiplying the data amount in each influence data set by a percentage that differs by influence level. This improves the accuracy and scientific soundness of classifying credit data when the classification model is trained on the credit data of scientific and technological enterprises, and ensures the accuracy of the evaluation results of the classification model.
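Formula (2) can be worked through with a short example. The sketch assumes the proportion parameters are passed as percentages and that their ranges are 50% ≤ A% < 100%, 25% ≤ B% < 50% and 0 < C% < 25%; the function name and the sample counts are hypothetical. Formula (3) is not reproduced in this text, so the evaluation result Re' is not computed here.

```python
def evaluation_parameter(n1, n2, n3, a_pct, b_pct, c_pct):
    """Compute Re = n1*A% + n2*B% + n3*C% (formula (2)).

    n1, n2, n3 : data amounts in the first, second and third influence data sets
    a_pct, b_pct, c_pct : proportion parameters in percent (e.g. 60 means 60%)
    """
    if not 50 <= a_pct < 100:
        raise ValueError("A% must satisfy 50% <= A% < 100%")
    if not 25 <= b_pct < 50:
        raise ValueError("B% must satisfy 25% <= B% < 50%")
    if not 0 < c_pct < 25:
        raise ValueError("C% must satisfy 0 < C% < 25%")
    # Divide once at the end to avoid accumulating rounding error.
    return (n1 * a_pct + n2 * b_pct + n3 * c_pct) / 100


# Hypothetical counts: 10, 20 and 30 items with A% = 60%, B% = 30%, C% = 10%.
re_value = evaluation_parameter(10, 20, 30, 60, 30, 10)
print(re_value)  # 15.0
```

With these values the first data set contributes 6, the second 6 and the third 3, so a small number of high-influence items weighs as much as twice as many mid-influence items.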
Specifically, in the step S4, a discrete parameter E of the evaluation result of the sub-database is calculated according to the formula (4),
and comparing the discrete parameter E with a preset discrete comparison threshold E0, and judging whether the evaluation result Re' of the sub-database is qualified or not according to the comparison result, wherein,
if the comparison result meets the first discrete parameter comparison result, judging that the evaluation result Re' of the sub-database is qualified;
if the comparison result meets the second discrete parameter comparison result, judging that the evaluation result Re' of the sub-database is unqualified;
wherein the first discrete parameter comparison result is E < E0, and the second discrete parameter comparison result is E ≥ E0;
judging whether the evaluation result RE of the database is qualified or not according to the evaluation results Re' of the sub-databases, wherein,
under a first condition, judging that the evaluation result RE of the database is qualified;
wherein the first condition is that the evaluation results Re' of all sub-databases in the database which can be used for training the classification model are qualified.
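The two-level qualification check can be sketched as follows. Formula (4) is not reproduced in this text, so the discrete parameter E of each sub-database is taken as an already computed input. The function names and sample values are hypothetical, and treating a database with no usable sub-database as unqualified is an assumption of this sketch, since the description does not cover that case.

```python
def sub_database_qualified(e: float, e0: float) -> bool:
    """A sub-database evaluation result Re' is qualified when E < E0."""
    return e < e0


def database_qualified(discrete_params, e0: float) -> bool:
    """A database evaluation result RE is qualified when every usable
    sub-database in it is qualified.

    discrete_params: discrete parameters E of the sub-databases that can be
    used for training the classification model.
    """
    if not discrete_params:
        return False  # assumption: no usable sub-database means unqualified
    return all(sub_database_qualified(e, e0) for e in discrete_params)


# Hypothetical discrete parameters and threshold E0.
E0 = 0.2
print(database_qualified([0.05, 0.12, 0.19], E0))  # True
print(database_qualified([0.05, 0.20], E0))        # False, since 0.20 >= E0
```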
Specifically, in the step S5, the qualification rate P of the evaluation result of the classification model is calculated according to the formula (5),
in the formula (5), N represents the number of databases that pass the evaluation result among the databases of the first complexity level, and N0 represents the number of databases of the first complexity level.
Specifically, in the step S5, the qualification rate P of the evaluation result of the classification model is compared with a preset qualification rate comparison threshold P0, 0.5 < P0 < 1, and whether the classification model is qualified is determined according to the comparison result, wherein,
if the comparison result meets the first qualification rate comparison result, judging that the classification model is qualified;
if the comparison result meets the second qualification rate comparison result, judging that the classification model is unqualified;
the first qualification rate comparison result is P ≥ P0, and the second qualification rate comparison result is P < P0.
Specifically, in the invention, whether the classification model is qualified is judged from the qualification rate of its evaluation results. The qualification rate is the proportion of databases of the first complexity level whose evaluation result is qualified among all databases of the first complexity level. This proportion represents the applicability of the classification model to databases of the first complexity level; when it reaches the preset threshold, the applicability is sufficient and the classification model can be judged qualified.
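Since the qualification rate is described as the proportion of qualified databases among the databases of the first complexity level, it can be sketched as P = N / N0 and compared with P0. The function name and the sample results below are hypothetical.

```python
def model_qualification(database_results, p0: float):
    """Return (P, qualified) for the classification model.

    database_results: list of bool, one per database of the first complexity
    level, True when that database's evaluation result is qualified.
    p0: qualification rate comparison threshold, with 0.5 < P0 < 1.
    """
    if not 0.5 < p0 < 1:
        raise ValueError("P0 must satisfy 0.5 < P0 < 1")
    n0 = len(database_results)      # N0: databases of the first complexity level
    if n0 == 0:
        raise ValueError("at least one database is required")
    n = sum(database_results)       # N: databases with a qualified result
    p = n / n0
    return p, p >= p0


# Hypothetical outcome: three of four databases are qualified, P0 = 0.7.
p, qualified = model_qualification([True, True, True, False], 0.7)
print(p, qualified)  # 0.75 True
```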
Specifically, in the step S5, the influence factor comparison thresholds preset in the classification model are adjusted when the classification model is judged to be unqualified, wherein,
the first influence factor adjustment mode is to increase the first influence factor comparison threshold value F1 to a first influence factor value F11 according to a preset influence factor adjustment parameter F, and increase the second influence factor comparison threshold value F2 to a second influence factor value F21 according to the preset influence factor adjustment parameter F;
the second influence factor adjustment mode is to reduce the first influence factor comparison threshold value F1 to a third influence factor value F12 according to a preset influence factor adjustment parameter F, and reduce the second influence factor comparison threshold value F2 to a fourth influence factor value F22 according to the preset influence factor adjustment parameter F;
wherein 0 < F12 < F22, 0 < F11 < F21 < 1, and the preset influence factor adjustment parameter F satisfies 0 < F < 0.1.
Specifically, in the invention, when the classification model is judged to be unqualified, the influence factor comparison thresholds preset in the classification model are adjusted until the classification model is qualified. In practice, adjusting the influence factor comparison thresholds changes the influence data set to which credit data belongs and thereby its influence level, that is, the credit data is reclassified. After reclassification, the evaluation results of the sub-databases are determined again from the data amount of the credit data in each influence data set, and this is repeated until the classification model is qualified, which ensures the effect of classifying the credit data of scientific and technological enterprises through the classification model.
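The adjust-and-retry loop of step S5 might look like the sketch below. Several points are assumptions rather than statements of the text: the thresholds are shifted by adding or subtracting the adjustment parameter (the text only says they are increased or reduced "according to" it), the choice between the two adjustment modes is passed in by the caller because the selection condition is not given here, `evaluate` is a hypothetical callback standing in for steps S4 and S5, and `max_rounds` is a safety limit that the description does not mention.

```python
def adjust_thresholds(f1, f2, step, increase):
    """Shift both thresholds up (first mode) or down (second mode)."""
    if not 0 < step < 0.1:
        raise ValueError("adjustment parameter must be between 0 and 0.1")
    delta = step if increase else -step
    new_f1, new_f2 = f1 + delta, f2 + delta
    if not 0 < new_f1 < new_f2 < 1:
        raise ValueError("adjusted thresholds must stay within 0 < F1 < F2 < 1")
    return new_f1, new_f2


def tune_until_qualified(f1, f2, step, increase, evaluate, max_rounds=10):
    """Re-run the evaluation with adjusted thresholds until it qualifies.

    evaluate: hypothetical callable (f1, f2) -> bool that reclassifies the
    credit data and reports whether the classification model is qualified.
    """
    for _ in range(max_rounds):
        if evaluate(f1, f2):
            return f1, f2
        f1, f2 = adjust_thresholds(f1, f2, step, increase)
    raise RuntimeError("model did not qualify within max_rounds")


# Toy stand-in for the real evaluation: qualifies once F1 exceeds 0.37.
final_f1, final_f2 = tune_until_qualified(
    0.3, 0.6, step=0.05, increase=True, evaluate=lambda a, b: a > 0.37
)
print(round(final_f1, 2), round(final_f2, 2))
```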
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
Claims (10)
1. The method for classifying the evaluation characteristics of the scientific and technological enterprise rating model is characterized by comprising the following steps of:
step S1, establishing a plurality of databases for storing credit data of each scientific and technological enterprise, and judging the complexity level of each database based on the data amount of the credit data stored in each database;
step S2, obtaining credit data of a database belonging to a first complexity level, and storing the credit data into different sub-databases to classify the credit data, wherein the storing process comprises the steps of determining the sub-databases in which the credit data needs to be stored based on the matching result of the credit data and feature keywords prestored in a feature database;
step S3, judging whether each sub-database can be used for training a classification model according to the data quantity of the credit data stored in each sub-database;
step S4, calling sub-databases which can be used for training a classification model one by one, inputting credit data in the called sub-databases into the classification model to classify the credit data, outputting a plurality of evaluation parameters based on classification results, acquiring the evaluation results of the called sub-databases based on the evaluation parameters, judging whether the evaluation results of the called sub-databases are qualified based on discrete parameter analysis of the evaluation results, judging whether the evaluation results of the databases are qualified based on the evaluation results of all the sub-databases which can be used for training the classification model, and calculating the evaluation parameters and the evaluation results by algorithms preset in the classification model;
step S5, judging whether the classification model is qualified according to the qualification rate of the evaluation result, and returning to the step S4 after adjusting the preset influence factor comparison threshold value in the classification model when the classification model is judged to be unqualified until the classification model is qualified;
step S6, obtaining the qualified classification model and using it to classify the credit data.
2. The method for classifying evaluation features of a scientific and technological enterprise rating model according to claim 1, wherein in the step S1, the data amount Nc of the credit data stored in each database is obtained, the data amount Nc is compared with a preset database data amount comparison threshold value Nc0, and the complexity level of the database is determined according to the comparison result,
if the comparison result meets the first data quantity comparison result, judging that the complexity level of the database is a first complexity level;
if the comparison result meets the second data quantity comparison result, judging that the complexity level of the database is a second complexity level;
the first data volume comparison result is that Nc is larger than or equal to Nc0, and the second data volume comparison result is that Nc is smaller than Nc0.
3. The method for classifying the evaluation features of the scientific and technological enterprise rating model according to claim 2, wherein in the step S2, the association relation between each feature database and the sub-database is pre-established, the credit data is compared with the feature keywords in each feature database, and the sub-database in which the credit data needs to be stored is determined according to the comparison result,
if the comparison result meets the preset storage condition, judging that the credit data is required to be stored in a sub-database associated with the characteristic database;
the preset storage condition is that the credit data is identical to the feature keywords in the feature database.
4. The method for classifying a rating model evaluation feature of a scientific enterprise according to claim 3, wherein in the step S3, a data amount Ne of credit data stored in the sub-database is acquired, the data amount Ne is compared with a preset aggregate data amount comparison threshold Ne0, and whether the sub-database is usable for training a classification model is determined based on the comparison result, wherein,
if the comparison result meets the third data quantity comparison result, judging that the sub database can be used for training the classification model;
if the comparison result meets the fourth data quantity comparison result, judging that the sub database is not available for training the classification model;
the third data quantity comparison result is Ne ≥ Ne0, and the fourth data quantity comparison result is Ne < Ne0.
5. The method for classifying characteristics according to claim 4, wherein in step S4, credit data in a sub-database for training a classification model is inputted into the classification model to perform an operation, wherein,
calculating an influence factor characteristic parameter F of the credit data according to a formula (1),
in the formula (1), S0 represents the sum of the data amounts of feature keywords in all feature databases;
comparing the influence factor characteristic parameter F with a first influence factor comparison threshold F1 and a second influence factor comparison threshold F2 preset in the classification model, classifying the credit data according to comparison results,
if the comparison result meets the first influence factor comparison result, judging the credit data to be of a first category, and storing the credit data into a first influence data set;
if the comparison result meets the comparison result of the second influence factor, judging the credit data to be of a second category, and storing the credit data into a second influence data set;
if the comparison result meets the comparison result of the third influence factor, judging the credit data to be of a third category, and storing the credit data into a third influence data set;
the first influence factor comparison result is F ≥ F2, the second influence factor comparison result is F1 ≤ F < F2, and the third influence factor comparison result is F < F1.
6. The method for classifying the evaluation characteristics of the scientific and technological enterprise rating model according to claim 5, wherein in the step S4, when the classification of the credit data is completed, the data amount of the credit data in each influencing data set is obtained, the evaluation parameter Re of the sub-database is calculated according to the formula (2),
Re=n1*A%+n2*B%+n3*C% (2)
in the formula (2), n1 represents the data amount of the credit data in the first influence data set, n2 represents the data amount of the credit data in the second influence data set, n3 represents the data amount of the credit data in the third influence data set, A% represents a preset first proportion parameter, B% represents a preset second proportion parameter, and C% represents a preset third proportion parameter;
obtaining a plurality of evaluation parameters Re output by the classification model, calculating an evaluation result Re' of the sub-database according to a formula (3),
in the formula (3), Re_i represents the i-th evaluation parameter of the sub-database, and a represents the number of evaluation parameters Re output by the classification model.
7. The method for classifying evaluation features of a scientific and technological enterprise rating model according to claim 6, wherein in said step S4, a discrete parameter E of the evaluation result of said sub-database is calculated according to formula (4),
and comparing the discrete parameter E with a preset discrete comparison threshold E0, and judging whether the evaluation result Re' of the sub-database is qualified or not according to the comparison result, wherein,
if the comparison result meets the first discrete parameter comparison result, judging that the evaluation result Re' of the sub-database is qualified;
if the comparison result meets the second discrete parameter comparison result, judging that the evaluation result Re' of the sub-database is unqualified;
wherein the first discrete parameter comparison result is E < E0, and the second discrete parameter comparison result is E ≥ E0;
judging whether the evaluation result RE of the database is qualified or not according to the evaluation results Re' of the sub-databases, wherein,
under a first condition, judging that the evaluation result RE of the database is qualified;
wherein the first condition is that the evaluation results Re' of all sub-databases in the database which can be used for training the classification model are qualified.
8. The method for classifying evaluation features of a scientific and technological enterprise rating model according to claim 7, wherein in said step S5, a qualification rate P of the evaluation result of said classification model is calculated according to the formula (5),
in the formula (5), N represents the number of databases that pass the evaluation result among the databases of the first complexity level, and N0 represents the number of databases of the first complexity level.
9. The method for classifying evaluation features of a scientific and technological enterprise rating model according to claim 8, wherein in the step S5, the qualification rate P of the evaluation result of the classification model is compared with a preset qualification rate comparison threshold value P0, and whether the classification model is qualified or not is judged according to the comparison result, wherein,
if the comparison result meets the first qualification rate comparison result, judging that the classification model is qualified;
if the comparison result meets the second qualification rate comparison result, judging that the classification model is unqualified;
the first qualification rate comparison result is P ≥ P0, and the second qualification rate comparison result is P < P0.
10. The method for classifying evaluation features of a scientific and technological enterprise rating model according to claim 9, wherein in the step S5, a preset impact factor comparison threshold value in the classification model is adjusted when the classification model is judged to be unqualified, wherein,
the first influence factor adjustment mode is to increase the first influence factor comparison threshold value F1 to a first influence factor value F11 according to a preset influence factor adjustment parameter F, and increase the second influence factor comparison threshold value F2 to a second influence factor value F21 according to the preset influence factor adjustment parameter F;
the second influence factor adjustment mode is to reduce the first influence factor comparison threshold value F1 to a third influence factor value F12 according to a preset influence factor adjustment parameter F, and reduce the second influence factor comparison threshold value F2 to a fourth influence factor value F22 according to the preset influence factor adjustment parameter F;
wherein F12 is less than F22 and F11 is less than F21.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310112196.2A CN116151618A (en) | 2023-02-06 | 2023-02-06 | Evaluation feature classification method for scientific and technological enterprise rating model |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310112196.2A CN116151618A (en) | 2023-02-06 | 2023-02-06 | Evaluation feature classification method for scientific and technological enterprise rating model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116151618A (en) | 2023-05-23 |
Family
ID=86357872
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310112196.2A Withdrawn CN116151618A (en) | 2023-02-06 | 2023-02-06 | Evaluation feature classification method for scientific and technological enterprise rating model |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116151618A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119513379A (en) * | 2024-10-16 | 2025-02-25 | 北京洛斯达科技发展有限公司 | Air energy storage power station information platform data management method, device, equipment and medium |
Similar Documents
| Publication | Title |
|---|---|
| CN112001788B (en) | Credit card illegal fraud identification method based on RF-DBSCAN algorithm |
| CN112989621B (en) | Model performance evaluation method, device, equipment and storage medium |
| CN113762764A (en) | A system and method for automatic classification and early warning of imported food safety risks |
| CN114511019A (en) | A method and system for classifying and grading identification of sensitive data |
| CN105930645A (en) | Evaluation Method of Communication Station Equipment Maintenance Support Capability Based on Principal Component Analysis |
| CN111860698A (en) | Method and device for determining stability of learning model |
| CN114723234B (en) | Transformer capacity conceal identification method, system, computer equipment and storage medium |
| CN117455681A (en) | Service risk prediction method and device |
| CN114638688A (en) | Interception strategy derivation method and system for credit anti-fraud |
| CN118279034B (en) | Internet financial wind control report analysis method and system based on artificial intelligence |
| CN119941410A (en) | A multi-dimensional financial stress testing and financial early warning method and system |
| CN110222733B (en) | High-precision multi-order neural network classification method and system |
| CN116151618A (en) | Evaluation feature classification method for scientific and technological enterprise rating model |
| KR102336462B1 (en) | Apparatus and method of credit rating |
| CN113919932A (en) | Client scoring deviation detection method based on loan application scoring model |
| CN116645014A (en) | Provider supply data model construction method based on artificial intelligence |
| CN111784066B (en) | Method, system and equipment for predicting annual operation efficiency of power distribution network |
| CN113902565A (en) | Risk assessment method, device, equipment and storage medium for financial products |
| CN112950048A (en) | National higher education system health evaluation based on fuzzy comprehensive evaluation |
| CN119067607B (en) | A financial business approval method and system based on multiple data sources |
| CN114139931A (en) | Enterprise data evaluation method, device, computer equipment and storage medium |
| CN114663102B (en) | Method, device and storage medium for predicting default of bond issuer based on semi-supervised model |
| CN118689902A (en) | A surface formation confrontation case retrieval method and device based on subjective and objective weights |
| CN118333235A (en) | Behavior fraud risk prediction method and device and electronic equipment |
| CN117217902A (en) | Credit risk identification method, apparatus, device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20230523 |