CN113255927A - Logistic regression model training method and device, computer equipment and storage medium - Google Patents
Logistic regression model training method and device, computer equipment and storage medium
- Publication number
- CN113255927A (application number CN202110423633.3A)
- Authority
- CN
- China
- Prior art keywords
- divergence
- regression model
- logistic regression
- feature
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Algebra (AREA)
- Operations Research (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application relates to a logistic regression model training method, apparatus, computer device, and storage medium. The method comprises the following steps: obtaining the KL divergence of each feature based on the distribution of that feature in a training set and a test set used to train a logistic regression model; comparing the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features; optimizing the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain an optimized loss function; and training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model. The method can improve the accuracy of the logistic regression model.
Description
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a logistic regression model training method, apparatus, computer device, and storage medium.
Background
In machine learning, certain features are often distributed very differently in the training set and the test set, which leads to poor generalization of the model. To improve generalization, L1 or L2 regularization is conventionally adopted. When training linear models such as linear regression or logistic regression, an L2 regularization technique, also called weight decay, is typically used: it limits the size of the model's coefficients by adding a weighted L2 norm to the loss function of the algorithm, making the model less prone to overfitting.
However, current logistic regression model training methods do not further consider the distribution difference of the features between the training set and the test set when adding L2 regularization, which results in low training accuracy of the logistic regression model.
Disclosure of Invention
In view of the above, it is desirable to provide a logistic regression model training method, apparatus, computer device, and storage medium capable of improving model accuracy.
A method of logistic regression model training, the method comprising:
obtaining the KL divergence of each feature based on the distribution of that feature in a training set and a test set used to train a logistic regression model;
comparing the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features;
optimizing the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain an optimized loss function;
and training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
In one embodiment, the method further comprises the following steps: acquiring feature information data for training the logistic regression model, and preprocessing the feature information data;
performing feature extraction on the preprocessed feature information data to obtain at least one feature;
and acquiring the label corresponding to each feature, and dividing the features and their corresponding labels into a training set and a test set according to a preset ratio.
In one embodiment, the method further comprises the following steps: determining whether the features include continuous variable features, and if so, binning the continuous variable features to convert them into discrete variable features.
In one embodiment, the method further comprises the following steps: comparing the KL divergence of each feature with the preset KL divergence threshold, and taking the features whose KL divergence is greater than the preset KL divergence threshold as significantly different features.
In one embodiment, the method further comprises the following steps: adding an L2 regularization term to the loss function of the logistic regression model;
and optimizing the L2 regularization term in the loss function according to the KL divergence of the significantly different features to obtain the optimized loss function.
In one embodiment, the method further comprises the following steps: generating an optimization factor for each significantly different feature according to its KL divergence;
and multiplying the term of the L2 regularization term corresponding to each significantly different feature by that feature's optimization factor to obtain the optimized loss function.
In one embodiment, the method further comprises the following steps: taking, for each significantly different feature, the sum of its KL divergence and 1 as that feature's optimization factor.
A logistic regression model training apparatus, the apparatus comprising:
the acquiring module is used to obtain the KL divergence of each feature based on the distribution of that feature in a training set and a test set used to train a logistic regression model;
the comparison module is used to compare the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features;
the optimization module is used to optimize the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain an optimized loss function;
and the training module is used to train the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
obtaining the KL divergence of each feature based on the distribution of that feature in a training set and a test set used to train a logistic regression model;
comparing the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features;
optimizing the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain an optimized loss function;
and training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
obtaining the KL divergence of each feature based on the distribution of that feature in a training set and a test set used to train a logistic regression model;
comparing the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features;
optimizing the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain an optimized loss function;
and training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
With the above logistic regression model training method, apparatus, computer device, and storage medium, the KL divergence of each feature is obtained based on the distribution of that feature in the training set and the test set of the logistic regression model; the KL divergence of each feature is compared with a preset KL divergence threshold to identify significantly different features; the loss function of the logistic regression model is optimized according to the KL divergence of those features to obtain an optimized loss function; and the logistic regression model is trained based on the optimized loss function to obtain its best-fit parameters. Features whose distributions differ greatly are given a relatively larger decay weight, which improves the accuracy of the logistic regression model.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a logistic regression model training method;
FIG. 2 is a schematic flow chart diagram illustrating a method for training a logistic regression model in one embodiment;
FIG. 3 is a schematic flow chart diagram illustrating a method for training a logistic regression model in another embodiment;
FIG. 4 is a block diagram of an exemplary logistic regression model training apparatus;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The logistic regression model training method provided by the application can be applied to the application environment shown in fig. 1, wherein the terminal 102 communicates with the server 104 via a network. The terminal 102 and the server 104 may each be used individually to perform the logistic regression model training method provided herein, or they may cooperate to perform it. For example, the server 104 is configured to obtain the KL divergence of each feature based on the distribution of that feature in a training set and a test set used to train a logistic regression model; compare the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features; optimize the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain an optimized loss function; and train the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
The terminal 102 may be, but is not limited to, an information acquisition device capable of acquiring the feature information data; the server 104 may be implemented as an independent server or a server cluster composed of a plurality of servers.
In an embodiment, as shown in fig. 2, a logistic regression model training method is provided. The method is described here, by way of example, as applied to the terminal in fig. 1, and includes the following steps:
step 202, based on the distribution of each feature in the training set and the test set for training the logistic regression model, acquiring the KL divergence of each feature.
Here, KL divergence refers to relative entropy. Relative entropy, also known as information divergence, is an asymmetric measure of the difference between two probability distributions. The more similar the distributions of a feature over the training set and the test set, the smaller the KL divergence between them.
Specifically, the distribution difference of each feature between the training set and the test set is obtained, according to the distribution of that feature on the two sets, through the KL divergence calculation formula:

KL(p ‖ q) = Σx p(x) · log( p(x) / q(x) )

wherein p and q represent the distributions of two variables: p refers to the distribution of a feature in the training set, and q refers to the distribution of the same feature in the test set. The KL divergence thus measures the difference between the feature's distributions in the training set and the test set; the larger the result of the formula, the larger the KL divergence of the feature and the larger its distribution difference.
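The per-feature KL divergence computation described above can be sketched in numpy. This is an illustrative sketch, not code from the patent; the function names and the small smoothing constant `eps` (which guards against zero probabilities) are assumptions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)), with smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def feature_kl(train_col, test_col):
    """KL divergence between a discrete feature's empirical distribution
    on the training set (p) and on the test set (q)."""
    categories = np.union1d(np.unique(train_col), np.unique(test_col))
    p = np.array([(train_col == c).mean() for c in categories])
    q = np.array([(test_col == c).mean() for c in categories])
    return kl_divergence(p, q)
```

For identical distributions the divergence is (numerically) zero, and it grows as the training-set and test-set distributions of the feature drift apart.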
Step 204: compare the KL divergence of each feature with a preset KL divergence threshold to identify significantly different features.
A significantly different feature is a feature whose distribution difference between the training set and the test set is higher than the preset threshold, i.e. a feature whose distributions differ greatly between the two sets.
Specifically, after the KL divergence of each feature is obtained, it is compared with the preset KL divergence threshold. If the KL divergence exceeds the threshold, the feature's distribution differs greatly between the training set and the test set, i.e. it is a significantly different feature. Such features need to be given a relatively larger decay weight, so that their importance in the model is reduced and the training accuracy of the logistic regression model is improved.
Step 206: optimize the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain the optimized loss function.
Specifically, after the significantly different features are determined, the loss function of the logistic regression model is optimized according to their KL divergence: an L2 norm with adjusted weights is added to the loss function, which limits the size of the coefficients of the logistic regression model and makes it less prone to overfitting.
Step 208: train the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
Specifically, after the optimized loss function is obtained, the logistic regression model is trained with it until the model converges or meets a preset requirement. The best-fit parameters of the logistic regression model are then obtained, and the final logistic regression model is determined from them.
In the logistic regression model training method, the KL divergence of each feature is obtained based on the distribution of that feature in the training set and the test set of the logistic regression model; the KL divergence of each feature is compared with a preset KL divergence threshold to identify significantly different features; the loss function of the logistic regression model is optimized according to the KL divergence of those features to obtain an optimized loss function; and the logistic regression model is trained based on the optimized loss function to obtain its best-fit parameters. Features whose distributions differ greatly are given a relatively larger decay weight, which improves the accuracy of the logistic regression model.
In one embodiment, before the KL divergence of each feature is obtained based on the distribution of that feature in the training set and the test set used to train the logistic regression model, the method further includes:
acquiring feature information data for training the logistic regression model, and preprocessing the feature information data;
performing feature extraction on the preprocessed feature information data to obtain at least one feature;
and acquiring the label corresponding to each feature, and dividing the features and their corresponding labels into a training set and a test set according to a preset ratio.
Specifically, fig. 3 is a schematic flow diagram of the logistic regression model training method in another embodiment. As shown in fig. 3, before the logistic regression model is trained, feature information data for training the model needs to be acquired and preprocessed; the preprocessing includes one or more of data cleansing, data integration, data reduction, and data transformation. Feature extraction is then performed on the preprocessed feature information data to obtain at least one feature. Feature extraction starts from an initial set of measured data and builds derived features that are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps and, in some cases, improving interpretability. Feature extraction is related to dimensionality reduction, and the quality of the features has a crucial influence on generalization ability. After the features are extracted, each feature is associated with its label, and the features and their corresponding labels are divided into a training set and a test set according to a preset ratio. The training set is used to train the logistic regression model, and the test set is used to test the trained model.
In this embodiment, the feature information data is preprocessed, features are extracted from the processed data, and the features and their corresponding labels are divided into a training set and a test set according to a preset ratio, thereby obtaining the data set for training the logistic regression model.
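The final dividing step can be sketched as follows. This is a minimal illustrative sketch, not code from the patent; the function name, the 80/20 default ratio, and the fixed shuffle seed are assumptions:

```python
import numpy as np

def split_dataset(features, labels, train_ratio=0.8, seed=42):
    """Shuffle, then divide features and their corresponding labels into
    a training set and a test set according to a preset ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    cut = int(train_ratio * len(labels))
    train_idx, test_idx = idx[:cut], idx[cut:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])
```

The same shuffled permutation indexes both arrays, so each feature row stays paired with its label across the split.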
In an embodiment, the acquiring of the label corresponding to each feature and the dividing of the features and their corresponding labels into a training set and a test set according to a preset ratio further include:
determining whether the features include continuous variable features, and if so, binning the continuous variable features to convert them into discrete variable features.
Specifically, the features obtained after feature extraction may include discrete variable features and continuous variable features. For discrete variable features, the KL divergence can be calculated directly and the optimized loss function obtained. A continuous variable feature, however, needs to be binned first: a range of continuous values is divided into several intervals, and the values in each interval are treated as one category. This process of converting continuous values into discrete values is generally referred to as binning. The KL divergence of the binned continuous variable feature is then calculated, and the optimized loss function is obtained.
In this embodiment, by determining whether the features include continuous variable features and, if so, binning them into discrete variable features, discretization of continuous variable features is achieved, so that their KL divergence can be calculated and the optimized loss function obtained. This improves the accuracy of logistic regression model training.
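The binning step can be sketched with equal-width bins. This is one possible binning scheme, assumed for illustration (the patent does not fix a particular scheme); it assumes the feature is not constant:

```python
import numpy as np

def bin_feature(values, n_bins=10):
    """Convert a continuous feature into a discrete one by equal-width
    binning; returns integer bin indices in 0..n_bins-1."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use only the interior edges so every value, including the maximum,
    # lands in one of the n_bins intervals.
    return np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
```

The resulting integer codes can be fed directly into the discrete KL divergence computation of step 202.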
In one embodiment, comparing the KL divergence of each feature with the preset KL divergence threshold to identify the significantly different features includes:
comparing the KL divergence of each feature with the preset KL divergence threshold, and taking the features whose KL divergence is greater than the threshold as significantly different features.
Specifically, a significantly different feature is a feature whose distribution difference between the training set and the test set is higher than the preset threshold, i.e. a feature whose distributions differ greatly between the two sets. Such features are given a relatively larger decay weight, which reduces their importance in the model. They are identified by comparing the KL divergence of each feature with the preset KL divergence threshold and selecting the features whose KL divergence exceeds it.
For example, if the KL divergence threshold is set to 0.05, the KL divergence of each feature is compared with this threshold, and the features whose KL divergence is greater than 0.05 are taken as significantly different features.
In this embodiment, the KL divergence of each feature is compared with the preset KL divergence threshold, and the features whose KL divergence is greater than the threshold are identified as significantly different features, so that these features can subsequently be given a relatively larger decay weight, improving the accuracy of the logistic regression model.
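The comparison step itself is a simple filter. In this sketch the feature names and their KL values are hypothetical, chosen only to illustrate the thresholding:

```python
# Hypothetical per-feature KL divergences (names and values illustrative).
feature_kl_values = {"age": 0.02, "income": 0.07, "region": 0.12}
KL_THRESHOLD = 0.05  # preset KL divergence threshold

# Features whose KL divergence exceeds the threshold are the
# "significantly different" features of the method.
significant_features = {name: kl for name, kl in feature_kl_values.items()
                        if kl > KL_THRESHOLD}
```

Here `income` and `region` would be flagged, while `age` (KL 0.02 ≤ 0.05) would keep the ordinary L2 penalty.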
In an embodiment, optimizing the loss function of the logistic regression model according to the KL divergence of the significantly different features to obtain the optimized loss function includes:
adding an L2 regularization term to the loss function of the logistic regression model;
and optimizing the L2 regularization term in the loss function according to the KL divergence of the significantly different features to obtain the optimized loss function.
Specifically, when training a linear model such as a linear regression or logistic regression model, an L2 regularization technique, also called weight decay, is generally used to obtain better generalization: the weighted L2 norm of the coefficients is added to the loss function, which limits the size of the model's coefficients and makes the model less prone to overfitting. In this embodiment, an L2 regularization term is added to the loss function of the logistic regression model, and the term is then optimized according to the KL divergence of the significantly different features to obtain the optimized loss function.
For example, assuming that the loss function of the logistic regression model is cost(h(θ), y), after the L2 regularization term is added, the loss function becomes:

J(θ) = cost(h(θ), y) + λ · Σj θj²

wherein θj denotes the coefficient corresponding to the j-th feature in the logistic regression model, and λ is a tunable regularization parameter. The L2 regularization term in this loss function is then optimized according to the KL divergence of the significantly different features, thereby optimizing the loss function.
In this embodiment, the L2 regularization term is added to the loss function of the logistic regression model and optimized according to the KL divergence of the significantly different features, thereby realizing the optimization of the loss function of the logistic regression model.
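The L2-regularized logistic loss above can be written out directly. A minimal numpy sketch, assuming the standard cross-entropy form of cost(h(θ), y) and no separate bias term:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_logistic_loss(theta, X, y, lam):
    """Cross-entropy cost of logistic regression plus the plain
    L2 penalty lam * sum_j theta_j^2."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards log(0)
    ce = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    return ce + lam * np.sum(theta ** 2)
```

With θ = 0 the predictions are all 0.5 and the loss reduces to log 2, the standard sanity check; a larger λ strictly increases the loss for any nonzero θ.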
In an embodiment, optimizing the L2 regularization term in the loss function according to the KL divergence of the significantly different features includes:
generating an optimization factor for each significantly different feature according to its KL divergence;
and multiplying the term of the L2 regularization term corresponding to each significantly different feature by that feature's optimization factor to obtain the optimized loss function.
Specifically, when the L2 regularization term in the loss function is optimized, an optimization factor is generated for each significantly different feature according to its KL divergence. The optimization factor is a coefficient that adjusts the L2 regularization term and thus the loss function. The way the optimization factor is obtained is not specifically limited here; the optimized loss function is obtained by multiplying the term of the L2 regularization term corresponding to each significantly different feature by that feature's optimization factor.
For example, the weight of a significantly different feature in the L2 regularization term added to the loss function of the logistic regression model is multiplied by a number α greater than 1; α is the optimization factor corresponding to that feature. The weights of different significantly different features are multiplied by their respective optimization factors.
In this embodiment, the optimized loss function is obtained by generating an optimization factor for each significantly different feature according to its KL divergence and multiplying the corresponding term of the L2 regularization term by that factor. This optimizes the loss function and improves the optimization accuracy of the logistic regression model.
In one embodiment, generating the optimization factor for each significantly different feature according to its KL divergence includes:
taking, for each significantly different feature, the sum of its KL divergence and 1 as that feature's optimization factor.
Specifically, when the L2 regularization term in the loss function is optimized, an optimization factor is generated for each significantly different feature according to its KL divergence: the sum of the feature's KL divergence and 1 is taken as its optimization factor. Because the KL divergence of a significantly different feature is greater than the preset KL divergence threshold, its optimization factor is a value greater than 1. In this way, during gradient descent the significantly different features receive stronger L2 regularization, and their coefficients are decayed by a larger amount, which reduces the influence of the distribution differences of these features.
For example, if the KL divergence of the first feature in the logistic regression model is 0.07 and exceeds the preset KL divergence threshold of 0.05, the first feature is a significantly different feature. Its optimization factor α = 1 + 0.07 = 1.07 is a number greater than 1, and the optimized loss function is obtained by multiplying the term of θ1 in the L2 regularization term by α. After expansion, the optimized loss function becomes:

J(θ) = cost(h(θ), y) + λ · (1.07·θ1² + θ2² + … + θn²)
In this embodiment, the sum of the KL divergence of each significantly different feature and 1 is used as that feature's optimization factor. This increases the decay applied to the coefficients of significantly different features, reduces the influence of their distribution differences, and improves the optimization accuracy of the logistic regression model.
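The whole scheme — per-feature penalty weights αj = 1 + KLj for significantly different features, plain L2 otherwise — can be sketched end to end with gradient descent. This is an illustrative sketch under assumed hyperparameters (λ, learning rate, iteration count are not specified by the patent); the demo data uses two identical copies of one feature so the effect of the stronger decay is visible in isolation:

```python
import numpy as np

def train_weighted_l2(X, y, kl_per_feature, kl_threshold=0.05,
                      lam=0.1, lr=0.5, n_iter=2000):
    """Gradient descent on the logistic loss with a per-feature L2
    penalty weight alpha_j = 1 + KL_j for significantly different
    features (KL_j > threshold) and alpha_j = 1 otherwise."""
    alpha = np.where(kl_per_feature > kl_threshold,
                     1.0 + kl_per_feature, 1.0)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        # data gradient + gradient of lam * sum_j alpha_j * theta_j^2
        grad = X.T @ (h - y) / len(y) + 2.0 * lam * alpha * theta
        theta -= lr * grad
    return theta, alpha

# Two identical copies of one feature; the second is marked as having
# KL divergence 0.07 (> 0.05), so only it gets the stronger decay.
x = np.linspace(-1.0, 1.0, 200)
X = np.column_stack([x, x])
y = (x > 0).astype(float)
theta, alpha = train_weighted_l2(X, y, np.array([0.0, 0.07]))
```

Because the two columns are identical, any difference between the fitted coefficients comes purely from the penalty weights: the significantly different feature ends up with the smaller coefficient, i.e. reduced importance in the model, as the embodiment intends.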
It should be understood that, although the steps in the flowcharts of figs. 2-3 are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and not necessarily sequentially, but in turn or alternately with other steps or with sub-steps of other steps.
In one embodiment, as shown in fig. 4, there is provided a logistic regression model training apparatus including: an obtaining module 401, a comparison module 402, an optimization module 403, and a training module 404, wherein:
an obtaining module 401, configured to obtain the KL divergence of each feature based on the distribution of each feature in the training set and the test set used for training the logistic regression model;
a comparison module 402, configured to compare the KL divergence of each feature with a preset KL divergence threshold to obtain the distinct features;
an optimization module 403, configured to optimize the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and
a training module 404, configured to train the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
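As a concrete sketch of the obtaining module 401 and the comparison module 402, the per-feature KL divergence between the training-set and test-set distributions can be computed as follows. This is an illustration, not the patent's implementation; the function name and the ε-smoothing used to avoid division by zero for empty bins are assumptions:

```python
import numpy as np

def feature_kl_divergence(train_col, test_col, eps=1e-10):
    """KL divergence between the train and test distributions of one
    discrete feature. Bin labels are taken from the union of both sets."""
    bins = np.union1d(np.unique(train_col), np.unique(test_col))
    # empirical frequency of each bin, smoothed by eps so log stays finite
    p = np.array([np.mean(train_col == b) for b in bins]) + eps
    q = np.array([np.mean(test_col == b) for b in bins]) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
same = rng.integers(0, 4, size=1000)
print(feature_kl_divergence(same, same))  # prints 0.0 — identical distributions
```

Features whose divergence exceeds the preset KL divergence threshold (0.05 in the earlier example) would then be flagged as distinct features.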
In an embodiment, the obtaining module 401 is further configured to: acquire feature information data for training the logistic regression model and preprocess the feature information data; perform feature extraction on the preprocessed feature information data to obtain at least one feature; and acquire the label corresponding to each feature, and divide the features and their corresponding labels into a training set and a test set according to a preset proportion.
In an embodiment, the obtaining module 401 is further configured to: determine whether the features include continuous-variable features and, if so, bin the continuous-variable features to convert them into discrete-variable features.
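The binning step can be sketched as follows. The patent does not specify a binning strategy; equal-frequency (quantile) binning is one common choice and is used here purely for illustration:

```python
import numpy as np

def bin_continuous(values, n_bins=5):
    """Equal-frequency binning: convert a continuous-variable feature
    into a discrete-variable feature of bin indices 0..n_bins-1."""
    # interior quantile cut points; the outermost edges are implicit
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

ages = np.array([18.0, 22.5, 35.0, 41.2, 58.9, 63.0, 70.4])
print(bin_continuous(ages, n_bins=3))  # prints [0 0 1 1 2 2 2]
```

For the sample array this yields a discrete-variable feature with three levels, which can then be compared across training and test sets bin by bin.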
In one embodiment, the comparison module 402 is further configured to: compare the KL divergence of each feature with the preset KL divergence threshold, and take the features whose KL divergence is greater than the threshold as the distinct features.
In one embodiment, the optimization module 403 is further configured to: add an L2 regularization term to the loss function of the logistic regression model; and optimize the L2 regularization term in the loss function according to the KL divergence of the distinct features to obtain the optimized loss function.
In one embodiment, the optimization module 403 is further configured to: generate the optimization factor corresponding to each distinct feature according to that feature's KL divergence; and multiply the distinct-feature terms of the L2 regularization term in the loss function by the corresponding optimization factors to obtain the optimized loss function.
In one embodiment, the optimization module 403 is further configured to: for each distinct feature, take the sum of its KL divergence and 1 as its optimization factor.
The logistic regression model training apparatus obtains the KL divergence of each feature based on the distribution of each feature in the training set and the test set of the logistic regression model; compares the KL divergence of each feature with a preset KL divergence threshold to obtain the distinct features; optimizes the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and trains the logistic regression model based on the optimized loss function to obtain its best-fit parameters. Features with large distribution differences are given relatively larger attenuation weights, which improves the accuracy of the logistic regression model.
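Putting the four modules together, a minimal end-to-end sketch might look like the following. The optimizer (plain gradient descent), learning rate, λ, and synthetic data are all assumptions — the patent fixes none of them; `kl` holds the per-feature KL divergences already computed from the training and test sets:

```python
import numpy as np

def train_lr_weighted_l2(X, y, kl, kl_threshold=0.05, lam=0.1, lr=0.1, steps=500):
    """Logistic regression by gradient descent. The L2 penalty on each
    coefficient is scaled by an optimization factor: 1 + KL for features
    whose train/test KL divergence exceeds the threshold, else 1."""
    m, n = X.shape
    alpha = np.where(kl > kl_threshold, 1.0 + kl, 1.0)  # optimization factors
    theta = np.zeros(n)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))            # sigmoid predictions
        grad = X.T @ (p - y) / m + (lam / m) * alpha * theta
        theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(float)
kl = np.array([0.07, 0.01, 0.02])   # only feature 0 exceeds the 0.05 threshold
theta = train_lr_weighted_l2(X, y, kl)
```

With α₀ = 1.07 and the other factors equal to 1, the coefficient of the drifting feature decays slightly harder than it would under uniform L2, which is the behavior the embodiment describes.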
For the specific limitations of the logistic regression model training apparatus, reference may be made to the limitations of the logistic regression model training method above; details are not repeated here. The modules in the logistic regression model training apparatus may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a logistic regression model training method.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
based on the distribution of each feature in a training set and a test set for training a logistic regression model, acquiring the KL divergence of each feature;
comparing the KL divergence of each feature with a preset KL divergence threshold to obtain distinct features;
optimizing the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and
training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring characteristic information data for training a logistic regression model, and preprocessing the characteristic information data; performing feature extraction on the preprocessed feature information data to obtain at least one feature; and acquiring labels corresponding to the features, and dividing the features and the labels corresponding to the features into a training set and a test set according to a preset proportion.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining whether the features include continuous-variable features and, if so, binning the continuous-variable features to convert them into discrete-variable features.
In one embodiment, the processor, when executing the computer program, further performs the steps of: comparing the KL divergence of each feature with the preset KL divergence threshold, and taking the features whose KL divergence is greater than the threshold as the distinct features.
In one embodiment, the processor, when executing the computer program, further performs the steps of: adding an L2 regularization term to the loss function of the logistic regression model; and optimizing the L2 regularization term in the loss function according to the KL divergence of the distinct features to obtain the optimized loss function.
In one embodiment, the processor, when executing the computer program, further performs the steps of: generating the optimization factor corresponding to each distinct feature according to that feature's KL divergence; and multiplying the distinct-feature terms of the L2 regularization term in the loss function by the corresponding optimization factors to obtain the optimized loss function.
In one embodiment, the processor, when executing the computer program, further performs the steps of: for each distinct feature, taking the sum of its KL divergence and 1 as its optimization factor.
The computer device obtains the KL divergence of each feature based on the distribution of each feature in the training set and the test set of the logistic regression model; compares the KL divergence of each feature with a preset KL divergence threshold to obtain the distinct features; optimizes the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and trains the logistic regression model based on the optimized loss function to obtain its best-fit parameters. Features with large distribution differences are given relatively larger attenuation weights, which improves the accuracy of the logistic regression model.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
based on the distribution of each feature in a training set and a test set for training a logistic regression model, acquiring the KL divergence of each feature;
comparing the KL divergence of each feature with a preset KL divergence threshold to obtain distinct features;
optimizing the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and
training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring characteristic information data for training a logistic regression model, and preprocessing the characteristic information data; performing feature extraction on the preprocessed feature information data to obtain at least one feature; and acquiring labels corresponding to the features, and dividing the features and the labels corresponding to the features into a training set and a test set according to a preset proportion.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining whether the features include continuous-variable features and, if so, binning the continuous-variable features to convert them into discrete-variable features.
In one embodiment, the computer program when executed by the processor further performs the steps of: comparing the KL divergence of each feature with the preset KL divergence threshold, and taking the features whose KL divergence is greater than the threshold as the distinct features.
In one embodiment, the computer program when executed by the processor further performs the steps of: adding an L2 regularization term to the loss function of the logistic regression model; and optimizing the L2 regularization term in the loss function according to the KL divergence of the distinct features to obtain the optimized loss function.
In one embodiment, the computer program when executed by the processor further performs the steps of: generating the optimization factor corresponding to each distinct feature according to that feature's KL divergence; and multiplying the distinct-feature terms of the L2 regularization term in the loss function by the corresponding optimization factors to obtain the optimized loss function.
In one embodiment, the computer program when executed by the processor further performs the steps of: for each distinct feature, taking the sum of its KL divergence and 1 as its optimization factor.
The storage medium obtains the KL divergence of each feature based on the distribution of each feature in the training set and the test set of the logistic regression model; compares the KL divergence of each feature with a preset KL divergence threshold to obtain the distinct features; optimizes the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and trains the logistic regression model based on the optimized loss function to obtain its best-fit parameters. Features with large distribution differences are given relatively larger attenuation weights, which improves the accuracy of the logistic regression model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage. Volatile memory can include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above-mentioned embodiments express only several implementations of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A method of logistic regression model training, the method comprising:
based on the distribution of each feature in a training set and a test set for training a logistic regression model, acquiring the KL divergence of each feature;
comparing the KL divergence of each feature with a preset KL divergence threshold to obtain distinct features;
optimizing the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and
training the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
2. The method according to claim 1, wherein the obtaining the KL divergence of each feature based on the distribution of each feature in a training set and a test set for training the logistic regression model further comprises:
acquiring characteristic information data for training a logistic regression model, and preprocessing the characteristic information data;
performing feature extraction on the preprocessed feature information data to obtain at least one feature;
and acquiring labels corresponding to the features, and dividing the features and the labels corresponding to the features into a training set and a test set according to a preset proportion.
3. The method according to claim 2, wherein before the acquiring of the label corresponding to each feature and the dividing of the features and their corresponding labels into a training set and a test set according to a preset proportion, the method further comprises:
determining whether the features include continuous-variable features and, if so, binning the continuous-variable features to convert them into discrete-variable features.
4. The method according to claim 1, wherein the comparing the KL divergence of each feature with a preset KL divergence threshold to obtain distinct features comprises:
comparing the KL divergence of each feature with the preset KL divergence threshold, and taking the features whose KL divergence is greater than the threshold as the distinct features.
5. The method according to claim 1, wherein the optimizing the loss function of the logistic regression model according to the KL divergence of the distinct features comprises:
adding an L2 regularization term to the loss function of the logistic regression model; and
optimizing the L2 regularization term in the loss function according to the KL divergence of the distinct features to obtain the optimized loss function.
6. The method according to claim 5, wherein the optimizing the L2 regularization term in the loss function according to the KL divergence of the distinct features comprises:
generating the optimization factor corresponding to each distinct feature according to that feature's KL divergence; and
multiplying the distinct-feature terms of the L2 regularization term in the loss function by the corresponding optimization factors to obtain the optimized loss function.
7. The method according to claim 6, wherein the generating the optimization factor corresponding to each distinct feature according to that feature's KL divergence comprises:
for each distinct feature, taking the sum of its KL divergence and 1 as its optimization factor.
8. A logistic regression model training apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain the KL divergence of each feature based on the distribution of each feature in a training set and a test set for training a logistic regression model;
a comparison module, configured to compare the KL divergence of each feature with a preset KL divergence threshold to obtain distinct features;
an optimization module, configured to optimize the loss function of the logistic regression model according to the KL divergence of the distinct features to obtain an optimized loss function; and
a training module, configured to train the logistic regression model based on the optimized loss function to obtain the best-fit parameters of the logistic regression model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110423633.3A CN113255927A (en) | 2021-04-20 | 2021-04-20 | Logistic regression model training method and device, computer equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN113255927A true CN113255927A (en) | 2021-08-13 |
Family
ID=77221159
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110423633.3A Pending CN113255927A (en) | 2021-04-20 | 2021-04-20 | Logistic regression model training method and device, computer equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113255927A (en) |
- 2021-04-20: CN application CN202110423633.3A filed; published as CN113255927A (status: Pending)
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113902123A (en) * | 2021-09-08 | 2022-01-07 | 北京淇瑀信息科技有限公司 | Method and device for improving service processing capacity of service module and electronic equipment |
| CN114219306A (en) * | 2021-12-16 | 2022-03-22 | 蕴硕物联技术(上海)有限公司 | Method, apparatus, medium, and program product for creating a weld quality detection model |
| CN114219306B (en) * | 2021-12-16 | 2022-11-15 | 蕴硕物联技术(上海)有限公司 | Method, apparatus, medium for establishing welding quality detection model |
| CN115773805A (en) * | 2022-11-09 | 2023-03-10 | 云南昆船电子设备有限公司 | A method for diagnosing the operating status of electronic belt scales based on data analysis |
| CN115773805B (en) * | 2022-11-09 | 2025-10-17 | 云南昆船电子设备有限公司 | Electronic belt scale running state diagnosis method based on data analysis |
| CN119132565A (en) * | 2024-09-02 | 2024-12-13 | 无锡心动麦力科技有限公司 | Intelligent evaluation method of disability level based on big data analysis |
| CN119045292A (en) * | 2024-10-31 | 2024-11-29 | 浙江大学 | Reverse photoetching method based on multilayer perceptron |
| CN119128368A (en) * | 2024-11-14 | 2024-12-13 | 中国建设银行股份有限公司 | Regression model construction method and device, program product, and storage medium |
| CN119128368B (en) * | 2024-11-14 | 2025-02-21 | 中国建设银行股份有限公司 | Regression model construction method and device, program product, and storage medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113255927A (en) | Logistic regression model training method and device, computer equipment and storage medium | |
| US10713597B2 (en) | Systems and methods for preparing data for use by machine learning algorithms | |
| CN110163261B (en) | Unbalanced data classification model training method, device, equipment and storage medium | |
| CN111950656B (en) | Image recognition model generation method and device, computer equipment and storage medium | |
| Yang et al. | Channel pruning based on convolutional neural network sensitivity | |
| CN109271958B (en) | Face age identification method and device | |
| CN114186589B (en) | A method for partial discharge pattern recognition of superconducting cables based on residual network Resnet50 | |
| CN113487326B (en) | Transaction limiting parameter setting method and device based on intelligent contract | |
| CN112685324B (en) | Method and system for generating test scheme | |
| US12165054B2 (en) | Neural network rank optimization device and optimization method | |
| CN110929836B (en) | Neural network training and image processing method and device, electronic equipment and medium | |
| US10482351B2 (en) | Feature transformation device, recognition device, feature transformation method and computer readable recording medium | |
| US20230161653A1 (en) | Method of managing system health | |
| CN114332500A (en) | Image processing model training method and device, computer equipment and storage medium | |
| US6353816B1 (en) | Method, apparatus and storage medium configured to analyze predictive accuracy of a trained neural network | |
| CN110726898A (en) | Power distribution network fault type identification method | |
| CN110634082A (en) | Low-frequency load shedding system operation stage prediction method based on deep learning | |
| CN114169460A (en) | Sample screening method, sample screening device, computer equipment and storage medium | |
| CN113487080A (en) | Wind speed dynamic scene generation method, system and terminal based on wind speed classification | |
| CN109101984B (en) | Image identification method and device based on convolutional neural network | |
| CN111612648A (en) | Training method and device of photovoltaic power generation prediction model and computer equipment | |
| CN108199374B (en) | An entropy-based stability evaluation method and system for power systems | |
| CN112541530B (en) | Data preprocessing method and device for clustering model | |
| CN118395155A (en) | Deep learning-based hidden danger discharge characteristic extraction method and system for power transmission line | |
| CN113486742B (en) | Fault identification method, device and system and computer readable storage medium |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210813 |