Logistic Regression Knowledge Points in Data Mining
Source | Mudong layman
0x00 Preface
We know that regression models can handle problems where the dependent variable is continuous, but when the dependent variable is categorical, plain regression no longer works. In that case we have to choose a classification method instead, such as decision trees, random forests, or SVMs. Logistic regression, the subject of this article, is also an excellent classification method. To be clear: although logistic regression has "regression" in its name, it is essentially a binary classification algorithm, used to handle binary classification problems.
0x01 A First Look at Logistic Regression
Question 1: Can you briefly explain what logistic regression is?
Answer: Logistic regression is a binary classification algorithm, generally used to solve binary classification problems, though it can also be applied to multi-class problems. Because of its binary nature, we usually reduce a multi-class problem to several binary classification problems. There are three splitting strategies for this reduction: one-vs-one (OvO), one-vs-rest (OvR), and many-vs-many (MvM). With such a splitting strategy, logistic regression can be used to predict multi-class problems. In practice, however, this approach is less common, because algorithms such as random forests, naive Bayes, and neural networks handle multi-class prediction directly.
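To make the one-vs-rest idea above concrete, here is a minimal sketch: one binary classifier is trained per class, and the sample is assigned to the class whose classifier is most confident. The probability scores below are hypothetical stand-ins for the outputs of trained binary logistic regressions.

```python
def ovr_predict(binary_probs):
    """One-vs-rest decision rule: pick the class whose binary
    classifier assigns the highest probability to this sample.

    binary_probs maps class label -> P(sample belongs to that class),
    each value produced by an independent binary classifier.
    """
    return max(binary_probs, key=binary_probs.get)

# Hypothetical outputs of three one-vs-rest classifiers for one sample:
scores = {"cat": 0.2, "dog": 0.7, "bird": 0.4}
print(ovr_predict(scores))  # -> dog
```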
Question 2: Logistic regression is a binary classification algorithm, so how does it actually classify?
Answer: Logistic regression decides which class a sample belongs to by estimating the probability that it belongs to a given class. Here we introduce the sigmoid function, y = 1 / (1 + e^(-z)), where z = w^T x + b. The sigmoid function has a very useful property: it maps any real-valued input to an output in (0, 1). Logistic regression uses the sigmoid function to approximate the posterior probability p(y = 1 | x). By convention, a sigmoid output greater than 0.5 is treated as a positive example (label 1), and an output less than 0.5 as a negative example (label 0).
The graph of the sigmoid function is the familiar S-shaped curve (figure not included here).
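The property described above is easy to verify numerically. This small sketch implements the sigmoid formula y = 1 / (1 + e^(-z)) from the answer and checks that any input lands in (0, 1), with the midpoint at z = 0:

```python
import math

def sigmoid(z):
    """Map any real z into (0, 1): y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid(0) is exactly 0.5, the default decision boundary;
# large positive z approaches 1, large negative z approaches 0.
print(sigmoid(0))    # 0.5
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```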
0x02 A Closer Look at Logistic Regression
Question 1: Is the 0.5 classification threshold of logistic regression fixed? Can it be adjusted manually?
Answer: Not necessarily; it can be adjusted manually. The output of logistic regression is a probability, namely the sigmoid function's estimate that the sample is a positive example. We can customize the classification threshold to change the classification result.
For example, in email classification, suppose the sigmoid output says an email is spam with probability 0.6, i.e. p(y = spam | features) = 0.6, and correspondingly p(y = useful | features) = 0.4. With the default rule "spam if p > 0.5", this email is judged spam; if instead we require p > 0.7, the same email is judged useful. By default a sample is assigned to the class with the larger probability, but because logistic regression outputs a probability value, we can customize the threshold to suit the specific application and obtain a model that better fits the practical scenario.
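The spam example above can be sketched in a few lines. The probability 0.6 and the "spam"/"useful" labels come from the example in the text; the function name is chosen for illustration.

```python
def classify(p_spam, threshold=0.5):
    """Label an email spam iff its probability clears the threshold."""
    return "spam" if p_spam > threshold else "useful"

p = 0.6  # sigmoid output for the email in the example above

print(classify(p, threshold=0.5))  # spam:   0.6 > 0.5
print(classify(p, threshold=0.7))  # useful: 0.6 <= 0.7
```

Raising the threshold trades false positives for false negatives, which is exactly why a tunable threshold is useful when the two kinds of error have different costs.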
Question 2: How is the maximum likelihood method used in logistic regression?
Answer: In the sigmoid function, z = w^T x + b, where w and b are unknown. Maximum likelihood estimation finds the w and b that maximize the probability that each sample receives its true label: the larger that probability, the better. Maximizing the likelihood directly is a little awkward, however, so we negate the log-likelihood and solve a minimization problem instead. The resulting objective function is a continuous convex function that is differentiable to arbitrary order, so its minimum can be found with gradient descent or Newton's method. Once w and b are obtained, the sigmoid output follows immediately.
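The procedure just described, minimizing the negative log-likelihood by gradient descent, can be sketched as follows for a one-dimensional toy problem. The data, learning rate, and epoch count are all assumptions made for illustration; the key fact used is that the gradient of the negative log-likelihood with respect to z = w*x + b is (sigmoid(z) - y).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, lr=0.5, epochs=2000):
    """Fit w, b by gradient descent on the negative log-likelihood.

    Per sample, d(-log L)/dz = sigmoid(z) - y, which yields the
    simple accumulation below via the chain rule.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient w.r.t. z
            gw += err * x / n             # chain rule: dz/dw = x
            gb += err / n                 # chain rule: dz/db = 1
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy 1-D data (assumed for illustration): negatives below 0, positives above.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(xs, ys)

print(sigmoid(w * 2.0 + b) > 0.5)    # True: x = 2 classified positive
print(sigmoid(w * -2.0 + b) < 0.5)   # True: x = -2 classified negative
```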
Note: Logistic regression is a discriminative model.
Question 3: What are the applications of logistic regression?
Answer: The applications of logistic regression follow directly from its characteristics. Because it is a binary classification algorithm with good performance, it can be applied to almost any problem requiring binary classification, such as cancer detection, spam filtering, ad click-through prediction, and analysis of medical treatment effects.
0x03 Advantages and disadvantages
Question: What are the advantages of logistic regression? What are the disadvantages?
Answer:
The advantages of logistic regression are:
1. The form is simple and the model is highly interpretable. From the feature weights we can see how much each feature influences the result: a large weight means that feature strongly affects the prediction.
2. It not only predicts the class but also gives an approximate probability, which is very useful for tasks where probabilities assist decision-making.
3. The objective function is an everywhere-differentiable convex function with good mathematical properties, so many existing optimization algorithms can be used directly to find the optimum.

The disadvantages are:

1. It struggles with imbalanced data. For example, if a platform's non-ordering and ordering users number 10,000 to 50, the data is severely imbalanced: a model that predicts "no order" for every sample already achieves a very small loss, yet as a classifier it is useless at separating positive from negative samples.
2. Logistic regression cannot select features by itself. Highly correlated features slow down training, and too many features can cause overfitting.
3. Handling nonlinear problems with logistic regression is troublesome: without introducing other techniques, it can only handle linearly separable binary classification.

0x04 Summary
I always thought of logistic regression as a very simple yet powerful algorithm. It was not until writing this article that I realized how many knowledge points it contains. This article is only a starting point; if you want to learn more, I suggest reading further and practicing on real cases, and you will certainly learn much more. We leave a few questions for discussion:
1. Is the probability value output by logistic regression a true probability?
2. How does logistic regression distinguish positive examples from negative ones?
3. How do we measure the performance of a logistic regression model?
4. Derive logistic regression by hand.