Classical algorithms of data mining

1. C4.5: A decision tree classification algorithm in machine learning. It is an extension of the ID3 algorithm.

2. K-means: A clustering algorithm that partitions the data into k clusters, assigning each point to the cluster with the nearest center.

3. SVM (support vector machine): A supervised learning method that is widely used in statistical classification and regression analysis.

4. Apriori: One of the most influential algorithms for mining the frequent itemsets of Boolean association rules.

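5. EM: The expectation-maximization algorithm, which looks for maximum likelihood estimates of parameters in probabilistic models that depend on unobserved latent variables.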

6. PageRank: An important part of Google's search algorithm.

7. AdaBoost: An iterative algorithm. The core idea is to train different weak classifiers on the same training set and then combine them into a stronger final classifier.

8. KNN (k-nearest neighbors): A theoretically mature method and one of the simplest machine learning algorithms.

9. Naive Bayes: Among the many classification models, the decision tree model and the naive Bayes model are the most widely used.

10. CART: Classification and regression trees. Two key ideas underlie the classification tree. The first is recursively partitioning the space of the independent variables, and the second is pruning with validation data.
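To make the AdaBoost idea from item 7 concrete, here is a minimal sketch in Python. It assumes one-dimensional inputs, labels of +1 and -1, and threshold "decision stumps" as the weak classifiers. The function names (train_stump, stump_predict, adaboost_train, predict) and the toy data are made up for illustration and are not taken from any library.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: `polarity` if x < threshold, otherwise -polarity."""
    return polarity if x < threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            # Weighted error: total weight of the misclassified points.
            err = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost_train(xs, ys, rounds=3):
    """Train `rounds` stumps on the same data, reweighting after each one."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = max(err, 1e-10)        # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)   # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the others.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Final classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (an assumption for this sketch): no single threshold separates it.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_train(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data the best single stump still misclassifies some points, while the weighted vote of three stumps classifies every training point correctly. Real implementations use richer weak learners and multi-dimensional features.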

Definition of association rules

Before describing some details about association rules, let's look at an interesting story: the story of diapers and beer.

In one supermarket there is a curious arrangement: diapers and beer are shelved together. Strange as it looks, this placement increased the sales of both diapers and beer. This is not a joke but a well-known case from the Wal-Mart supermarket chain in the United States, and merchants have been talking about it ever since. Wal-Mart runs one of the largest data warehouse systems in the world. To understand accurately what customers buy in its stores, Wal-Mart analyzes their shopping behavior to learn which products are often bought together. Its data warehouse collects the detailed raw transaction data from every store, and Wal-Mart applies data mining methods to this data. One unexpected finding was that the product most often bought together with diapers is beer! Extensive field investigation and analysis revealed the American behavior pattern hidden behind diapers and beer: in the United States, some young fathers often stop at the supermarket after work to buy diapers for their babies, and 30% ~ 40% of them also buy some beer for themselves. The explanation is that American wives often ask their husbands to pick up diapers on the way home from work, and after buying the diapers the husbands also take home the beer they like.

According to conventional thinking, diapers have nothing to do with beer. Without data mining techniques to analyze the large volume of transaction data, Wal-Mart could not have discovered this valuable pattern hidden in the data.

Association is an important kind of knowledge that can be discovered in a database. If there is some regularity between the values of two or more variables, it is called an association. Associations can be divided into simple associations, time-series associations and causal associations. The purpose of association analysis is to find the hidden network of associations in the database. Sometimes the association function of the data in the database is unknown, and even when it is known it is uncertain, so the rules generated by association analysis carry a measure of confidence. Association rule mining finds interesting associations or correlations between itemsets in large amounts of data. Agrawal et al. first posed the problem of mining association rules between itemsets in customer transaction databases in 1993. Since then, many researchers have studied association rule mining extensively. Their work includes optimizing the original algorithm, for example by introducing random sampling and parallelism to make rule mining more efficient, and extending the applications of association rules. Association rule mining is an important topic in data mining and has been widely studied in recent years.
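A rule such as {diapers} -> {beer} is usually scored with two standard measures: support (how often the items appear together among all transactions) and confidence (how often the right-hand side appears among transactions that contain the left-hand side). The "30% ~ 40%" figure in the story can be read as a confidence of this kind. The sketch below computes both in Python. The baskets are invented for illustration and are not Wal-Mart data, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "milk", "bread"},
]

rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here two of the five baskets contain both items, and two of the four baskets with diapers also contain beer. Algorithms such as Apriori avoid scoring every possible itemset this way by first finding the itemsets whose support exceeds a chosen minimum.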