Catalogue of data analysis works

Brief contents

Preface

1 Introduction to data analysis: breaking data down

2 Experiments: test your theories

3 Optimization: finding the maximum

4 Data graphics: graphics make you smarter

5 Hypothesis testing: say it is not so

6 Bayesian statistics: getting past the first level

7 Subjective probability: belief in numbers

8 Heuristics: analysis based on human nature

9 Histograms: the shape of numbers

10 Regression: prediction

11 Error: reasonable error

12 Relational databases: can you relate?

13 Cleaning data: imposing order

Appendix A Leftovers: ten topics the text left out

Appendix B Installing R: start R!

Appendix C Installing the Excel analysis tools: the ToolPak

Detailed table of contents, with an introduction to each chapter

Foreword

Your brain on data analysis. On the one hand, you are trying to learn something; on the other hand, your brain is busy letting it go. Your brain is thinking, "Better to leave room for something more important, such as which wild animals to stay away from and whether skiing naked is a good idea." So how do you coax your brain into believing that knowing data analysis is a matter of life and death?

Who is this book for?

We know what you're thinking

Metacognition

Conquering your brain

Self-report

Technical advisory group

Acknowledgments

1. Introduction to data analysis: breaking data down

Acme Cosmetics needs your help

The CEO wants data analysts to help him increase sales

Data analysis is careful scrutiny of the evidence

Define the problem

The customer will help you define the problem

The CEO of Acme gave you some feedback

Break the problem and the data into smaller chunks

Now let's look at what we have learned

Evaluate the chunks

The analysis starts the moment you step in

Make recommendations

The report is written

The CEO appreciates your work

A piece of news

The CEO's firm beliefs led you astray

Your assumptions and beliefs about the outside world are your mental model

The statistical model depends on the mental model

Mental models should include the things you don't know

The CEO admitted what he doesn't know

Acme sent you a long list of raw data

Digging deeper into the data

Pan American Wholesale Company confirmed your impression

Review your work

Your analysis enabled the client to make a wise decision

2. Experiments: test your theories

Can you demonstrate your firm beliefs to others, in a real empirical test? Nothing solves a problem and reveals how things really work like a good experiment. A good experiment can often free you from relying exclusively on observational data and help you sort out cause and effect; reliable empirical data will make your analysis and judgment more convincing.

A cold winter has arrived for the coffee industry!

The board of directors of Starbucks will meet in three months

The Starbucks questionnaire

Always use the method of comparison

Comparison is the key to deciphering observational data

Is perceived value the reason for the decline in sales revenue?

What a typical customer thinks

Observational studies are full of confounding factors

What effect might store location have on the analysis results?

Split the data into chunks to manage confounding factors

The situation is worse than expected!

You need an experiment to show which strategy is the most effective

The CEO of Starbucks is impatient

Starbucks has reduced its prices

A month later ...

Control groups give you a baseline

Avoid launching 123

Let's do the experiment again

A month later ...

Experiments can still be ruined by confounding factors

Carefully select groups to avoid confounding factors

Randomly select similar groups

An interview with randomness

Ready, start the experiment

The results are in

Starbucks has found an empirically tested sales strategy

3. Optimization: finding the maximum

There are some things everyone wants as much of as possible, and we search high and low for ways to get them. If what we are constantly pursuing (profit, money, efficiency, speed and so on) can be expressed in numbers, the chance to reach a higher goal is just around the corner. There is a data analysis tool that helps us adjust decision variables and find the solution, or optimal point, at which we achieve our goals to the greatest extent. This chapter uses one such tool, implemented through the powerful spreadsheet package Solver.

It's time for the bath toy game

The variables you can control are limited by constraints

Decision variables are factors that you can control

You have an optimization problem

Find your objective

Your objective function

List product combinations with other constraints

Plot multiple constraints on the same chart

All reasonable choices lie in the feasible region

The new constraint changes the feasible region

Using spreadsheets to carry out optimization

Solver solves the optimization problem in one pass

Profits have hit rock bottom

Your model only describes the situation you specified

Revise your assumptions according to the analysis objective

Beware of negatively correlated variables

The new plan works immediately

Your assumptions rest on an ever-changing reality
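
As a rough illustration of the ideas listed for this chapter (decision variables, constraints, an objective function and a feasible region), here is a minimal Python sketch. It is not the Solver workflow the chapter uses: it simply tries every whole-number product mix and keeps the most profitable feasible one. The function name `best_product_mix`, the two products and every number are hypothetical, chosen only for the example.

```python
def best_product_mix(profit_duck, profit_fish, max_ducks, max_fish,
                     rubber_per_duck, rubber_per_fish, rubber_supply):
    """Brute-force search for the most profitable feasible product mix.

    Decision variables: how many ducks and fish to make.
    Constraints: per-product production caps and a shared rubber supply.
    Objective function: total profit.
    """
    best = (0, 0, 0)  # (ducks, fish, profit)
    for ducks in range(max_ducks + 1):
        for fish in range(max_fish + 1):
            rubber_used = ducks * rubber_per_duck + fish * rubber_per_fish
            if rubber_used > rubber_supply:
                continue  # outside the feasible region
            profit = ducks * profit_duck + fish * profit_fish
            if profit > best[2]:
                best = (ducks, fish, profit)
    return best


# Hypothetical figures, not taken from the book.
ducks, fish, profit = best_product_mix(
    profit_duck=5, profit_fish=4,
    max_ducks=400, max_fish=300,
    rubber_per_duck=100, rubber_per_fish=125, rubber_supply=50000,
)
print(ducks, fish, profit)
```

With these assumed figures the search settles on 400 ducks and 80 fish. A spreadsheet solver reaches the same kind of answer without enumerating every combination, which matters once there are more than a few decision variables.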

4. Data graphics: graphics make you smarter

A table of data is far from all you need. Your data is complicated and obscure, with enough variables to make you dizzy, and wading through a mountain of spreadsheets is not only boring but also a waste of time. Compared with using spreadsheets alone, a vivid, clear graphic can, in very little space, keep a single leaf from blocking your view of Mount Tai; that is, it lets you see the forest despite the trees.

New Army needs to optimize its website

The results are in, but the information designer is out

Three infographics submitted by the previous information designer

What data lies behind these graphics?

Show the data!

Here is the advice the previous designer offered

Too much data is never your problem

Making the data pretty is not the problem you need to solve

Data graphics are built on making the right comparisons

Your graphics are already more useful than the rejected ones

Use scatter plots to explore causes

The best graphics are multivariate

Show more variables by displaying several charts together

The graphics are great, but the website owner is still not satisfied

Good graphic design helps you think

The experiment designers speak

The experiment designers have hypotheses of their own

The client appreciates your work

Orders are rolling in from all directions!

5. Hypothesis testing: say it is not so

The world is varied, and true and false are hard to tell apart. People have to predict the future from complex, ever-changing data, and that data inevitably stays tangled however you cut it. For this reason, analysts neither simply accept superficial explanations nor take for granted that those explanations are correct: through the careful reasoning of data analysis, they can evaluate a large number of alternative answers in meticulous detail and then integrate all the information at hand into their models. Falsification, the method studied next, is a counterintuitive but practical and effective way to do this.

Give me some skin ...

When will we start producing new mobile phone cases?

PodPhone doesn't want anyone to see its next move coming

Everything we know so far

Is E-skin's analysis consistent with the data?

E-skin obtained a confidential strategy memo

Variables can be positively or negatively correlated

In the real world, causes are networked rather than linear

Hypothesize several options for PodPhone

Use the data on hand to test the hypotheses

The core of hypothesis testing is falsification

Use diagnosticity to find the hypothesis with the least disconfirmation

We can't rule out every hypothesis one by one, but we can determine which hypothesis is the strongest

You just received a picture message ...

Coming soon!

6. Bayesian statistics: getting past the first level

Data collection never stops, and you need to make sure every analysis makes full use of the collected data that is relevant to the problem. You have learned to use falsification to deal with heterogeneous data sources, but what should you do when you face a straightforward probability problem? The answer is a very handy analysis tool called Bayes' rule, which helps you use base rates and changing data to reach insights that are not obvious.

The doctor brought unpleasant news

Let's read the accuracy analysis item by item

How common is lizard flu?

You have been counting false positives

These terms all refer to conditional probabilities

You need to calculate

1% of people have lizard flu

Your chances of having lizard flu are still very low

Think about complex probabilities with simple whole numbers

After collecting new data, use Bayes' rule to handle base rates

Bayes' rule can be reused

Second test result: negative

The new test has different accuracy statistics

New information changes your base rate

What a relief!
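
The lizard flu entries above follow the usual shape of a Bayes' rule calculation: start from a base rate, update it with one test result, then reuse the result as the new base rate for the next test. The Python sketch below illustrates that flow. Only the 1% base rate comes from the outline; the function name `bayes_posterior` and the accuracy figures for both tests are assumptions made up for the example.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) by Bayes' rule.

    prior               -- P(hypothesis) before seeing the evidence
    p_evidence_if_true  -- P(evidence | hypothesis is true)
    p_evidence_if_false -- P(evidence | hypothesis is false)
    """
    true_and_evidence = prior * p_evidence_if_true
    false_and_evidence = (1 - prior) * p_evidence_if_false
    return true_and_evidence / (true_and_evidence + false_and_evidence)


base_rate = 0.01  # from the outline: 1% of people have lizard flu

# First test comes back positive.
# Assumed accuracy: P(positive | sick) = 0.90, P(positive | healthy) = 0.09.
after_positive = bayes_posterior(base_rate, 0.90, 0.09)

# Second test comes back negative; the first posterior is the new base rate.
# Assumed accuracy: P(negative | sick) = 0.01, P(negative | healthy) = 0.99.
after_negative = bayes_posterior(after_positive, 0.01, 0.99)

print(round(after_positive, 4), round(after_negative, 4))
```

Under these assumed accuracies a positive result lifts the probability from 1% to roughly 9%, which is why the chances "are still very low", and a later negative result pushes it down to about 0.1%.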

7. Subjective probability: belief in numbers

Sometimes it is fine to make up numbers. Really. However, those numbers must describe your state of mind and express your beliefs. Subjective probability is a simple way to bring rigor to intuition, and the specifics are introduced shortly. Along the way, you will learn how to use the standard deviation to evaluate the spread of data, and a more powerful analysis tool you learned earlier will make a return appearance.

Beishui Investment Company needs your help

The analysts are arguing with each other

Subjective probabilities reflect expert beliefs

Subjective probabilities may show that there is no real disagreement at all

The analysts answer with their subjective probabilities

The CEO doesn't understand what you are doing

The CEO appreciates your work

The standard deviation measures how far the analysis points deviate from the average

The news caught you off guard

Bayes' rule is a good way to revise subjective probabilities

The CEO knows exactly what to do with this new information

The Russian investors are jubilant!

8. Heuristics: analysis based on human nature

The ever-changing real world makes things hard for analysts to predict. There is always some data beyond our reach, and even when the data is available, optimization methods are often difficult and time-consuming. Fortunately, most practical thinking in life is not carried out in the most rational way; instead, it handles incomplete and uncertain information with rules of thumb so that decisions can be made quickly. Remarkably, these rules of thumb really work, which makes them an important and necessary tool for data analysis.

LitterGitters submitted a report to the city council

LitterGitters really has cleaned up the town

LitterGitters has been measuring the effect of its work

Their task is to reduce the amount of scattered litter

Measuring the amount of litter is not feasible

Hard question, simple answer

Littering in the cities of Data State is a complex system

It is impossible to establish and apply a unified model for measuring litter

Heuristics are a bridge between intuition and optimization

Use a fast and frugal tree

Is there a simpler way to evaluate the achievements of LitterGitters?

Stereotypes are heuristics

The analysis is done and ready to submit

It seems that your analysis impressed the members of the city council

9. Histograms: the shape of numbers

What can a histogram show? There are countless ways to represent data graphically, and the histogram stands out among them. Somewhat similar to a bar chart, a histogram summarizes data quickly and effectively. Next, you will use this small, practical chart to measure the distribution, variation and central tendency of data. No matter how big the data set is, drawing a histogram lets you "see" the mysteries in the data. In this chapter, we use a novel, free and general-purpose software tool to draw histograms.

Annual employee reviews are coming up

There are many ways to ask for money

Here are the records of past raises

A histogram shows how often each group of data occurs

Gaps between parts of a histogram are gaps between data points

Install and run R

Load the data into R

R creates attractive histograms

Draw histograms from subsets of the data

Salary negotiation pays off

What will negotiating a raise mean for you?

10. Regression: prediction

Know everything, know nothing. Regression analysis is remarkably powerful: used properly, it can help you predict certain values, and used together with controlled experiments it can even predict the future. Businesses use regression analysis enthusiastically to model and predict customer behavior. This chapter will show you that the judicious use of regression analysis can indeed bring great benefits.

How are you going to spend the money?

An analysis that helps people get big raises

Wait a minute ... a salary calculator!

The key to the algorithm is predicting the raise

Compare two variables with a scatter plot

A straight line can point customers toward their target

Use a graph of averages to predict the value within each interval

The regression line predicts the raises people actually receive

The regression line is useful for linearly correlated data

You need an equation to make an accurate prediction

Let R create a regression object

The regression equation is closely related to the scatter plot

The salary calculator's algorithm is the regression equation

Your salary calculator doesn't work as planned ...

11. Error: reasonable error

The world is complicated, so it is no surprise that predictions are inaccurate. However, if you state an error range along with a prediction, you and your customers will know not only the average predicted value but also the typical deviation that error causes; stating the error makes predictions and beliefs more complete. With the tools taught in this chapter, you will also learn how to control error and keep it as small as possible, thereby making your predictions more reliable.

Customers are furious

What did your raise-prediction algorithm do?

Customer segments

The guy who asked for a 25% raise is outside the model's range

How to handle customers who want predictions outside the data range

The guy who was fired for using extrapolation has calmed down

You only solved part of the problem

What does the data for the distorted raise results look like?

Chance error = deviation of actual results from the model's predictions

Error is good for you and your customers

An interview with chance error

Specify error quantitatively

Quantify the residual distribution with the root mean square error

Your R model already knows its root mean square error

R's linear model summary shows the root mean square error

Segmentation is fundamentally about managing error

Good regression analysis both explains and predicts

The segmented model handles error better than the original model

Your customers are coming back
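
To make the vocabulary of the last two chapters concrete (regression line, residuals as chance error, root mean square error), here is a small Python sketch that fits a straight line by least squares and summarizes the residuals with their RMS error. The book works in R; this is only an illustration in plain Python. The function names `fit_line` and `rms_error` and the four data points are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rms_error(xs, ys, slope, intercept):
    """Root mean square of the residuals (actual minus predicted)."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical data points, not taken from the book.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

slope, intercept = fit_line(xs, ys)
error = rms_error(xs, ys, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(error, 3))
```

A prediction can then be reported as the line's value plus or minus the RMS error. This sketch divides by the number of points; the residual standard error printed by R's linear model summary divides by the residual degrees of freedom instead, so the two figures differ slightly on small data sets.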

12. Relational databases: can you relate?

How do you organize highly multivariate data? A spreadsheet has only two dimensions: rows and columns. If your data has many dimensions, the tabular format soon falls short. In this chapter, you will see how hard it is to manage multivariate data in spreadsheets, and how a relational database management system makes storing and retrieving multivariate data extremely simple.

Data State News wants to analyze its sales

Here is the data they keep to track their operations

You need to know how the data tables relate to each other

A database is a collection of data with specific relationships to each other

Trace a path through the relationships to make the comparisons you need

Create a spreadsheet that follows that path

Summarize to relate article counts to sales

It seems your scatter plot is really good

Copying and pasting all this data is painful

Relational databases manage relationships for you

Data State News built an RDBMS using your diagram

Data State News uses SQL to extract data

RDBMS data can be compared in endless ways

You are on the cover
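
The step of relating article counts to sales is the kind of comparison a relational database handles with one query instead of copying and pasting between sheets. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, the function `article_counts_vs_sales` and the sample rows are all hypothetical; the outline does not describe the newspaper's actual schema.

```python
import sqlite3


def article_counts_vs_sales(issues, articles):
    """Return (issue_id, article_count, copies_sold) rows, one per issue.

    issues   -- iterable of (issue_id, copies_sold)
    articles -- iterable of (article_id, issue_id)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE issues (issue_id INTEGER PRIMARY KEY, copies_sold INTEGER)"
    )
    conn.execute(
        "CREATE TABLE articles (article_id INTEGER PRIMARY KEY, "
        "issue_id INTEGER REFERENCES issues(issue_id))"
    )
    conn.executemany("INSERT INTO issues VALUES (?, ?)", issues)
    conn.executemany("INSERT INTO articles VALUES (?, ?)", articles)
    # Follow the relationship articles -> issues and summarize per issue.
    rows = conn.execute(
        """
        SELECT i.issue_id, COUNT(a.article_id) AS article_count, i.copies_sold
        FROM issues AS i
        LEFT JOIN articles AS a ON a.issue_id = i.issue_id
        GROUP BY i.issue_id, i.copies_sold
        ORDER BY i.issue_id
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data.
sample_issues = [(1, 1000), (2, 1500), (3, 900)]
sample_articles = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 2)]
for row in article_counts_vs_sales(sample_issues, sample_articles):
    print(row)
```

Each returned row pairs an issue's article count with its sales, ready to feed a scatter plot. The `LEFT JOIN` keeps issues that have no articles recorded, with a count of zero.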

13. Cleaning data: imposing order

Messy data is useless. Many data collectors have to spend a lot of time tidying their data. Untidy data cannot be split, cannot have formulas applied to it, and cannot even be read, so you might as well ignore it entirely, right? Actually, you can do better. With a clear picture of how the data should look and a few text processing tools, you can tidy it up completely and turn a mess into something useful.

You just got a customer list from a defunct competitor

The skeleton in the closet of data analysis

Head First headhunting company wants this list for its sales team

Cleaning messy data is all about preparation

Once the data is organized, you can fix it

Use # as the delimiter

Excel splits data into columns at the delimiter

Use substitution to replace the "" character

All the surnames are cleaned up

The name pattern is too complex for simple substitution

Handle complex patterns with nested text formulas

R can handle complex data patterns with regular expressions

Clean up the names with the sub command

Now you can deliver to the client

Maybe it's not finished yet ...

Sort the data so that duplicate values appear together

This data may have come from a relational database

Remove the duplicate names

You have a beautiful, neat set of unique records

Head First headhunting company is recruiting all kinds of talent!

Goodbye ...
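
The cleaning steps listed for this chapter (split on a delimiter, strip unwanted patterns with a regular expression, sort, remove duplicates) can be sketched in a few lines of Python. The chapter itself uses Excel text formulas and R's `sub`; `re.sub` plays a similar role here. The function name `clean_customer_list`, the layout of the raw lines and the parenthesized notes are assumptions invented for the example.

```python
import re


def clean_customer_list(raw_lines):
    """Turn messy 'last#first' lines into sorted, unique (last, first) pairs.

    Assumes every line holds at least two '#'-delimited fields.
    """
    cleaned = set()
    for line in raw_lines:
        fields = line.split("#")  # '#' is the delimiter
        last_name = fields[0].strip()
        # Remove parenthesized notes such as "(ID 12)" from the first name.
        first_name = re.sub(r"\s*\(.*?\)", "", fields[1]).strip()
        cleaned.add((last_name, first_name))  # a set drops duplicates
    return sorted(cleaned)


# Hypothetical raw rows.
raw = [
    "Smith#Alex (ID 12)",
    "Jones#Pat",
    "Smith#Alex (ID 97)",
    " Jones #Pat ",
]
print(clean_customer_list(raw))
```

Sorting is not needed for the set to remove duplicates, but it mirrors the outline's step of bringing repeated values next to each other so they are easy to inspect.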

Appendix A Leftovers: ten topics the text left out

You have come a long way. However, the techniques of data analysis are constantly changing and there is no end to them. Because of the length limits of this book, some closely related topics could not be covered. This appendix goes over the top ten.

First: everything else in statistics

Second: Excel skills

Third: the graphic principles of Edward Tufte, a professor at Yale University

Fourth: pivot tables

Fifth: the R community

Sixth: nonlinear and multiple regression

Seventh: null-alternative hypothesis testing

Eighth: randomness

Ninth: Google Docs

Tenth: your expertise

Appendix B Installing R: start R!

Powerful data analysis functions depend on complex internal mechanisms. Fortunately, installing R takes only a few minutes, and this appendix shows how to do it with no effort at all.

Appendix C Installing the Excel analysis tools: the ToolPak

Some of the best features of Excel are not activated by default. To run the optimization in Chapter 3 and the histograms in Chapter 9, you need to activate Solver and the Analysis ToolPak. Excel installs these two extensions by default, but they are not activated unless the user takes action.