Tools: Data Analysis (Statistics)

Methods: descriptive statistics, inferential statistics

Data types: numerical data and categorical data (categories or text, which cannot be used in calculations)

Descriptive statistics for categorical data: frequency counts and frequency percentages
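A minimal Python sketch of a frequency table. It uses the standard-library `collections.Counter`. The payment-method data and the helper name `frequency_table` are hypothetical and only illustrate the counting.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]


# Hypothetical sample: payment method chosen by 8 customers
payments = ["card", "cash", "card", "mobile", "card", "mobile", "cash", "card"]

for category, count, pct in frequency_table(payments):
    print(f"{category:<8}{count:>3}{pct:>8.1f}%")
```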

Descriptive statistics for numerical data: summary measures (mean, median, mode) and charts. When the values differ greatly, a few extreme values pull the mean up or down, so the mean alone can be misleading.
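Python's `statistics` module shows how an extreme value affects the mean. The order amounts below are hypothetical. One of them is deliberately extreme.

```python
import statistics

# Hypothetical order amounts; the last one is an extreme value
orders = [120, 130, 130, 140, 150, 160, 5000]

mean_value = statistics.mean(orders)      # pulled far upward by the single 5000
median_value = statistics.median(orders)  # middle value, barely affected by the extreme
mode_value = statistics.mode(orders)      # most frequent value

print(f"mean={mean_value:.1f} median={median_value} mode={mode_value}")
```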

Quartiles: first quartile (Q1) = 25th percentile, second quartile (Q2) = median, third quartile (Q3) = 75th percentile
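Quartiles can be computed with `statistics.quantiles` (Python 3.8+). Tools interpolate between data points in different ways, so this sketch shows both the default `method="exclusive"` and `method="inclusive"`. The sales figures are hypothetical.

```python
import statistics

# Hypothetical daily sales (unsorted; quantiles() sorts internally)
sales = [22, 15, 30, 12, 28, 17, 35, 20, 25]

q1, q2, q3 = statistics.quantiles(sales, n=4)  # default method="exclusive"
q1_inc, q2_inc, q3_inc = statistics.quantiles(sales, n=4, method="inclusive")

print("exclusive:", q1, q2, q3)
print("inclusive:", q1_inc, q2_inc, q3_inc)
```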

Variance: describes dispersion, i.e., how much the data fluctuate around the mean

Standard deviation: variance is expressed in squared units, and a "squared" unit has no meaning in actual business, so we take the square root of the variance. The result is the standard deviation, which has the same unit as the data and can be divided into -
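A small sketch of variance and standard deviation. It uses hypothetical daily sales and assumes the unit is yuan, purely for illustration. Python's `statistics` module provides population versions (`pvariance`, `pstdev`, which divide by n) and sample versions (`variance`, `stdev`, which divide by n - 1).

```python
import statistics

# Hypothetical daily sales, assumed to be in yuan
sales = [200, 220, 180, 240, 160]

pop_var = statistics.pvariance(sales)    # divides by n; unit: yuan squared
pop_sd = statistics.pstdev(sales)        # square root of pvariance; unit: yuan
sample_var = statistics.variance(sales)  # divides by n - 1
sample_sd = statistics.stdev(sales)      # square root of the sample variance

print(f"population: variance={pop_var}, sd={pop_sd:.2f}")
print(f"sample:     variance={sample_var}, sd={sample_sd:.2f}")
```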

Data standardization (Z-score): z = (x − mean) / standard deviation. It puts two sets of data measured in different units, such as sales volume and temperature, onto a comparable, unitless scale.
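A minimal Z-score sketch. It assumes the population standard deviation is used; some tools use the sample standard deviation instead. Both series and the helper name `z_scores` are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize: z = (x - mean) / population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical daily figures over five days
units_sold = [120, 150, 90, 180, 160]
temperature_c = [22, 25, 18, 30, 28]

for day, (zs, zt) in enumerate(zip(z_scores(units_sold), z_scores(temperature_c)), start=1):
    print(f"day {day}: sales z={zs:+.2f}, temperature z={zt:+.2f}")
```

After standardization, both series have mean 0 and standard deviation 1. A z of +1 therefore means "one standard deviation above average" for sales and for temperature alike.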

Dimension: the unit of measurement. When observing the relationship between dates and business data, a date can also be broken down into the week and the day of the week.
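One way to break down dates in Python is `date.isocalendar()`. It returns the ISO year, the ISO week number and the ISO weekday (1 = Monday). Using ISO weeks is an assumption here, since a business may define its weeks differently. The order records and the helper name `decompose` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def decompose(d):
    """Hypothetical helper: split a date into (ISO year, ISO week, ISO weekday 1=Mon..7=Sun)."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return iso_year, iso_week, iso_weekday


# Hypothetical daily order counts
records = [
    (date(2024, 1, 1), 30),   # Monday
    (date(2024, 1, 6), 55),   # Saturday
    (date(2024, 1, 8), 28),   # Monday
    (date(2024, 1, 13), 60),  # Saturday
]

# Aggregate orders by day of the week
orders_by_weekday = defaultdict(int)
for d, orders in records:
    _, _, weekday = decompose(d)
    orders_by_weekday[weekday] += orders

print(dict(orders_by_weekday))
```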

Chebyshev's theorem: for any data set, at least 1 − 1/k² of the data lie within k standard deviations of the mean (k > 1). So at least 75% of the data are within 2 standard deviations of the mean, at least 89% are within 3 standard deviations, and at least 96% are within 5 standard deviations.
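Chebyshev's theorem guarantees that at least 1 − 1/k² of any data set lies within k standard deviations of the mean (k > 1). This sketch checks that bound against a hypothetical, deliberately skewed sample and assumes the population standard deviation. The helper names are illustrative.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum share of ANY data set within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2


def share_within(values, k):
    """Observed share of values with |x - mean| <= k * population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


# Hypothetical, deliberately skewed data
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 20]

for k in (2, 3, 5):
    print(f"k={k}: bound >= {chebyshev_lower_bound(k):.0%}, observed {share_within(data, k):.0%}")
```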

Visualization: box plots and histograms (histogram shapes: symmetric, steep-walled, zigzag, island, skewed, bimodal)
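The numbers a box plot draws can be computed directly. This sketch assumes Tukey's convention: whiskers reach the most extreme points within 1.5 × IQR of the box, and anything beyond them is flagged as an outlier. Plotting libraries may use a different quartile method or whisker rule. The data and the helper name `box_plot_stats` are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, Tukey whiskers (1.5 * IQR rule) and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }


# Hypothetical daily sales with one unusually large day
sales = [12, 15, 17, 20, 22, 25, 28, 30, 35, 80]
print(box_plot_stats(sales))
```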

Chebyshev's Theorem V2.0 (the empirical rule, which applies only to normal distributions)

In a normal distribution, about 68% of the data are within 1 standard deviation of the mean

In a normal distribution, about 95% of the data are within 2 standard deviations of the mean

In a normal distribution, about 99.7% of the data are within 3 standard deviations of the mean
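A short check of these percentages. For a normal distribution, the share of data within k standard deviations of the mean is erf(k/√2), which Python's `math.erf` can evaluate. The helper name `normal_share_within` is illustrative.

```python
import math


def normal_share_within(k):
    """P(|X - mu| <= k * sigma) for a normal distribution, equal to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {normal_share_within(k):.2%}")
```

The exact values are about 68.27%, 95.45% and 99.73%. This is why the three figures are usually rounded to 68%, 95% and 99.7%.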

Probability

Sample space of a coin toss: {heads, tails}

Probability: 50% for each outcome

Complement, intersection, union

Venn diagram

Addition rule: P(A∪B) = P(A) + P(B) − P(A∩B)
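Complement, intersection and union map directly onto Python set operations. This sketch assumes a single roll of a fair six-sided die, so every outcome is equally likely, and uses exact fractions. The events A and B are illustrative.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die (equally likely outcomes)
omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}  # even number
B = {4, 5, 6}  # greater than 3

print("P(not A)   =", prob(omega - A))  # complement
print("P(A and B) =", prob(A & B))      # intersection
print("P(A or B)  =", prob(A | B))      # union
print("addition rule:", prob(A) + prob(B) - prob(A & B))
```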

Conditional probability: P(A|B) = P(A∩B) / P(B), provided P(B) > 0

Independence: A and B are independent if P(A|B) = P(A), or equivalently P(A∩B) = P(A)P(B)
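Conditional probability and independence can be checked on the same kind of toy example. This sketch again assumes one roll of a fair die. The events and helper names are illustrative.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)


A = {2, 4, 6}     # even number
B = {1, 2, 3, 4}  # at most 4
C = {4, 5, 6}     # greater than 3

print("P(A|B) =", cond_prob(A, B), "vs P(A) =", prob(A))  # equal: A and B independent
print("P(A|C) =", cond_prob(A, C), "vs P(A) =", prob(A))  # differ: A and C dependent
```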

Bayes' theorem: given that result A has already occurred, how likely is it that a particular cause produced it? In other words, we reason backward from the observed result A to its real cause.
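Bayes' theorem follows from writing the joint probability with the conditional-probability definition in both directions. In this derivation, A is the observed result and B is a possible cause. The denominator is expanded with the law of total probability, assuming the causes B_i are mutually exclusive and together exhaustive.

```latex
P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)},
\qquad
P(A) = \sum_i P(A \mid B_i)\,P(B_i)
```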

Three questions

1. Among the people who take part in a marketing campaign, only 30% are women. Does this mean that women do not like to take part in such campaigns?

2. A certain city has taxis of two colors, blue and green, with a market ratio of 15:85. One night a taxi was involved in a hit-and-run accident, and a witness identified it as blue. The witness was then tested on telling blue from green under the same conditions: 80% of identifications were correct and 20% were incorrect. What is the probability that the taxi was actually blue?

3. Assume that among 1,000 normal text messages, 2 contain the phrase "Macau Casino", and among the spam text messages, 400 contain "Macau Casino". A new text message arrives. Before reading its content, we assume a 50% probability that it is normal. After parsing the content, we find the phrase "Macau Casino". What is the probability that it is a spam message?
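The three questions can be worked with one small Bayes helper; the name `posterior` is hypothetical.

- **Question 2:** the witness's 80%/20% test results are used as the likelihoods.
- **Question 3:** the sketch assumes the spam sample also contains 1,000 messages, since only the count of 400 is given.
- **Question 1:** this cannot be answered from the 30% figure alone. What matters is the participation rate within each group, which also depends on the share of women among all customers. The two customer-base figures used for it are hypothetical. They were chosen only to show that women can make up 30% of participants and still participate at a higher rate.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a yes/no hypothesis H given evidence E:
    P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|not H)P(not H))."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# Question 1: P(participate | woman) = P(woman | participate) * P(participate) / P(woman)
share_women_among_participants = 0.30
p_participate = 0.10  # HYPOTHETICAL: share of all customers who join the campaign
p_woman = 0.20        # HYPOTHETICAL: share of women among all customers
rate_women = share_women_among_participants * p_participate / p_woman
rate_men = (1 - share_women_among_participants) * p_participate / (1 - p_woman)

# Question 2: prior P(blue) = 0.15; the witness is right 80% of the time,
# so P(says blue | blue) = 0.8 and P(says blue | green) = 0.2
p_blue = posterior(prior=0.15, p_evidence_if_true=0.8, p_evidence_if_false=0.2)

# Question 3: ASSUMES the spam sample also has 1,000 messages;
# prior P(spam) = 0.5 because a message is assumed normal with probability 50%
p_spam = posterior(prior=0.5, p_evidence_if_true=400 / 1000, p_evidence_if_false=2 / 1000)

print(f"Q1: participation rate women={rate_women:.1%}, men={rate_men:.1%}")
print(f"Q2: P(blue | witness says blue) = {p_blue:.3f}")
print(f"Q3: P(spam | 'Macau Casino') = {p_spam:.4f}")
```

Under these assumptions, the answer to question 2 is 12/29, roughly 41%. The answer to question 3 is 200/201, roughly 99.5%.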