Can I become a data analyst with zero basic knowledge?

Many readers have messaged me privately asking how to switch careers into data analysis, or how new graduates can enter the field. My previous articles were all about hard skills. This time I will talk about soft skills, based on an answer on Zhihu. Treat it as casual talk.

I entered the Internet industry from absolute zero: not just zero background in data analysis, but zero in every skill.

What does zero look like? I spent three to four months job hunting and ended up joining a company in an operations role.

I have never been strong in math and science. Advanced mathematics, statistics, SQL and C were all on my college curriculum, but I skipped the classes and got through the exams with help from friends. Looking back, I should have learned more at the time.

At first I did not know how to use VLOOKUP, and no one taught me. I could only do basic operations in Excel. When I had to join several reports, I relied on fast hands: search, copy and paste, one row at a time... and I wanted to cry whenever the data got too large. Eventually I decided this was no way to work, so I turned to the almighty Baidu:

"How to match data from multiple tables in Excel."

That was the first time I saw the VLOOKUP function. I did not master it in one go; every time I used it, I had to reread the online examples first. When I later taught it to my team members, they picked it up much faster than I had.
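
For readers who have not met it yet, here is a rough sketch of what an exact-match VLOOKUP does, written in Python. The two small tables and the helper name vlookup are made up for illustration, and Excel's own function has more options than this.

```python
# Hypothetical data: two "reports" that share a user_id column.
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Cid"},
]
orders = [
    {"user_id": 1, "amount": 30},
    {"user_id": 3, "amount": 45},
]


def vlookup(rows, lookup, key, column, default=None):
    """For each row, fetch `column` from the first row of `lookup` with the same `key`."""
    index = {}
    for row in lookup:
        index.setdefault(row[key], row)  # keep the first match, as VLOOKUP does
    return [index.get(row[key], {}).get(column, default) for row in rows]


amounts = vlookup(users, orders, key="user_id", column="amount")
for user, amount in zip(users, amounts):
    print(user["name"], amount)
```

Users with no matching row get the default value, which is roughly the role #N/A plays in Excel.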

I learned Excel one step at a time, by searching and figuring things out. I also took time to practice analysis on real work material: for example, what kind of users are willing to use our app, and which user metrics look particularly good.

Even then, I still did not know how to use pivot tables.

I remember that in early 2015 my boss gave me a task: collect data from the web, tens of thousands of records. Copying and pasting them all was impossible, so I searched again:

"How to quickly download data from web pages."

That is how I found out about crawlers and Python, but I could not write one myself. In the end I relied on a third-party crawler tool and followed tutorials. I had already learned some HTML and CSS earlier; along the way I learned how web pages are structured, how GET and POST work, and regular expressions. It took me a week of overtime to download everything.

It was not over yet. The data was dirty and still needed cleaning. I spent another week learning Excel's FIND, RIGHT, MID, REPLACE, TRIM and other text functions. I did not know it was called data cleaning back then, but I picked up a lot of techniques. Even working as fast as I could, it still took several days.
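
To make those functions concrete, here is a minimal sketch of their Python counterparts. The helper names and the dirty phone-number cell are my own illustration; the helpers imitate Excel's 1-based positions, while plain Python slicing is 0-based.

```python
def excel_trim(text):
    # TRIM strips leading/trailing spaces and collapses runs of spaces to one.
    return " ".join(part for part in text.split(" ") if part)


def excel_find(needle, text):
    # FIND returns a 1-based position; str.index raises ValueError where Excel shows #VALUE!.
    return text.index(needle) + 1


def excel_mid(text, start, length):
    # MID(text, start, length) with a 1-based start.
    return text[start - 1 : start - 1 + length]


def excel_right(text, length):
    # RIGHT(text, length): the last `length` characters.
    return text[-length:] if length > 0 else ""


def excel_replace(text, start, length, new_text):
    # REPLACE is positional: it overwrites `length` characters beginning at `start`.
    return text[: start - 1] + new_text + text[start - 1 + length :]


cell = "  Tel:  010-1234 5678  "  # hypothetical dirty value
clean = excel_trim(cell)
colon = excel_find(":", clean)
area_code = excel_mid(clean, colon + 2, 3)
last_four = excel_right(clean, 4)
number = excel_replace(clean, 1, colon + 1, "")
print(clean, area_code, last_four, number, sep=" | ")
```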

Now that I can write a Python crawler, the work is far more efficient. Including the text cleaning, with Levenshtein distance to speed up the matching, the whole job adds up to a single evening.
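
A small sketch of the Levenshtein idea, assuming it is used to map misspelled values onto a list of clean ones. This is a plain-Python stand-in rather than any particular library, and the city names are made up.

```python
def levenshtein(a, b):
    """Edit distance: the fewest insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def closest(value, canonical):
    """Return the canonical spelling with the smallest edit distance to `value`."""
    return min(canonical, key=lambda candidate: levenshtein(value, candidate))


cities = ["Beijing", "Shanghai", "Shenzhen"]  # hypothetical clean list
print(closest("Beijng", cities), closest("Shangai", cities))
```

In real cleaning you would also want a distance threshold, so that a value far from every candidate is flagged instead of being forced onto the nearest one.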

No learning is wasted, and a lot of knowledge carries over. I learned HTML and CSS because of the crawler, and that indirectly taught me website structure and web analytics.

Later I learned how to deploy Baidu Statistics (Baidu Tongji), picked up some JS, and learned the metrics of a web page: visit paths, funnel conversion, bounce rate, exit rate and so on. This knowledge is not limited to websites. It also applies to app analytics and user behavior.

We tend to treat learning as a series of isolated points: finish one book, then read the next. That does not make learning efficient. All knowledge is connected. Knowledge A can be applied to knowledge B. A knowledge and skill tree should branch out like a network.

The chain above is my map of how I mastered new knowledge on top of prerequisite knowledge. Data analysis covers a wide range of fields. Besides a solid business background, it requires a Swiss Army knife of a skill tree, the so-called T-shaped ability: one deep specialty plus broad competence.

For example, suppose you see a page with a high bounce rate. Beyond the usual analysis, you also need to check network speed, whether users are on weak connections, whether the HTML page loads too much, whether caching is used, how DNS resolution is performing, and so on. No one will teach you this, but it affects business results.

Do not be intimidated. There is a lot to learn, but as you go deeper, much of it becomes easy to understand. Conversion rate, for instance, comes from web analytics, yet it also applies to product paths, and it can be elevated into a Sankey diagram and used to segment users. The further you go, the more mastering one method helps you master them all.

Driving force

In fact, the highest hurdle in learning data analysis from scratch is not the skills but the motivation to learn.

I have trained data analysts from scratch, taught Excel from scratch, SQL from scratch, analytical thinking from scratch, and Python from scratch. The difficulty is never about the knowledge, but whether you really want to learn it.

Downloading more than ten gigabytes of material is not studying. Following a lot of official accounts is not studying. The ten gigabytes will never be opened, and the accounts will pile up unread. Is that wanting to learn? Starting from zero is easy; persisting is hard, and most people stop after a brief taste.

People do not know where to start because they do not know what to learn. As I said, data analysis is a fairly broad subject. It includes the methodology of traditional business analysis as well as the statistics and programming of the data era. Yet it is a skill that applies to any position or profession, and there is no way around it.

Learning is a self-driven matter. From elementary school to university, across well over a decade as students, the ability we lack most is active learning. Through years of preparing for the high school and college entrance examinations, it is usually the environment that forces people to study; they have no motivation or habit of their own. After four years of college, the capacity to learn may be exhausted.

We are used to passive learning because everyone was trained to solve set problems, knowing how to apply a formula without knowing the principle behind it. Textbooks teach drills and tactics, and the content never goes beyond the syllabus. The whole learning environment is built for passivity.

Now you are learning data analysis: picking up books, opening PDFs, following official accounts. There is no teacher to correct and coach you, and no homework to push and train you. You do not know which skills will be used often at work. There are no practice data sets, and it is hard even to judge the quality of what you find on the Internet.

Nowhere to start, right? But this is what active learning is.

The mentality needs to change.

When you learn data analysis from scratch, your most important teacher can only be yourself. No article will turn anyone into a data analyst overnight. I have coached interns who were eager to learn and grew rapidly, and I have coached colleagues who were interested but never found their rhythm. The former is active learning; the latter is passive learning driven by interest alone.

Because you are starting from zero, you need even more initiative. Data analysis is a fast-moving field. A few years ago, knowing SQL was enough. Now you need to know some MapReduce (MR) and Hive. In a few years, Spark SQL may be necessary if you want to do well in this industry. Continuous learning is a required ability. Even if your foundation is weaker than other people's, at least do not lose to them on willingness to learn.

Here is my suggestion. Learn with a goal: set out to solve one specific problem. Follow it through, because practice is king. Whatever your profession, you come into contact with data to some degree. Do not analyze it yet. First think about what you could do with the data, and form a simple hypothesis.

I am HR, and my hypothesis is that it has become increasingly difficult to recruit people recently.

I am a marketer, and my hypothesis is that marketing costs are now too high and the campaigns have no effect.

I am in operations or product, which makes it even easier: my hypothesis is that a certain metric will not improve because of reasons A, B and C.

Even a student can form a hypothesis about whether shops in the commercial area around campus find it easy or hard to make money.

Data is collected, generated, combined, used, tested and analyzed around a hypothesis. This is the McKinsey style of thinking, and it also works as a way to learn data analysis. Newcomers easily get lost: I have no data; I have data but do not know what to do with it; I know what to do but not how. Having a direction beats overthinking.

The advantage of starting from a hypothesis is that I have a direction first. Right or wrong, I can at least analyze along it.

If HR believes recruiting is getting harder, HR can pull the historical data. In the past, how many resumes had to be downloaded, how many phone calls made and how many offers sent to land one hire? And now? You can look at the numbers at every stage. Isn't that exactly a conversion rate? Then widen the time dimension: was it just as hard to recruit at this time last year, or is it always hard at year end? That is how you come to understand the line chart.
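
As a sketch, the recruiting funnel can be computed like this. Every count below is invented purely to show the arithmetic, and the stage names are my own.

```python
def step_rates(stages):
    """Conversion rate from each stage to the next one."""
    return [
        (f"{src} -> {dst}", n_dst / n_src)
        for (src, n_src), (dst, n_dst) in zip(stages, stages[1:])
    ]


def overall_rate(stages):
    """Share of the first stage that reaches the last stage."""
    return stages[-1][1] / stages[0][1]


# Hypothetical counts for the same month in two different years.
last_year = [("resumes", 100), ("calls", 50), ("offers", 10), ("hires", 5)]
this_year = [("resumes", 200), ("calls", 80), ("offers", 10), ("hires", 4)]

for label, funnel in (("last year", last_year), ("this year", this_year)):
    print(label, f"overall {overall_rate(funnel):.1%}")
    for step, rate in step_rates(funnel):
        print(f"  {step}: {rate:.1%}")
```

With these made-up numbers the biggest drop is between calls and offers, which is where the hypothesis "recruiting is getting harder" would send you to look first.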

Marketers have even more data to draw on. If marketing costs are too high, how high are they now? When did they start rising? Find that point in time and analyze it. If the campaigns are not working, when did they stop working? Did the market environment change then? "The market environment has changed" is itself a new hypothesis, and from it I can keep generating deeper questions.

Everyone's efficiency and results will differ, but the thinking can be trained this way. It is not that you analyze because you happen to have data; you collect and analyze data because you have a direction. My own learning has always been aimed at solving problems, not driven by sudden whims.

If you think of learning data analysis as a long road, you do not drive to the end in one go. No one can. Instead, divide the road into sections and plant a flag on each section as a target. Steer by the next flag, not by a terminus dozens of kilometers away.

Curiosity

Besides the drive to learn, becoming a data analyst also takes curiosity.

Curiosity means asking questions, thinking about them, turning them over and solving them. If you are a naturally nosy person, pointing that instinct at data analysis makes you a born analyst, a real diamond in the rough.

Many people chase the tools, knowledge, key points and tips of data analysis, but curiosity is rarely mentioned.

Curiosity is the core ability for solving problems. Programming can be practiced and statistics can be learned; neither is the bottleneck in the end. Once you have learned all eighteen martial arts and face an opponent, what do you need most? The desire to win. In data, the desire to win is curiosity.

Knowledge sets the floor for your problem solving, and curiosity sets the ceiling. A good data analyst must be curious, able to ask questions, think about problems and solve them.

None of the campaigns we launched at first had any monitoring, and operations as a whole had no data to guide it. To me at the time, much of operations was a black box. I did not know what went in or what happened inside. Only the result came out.

If someone asked me why, I could only guess: maybe reason one, two or three. Whether any of them was true, I did not know.

Why did user activity go up? No idea.

What effect did an SMS push have? No idea.

Where did newly registered users come from? No idea.

As the company's business lines expanded, the number of users grew, and joining data in Excel became harder and harder. When I asked R&D for data yet again, the CTO said: why don't I give you database access so you can query it yourself?

I said goodbye to Excel and started learning databases. My exposure grew from a handful of tables to hundreds.

I learned the difference between LEFT JOIN and INNER JOIN. I learned GROUP BY, table structures and indexes.
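
A minimal sketch of the difference, run against an in-memory SQLite database from Python. The tables and values are invented for illustration and have nothing to do with our real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    CREATE INDEX idx_orders_user ON orders (user_id);  -- speeds up the join lookup
    INSERT INTO users  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (10, 1, 30.0), (11, 1, 20.0), (12, 3, 45.0);
    """
)

# INNER JOIN keeps only users that have at least one matching order.
inner = conn.execute(
    """
    SELECT u.name, SUM(o.amount)
    FROM users u INNER JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.name
    ORDER BY u.user_id
    """
).fetchall()

# LEFT JOIN keeps every user; users without orders get NULL, turned into 0 here.
left = conn.execute(
    """
    SELECT u.name, COALESCE(SUM(o.amount), 0)
    FROM users u LEFT JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.name
    ORDER BY u.user_id
    """
).fetchall()

print("inner:", inner)
print("left: ", left)
```

Bob has no orders, so he disappears from the INNER JOIN result and shows up with 0 in the LEFT JOIN result.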

At that time we needed to build a user data system, covering retention, activity, returning users, segmentation and other metrics. I looked up online how operational metrics are defined and applied, and how to implement them in SQL.

I explained the requirements to R&D and worked through them together. Because I understood the database, I could phrase many needs as more reasonable requests. This was the first time I encountered, understood and built a business-centered data system.

An example: a user who has used the app for a long time is what we call a loyal user. If he suddenly stops using it for several weeks, we use SQL to find users of this type, analyze their behavior to learn why they left, conduct phone interviews and try to win them back. The same approach applies to other operations.
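
A sketch of such a query, again in SQLite via Python. The activity table, the thresholds (at least 5 active days counts as loyal, more than 21 silent days counts as lapsed) and the dates are assumptions for illustration, not our actual definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user_id INTEGER, active_date TEXT)")

# Hypothetical log: one row per user per active day.
rows = (
    [(1, f"2016-01-{day:02d}") for day in range(1, 6)]     # loyal, then silent
    + [(2, f"2016-02-{day:02d}") for day in range(24, 29)]  # loyal and still active
    + [(3, "2016-01-01")]                                   # silent, but never loyal
)
conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)

query = """
SELECT user_id,
       COUNT(DISTINCT active_date) AS active_days,
       MAX(active_date)            AS last_seen
FROM activity
GROUP BY user_id
HAVING COUNT(DISTINCT active_date) >= :min_days
   AND julianday(:today) - julianday(MAX(active_date)) > :silent_days
ORDER BY user_id
"""
lapsed = conn.execute(
    query, {"min_days": 5, "today": "2016-03-01", "silent_days": 21}
).fetchall()
print(lapsed)
```

Only user 1 qualifies: enough active days to count as loyal, and no activity for well over three weeks before the reference date.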

By then I could say I understood the active user count: why it rises and why it falls.

We push text messages to different users. With SQL I could query how well they performed, but were there clearer metrics? For example, how many users opened the app because of a text message, and what was the open rate?

At the time, the short link in the message used a URL scheme that jumped straight into the app. For monitoring, we also embedded tracking parameters in the short link, then used the push data to see how many people opened each message.

This is the yardstick for a piece of copy. Good copy gets users to open it. We often ran A/B tests on copy. For example, our SMS marketing was tied to gifts. Back then, many users registered on the web but never downloaded the app. We had a text message for this group:

"We have prepared an exclusive gift for you, XXXXX, please open the app to claim it."

The open rate of this message was about 10%.

There was still room to improve, so I kept revising the copy. The next version read:

"Since you have already registered, why not come and claim your exclusive gift, XXXXX, please open the app to claim it." (The middle part is unchanged.)

The open rate rose to 18%. The revision uses marketing psychology: the words "already registered" play on sunk cost. I have come this far, so why not continue? Otherwise registering was a waste. The same mindset is common at tourist attractions. The attraction may be a rip-off, but most people will still say: well, we are already here.

Later messages were personalized, and the open rate eventually reached 25%, about two and a half times that of the earliest copy. If you are not curious about how a text message performs, and do not collect data and track metrics, there is nothing to optimize. We may write good copy by feel, but feel cannot tell you the actual effect. Data can.

Another example: early on we used WeChat Moments to acquire new users. There were several channels, and I did not know which one worked best. My curiosity kicked in again. Which channel is more effective? Can the invitation conversion rate be improved? What does each channel cost per new user?

Again it came down to putting data analysis into practice. Pages shared on WeChat automatically carry parameters such as from=timeline, so through the parameters I could pick out the page views and visits that came from WeChat. Later I asked R&D to set parameters for each channel, used them to compute conversion rates, and tagged new users with their channel source.
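
A small sketch of channel attribution from URL parameters. Only from=timeline comes from the story above; the example.com URLs, the channel parameter and the visit log are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical log: (landing URL, whether that visitor went on to register).
visits = [
    ("https://example.com/invite?from=timeline", True),
    ("https://example.com/invite?from=timeline", False),
    ("https://example.com/invite?from=groupmessage&isappinstalled=0", False),
    ("https://example.com/invite?channel=sms", True),
    ("https://example.com/invite", False),
]


def channel_of(url):
    """Read the channel from our own `channel` parameter, else from WeChat's `from`."""
    params = parse_qs(urlparse(url).query)
    for key in ("channel", "from"):
        if key in params:
            return params[key][0]
    return "unknown"


def conversion_by_channel(rows):
    """Registration rate per channel: registered visits divided by all visits."""
    seen, converted = Counter(), Counter()
    for url, registered in rows:
        channel = channel_of(url)
        seen[channel] += 1
        converted[channel] += int(registered)
    return {channel: converted[channel] / seen[channel] for channel in seen}


rates = conversion_by_channel(visits)
for channel, rate in rates.items():
    print(f"{channel}: {rate:.0%}")
```

The same channel value can be stored on the user record at registration, which is what makes channel-specific follow-up possible later.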

Along the way we found that one channel converted far too poorly. There were roughly two types of channel. One was a landing page that invited users to register directly, with the gift information attached. The other let users pick a gift style first and only jumped to registration at the final claim step. Conversion analysis showed that the second lost many more users. The flow had too many steps and a delivery address had to be filled in; the appeal of choosing a gift was not enough to carry users through the whole process.

So we changed the flow of the second channel. And because users were tagged with their registration channel, later operations on new users could be targeted. This is one of the reasons personalized text messages could reach a 25% open rate.

Curiosity serves to solve problems. By constantly thinking about and solving problems, your abilities related to data analysis will naturally improve.

Fortunately, curiosity can be trained day by day: ask more questions and think harder about them. The training is not difficult.

Non-data

Another problem with learners starting from zero is that they underestimate the importance of business knowledge.

In fact, the difficulty in becoming a data analyst is not the lack of knowledge in Excel, SQL, statistics, etc. It's a lack of business knowledge.

Take one person who understands the business but not data, and another who understands data but not the business. The former is more likely to solve real problems, because data analysts always serve the business.

I once proposed to the product team (they did not even buy me dinner for it) that we set up tracking in the app and on the web, so that we could understand users through their paths and make up for the shortcomings of Baidu Statistics (Baidu Tongji).

At the time, data was stored in Hadoop, and offline Hive scripts handled cleaning, partitioning and processing. The pages users browse in the product, the features they use and how long they stay all form the basis of a user profile.

I was once very curious about what a user profile really is, because articles online say that gender, region, age, marital status, finances, interests and preferences are its foundation. But our business could not collect that much data. My view is that a user profile exists to serve the business and should not follow a strict, uniform standard. If it works well for the business, it is a good user profile.

Take user profiles for online video: they collect the actors, release date, country of origin, language and genre of each film, and even break behavior down into whether the user fast-forwards or drags the progress bar. All of this is business-oriented. Analysts at video sites even have to watch countless videos themselves in order to analyze the business. Otherwise, with so many film categories and types, how could they break the metrics down? Whether a user drags the progress bar tells you whether they are interested, and you only understand that by having behaved the same way yourself.

How do you learn industry and business knowledge from zero? If you already work with the business and just want to move into data analysis, it will be much easier. If, like me, you have neither business knowledge nor data skills, that is fine too.

If data is learned through hypothesis-driven thinking, business should be learned through systematic thinking. Business knowledge also needs a purpose and a direction, but unlike data analysis it emphasizes being systematic. Systematic does not mean large and all-encompassing; it means top-down, structured knowledge. Start by drilling down in one direction for depth, and breadth will grow as you dig.

For example, if you are a layman who wants to learn how to analyze a user operations system, do not start by asking what user operations is. That question is too big. Instead, pick one direction, such as activity. Understand its definition and meaning, then think about how to apply it. How would you define activity for an offline shopping mall? For hospital patients? How active is a given school club? Think about activity using examples around you. For a mall, activity could be the number of people walking around, the number of customers buying, or the big spenders carrying bags large and small. What factors affect activity? Promotions or discounts, holidays or location. Once you have worked these out, getting started with user operations is quick.

Think about retention and acquisition in the same way. If the mall's visitors come back to shop again, that is retention; if new customers arrive, that is acquisition. How do these factors influence one another? Your final body of knowledge should have a pyramid structure: user operations at the top; acquisition, activation and retention in the middle; and the individual points and factors at the bottom.

Learning data analysis emphasizes deduction and reasoning; learning business emphasizes connection and application. It is a matter of applying what you learn. Curiosity and hypotheses come into play here too, and both are ways to accelerate learning.

Even after all this, readers who want to become data analysts from zero may still feel lost in the fog. These soft skills will not take anyone to the top in one leap. "Seven Weeks to Become a Data Analyst" is, as I said from the start, only an outline for getting started. What matters is whether you truly want to learn, and learn well. The master leads you through the door; the practice is up to you. Everything else is empty talk.

I am reminded of a saying I read a long time ago: when you want to move forward, everything will make way for you. I think that's more powerful than anything I've said.

So you ask me, can I become a data analyst with zero knowledge? My answer is yes.

The article is actually a bit rushed. Finally, I wish everyone a Merry Christmas.