Misunderstanding of Big Data: Data Statistics ≠ Big Data
A common misunderstanding about big data: data statistics describes things that have already happened, while big data is often used to predict or recommend things that have not happened yet, so the two cannot be equated. Still, both data statistics and big data serve the same purpose: making work more effective and decisions more rational and accurate.
Big data is so popular that it is now used in all walks of life, and it has recently shown clear signs of overheating. Is big data a marketing term or a methodology? Lao Li, the author of this article, is a senior employee at a big data service provider, where his projects involve big data analysis for different industries. He believes that you first need a basic understanding of big data, namely that "a lot of data is not necessarily valuable." He also argues that data statistics is not the same as big data, and that the difference between the two lies in artificial intelligence.
In the past two years, "big data" has spread to all walks of life and has recently shown clear signs of overheating. From CCTV's Spring Festival travel rush migration map to Sebrina's exclamation over Weibo data, and from the big data coverage of the Two Sessions to the high- and low-necked sweaters in "Stars", "big data" has been pushed to unprecedented heights. It has also changed from a sophisticated research direction into a well-known marketing term.
I am not qualified to represent the academic community, let alone to judge who is right or wrong. I can only describe big data as I see it, based on my own work experience.
What is big data?
Baidu Encyclopedia defines big data as follows: big data, or huge data, refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make more proactive business decisions.
Gartner defines "big data" as massive, high-growth, and diversified information assets that require new processing modes to deliver stronger decision-making, insight, and process optimization capabilities.
Personally, I think Gartner's definition is more appropriate. "New processing mode" is the key phrase, and as I understand it, it is also one of the most important characteristics that distinguish "big data" from traditional statistical analysis. This so-called "new processing mode" has two meanings:
1. Massive data volumes demand more efficient storage and processing technologies, which is why Hadoop has become a symbol of the big data era;
2. If you think that big data equals Hadoop, however, you are mistaken. Hadoop is only a necessary condition of the big data era. Another clear sign of big data is the close combination of data mining and artificial intelligence. This is also one of the most obvious differences between "big data" as I understand it and many so-called "big data" projects today. I will expand on this with a case study in a future article.
Beyond the "new processing mode", I personally see another major difference: statistical analysis is based on classifying existing data, while big data processes existing massive data in order to make predictions and recommendations about data that has not yet been generated. Data statistics describes things that have already happened; big data is often used to predict or recommend things that have not happened yet.
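The contrast between describing the past and predicting the future can be sketched in a few lines of Python. This is only a toy illustration: the `history` figures are invented, and the least-squares trend line in `predict_next` is a deliberately simple stand-in for the far richer models a real big data project would use.

```python
from statistics import mean

# Hypothetical monthly order counts (illustrative data, not from the article).
history = [100, 110, 120, 130, 140, 150]


def summarize(values):
    """Data statistics: describe what has already happened."""
    return {"total": sum(values), "mean": mean(values), "max": max(values)}


def predict_next(values):
    """Prediction: fit a least-squares line and extrapolate one step ahead."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # The next, not yet observed, period has index n.
    return intercept + slope * n


print(summarize(history))      # a description of the six months already observed
print(predict_next(history))   # 160.0 for this toy series
```

`summarize` only ever talks about rows that exist, while `predict_next` produces a value for a month that has not happened yet, which is the distinction drawn above.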
How to predict and recommend?
At present, the main recommendation algorithms fall roughly into two categories: behavior-based and content-based. Of course, there are more than ten algorithms for different fields, predictions, and recommendation targets, but they are beyond the scope of this article.
Behavior-based analysis, as the name implies, analyzes the "traces" that users leave on the Internet and mobile Internet, such as browsing, clicking, bookmarking, purchasing, and repeat purchasing, and derives predictions and recommendations for future purchases. It is a form of collective intelligence that draws on the behavioral preferences of a whole group of users. Because users influence one another, it is closer to how people behave in the real world.
Content-based analysis examines text, pictures, audio, video, and other information to derive predictions and recommendations. It matches the "genes" of the content to users' preferences. The most representative example is Pandora's music recommendation project, in which more than 400 experts tag all the songs in the music library and then link each listener to the music to complete the recommendation. Content-based analysis concerns only the individual and has nothing to do with relationships between users.
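One simple way to turn group behavior into recommendations is to count how often two items appear in the same user's history. The sketch below assumes a tiny invented `behavior` log and uses plain co-occurrence counts; the function names are hypothetical, and production systems use more refined similarity measures and far more data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical behavior log: user -> items they clicked or bought.
behavior = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C"},
    "u4": {"A", "D"},
}


def co_occurrence(logs):
    """Count how many users interacted with each pair of items."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in logs.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend_by_behavior(user, logs, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    counts = co_occurrence(logs)
    seen = logs[user]
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend_by_behavior("u2", behavior))  # ['C', 'D']
```

User `u2` never touched item `C`, but other users who share `u2`'s items did, so the group's behavior drives the recommendation.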
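Content-based matching can be sketched as comparing attribute vectors. The example below assumes each song carries a few expert-assigned scores (the three attributes and all values are invented for illustration and are not Pandora's actual attributes) and ranks unheard songs by cosine similarity to the average of the songs a listener already likes.

```python
import math

# Hypothetical expert-assigned "genes" per song, each scored from 0 to 1.
# Attribute order: (tempo, acoustic, vocal)
songs = {
    "song_a": (0.9, 0.1, 0.8),
    "song_b": (0.8, 0.2, 0.9),
    "song_c": (0.1, 0.9, 0.2),
    "song_d": (0.2, 0.8, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def taste_profile(liked):
    """Average the attribute vectors of the songs one listener likes."""
    vectors = [songs[name] for name in liked]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def recommend_by_content(liked, top_n=1):
    """Rank songs the listener has not liked yet by similarity to their taste."""
    taste = taste_profile(liked)
    candidates = [name for name in songs if name not in liked]
    ranked = sorted(candidates, key=lambda name: cosine(taste, songs[name]), reverse=True)
    return ranked[:top_n]


print(recommend_by_content(["song_a"]))  # ['song_b']
```

No other user's data appears anywhere in this sketch, which matches the point that content-based analysis concerns only the individual.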
What can big data do?
Raising this question now may make everyone laugh. Everyone seems to know that big data can do this and that, to the point that it starts to sound ridiculous. Big data has not been "demonized" so much as turned into "entertainment". It seems both far away and close at hand, which makes it feel unreal.
Let me describe, from my own work experience, what problems big data has actually "solved". In short, big data helps us with decisions and choices.
Weather forecasting is the oldest and best-known kind of prediction. Based on the forecast, you can decide what to wear tomorrow, whether to bring an umbrella, and so on.
In the past two years, big data has been applied to film and television production. By analyzing audience preferences, producers can predict and design stories that audiences will like, find actors that audiences want to see in the relevant roles, and even predict box office results. All these predictions are based on data which, after some model processing, yields conclusions close to reality. To some extent this gives decision makers a basis for their decisions, as with House of Cards and Stars.
Big data also plays an important role in solving people's "choice" problem. Don't laugh: regardless of age, gender, or education, people today face an unprecedented problem of choice. In academic terms, it is a problem caused by the "long tail effect". Put more bluntly, it arises from the tension between an ever-growing number of options and our own limited processing ability.
Advances in science and technology make people lazier; that is, our own processing ability declines, whether for subjective or objective reasons. Meanwhile, there are more and more things to choose from: from the vast range of goods in e-commerce to the songs in a massive music library, and from potential partners on dating websites to traffic lights.
Big data based on artificial intelligence is a means of letting people be "lazy". It judges your likely preferences, and even your needs, from your historical behavior and recommends the best results to you. This is big data: your considerate housekeeper and your best friend.
One of the most classic cases is Wal-Mart's study of "beer" and "diapers". Wal-Mart found that one group of customers who bought diapers also often bought beer. Diapers and beer seem to be two unrelated products, and from my personal experience I cannot think of any connection between them. It was later found that a social phenomenon lay behind the pattern. There are many young couples in America. When the diapers run out, the wife stays home to look after the children while the husband goes to the supermarket to buy diapers, and after picking up the diapers he usually buys some beer for himself.
This example shows that data can often reveal phenomena that seem unreasonable and illogical but that really exist and occur frequently.
As another example, traffic congestion in Beijing is well known. The morning and evening peaks in particular need no prediction. However, computing the best traffic light management plan for Beijing from historical traffic data and mathematical models does fall into the category of big data.
This is also the biggest difference, in my eyes, between big data and ordinary statistical analysis: statistics can help you find the disease, while big data can help you both find it and treat it.
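A pattern like "diapers and beer" is usually quantified with the standard association-rule measures of support, confidence, and lift. The sketch below computes them for a handful of invented shopping baskets; the data is purely illustrative and is not Wal-Mart's, and `rule_metrics` is a hypothetical helper name.

```python
# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "chips"},
    {"milk"},
    {"bread", "chips"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Assumes both sides occur in at least one basket, so no division by zero.
    """
    s_both = support(antecedent | consequent, baskets)
    confidence = s_both / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return {"support": s_both, "confidence": confidence, "lift": lift}


print(rule_metrics({"diapers"}, {"beer"}, baskets))
# {'support': 0.375, 'confidence': 0.75, 'lift': 1.5}
```

A lift above 1 means diaper buyers purchase beer more often than shoppers in general, which is how a seemingly illogical pairing shows up in the numbers before anyone can explain it.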
Big data is by no means a gimmick. In a reading recommendation project for an operator's reading base, all indicators improved greatly. And the improvement was not a matter of a few tens of percent, but of several times! (Per capita user traffic increased 4 times, and the ability to activate silent users increased 6.5 times.) This is the charm of big data.
Big data is not everything.
Big data is obviously not a cure-all, and that is what makes it real. In some areas, for various reasons, the value that big data delivers falls short of expectations. Two main problems lead to this: either the quality or quantity of the data itself is insufficient, or the algorithm is not suitable.
Don't assume that massive data is automatically valuable. In past work, we often found that 80-90% of the data from a client's data sources was useless, and only 10-20% produced any value. This reminds me of Mary Meeker's metaphor: "The work of big data is like finding a needle in a haystack."
What's more, most fields are still at an early stage of their business and have very little data. Cold start and sparsity are challenges that big data faces in many fields.
On the other hand, there is no one-size-fits-all algorithm for different fields and projects; each must be analyzed and solved according to its specific problems. In practice, we have found that requirements differ not only between fields (such as article recommendation and product recommendation) but also between businesses within the same field (for example, e-commerce sites that sell maternity and baby products, clothing, or luxury goods).
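Both problems are easy to see in code. The sketch below measures sparsity as the share of user-item pairs with no recorded interaction, and handles a cold-start user by falling back to overall popularity. The fallback is a common baseline chosen here for illustration, not a method the article describes, and the data and function names are invented.

```python
from collections import Counter

# Hypothetical interaction log and item catalog (illustrative data only).
interactions = {
    "u1": {"A", "B"},
    "u2": {"A"},
    "u3": {"A", "C"},
}
catalog = ["A", "B", "C", "D"]


def sparsity(logs, items):
    """Share of user-item pairs for which no interaction was recorded."""
    observed = sum(len(user_items) for user_items in logs.values())
    return 1 - observed / (len(logs) * len(items))


def popular_items(logs, exclude=frozenset(), top_n=2):
    """Most frequently used items, skipping anything in `exclude`."""
    counts = Counter(item for user_items in logs.values() for item in user_items)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in exclude][:top_n]


def recommend_with_fallback(user, logs, top_n=2):
    """Cold-start users have no history, so they simply get the popular items."""
    history = logs.get(user, set())
    return popular_items(logs, exclude=history, top_n=top_n)


print(round(sparsity(interactions, catalog), 3))          # 0.583
print(recommend_with_fallback("new_user", interactions))  # ['A', 'B']
```

Even in this tiny example more than half of the user-item matrix is empty, and the new user receives nothing personal at all, which is why extra data sources become so valuable.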
Cross-utilization of data
The two problems mentioned above, the lack of data at cold start and the sparsity of data in the early stage of a business, are not hopeless. Linking up data across sources, which the industry has long been discussing, is the way to solve them.
For some emerging fields, a lack of data is inevitable. Yet precisely because these fields lack data, they have an even greater need for a strong decision support system to guide their business, so that they can avoid detours and maximize benefits.
Projects in the mobile Internet field are particularly representative. Although the mobile Internet has developed rapidly over the past two or three years, its accumulated data still cannot compare with that of the Internet. Before people form stable usage habits in particular, the data has little value or significance.
However, if we can link Internet data with mobile Internet data, we can learn about a person's preferences and other information, and so provide more effective guidance and help for the mobile Internet business.
Of course, data linking is not limited to the Internet and mobile Internet. Each data source usually describes a different aspect of a person. As Professor Barabási describes in his book Bursts, 93% of human behavior is predictable and regular if there is sufficient data.
Only by recombining these data from different sources can we mine more meaningful information.
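Recombining sources can be sketched as merging per-user records under a shared identifier. The example assumes the hardest part is already solved, namely that both sources use the same `user_id` for the same person; the record layout, field name, and data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-source records keyed by a shared user id (an assumption:
# in practice, matching identities across sources is itself a hard problem).
web_data = {
    "user_1": {"interests": {"sports", "news"}},
    "user_2": {"interests": {"music"}},
}
mobile_data = {
    "user_1": {"interests": {"games"}},
    "user_3": {"interests": {"travel"}},
}


def merge_profiles(*sources):
    """Union each user's interests across every data source."""
    merged = defaultdict(set)
    for source in sources:
        for user_id, record in source.items():
            merged[user_id] |= record["interests"]
    return dict(merged)


profiles = merge_profiles(web_data, mobile_data)
print(sorted(profiles["user_1"]))  # ['games', 'news', 'sports']
```

A mobile service that knows `user_1` only through one interest gains two more from the web source, which is the kind of help linked data offers to a data-poor business.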
Nowadays, many people in the industry do "data statistics and analysis" under the banner of big data, which leads many laymen into a misunderstanding. Data statistics is not equal to big data. Still, both data statistics and big data exist to make our work more effective and our decisions more rational and accurate. Paying attention to data is itself a sign of a mature enterprise.
The rapid rise of the mobile Internet has made data more diverse and rich. Its mobility, fragmentation, personal nature, and timeliness fill the data gap left when users step away from their desktop computers, so that, together with the existing Internet data, it can outline a netizen's daily life well.
As data is further enriched and improved, and as data from different channels is opened up and cross-utilized, the room for imagination around big data will certainly grow.