
How to understand data from scratch


Today, as enterprises go digital, their level of informatization and their operational efficiency have improved greatly. The content and dimensions of their data have become richer than ever, and many scenarios and pieces of information can now be recorded by data effectively, accurately and in real time.

This article uses retail as its main example. It discusses what you need to know to understand an enterprise's data, and how to go quickly from "layman" to "expert".

First of all, the world is full of data.

It is not hard to see that the world we live in is full of data. In daily life, almost everything we say and do is recorded as data, and the records are becoming ever more detailed.

Several developments mean that each person's movements are now turned into data and recorded. These include the mature use of 4G and the rise of 5G, the wide deployment of communication infrastructure such as base stations, the commercialization of LBS (location-based services), and the wide use of travel and navigation apps (for buying tickets, hailing taxis, bike sharing, map navigation, vehicle sensors and so on). As a result, the data shows where we work, where we live, how and when we travel, where we pass by, and where we are right now.

New retail has accelerated the integration of online and offline channels. It brings new technologies such as big data, AI, QR-code payment, image recognition and sensors, along with new approaches such as WeChat marketing, social marketing and community marketing. Whether people shop online or offline, data from every link in the shopping chain is transmitted quickly and accurately to back-end systems.

From the recorded data, merchants or shopping platforms know the following:

- who bought
- when and where
- what
- how many and how much it cost
- how it was paid for
- whether a discount was applied

If you shop online, they also know what keywords you searched, what products you browsed, what you asked the merchant and where the goods were shipped. If you shop offline, the store's cameras closely monitor your every move from the moment you enter until you leave. Through video surveillance and face recognition, merchants immediately know:

- who you are and whether you are a new or returning customer
- what path you take through the store and which product area you linger in
- which products you pick up and which go into your basket

When we chat online, the record covers who we chat with, when, what about, and whether by text, voice or video.

Similarly, when we make phone calls, send parcels, order takeout, go running, eat at restaurants, watch movies or browse the web, we leave an indelible trace in the form of data. Even as you walk down a city street, cameras all over the city may capture your image.

In short, technology has brought us great convenience, but in exchange it asks for something very important: our personal information. After all, there is no such thing as a free lunch.

More and more events, behaviors and scenarios are recorded or characterized by data. Even so, many things cannot be captured by data today, and perhaps will not be for a long time. For example, people's unexpressed inner thoughts are hard to record as data. This is probably one reason the world is so full of uncertainty.

Figure 1. Represent business with data

When we work with enterprise data, we need to know where it comes from. Data is not generated for no reason. There is always a business scenario behind it.

We can see for ourselves that the world is full of data. Its scale is growing exponentially and its types are diverse, from structured data to unstructured data such as text, voice, images, short videos and videos.

Second, what does the data express?

Real data (not fabricated, simulated or forged) that is quantifiable and recordable always reflects some business scenario in the real world. Most events in, and changes to, real business scenarios show up in the data of back-end systems.

However, when data represents the details of a business scenario, some information is lost or distorted. For example, a 30-year-old woman buys two sets of baby clothes at a maternity and baby store. We don't know why she bought them. They might be for her own child or for a relative or friend, a birthday gift, or replacements because the child had outgrown the old ones. At this point we need to analyze and "guess".

Retail has the most diverse business scenarios of any industry and is the closest to everyone's daily life. That makes it a good example for understanding the relationship between business and data.

On an unbearably hot summer day, you walk into a convenience store near your office. You buy a 330 ml modern can of sugar-free Coke for 3.5 yuan and pay with WeChat. You swipe your membership card to earn 100 points, and the cashier hands you a POS receipt. By now, everything that happened has been recorded in the database through the cash register. What's more, the in-store camera has recorded your every move and turned it into frame-by-frame image data.

This is the datafication of business.

Analysis shows the following:

- Sales of the 3.5-yuan 330 ml modern cans of Coke recently rose 20% over the previous month.
- 75% of their buyers are men aged 20 to 35.
- Sales of 300 ml plastic bottles of Coke fell 40%.

From this comparison, the store manager concludes two things. The 300 ml plastic bottle earns a low profit, while the 330 ml modern can is currently more popular with young people. Rent pressure is rising and competition among convenience stores is fierce. So he makes a bold decision: delist the 300 ml plastic bottle and stock more of the 330 ml modern can.
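The store manager's comparison comes down to two calculations: month-over-month change in sales and the share of sales from one customer segment. Below is a minimal sketch with several assumptions:

- The `Sale` record and its fields (`month`, `sku`, `qty`, `gender`, `age`) are hypothetical.
- The helper names `units`, `mom_change` and `segment_share` are illustrative.
- The sample numbers are made up and are not the store's real data.

```python
from dataclasses import dataclass

@dataclass
class Sale:
    month: str   # e.g. "2018-06"
    sku: str     # hypothetical product code
    qty: int     # units sold in this transaction
    gender: str  # "M" or "F", taken from the member profile
    age: int

def units(sales, sku, month):
    """Total units of one SKU sold in one month."""
    return sum(s.qty for s in sales if s.sku == sku and s.month == month)

def mom_change(sales, sku, prev_month, cur_month):
    """Month-over-month change as a fraction, e.g. 0.2 means +20%."""
    prev = units(sales, sku, prev_month)
    cur = units(sales, sku, cur_month)
    if prev == 0:
        raise ValueError("no sales in the previous month")
    return (cur - prev) / prev

def segment_share(sales, sku, month, predicate):
    """Fraction of a SKU's units in a month bought by customers matching `predicate`."""
    rows = [s for s in sales if s.sku == sku and s.month == month]
    total = sum(s.qty for s in rows)
    return sum(s.qty for s in rows if predicate(s)) / total if total else 0.0

# Made-up sample data for illustration only.
sales = [
    Sale("2018-06", "COKE_330_CAN", 10, "M", 25),
    Sale("2018-07", "COKE_330_CAN", 9, "M", 28),
    Sale("2018-07", "COKE_330_CAN", 3, "F", 40),
]
young_men = lambda s: s.gender == "M" and 20 <= s.age <= 35
print(mom_change(sales, "COKE_330_CAN", "2018-06", "2018-07"))
print(segment_share(sales, "COKE_330_CAN", "2018-07", young_men))
```

In practice the same figures would usually come from a SQL `GROUP BY` over the POS order table. The logic is the same: aggregate by product and period, then compare.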

This is putting data to work for the business, that is, data-driven business.

Figure 2. The relationship between business and data

So don't rush to open the tables in the enterprise's systems. On their own, the data are just cold numbers that mean nothing and tell you nothing. Before trying to understand enterprise data, get familiar with the business first.

Figure 3. The process of understanding data

The best way to get familiar with the business is "surface, line, point", moving from the broad to the specific. This approach is comprehensive, systematic, effective and in-depth. First study the industry, then a vertical segment, then the enterprise, and finally specific business scenarios. The following methods can help you get familiar with the business quickly:

Read industry reports covering the industry's current state, overall size, business models, technology, product characteristics, consumer characteristics, benchmark companies, trend forecasts and so on.

Learn the overall situation of the enterprise from its annual reports, business analysis reports and other documents.

Browse the company's official website, WeChat official account, official Weibo account, online store and so on to deepen your understanding of the company. Try out its online services yourself and note what you observe.

Visit the enterprise's offline outlets in person, experience their services, and learn about conditions from frontline staff. (Many large companies require headquarters employees to spend one or two days working in the field at least once a year so they don't lose touch with the business.)

Consult experienced, knowledgeable long-serving employees. They know not only the business but also the people and workings of the enterprise, so ask them as much as you can.

Daily life is full of business scenarios. Keep an open mind, observe carefully, and learn from what you see.

Build up knowledge over time: read widely, experience life, and broaden your social experience.

For example, young people without families mostly have no idea how long a can of infant formula lasts. Most boys who have never been in a relationship don't know what BB cream or face cream is. Nor do they know why makeup has to come off with makeup remover and cotton pads (wouldn't plain water be less trouble? ~~). This kind of everyday knowledge comes from accumulated experience. It makes you more effective when you study an industry in depth and cuts the time it takes to learn.

Figure 4. Methods for getting familiar with the business

Only when you have a basic understanding of the business can you know what the data expresses.

Third, what is the scene behind the data?

When we try to understand enterprise data, we face hundreds of systems and thousands of tables. Some people feel overwhelmed and don't know where to start.

If you follow the general business logic, you can largely work out how the enterprise's IT systems were built and how they relate to one another. This gives you an overall picture and a framework to think within, so you don't get stuck on an "information island". It is essential for data integration. You can't look at a system in isolation; the relationships between systems must be clear.

For example, suppose you are building a basic wide table keyed on member ID. First you need to map out all the customer-centered business processes. Then identify the system behind each business step and work out how the data in each system is integrated and linked. That keeps the picture complete, so no system or its information is missed.
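Below is a minimal sketch of such a member-centered wide table. The three source extracts (`crm_members`, `pos_orders`, `cs_tickets`), their fields and the function name `build_member_wide_table` are all hypothetical stand-ins for whatever systems a real enterprise has. The sketch starts from the member master and left-joins aggregates from the other systems on `member_id`. That way, members with no orders or tickets still appear in the table instead of silently dropping out.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, all keyed on member_id.
crm_members = [  # membership/CRM system: one row per member
    {"member_id": "M001", "gender": "F", "register_date": "2017-03-02"},
    {"member_id": "M002", "gender": "M", "register_date": "2018-01-15"},
]
pos_orders = [  # POS system: one row per order
    {"order_id": "O1", "member_id": "M001", "amount": 35.0},
    {"order_id": "O2", "member_id": "M001", "amount": 12.5},
]
cs_tickets = [  # customer-service system: one row per inquiry or complaint
    {"ticket_id": "T1", "member_id": "M002"},
]

def build_member_wide_table(members, orders, tickets):
    """Return one row per member with aggregates from other systems attached."""
    order_count = defaultdict(int)
    order_amount = defaultdict(float)
    for o in orders:
        order_count[o["member_id"]] += 1
        order_amount[o["member_id"]] += o["amount"]

    ticket_count = defaultdict(int)
    for t in tickets:
        ticket_count[t["member_id"]] += 1

    # Left join: iterate over the member master, defaulting missing aggregates to 0.
    wide = []
    for m in members:
        mid = m["member_id"]
        wide.append({
            **m,
            "order_count": order_count[mid],
            "order_amount": order_amount[mid],
            "ticket_count": ticket_count[mid],
        })
    return wide

wide = build_member_wide_table(crm_members, pos_orders, cs_tickets)
for row in wide:
    print(row)
```

Each extra system found while mapping the business process becomes one more aggregate attached in the same way. A system that gets missed shows up as a gap in the wide table.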

Figure 5. People-centered data integration

Retail enterprises generally have functional departments such as marketing, expansion, merchandising, procurement, warehousing, distribution, operations, stores, customer service, IT, administration, human resources, finance and general affairs. Each department has its own business activities and processes, and departments also interact with one another. The enterprise's IT systems are built around these business activities. Retail enterprises are much alike in this respect: their business activities differ little in essence, and everything is connected. Still, each enterprise's particularities deserve special attention.

It is worth taking a panoramic look at the enterprise's IT system architecture and spending a few days studying it.

Next, study individual systems in depth, building on this overall understanding of the enterprise's IT landscape. For each system, we should know its purpose and functions and who uses it. We should also know its position in the overall architecture, which systems sit upstream and downstream of it, and how data flows between them.
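One simple way to keep track of upstream and downstream relationships is to write the data flows down as a directed graph. Below is a minimal sketch with several assumptions:

- The system names (`POS`, `ERP`, `WMS`, `CRM`, `DataWarehouse`, `BI`) are hypothetical.
- The flows between them (the `FLOWS` list) are also hypothetical.
- The helper names `build_graph` and `reachable` are illustrative.

A real enterprise's lineage would come from its own architecture documents or ETL configuration.

```python
from collections import defaultdict, deque

# Hypothetical data flows between systems: (source, target).
FLOWS = [
    ("POS", "ERP"),
    ("POS", "CRM"),
    ("ERP", "WMS"),
    ("POS", "DataWarehouse"),
    ("CRM", "DataWarehouse"),
    ("ERP", "DataWarehouse"),
    ("DataWarehouse", "BI"),
]

def build_graph(flows):
    """Return (downstream, upstream) adjacency maps."""
    down, up = defaultdict(set), defaultdict(set)
    for src, dst in flows:
        down[src].add(dst)
        up[dst].add(src)
    return down, up

def reachable(graph, start):
    """All systems reachable from `start` by following edges (breadth-first search)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

down, up = build_graph(FLOWS)
print(sorted(up["DataWarehouse"]))     # direct upstream systems of the warehouse
print(sorted(reachable(down, "POS")))  # every system POS data eventually feeds
print(sorted(reachable(up, "BI")))     # every system BI ultimately depends on
```

Following the `up` map answers "where does this data come from?", and following the `down` map answers "what breaks if this system changes?".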

There are two kinds of IT systems: business systems and business support systems. Business systems hold the most original data of each business line, that is, "first-hand data". Business support systems mainly extract that original data from the business systems. They then produce summary data through cleaning, processing, integration and analysis.

Studying a single system in depth relies on its data dictionary. Again, start from the surface. See what tables exist, what data they hold and what business they serve. Then decide which tables can be ignored and which deserve focus.

Table names generally follow strict naming conventions, so a table's purpose can often be judged directly from its name. Common prefixes include:

- sys: system
- pos: orders
- cos: customer service
- sms: short messages
- item: goods

Each system has hundreds of tables, so a quick filter helps identify the ones you don't need to look at. For example, sys tables mainly store system configuration parameters or record the running state of the system. They can usually be skipped.
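This kind of prefix-based screening is easy to automate once you have the list of table names. Below is a minimal sketch with several assumptions:

- Names are assumed to use an underscore after the prefix (e.g. `pos_order`).
- The `PREFIXES` map only encodes the examples given above.
- `SKIP_PREFIXES` and the function names are hypothetical.

Real naming conventions differ from company to company and should be taken from the system's data dictionary.

```python
# Prefix meanings taken from the examples in the text; extend per enterprise.
PREFIXES = {
    "sys": "system",
    "pos": "orders",
    "cos": "customer service",
    "sms": "short messages",
    "item": "goods",
}
# Configuration / runtime-state tables that can usually be skipped.
SKIP_PREFIXES = {"sys"}

def prefix_of(table_name):
    """Prefix before the first underscore, lower-cased (assumed naming convention)."""
    return table_name.lower().split("_", 1)[0]

def tables_to_review(table_names):
    """Group table names by business area, dropping tables with a skippable prefix."""
    groups = {}
    for name in table_names:
        prefix = prefix_of(name)
        if prefix in SKIP_PREFIXES:
            continue
        area = PREFIXES.get(prefix, "unknown")  # unknown prefixes need a manual look
        groups.setdefault(area, []).append(name)
    return groups

# Hypothetical table list for illustration.
tables = ["sys_config", "sys_job_log", "pos_order", "pos_order_item",
          "item_info", "sms_send_log", "tmp_20180704"]
for area, names in tables_to_review(tables).items():
    print(area, names)
```

The "unknown" group is worth reading rather than discarding. Tables that break the naming convention are often where the surprises are.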

After this initial screening, you are left with the tables that need detailed study.

Tables generally fall into two types: fact tables and dimension tables.

- Fact tables record data about transactions, events or behaviors, such as the POS order table. They change constantly and hold large volumes of data.
- Dimension tables describe the attributes of entities or the mappings between them. Examples include the member information table, product information table, category table, industry table, region table, store information table and employee table. They are relatively static and hold small volumes of data.

Look at the fact table first, and then look at the dimension table related to the fact table.

The business-association method and the "5W2H" methodology can help us understand a single table quickly.

For example, each of us has been through the whole process of shopping in a store. We can break that process down into several essential pieces of information:

- who bought
- when and where
- what
- how many and how much it cost
- whether there were discounts
- how it was paid for

Each of these produces corresponding data.

Conversely, we can recover this information from the data and reconstruct the real business scenario. The fields of a fact table answer each question:

- member or non-member: who bought
- order time: when
- store: where
- goods purchased: what
- quantity: how many
- amount: how much
- discount: whether there was one
- payment method: how it was paid
- cashier and so on

Each piece of information can be joined with dimension tables or external data to obtain more dimensions:

- Who bought: the person's basic profile, such as gender, age, occupation and registration date.
- Where: the store's province, city, county and business district, and its store type (such as street shop or shopping mall).
- What: the product's detailed attributes, from the product list or from external data.
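Below is a minimal sketch of this enrichment step. It takes one row of a hypothetical POS order-line fact table (`PosLine`) and looks up three hypothetical dimension tables (`MEMBERS`, `STORES`, `ITEMS`) to answer the 5W2H questions. All field names, IDs and values are illustrative assumptions. In a real warehouse this would normally be a SQL join between the fact table and each dimension table on their keys.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical dimension tables, keyed by their IDs.
MEMBERS = {"M001": {"gender": "F", "age": 30}}
STORES = {"S01": {"city": "City A", "store_type": "street shop"}}
ITEMS = {"SKU1": {"name": "Coke 330ml modern can", "category": "beverages"}}

@dataclass
class PosLine:
    """One row of a hypothetical POS order-line fact table."""
    member_id: Optional[str]  # who (None for non-members)
    order_time: str           # when
    store_id: str             # where
    sku: str                  # what
    qty: float                # how many
    amount: float             # how much was paid
    discount: float           # discount amount, 0 if none
    pay_method: str           # how it was paid

def describe(line):
    """Enrich a fact row with dimension attributes to restore the business scenario."""
    who = MEMBERS.get(line.member_id, {}) if line.member_id else {}
    where = STORES.get(line.store_id, {})
    what = ITEMS.get(line.sku, {})
    return {
        "who": "member" if line.member_id else "non-member",
        "gender": who.get("gender"),
        "age": who.get("age"),
        "when": line.order_time,
        "city": where.get("city"),
        "store_type": where.get("store_type"),
        "item": what.get("name"),
        "category": what.get("category"),
        "qty": line.qty,
        "amount": line.amount,
        "discounted": line.discount > 0,
        "pay_method": line.pay_method,
    }

row = PosLine("M001", "2018-07-04 09:16:00", "S01", "SKU1", 1, 3.5, 0.0, "WeChat Pay")
print(describe(row))
```

Missing lookups come back as `None` rather than raising an error. A fact row whose key has no match in a dimension table is itself a data-quality finding worth investigating.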

For each table, you also need to know:

(1) the primary key, that is, which fields uniquely identify a row;

(2) the update mechanism, that is, which fields show how rows are inserted and updated (this differs between fact tables and dimension tables);

(3) the data volume, that is, whether it is a large or small table, and its order of magnitude.
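Points (1) and (3) can be checked directly against the data rather than trusted from the documentation. Below is a minimal sketch with these assumptions:

- Rows are plain dicts.
- The sample order-line rows are hypothetical.
- The helper names `duplicate_keys` and `order_of_magnitude` are illustrative.

A candidate key is confirmed only when no key value occurs more than once.

```python
import math
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Key values occurring more than once; an empty result means the fields are unique."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}

def order_of_magnitude(row_count):
    """Rough size class, e.g. 10**6 for a table with 3.2 million rows."""
    return 10 ** int(math.log10(row_count)) if row_count > 0 else 0

# Hypothetical order-line rows: order_id alone is not unique, (order_id, line_no) is.
rows = [
    {"order_id": "O1", "line_no": 1, "sku": "SKU1"},
    {"order_id": "O1", "line_no": 2, "sku": "SKU2"},
    {"order_id": "O2", "line_no": 1, "sku": "SKU1"},
]
print(duplicate_keys(rows, ["order_id"]))             # {('O1',): 2}
print(duplicate_keys(rows, ["order_id", "line_no"]))  # {}
print(order_of_magnitude(3_200_000))                  # 1000000
```

On a real table these checks would usually run as SQL (`GROUP BY key HAVING COUNT(*) > 1`, and `COUNT(*)`), but the reasoning is the same.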

At this point you understand a single table and the business behind it. But don't get complacent: your understanding of the business only earns a passing grade. More detailed business scenarios require studying the specific data fields in depth. Others won't tell you many details. Telling you wouldn't help much anyway, because without hands-on experience you would soon forget.

To understand a specific field, the data dictionary is not enough. You also need to see what the data actually looks like.

Be careful and sensitive when reading data.

If you see a transaction time such as "2018-7-4 9:16", pay special attention. You may need to convert it into the standard format "2018-07-04 09:16:00".

If the first few rows of a field all have the same value (such as 0 or null), check whether the field has only one value throughout.

If a quantity of goods is a decimal, look at the product name, category, unit and specification to see which goods are sold in fractional quantities.

If the settlement amount is 0, check whether the item is a gift (judging by the product name and category).

If the settlement amount is negative, check whether the quantity is also negative and whether the order type is a return.

If a POS bill has an unusually large settlement amount, say several million, check which currency it was settled in. If it is RMB, ask business staff to confirm. Either it was entered incorrectly or some special business operation is involved.
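The first two checks in this list, nonstandard timestamps and single-valued fields, are simple to script. Below is a minimal sketch with these assumptions:

- The list of accepted input formats (`INPUT_FORMATS`) is an assumption. Extend it as new variants turn up.
- The function names `normalize_time` and `single_valued_columns` are hypothetical.

Python's `strptime` accepts values without leading zeros for `%m`, `%d`, `%H` and `%M`, which is what lets it read "2018-7-4 9:16". The single-value check runs over the whole column, because the first few rows alone can mislead.

```python
from datetime import datetime

# Assumed input variants seen in the raw data; extend as needed.
INPUT_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y/%m/%d %H:%M:%S",
    "%Y/%m/%d %H:%M",
]

def normalize_time(raw):
    """Return the timestamp as 'YYYY-MM-DD HH:MM:SS', or None if no format matches."""
    text = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    return None  # leave unparseable values for manual inspection

def single_valued_columns(rows):
    """Columns whose value is identical (e.g. all 0 or all None) across every row."""
    if not rows:
        return []
    return [col for col in rows[0] if len({r.get(col) for r in rows}) == 1]

print(normalize_time("2018-7-4 9:16"))  # 2018-07-04 09:16:00
print(normalize_time("not a time"))     # None

# Hypothetical rows: "coupon" never varies, so it may carry no information.
rows = [
    {"order_id": "O1", "coupon": 0, "qty": 1},
    {"order_id": "O2", "coupon": 0, "qty": 2},
]
print(single_valued_columns(rows))      # ['coupon']
```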

In short, rely on your understanding of the business and your sensitivity to data. If the data in a field looks "abnormal" or contradicts what you thought you knew about the business, check whether other fields in the same records are also "abnormal". Then use the information in those fields to work out what business scenario lies behind them.
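The row-level rules above (fractional quantity, zero amount, negative amount, very large amount) all look across several fields of the same record. Below is a minimal sketch of that cross-checking, with several assumptions:

- The field names `qty`, `amount`, `order_type`, `currency` and `category` are hypothetical.
- The values `"gift"` and `"return"` are hypothetical codes.
- The `LARGE_AMOUNT` threshold is a hypothetical choice.

A real system's codes and thresholds would come from its data dictionary and from the business side. A flag here means "look closer", not "this is wrong".

```python
# Hypothetical threshold for a "suspiciously large" single-bill amount.
LARGE_AMOUNT = 1_000_000

def check_pos_line(line):
    """Return reasons why a POS line deserves a closer look (empty list if none)."""
    issues = []
    qty, amount = line["qty"], line["amount"]

    # Fractional quantity: check unit/specification (e.g. goods sold by weight).
    if qty != int(qty):
        issues.append("fractional quantity: check unit and specification")

    # Zero amount: expected only for gifts (here judged by a hypothetical category code).
    if amount == 0 and line.get("category") != "gift":
        issues.append("zero amount but item is not recorded as a gift")

    # Negative amount: expected only for returns with a negative quantity.
    if amount < 0 and not (qty < 0 and line.get("order_type") == "return"):
        issues.append("negative amount without a matching negative-quantity return")

    # Very large amount: if settled in RMB, confirm with business staff.
    if abs(amount) >= LARGE_AMOUNT and line.get("currency") == "RMB":
        issues.append("very large RMB amount: confirm with the business side")

    return issues

# Hypothetical sample lines, one per rule plus a normal sale and a valid return.
lines = [
    {"qty": 1, "amount": 3.5, "order_type": "sale", "currency": "RMB", "category": "beverages"},
    {"qty": 0.35, "amount": 6.3, "order_type": "sale", "currency": "RMB", "category": "fresh produce"},
    {"qty": 1, "amount": 0, "order_type": "sale", "currency": "RMB", "category": "gift"},
    {"qty": 1, "amount": -3.5, "order_type": "sale", "currency": "RMB", "category": "beverages"},
    {"qty": -1, "amount": -3.5, "order_type": "return", "currency": "RMB", "category": "beverages"},
    {"qty": 1, "amount": 2_500_000, "order_type": "sale", "currency": "RMB", "category": "beverages"},
]
for line in lines:
    print(check_pos_line(line))
```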

If possible, familiarize yourself with the commonly used tables and their fields, and ideally write them down.

Much of the time in data work goes into the underlying data-processing stage, and many problems arise there. In fact, most of them are caused by not knowing the business and the data.

The more we know about data, the more convenient and effective data processing will be.

An enterprise's raw data is full of pitfalls. You never know where they are, when you will hit them, or how long it will take to deal with them. So when getting to know the data, you can't go wrong being sensitive and skeptical.