Why can't you always understand what's in the bank?

This happened at a very delicate moment. The interbank market has recently been short of funds, and some people suspect that the ICBC system outage is related to the "money shortage", which has prompted interpretation and speculation from all quarters. A bank IT staff member explained the matter on the question-and-answer website "Zhihu" and described what goes on behind the scenes in bank IT.

1. Modern IT systems are very complicated. Once a system is large enough, it will always get out of control at some point. No complex program has ever been free of mistakes; the only question is whether you have run into them yet. A bank's system is assembled from the software and hardware of many different vendors and is far more complicated than an ordinary home computer. Even a simple home computer crashes ... and once a system reaches a certain level of complexity, the problem cannot be fully solved by adding more people or more money.

2. Guarding against problems costs money, and a lot of it (for example, a medium-sized bank needs hundreds of millions of dollars to build a decent disaster recovery system). But the problem is only "possible", while the money spent is real. If you were in charge, you would not pour unlimited money into it either.

3. One of the best ways to keep a system stable is not to change it. But new business requirements mean the system does have to be upgraded constantly, and every change is a challenge to stable operation.

Because of one word: centralization. In the early days, bank systems were not networked together, so a problem only affected a single district or city. Over the past ten years, the banking industry has centralized its data on a large scale: apart from Bank of China, four of the top five banks have completed large-scale centralization. ICBC was the first to finish, in a project called 9991, which seems to have run from 1999 to 2002. Most banks, including ICBC, Agricultural Bank of China, China Construction Bank, Bank of Communications, China Development Bank, Agricultural Development Bank, Shanghai Pudong Development Bank, Huaxia Bank and Minsheng Bank, have two centers, one in Beijing and one in Shanghai (Bank of Communications seems to have a center in Wuhan, while the People's Bank of China seems to have one in Wuxi). Bank of China has long been consolidated into five centers, but it has not yet moved to a dual-center setup.

Centralization has many business benefits, but in terms of how far a stability problem can spread, it is a bit like putting all your eggs in one basket. A lot of people and money go into watching that basket, but even the tightest watch slips sometimes. With eggs packed this densely, you could hatch chicks!

There was no Weibo or WeChat before, so unless you were the unlucky user, you would not know that something was wrong. Before online banking and Taobao, you would not be buying anything in the middle of the night. Many years ago, I was doing an upgrade at a large provincial bank, and a big problem came up at 3 am. If it was not fixed before 8 o'clock, the bank's branches across the whole province would not be able to open. At 6 o'clock, the president was standing behind me watching me work. At 7 o'clock, it was finally done. If it happened today, the pressure would be even greater.

Because of two words: historical reasons. Banks started building their IT in the 1980s, and the traditional approach still centers on running a program on a single server (sometimes set up as a two-machine hot standby). Most internet companies built their IT in the 21st century, and most of them adopt a distributed approach: multiple computers run the program at the same time, so if one of them fails, the impact is not as great.
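To make the contrast concrete, here is a minimal sketch of how much traffic is affected when one machine fails. It assumes requests are spread evenly across identical machines and that no standby takes over; the function name `affected_fraction` and the node counts are illustrative assumptions, not figures from any real bank.

```python
def affected_fraction(total_nodes: int, failed_nodes: int = 1) -> float:
    """Fraction of traffic affected when `failed_nodes` of `total_nodes` fail.

    Assumes traffic is spread evenly and nothing takes over for a failed node.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return failed_nodes / total_nodes


# One big server: losing it affects everyone.
single_server = affected_fraction(total_nodes=1)

# A hypothetical 20-node cluster: losing one node affects a small share.
cluster = affected_fraction(total_nodes=20)
```

Under these assumptions the single server loses all of its traffic while the cluster loses one twentieth, which is the "impact is not as great" point in numbers.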

Banking programs have to be stable above all, and changing the architecture is very risky (some programs still use technology from 20 years ago). So although the industry is slowly turning, it has not turned very far to this day. As an aside, a sigh for how hard reform is, and praise for Uncle Deng.

Bank IT is the most rigorous sector of China's IT industry. For example, some banks do not even allow vendors' maintenance engineers to operate the systems; only bank employees may do so.

Every major change must have a plan, even for an operation that has been done hundreds of times, such as replacing a hard disk or changing an IP address. However, there is a considerable gap between the plan and reality. As mentioned above, the system is very complicated. If every possible problem were written down, there could be hundreds of branches. Besides, system failures do not happen according to your emergency plan.

The main functions of an emergency plan are to satisfy inspections by higher authorities, to set up in advance the emergency software and hardware environment that may be needed, to sketch out a rough line of attack, and to train the team. The truly complicated problems are still solved by the experts on the spot.

The most common and simplest overall indicators for measuring a continuously running system are RTO (recovery time objective) and RPO (recovery point objective). Loosely speaking, they indicate roughly how long the system is down and how much data is lost.
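As a rough illustration of these two indicators, the sketch below measures one incident against a pair of targets. The function names, the timestamps and the target values are all hypothetical; a real bank would define and measure these per system.

```python
from datetime import datetime, timedelta


def outage_metrics(last_good_copy: datetime,
                   failure_time: datetime,
                   service_restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (downtime, data_loss_window) for a single incident."""
    if not last_good_copy <= failure_time <= service_restored:
        raise ValueError("expected last_good_copy <= failure_time <= service_restored")
    downtime = service_restored - failure_time       # compared against the RTO
    data_loss_window = failure_time - last_good_copy  # compared against the RPO
    return downtime, data_loss_window


def meets_objectives(downtime: timedelta, data_loss_window: timedelta,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True if the incident stayed within both targets."""
    return downtime <= rto and data_loss_window <= rpo


# Illustrative incident: fails at 03:00, restored at 07:00,
# last replicated copy taken at 02:55.
downtime, loss = outage_metrics(
    last_good_copy=datetime(2024, 1, 1, 2, 55),
    failure_time=datetime(2024, 1, 1, 3, 0),
    service_restored=datetime(2024, 1, 1, 7, 0),
)
within_targets = meets_objectives(downtime, loss,
                                  rto=timedelta(hours=5),
                                  rpo=timedelta(minutes=15))
```

Here the downtime is four hours and the data loss window is five minutes, so the incident stays inside the assumed five-hour RTO and fifteen-minute RPO.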

You can safely keep your money in the bank. Generally speaking, problems stay at the level of downtime (the system cannot run for a period of time) and do not reach the level of data loss or data errors. Even if data is lost, accurate data can usually be retrieved from the backup center or disaster recovery center. The banking system also reconciles its accounts every night to ensure the data is accurate.
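The idea behind a nightly check can be sketched as follows: recompute each account's expected balance from the opening balance plus the day's transactions, and flag any account whose recorded closing balance differs. This is a toy illustration under my own assumptions (the `reconcile` function, the data layout and the sample figures are hypothetical); real bank reconciliation is far more involved.

```python
from decimal import Decimal


def reconcile(opening: dict[str, Decimal],
              transactions: list[tuple[str, Decimal]],
              closing: dict[str, Decimal]) -> list[str]:
    """Return the accounts whose closing balance does not match
    opening balance + sum of the day's transactions."""
    expected = dict(opening)
    for account, amount in transactions:
        expected[account] = expected.get(account, Decimal("0")) + amount

    zero = Decimal("0")
    accounts = set(expected) | set(closing)
    return sorted(a for a in accounts
                  if expected.get(a, zero) != closing.get(a, zero))


# Hypothetical day: A pays B 30.00, but B's recorded balance is 5.00 too high.
opening = {"A": Decimal("100.00"), "B": Decimal("50.00")}
transactions = [("A", Decimal("-30.00")), ("B", Decimal("30.00"))]
closing = {"A": Decimal("70.00"), "B": Decimal("85.00")}

mismatches = reconcile(opening, transactions, closing)  # ["B"]
```

`Decimal` is used instead of floating point so that amounts compare exactly.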

First, the time it takes to locate the problem. Once a problem is reported to the IT information center (or spotted by the monitoring system), staff there start checking the system to locate the cause of the fault. If they cannot pin it down, they have to call in the relevant software and hardware engineers, either on site or through remote support (for security reasons, most banks do not allow remote access to their systems, and it takes time for maintenance engineers to reach the data center ...). Finding the root cause within an hour is already doing well. It is like having an unexplained high fever: it always takes time at the hospital to examine you and work out which organ is at fault, right?

Solving the problem is even harder. In practice, just as with anyone's home computer, restarting is often the most effective fix, but many business systems cannot simply be restarted when something goes wrong (doing so may affect other business systems). To this day, most standard maintenance contracts from the major foreign vendors do not commit to a repair time.

Now for the disaster recovery system, and a fact that many IT people do not know: a bank does not lightly trigger a full switchover to its disaster recovery system! As I said before, the IT system is already very complicated; a disaster recovery system amounts to building a second copy of it, which more than doubles the complexity. Switching over is very troublesome and very painful, and it ties up a great deal of manpower and resources. Unless there is a major disaster (such as an earthquake, a fire in the machine room, or a terrorist bombing), the switch will not be made.

Of course, disaster recovery switching drills are held regularly, but because of the risk, core systems are generally not switched over for real. In the past, a provincial bank in East China switched to its disaster recovery center and never switched back to the production center. Recently, a rural credit cooperative in northwest China successfully switched its core production system over to the disaster recovery system, which is no small feat. But that is, after all, a small bank with independent legal-person status; a big bank cannot play it that way.

In addition, I have seen many comments saying that "no one dares to risk switching to the disaster recovery node".