How to build an automated database operation and maintenance system
As the business grows, the demands on operation and maintenance efficiency and quality keep rising, and so does the need for an automated operation and maintenance system.
At present, many of the large and medium-sized enterprise customers I serve still run operation and maintenance in a primitive, "slash and burn" fashion.
The "knife" and "fire" here are the remote clients that operation and maintenance personnel use, such as Xshell and Windows Remote Desktop.
This working mode has many limitations.
For example, server installation and initialization, database and middleware setup, application deployment, service publishing and monitoring are all done by hand.
Operation and maintenance personnel have to log on to the servers and manage and maintain them one by one.
With dozens or hundreds of servers, this is exhausting.
The author has run more than 4,000 servers with a team of more than 20 people. Consider how much of that work could realistically be done by hand.
In addition, manual operation depends heavily on each person executing the right steps in the right order, and a moment of carelessness can cause a production incident. Even before a change begins, no one can guarantee that it will go without incident.
As the saying goes, you cannot walk along the river forever without getting your shoes wet.
At this point, operation and maintenance personnel usually start to explore scripts and batch management tools.
This approach does improve efficiency and quality, but it does not generalize well.
The first problem is that scripts are not standardized.
Each engineer has a personal problem-solving style, and styles differ widely from person to person, so putting scripts written by different people under common version management is a challenge.
The second problem is script handover. A company's staffing is not static; people join and leave, and when someone resigns, their scripts are often not handed over well enough for others to inherit and reuse.
Therefore, building an automated operation and maintenance system becomes the only real choice.
So how do we build an automated operation and maintenance system? This article covers three aspects:
First, why should we build an automated operation and maintenance system?
Second, how the operation and maintenance system is designed and how it works, based on the author's experience.
Third, the author's reflections on problems encountered while automating operation and maintenance, with a summary.
This article focuses on automated operation and maintenance systems for databases.
The core content is as follows:
First, the reasons for building an automated operation and maintenance system
Why build an automated operation and maintenance system?
The answer lies in the challenges encountered in day-to-day operation and maintenance.
The first is the demand for change.
It shows up in three ways:
First, the number of changes is large. We currently serve as many as 30,000 customers, and the volume of changes is correspondingly huge.
Second, there are many kinds of changes. Different customers have different needs, including but not limited to capacity expansion, performance optimization, fault handling, Data Guard (DG) switchover and migration, and RAC construction.
Third, the risk of change is high. Some changes are high-risk operations, and handling them through automation is safer.
The second is the operation and maintenance environment: there are many servers and many kinds of databases. Our customers are free to choose which database to use, and each choice brings a different environment.
The third is the human factor.
When building an automated operation and maintenance system, the human factor is one of the most important things to consider.
Operators differ in ability, their technical levels are uneven, and even their habits and tools are different.
Therefore, we must create a standardized automated operation and maintenance system to improve work efficiency.
Second, how to build an automated operation and maintenance system
Let's see how each module is designed and works.
1. Automatic installation system
Installing a database is a tedious and labor-intensive task.
There are many systems to install, but few people and little time, so automatic installation saves time and effort. The whole automation process uses a general framework and mainly targets Oracle and MySQL installation on Linux.
Before a server is delivered to users, basic security settings are applied, which improves security to some extent and removes some manual steps.
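One way to picture such a general framework is a shared skeleton of steps with a database-specific part in the middle. The sketch below is only an illustration under stated assumptions: the step names, the `INSTALLERS` table and the helper functions are hypothetical, and the actions merely return log lines instead of running real installers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One named installation step; the action takes a host name and returns a log line."""
    name: str
    action: Callable[[str], str]


def _log(message: str) -> Callable[[str], str]:
    """Build a placeholder action that only reports what it would do."""
    return lambda host: f"[{host}] {message}"


# Hypothetical step lists: the real steps are not described in this article.
COMMON_PRE = [Step("check_os", _log("check OS version and disk space"))]
COMMON_POST = [Step("basic_security", _log("apply basic security settings"))]

INSTALLERS: Dict[str, List[Step]] = {
    "oracle": [Step("install_oracle", _log("run Oracle installation"))],
    "mysql": [Step("install_mysql", _log("run MySQL installation"))],
}


def build_plan(db_type: str) -> List[Step]:
    """Combine the shared steps with the database-specific ones, in order."""
    if db_type not in INSTALLERS:
        raise ValueError(f"unsupported database type: {db_type}")
    return COMMON_PRE + INSTALLERS[db_type] + COMMON_POST


def run_plan(host: str, db_type: str) -> List[str]:
    """Run every step of the plan against one host and collect the log lines."""
    return [step.action(host) for step in build_plan(db_type)]
```

Under these assumptions, `run_plan("db01", "mysql")` walks the same pre-check and security steps as an Oracle installation would, so supporting another database only means adding an entry to the table.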
2. Automated operation and maintenance platform
Once a server has been installed automatically, it is taken over by the automated operation and maintenance platform.
This platform is the workbench for operation and maintenance personnel, and it mainly addresses the management problems of security, efficiency and speed.
The following factors should be considered in the design:
The operation interface of the whole operation and maintenance system should be built on a bastion host architecture.
Operation and maintenance engineers can then log in to the management system anytime and anywhere, which is more convenient, and commands are issued to the managed machines through SecureCRT.
Make full use of existing protocols and tools. A characteristic of this platform is that all systems are managed over SSH instead of through a purpose-built agent, which reflects the author's view of automated operation and maintenance.
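As a rough illustration of agentless management over SSH, the sketch below builds a command line for the standard OpenSSH client and runs it against a list of hosts. It is not the platform's actual code: the function names, the `ops` user and the host names are assumptions, and key-based login is assumed to be configured already.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence


def build_ssh_command(host: str, remote_command: str, user: str = "ops") -> List[str]:
    """Build the argument list for a non-interactive ssh call (user name is hypothetical)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password; fail instead
        "-o", "ConnectTimeout=10",   # give up on unreachable hosts after 10 seconds
        f"{user}@{host}",
        remote_command,
    ]


def _run_with_ssh(argv: List[str]) -> int:
    """Default runner: execute the command locally and return its exit code."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def run_on_hosts(
    hosts: Sequence[str],
    remote_command: str,
    runner: Optional[Callable[[List[str]], int]] = None,
) -> Dict[str, int]:
    """Run one command on every host in turn and map each host to its exit code."""
    run = runner or _run_with_ssh
    return {host: run(build_ssh_command(host, remote_command)) for host in hosts}
```

A call such as `run_on_hosts(["db01", "db02"], "uptime")` would return a per-host exit code. The `runner` parameter exists so the batch logic can be exercised without touching real machines.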
3. Automatic inspection system
We have many customer systems and services, so how do we design a system to check that they are running properly?
We use two approaches: a self-developed central control system and a third-party management platform. Let's look at our own central control system first:
A single server patrols the other database nodes; the inspection scripts can be written in shell or Python.
An inspection interval is set, and when a failure is detected the operation and maintenance personnel are notified promptly by phone call or text message.
The second approach is to hand all database nodes over to a third-party monitoring platform for management.
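The self-developed central control approach can be sketched as a simple patrol loop. This is a minimal sketch, not the author's system: `check` and `notify` are placeholders for a real health probe and a real phone or SMS gateway, and the 300-second default interval is an assumption.

```python
import time
from typing import Callable, List, Optional, Sequence


def inspect_once(
    nodes: Sequence[str],
    check: Callable[[str], bool],
    notify: Callable[[str], None],
) -> List[str]:
    """Check every node once, notify about each failure, and return the failed nodes."""
    failed = []
    for node in nodes:
        try:
            healthy = check(node)
        except Exception:
            # A probe that crashes is treated the same as a failed check.
            healthy = False
        if not healthy:
            failed.append(node)
            notify(node)
    return failed


def patrol(
    nodes: Sequence[str],
    check: Callable[[str], bool],
    notify: Callable[[str], None],
    interval_seconds: float = 300.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeat the inspection at a fixed interval; rounds=None means run forever."""
    completed = 0
    while rounds is None or completed < rounds:
        inspect_once(nodes, check, notify)
        completed += 1
        if rounds is None or completed < rounds:
            sleep(interval_seconds)
```

Keeping the probe and the notification as parameters means the same loop can drive shell-based or Python-based checks, and the loop itself can be tested without waiting for real intervals.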
4. Automatic performance analysis system
No system runs stably forever, and performance problems are inevitable, which makes the performance analysis system one of the most important modules.
The author covers this topic in a separate article.
5. Automatic monitoring and early warning system
Customer systems usually run 24 hours a day, 7 days a week, which calls for monitoring and early warning.
An early warning and monitoring system plus on-duty staff is the standard configuration.
The early warning and monitoring system is built on the inspection system, but the metrics it collects are different.
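To show how early warning can sit on top of collected metrics, here is a small threshold-evaluation sketch. The metric names and threshold values are purely hypothetical; the article does not say which indicators are collected.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    """Warning and critical thresholds for one metric."""
    metric: str
    warning: float
    critical: float


# Hypothetical rules for illustration only.
RULES: Sequence[Rule] = (
    Rule("tablespace_used_pct", warning=80.0, critical=90.0),
    Rule("active_sessions", warning=200.0, critical=400.0),
)


def evaluate(sample: Dict[str, float], rules: Sequence[Rule] = RULES) -> List[str]:
    """Compare one collected sample against the rules and return alert messages."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            # Missing data is itself worth reporting to the on-duty staff.
            alerts.append(f"UNKNOWN {rule.metric}: no data collected")
        elif value >= rule.critical:
            alerts.append(f"CRITICAL {rule.metric}={value}")
        elif value >= rule.warning:
            alerts.append(f"WARNING {rule.metric}={value}")
    return alerts
```

The warning level is what makes this early warning: the on-duty staff hear about a tablespace filling up before it reaches the critical threshold.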
6. Automatic backup system
The backup scheme combines a "two sites, three data centers" layout with DG and NBU.
Third, thoughts on building an automated operation and maintenance system
The author sums up the goals of building an automated operation and maintenance system in four words.
The first is complete. The system should cover all operation and maintenance requirements.
The second is simple. The system should be concise and easy to use, and the learning cost for operation and maintenance personnel should be low. The more complex and difficult a system is, the less likely it is to deliver its capability and efficiency.
The third is efficient, especially when processing batches or performing specific tasks.
The fourth is secure. If an operation and maintenance system is insecure, it may soon be taken over by attackers.
Summary
The author is currently also moving gradually from database architecture, optimization and fault handling toward automated operation and maintenance systems.
Looking back on this experience, there are three points offered for your reference.
The first is the principle of gradual progress:
Focus on the current problems and handle them well, and the later problems will be easier to solve.
If the system is designed from the start to be huge and feature-rich, things tend to get out of control. If the initial goal is instead to solve a few specific, targeted problems, progress is much easier. When building our automated operation and maintenance system, our initial goal was a basic platform for batch operations and changes, and we first moved the repetitive work onto that platform.
Then, driven by operation and maintenance requirements, we enriched the platform's functions and improved efficiency. Finally, the surrounding systems were connected to form a complete automated operation and maintenance system.
The second is to consider scalability:
When designing the system, you may not need to think through every function or design detail, but you must consider whether the system can still cope when the number of servers grows substantially.
The third is to design for practical use:
If the system is inconvenient to use, the operation and maintenance personnel will be the first to give up on it, and then how can it be promoted any further?