2009. 10. 11. 14:12

SQL Server DBA Checklist

 

SQL Server DBA Checklist

 Problem
I am looking for items that I should address on a daily basis on my SQL Servers.  As such, what are the critical aspects of SQL Server that should I check on a daily basis?  Should I perform additional checks on a weekly, monthly or yearly basis?  How can I automate some of these tasks so I do not spend my whole day reviewing SQL Servers rather working on the latest and greatest technologies?

Solution
Depending on your environment dictates all of the items that should be reviewed on a daily basis as well as their criticality in your specific organization.  Based on your environment, customize the list below to ensure it meets your needs:

Daily Checklist

  • Backups - Check your backups to validate that they were successfully created per your process.
  • Nightly Processing - Review the nightly or early morning processes.
  • SQL Server Error Log - Review the SQL Server error log for any errors or security issues (successful or failed logins) that are unexpected.
  • Windows Event Log - Review the Application Event Log at a minimum to find out if any Windows or hardware related errors or warnings are being written.
    • Some of the hardware vendors write warnings to the Windows Event Log when they anticipate an error is going to occur, so this gives you the opportunity to be proactive and correct the problem during a scheduled down time, rather than having a mid day emergency.
    • SQL Server 2005 Exposed - Log File Viewer
  • SQL Server Agent Jobs - Review for failed SQL Server Agent Jobs.
  • HA or DR Logs - Check your high availability and/or disaster recovery process logs.  Depending on the solution (Log Shipping, Clustering, Replication, Database Mirroring, CDP, etc.) that you are using dictates what needs to be checked.
  • Performance Logs - Review the performance metrics to determine if your baseline was exceeded or if you had slow points during the day that need to be reviewed.
  • Security Logs - Review the security logs from a third party solution or from the SQL Server Error Logs to determine if you had a breach or a violation in one of your policies.
  • Centralized error handling - If you have an application, per SQL Server or enterprise level logging, then review those logs for any unexpected errors.
  • Storage - Validate you have sufficient storage on your drives to support your databases, backups, batch processes, etc. in the short term.
  • Service Broker - Check the transmission and user defined queues to make sure data is properly being processed in your applications.
  • Corrective Actions - Take corrective actions based on the issues and/or errors that you found.
  • Improvements - Look for opportunities to improve your environment based on the review and analysis you have performed.
  • Learn something new - Although this review and correction process could be time consuming, take some time every day to learn something new to improve your knowledge of the technology you work on every day.

Weekly or Monthly Checklist

  • Backup Verification (Comprehensive)- Verify your backups and test on a regular basis to ensure the overall process works as expected.  What is meant by this is to:
    • Contact your off site tape vendor to obtain a tape
    • Validate that the tape goes to the correct office
    • Validate that the vendor delivers the correct tape
    • Validate that the vendor delivers the tape in the correct time period
    • Validate that the software version you use to perform the restore is compatible with the version from the tape
    • Validate that the tape does not have any restore errors
    • Validate that sufficient storage is available to move the backup to the needed SQL Server
    • Validate that the SQL Server versions are compatible to restore the database
    • Validate that no error messages are generated during the restore process
    • Validate that the database is accurately restored and the application will function properly
  • Backup Verification (Simple) - Verify your backups on a regular basis.
  • Windows, SQL Server or Application Updates - Check for service packs/patches that need to be installed on your SQL Server from either a hardware, OS, DBMS or application perspective
  • Capacity Planning - Perform capacity planning to ensure you will have sufficient storage for a specific period of time such as for 6, 12 or 18 months.
  • Fragmentation - Review the fragmentation for your databases to determine if you particular indexes must be rebuilt based on analysis from a backup SQL Server.
  • Maintenance - Perform database maintenance on a weekly or monthly basis.
  • Security - Remove unneeded logins and users for individuals that have left the organization, had a change in position, etc.
  • Shrink databases - If databases or transaction logs are larger, than necessary shrink those files to free up disk space.

Opportunities for Automation

  • Setup alerts for specific error levels or error messages that impact your SQL Servers in order to be notified automatically.
  • Setup Jobs to query for specific conditions in your tables to validate data was loaded or data is being added to specific tables based on your business processes throughout the day.
  • Setup notification on Job success, failure or completion.
    • One word of warning is to check your business critical Jobs on a regular basis just to be sure they are working properly.  Nothing is worse than finding out a key process has been failing for days, weeks or months and the reason notifications have not been sent are due to an incorrect configuration, full mailbox, etc.  It may be 30 minutes on a weekly basis that is time well spent.
  • Setup centralized error handling on a per SQL Server, application or enterprise basis then determine the business rules for specific error conditions.