Saturday, June 9, 2018

New SQL Version Life Cycle tool

Introducing my latest SQL Server Life Cycle and Versions lookup tool.

https://sqlversions.azurewebsites.net

Back in 2016 I released the first version of this app, and the goals were the same: provide an interface to easily look up a SQL Server version number and find out the support status of that release, all backed by an API that can be queried from scripts and other tools.

This new release is a complete re-write and includes a modern, responsive site written in Angular (because I felt the need to learn it).

With this initial release I am providing the following capabilities in the UI:

Version Search

The search feature allows you to look up a specific release of SQL Server or a version number and then see all the releases that match. The results show the release date along with the mainstream and extended support dates.



Clicking on a row in the results will open further details for that release.



Life Cycle

The life cycle page is a quick way to look up when certain branches of a product release will end mainstream or extended support. I created it because, while consulting onsite, I sometimes need to quickly look up life cycle information rather than a specific version. It was also a "wish list" feature from a customer I was working with at the time.




Health Check [beta]

The health check is a beta release of my end goal, which is to let you enter your version number and get basic recommendations about how to upgrade. The initial release simply recommends either starting to plan an upgrade or upgrading urgently, depending on that version's support status. My vision for this feature is to provide guidance about updates released on the same branch to stay current, as well as life cycle planning for upgrades to service packs or the next release.




API

One of my main goals with this app was to ensure it had an API which could be queried and then used to extend other tools and scripts. For example, you could incorporate it into a health check script that retrieves the version number from the SQL instance and then calls the API to get information about the life cycle and support of that release. Or you could use Power BI, taking data from your CMDB and performing a lookup against the API for each row, or a search against the API stored as a query with a relationship link.

The following code provides an example of using the API in a Health Check script. The code is available at https://gist.github.com/Matticusau/5778b90507cb7274deebc12cf4360c1c

# Import the SQL Module
Import-Module SqlServer;

# Get the version number via appropriate method
$VersionNumber = Invoke-Sqlcmd -ServerInstance $SqlServerName -Query "SELECT SERVERPROPERTY('PRODUCTVERSION')";
$VersionNumber = $VersionNumber.Column1;

# Call the API to get the version information
$VersionData = Invoke-RestMethod -Uri "http://sqlserverbuildsapi.azurewebsites.net/api/builds?version=$($VersionNumber)";

# Want to improve your Health Check script? Calculate the health of the support status
# (property names below assume the API's response schema; adjust them to match what the API actually returns)
if ($VersionData.SupportEndExtended -le (Get-Date)) {$SupportStatus = 'Critical'}
elseif ($VersionData.SupportEndMainstream -le (Get-Date)) {$SupportStatus = 'Warning'}
else {$SupportStatus = 'Ok'}

# format the output data string
$OutputData = @"
Instance = $($SqlServerName)
Version = $($VersionData.BuildVersion)
Product = $($VersionData.ProductName)
Branch = $($VersionData.BranchName)
Update = $($VersionData.BuildUpdate)
MainstreamSupportEnd = $($VersionData.SupportEndMainstream)
ExtendedSupportEnd = $($VersionData.SupportEndExtended)
SupportStatus = $($SupportStatus)
"@

# Return the output string
$OutputData

Soon, I will update the samples repository I provided for the previous version to reflect the new API syntax.




Lastly, a note on the data. There is no dynamic link from this data to any Microsoft or other site. The data is provided "as-is" and is manually maintained by me and a few trusted peers. We do our best to keep it up to date and accurate, but for any business-critical or commercial decision make sure you refer to the official sources.

If you like this tool, have ideas for improvements, or notice inaccuracies in the data, please let me know.


Legal Stuff: The contents of this blog are provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility for the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Sunday, May 27, 2018

Convert CSV files to Parquet using Azure HDInsight

A recent project I worked on used CSV files as part of an ETL process from on-premises to Azure, and to improve performance further down the stream we wanted to convert the files to Parquet format (with the intent that eventually they would be generated in that format). I couldn't find a current guide for stepping through that process using Azure HDInsight, so this post provides one.

Scripts and sample data used in this guide are available at https://github.com/Matticusau/SQLDemos/tree/master/HDInsightConvertCSVtoParquet

To follow this blog post, make sure you have:

  1. Created a resource group in your Azure subscription
  2. Created a Storage Account within the resource group
  3. Created an Azure HDInsight resource in the same resource group (you can use that storage account for HDInsight)
  4. Uploaded the sample GZip-compressed CSV files from the SampleData folder to the Storage Account using Azure Storage Explorer. In my case I uploaded to a container named "dataload"
The work we will perform will be within a Jupyter Notebook.

From the Azure Portal, locate the HDInsight resource and click the Cluster dashboard quick link


Now select the Jupyter Notebook


This will open a new tab/window.

Authenticate as the cluster administrator.

Create a new PySpark Notebook.


Paste the following lines and press Shift+Enter to run the cell.


from pyspark.sql import *
from pyspark.sql.types import *



Now we can import the CSV into a table. You will need to adjust the path to represent your storage account, container and file. The syntax of the storage path is wasb://mycontainer@myaccount.blob.core.windows.net/foldername/filename


# import the COMPRESSED data
csvFile = spark.read.csv('wasb://dataload@mlbigdatastoracc.blob.core.windows.net/SalesSample_big.csv.gz', header=True, inferSchema=True)
csvFile.write.saveAsTable("salessample_big")


Press Shift+Enter to run the cell

Once complete, you can use SQL to query the table the data was imported into. The query result is returned as a dataframe, which we will use to write the Parquet file.

dftable = spark.sql("SELECT * FROM salessample_big")
dftable.show()

The final step is to export the dataframe to a Parquet file, using gzip compression.

dftable.write.parquet('wasb://dataload@mlbigdatastoracc.blob.core.windows.net/SalesSample2.parquet', compression='gzip')

The complete Jupyter Notebook should look like:



In your storage account you should have a Parquet export of the data (note that this format is not a single file, as shown by the file, folder and child files in the following screen shots).





In this example you may notice that the compressed file sizes are not much different, yet the Parquet file is slightly more efficient. Your experience may vary, as it depends on the content within the CSV file.


Some reference material worth checking out if this is something you are working on:

Legal Stuff: The contents of this blog are provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility for the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Saturday, April 21, 2018

New SqlOps Extensions

Announcing my new SqlOps Extensions

AlwaysOn Insights
https://github.com/Matticusau/sqlops-alwayson-insights/releases

MSSQL Instance Insights
https://github.com/Matticusau/sqlops-mssql-instance-insights/releases

MSSQL Db Insights
https://github.com/Matticusau/sqlops-mssql-db-insights/releases


Previously I had released a single extension, which was logically three separate extensions but, due to the methods used at the time, was released as one. With the added support for the Extension Marketplace in the SqlOps March release it made sense to break these out. Going forward this will also help with the life cycle of the extensions, as they can now be managed individually.

If you want to know how I wrote these extensions, check out the following posts by Kevin Cunnane:

Writing a SQL Operations Studio Extension in 15 minutes
https://medium.com/@kevcunnane/writing-a-sql-operations-studio-extension-in-15-minutes-7dfd24a74dfe

Publishing an extension for SQL Operations Studio
https://medium.com/@kevcunnane/publishing-an-extension-for-sql-operations-studio-f5a5b323c13b


Monday, October 2, 2017

Troubleshooting SQL Server AlwaysOn AG Auto Seeding

SQL Server 2016 introduced a new capability with AlwaysOn Availability Groups called Automatic Seeding. This allows you to automatically stream the backup, using VDI across the log stream transport, to the secondary replica. With this capability enabled you do not have to manually back up and restore the database and transaction logs before starting data synchronization with the primary replica.

Now there are some prerequisites for using Automatic Seeding such as:

  • Data and Log file paths must be the same on all Replicas
  • Databases must be in Full recovery model and at least one Full backup must have been taken of the database (to start the recovery chain)
  • All other prerequisites for a database to be added to an Availability Group must also be met


You can enable Automatic Seeding either before or after creating the Availability Group.
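
If you already have an Availability Group and just want to switch a replica to automatic seeding, a minimal sketch might look like the following (the AG name is a placeholder for your environment, and the T-SQL is simply wrapped in PowerShell's Invoke-Sqlcmd so it can be scripted):

# Minimal sketch only - assumes the SqlServer module and permissions to alter the AG
Import-Module SqlServer;

$agName    = 'MyAg';          # placeholder AG name
$primary   = 'SQLSERVER-0';
$secondary = 'SQLSERVER-1';

# On the primary, switch the secondary replica to automatic seeding
Invoke-Sqlcmd -ServerInstance $primary -Query "ALTER AVAILABILITY GROUP [$agName] MODIFY REPLICA ON N'$secondary' WITH (SEEDING_MODE = AUTOMATIC);"

# On the secondary, allow the AG to create the seeded databases
Invoke-Sqlcmd -ServerInstance $secondary -Query "ALTER AVAILABILITY GROUP [$agName] GRANT CREATE ANY DATABASE;"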

Sometimes you will find that Automatic Seeding doesn't work; in our example it will be because a data file already exists with the same name as the database we are seeding, which is a common problem if you have previously removed the replica from the AG and are trying to re-join it. Unfortunately the UI doesn't give you any indication that seeding failed.

Enter Extended Events.

To demonstrate this I will create an Availability Group across two replicas. I am using the SQL Server AlwaysOn solution available from the Azure Market Place as this has an Availability Group already built, but for this demonstration I will create a new AG and Demo databases.

1. Setup the environment
I have added an additional disk to the SQLSERVER-0 node, formatted the new volume, and assigned the drive letter G:. I also created the path G:\Log and set the file permissions to full control for the SQL Server service account.

2. Create the Demo Database
The following script can be used to create the demo databases.

[01.CreateDemoDbs.sql]

3. Create the Availability Group
The following script can be used to create the Availability Group.

[02.CreateAG.sql]


4. Verify Environment
Now you should have a working Availability Group with the AutoseedDb01 synchronized and healthy between two replicas.



5. Create AlwaysOn_AutoseedMonitor Extended Events session
While we have the standard AlwaysOn Health extended events session enabled, run the following script to create a new Extended Events session for monitoring auto seeding.

[03.ExtendedEventsSession.sql]

If you want to create this manually, or to explore what other events are available, make sure you select the Debug channel when using the wizard to select events.



You can also filter the category to "dbseed" to view only the events that relate to auto seeding.
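
The repository linked at the end of this post contains the full session definition; as a rough sketch of what such a session can look like (using only the events already mentioned in this post, and wrapping the T-SQL in Invoke-Sqlcmd so it can be pushed to both replicas), it might be something like:

# Rough sketch only - the actual session definition is in 03.ExtendedEventsSession.sql
Import-Module SqlServer;

$xeSessionQuery = @"
CREATE EVENT SESSION [AlwaysOn_AutoseedMonitor] ON SERVER
    ADD EVENT sqlserver.hadr_physical_seeding_progress,
    ADD EVENT sqlserver.error_reported
    ADD TARGET package0.event_file (SET filename = N'AlwaysOn_AutoseedMonitor')
    WITH (STARTUP_STATE = ON);
ALTER EVENT SESSION [AlwaysOn_AutoseedMonitor] ON SERVER STATE = START;
"@;

# Create and start the session on both replicas
foreach ($replica in 'SQLSERVER-0','SQLSERVER-1') {
    Invoke-Sqlcmd -ServerInstance $replica -Query $xeSessionQuery;
}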



6. Add the 2nd database to AG
Now use the following script to add the 2nd database to the AG. NOTE: This DB has the log file on G:\ which does not exist on the replica.

[04.AddDbToAG.sql]

7. Investigating the Auto Seeding
You will notice that the database was not created on the replica; it is listed in the AG's databases, but with a warning.



If you open the AG Dashboard you will see the warning message as well.



The messages do not include much detail.

8. Using DMVs
We can use the following DMVs to query the status of the seeding.

[05.DMVs.sql]
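
The script in the repository has the full queries; as a minimal sketch (again wrapped in Invoke-Sqlcmd), the two seeding DMVs can be queried like this, where sys.dm_hadr_automatic_seeding shows the history of seeding attempts, including failure reasons, and sys.dm_hadr_physical_seeding_stats shows seeding operations currently in flight:

# Minimal sketch only - run on the primary replica (and the secondary for comparison)
Import-Module SqlServer;

# History of automatic seeding attempts, including the failure state
Invoke-Sqlcmd -ServerInstance 'SQLSERVER-0' -Query "SELECT * FROM sys.dm_hadr_automatic_seeding;" | Format-Table -AutoSize;

# Statistics for seeding operations currently in progress
Invoke-Sqlcmd -ServerInstance 'SQLSERVER-0' -Query "SELECT * FROM sys.dm_hadr_physical_seeding_stats;" | Format-Table -AutoSize;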

9. Using Extended Events
We are capturing the "error_reported" event along with the dbseed events. View the Event File on the replica where the failure happened and locate the hadr_physical_seeding_progress events; there will be a lot of them. In the screenshot below you can see the progress event for one of the last stages of the seeding process, where it reports the failure.



Locate the hadr_physical_seeding_progress event with an internal_state_desc of "WaitingForBackupToStartSending". This is an early step in the auto seeding process; after it you should see error_reported events. This is where you can find the real errors, most likely in the first one reported. In our case it is:
Directory lookup for the file "G:\LOG\AutoseedDb02_log.ldf" failed with the operating system error 3(The system cannot find the path specified.).




There will be many more cases where this is useful. I have also used it when old data or log files from previous versions of a database remained on the server while I was adding the database back into the AG, as that operation can sometimes fail (depending on permissions).

Let me know what situations you find this useful in.


All the scripts used in this post are located in my GitHub https://github.com/Matticusau/SQLDemos/tree/master/SQLAlwaysOnAGAutoSeeding



Reference

Automatic seeding for secondary replicas
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatic-seeding-secondary-replicas

Automatically initialize Always On Availability group
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatically-initialize-always-on-availability-group



Legal Stuff: The contents of this blog are provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility for the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.


Wednesday, September 27, 2017

Checklist for troubleshooting compilation of DSC Configuration in Azure Automation

I was recently working on a solution where a DSC Configuration block was not compiling in Azure Automation. This solution included Node Data, DSC Resources, and custom Composite Resources. The configuration would compile perfectly fine locally but not when published to Azure Automation. The other challenging aspect was that other composite resources within the same module were compiling fine.

Unfortunately Azure Automation doesn't provide very detailed information for troubleshooting DSC compilation errors. It will only show you the first-level failure, which, when you're using a DSC Composite Resource, means you will simply receive an error that the composite resource failed to import, while the actual cause could be a problem with an underlying resource used by that composite resource.

So based on my experience I have come up with the following troubleshooting checklist when working through DSC Compilation errors in Azure Automation.

Troubleshooting Checklist
  1. Check the exception details output by the DSC Configuration compilation job that is in the suspended state, either within the Azure Portal (like the following screen shot) or via PowerShell with the Get-AzureRmAutomationDscCompilationJobOutput cmdlet (see the sketch after this checklist).



    Depending on the exception reported the next steps may vary. In the above screenshot it is reporting that a Composite Resource has failed to import.
  2. Can you compile the configuration locally?
    1. If yes, can you upload the MOF to the DSC Node Configurations in the Azure Automation account?
  3. Are all required Modules referenced by your Configuration(s):
    1. Uploaded to your Azure Automation account
    2. Up-to-date (see next point though)
    3. Match the required version by your Configuration block or Composite Resource
  4. If it is a Composite resource that is failing, are all Composite resources within your module affected or is it just a subset?
  5. If it is a Composite Resource, extract the configuration from the failing Composite Resource and place it directly in a Configuration block. Compile that configuration block in Azure Automation and review the output as this will provide more granular details about the specific resources used by that configuration block.
  6. Try simplifying the DSC Configuration block to reduce the number of DSC Composite resources or other resources being compiled to help narrow down the culprit
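
For the first step in the checklist, a minimal PowerShell sketch for pulling the exception details out of a suspended compilation job might look like the following (the resource group and Automation account names are placeholders, and it assumes the AzureRM.Automation module with an already authenticated session):

# Minimal sketch only - assumes Login-AzureRmAccount has already been run
Import-Module AzureRM.Automation;

$resourceGroupName = 'MyAutomationRg';      # placeholder
$automationAccount = 'MyAutomationAccount'; # placeholder

# Find the most recent suspended compilation job
$job = Get-AzureRmAutomationDscCompilationJob -ResourceGroupName $resourceGroupName -AutomationAccountName $automationAccount |
       Where-Object { $_.Status -eq 'Suspended' } |
       Sort-Object -Property CreationTime -Descending |
       Select-Object -First 1;

# Pull the job output and review the error entries for the underlying exception
Get-AzureRmAutomationDscCompilationJobOutput -ResourceGroupName $resourceGroupName -AutomationAccountName $automationAccount -Id $job.Id -Stream Any;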

You should also read the "Common errors when working with Desired State Configuration (DSC)" section in the official documentation https://docs.microsoft.com/en-us/azure/automation/automation-troubleshooting-automation-errors


Legal Stuff: The contents of this blog are provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility for the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Monday, August 14, 2017

Taking the Availability Group Database Level Health Detection option for a spin

With SQL Server 2016, Availability Groups now include the option to enable Database Level Health Detection. While this option is turned off by default, Microsoft strongly recommends you enable it on all Availability Groups; it is only off by default for backwards compatibility, so you opt in by choice.

In this post I will take a deep look into just what sort of database issues will cause the AG to fail over with this option enabled.

For this walk through I am using the SQL Server AlwaysOn deployment in the Azure Market Place and have provided a link to all the scripts below so you can try out this functionality and show off your skills in your own demonstrations. If you don't have an Azure Subscription all you need is an environment with two Replicas and multiple disks mounted in the virtual machines (so you can separate the transaction logs of different databases).

To follow along with the scripts and screen shots here are the details of resources within my test lab.

Primary Replica: SQLSERVER-0
Secondary Replica: SQLSERVER-1
File Share Witness: CLUSTER-FSW

Existing Availability Group: Contoso-ag
AG Database: AutoHa-sample



To make this demo easier, I have created a file share on the File Share Witness to store backups. It is best practice that when using Availability Groups you use a central location for storing backups from all replicas. Obviously a File Share Witness is not that place but for this demo lab it is fine.

Central Backup Share: \\cluster-fsw\sqlbackups


All the scripts used in this post are located here. This link is also contained within my final thoughts at the end of this post.

OK, let's get started.


1. Create the demo databases
To enhance this demo, we will set up new databases and an availability group to show how this setting only impacts one AG and not another.

Run the following TSQL to set up the demo databases. NOTE: I have explicitly put the transaction log for the SuspectDb on a different volume than the CorruptDb.

[01.CreateDbs.sql]
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:Connect SQLSERVER-0

USE master
GO

-- CorruptDb
CREATE DATABASE [CorruptDb] CONTAINMENT = NONE
ON PRIMARY (NAME = N'CorruptDb_data', FILENAME = N'F:\DATA\CorruptDb_data.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
LOG ON ( NAME = N'CorruptDb_log', FILENAME = N'F:\LOG\CorruptDb_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
GO
ALTER DATABASE [CorruptDb] SET COMPATIBILITY_LEVEL = 130
GO
ALTER DATABASE [CorruptDb] SET PAGE_VERIFY CHECKSUM  
GO
USE [CorruptDb]
GO
CREATE TABLE [dbo].[DemoData] (id INT IDENTITY PRIMARY KEY, demoData VARCHAR(200));
GO
INSERT INTO [dbo].[DemoData] (demoData) VALUES ('Test data prior to simulating page level corruption');
GO 200

-- SuspectDb (NOTE: G:\ for TransLog)
CREATE DATABASE [SuspectDb] CONTAINMENT = NONE
ON PRIMARY (NAME = N'SuspectDb_data', FILENAME = N'F:\DATA\SuspectDb_data.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
LOG ON ( NAME = N'SuspectDb_log', FILENAME = N'G:\LOG\SuspectDb_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
GO
ALTER DATABASE [SuspectDb] SET COMPATIBILITY_LEVEL = 130
GO
ALTER DATABASE [SuspectDb] SET PAGE_VERIFY CHECKSUM  
GO
USE [SuspectDb]
GO
CREATE TABLE [dbo].[DemoData] (id INT IDENTITY PRIMARY KEY, demoData VARCHAR(200));
GO
INSERT INTO [dbo].[DemoData] (demoData) VALUES ('Test data prior to making the database suspect');
GO 200

2. Create the demo Availability Group
We won't actually add our corrupt database into this Availability Group yet; that will come in the next few steps.

[02.CreateAG.sql]
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

-- Values you may need to change for your lab
-- Primary replica: SQLSERVER-0
-- Secondary replica: SQLSERVER-1
-- Backup Central Share: \\CLUSTER-FSW\SQLBACKUPS

---------------------------------------------------------
-- Prereqs for Availability Group
---------------------------------------------------------
:Connect SQLSERVER-0
-- Backup demo databases to share \\CLUSTER-FSW\SQLBACKUPS
BACKUP DATABASE [CorruptDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb.bak' WITH FORMAT;  
BACKUP DATABASE [SuspectDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\SuspectDb.bak' WITH FORMAT;  
GO  

---------------------------------------------------------
-- AG Endpoints
---------------------------------------------------------
-- If you are using your own lab then you need to create a Database Mirroring end-point
-- The Azure Market Place lab already has the end-point created for you, check these with
 :Connect SQLSERVER-0
SELECT * FROM sys.database_mirroring_endpoints;
GO
  :Connect SQLSERVER-1
SELECT * FROM sys.database_mirroring_endpoints;
GO
-- If you need to create an endpoint use the following syntax
--CREATE ENDPOINT hadr_endpoint  
--    STATE=STARTED   
--    AS TCP (LISTENER_PORT=5022)   
--    FOR DATABASE_MIRRORING (ROLE=ALL);  
--GO  
  
 
---------------------------------------------------------
-- Start the AlwaysOn Extended Events Session
---------------------------------------------------------
-- On the primary replica
:Connect SQLSERVER-0
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END
GO
-- On the secondary replica
:Connect SQLSERVER-1
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END
GO


 ---------------------------------------------------------
 -- Create the AG on the primary
 ---------------------------------------------------------
:Connect SQLSERVER-0
USE [master];
CREATE AVAILABILITY GROUP DbHealthOptDemoAg   
 WITH (DB_FAILOVER = ON)
 FOR DATABASE SuspectDb
 REPLICA ON
  'SQLSERVER-0' WITH   
   (  
   ENDPOINT_URL = 'TCP://SQLSERVER-0.Contoso.com:5022',  
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,  
   FAILOVER_MODE = AUTOMATIC,
   SEEDING_MODE = AUTOMATIC
   ),  
  'SQLSERVER-1' WITH   
   (  
   ENDPOINT_URL = 'TCP://SQLSERVER-1.Contoso.com:5022',  
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,  
   FAILOVER_MODE = AUTOMATIC,
   SEEDING_MODE = AUTOMATIC
   );   
GO  

 ---------------------------------------------------------
 -- Join the secondary replica to the AG (allow for seeding)
 ---------------------------------------------------------
:Connect SQLSERVER-1
ALTER AVAILABILITY GROUP DbHealthOptDemoAg JOIN;  
GO  
ALTER AVAILABILITY GROUP DbHealthOptDemoAg GRANT CREATE ANY DATABASE;
GO

After that step you should have an Availability Group that looks like this:


On your primary replica you should have the following databases


On the secondary replica you should only have the SuspectDb for now. We still need to do some work to set up the CorruptDb.


If you do not have the SuspectDb on the secondary replica after joining the replica to the AG, then the Automatic Seeding option may not have succeeded. The script provided contains the required steps to restore and join the db on the secondary replica.



3. Verify Database Level Health Detection option is enabled
Open the properties of the AG and make sure the option is checked.



Alternatively, check that the value of db_failover is 1 in sys.availability_groups [04.VerifyAg.EnableDBHealthLevelDetection.sql]
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:Connect SQLSERVER-0
USE [master]
GO
SELECT name, db_failover FROM sys.availability_groups
GO

If you need to enable the setting either use the GUI or run 03.AlterAG.EnableDBHealthLevelDetection.sql
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:Connect SQLSERVER-0
USE [master]
GO
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg] SET(DB_FAILOVER = ON);
GO


4. Corrupt the demo database
The first thing we are going to look at is what happens when you have page level corruption, and whether corruption alone is enough to matter here. Full credit to Paul Randal for an existing blog post on how to do this. See https://www.sqlskills.com/blogs/paul/dbcc-writepage/

Because we are going to use DBCC WRITEPAGE and need the database to be in single user mode, we have not yet been able to add it to the Availability Group. This won't impact the demo though, as the integrity of data pages is not validated when a database is brought online or added to an availability group.

Run the following statements to corrupt the database. WARNING!!! Use this at your own risk and never ever ever ever do this on a production environment.

[05.CorruptData.sql]
-- get the page information of our demo database and table
:Connect SQLSERVER-0
DBCC IND (N'CorruptDb', N'DemoData', -1);
GO

-- Corrupt the page data (remember to change the page number)
:Connect SQLSERVER-0
ALTER DATABASE [CorruptDb] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'CorruptDb', 1, [PageId], 4000, 1, 0x45, 1);
GO
ALTER DATABASE [CorruptDb] SET MULTI_USER;
GO

-- check that we have caused page level corruption by forcing the data to be read from disk
:Connect SQLSERVER-0
DBCC DROPCLEANBUFFERS
GO
USE [CorruptDb]
GO
SELECT * FROM [dbo].[DemoData];
GO


If this goes to plan then you should receive this error:

Msg 824, Level 24, State 2, Line 27
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x3ea8609e; actual: 0x3ea8259e). It occurred during a read of page (1:320) in database ID 6 at offset 0x00000000280000 in file 'F:\DATA\CorruptDb_data.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.




5. Add the database into the Availability Group
Now that we have a corrupted database, let's add it into the availability group using the following statements

[06.AddCorruptDbToAG.sql]
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

-- Add the database to the AG on the Primary
:Connect SQLSERVER-0
USE [master]
GO
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg]
MODIFY REPLICA ON N'SQLSERVER-1' WITH (SEEDING_MODE = AUTOMATIC)
GO
USE [master]
GO
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg]
ADD DATABASE [CorruptDb];
GO

-- Make sure the secondary is set to Auto Seed with CREATE DB permissions
:Connect SQLSERVER-1
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg] GRANT CREATE ANY DATABASE;
GO


-- if auto seeding doesn't automatically work, check the logs as if you haven't cleaned up
-- the data and log files from previous demos they may prevent the auto seeding.

-- The following statements can be used if AUTO Seeding doesn't run (once the issue is resolved)
:Connect SQLSERVER-0
BACKUP DATABASE [CorruptDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.bak' WITH FORMAT;  
GO
 :Connect SQLSERVER-1
RESTORE DATABASE [CorruptDb] FROM DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.bak' WITH NORECOVERY;
GO 
:Connect SQLSERVER-0 
BACKUP LOG [CorruptDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.trn' WITH FORMAT;
GO  
:Connect SQLSERVER-1
RESTORE LOG [CorruptDb] FROM DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.trn' WITH NORECOVERY;  
GO
ALTER DATABASE [CorruptDb] SET HADR AVAILABILITY GROUP = [DbHealthOptDemoAg];  
GO  


6. Reviewing the behavior of a corrupted database in Availability Groups
Now you should have both databases in the Availability Group. Open the Availability Group Dashboard and take note that everything is in a healthy state.


.... but wait. Didn't we turn on DB health level detection for the AG and corrupt the database? Run a statement to force SQL Server to generate the logical consistency I/O error like before.

[07.GenerateLogicalConsistencyError.sql]
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

-- Generate the Logical Consistency check error
:Connect SQLSERVER-0
DBCC DROPCLEANBUFFERS
GO
USE [CorruptDb]
GO
SELECT * FROM [dbo].[DemoData];
GO

But did this cause a failover? Check the state of the Availability Group either within Object Explorer or within the dashboard.



Notice that this didn't cause a failover, because this error is not enough to affect the database state. If this database was not in an Availability Group it would still remain in an online state (as it did when we corrupted it). Database Level Health Detection only triggers when an event is significant enough to affect the database state (e.g. Offline, Suspect, etc). To quote the official documentation, "The database level health detection notices when a database is no longer in the online status".

So the lesson at this point is that database corruption, at least page level corruption, is not enough to trigger the failover. Even if you run DBCC CHECKDB it won't trigger the failover (HINT: give that a go for yourself). So the recommendation here is that you should be running regular DBCC CHECKDB on secondaries and failing over frequently (thus running consistency checks on all replicas regularly).

7. Simulate Suspect database state
So now let's look at what happens when something significant enough occurs to affect the database status. The easiest way to simulate this would be to use the ALTER DATABASE statement; however, when a database is added to an Availability Group there are many statements that the database engine will not allow you to execute, to protect the state of the database.

So the way we can replicate this is to simulate an I/O failure.

Remember that when we created the SuspectDb on SQLSERVER-0 I set the transaction log's path to a different disk/partition than the other databases.

Open Disk Management on the primary node SQLSERVER-0


Locate the Disk hosting the G:\ volume, or whatever volume you have stored the SuspectDb Transaction Log file. Right click the disk and select Offline.


In SQL Management Studio, refresh the Availability Groups branch of Object Explorer for SQLSERVER-0. Notice that it is still the primary replica.



At this stage we haven't caused an I/O operation (in our lab there are no maintenance tasks to back up the transaction logs). So let's insert some data to cause the transaction log to be accessed.

[08.WriteToSuspectDbTransLog.sql]
--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

:Connect SQLSERVER-0
USE [SuspectDb]
GO
INSERT INTO [dbo].[DemoData] (demoData) VALUES ('Data to push through the TransLog');
GO

Bingo, we generated an I/O error which is significant enough to affect the database status.

Msg 945, Level 14, State 2, Line 3
Database 'SuspectDb' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.


But what happened to the Availability Group? Refresh the Object Explorer for SQLSERVER-0, and notice the status of the database and the Availability Group role.


So it looks like we failed over, but only for DbHealthOptDemoAg, because that was the only AG where a database status changed. Let's check the state of our Availability Group post-failover. Switch to SQLSERVER-1 in Object Explorer and expand the Availability Groups, then open the Dashboard for the Availability Group.


This confirms we have failed over, with SQLSERVER-1 now the primary replica, because of the I/O error experienced on SQLSERVER-0 from a storage sub-system failure (loss of disk). The Database Level Health Detection setting worked; without that setting enabled this would not have caused a failover.


My final thoughts
The Database Level Health Detection setting is a great new capability for increasing the availability of your databases contained in an Availability Group when one of the databases experiences an issue significant enough to affect its status, like a loss of disk. It still will not protect you from certain data issues like corruption, so monitoring and maintenance are still critical!

Finally, you should also consider the Flexible Automatic Failover Policy. While I do not recommend changing this blindly, you should ensure you understand what role that setting plays in Automatic Failover.
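
As a quick way to see where an AG currently sits, a small read-only sketch (again using Invoke-Sqlcmd) that inspects the failover policy columns in sys.availability_groups might look like this:

# Minimal sketch only - inspects (does not change) the flexible failover policy settings
Import-Module SqlServer;

Invoke-Sqlcmd -ServerInstance 'SQLSERVER-1' -Query @"
SELECT name,
       failure_condition_level,   -- 1 to 5; controls what conditions trigger an automatic failover
       health_check_timeout,      -- milliseconds before sp_server_diagnostics is considered unresponsive
       db_failover                -- 1 = Database Level Health Detection enabled
FROM sys.availability_groups;
"@;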

All the scripts used in this post are located in my GitHub https://github.com/Matticusau/SQLDemos/tree/master/SQL2016AGDBHealthLevelDetection



References relevant to this post

Availability group database level health detection failover option

Credit to Paul Randal for how to use DBCC WRITEPAGE - https://www.sqlskills.com/blogs/paul/dbcc-writepage/


Configure Flexible Automatic Failover Policy

Flexible Automatic Failover Policy - Availability Group


Legal Stuff: The contents of this blog are provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility for the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Thursday, April 27, 2017

Installing SqlServer PowerShell module from PowerShellGallery.com

With the release of SQL Server Management Studio (SSMS) 17.0 the SqlServer PowerShell module has finally been decoupled and can now be installed independently. To take full advantage of this change I recommend using PowerShell 5.0 (if you're on Windows 10 or Server 2016 then this is your default).

Full details of the SSMS 17.0 updates https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms

The official installation details for SqlServer module are at
https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-ps-module but they are a bit light on detail currently, so this post will walk through the process in more detail. You can also find the public listing of the SqlServer module on PowerShellGallery.com.

Firstly check what modules you have installed.

Get-Module -Name SQL* -ListAvailable


If you have installed the SQL Server database engine then you should have SQLPS for that particular version. On my server I upgraded SSMS 2016 to SSMS 17.0, so I have both the 130 and 140 versions installed.

What I don't currently have installed is the SqlServer PowerShell module, and my SQLPS module hasn't been updated with the upgrade of SSMS to version 17.0

So let's download and install the latest SqlServer module.

First, check that your PowerShellGet is configured for the public gallery (the default setup).

Get-PSRepository



You should have the PSGallery repository set up. By default it will be untrusted, which just means you get prompted before installing any modules from it. It is recommended that public repositories not be marked as trusted, but internal private repositories can be, to make installations easier.

If you don't have any other repositories configured then you won't have to supply the repository name to any future commands, but if you do then you can simplify the module searches by specifying the repository name. This is very useful if you want to host private custom copies of the publicly available modules on a private repository.

Now you can search for the module in the gallery to check what the most recent available version is.

Find-Module -Name SqlServer

NOTE: If prompted to continue enter 'y'



Add the -AllVersions parameter to list all the available versions. Currently there is only one version in the gallery, but this will change over time.

To install the module from the gallery run Install-Module. Use the Scope parameter to install for either CurrentUser or AllUsers based on your needs. With PowerShell 5.0 it is also possible to install versions side-by-side. In my case I am installing the module for all users and requiring a specific version (the current one).

To avoid an error message like "A command with name '' is already available on this system" I will also use the -AllowClobber parameter.



Install-Module -Name SqlServer -Scope AllUsers -RequiredVersion 21.0.17099 -AllowClobber





Now check that the module is available for use on the server with

Get-Module -Name SqlServer -ListAvailable



Notice that the path is now one of the common $env:PSModulePath paths. This is also part of the improvements that these changes bring.

Going forward, to update the module you can use the Find-Module and Update-Module cmdlets. Look out for my post on automating that process next month.
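
For example, a quick check-and-update sketch might look like this (a sketch only; Update-Module requires that the module was originally installed with Install-Module):

# Compare the installed version with the latest version in the gallery and update if needed
$installed = Get-Module -Name SqlServer -ListAvailable | Sort-Object Version -Descending | Select-Object -First 1;
$latest    = Find-Module -Name SqlServer;

if ([version]$latest.Version -gt $installed.Version) {
    Update-Module -Name SqlServer;
}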


Happy PowerShell'ing


Legal Stuff: The contents of this blog are provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility for the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.