MatticusAU Blog: SQL Server 2016

Showing posts with label SQL Server 2016. Show all posts

Monday, October 2, 2017

Troubleshooting SQL Server AlwaysOn AG Auto Seeding

SQL Server 2016 introduced a new capability with AlwaysOn Availability Groups called Automatic Seeding. This allows you to automatically start streaming the backup using VDI across the log stream transport to the secondary replica. With this capability enabled you do not have to manually backup and restore the database and transaction logs before starting data synchronization with the primary replica.

Now there are some prerequisites for using Automatic Seeding such as:

Data and Log file paths must be the same on all Replicas
Databases must be in Full recovery model and at least one Full backup must have been taken of the database (to start the recovery chain)
All other prerequisites for a database to be added to an Availability Group must also be met

You can enable Automatic Seeding either before or after creating the Availability Group.

Sometimes you will find that Automatic Seeding doesn't work, in our example it will be because a data file already exists with the same name as the database we are seeding. Which is a common problem if you have previously removed the replica from the AG and are trying to re-join it. Unfortunately the UI doesn't give you any indication that seeding failed.

Enter Extended Events.

To demonstrate this I will create an Availability Group across two replicas. I am using the SQL Server AlwaysOn solution available from the Azure Market Place as this has an Availability Group already built, but for this demonstration I will create a new AG and Demo databases.

1. Setup the environment
I have added an additional disk to the SQLSERVER-0 node, and formatted this new volume and assigned the drive letter G:. I also created the path G:\Log and set the file permissions to full control for the sql service account.

2. Create the Demo Database
The following script can be used to setup the environment.

[01.CreateDemoDbs.sql]

3. Create the Availability Group
The following script can be used to setup the environment.

[02.CreateAG.sql]

4. Verify Environment
Now you should have a working Availability Group with the AutoseedDb01 synchronized and healthy between two replicas.

5. Create AlwaysOn_AutoseedMonitor Extended Events session
While we enabled the standard AlwaysOn Health extended events session, run the following script to create a new Extended Events session for monitoring Autoseeding

[03.ExtendedEventsSession.sql]

If you want to create this manually or to explore what other events are available you will need to make sure you select the Debug channel when using the Wizard to select events.

You can also filter the category to "dbseed" to further just view the events that relate to auto seeding

6. Add the 2nd database to AG
Now use the following script to add the 2nd database to the AG. NOTE: This DB has the log file on G:\ which does not exist on the replica.

[04.AddDbToAG.sql]

7. Investigating the Auto Seeding
You will notice that the DB was not created on the replica and it is listed in the AG's DBs but with a warning.

If you open the AG Dashboard you will see the warning message as well.

The messages do not include much detail.

8. Using DMVs
We can use the following DMVs to query the status of the seeding.

[05.DMVs.sql]

9. Using Extended Events
We are capturing the "error_reported" event along with the dbseed events. If you view the Event File on the replica where the failure happened. Locate the hadr_physical_seeding_progress event. There will be a lot of these events. In the below screen shot you can see the progress event for one of the last stages of the seeding process where it reports the failure.

Locate the hadr_physical_seeding_progress event with the internal_state_desc of "WaitingForBackupToStartSending". This is an early step in the auto seed process, then you should see error_reported events. Here is where you can find the real errors. Most likely it will be the first one reported. In our case it is:
Directory lookup for the file "G:\LOG\AutoseedDb02_log.ldf" failed with the operating system error 3(The system cannot find the path specified.).

There will be many more cases where this is useful. I have also used it before when I had old data or log files on the server from previous versions of a database I was adding back into the AG as sometimes that operation will fail (depending on permissions).

Let me know what situations you find

All the scripts used in this post are located in my GitHub https://github.com/Matticusau/SQLDemos/tree/master/SQLAlwaysOnAGAutoSeeding

Reference

Automatic seeding for secondary replicas
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatic-seeding-secondary-replicas

Automatically initialize Always On Availability group
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatically-initialize-always-on-availability-group

Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Monday, August 14, 2017

Taking the Availability Group Database Level Health Detection option for a spin

With SQL Server 2016 the Availability Groups now include the option to enable Database Level Health Detection. While this option is turned off by default, Microsoft strongly recommends you enable this on all Availability Groups.... it is just off by default for backwards compatibility and so you opt-in by choice.

In this post I will take a deep look into just what sort of database issues will cause the AG to failover with this option enabled.

For this walk through I am using the SQL Server AlwaysOn deployment in the Azure Market Place and have provided a link to all the scripts below so you can try out this functionality and show off your skills in your own demonstrations. If you don't have an Azure Subscription all you need is an environment with two Replicas and multiple disks mounted in the virtual machines (so you can separate the transaction logs of different databases).

To follow along with the scripts and screen shots here are the details of resources within my test lab.

Primary Replica: SQLSERVER-0
Secondary Replica: SQLSERVER-1
File Share Witness: CLUSTER-FSW

Existing Availability Group: Contoso-ag
AG Database: AutoHa-sample

To make this demo easier, I have created a file share on the File Share Witness to store backups. It is best practice that when using Availability Groups you use a central location for storing backups from all replicas. Obviously a File Share Witness is not that place but for this demo lab it is fine.

Central Backup Share: \\cluster-fsw\sqlbackups

All the scripts used in this post are located here. This link is also contained within my final thoughts at the end of this post.

Ok, lets get started.

1. Create the demo databases
To enhance this demo, we will setup new databases and an availability group to show that how this setting only impacts one AG and not another.

Run the following TSQL to setup the demo databases. NOTE: I have explicitly put the Transaction Logs for the SuspectDb on a different volume to the CorruptDb

[01.CreateDbs.sql]

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:Connect SQLSERVER-0

USE master
GO

-- CorruptDb
CREATE DATABASE [CorruptDb] CONTAINMENT = NONE
ON PRIMARY (NAME = N'CorruptDb_data', FILENAME = N'F:\DATA\CorruptDb_data.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
LOG ON ( NAME = N'CorruptDb_log', FILENAME = N'F:\LOG\CorruptDb_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
GO
ALTER DATABASE [CorruptDb] SET COMPATIBILITY_LEVEL = 130
GO
ALTER DATABASE [CorruptDb] SET PAGE_VERIFY CHECKSUM  
GO
USE [CorruptDb]
GO
CREATE TABLE [dbo].[DemoData] (id INT IDENTITY PRIMARY KEY, demoData VARCHAR(200));
GO
INSERT INTO [dbo].[DemoData] (demoData) VALUES ('Test data prior to simulating page level corruption');
GO 200

-- SuspectDb (NOTE: G:\ for TransLog)
CREATE DATABASE [SuspectDb] CONTAINMENT = NONE
ON PRIMARY (NAME = N'SuspectDb_data', FILENAME = N'F:\DATA\SuspectDb_data.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
LOG ON ( NAME = N'SuspectDb_log', FILENAME = N'G:\LOG\SuspectDb_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
GO
ALTER DATABASE [SuspectDb] SET COMPATIBILITY_LEVEL = 130
GO
ALTER DATABASE [SuspectDb] SET PAGE_VERIFY CHECKSUM  
GO
USE [SuspectDb]
GO
CREATE TABLE [dbo].[DemoData] (id INT IDENTITY PRIMARY KEY, demoData VARCHAR(200));
GO
INSERT INTO [dbo].[DemoData] (demoData) VALUES ('Test data prior to making the database suspect');
GO 200

2, Create the demo Availability Group
We won't actually add our corrupt database into this Availability Group yet, that will come in the next few steps.

[02.CreateAG.sql]

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

-- Values you may need to change for your lab
-- Primary replica: SQLSERVER-0
-- Secondary replica: SQLSERVER-1
-- Backup Central Share: \\CLUSTER-FSW\SQLBACKUPS

---------------------------------------------------------
-- Prereqs for Availability Group
---------------------------------------------------------
:Connect SQLSERVER-0
-- Backup demo databases to share \\CLUSTER-FSW\SQLBACKUPS
BACKUP DATABASE [CorruptDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb.bak' WITH FORMAT;  
BACKUP DATABASE [SuspectDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\SuspectDb.bak' WITH FORMAT;  
GO  

---------------------------------------------------------
-- AG Endpoints
---------------------------------------------------------
-- If you are using your own lab then you need to create a Database Mirroring end-point
-- The Azure Market Place lab already has the end-point created for you, check these with
 :Connect SQLSERVER-0
SELECT * FROM sys.database_mirroring_endpoints;
GO
  :Connect SQLSERVER-1
SELECT * FROM sys.database_mirroring_endpoints;
GO
-- If you need to create an endpoint use the following syntax
--CREATE ENDPOINT hadr_endpoint  
--    STATE=STARTED   
--    AS TCP (LISTENER_PORT=5022)   
--    FOR DATABASE_MIRRORING (ROLE=ALL);  
--GO  
  
 
---------------------------------------------------------
-- Start the AlwaysOn Extended Events Session
---------------------------------------------------------
-- On the primary replica
:Connect SQLSERVER-0
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END
GO
-- On the secondary replica
:Connect SQLSERVER-1
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
  ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END
GO


 ---------------------------------------------------------
 -- Create the AG on the primary
 ---------------------------------------------------------
:Connect SQLSERVER-0
USE [master];
CREATE AVAILABILITY GROUP DbHealthOptDemoAg   
 WITH (DB_FAILOVER = ON)
 FOR DATABASE SuspectDb
 REPLICA ON
  'SQLSERVER-0' WITH   
   (  
   ENDPOINT_URL = 'TCP://SQLSERVER-0.Contoso.com:5022',  
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,  
   FAILOVER_MODE = AUTOMATIC,
   SEEDING_MODE = AUTOMATIC
   ),  
  'SQLSERVER-1' WITH   
   (  
   ENDPOINT_URL = 'TCP://SQLSERVER-1.Contoso.com:5022',  
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,  
   FAILOVER_MODE = AUTOMATIC,
   SEEDING_MODE = AUTOMATIC
   );   
GO  

 ---------------------------------------------------------
 -- Join the secondary replica to the AG (allow for seeding)
 ---------------------------------------------------------
:Connect SQLSERVER-1
ALTER AVAILABILITY GROUP DbHealthOptDemoAg JOIN;  
GO  
ALTER AVAILABILITY GROUP DbHealthOptDemoAg GRANT CREATE ANY DATABASE;
GO

After that step you should have an Availability Group that looks like this:

On your primary replica you should have the following databases

On the secondary replica you should only have the SuspectDb for now. We still need to do some work to setup the CorruptDb.

If you do not have the SuspectDb on the secondary replica after joining the replica to the AG then the Automatic Seeding option may not have succeeded. The script provided contains the required steps to restore and join the db on the seconary replica.

3. Verify Database Level Health Detection option is enabled
Open the properties of the AG and make sure the option is checked.

Alternatively check the value of db_failover is 1 in sys.availability_groups [04.VerifyAg.EnableDBHealthLevelDetection.sql]

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:Connect SQLSERVER-0
USE [master]
GO
SELECT name, db_failover FROM sys.availability_groups
GO

If you need to enable the setting either use the GUI or run 03.AlterAG.EnableDBHealthLevelDetection.sql

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:Connect SQLSERVER-0
USE [master]
GO
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg] SET(DB_FAILOVER = ON);
GO

4. Corrupt the demo database

The first thing we are going to look at is what happens when you have page level corruption, and is corruption really corruption. Full credit to Paul Randle for an existing blog on how to do this. See https://www.sqlskills.com/blogs/paul/dbcc-writepage/

Because we are going to use DBCC WRITEPAGE and need the database to be in single user mode, we have not yet been able to add it to the Availability Group. This won't impact the demo though, as when a database is brought online or added to an availability group the integrity of data pages are not validated.

Run the following statements to corrupt the database. WARNING!!! Use this at your own risk and never ever ever ever do this on a production environment.

[05.CorruptData.sql]

-- get the page information of our demo database and table
:Connect SQLSERVER-0
DBCC IND (N'CorruptDb', N'DemoData', -1);
GO

-- Corrup the page data (remember to change the page number)
:Connect SQLSERVER-0
ALTER DATABASE [CorruptDb] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'CorruptDb', 1, [PageId], 4000, 1, 0x45, 1);
GO
ALTER DATABASE [CorruptDb] SET MULTI_USER;
GO

-- check that we have caused page level corruption, cause a data check
:Connect SQLSERVER-0
DBCC DROPCLEANBUFFERS
GO
USE [CorruptDb]
GO
SELECT * FROM [dbo].[DemoData];
GO

If this goes to plan then you should receive this error:

Msg 824, Level 24, State 2, Line 27
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x3ea8609e; actual: 0x3ea8259e). It occurred during a read of page (1:320) in database ID 6 at offset 0x00000000280000 in file 'F:\DATA\CorruptDb_data.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

5. Add the database into the Availability Groups
Now that we have a corrupted database, lets add it into the availability group using the following statements

[06.AddCorruptDbToAG.sql]

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

-- Add the database to the AG on the Primary
:Connect SQLSERVER-0
USE [master]
GO
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg]
MODIFY REPLICA ON N'SQLSERVER-1' WITH (SEEDING_MODE = AUTOMATIC)
GO
USE [master]
GO
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg]
ADD DATABASE [CorruptDb];
GO

-- Make sure the secondary is set to Auto Seed with CREATE DB permissions
:Connect SQLSERVER-1
ALTER AVAILABILITY GROUP [DbHealthOptDemoAg] GRANT CREATE ANY DATABASE;
GO


-- if auto seeding doesn't automatically work, check the logs as if you haven't cleaned up
-- the data and log files from previous demos they may prevent the auto seeding.

-- The following statements can be used if AUTO Seeding doesn't run (once the issue is resolved)
:Connect SQLSERVER-0
BACKUP DATABASE [CorruptDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.bak' WITH FORMAT;  
GO
 :Connect SQLSERVER-1
RESTORE DATABASE [CorruptDb] FROM DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.bak' WITH NORECOVERY;
GO 
:Connect SQLSERVER-0 
BACKUP LOG [CorruptDb] TO DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.trn' WITH FORMAT;
GO  
:Connect SQLSERVER-1
RESTORE LOG [CorruptDb] FROM DISK = N'\\CLUSTER-FSW\SQLBACKUPS\CorruptDb_addtoAG.trn' WITH NORECOVERY;  
GO
ALTER DATABASE [CorruptDb] SET HADR AVAILABILITY GROUP = [DbHealthOptDemoAg];  
GO

6. Reviewing the behavior of a corrupted database in Availability Groups
Now you should have both databases in the Availability Group. Open the Availability Group Dashboard and take note that everything is in a health state.

.... but wait. Didn't we turn on DB health level detection for the AG and corrupt the database. Run a statement to force SQL Server to generate the Logical consistency I/O error like before

[07.GenerateLogicalConsistencyError.sql]

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

-- Generate the Logical Consistency check error
:Connect SQLSERVER-0
DBCC DROPCLEANBUFFERS
GO
USE [CorruptDb]
GO
SELECT * FROM [dbo].[DemoData];
GO

But did this cause a failover. Check the state of the Availability Group either within Object Explorer or within the dashboard.

Notice that this didn't cause a failover. This is because this is not enough to affect the database state. If this database was not in an Availability Group it would still remain in an online state (as it was when we corrupted it). The Database Health Level Detection only triggers when an event is significant enough to affect the database state (e.g. Offline, Suspect, etc). To quote the official documentation "The database level health detection notices when a database is no longer in the online status".

So the lesson at this point is that database corruption, at least page level corruption is not enough to trigger the failover. Even if you run a DBCC CHECKDB it won't trigger the failover (HINT: give that a go for yourself). So the recommendation here is that you should be running regular DBCC CHECKDB on secondaries and failing over frequently (thus running consistency checks on all replicas regularly).

7. Simulate Suspect database state
So now lets look at what happens when something significant happens to affect the database status. The easiest way to simulate this would be to use the ALTER DATABASE statement, however when a database is added to an Availability Group there are many statements that the database engine will not allow you to execute to protect the state of the database.

So the way we can replicate this is to simulate an I/O failure.

Remember when we created the SuspectDb on SQLSERVER-0 I set the Transaction Log's path to a different disk/partition to the other databases.

Open Disk Management on the primary node SQLSERVER-0

Locate the Disk hosting the G:\ volume, or whatever volume you have stored the SuspectDb Transaction Log file. Right click the disk and select Offline.

In SQL Management Studio, refresh the Availability Groups branch of Object Explorer for SQLSERVER-0. Notice that it is still the primary replica.

At this stage we haven't caused an I/O operation (in our lab there is no Maintenance Tasks to backup the Transaction Logs). So lets insert some data to cause the Transaction Log to be accessed.

[08.WriteToSuspectDbTransLog.sql]

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.

:Connect SQLSERVER-0
USE [SuspectDb]
GO
INSERT INTO [dbo].[DemoData] (demoData) VALUES ('Data to push through the TransLog');
GO

Bingo we generated an I/O which is significant enough to affect the database status.

Msg 945, Level 14, State 2, Line 3
Database 'SuspectDb' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

But what happened to the Availability Group. Refresh the Object Explorer for SQLSERVER-0, and notice the status of the database and the Availability Group role.

So it looks like we failed over, but only for the DBHealthOptAg, because that was the only AG where a database status changed. Let's check the state of our Availability Group though post failover. Switch to SQLSERVER-1 in Object Explorer and expand the Availability Groups. Open the Dashboard for the Availability Group.

This confirms we have failed over with SQLSERVER-1 now the primary replica due to the I/O error experienced on SQLSERVER-0 due to a storage sub-system error (loss of disk). The DB Health Level Detection setting worked. Without that setting enabled this would not have caused a failover.

My final thoughts

The Database Level Health Check setting is a great new capability for increasing the availability of your databases contained in an Availability Group when one of the databases experiences a significant enough issue to affect the status of the database, like a loss of disk. It still will not protect you from certain data issues like corruption. So monitoring and maintenance is still critical!

Finally, you should also consider the Flexible Automatic Failover Policy. While I do not recommend changing this blindly, you should ensure you understand what role that setting plays in Automatic Failover.

All the scripts used in this post are located in my GitHub https://github.com/Matticusau/SQLDemos/tree/master/SQL2016AGDBHealthLevelDetection

References relevant to this post

Availability group database level health detection failover option

Credit to Paul Randle for how to use DBCC WRITEPAGE - https://www.sqlskills.com/blogs/paul/dbcc-writepage/

Configure Flexible Automatic Failover Policy

Flexible Automatic Failover Policy - Availability Group

Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Tuesday, December 20, 2016

Why is the with_copy option critical when restoring a database configured with StretchDb

StretchDB is a new capability in SQL Server 2016 which provides the ability to effectively partition your data between warm and cold data, and more importantly allow SQL Server to automatically host that data in an Azure SQL Database.... sounds great in concept and in cases where you have a large amount of cold data which just needs to be queried and not updated (e.g. archived data for reporting).

Now, there is one critically thing you really need to be aware of and that is when it comes time to Restore or more specifically Migrate the on-premises database that is stretched. Let's take this scenario for example.

You are planning to upgrade the production SQL Server to SQL Server 2016 SP1. The business needs do not allow for an in-place upgrade, so you have built a new server and are planning to migrate production database to the new server.

It could be a DR scenario but more likely the issues I am talking about will be during migrations.

The issue you need to be aware of is that after restoring the on-premises database you then need to reauthorize the connection to the Azure SQL Database (official details at https://msdn.microsoft.com/en-us/library/mt733205.aspx). When you perform this step, you have to specify the parameter WITH_COPY which can either point the database to the existing remote table in the Azure SQL Database, or create a new copy of the remote table to use. The recommended approach is to use a new copy of the remote table (with_copy = 1). I would also recommend this approach for database migrations, and then manually clean up the old table. The reason I recommend this is as I will show in this post, if your old/current database is still in use and you point a new database to the same remote table, you can experience conflicts between the data reconciliations and this will result in cold data loss.

So let's explore the issue.

Firstly here is the environment I am working in:

On-premises SQL Server 2016 SP1 (13.0.4001.0). I have one database StretchDBDemo which contains the following tables:

dbo.Department

DepartmentID, DepartmentName, OfficeLoc

dbo.People

PersonID, FullName, IsEmployee, PhoneNumber, EmailAddress

The full schema is available in the script at the end of this post.

We are going to use the dbo.People table to demonstrate this issue. So lets start with some data.

INSERT INTO [dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (1, 'Matt', 1, '0','yourname@email.com');
INSERT INTO [dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (2, 'Steve', 0, '0','yourname@email.com');

Next step is to setup StretchDB feature. This has to be done through the GUI and is pretty straight forward but here are the steps I have used:

Right click on the Database StretchDBDemo in Object Explorer
Select Tasks > Stretch > Enable

In the GUI select Next
Place a tick in the box next to the dbo.People table. Notice the warning icon. Important to note as it does have some relevance to the behaviour we will explore.
Click the "Entire Table" link under the Migrate column to launch the filter wizard

Setup a filter called IsEmployee False, with the predicate "IsEmployee = 0". Click the Check button and then Done.

NOTE: If you don't have any rows in the dbo.People table that match this predicate you won't be able to proceed past this point.

Click Next and then authenticate to Azure.
Select either an existing Azure SQL Database Server or create a new StretchDB server.
Follow the wizard to completion. I recommend taking a screen snippet of the summary page for reference.

You can then check that the StretchDb feature is working by using the Monitor from the Stretch menu on the Database object.

You could also query the following tables to check this data manually
```
SELECT * FROM sys.dm_db_rda_migration_status
SELECT * FROM sys.dm_db_rda_schema_update_status
```

You will need to check the configuration of the StretchDb, this is critical as we need the Azure SQL Server address.

-- the db config

SELECT * FROM sys.remote_data_archive_databases
-- the tables config
SELECT * FROM sys.remote_data_archive_tables
-- the data source
SELECT * FROM sys.external_data_sources

Now that you have the remote server's address, in SSMS Object Explorer connect to the Azure SQL Database server

Expand the Databases branch in Object Explorer, expand your stretched remote database (the name is in the tables queried above). Then expand Tables. Note how no tables are displayed here. Any stretched tables are deliberately hidden and you should not query those tables in general practice. However we are troubleshooting/demonstrating an issue so we will query the tables directly.
Query sys.Tables to find the name of the stretched table
Now query the stretched Table in the Azure SQL Database. You should have 1 record for 'Steve'.

Lets add a new record to our on-premises database that will be stretched to the remote server.

INSERT INTO [dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (3, 'Chris', 0, '0','yourname@email.com');

Using the Monitor or the TSQL queries from earlier check on the status of the StretchDb feature. After some time query the stretched table in the Azure SQL Database again to make sure the new record is stretched there.

Now the fun part starts. Lets simulate a database migration. In my lab I am just using the same server, but the experience is the same.

Take a backup of the on-premises StretchDbDemo database. You could elect to disable StretchDb first and bring all the data back on-premises, but we trust the Azure backups to keep our data in the cloud safe.
Next restore the backup to the new database we are migrating the application to. I have used the database name StretchDbDemoNew.
Once the database is restored it isn't immediately Stretching database as it needs to be re-authorized to use the remote server. This is outlined in https://msdn.microsoft.com/en-us/library/mt733205.aspx.

First get the name of the credential that exists. If you were migrating this to another server you will need to recreate the credential on that server. The name of the credential needs to match the remote server address.
```
SELECT * FROM sys.database_scoped_credentials
```

Now use the following statement to

-- reauthorize using the existing credential information
USE [StretchDBDemoNew];
GO
EXEC sp_rda_reauthorize_db 
  @credential = N'',
  @with_copy = 0;
GO

Here is where the problem begins.

When you execute sp_rda_reauthorize_db and specify with_copy = 0, it uses the existing Azure SQL Database as the endpoint. So now we actually have two on-premises databases StretchDBDemo and StretchDBDemoNew, both pointing to the same Azure SQL Database table for stretched data. If you specify with_copy = 1, it actually creates a copy of the data in a new table and therefore the old and new on-premises databases have different stretch data end-points.

So why does this matter. Well lets add some data to the new on-premises database because well we want to make sure it's all work as you would in any migration.

INSERT INTO [StretchDBDemoNew].[dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (10, 'Bill', 0, '0','yourname@email.com');

HINT: Any rows I insert into the new database I will use a PersonID >= 10 as this helps with the demonstration.

Now this should get stretched to the Azure SQL Database. So switch to that query window and check it's there.

NOTE: it might take a few minutes for the reconciliation to occur.

Here is where things get interesting. After a few minutes try querying the data again and see if there is any change. While writing this post the remote record disappeared, however when writing the scripts it didn't. So what is happening..... to find out lets compare local and remote data more.

With the new record still in Azure, query the old on-premises database. If your lucky you will find the record inserted into the new database which has been stretched to Azure also returned when you query the original database. Your experience here may differ as it's all a matter of timing.

Just to add fuel to the fire I inserted another record into my new database that would be stretched to Azure.

INSERT INTO [StretchDBDemoNew].[dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (11, 'Bart', 0, '0','yourname@email.com');

But what about our existing on-premises database. Remember in my scenario we are only testing the migration so you would assume there are still active connections querying that database and potentially data being updated which would then stretch into Azure. So lets insert two records into that database which will be stretch.

INSERT INTO [StretchDBDemo].[dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (4, 'Chad', 0, '0','yourname@email.com');
INSERT INTO [StretchDBDemo].[dbo].[People] ([PersonID], [FullName], [IsEmployee], [PhoneNumber], [EmailAddress]) VALUES (5, 'Dan', 0, '0','yourname@email.com');

So in our existing database we now have this data being returned

What about in our new database.

Wait a minute, those records 'Chad' and 'Dan' were just inserted into our existing database, and where did our records for 'Bill' and 'Bart' go.

Here is the issue, with both on-premises databases pointing to the same remote database through the with_copy = 0 option, it means that the reconciliation processes in the databases will conflict. One of the databases will effectively become the master and overwrite what records the other sends. As I mentioned both times I ran through the scripts I got different experiences where one time the new database was the master, and the other time (writing this blog) the original database was the master and overwrote the records.

So, the good news is that if you use with_copy = 1 option then the databases use separate remote tables and therefore do not conflict. So my recommendation is ALWAYS USE "WITH_COPY = 1" when reauthorizing a database to the remote server.

In saying that a few things to keep in mind, over time if you move the database a lot you could end up with lots of tables in the remote database so best to monitor that if you want to keep your costs down.

Also if at any stage you are querying the stretched tables in the on-premises databases and you have this conflict situation you could experience a number of cases of this error.

I hope this blog post helps when it comes time to planning your migrations. Any questions or feedback please leave them in the comments section.

Get the complete demo scripts here.

Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Sunday, August 21, 2016

New SQL Server Builds API.... improve your health checks with version and support information

Things have been a bit busy for me personally over the last few months but that is another story. What I have done is use some of this time to get back to a little project I started a while back.

Introducing the SQL Server Builds API..... providing version information including release dates, and known support life-cycle information.

Overview

The SQL Server Builds API provides SQL Server Version information and known support life-cycle information. As it is a personal project of mine the data is currently manually maintained based on known sources. The main objective of the tool is to enhance health check and project planning type processes, it is not intended for licensing or other legal auditing processes.

In prior roles and particularly in my current role one of the most common checks I perform on a SQL Server is the version and also the patching process. To aid in this what I would do is run reports based off my own custom database of version numbers to check and validate the version of the instance being checked.

Querying the API
To query the API it is pretty simple. The base URI is http://sqlserverbuildsapi.azurewebsites.net/api/SQLVersions

The API takes two parameters

Version - This is a string representation of the version of your SQL Instance returned by SELECT SERVERPROPERTY('PRODUCTVERSION') e.g. '12.00.2342.00'
MissingUpdates - A boolean value (True/False) of if to include any updates released since the current patch level in the results

So if I wanted to query for information about the version number 12.00.2342.00 my completed URI would be something like:

http://sqlserverbuildsapi.azurewebsites.net/api/SQLVersions?Version=12.00.2342.00&MissingUpdates=false

NOTE: The MissingUpdates param defaults to false so you could omit it from the URI above.

Where the data comes from

The official source of truth with life-cycle data has to be Microsoft Support Life-cycle (https://support.microsoft.com/lifecycle), and this API/tool is by no means trying to replace that. In fact if you are doing any official auditing for licensing or any other legal means you MUST use that source over this API's data. At least perform your own validation of the data for each record.

Most DBA's and SQL admins would refer to http://sqlserverbuilds.blogspot.com.au. I am no different and I would previously refer to that site as well. So, it is only natural that it is still one of the sources I go to for building my API's database.

The database behind this API is currently manually maintained, and even if it was somewhat automated mistakes can happen. If you find I am missing any updates, or if there is any incorrect data please make sure you reach out and let me know.

Suggestions for using the API

The API could be used in any number of tools and checks. Here are just some I can think of:

Custom health check scripts
Enhance on-premise CMDB data
PowerBI reports
Other custom Applications that interface/manage SQL Server instances

PowerShell Examples

#1 - Get version information (static version number)

# Set the version number to test
$VersionNumber = '12.00.2342.00' # SQL 2014

# Call the API to get the version information
$VersionData = Invoke-RestMethod -Uri "http://sqlserverbuildsapi.azurewebsites.net/api/SQLVersions?Version=$($VersionNumber)";

# at this point simply return the $VersionData variable for the raw info
$VersionData

# Want to improve your Health Check script, calculate the health of Support Status
if ($VersionData.ExtendedSupportEnd -le (Get-Date)) {$SupportStatus = 'Critical'}
elseif ($VersionData.MainstreamSupportEnd -le (Get-Date)) {$SupportStatus = 'Warning'}
else {$SupportStatus = 'Ok'}

# format the output data string
$OutputData = @"
Instance = $($Instance.Name)
Version = $($VersionData.Version)
Product = $($VersionData.Product)
Branch = $($VersionData.Branch)
Update = $($VersionData.Update)
MainstreamSupportEnd = $($VersionData.MainstreamSupportEnd)
ExtendedSupportEnd = $($VersionData.ExtendedSupportEnd)
SupportStatus = $($SupportStatus)
"@

# Return the hashtable
ConvertFrom-StringData -StringData $OutputData;

#2 - Get Missing updates (static version number)

# Set the version number to test
$VersionNumber = '11.00.2342.00' # SQL 2012

# Call the API to get the known missing updates (released since current version)
$VersionData = Invoke-RestMethod -Uri "http://sqlserverbuildsapi.azurewebsites.net:80/api/SQLVersions?Version=$($VersionNumber)&MissingUpdates=$true";

# return the output, it is already formated as an Array
$VersionData

# or as a table
$VersionData | Format-Table -Property Version,Branch,Update,@{Name="Released";Expression={Get-Date $_.Released -Format 'yyyy-MM-dd'}},Description -AutoSize

Other Examples
There are more examples in my GitHub repository - https://github.com/Matticusau/SQLServerBuildsAPI-Examples

Including:

Simple static value examples like above
Both Single and Multiple Instance examples with live version number data direct from a SQL Instance
c# Windows Form client

Just want to see it in action, I have an MVC form available at https://sqlserverbuildsportal.azurewebsites.net

Where is the source code for the API

At this time I am not making the API opensource.

Want to partner with me

This is a personal project. If you have a similar project or a project that you would like to use this API in please reach out and lets improve the world together.

If you have any questions around this API please make sure you comment or reach out to me through any of my contact means.

Friday, May 29, 2015

SQL Server 2016: Query Store first looks

In case you missed the announcement SQL Server 2016 CTP2 preview was publicly announced today. You can read about it in the following post:

http://blogs.technet.com/b/dataplatforminsider/archive/2015/05/27/sql-server-2016-first-public-preview-now-available.aspx

There are already some training materials up on https://technet.microsoft.com/en-us/virtuallabs

The first new feature I noticed was a new type of actual query plan to include, Live Query Statistics.

When enabled the actual query plan is displayed as the query is executed, however it now also displays the real time progress as each stage of the query is executed. You can see this in the following screen shot where the Sort is at 65% (and subsequently all tasks after it are also at 65%). I wasn't fast enough to capture it before it reached the sort but it really does visually show you real time query execution. I'm going to have to find a really large workload to try it out on ;)

One of the new feature Query Store is one I have been "geeking out" on since earlier in the year and I am so glad I can now talk about it publicly and here is my first look at the feature first hand.

First off when you view the properties of a database there is a new configuration section of "Query Store" as seen in the following screen shot.

Before you can access this feature you need to enable it on each database. Like all data collection there is a small resource overhead but it's minimal due to the layer of integration so just enable it.... and given the product is in preview status should there be performance impact file a Connect Item and let the product team know.

So now you have enabled it you will have the ability to customise the data frequency, buffer hardening (flush), data retention/grooming, etc. Here are the defaults.

In my lab I have adjusted these values to be more granular, for the purpose of this blog, but you will need the find the sweet spot for your environment ;)

In the earlier screen shot of the database properties you can also see that there are graphical representations of the Query Store storage usage and a button to flush the storage if you need to.

Now if you refresh the database in Object Explorer you will see a new Query Store branch which contains some useful views.

These views offer some interesting data that previously we would have had to use extra tools to obtain.

The following shows the Overall Resource Consumption view.

Each of the views has the ability to be configured to adjust the metrics, date range, etc

The most interesting of the views in my opinion is the Top Resource Consuming Queries as this shows the top X queries on the top left box, and the query plans which have been used to return the results in the right top box.

When you select a "plan id" it displays the cached plan for that id in the lower pane.

Why is the so interesting, well like you can see in this screen shot I have had two plans for the same query. One with Parallelism and one without..... and from the bubble chart I can see which one has executed at times with the highest resource consumption and which has executed with the least resource consumption (based on my charts configuration). So from this data you can then make a choice if you should "force" a plan, or if you have forced the plan previously then you can 'unforce' it.

This makes all that effort of trying to figure out if a query plan has changed and is causing a degradation in performance, and then having to extract that plan and create the guide for it. Or many other changes in patterns of performance behavior.

Behind the scenes there is a great deal of data being captured about query execution to which we can access from a number of system tables.

For example on the table sys.query_store_query you can see the data around the internals of query times such as compilation time, parse time, etc.

I am definitely looking forward to exploring this feature further and all the time I am going to get back when looking into performance issue.

Obviously this is CTP2 and features/functionality will change before RTM.

Legal Stuff: As always the contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Thursday, May 7, 2015

SQL Server 2016 annoucement

SQL Server 2016 has been publically announced with the public Release Candidate coming in the American Summer. I will be using this post to collate blog posts and announcements regarding SQL 2016, and in particular the features which I am looking forward to.

SQL Server 2016 public preview coming this summer
http://blogs.technet.com/b/dataplatforminsider/archive/2015/05/04/sql-server-2016-public-preview-coming-this-summer.aspx

Query Store
http://slavasql.blogspot.com.au/2014/11/newest-sql-server-feature-query-store.html

Legal Stuff: As always the contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.