Thursday, November 15, 2018

Power BI Resources

The following is a list of resources useful for Power BI training and product news:

Product News


Power BI Blog
https://powerbi.microsoft.com/en-us/blog/


Syntax and Language Reference

Dax Reference

Social Media Resources

Follow these resources on twitter:
Power BI @MSPowerBI 
Will Thompson @Will_MI77

Training and Other Resources

Youtube channels (for training and learning):


Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Saturday, July 7, 2018

PowerBI calling a REST Api for each row in a table.

Take this scenario.
Your writing a PowerBI report to pull data from a REST Api. The Api doesn't provide a search route which returns all the rows and data you require in one call. You can retrieve a list of ID's for the object type you require but a separate call is required to get the individual details per row. Or you may have an existing table with the IDs stored, and require additional details which is provided by an API call per row.

I was asked about this scenario by a colleague, and it is definitely able to be solved with PowerBI. I will urge caution though as if you have large recordset on the base table, and you plan to do an API call per row, that will be slow and result in a large number of calls to the API in a short duration which could be flagged as an attack by some monitoring systems.

Demo API

For this blog I have provided a demo NodeJS API that you can download and run in VSCode, but you could also use any public API, such as my SQL Versions API (details on that will follow at a later time).

To run the API, follow the instructions in README.md.


Demo Data

The demo API has some static sample data. A collection of Course Instructors, which has a mapping to the Course collection and also a mapping to a Person collection. The complete sample data was taken from an old .Net app sample (https://msdn.microsoft.com/library/bb399731(v=vs.100).aspx).

Here is a snippet of the structure and data:

CourseInstructor = [
    {
      "CourseID": "1045",
      "PersonID": "5"
    },
    ...
];
Course = [
    {
        "CourseID": "1045",
        "Title": "Calculus",
        "Credits": "4",
        "DepartmentID": "7"
    },
    ...
];
Person = [
    {
        "PersonID": "1",
        "LastName": "Abercrombie",
        "FirstName": "Kim",
        "HireDate": "1995-03-11 00:00:00.000",
        "EnrollmentDate": null
    },
    ...
];



Lets get start

1. Get the base source
For this walk through lets start with the CourseInstructor data. This is a list of CoursID's and PersonIDs, but no other details. We will then use the individual API calls to build the completed table.
In the Demo API the API route is http://localhost:3000/api/courseinstructor.

In PowerBI select Get Data

In the form type "web" to narrow the results and locate the


In the next screen enter the API route (e.g. http://localhost:3000/api/courseinstructor). This should be a HTTPS uri for security.



If your API needs Authorization, or a Token, then you should set the appropriate headers. I have an example in one completed solution at the end.

2. Convert to table
The results will first be shown in a list as PowerBI will use the Json.Document() function to retrieve the data.



Click the "To Table" button in the menu ribbon.



In the To Table dialog you should be able to leave the defaults. However if the API JSON data isn't very structured, in that each document has different elements and size, then you may need to experiment with the error handling options.




3. Expand the columns
The results will now be a typical referenced table which you need to expand the columns on. Click on the expand columns button.


Select the required columns. This will depend on your API data, in the demo data cases we want both columns as we will add additional API routes to get that data. 

My tip is to untick the "Use original column name as prefix" option. Or if you want to use that make sure you have renamed the column to a meaningful name prior to this step. You can also always add a meaningful name by editing the step settings afterwards, but before you proceed with other steps.


The results should now be expanded as required.





4. Expand the data with additional API calls per row
Now for the fun part. We now have rows of data which contain our ID(s). To expand this data we need to make additional API calls for each row. To achieve this we add a custom column, and dynamically build the API Uri.

Click the Add Column tab in the ribbon, and click Custom Column.



Give the column a meaning full name (e.g. CourseCollection).

Here is the tricky part, we have to dynamically build the API Uri. To do this we can use the Text.Combine() function to concatenate the uri and add the column value where the parameter should be. For our demo data the completed Uri will be http://localhost:3000/course/[CourseID]

Adjust the following expression as required:

= Json.Document(
  Web.Contents(
    Text.Combine({"http://localhost:3000/api/course/",[CourseID]})
  )
)

If you need to provide headers for authorization then use the parameters of the web.Contents() function, for example:

= Json.Document(
  Web.Contents(
    Text.Combine({"http://localhost:3000/api/course/",[CourseID]}), [Headers=[#"Z-API-Token"=""]])))


NOTE: If prompted regarding the data privacy, click continue and set as appropriate.

IMPORTANT: The process of updating the table may take some time depending on the number of rows and speed of the API.

The results should now include the custom column (e.g. CourseCollection). 



This table needs to be expanded, the same as earlier by clicking on the expand columns button and then select the desired columns to include. In my demo data case, I am only interested in the Title column.



Your table should now include the data from the additional API call.


Continue to model the data as you require, or additional API calls. In the case of the demo data we will repeat these steps but for the Person column and the API Uri route http://localhost:3000/person/[PersonID]

The completed table will look like:


The query steps will be similar to the following:



The complete M Query for creating the query will be similar to this. This is available from the Advanced Editor of a query, and is a working query for the demo API.

let
    Source = Json.Document(Web.Contents("http://localhost:3000/api/courseinstructor")),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"CourseID", "PersonID"}, {"CourseID", "PersonID"}),
    #"Added CourseCollection" = Table.AddColumn(#"Expanded Column1", "CourseCollection", each Json.Document(
  Web.Contents(
    Text.Combine({"http://localhost:3000/api/course/",[CourseID]})
  )
)),
    #"Expanded CourseCollection" = Table.ExpandRecordColumn(#"Added CourseCollection", "CourseCollection", {"Title"}, {"Title"}),
    #"Added PersonCollection" = Table.AddColumn(#"Expanded CourseCollection", "PersonCollection", each Json.Document(
  Web.Contents(
    Text.Combine({"http://localhost:3000/api/person/",[PersonID]})
  )
)),
    #"Expanded PersonCollection" = Table.ExpandRecordColumn(#"Added PersonCollection", "PersonCollection", {"LastName", "FirstName"}, {"LastName", "FirstName"})
in
    #"Expanded PersonCollection"



Demo Files and Source

I have provided a completed PBIX in the github repository. 







Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.


Saturday, June 9, 2018

New SQL Version Life Cycle tool

Introducing my latest SQL Server Life Cycle and Versions lookup tool.

https://sqlversions.azurewebsites.net

Back in 2016 I released the first version of this app and the goals were the same. Provide an interface to easily lookup a SQL version number and find out the support status of that released. All backed with an API that could be queried from scripts and other tools.

This new release is a complete re-write and includes a modern responsive site written in Angular (because I felt the need to learn).

With this initial release I am providing the following capabilities in the UI:

Version Search

The search feature allows you to lookup a specific release of SQL Server or version number and then see all the releases that match that search result. In the results you can see the release date, mainstream and extended support dates.



Clicking on a row in the results will open the



Life Cycle

The life cycle search page is just a page to quickly lookup when certain branches of a product release will end mainstream support or extended support. This was created as while onsite consulting I sometimes need the ability to quickly look up life cycle information rather than a specific version. It was also a "wish list feature" from a customer I was working with at the time.




Health Check [beta]

The health check is a beta release of my end goal which is to allow you to enter your version number and get basic recommendations about how to upgrade. The initial release simply recommends to either start planning to upgrade or to urgently upgrade depending on that versions support status. My vision for this feature is to provide guidance about updates that have been released on the same branch to stay current and also life cycling plans for upgrades to service packs or next release.




API

One of my main goals with this app was to ensure that it had an API which could be queried and then used to extend other tools and scripts. For example you could incorporate it into a Health Check Script to retrieve the version number from the SQL Instance and then call the API to get information about the life cycle and support of that release. Or you could use Power BI taking data from your CMDB and performing a lookup against the API for the rows or a search against the API and store that as a query with a relationship link.

The following code provides an example of using the API in a Health Check script. The code is available at https://gist.github.com/Matticusau/5778b90507cb7274deebc12cf4360c1c

# Import the SQL Module
Import-Module SqlServer;

# Get the version number via appropriate method
$VersionNumber = Invoke-Sqlcmd -ServerInstance $SqlServerName -Query "SELECT SERVERPROPERTY('PRODUCTVERSION')";
$VersionNumber = $VersionNumber.Column1;

# Call the API to get the version information
$VersionData = Invoke-RestMethod -Uri "http://sqlserverbuildsapi.azurewebsites.net/api/builds?version=$($VersionNumber)";

# Want to improve your Health Check script, calculate the health of Support Status
if ($VersionData.ExtendedSupportEnd -le (Get-Date)) {$SupportStatus = 'Critical'}
elseif ($VersionData.MainstreamSupportEnd -le (Get-Date)) {$SupportStatus = 'Warning'}
else {$SupportStatus = 'Ok'}

# format the output data string
$OutputData = @"
Instance = $($Instance.Name)
Version = $($VersionData.BuildVersion)
Product = $($VersionData.ProductName)
Branch = $($VersionData.BranchName)
Update = $($VersionData.BuildUpdate)
MainstreamSupportEnd = $($VersionData.SupportEndMainstream)
ExtendedSupportEnd = $($VersionData.SupportEndExtended)
SupportStatus = $($SupportStatus)
"@

# Return the hashtable
$OutputData

Soon, I will update the samples repository I provided for the previous version to reflect the new API syntax.




Lastly a note on the data. There is no dynamic link from this data to any Microsoft or other site. The data is provided "as-is" and manually maintained by myself and a few trusted peers. We do our best to make sure it is up to date and accurate but for any business critical or commercial decision make sure you refer to the official sources.

If you like this tool, or have some ideas for improvements, or even notice inaccuracies in the data please let me know.


Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Sunday, May 27, 2018

Convert CSV files to Parquet using Azure HDInsight

A recent project I have worked on was using CSV files as part of an ETL process from on-premises to Azure and to improve performance further down the stream we wanted to convert the files to Parquet format (with the intent that eventually they would be generated in that format). I couldn't find a current guide for stepping through that process using Azure HDInsight so this post will provide that.

Scripts and what samples used in this guide are available https://github.com/Matticusau/SQLDemos/tree/master/HDInsightConvertCSVtoParquet

To follow this blog post make sure you have:

  1. Create a resource group in your Azure Subscription
  2. Create a Storage Account within the resource group
  3. Create an Azure HDInsight resource the same resource group (you can use that storage account for HDInsight)
  4. Upload the sample GZip compressed CSV files from the SampleData folder to the Storage Account using Azure Storage Explorer. In my case I uploaded to a container "DataLoad"
The work that we will perform will be within the Jupiter Notebook. 

From your Azure Portal locate the HDInsight resource, click the Cluster dashboard quick link


Now select the Jupiter Notebook


This will open a new tab/window.

Authenticate as the cluster administrator.

Create a new PySpark Notebook.


Paste the following lines and press Shift+Enter to run the cell.


from pyspark.sql import *
from pyspark.sql.types import *



Now we can import the CSV into a table. You will need to adjust the path to represent your storage account, container and file. The syntax of the storage path is wasb://mycontainer@myaccount.blob.core.windows.net/foldername/filename


# import the COMPRESSED data
csvFile = spark.read.csv('wasb://dataload@mlbigdatastoracc.blob.core.windows.net/SalesSample_big.csv.gz', header=True, inferSchema=True)
csvFile.write.saveAsTable("salessample_big")


Press Shift+Enter to run the cell

Once complete you can use the SQL language to query the table you imported the data to. This will create a dataframe to host the output as we will use this to write the parquet file.

dftable = spark.sql("SELECT * FROM salessample_big")
dftable.show()

The final step is to export the dataframe to a parquet file. We will also use the gzip compression.

dftable.write.parquet('wasb://dataload@mlbigdatastoracc.blob.core.windows.net/SalesSample2.parquet',None, None , "gzip")

The complete Jupiter Notebook should look like:



In your storage account you should have a Parquet export of the data (note that this format is not a single file as shown by the file, folder and child files in the following screen shots.





In this example you may notice that the compressed file sizes are not much different, yet the parquet file is slightly more efficient. You experience may vary as it depends on the content within the CSV file.


Some reference material worth checking out if this is something you are working on:

Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.

Saturday, April 21, 2018

New SqlOps Extensions

Announcing my new SqlOps Extensions

AlwaysOn Insights
https://github.com/Matticusau/sqlops-alwayson-insights/releases

MSSQL Instance Insights
https://github.com/Matticusau/sqlops-mssql-instance-insights/releases

MSSQL Db Insights
https://github.com/Matticusau/sqlops-mssql-db-insights/releases


Previously I had released a single extension, which logically was actually three separate extensions but due to the methods used at the time it could be released as a single extension. With the added support for Extension Marketplace in SqlOps March release it made senses to break these out. Going forward this will also help with the life cycle of the extensions to manage them individually.

If you want to know how I wrote these extensions check out the following posts by Kevin Cunnane

Writing a SQL Operations Studio Extension in 15 minutes
https://medium.com/@kevcunnane/writing-a-sql-operations-studio-extension-in-15-minutes-7dfd24a74dfe

Publishing an extension for SQL Operations Studio
https://medium.com/@kevcunnane/publishing-an-extension-for-sql-operations-studio-f5a5b323c13b


Monday, October 2, 2017

Troubleshooting SQL Server AlwaysOn AG Auto Seeding

SQL Server 2016 introduced a new capability with AlwaysOn Availability Groups called Automatic Seeding. This allows you to automatically start streaming the backup using VDI across the log stream transport to the secondary replica. With this capability enabled you do not have to manually backup and restore the database and transaction logs before starting data synchronization with the primary replica.

Now there are some prerequisites for using Automatic Seeding such as:

  • Data and Log file paths must be the same on all Replicas
  • Databases must be in Full recovery model and at least one Full backup must have been taken of the database (to start the recovery chain)
  • All other prerequisites for a database to be added to an Availability Group must also be met


You can enable Automatic Seeding either before or after creating the Availability Group.

Sometimes you will find that Automatic Seeding doesn't work, in our example it will be because a data file already exists with the same name as the database we are seeding. Which is a common problem if you have previously removed the replica from the AG and are trying to re-join it. Unfortunately the UI doesn't give you any indication that seeding failed.

Enter Extended Events.

To demonstrate this I will create an Availability Group across two replicas. I am using the SQL Server AlwaysOn solution available from the Azure Market Place as this has an Availability Group already built, but for this demonstration I will create a new AG and Demo databases.

1. Setup the environment
I have added an additional disk to the SQLSERVER-0 node, and formatted this new volume and assigned the drive letter G:. I also created the path G:\Log and set the file permissions to full control for the sql service account.

2. Create the Demo Database
The following script can be used to setup the environment.

[01.CreateDemoDbs.sql]

3. Create the Availability Group
The following script can be used to setup the environment.

[02.CreateAG.sql]


4. Verify Environment
Now you should have a working Availability Group with the AutoseedDb01 synchronized and healthy between two replicas.



5. Create AlwaysOn_AutoseedMonitor Extended Events session
While we enabled the standard AlwaysOn Health extended events session, run the following script to create a new Extended Events session for monitoring Autoseeding

[03.ExtendedEventsSession.sql]

If you want to create this manually or to explore what other events are available you will need to make sure you select the Debug channel when using the Wizard to select events.



You can also filter the category to "dbseed" to further just view the events that relate to auto seeding



6. Add the 2nd database to AG
Now use the following script to add the 2nd database to the AG. NOTE: This DB has the log file on G:\ which does not exist on the replica.

[04.AddDbToAG.sql]

7. Investigating the Auto Seeding
You will notice that the DB was not created on the replica and it is listed in the AG's DBs but with a warning.



If you open the AG Dashboard you will see the warning message as well.



The messages do not include much detail.

8. Using DMVs
We can use the following DMVs to query the status of the seeding.

[05.DMVs.sql]

9. Using Extended Events
We are capturing the "error_reported" event along with the dbseed events. If you view the Event File on the replica where the failure happened. Locate the hadr_physical_seeding_progress event. There will be a lot of these events. In the below screen shot you can see the progress event for one of the last stages of the seeding process where it reports the failure.



Locate the hadr_physical_seeding_progress event with the internal_state_desc of "WaitingForBackupToStartSending". This is an early step in the auto seed process, then you should see error_reported events. Here is where you can find the real errors. Most likely it will be the first one reported. In our case it is:
Directory lookup for the file "G:\LOG\AutoseedDb02_log.ldf" failed with the operating system error 3(The system cannot find the path specified.).




There will be many more cases where this is useful. I have also used it before when I had old data or log files on the server from previous versions of a database I was adding back into the AG as sometimes that operation will fail (depending on permissions).

Let me know what situations you find


All the scripts used in this post are located in my GitHub https://github.com/Matticusau/SQLDemos/tree/master/SQLAlwaysOnAGAutoSeeding



Reference

Automatic seeding for secondary replicas
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatic-seeding-secondary-replicas

Automatically initialize Always On Availability group
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatically-initialize-always-on-availability-group



Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.


Wednesday, September 27, 2017

Checklist for troubleshooting compilation of DSC Configuration in Azure Automation

I was recently working on a solution where a DSC Configuration block was not compiling in Azure Automation. This solution included Node Data, DSC Resources, and custom Composite Resources. The configuration would compile perfectly fine locally but not when published to Azure Automation. The other challenging aspect was that other composite resources within the same module were compiling fine.

Unfortunately Azure Automation doesn’t provide very detailed information for troubleshooting DSC Compilation errors. It actually will only show you the first level failure, which when your using a DSC Composite Resource it means you will simply receive an error that the composite resource failed to import, but the actual cause could be related to a problem with an underlying resource used by that composite resource.

So based on my experience I have come up with the following troubleshooting checklist when working through DSC Compilation errors in Azure Automation.

Troubleshooting Checklist
  1. Check the exception details output by the DSC Configuration compilation job that is in the suspended state. Either within the Azure Portal like the following screen shot or via PowerShell by the Get-AzureRmAutomationDscCompilationJobOutput cmdlet.



    Depending on the exception reported the next steps may vary. In the above screenshot it is reporting that a Composite Resource has failed to import.
  2. Can you compile the configuration locally?
    1. If yes, can you upload the MOF to the DSC Node Configurations in the Azure Automation account?
  3. Are all required Modules referenced by your Configuration(s):
    1. Uploaded to your Azure Automation account
    2. Up-to-date (see next point though)
    3. Match the required version by your Configuration block or Composite Resource
  4. If it is a Composite resource that is failing, are all Composite resources within your module affected or is it just a subset?
  5. If it is a Composite Resource, extract the configuration from the failing Composite Resource and place it directly in a Configuration block. Compile that configuration block in Azure Automation and review the output as this will provide more granular details about the specific resources used by that configuration block.
  6. Try simplifying the DSC Configuration block to reduce the number of DSC Composite resources or other resources being compiled to help narrow down the culprit

You should also read the "Common errors when working with Desired State Configuration (DSC)" section in the official documentation https://docs.microsoft.com/en-us/azure/automation/automation-troubleshooting-automation-errors


Legal Stuff: The contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.