Tag Archives: development

Git: how to avoid checking in secrets (using a Powershell pre-commit hook)

Who among us hasn’t found him- or herself in this very awkward position: committing a config or code file with secrets (such as passwords or API keys) and then semi-panicked googling how to delete it from source control.

Been there and let me tell you the easiest way to delete it: copy all the code on disk, delete the repository completely and then re-create it.

(if this is not an option, well, there’s still a way but with much more work and risk, so do keep that code backup around!)

But you know what’s even better? That’s right, avoid this in the first place! That’s why Git hooks are so useful: they work without you neededing to remember to check your config files every time.

So here’s my solution to this:

  1. In the repository, go to .git/hooks and rename pre-commit.sample to pre-commit (i.e. remove the extension)
  2. Open pre-commit with a text editor and replace its contents with the following:
#!/bin/sh
C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe -ExecutionPolicy Bypass -Command '.\hooks\pre-commit.ps1'
  1. Add a new directory on the root of the repository named hooks.
  2. Inside this, add a text file named pre-commit.ps1 with the following code:
#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, July 2022
#
Clear-Host
$ErrorActionPreference='Stop'

# ===== Change here =====
$listOfExtensions=@('*.xml','*.config')
$listOfSecretNodes=@('username','password','clientid','secret','connectionstring')
$acceptableString='lalala'
# ===== Change here =====

$codePath = (Get-Item -Path $PSScriptRoot).Parent.Parent.FullName

$errorList=New-Object -TypeName 'System.Collections.ArrayList'

foreach($ext in $listOfExtensions) {
    $list = Get-ChildItem -Path $codePath -Recurse -Filter $ext

    foreach($file in $list) {
        $fileName = $file.FullName
        if ($fileName.Contains('\bin\')) {
            continue
        }
        Write-Host "Checking $fileName for secrets"
        [xml]$xml=[xml]((Get-Content -Path $fileName).ToLowerInvariant())
        foreach($secretName in $listOfSecretNodes) {
            $nodes = $xml.SelectNodes("//*[contains(local-name(), '$secretName')]")
            foreach($node in $nodes) {
                if ($node.InnerText.ToLowerInvariant() -ne $acceptableString) {
                    $str = "[$fileName] $($node.Name) contains text other than '$acceptableString', please replace this with $acceptableString before commiting."
                    $errorList.Add($str) | Out-Null
                    Write-Warning $str
                }
            }
        }
    }
}

if ($errorList.Count -gt 0) {
    Write-Error 'Commit cancelled, please correct before commiting.'
}

So there you have it. I’m getting automatically stopped every time I tried to commit any .xml or .config file that contains a node with a name that contains username, password, clientid, secret or connectionstring, whenever the value of it is not ‘lalala’.

Obviously the extensions, node names and acceptable string can be changed at the top of the script. You can also change this quite easily to check JSON files as well.

Also note that this works on Windows (because of the Powershell path in the pre-commit hook) but with a minor change in the pre-commit bash script, you should be able to make it work cross-platform with Powershell core. I haven’t tested it but it should be:

#!/usr/bin/env pwsh -File '.\hooks\pre-commit.ps1'

Have fun coding!

Please don’t write logs inside Program Files (here’s how to do it right)

So the other day I’m troubleshooting a Windows Service that keeps failing on a server, part of a product we’re using in the company. Long story short, that’s what the problem was:

Access to the path 'C:\Program Files\whatever\whatever.log is denied'

I mean, dear programmer, look. You want to write your application’s logs as simple text files. I get it. Text files are simple, reliable (if the file system doesn’t work, you have bigger problems than logging) and they’re shown in virtually every coding tutorial in every programming language. Depending on the case, there might be better ways to do that such as syslog, eventlog and others.

But sure, let’s go with text files. Take the following example somewhere in the middle of a Python tutorial. Look at line 3:

import logging

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
logging.warning('This will get logged to a file')

Did you notice? This code writes the log in the same place as the binary. It’s not explicitly mentioned and usually you wouldn’t give it a second thought, right?

To be clear, I don’t want to be hard on the writers of this or any other tutorial; it’s just a basic tutorial, and as such it should highlight the core concept. A professional developer writing an enterprise product should know a bit more!

But the thing is, these examples are everywhere. Take another Java tutorial and look at line 16:

package com.javacodegeeks.snippets.core;

import java.util.logging.Logger;
import java.util.logging.FileHandler;
import java.util.logging.SimpleFormatter;
import java.io.IOException;

public class SequencedLogFile {

    public static final int FILE_SIZE = 1024;
    public static void main(String[] args) {

        Logger logger = Logger.getLogger(SequencedLogFile.class.getName());
        try {
            // Create an instance of FileHandler with 5 logging files sequences.
            FileHandler handler = new FileHandler("sample.log", FILE_SIZE, 5, true);
            handler.setFormatter(new SimpleFormatter());
            logger.addHandler(handler);
            logger.setUseParentHandlers(false);
        } catch (IOException e) {
            logger.warning("Failed to initialize logger handler.");
        }
        logger.info("Logging info message.");
        logger.warning("Logging warn message.");
    }
}

Or this Dot Net tutorial, which explains how to set up Log4Net (which is great!) and gives this configuration example. Let’s see if you can spot this one. Which line is the problem?

<log4net>
  <root>
    <level value="ALL" />
    <appender-ref ref="LogFileAppender" />
  </root>
  <appender name="LogFileAppender" type="log4net.Appender.RollingFileAppender">
    <file value="proper.log" />
    <lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
    <appendToFile value="true" />
    <rollingStyle value="Size" />
    <maxSizeRollBackups value="2" />
    <maximumFileSize value="1MB" />
    <staticLogFileName value="true" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%d [%t] %-5p %c %m%n" />
    </layout>
  </appender>
</log4net>

If you answered “7”, congrats, you’re starting to get it. Not using a path -this should be obvious, I know, but it’s easy to forget nevertheless- means writing in the current path, which by default is wherever the binary is.

So this works fine while you’re developing. It works fine when you do your unit tests. It probably works when your testers do the user acceptance testing or whatever QA process you have.

But when your customers install the software, the exe usually goes to C:\Program Files (that’s in Windows; in Linux there are different possibilities as explained here, but let’s say /usr/bin). Normal users do not have permission to write there; an administrator can grant this, but they really really really shouldn’t. You’re not supposed to tamper with the executables! Unless you’re doing some maintenance or an upgrade of course.

So how do you do this correctly?

First of all, it’s a good idea to not reinvent the wheel. There are many, many, MANY libraries to choose from, some of them very mature, like log4net for Dot Net or log4j for Java.

But if you want to keep it very simple, fine. There are basically two ways to do it.

If it’s a UI-based software, that your users will use interactively:

Create a directory under %localappdata% (by default C:\Users\SOMEUSER\AppData\Local) with the brand name of your company and/or product, and write in there.

You can get the localappdata path using the following line in Dot Net:

string localAppDataPath = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);

Take for example the screen-capturing software called Greenshot. These guys do it right:

If it’s a non-interactive software, like a Windows Service:

You can do the same as above, but instead of Environment.SpecialFolder.LocalApplicationData use Environment.SpecialFolder.CommonApplicationData, which by default is C:\ProgramData. So your logs will be in C:\ProgramData\MyAmazingCompany\myamazingproduct.log.

Or -not recommended, but not as horrible as writing in Program Files- you can create something custom like C:\MyAmazingCompany\logs. I’ll be honest with you, it’s ugly, but it works.

But in any case, be careful to consider your environment. Is your software supposed to run on Windows, Linux, Mac, everything? A good place to start is here, for Dot Net, but the concept is the same in every language.

And, also important, make your logging configurable! The location should be changeable via a config file. Different systems have different requirements. Someone will need the logs somewhere special for their own reasons.

But whatever you do, PLEASE PLEASE PLEASE DON’T WRITE WHERE THE BINARY IS. DON’T WRITE IN C:\PROGRAM FILES. IT. DOES. NOT. WORK.

Powershell & Microsoft Dynamics CRM: get results and update records with paging

I’ve written before an example on how to use Powershell and FetchXml to get records from a Dynamics CRM instance. But there’s a limit, by default 5000 records, on how many records CRM returns in a single batch -and for good reason. There are many blog posts out there on how to increase the limit or even turn it off completely but this is missing the point: you really really really don’t want tens or hundreds of thousand -or, god forbid, millions- of records being returned in a single operation. That would probably fail for a number of reasons, not to mention it would slow the whole system to a crawl for a very long time!

So we really should do it the right way, which is to use paging. It’s not even hard! It’s basically almost the same thing, you just need to add a loop.

That’s the code I wrote to update all active records (the filter is in the FetchXml, so you can just create yours and the code doesn’t change). I added a progress indicator so that I get a sense of performance.

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, June 2020
#
# Prerequisites:
# 1. Install PS modules
#    Run the following in a powershell with admin permissions:
#       Install-Module -Name Microsoft.Xrm.Tooling.CrmConnector.PowerShell
#       Install-Module -Name Microsoft.Xrm.Data.PowerShell -AllowClobber
#
# 2. Write password file
#    Run the following and enter your user's password when prompted:
#      Read-Host -assecurestring | convertfrom-securestring | out-file C:\usr\crm\crmcred.pwd
#
# ============ Constants to change ============
$pwdFile = "C:\usr\crm\crmcred.pwd"
$username = "myuser@mycompany.com"
$serverurl = "https://myinstance.crm4.dynamics.com"
$fetchxml = "C:\usr\crm\all_active.xml"
# =============================================

Clear-Host
$ErrorActionPreference = "Stop"

# ============ Login to MS CRM ============
$password = get-content $pwdFile | convertto-securestring
$cred = new-object -typename System.Management.Automation.PSCredential -argumentlist $username,$password
try
{
    $connection = Connect-CRMOnline -Credential $cred -ServerUrl $serverurl
}
catch
{
    Write-Host $_.Exception.Message 
    exit
}
if($connection.IsReady -ne $True)
{
    $errorDescr = $connection.LastCrmError
    Write-Host "Connection not established: $errorDescr"
    exit
}
else
{
    Write-Host "Connection to $($connection.ConnectedOrgFriendlyName) successful"
}

# ============ Fetch data ============
[string]$fetchXmlStr = Get-Content -Path $fetchxml

$list = New-Object System.Collections.ArrayList
# Be careful, NOT zero!
$pageNumber = 1
$pageCookie = ''
$nextPage = $true

$StartDate1=Get-Date

while($nextPage)
{    
    if ($pageNumber -eq 1) {
        $result = Get-CrmRecordsByFetch -conn $connection -Fetch $fetchXmlStr 
    }
    else {
        $result = Get-CrmRecordsByFetch -conn $connection -Fetch $fetchXmlStr -PageNumber $pageNumber -PageCookie $pageCookie
    }

    $EndDate1=Get-Date
    $ts1 = New-TimeSpan –Start $StartDate1 –End $EndDate1

    $list.AddRange($result.CrmRecords)

    Write-Host "Fetched $($list.Count) records in $($ts1.TotalSeconds) sec"    

    $pageNumber = $pageNumber + 1
    $pageCookie = $result.PagingCookie
    $nextPage = $result.NextPage
}


# ============ Update records ============
$StartDate2=Get-Date

$i = 0
foreach($rec in $list) {
    $crmId = $rec.accountid
    $entity = New-Object Microsoft.Xrm.Sdk.Entity("account")
    $entity.Id = [Guid]::Parse($crmId)
    $entity.Attributes["somestringfieldname"] = "somevalue"
    $entity.Attributes["somedatefieldname"] = [datetime]([DateTime]::Now.ToString("u"))
    $connection.Update($entity)
    $i = $i+1
    # this shows progress and time every 1000 records
    if (($i % 1000) -eq 0) {
        $EndDate2=Get-Date
        $ts2 = New-TimeSpan –Start $StartDate2 –End $EndDate2
        Write-Host "Updating $i / $($list.Count) in $($ts2.TotalSeconds) sec"
    }
}

$EndDate2=Get-Date
$ts2 = New-TimeSpan –Start $StartDate2 –End $EndDate2

Write-Host "Updated $($list.Count) records in $($ts2.TotalSeconds) sec"

For my purposes I used the following FetchXml. You can customize it or use CRM’s advanced filter to create yours:

<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false">
  <entity name="account">
    <attribute name="accountid" />
    <order attribute="accountid" descending="false" />
    <filter type="and">
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
  </entity>
</fetch>

Something to keep in mind here is to minimize the amount of data being queried from CRM’s database and then downloaded. Since we’re talking about a lot of records, it’s wise to check your FetchXml and eliminate all fields that are not needed.

Enjoy!

Lefkogeia – a REST API test server

Lefkogeia is a server to test your REST APIs with. When it runs, it accepts every request made on the configured IP/port and returns an HTTP 200 “Thank you for your {method name, GET/POST etc}”. As you can imagine, I developed it for my own needs and then thought it would be handy for others, so I published it on Github.

You can download its first release here. The project’s intro page is here.

It logs all requests in a directory imaginatively called logs. It creates an access.txt file where all requests are written, and one file per request (000001.txt, 000002.txt etc) in which the request’s payload (e.g. an xml or a json) is written.

Usage

The primary use of Lefkogeia is to test/debug/troubleshoot REST API and web services clients. You run it (see release notes on that) and get your client to call it. It will log whatever was sent, allowing you to troubleshoot whatever problem you might have.

Configuration

To configure Lefkogeia, edit the appsettings.json file with a text editor.

An example appsettings.json file to serve multiple addresses & ports would be:

{
	"Logging": {
		"LogLevel": {
			"Default": "Debug",
			"System": "Information",
			"Microsoft": "Information"
		}
	},
	"Host": {
		"Url": [
			"http://localhost:6800",
			"http://server1:7777",
			"http://147.102.43.3:4545"
		]
	}
}

Paths are not yet supported in URLs, so if you change http://server1:7777 to http://server1:7777/testapi you will get an error. This is planned for the next release.

Also note that in order to use https:// you need to generate a certificate by running

dotnet dev-certs https --trust

For more info see https://go.microsoft.com/fwlink/?linkid=848054.

But why “Lefkogeia”, what does this even mean?

Because it’s such a beautiful place! Lefkogeia is a small village in southern Crete, Greece, with amazing beaches like Ammoudi, Shinaria, Klisidi and more. You can read more in Tripadvisor.

Powershell: How do you add inline C#?

Powershell is great for admin tasks. Stuff like iterating through files and folders, copying and transforming files are very, very easily done. But inevitably there will always be stuff that are easier to do via a “normal” language such as C#.

Trying to solve a problem I had at work, I needed to transform a CSV file by changing the fields -which is easily done via powershell- and, at the same time, do a “get only the highest record of every group”. This is done with LINQ, which you can use in powershell but it’s cumbersome and will result in many, many lines of code.

So I wanted to do this in a more clean way, in C#. The general template to include C# inside a powershell script is the following:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, November 2018
#
# Here goes the C# code:
Add-Type -Language CSharp @"
using System; 
namespace DotJim.Powershell 
{
    public static class Magician 
    {
        private static string spell = ""; 
        public static void DoMagic(string magicSpell) 
        {
            spell = magicSpell; 
        }
        public static string GetMagicSpells() 
        {
            return "Wingardium Leviosa\r\n" + spell; 
        }
    }
}
"@;

# And here's how to call it:
[DotJim.Powershell.Magician]::DoMagic("Expelliarmus")
$spell = [DotJim.Powershell.Magician]::GetMagicSpells()

Write-Host $spell

Note here that the C# classes don’t have to be static; but if they are, they’re easier to call (no instantiation needed). Of course this only works if all you need to do is provide an input and get a manipulated output. If you need more complex stuff then yes, you can use non-static classes or whatever C# functionality solves your problems. Here’s the previous example, but with a non-static class:

#
# Source: DotJim blog (https://dandraka.com)
# Jim Andrakakis, November 2018
#
# Here goes the C# code:
Add-Type -Language CSharp @"
using System; 
namespace DotJim.Powershell 
{
    public class Magician 
    {
        private string spell = ""; 
        public void DoMagic(string magicSpell) 
        {
            spell = magicSpell; 
        }
        public string GetMagicSpells() 
        {
            return "Wingardium Leviosa\r\n" + spell; 
        }
    }
}
"@;

# Here's how to create an instance:
$houdini = New-Object -TypeName DotJim.Powershell.Magician
# And here's how to call it:
$houdini.DoMagic("Expelliarmus")
$spell = $houdini.GetMagicSpells()

Write-Host $spell

UPDATE: What if you need to use some assembly in the C# code, either a system one or a dll that your have anywhere on disk?
In this case, you add -ReferencedAssemblies to the Add-Type line, like this:
Add-Type -Language CSharp -ReferencedAssemblies @('System.Xml', 'System.Runtime.Serialization', "C:\app\ServiceBusExplorer\Microsoft.ServiceBus.dll") @"
using System;
using System.Runtime.Serialization;
using System.Xml;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;
namespace DotJim.Powershell
{
// etc etc
}
"@;

The main advantage of having C# inside the powershell script (and not in a separate dll file) is that it can be deployed very easily with various Devops tools. Otherwise you need to deploy the dll alongside which can, sometimes, be the source of trouble.

So here’s my complete working code, which worked quite nicely:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, November 2018
#
# The purpose of this script is to read a CSV file with bank data
# and transform it into a different CSV.
#
# 1. The Bank class is a POCO to hold the data which I need
#    from every line of the CSV file.
# 2. The Add() method of the BankAggregator class adds the
#    record to the list after checking the data for correctness.
# 3. The Get() methof of the BankAggregator class does a
#    LINQ query to get the 1st (max BankNr) bank record
#    from every record with the same Country/BIC.
#    It then returns a list of strings, formatted the way
#    I want for the new (transformed) CSV file.
#
# Here is where I inline the C# code:
Add-Type -Language CSharp @"
using System;
using System.Collections.Generic;
using System.Linq;
namespace DotJim.Powershell {
 public class Bank {
  public int BankNr;
  public string Country;
  public string BIC;
 }
 public static class BankAggregator {
  private static List list = new List();
  public static void Add(string country, string bic, string bankNr) {
   //For debugging
   //Console.WriteLine(string.Format("{0}{3}{1}{3}{3}{2}", country, bic, bankNr, ";"));
   int mBankNr;
   // Check data for correctness, discard if not ok
   if (string.IsNullOrWhiteSpace(country) ||
    country.Length != 2 ||
    string.IsNullOrWhiteSpace(bic) ||
    string.IsNullOrWhiteSpace(bankNr) ||
    !int.TryParse(bankNr, out mBankNr) ||
    mBankNr & gt; = 0) {
    return;
   }
   list.Add(new Bank() {
    BankNr = mBankNr, Country = country, BIC = bic
   });
  }
  public static List Get(string delimiter) {
   // For every record with the same Country & BIC, keep only
   // the record with the highest BankNr
   var bankList = from b in list
   group b by new {
    b.Country, b.BIC
   }
   into bankGrp
   let maxBankNr = bankGrp.Max(x = & gt; x.BankNr)
   select new Bank {
    Country = bankGrp.Key.Country,
     BIC = bankGrp.Key.BIC,
     BankNr = maxBankNr
   };
   // Format the list the way I want the new CSV file to look
   return bankList.Select(x = & amp; amp; amp; amp; amp; amp; amp; amp; amp; amp; amp; amp; amp; gt; string.Format("{0}{3}{1}{3}{3}{2}",
    x.Country, x.BIC, x.BankNr, delimiter)).ToList();
  }
 }
}
"@;

# Read one or more files with bank data from the same dir
# where the script is located ($PSScriptRoot)
$srcSearchStr = "source_bankdata*.csv"
$SourcePath = $PSScriptRoot
$destPath = $SourcePath

$fields = @("Country","BIC","EmptyField","BankId")

$filesList = Get-ChildItem -Path $SourcePath -Filter $srcSearchStr

foreach ($file in $filesList)
{
Write-Host "Processing" $file.FullName

# Fields in the source CSV:
# BANKNUMMER  = BankNr
# BANKLAND    = Country
# BANKSWIFT   = BIC
$data = Import-Csv -Path $file.FullName -Delimiter ";"

foreach ($item in $data)
{
# Call the C# code to add the CSV lines to the list
[DotJim.Powershell.BankAggregator]::Add($item.BANKLAND,$item.BANKSWIFT,$item.BANKNUMMER)
}

# Call the C# code to get the transformed data
$list = [DotJim.Powershell.BankAggregator]::Get(";")

Write-Host "Found" $list.Count "valid rows"

# Now that we have the list, write it in the new CSV
Out-File -FilePath "$destPath\transformed_bankdata_$(New-Guid).csv" -Encoding UTF8 -InputObject $list
}

Have fun coding!

Powershell & Microsoft Dynamics CRM: how to get results using a FetchXml

[Update June 2020] There’s a newer post that does the same as this and is more complete -it includes paging and updating records. You might want to check it out here.

If you’ve used Microsoft CRM as a power user (on-premise or online), chances are you’ve come across the standard way of querying CRM data, FetchXml.

You can run this by hand but of course the real power of it is using it to automate tasks. And another great way to automate tasks in Windows is, naturally, powershell.

So here’s a script I’m using to run a fetch xml and export the results to a csv file:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, May 2018
#
# Prerequisites:
# 1. Install PS modules
#    Run the following in a powershell with admin permissions:
#       Install-Module -Name Microsoft.Xrm.Tooling.CrmConnector.PowerShell
#       Install-Module -Name Microsoft.Xrm.Data.PowerShell -AllowClobber
#
# 2. Write password file
#    Run the following and enter your user's password when prompted:
#      Read-Host -assecurestring | convertfrom-securestring | out-file C:\temp\crmcred.pwd
#
# ============ Constants to change ============
$pwdFile = "C:\temp\crmcred.pwd"
$username = "myusername@mycompany.com"
$serverurl = "https://my-crm-instance.crm4.dynamics.com"
$fetchXmlFile = "c:\temp\fetch.xml"
$exportfile = "C:\temp\crm_export.csv"
$exportdelimiter = ";"
# =============================================
# ============ Login to MS CRM ============
$password = get-content $pwdFile | convertto-securestring
$cred = new-object -typename System.Management.Automation.PSCredential -argumentlist $username,$password
try
{
    $connection = Connect-CRMOnline -Credential $cred -ServerUrl $serverurl
    # for on-prem use :
    #   $connection = Connect-CrmOnPremDiscovery -Credential $cred -ServerUrl $serverurl
    # you can also use interactive mode if you get e.g. problems with multi-factor authentication
    #   $connection = Connect-CrmOnlineDiscovery -InteractiveMode -Credential $cred
    # or you can use a connection string if you want to use e.g. OAuth or a Client Secret
    # but then the password must be plaintext which is kind of a security no-no
    #   $connString = "AuthType=ClientSecret;url=$serverurl;ClientId=$username;ClientSecret=$password"
    #   $connection = Get-CrmConnection -ConnectionString $connString
}
catch
{
    Write-Host $_.Exception.Message
    exit
}
if($connection.IsReady -ne $True)
{
    $errorDescr = $connection.LastCrmError
    Write-Host "Connection not established: $errorDescr"
    exit
}
else
{
    Write-Host "Connection to $($connection.ConnectedOrgFriendlyName) successful"
}
# ============ Fetch data ============
$fetchXml = [xml](Get-Content $fetchXmlFile)
$result = Get-CrmRecordsByFetch -conn $connection -Fetch $fetchXml.OuterXml
# ============ Write to file ============
# Obviously here, instead of writing to csv directly, you can loop and do whatever suits your needs, e.g. run a db query, call a web service etc etc
$result.CrmRecords | Select -Property lastname, firstname | Export-Csv -Encoding UTF8 -Path $exportfile -NoTypeInformation -Delimiter $exportdelimiter

When you use your own FetchXml, do remember to change the properties in the last line (lastname, firstname).

For a quick test, the example FetchXml I’m using is the following:

<fetch mapping="logical" version="1.0">
    <entity name="account">
        <attribute name="customertypecode" alias="customertypecode"/>
        <attribute name="name" alias="company_name"/>
        <attribute name="emailaddress1" alias="company_emailaddress1"/>
        <link-entity name="contact" from="accountid" to="accountid" link-type="inner">
            <attribute name="lastname" alias="lastname"/>
            <attribute name="firstname" alias="firstname"/>
        </link-entity>
    </entity>
</fetch>

Have fun coding!

Do execution plans change when using different filter values?

(short answer: yes!)

Anyone who develops software that interacts with a database knows (read: should know) how to read a query execution plan, given by “EXPLAIN PLAN”, and how to avoid at least the most common problems like a full table scan.

It is obvious that a plan can change if the database changes. For example if we add an index that is relevant to our query, it will be used to make our query faster. And this will be reflected in the new plan.

Likewise if the query changes. If instead of

SELECT * FROM mytable WHERE somevalue > 5

the query changes to

SELECT * FROM mytable WHERE somevalue IN 
  (SELECT someid FROM anothertable)

the plan will of course change.

So during a database performance tuning seminar at work, we came to the following question: can the execution plan change if we just change the filter value? Like, if instead of

SELECT * FROM mytable WHERE somevalue > 5

the query changes to

SELECT * FROM mytable WHERE somevalue > 10

It’s not obvious why it should. The columns used, both in the SELECT and the WHERE clause, do not change. So if a human would look at these two queries, they would select the same way of executing them (e.g. using an index on somevalue if one is available).

But databases have a knowledge we don’t have. They have statistics.

Let’s do an example. We’ll use Microsoft SQL server here. The edition doesn’t really matter, you can use Express for example. But the idea, and the results, are the same for Oracle or any other major RDBMS.

First off, let’s create a database. Open Management Studio and paste the following (changing the paths as needed):

CREATE DATABASE [PLANTEST]  
CONTAINMENT = NONE  
ON  PRIMARY  
( NAME = N'PLANTEST',  
FILENAME = N'C:\DATA\PLANTEST.mdf' ,  
SIZE = 180MB , FILEGROWTH = 10% )  
LOG ON  
( NAME = N'PLANTEST_log',  
FILENAME = N'C:\DATA\PLANTEST_log.ldf' ,  
SIZE = 20MB , FILEGROWTH = 10%) 
GO

Note that, by default, I’ve allocated a lot of space, 180MB. There’s a reason for that; We know that we’ll pump in a lot of data, and we want to avoid the delay of the db files growing.

Now let’s create a table to work on:

USE PLANTEST 
GO 
CREATE TABLE dbo.TESTWORKLOAD 
( testid int NOT NULL IDENTITY(1,1), 
testname char(10) NULL, 
testdata nvarchar(36) NULL )  
ON [PRIMARY] 
GO 

And let’s fill it (this can take some time, say around 5-10 minutes):

DECLARE @cnt1 INT = 0;
DECLARE @cnt2 INT = 0;

WHILE @cnt1 < 20
BEGIN
	SET @cnt2 = 0;
	WHILE @cnt2 < 100000
	BEGIN
	   insert into TESTWORKLOAD (testname, testdata) 
             values ('COMMON0001', CONVERT(char(36), NEWID()));
	   SET @cnt2 = @cnt2 + 1;
	END;
	insert into TESTWORKLOAD (testname, testdata) 
          values ('SPARSE0002', CONVERT(char(36), NEWID()));
	SET @cnt1 = @cnt1 + 1;
END;
GO

What I did here is, basically, I filled the table with 2 million (20 * 100000) plus 20 rows. Almost all of them (2 million) in the testname field, have the value “COMMON0001”. But a few, only 20, have a different value, “SPARSE0002”.

Essentially the table is our proverbial haystack. The “COMMON0001” rows are the hay, and the “SPARSE0002” rows are the needles 🙂

Let’s examine how the database will execute these two queries:

SELECT * FROM TESTWORKLOAD WHERE testname = 'COMMON0001';
SELECT * FROM TESTWORKLOAD WHERE testname = 'SPARSE0002';

Select both of them and, in management studio, press Control+L or the “Display estimated execution plan” button. What you will see is this:

What you see here is that both queries will do a full table scan. That means that the database will go and grab every single row from the table, look at the rows one by one, and give us only the ones who match (the ones with COMMON0001 or SPARSE0002, respectively).

That’s ok when you don’t have a lot of rows (say, up to 5 or 10 thousand), but it’s terribly slow when you have a lot (like our 2 million).

So let’s create an index for that:

CREATE NONCLUSTERED INDEX [IX_testname] ON [dbo].[TESTWORKLOAD]
(
	[testname] ASC
)
GO

And here’s where you watch the magic happen. Select the same queries as above and press Control+L (or the “Display estimated execution plan” button) again. Voila:

What you see here is that, even though the only difference between the two queries is the filter value, the execution plan changes.

Why does this happen? And how?

Well, here’s where statistics are handy. On the Object Explorer of management studio, expand (the “+”) our database and table, and then the “Statistics” folder.

You can see the statistic for our index, IX_testname. If you open it (double click and then go to “details”) you see the following:

So (I’m simplifying a bit here, but not a lot) the database knows how many rows have the value “COMMON0001” (2 million) and how many the value “SPARSE0002” (just 20).

Knowing this, it concludes (that’s the job of the query optimizer) that the best way to execute the 2 queries is different:

The first one (WHERE testname = ‘COMMON0001’) will return almost all the rows of the table. Knowing this, the optimizer decides that it’s faster to just get everything (aka Full Table Scan) and filter out the very few rows we don’t need.

For the second one (WHERE testname = ‘SPARSE0002’), things are different. The optimizer knows that it’s looking only for a few rows, and it’s smartly using the index to find them as fast as possible.

In plain English, if you want the hay out of a haystack, you just get the whole stack. But if you’re looking for the needles, you go find them one by one.

How to overload static methods in C#

Let’s say I have an abstract generic class and a descendant:

public abstract class AuditObject<T> : ActiveRecordBase<T>;

(yes I’m using ActiveRecord) and

public class Employee : AuditObject<Employee>

In both of them I define some static Methods, e.g.

public static DataTable GetLookupTable(String where, Int32 topRows)
{
  return doExtremelyCleverStuffToFetchData(where, topRows);
}

(in the Employee class you need public new static or else you get a compiler warning)

As the code is, when I call e.g.

DataTable myList = AuditObject<T>.GetLookupTable("inactive = 0", 100);

…and T is Employee, the static method is not “overriden” i.e. the one that is executed is the method in AuditObject, not Employee .So in AuditObject I modified the static methods (in this example, GetLookupTable) like this :

public static DataTable GetLookupTable(String where, Int32 topRows)
{
  DataTable tbl = null;
  Boolean hasOverride = hasMethodOverride("GetLookupTable");
  if (hasOverride)
  {
    tbl = invokeStaticMethod<T>("GetLookupTable", new Object[2] { where, topRows }) as DataTable;
  }
  else
  {
    tbl = doExtremelyCleverStuffToFetchData(where, topRows);
  }
  return tbl;
}

Here’s how I find out if the static method exists :

private static Boolean hasMethodOverride(String methodName)
{
  var methodQuery =
    from method in typeof(T).GetMethods(
    BindingFlags.Static | BindingFlags.Public | BindingFlags.InvokeMethod)
    where method.Name == methodName
    select method;
  return methodQuery.Count() > 0;
}

And here’s how the “override” method is called :

public static Object invokeStaticMethod<T>(String MethodName, Object[] Args)
{
return typeof(T).InvokeMember(MethodName,
  BindingFlags.Public | BindingFlags.Static | BindingFlags.InvokeMethod,
  null, null, Args);
}

Voila ! When I call, say, DataTable myList = AuditObject<T>.GetLookupTable(“inactive = 0”, 100); and T is Employee, I get results from the static method defined in the Employee class.

How to make Visual Studio 2012 look (almost) the same as 2010

If you’re like me, you HATE HATE HATE the look of VS 2012. It’s not only ugly; it’s unergonomic.

So naturally, a number of people have worked to make VS 2012 look like 2010. VS 2010’s look, IMNSHO, was a lot clearer and developer-friendlier.

So here’s a list of steps that have been tested and work :

0. Close both VS 2012 and VS 2010

1. If you haven’t already, install VS 2012 Update 2 or later (here)

2. With Update 2 or later, a new theme called “Blue” is available alongside “Dark” (the default) and “Light”. Select this one (you can find it in Tools -> Options -> Environment -> General) and click OK.

VS2012_options

3. Download the “Visual Studio Icon Patcher” from MS CodePlex (here)

4. Unzip it in a new folder

5. Open Visual Studio Command Prompt (use “Run as an Administrator”). In the command prompt, enter the following commands :

cd whatever-folder-you-have-unzipped-the-file-in

VSIP.exe

You’re now in the VSIP prompt. Continue typing (obviously you have to hit enter after each line –but you knew that already) :

backup -v=2012

extract

inject

menus

x

Done ! The outcome looks like this :

VS2012_look2010

Important note : The commands in step #5 assume that you have both VS 2012 and 2010 installed on your machine. If you don’t, you need to a) “extract” from a machine with VS 2010 installed b) copy the folder created (it’s called Images, and it’s placed under the new folder in which you unzipped Visual Studio Icon Patcher) and c) “inject” it in the target machine (i.e. the dev PC with VS 2012).